Graph databases like Neo4j have a very important performance characteristic call...

benkrug · on Sept 18, 2019

I don't buy the index-free hype. Leaving aside the problem of a distributed implementation, imagine you have a node with lots of labeled edges going out, and you're only interested in following some of those edges. You have to find the right ones somehow - full scan of all the edges, or - an index. Or, for a more detailed comment (not mine), see eg https://www.arangodb.com/2016/04/index-free-adjacency-hybrid... .

AndrewBowman · on Sept 19, 2019

There are of course certain structures you can use to ease the selection of the relationships you want to follow, and Neo4j does use some of its own, breaking down relationships on a node first by type, and then by incoming or outgoing, allowing a quick selection of relevant relationships to follow depending on what's desired.

Whether additional structures are used at this level or not, the point of index-free adjacency is you do not need to utilize an index at the table/label level. The cost of performing each expansion is not dependent upon the total number of relationships or nodes in the graph, as it would be for table joins. You are only ever considering the relationships on each specific node at a time as you traverse.

Seems to me as long as you have that then you've got (table) index-free adjacency.

twic · on Sept 18, 2019

There's no reason that a relational database couldn't also have index-free adjacency. Alongside every foreign key, you store a file pointer to the row with that key.

I assume that the reason relational databases don't do this is that the speedup isn't worth the complexity and performance cost of keeping those pointers up to date for the kinds of queries that relational databases usually do.

AndrewBowman · on Sept 19, 2019

There is a older, but seemingly relevant, question/answer for this on stack overflow: https://stackoverflow.com/a/5611541/92359

> if you shrink a table, or update a partitioned table (causing a row to move to another partition) or if you are rebuilding a table, or export/import a table, or... or... or... the rowid will change.

If the rowids are not stable across db operations, it wouldn't make sense to use them for implementing index-free adjacency. Do any alternatives remain? If not you're back to joins.

One of the reasons why Neo4j can use index-free adjacency is that the ids used for nodes and relationships are pointers to the location of the nodes and relationships in the relevant store files. Those are stable across updates and deletes of other data, and when you delete a node, all its relationships must be deleted first so there are no hanging relationships.

tossAfterUsing · on Sept 18, 2019

interviewed with a know-it-all type sr. engineer once, and he asked me an incredibly defensive question like, "why would i use a graph database instead of postgresql, it cant do anything that can't be done in postgres!?!?"

i'm no expert on any of this, but my response was still sufficient to get hired: you don't need a graph db, but it's easier to understand what cypher is doing than what sql is doing... that means more productivity to your lower-cost junior developers.

dragonwriter · on Sept 18, 2019

OTOH, if you could have it opt-in on a per-FK basis in an RDBMS, the cost could be avoided where not needed, but the performance benefit available where needed.

dataflow · on Sept 18, 2019

Ahhh, this is the kind of thing I was looking for. Thanks!