I've stumbled upon an article describing nested sets model for storing hierarchical data and was quite intrigued by the supposed performance boost.
after implementing this in our database, i've found that the adjecency list model performs far better. though the article I've read and a couple of SA posts (one, two) stated that any tree traversal is supposed to benefit, my tests demonstrate the opposite.
Am I missing something?
there are roughly 1.1 Million entries in the following table:
SQL> show table tax_nodes;
TAX_ID INTEGER Not Null
PARENT_TAX_ID INTEGER Nullable
RANK VARCHAR(50) Nullable
DIVISION_ID INTEGER Nullable
TREE_RIGHT INTEGER Nullable
TREE_LEFT INTEGER Nullable
CONSTRAINT INTEG_202:
Primary key (TAX_ID)
Adjecency List Query to fetch parents
with recursive lineage as
(select tax_id, parent_tax_id
from tax_nodes
where tax_id = 365612
union ALL
select tax_id, parent_tax_id
from tax_nodes t
join lineage l on l.parent_tax_id = t.tax_id
)
select l.tax_id
from lineage l
Nested Sets Query to fetch parents
select
p2.tax_id from tax_nodes as p1,
tax_nodes as p2
where
p1.tree_left between p2.tree_left and p2.tree_right
and p1.tax_id = 365612
Result
365612 336487 10260 10241 10240 35237 10239 1 //1 is the root
Why is the Adjecency List performing better though most resources suggest otherwise?
I'm running Firebird 2.5 superserver by the way.
UPDATE
On @MarkRotteveel's hint, I took a look at the execution plans and I suspect the NATURAL
being the culprit here. Question now is, how do I get rid of it?
adjecency list execution plan
PLAN (LINEAGE TAX_NODES INDEX (RDB$PRIMARY108))
PLAN (LINEAGE T INDEX (RDB$PRIMARY108))
nested sets execution plan
PLAN JOIN (P1 INDEX (RDB$PRIMARY108), P2 NATURAL)
1 Answer 1
The execution plan of your adjacency list example shows that it is able to use the primary key index (RDB$PRIMARY108
) for both parts of the UNION
. This is very efficient, mainly because you start with a single value, and then recursively look up its descendents (which is probably just a small set of the entire set of records).
For the nested set execution plan it is not able to use an index, because it has to scan all records for the values of p2.tree_left
and p2.tree_right
. It can only use the primary key index for looking up p1.tax_id = 365612
.
You might be able to improve execution if you add a descending index for tax_nodes.tree_left
and an ascending index for tax_nodes.tree_right
. You may need to swap ascending/descending because I usually do it wrong initially, even if I account for usually doing it wrong. However you may also need to change the query condition so the optimizer picks the indexes for tree_left
and tree_right
:
p2.tree_left <= p1.tree_left AND p2.tree_right >= p1.tree_left
-
I've added the indeces and the plan looks better already. I will benchmark it and report back. using
a between b and c
andb<=a and a>=c
produced the same plan, though:PLAN JOIN (P1 INDEX (RDB$PRIMARY175), P2 INDEX (IDX_TAX, UQ_TAX2))
lordvlad– lordvlad07/10/2014 09:46:17Commented Jul 10, 2014 at 9:46 -
@lordvlad If
BETWEEN
produces the same plan, then the Firebird optimizer is probably better at transforming the condition than I expected ;)Mark Rotteveel– Mark Rotteveel07/10/2014 10:37:18Commented Jul 10, 2014 at 10:37 -
1benchmarks say that the two querys are now more or less equally performant. Thanks!lordvlad– lordvlad07/10/2014 11:06:17Commented Jul 10, 2014 at 11:06
-
1ookay. so now I've added an index to the column
PARENT_TAX_ID
and now the adjecency-model is again 4-10 times faster, though the execution plans did not change. can I now safely assume that firebird's recursive CTEs are better than nested sets?lordvlad– lordvlad07/11/2014 08:25:35Commented Jul 11, 2014 at 8:25 -
1@lordvlad I don't think you can conclusively say that, it probably depends on the distribution of data etc. However if adding an index improves the query time, I'd expect that index to show up in the plan.Mark Rotteveel– Mark Rotteveel07/11/2014 08:29:25Commented Jul 11, 2014 at 8:29
CREATE TABLE
as well (including extra indexes if any)?