I am working on a query that uses a recursive CTE in PostgreSQL 13. The query is intended to recurse from the root nodes down to the deepest leaf node with certain properties (rows labeled 'begat' in the resource_dependency table, with a matching "sample type" property from another table).
https://explain.dalibo.com/plan/g2d2f5fg9c563b0d#plan https://explain.depesz.com/s/eaRu
Both the construction of the CTE work table and, even more so, the scan of the CTE to apply the constraints appear to be the most time-consuming parts of the query. Most of the work is done in memory with the given settings, yet we still read about 780 GB from cache (cache hits), which looks like duplicated work.
The actual number of nodes in any given tree may be 1000-5000. In this operation we are starting from 890 root nodes. There are both 1:many and many:1 relationships in the tree (splitting and recombining).
Doing some experiments in a blank environment, I notice that:
- To get all parent->child resource_ids in a single column, we need a left join so that every leaf node gets a null child row with its calculated depth (a rough sketch of this shape follows below).
- We track path to ensure we don't get into an infinite cycle. While path is important for avoiding cycles, it is not relevant for the final calculation, and some paths duplicate information for the same root -> descendant. This makes the CTE work table much larger.
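For illustration, a minimal sketch of the shape this recursive CTE takes; the names roots, prior_id and resource_id are assumptions based on the description above (the real query is in the linked plans), and the join that checks the "sample type" property is omitted:

```sql
WITH RECURSIVE tree AS (
    -- Anchor: the root resources (however the 890 roots are selected)
    SELECT r.resource_id,
           0                    AS depth,
           ARRAY[r.resource_id] AS path,
           false                AS is_cycle
    FROM   roots r
    UNION ALL
    -- Recursive step: LEFT JOIN so leaves also emit a row with a NULL child
    SELECT d.resource_id,
           t.depth + 1,
           t.path || d.resource_id,
           COALESCE(d.resource_id = ANY(t.path), false)
    FROM   tree t
    LEFT JOIN resource_dependency d
           ON  d.prior_id = t.resource_id
           AND d.label    = 'begat'
    WHERE  t.resource_id IS NOT NULL   -- stop after the NULL leaf row
      AND  NOT t.is_cycle              -- stop if a cycle was detected
)
SELECT * FROM tree;
```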
Are there any techniques that can be used here to make this recursion less painful? We need to match each starting root to only its deepest leaf node reached through the 'begat' relationship with the matching 'sample type' property.
1 Answer
I tried several optimizations for this query. It seems that having a tree with one or more fan-ins (for example, many samples -> 1 sample) while calculating the path and cycle columns results in mostly duplicated information (only path differs).
For a hierarchy of records where the integer IDs (the main join condition) fan out and back in like so: 1 sample -> 1000 samples -> 1 sample -> 10 samples -> 1 sample
The resulting CTE work table is 32,000 records long, where only path differs. Dropping path (and the associated cycle column) reduces the CTE work table to 2021 rows (the number of resource_dependency edges plus 1 leaf row):
- 1000 connections between 1 Gen 0 and 1000 Gen 1
- 1000 connections between 1000 Gen 1 and 1 Gen 2
- 10 connections between 1 Gen 2 and 10 Gen 3
- 10 connections between 10 Gen 3 and 1 Gen 4
- 1 final row for the leaf node (child_id is null) <<< target row information
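Roughly, the 32,000 figure comes from counting one row per distinct root-to-node path rather than per edge: 1000 rows for the Gen 0->1 edges, 1000 for Gen 1->2 (one per path through Gen 1), 10,000 for Gen 2->3 (1000 paths x 10 edges), 10,000 for Gen 3->4, and 10,000 null-child leaf rows (one per path reaching Gen 4), which sums to about 32,000. Once path is dropped, UNION can deduplicate, leaving one row per distinct edge: 2020 edges plus the single leaf row gives 2021.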
When tracking path and cycle, fan-ins to the same child_id mean we have to repeatedly join that same child_id integer N times, where N is the number of samples fanned in from parent to child.
UNION would deduplicate rows in the work table, but the unique paths used for cycle detection prevent that.
Some solutions that seem to work:
- Disabling nested loops does seem to help in some cases, as the row count coming out of the CTE working table often seems to be underestimated. For this tree with the original query, performance improved from 103 seconds to 37 seconds.
- Removing cycle detection, by dropping the path and cycle calculated columns from the query, gets it down to 600 ms (!)
- Similar techniques that work (a rough sketch follows this list):
  - Using a fixed upper limit on generation (depth) lets us mitigate the cycle-tracking cost without tracking path, and completes in less than 1 second.
  - Using a 2nd CTE (non-materialized seems to work) lets us deduplicate rows from the original CTE by dropping the path and cycle information in a 2nd step (7.5 seconds). The 1st CTE does the same work, but the primary query has far fewer rows to scan.
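A rough sketch of combining the planner setting from the first bullet with the 2nd-CTE deduplication, using the same assumed names as the sketch in the question (the final SELECT is only a stand-in for the real deepest-leaf query):

```sql
BEGIN;
-- Keep the change local to this transaction; counters the underestimated
-- row count coming out of the recursive CTE.
SET LOCAL enable_nestloop = off;

WITH RECURSIVE tree AS (
    SELECT r.resource_id, 0 AS depth,
           ARRAY[r.resource_id] AS path, false AS is_cycle
    FROM   roots r
    UNION ALL
    SELECT d.resource_id, t.depth + 1, t.path || d.resource_id,
           COALESCE(d.resource_id = ANY(t.path), false)
    FROM   tree t
    LEFT JOIN resource_dependency d
           ON d.prior_id = t.resource_id AND d.label = 'begat'
    WHERE  t.resource_id IS NOT NULL AND NOT t.is_cycle
),
-- 2nd CTE: drop path/is_cycle and deduplicate so the primary query scans
-- far fewer rows; NOT MATERIALIZED lets it fold into that query.
deduped AS NOT MATERIALIZED (
    SELECT DISTINCT resource_id, depth
    FROM   tree
)
SELECT resource_id, depth
FROM   deduped;

COMMIT;
```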
In summary: with recursive CTEs (which are always materialized, FWIW), UNION performance can degrade when path and cycle are tracked (at least with the technique used by the query shown here), because the unique paths defeat deduplication and UNION ends up performing about the same as UNION ALL. Relationships that point to the same join value for the next recursive step (in my case an integer ID) will inflate the CTE working table.
Some other techniques which might mitigate cycle issues without hurting query performance:
- Prevent cycles from being inserted
- Remove path and cycle information before the primary query with a 2nd CTE
- Use an upper limit on generation/depth, or use a query timeout (a rough sketch follows this list)
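A rough sketch of the depth-cap variant with no path/cycle tracking at all (same assumed names as above; the 50-generation limit and 30-second timeout are arbitrary placeholders):

```sql
-- Query timeout as a safety net in place of per-row cycle detection.
SET statement_timeout = '30s';

WITH RECURSIVE tree AS (
    SELECT r.resource_id, 0 AS depth
    FROM   roots r
    UNION                               -- plain UNION can deduplicate now that
                                        -- no path column keeps every row unique
    SELECT d.resource_id, t.depth + 1
    FROM   tree t
    LEFT JOIN resource_dependency d
           ON d.prior_id = t.resource_id AND d.label = 'begat'
    WHERE  t.resource_id IS NOT NULL    -- stop after the NULL leaf row
      AND  t.depth < 50                 -- fixed upper bound on generations
)
SELECT * FROM tree;
```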
Explains for the optimizations: https://explain.depesz.com/s/4ZJB
Additional optimizations: 1. Create a compound index on resource_dependency(prior_id, label), which reduces work within the recursion. 2. Rewrite the query to minimize the number of joins in the recursive CTE. This greatly reduces the size of the Work Table and increases efficiency. explain.dalibo.com/plan/6d8bb7a7ccb948h3 – Justin Lowen, Oct 23, 2024 at 18:48
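For illustration, the compound index mentioned in that comment might look like this (the index name is illustrative):

```sql
-- Compound index so the recursive join's lookup by prior_id can also apply
-- the label = 'begat' filter inside the index.
CREATE INDEX ix_resource_dependency_prior_id_label
    ON resource_dependency (prior_id, label);
```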
Comments (excerpts):
- (… null child nodes.) Without removing unique paths, the CTE is 32,000 rows.
- Set enable_nestloop to off for that one query.
- ix_resource_dependency_prior_id could benefit from adding the column label, since much filtering has been done on it. barcode also needs to be in an index.