I have a main fact table t1 with ~6.9m rows, a second dimension table t2 with 1329 rows, and a third table t3 with ~6.9m rows (1-1 with t1, separate because it's produced by a different batch process).

When I filter by something in t2 and ORDER BY something in t3, with a LIMIT 500, PostgreSQL 14.5 picks a Nested Loop plan. The filter on t2 matches about 24% of the rows in t2, but in t1, those values occur in only 10 rows out of 6.9 million.

I believe the planner is assuming that if my condition matches 24% of the rows in t2, it'll also match 24% of the rows in t1, so it can scan backwards down the ORDER BY index on t3 and expect to hit 500 qualifying rows after reading only the first ~2,000 index entries (500 / 0.24). Instead, with only 10 matching rows, it ends up scanning nearly the entire table for several minutes.

I've tried bumping the stats target to 1000 for t1.t2_id, which is more than the number of unique values (559). The stats for t2 are good: it estimates my filter will match a fraction 0.2437923 of t2, and it does (324 rows out of 1329). The problem is it thinks that means it'll match 0.2437923 of t1, too, and it's nowhere close. There's a plan step where it estimates 1682925/6903110 rows of t1 (exactly 0.2437923 again), but the actual count is just 10.
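
Concretely, the stats bump was something along these lines:

ALTER TABLE t1 ALTER COLUMN t2_id SET STATISTICS 1000;
ANALYZE t1;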

Is there a way to make it understand more about the distribution of the t2_id column as it applies to this join, i.e. that the distribution is vastly different between t1 and t2?

explain analyze SELECT t1.col1, t3.t3_value, t1.t2_id, t2..., t1....
 FROM t1
 INNER JOIN t2 ON t1.t2_id = t2.t2_id
 INNER JOIN t3 ON t1.t1_id = t3.t1_id
 WHERE ((t1.some_date >= '2022-01-01') AND (t1.some_date <= '2022-12-31')) AND (t2.somearray @> '{"Some value"}'::text[])
 ORDER BY t3.t3_value DESC NULLS LAST
 LIMIT 500;

Limit  (cost=1.15..1393.95 rows=500 width=168) (actual time=252.891..241643.196 rows=10 loops=1)
  ->  Nested Loop  (cost=1.15..4687957.28 rows=1682925 width=168) (actual time=252.889..241643.186 rows=10 loops=1)
        ->  Nested Loop  (cost=0.86..4515363.25 rows=6903110 width=142) (actual time=0.546..238786.757 rows=6905136 loops=1)
              ->  Index Scan Backward using t3_t3_value_idx on t39b4 t3  (cost=0.43..384965.62 rows=6905138 width=10) (actual time=0.419..86244.771 rows=6905136 loops=1)
              ->  Index Scan using t1_id on t1 t1  (cost=0.43..0.60 rows=1 width=140) (actual time=0.022..0.022 rows=1 loops=6905136)
                    Index Cond: (t1_id = t3.t1_id)
                    Filter: ((some_date >= '2022-01-01'::date) AND (some_date <= '2022-12-31'::date))
        ->  Memoize  (cost=0.29..0.31 rows=1 width=33) (actual time=0.000..0.000 rows=0 loops=6905136)
              Cache Key: t1.t2_id
              Cache Mode: logical
              Hits: 6904577  Misses: 559  Evictions: 0  Overflows: 0  Memory Usage: 40kB
              ->  Index Scan using t2_id on t2 t2  (cost=0.28..0.30 rows=1 width=33) (actual time=0.015..0.015 rows=0 loops=559)
                    Index Cond: (t2_id = t1.t2_id)
                    Filter: (somearray @> '{Some value}'::text[])
                    Rows Removed by Filter: 1
Planning Time: 2.817 ms
Execution Time: 241643.279 ms

I can get decent performance with set enable_nestloop = off, but when a Nested Loop plan is the right choice, it's sometimes 100X faster... the problem is it's sometimes 100X slower, too.
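
When I do turn it off, I scope it to the one transaction rather than the whole session, something like:

BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT ... ;  -- the query above
COMMIT;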

For reference, here's the plan with enable_nestloop = off:

Limit  (cost=491707.51..491765.85 rows=500 width=168) (actual time=2272.171..2299.444 rows=10 loops=1)
  ->  Gather Merge  (cost=491707.51..655369.37 rows=1402718 width=168) (actual time=2249.844..2277.115 rows=10 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=490707.49..492460.89 rows=701359 width=168) (actual time=2220.992..2220.999 rows=3 loops=3)
              Sort Key: t3.t3_value DESC NULLS LAST
              Sort Method: quicksort  Memory: 25kB
              Worker 0:  Sort Method: quicksort  Memory: 26kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Parallel Hash Join  (cost=304205.51..455759.53 rows=701359 width=168) (actual time=2066.257..2220.918 rows=3 loops=3)
                    Hash Cond: (t3.t1_id = t1.t1_id)
                    ->  Parallel Seq Scan on t3 t3  (cost=0.00..93305.41 rows=2877141 width=10) (actual time=0.195..431.625 rows=2301712 loops=3)
                    ->  Parallel Hash  (cost=278999.52..278999.52 rows=701359 width=166) (actual time=1203.337..1203.341 rows=3 loops=3)
                          Buckets: 131072  Batches: 32  Memory Usage: 1056kB
                          ->  Hash Join  (cost=66.61..278999.52 rows=701359 width=166) (actual time=1188.304..1202.565 rows=3 loops=3)
                                Hash Cond: (t1.t2_id = t2.t2_id)
                                ->  Parallel Seq Scan on t1 t1  (cost=0.00..271357.04 rows=2876870 width=140) (actual time=11.000..987.052 rows=2301712 loops=3)
                                      Filter: ((some_date >= '2022-01-01'::date) AND (some_date <= '2022-12-31'::date))
                                ->  Hash  (cost=62.56..62.56 rows=324 width=33) (actual time=0.292..0.294 rows=324 loops=3)
                                      Buckets: 1024  Batches: 1  Memory Usage: 31kB
                                      ->  Bitmap Heap Scan on t2  (cost=10.51..62.56 rows=324 width=33) (actual time=0.087..0.233 rows=324 loops=3)
                                            Recheck Cond: (somearray @> '{Some value}'::text[])
                                            Heap Blocks: exact=18
                                            ->  Bitmap Index Scan on t2_somearray_idx  (cost=0.00..10.43 rows=324 width=0) (actual time=0.054..0.055 rows=324 loops=3)
                                                  Index Cond: (somearray @> '{Some value}'::text[])
Execution Time: 2301.592 ms
asked May 29, 2023 at 2:59

3 Answers

None of PostgreSQL's statistics will help with this. If you want the best chance at the best plan, query t2 first, collecting the matching IDs into an array, then stuff that array literal into an = ANY condition against t1 and t3.
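
A minimal sketch of that approach (assuming t2_id is an integer key; the array literal in step 2 is made up, use whatever step 1 returns):

-- Step 1: resolve the filter against the small dimension table.
SELECT array_agg(t2_id) FROM t2 WHERE somearray @> '{"Some value"}'::text[];

-- Step 2: inline the resulting array literal, so the planner sees the exact values.
SELECT t1.col1, t3.t3_value, t1.t2_id
FROM t1
INNER JOIN t3 ON t1.t1_id = t3.t1_id
WHERE t1.t2_id = ANY ('{11,42,317}'::int[])
  AND t1.some_date >= '2022-01-01' AND t1.some_date <= '2022-12-31'
ORDER BY t3.t3_value DESC NULLS LAST
LIMIT 500;

If you still need columns from t2, join it back in on the primary key; with the IDs already known, that join shouldn't distort the row-count estimate.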

answered May 29, 2023 at 10:24
  • I was afraid that would be the answer! Thank you. Commented Jun 1, 2023 at 13:04

I came across a trick to prevent the planner from using the index for a pipelined sort. Instead of:

ORDER BY t3.t3_value DESC NULLS LAST

I was able to get it to use a better plan with:

ORDER BY t3.t3_value + 0 DESC NULLS LAST

(+ 0 for a number, or || '' for a string, or + INTERVAL '0 second' for a date, etc.)

It doesn't fix the estimates, so it still might not choose the best plan, but it tends to choose a better plan than before. This seems to be helpful when there is a very different distribution of values between a large fact table and a smaller dimension table, and the filter condition is on the dimension table, so the estimates are way off.
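
Applied to the query in the question (assuming t3_value is numeric; use the string/date variants above otherwise), it looks like:

SELECT t1.col1, t3.t3_value, t1.t2_id
FROM t1
INNER JOIN t2 ON t1.t2_id = t2.t2_id
INNER JOIN t3 ON t1.t1_id = t3.t1_id
WHERE t1.some_date >= '2022-01-01' AND t1.some_date <= '2022-12-31'
  AND t2.somearray @> '{"Some value"}'::text[]
ORDER BY t3.t3_value + 0 DESC NULLS LAST  -- expression no longer matches the index, so no pipelined sort
LIMIT 500;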

As per the earlier accepted answer, there isn't a way to fix this with statistics.

answered Apr 24, 2024 at 14:34

You can try dropping the index t2_somearray_idx on t2.

If you don't have an index on t1.some_date, the only factor that affects the join-order planning is the index t2_somearray_idx.
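
Since DDL is transactional in PostgreSQL, you can test the effect without committing to the drop (note that DROP INDEX takes an exclusive lock on t2 until the transaction ends):

BEGIN;
DROP INDEX t2_somearray_idx;
EXPLAIN ANALYZE SELECT ... ;  -- the query from the question
ROLLBACK;  -- the index is back, untouched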

answered May 31, 2023 at 3:51
  • Thanks, it seems to give the same plan after I drop that index. I think it's actually using the primary key index on t2 in the nested loop (it's coming in from t1, so to speak), and then filtering out the somearray values manually. I'm questioning the somearray index in general, as it's not very selective, but removal doesn't seem to help here. Commented Jun 1, 2023 at 13:04
