I have a main fact table t1 with ~6.9m rows, a second dimension table t2 with 1329 rows, and a third table t3 with ~6.9m rows (1-1 with t1, separate because it's produced by a different batch process).

When I filter by something in t2 and ORDER BY something in t3, with a LIMIT 500, PostgreSQL 14.5 picks a Nested Loop plan. The filter on t2 matches about 24% of the rows in t2, but in t1, those values occur in only 10 rows out of 6.9 million.

I believe the planner is assuming that if my condition matches 24% of the rows in t2, it'll also match 24% of the rows in t1, so it can scan backwards down the ORDER BY index on t3 and expect to hit 500 qualifying rows after reading only the first ~2,000 index entries (500 / 0.24). Instead, with only 10 matching rows, it ends up scanning nearly the entire table for several minutes.

I've tried bumping the stats target to 1000 for t1.t2_id, which is more than the number of unique values (559). The stats for t2 are good: it estimates my filter will match a fraction 0.2437923 of t2, and it does (324 rows out of 1329). The problem is it thinks that means it'll match 0.2437923 of t1, too, and it's nowhere close. There's a plan step where it estimates 1682925/6903110 rows of t1 (exactly 0.2437923 again), but the actual count is just 10.
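
Concretely, the stats bump was something along these lines:

ALTER TABLE t1 ALTER COLUMN t2_id SET STATISTICS 1000;
ANALYZE t1;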

Is there a way to make it understand more about the distribution of the t2_id column as it applies to this join, i.e. that the distribution is vastly different between t1 and t2?

explain analyze SELECT t1.col1, t3.t3_value, t1.t2_id, t2..., t1....
 FROM t1
 INNER JOIN t2 ON t1.t2_id = t2.t2_id
 INNER JOIN t3 ON t1.t1_id = t3.t1_id
 WHERE ((t1.some_date >= '2022-01-01') AND (t1.some_date <= '2022-12-31')) AND (t2.somearray @> '{"Some value"}'::text[])
 ORDER BY t3.t3_value DESC NULLS LAST
 LIMIT 500;

Limit  (cost=1.15..1393.95 rows=500 width=168) (actual time=252.891..241643.196 rows=10 loops=1)
  ->  Nested Loop  (cost=1.15..4687957.28 rows=1682925 width=168) (actual time=252.889..241643.186 rows=10 loops=1)
        ->  Nested Loop  (cost=0.86..4515363.25 rows=6903110 width=142) (actual time=0.546..238786.757 rows=6905136 loops=1)
              ->  Index Scan Backward using t3_t3_value_idx on t39b4 t3  (cost=0.43..384965.62 rows=6905138 width=10) (actual time=0.419..86244.771 rows=6905136 loops=1)
              ->  Index Scan using t1_id on t1 t1  (cost=0.43..0.60 rows=1 width=140) (actual time=0.022..0.022 rows=1 loops=6905136)
                    Index Cond: (t1_id = t3.t1_id)
                    Filter: ((some_date >= '2022-01-01'::date) AND (some_date <= '2022-12-31'::date))
        ->  Memoize  (cost=0.29..0.31 rows=1 width=33) (actual time=0.000..0.000 rows=0 loops=6905136)
              Cache Key: t1.t2_id
              Cache Mode: logical
              Hits: 6904577  Misses: 559  Evictions: 0  Overflows: 0  Memory Usage: 40kB
              ->  Index Scan using t2_id on t2 t2  (cost=0.28..0.30 rows=1 width=33) (actual time=0.015..0.015 rows=0 loops=559)
                    Index Cond: (t2_id = t1.t2_id)
                    Filter: (somearray @> '{Some value}'::text[])
                    Rows Removed by Filter: 1
Planning Time: 2.817 ms
Execution Time: 241643.279 ms

I can get decent performance with set enable_nestloop = off, but when a Nested Loop plan is the right choice, it's sometimes 100X faster... the problem is it's sometimes 100X slower, too.
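
When I do turn it off, I scope it to the one transaction rather than the whole session, something like:

BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT ... ;  -- the query above
COMMIT;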

For reference, here's the plan with enable_nestloop = off:

Limit  (cost=491707.51..491765.85 rows=500 width=168) (actual time=2272.171..2299.444 rows=10 loops=1)
  ->  Gather Merge  (cost=491707.51..655369.37 rows=1402718 width=168) (actual time=2249.844..2277.115 rows=10 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=490707.49..492460.89 rows=701359 width=168) (actual time=2220.992..2220.999 rows=3 loops=3)
              Sort Key: t3.t3_value DESC NULLS LAST
              Sort Method: quicksort  Memory: 25kB
              Worker 0:  Sort Method: quicksort  Memory: 26kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Parallel Hash Join  (cost=304205.51..455759.53 rows=701359 width=168) (actual time=2066.257..2220.918 rows=3 loops=3)
                    Hash Cond: (t3.t1_id = t1.t1_id)
                    ->  Parallel Seq Scan on t3 t3  (cost=0.00..93305.41 rows=2877141 width=10) (actual time=0.195..431.625 rows=2301712 loops=3)
                    ->  Parallel Hash  (cost=278999.52..278999.52 rows=701359 width=166) (actual time=1203.337..1203.341 rows=3 loops=3)
                          Buckets: 131072  Batches: 32  Memory Usage: 1056kB
                          ->  Hash Join  (cost=66.61..278999.52 rows=701359 width=166) (actual time=1188.304..1202.565 rows=3 loops=3)
                                Hash Cond: (t1.t2_id = t2.t2_id)
                                ->  Parallel Seq Scan on t1 t1  (cost=0.00..271357.04 rows=2876870 width=140) (actual time=11.000..987.052 rows=2301712 loops=3)
                                      Filter: ((some_date >= '2022-01-01'::date) AND (some_date <= '2022-12-31'::date))
                                ->  Hash  (cost=62.56..62.56 rows=324 width=33) (actual time=0.292..0.294 rows=324 loops=3)
                                      Buckets: 1024  Batches: 1  Memory Usage: 31kB
                                      ->  Bitmap Heap Scan on t2  (cost=10.51..62.56 rows=324 width=33) (actual time=0.087..0.233 rows=324 loops=3)
                                            Recheck Cond: (somearray @> '{Some value}'::text[])
                                            Heap Blocks: exact=18
                                            ->  Bitmap Index Scan on t2_somearray_idx  (cost=0.00..10.43 rows=324 width=0) (actual time=0.054..0.055 rows=324 loops=3)
                                                  Index Cond: (somearray @> '{Some value}'::text[])
Execution Time: 2301.592 ms
asked May 29, 2023 at 2:59

3 Answers

None of PostgreSQL's statistics will help with this. If you want the best chance at the best plan, query t2 first, collecting the matching IDs into an array, then stuff that array literal into an = ANY condition against t1 and t3.
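
A minimal sketch of that approach (assuming t2_id is an integer key; the array literal in step 2 is made up, use whatever step 1 returns):

-- Step 1: resolve the filter against the small dimension table.
SELECT array_agg(t2_id) FROM t2 WHERE somearray @> '{"Some value"}'::text[];

-- Step 2: inline the resulting array literal, so the planner sees the exact values.
SELECT t1.col1, t3.t3_value, t1.t2_id
FROM t1
INNER JOIN t3 ON t1.t1_id = t3.t1_id
WHERE t1.t2_id = ANY ('{11,42,317}'::int[])
  AND t1.some_date >= '2022-01-01' AND t1.some_date <= '2022-12-31'
ORDER BY t3.t3_value DESC NULLS LAST
LIMIT 500;

If you still need columns from t2, join it back in on the primary key; with the IDs already known, that join shouldn't distort the row-count estimate.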

answered May 29, 2023 at 10:24
  • I was afraid that would be the answer! Thank you. Commented Jun 1, 2023 at 13:04

I came across a trick to prevent the planner from using the index for a pipelined sort. Instead of:

ORDER BY t3.t3_value DESC NULLS LAST

I was able to get it to use a better plan with:

ORDER BY t3.t3_value + 0 DESC NULLS LAST

(+ 0 for a number, or || '' for a string, or + INTERVAL '0 second' for a date, etc.)

It doesn't fix the estimates, so it still might not choose the best plan, but it tends to choose a better plan than before. This seems to be helpful when there is a very different distribution of values between a large fact table and a smaller dimension table, and the filter condition is on the dimension table, so the estimates are way off.
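
Applied to the query in the question (assuming t3_value is numeric; use the string/date variants above otherwise), it looks like:

SELECT t1.col1, t3.t3_value, t1.t2_id
FROM t1
INNER JOIN t2 ON t1.t2_id = t2.t2_id
INNER JOIN t3 ON t1.t1_id = t3.t1_id
WHERE t1.some_date >= '2022-01-01' AND t1.some_date <= '2022-12-31'
  AND t2.somearray @> '{"Some value"}'::text[]
ORDER BY t3.t3_value + 0 DESC NULLS LAST  -- expression no longer matches the index, so no pipelined sort
LIMIT 500;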

As per the earlier accepted answer, there isn't a way to fix this with statistics.

answered Apr 24, 2024 at 14:34

You can try dropping the index t2_somearray_idx on t2.

If you don't have an index on t1.some_date, the only factor that affects the join-order planning is the index t2_somearray_idx.
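
Since DDL is transactional in PostgreSQL, you can test the effect without committing to the drop (note that DROP INDEX takes an exclusive lock on t2 until the transaction ends):

BEGIN;
DROP INDEX t2_somearray_idx;
EXPLAIN ANALYZE SELECT ... ;  -- the query from the question
ROLLBACK;  -- the index is back, untouched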

answered May 31, 2023 at 3:51
  • Thanks, it seems to give the same plan after I drop that index. I think it's actually using the primary key index on t2 in the nested loop (it's coming in from t1, so to speak), and then filtering out the somearray values manually. I'm questioning the somearray index in general, as it's not very selective, but removal doesn't seem to help here. Commented Jun 1, 2023 at 13:04
