
Changing only the LIMIT from 40 to 50 in the following query triggers a different execution plan, and unfortunately the one I need is much slower. So the question is: why is this happening, and how can I force PostgreSQL to use the faster plan? I'm using PostgreSQL 14.5.

SELECT "Id"
 FROM "Podcasts" AS P 
 INNER JOIN "PodcastCategories" AS PC ON P."Id"=PC."PodcastId"
 WHERE "LastPublishDate" IS NOT NULL AND "Dead" = false AND "Hidden" = false AND PC."CategoryId" = ANY (ARRAY[1]) 
 AND P."LastPublishDate"<'2023-01-14 23:00:00+00'
 ORDER BY "LastPublishDate" DESC
 LIMIT 50

This is the plan for LIMIT 40; it is the expected one and it is fast:

Limit (cost=1000.87..53095.17 rows=40 width=12) (actual time=46.797..606.536 rows=40 loops=1)
  ->  Gather Merge (cost=1000.87..60909.31 rows=46 width=12) (actual time=46.796..606.518 rows=40 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop (cost=0.84..59903.98 rows=19 width=12) (actual time=23.367..448.066 rows=15 loops=3)
              ->  Parallel Index Only Scan using "IX_Podcasts_LastPublishDate" on "Podcasts" p (cost=0.42..55488.81 rows=2442 width=12) (actual time=0.791..63.207 rows=259 loops=3)
                    Index Cond: ("LastPublishDate" < '2023-01-14 23:00:00+00'::timestamp with time zone)
                    Heap Fetches: 776
              ->  Index Only Scan using "PK_PodcastCategories" on "PodcastCategories" pc (cost=0.42..1.80 rows=1 width=4) (actual time=1.487..1.487 rows=0 loops=776)
                    Index Cond: (("PodcastId" = p."Id") AND ("CategoryId" = ANY ('{1}'::integer[])))
                    Heap Fetches: 21
Planning Time: 2.468 ms
Execution Time: 606.588 ms

This is the plan for LIMIT 50, and it runs much slower:

Limit (cost=59885.72..59888.83 rows=27 width=12) (actual time=34419.067..34436.304 rows=50 loops=1)
  ->  Gather Merge (cost=59885.72..59888.83 rows=27 width=12) (actual time=34419.065..34436.298 rows=50 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Sort (cost=58885.71..58885.78 rows=27 width=12) (actual time=34415.504..34415.510 rows=40 loops=2)
              Sort Key: p."LastPublishDate" DESC
              Sort Method: top-N heapsort  Memory: 28kB
              Worker 0:  Sort Method: top-N heapsort  Memory: 29kB
              ->  Parallel Hash Join (cost=55858.19..58885.07 rows=27 width=12) (actual time=34386.500..34412.404 rows=10528 loops=2)
                    Hash Cond: (pc."PodcastId" = p."Id")
                    ->  Parallel Bitmap Heap Scan on "PodcastCategories" pc (cost=336.90..3313.48 rows=19163 width=4) (actual time=94.378..2852.945 rows=16934 loops=2)
                          Recheck Cond: ("CategoryId" = ANY ('{1}'::integer[]))
                          Heap Blocks: exact=1292
                          ->  Bitmap Index Scan on "IX_PodcastCategories_CategoryId" (cost=0.00..328.75 rows=32577 width=0) (actual time=91.542..91.543 rows=33879 loops=1)
                                Index Cond: ("CategoryId" = ANY ('{1}'::integer[]))
                    ->  Parallel Hash (cost=55490.76..55490.76 rows=2442 width=12) (actual time=31518.266..31518.267 rows=130037 loops=2)
                          Buckets: 131072 (originally 8192)  Batches: 4 (originally 1)  Memory Usage: 4128kB
                          ->  Parallel Index Only Scan using "IX_Podcasts_LastPublishDate" on "Podcasts" p (cost=0.42..55490.76 rows=2442 width=12) (actual time=0.029..30960.929 rows=130037 loops=2)
                                Index Cond: ("LastPublishDate" < '2023-01-14 23:00:00+00'::timestamp with time zone)
                                Heap Fetches: 260290
Planning Time: 0.348 ms
Execution Time: 34436.367 ms
asked Jan 15, 2023 at 16:44

1 Answer


At some point it thinks reading all the qualifying rows and sorting them will be faster than walking an index in already-sorted order and filtering out the ones that don't qualify until it reaches the LIMIT. And it is true: at some point that would be faster. But it misestimates where that change-over happens, probably because it grossly misestimates the number of rows meeting the LastPublishDate criterion (2442 estimated vs 130037 actual).
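
One way to see that misestimate in isolation is to explain the date predicate on its own. A minimal sketch, only reusing the table, column, and cutoff timestamp from the question:

 EXPLAIN (ANALYZE, TIMING OFF)
 SELECT "Id"
 FROM "Podcasts"
 WHERE "LastPublishDate" < '2023-01-14 23:00:00+00';

If the estimated row count in that plan is far below the actual one (as in the 2442 vs 130037 above), the statistics on "LastPublishDate" are stale.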

I think there is no good reason for such a horrible misestimate. Your table seems to be severely under-analyzed. And probably also under-vacuumed, based on the large number of heap fetches you are seeing.
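
A minimal sketch of how to bring the statistics and the visibility map back up to date, using only standard maintenance commands and the table names from the question:

 -- Refresh planner statistics and clean up dead tuples.
 -- VACUUM also updates the visibility map, which should shrink the
 -- "Heap Fetches" counts on the index-only scans.
 VACUUM (ANALYZE) "Podcasts";
 VACUUM (ANALYZE) "PodcastCategories";

 -- Check when autovacuum/autoanalyze last touched these tables.
 SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
 FROM pg_stat_user_tables
 WHERE relname IN ('Podcasts', 'PodcastCategories');

If autovacuum never keeps up on these tables, lowering their autovacuum_vacuum_scale_factor / autovacuum_analyze_scale_factor storage parameters (via ALTER TABLE ... SET (...)) is a common follow-up.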

answered Jan 16, 2023 at 0:08
  • And, on top, I would check if random_page_cost is set correctly (see the sketch after these comments). Commented Jan 16, 2023 at 7:03
  • Creating appropriate statistics would probably help. Docs: postgresql.org/docs/current/sql-createstatistics.html Commented Jan 16, 2023 at 8:49
  • @RabbanKeyak I don't see which extended statistic would be likely to help here. The default statistics should be good enough; they just need to be kept more up to date. Commented Jan 16, 2023 at 19:37
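
On the random_page_cost point from the first comment, a minimal sketch of how to check and adjust it. The 1.1 value is only a common starting point for SSD-backed storage, not something derived from this particular setup, and ALTER SYSTEM needs superuser rights:

 SHOW random_page_cost;                     -- default is 4
 ALTER SYSTEM SET random_page_cost = 1.1;   -- assumed SSD-style value
 SELECT pg_reload_conf();                   -- apply without a restart

Lowering it makes index access look cheaper to the planner, which nudges it toward the nested-loop plan; verify the effect with EXPLAIN (ANALYZE) before and after changing it.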
