Specific index issue with postgresql planner/optimizer

Question 1

i have an issue where i have no idea anymore and can't seem to find a similar one already discussed, so...

Am/have to be on pgsql 10.15 / RHEL x86_64.

rel A has about 813 million records.
format is like id,....,snapzeit.
col snapzeit is not nullable, nonunique, timestamp.
there is an index on date_trunc('minute', snapzeit).
no refint constraints involved.
recently analyzed

rel B is only one col named t of timestamp and has like 13 rows; unique timestamps. rel B is only there because i came from working with generate_series, which was even slower than my current issue.

I want to have all from A where snapzeit = B.t, which would be around 255k rows.

If i filter by giving a list of literals (same values as in B) it is lightning fast:

select * from A
where date_trunc('minute',snapzeit) in ('2021-11-01 04:30',...);

query plan:

Bitmap Heap Scan on A (cost=1294171.21..24450480.97 rows=69102630 width=193) (actual time=16.184..278.121 rows=255589 loops=1)
 Recheck Cond: (date_trunc('minute'::text, snapzeit) = ANY ('{"2021年11月17日 04:00:00",$CUT_OUT$'::timestamp without time zone[]))
 Heap Blocks: exact=5879
 -> Bitmap Index Scan on IDX_A (cost=0.00..1276895.55 rows=69102630 width=0) (actual time=15.367..15.368 rows=255589 loops=1)
 Index Cond: (date_trunc('minute'::text, snapzeit) = ANY ('{"2021年11月17日 04:00:00",$CUT_OUT$'::timestamp without time zone[]))
Planning time: 0.157 ms
Execution time: 456.024 ms

If i use rel B it takes ages. I added a second predicate so it comes back in 2021 at all:

select * from A
where date_trunc('minute',snapzeit) in (
 select date_trunc('minute',t) from B
)
and snapzeit >= '2021-11-10';

query plan:

Nested Loop (cost=622164.37..18870692.83 rows=12866340 width=193) (actual time=3051.373..42762.069 rows=255589 loops=1)
 -> HashAggregate (cost=1.21..1.38 rows=17 width=8) (actual time=0.057..0.133 rows=17 loops=1)
 Group Key: date_trunc('minute'::text, B.t)
 -> Seq Scan on test (cost=0.00..1.17 rows=17 width=8) (actual time=0.024..0.039 rows=17 loops=1)
 -> Bitmap Heap Scan on A (cost=622163.16..1108754.04 rows=128663 width=193) (actual time=2478.495..2492.003 rows=15035 loops=17)
 Recheck Cond: ((date_trunc('minute'::text, snapzeit) = date_trunc('minute'::text, B.t)) AND (snapzeit >= '2021-11-10 00:00:00'::timestamp without time zone))
 Heap Blocks: exact=5879
 -> BitmapAnd (cost=622163.16..622163.16 rows=128663 width=0) (actual time=2478.424..2478.424 rows=0 loops=17)
 -> Bitmap Index Scan on IDX_A (cost=0.00..75117.91 rows=4065181 width=0) (actual time=0.870..0.870 rows=15035 loops=17)
 Index Cond: (date_trunc('minute'::text, snapzeit) = date_trunc('minute'::text, B.t))
 -> Bitmap Index Scan on IDX2_A (cost=0.00..540579.67 rows=25732679 width=0) (actual time=2464.373..2464.373 rows=36137255 loops=17)
 Index Cond: (snapzeit >= '2021-11-10 00:00:00'::timestamp without time zone)
Planning time: 0.313 ms
Execution time: 42936.373 ms

rel A has also an IDX on snapzeit without that date_trunc.

Tried variations with ANY(), inner join A and B on A.snapzeit=B.t, without date_trunc in subselect; it stays slow.

My expectation would be a similar performance as in the "literal" case; so, where comes all the action from in the second case while it could go with those handful of values to the index and that's it ?

On the other hand i feel i overlook something easy/obvious. Any hint ?

Cheers and thanx alot, JJ

Edit:

Enter array() and all goes fast:

select * from A
 where date_trunc('minute',snapzeit) = ANY(ARRAY (
 select date_trunc('minute',t) from B
 ));

Solved by @LaurenzAlbe, Thanx.

While i'd still say the planner/optimizer should do that on its own.

Cheers, JJ

Question 2

Is the behavior the same if you replace IN (SELECT ...) with = ANY (array(SELECT ...))?

Question 3

@LaurenzAlbe No, it is not. That seems to be the solution ! It is superfast for case 2 when i give array() and use =any(). any() without array() still slow. What is it ?

Question 4

It is guesswork. PostgreSQL can use an index scan with = ANY (as you see in your "fast" execution plan).

Question 5

@LaurenzAlbe True..but wouldn't that transform on an argument be the planner's job ? Just to have my expectaions right.

Question 6

Yes, but obviously that doesn't happen, so you have to help the planner.

Question 7

Experimentation shows that the following is enough to make PostgreSQL use the faster index scan: change

WHERE date_trunc('minute',snapzeit) IN (SELECT ...)

to

WHERE date_trunc('minute',snapzeit) = ANY (array(SELECT ...))

Laurenz Albe Laurenz Albe 61.9k4 gold badges57 silver badges93 bronze badges · Accepted Answer · 2021-11-18 15:44:27Z

Experimentation shows that the following is enough to make PostgreSQL use the faster index scan: change

WHERE date_trunc('minute',snapzeit) IN (SELECT ...)

to

WHERE date_trunc('minute',snapzeit) = ANY (array(SELECT ...))

Stack Exchange Network

Specific index issue with postgresql planner/optimizer

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Specific index issue with postgresql planner/optimizer

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions