i have an issue where i have no idea anymore and can't seem to find a similar one already discussed, so...
Am/have to be on pgsql 10.15 / RHEL x86_64.
- rel A has about 813 million records.
- format is like id,....,snapzeit.
- col snapzeit is not nullable, nonunique, timestamp.
- there is an index on date_trunc('minute', snapzeit).
- no refint constraints involved.
- recently analyzed
rel B is only one col named t of timestamp and has like 13 rows; unique timestamps. rel B is only there because i came from working with generate_series, which was even slower than my current issue.
I want to have all from A where snapzeit = B.t, which would be around 255k rows.
If i filter by giving a list of literals (same values as in B) it is lightning fast:
select * from A
where date_trunc('minute',snapzeit) in ('2021-11-01 04:30',...);
query plan:
Bitmap Heap Scan on A (cost=1294171.21..24450480.97 rows=69102630 width=193) (actual time=16.184..278.121 rows=255589 loops=1)
Recheck Cond: (date_trunc('minute'::text, snapzeit) = ANY ('{"2021年11月17日 04:00:00",$CUT_OUT$'::timestamp without time zone[]))
Heap Blocks: exact=5879
-> Bitmap Index Scan on IDX_A (cost=0.00..1276895.55 rows=69102630 width=0) (actual time=15.367..15.368 rows=255589 loops=1)
Index Cond: (date_trunc('minute'::text, snapzeit) = ANY ('{"2021年11月17日 04:00:00",$CUT_OUT$'::timestamp without time zone[]))
Planning time: 0.157 ms
Execution time: 456.024 ms
If i use rel B it takes ages. I added a second predicate so it comes back in 2021 at all:
select * from A
where date_trunc('minute',snapzeit) in (
select date_trunc('minute',t) from B
)
and snapzeit >= '2021-11-10';
query plan:
Nested Loop (cost=622164.37..18870692.83 rows=12866340 width=193) (actual time=3051.373..42762.069 rows=255589 loops=1)
-> HashAggregate (cost=1.21..1.38 rows=17 width=8) (actual time=0.057..0.133 rows=17 loops=1)
Group Key: date_trunc('minute'::text, B.t)
-> Seq Scan on test (cost=0.00..1.17 rows=17 width=8) (actual time=0.024..0.039 rows=17 loops=1)
-> Bitmap Heap Scan on A (cost=622163.16..1108754.04 rows=128663 width=193) (actual time=2478.495..2492.003 rows=15035 loops=17)
Recheck Cond: ((date_trunc('minute'::text, snapzeit) = date_trunc('minute'::text, B.t)) AND (snapzeit >= '2021-11-10 00:00:00'::timestamp without time zone))
Heap Blocks: exact=5879
-> BitmapAnd (cost=622163.16..622163.16 rows=128663 width=0) (actual time=2478.424..2478.424 rows=0 loops=17)
-> Bitmap Index Scan on IDX_A (cost=0.00..75117.91 rows=4065181 width=0) (actual time=0.870..0.870 rows=15035 loops=17)
Index Cond: (date_trunc('minute'::text, snapzeit) = date_trunc('minute'::text, B.t))
-> Bitmap Index Scan on IDX2_A (cost=0.00..540579.67 rows=25732679 width=0) (actual time=2464.373..2464.373 rows=36137255 loops=17)
Index Cond: (snapzeit >= '2021-11-10 00:00:00'::timestamp without time zone)
Planning time: 0.313 ms
Execution time: 42936.373 ms
rel A has also an IDX on snapzeit without that date_trunc.
Tried variations with ANY(), inner join A and B on A.snapzeit=B.t, without date_trunc in subselect; it stays slow.
My expectation would be a similar performance as in the "literal" case; so, where comes all the action from in the second case while it could go with those handful of values to the index and that's it ?
On the other hand i feel i overlook something easy/obvious. Any hint ?
Cheers and thanx alot, JJ
Edit:
Enter array() and all goes fast:
select * from A
where date_trunc('minute',snapzeit) = ANY(ARRAY (
select date_trunc('minute',t) from B
));
Solved by @LaurenzAlbe, Thanx.
While i'd still say the planner/optimizer should do that on its own.
Cheers, JJ
1 Answer 1
Experimentation shows that the following is enough to make PostgreSQL use the faster index scan: change
WHERE date_trunc('minute',snapzeit) IN (SELECT ...)
to
WHERE date_trunc('minute',snapzeit) = ANY (array(SELECT ...))
Explore related questions
See similar questions with these tags.
IN (SELECT ...)
with= ANY (array(SELECT ...))
?= ANY
(as you see in your "fast" execution plan).