I have a table defined as:
CREATE TABLE addresses (
    "address" varchar(42),
    "other_info1" text,
    "other_info2" bigint
);
and I have some manual corrections to that table defined as:
CREATE TABLE address_corrections (
    "address" varchar(42),
    "other_info1" text,
    "other_info2" bigint
);
I have a simple union + dedup query that I persist as a materialized view, corrected_addresses, with the same schema as the two tables above. The materialized view is about the same size as the original addresses table, roughly 400k rows. I have an index on the materialized view that exactly matches the index on the original addresses table:
CREATE INDEX corrected_addresses_address_idx ON corrected_addresses(address);
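(For reference, a simplified sketch of the kind of union + dedup view I mean; this is not necessarily the exact definition, but it illustrates corrections taking precedence over the original rows:)

```sql
-- Sketch only: dedup by address, preferring a correction row over the original.
CREATE MATERIALIZED VIEW corrected_addresses AS
SELECT DISTINCT ON (address) address, other_info1, other_info2
FROM (
    SELECT 1 AS priority, address, other_info1, other_info2 FROM address_corrections
    UNION ALL
    SELECT 2 AS priority, address, other_info1, other_info2 FROM addresses
) u
ORDER BY address, priority;
```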
Now, when querying some other tables, I want to join on this table to get access to other_info1 and other_info2. Simple enough. However, this join does not appear to use the index on address. The query looks like this:
EXPLAIN (analyze, buffers, format text)
SELECT transfer.from_name, transfer.to_name, transfer.value,
transfer.address, address.other_info1, address.other_info2
FROM transfers transfer
LEFT JOIN corrected_addresses address ON transfer.address=address.address
WHERE transfer.from_name='some name here';
with either addresses or corrected_addresses as the join table.
Here is the result for the original addresses table:
Gather  (cost=1205.50..43489.92 rows=6241 width=173) (actual time=1.422..12.292 rows=2 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=12
  ->  Nested Loop Left Join  (cost=205.50..41865.82 rows=2600 width=173) (actual time=0.184..0.198 rows=1 loops=3)
        Buffers: shared hit=12
        ->  Parallel Bitmap Heap Scan on transfers transfer  (cost=205.07..25042.94 rows=2600 width=204) (actual time=0.176..0.180 rows=1 loops=3)
              Recheck Cond: ((from_name)::text = 'some name here'::text)
              Heap Blocks: exact=2
              Buffers: shared hit=8
              ->  Bitmap Index Scan on transfers_from_name_timestamp_idx  (cost=0.00..203.51 rows=6241 width=0) (actual time=0.400..0.401 rows=2 loops=1)
                    Index Cond: ((from_name)::text = 'some name here'::text)
                    Buffers: shared hit=6
        ->  Memoize  (cost=0.43..7.99 rows=1 width=55) (actual time=0.017..0.018 rows=0 loops=2)
              Cache Key: transfer.address
              Cache Mode: logical
              Hits: 0  Misses: 2  Evictions: 0  Overflows: 0  Memory Usage: 1kB
              Buffers: shared hit=4
              ->  Index Scan using addresses_idx on addresses a  (cost=0.42..7.98 rows=1 width=55) (actual time=0.013..0.013 rows=0 loops=2)
                    Index Cond: ((address)::text = (transfer.address)::text)
                    Buffers: shared hit=4
Planning:
  Buffers: shared hit=334 dirtied=1
Planning Time: 5.692 ms
Execution Time: 12.601 ms
And for the materialized view:
Gather  (cost=10501.47..35970.26 rows=6241 width=173) (actual time=100.881..111.703 rows=2 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=5327
  ->  Parallel Hash Left Join  (cost=9501.47..34346.16 rows=2600 width=173) (actual time=94.729..95.259 rows=1 loops=3)
        Hash Cond: ((transfer.address)::text = (a.address)::text)
        Buffers: shared hit=5327
        ->  Parallel Bitmap Heap Scan on transfers transfer  (cost=205.07..25042.94 rows=2600 width=204) (actual time=0.162..0.164 rows=1 loops=3)
              Recheck Cond: ((from_name)::text = 'some name here'::text)
              Heap Blocks: exact=1
              Buffers: shared hit=8
              ->  Bitmap Index Scan on transfers_from_name_timestamp_idx  (cost=0.00..203.51 rows=6241 width=0) (actual time=0.070..0.071 rows=2 loops=1)
                    Index Cond: ((from_name)::text = 'some name here'::text)
                    Buffers: shared hit=6
        ->  Parallel Hash  (cost=7051.18..7051.18 rows=179618 width=55) (actual time=92.901..92.901 rows=143694 loops=3)
              Buckets: 524288  Batches: 1  Memory Usage: 41856kB
              Buffers: shared hit=5255
              ->  Parallel Seq Scan on corrected_addresses a  (cost=0.00..7051.18 rows=179618 width=55) (actual time=0.006..27.550 rows=143694 loops=3)
                    Buffers: shared hit=5255
Planning:
  Buffers: shared hit=37
Planning Time: 0.482 ms
Execution Time: 111.758 ms
Both of those execution times are small; I believe that's partly due to caching. On a fresh run the difference is ~100 ms vs. a few seconds. Sorry I couldn't capture an example of that here.
Does anyone know why Postgres is not using the index for the materialized view?
2 Answers
The horrible estimate for the transfers condition (2600 rows expected, 1 found) made the costs of the hash join and the nested loop appear falsely similar.
Why does a 2600-fold misestimate cause one query to use a different plan but not the other? No idea; probably just luck. I doubt it is even reproducible, much less interesting. Fix the root problem.
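As a quick sanity check (a diagnostic only, not a fix), you can disable hash joins for the session, which forces the planner to cost the nested-loop path and shows whether the index on the materialized view is usable at all:

```sql
-- Diagnostic only: make the planner cost the nested-loop / index-scan path.
SET enable_hashjoin = off;

EXPLAIN (analyze, buffers)
SELECT transfer.from_name, transfer.to_name, transfer.value,
       transfer.address, address.other_info1, address.other_info2
FROM transfers transfer
LEFT JOIN corrected_addresses address ON transfer.address = address.address
WHERE transfer.from_name = 'some name here';

RESET enable_hashjoin;
```

If this plan uses corrected_addresses_address_idx and is much cheaper at execution time, the problem is costing/estimation, not a missing or unusable index.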
If an ANALYZE transfers; doesn't fix it, then you will need to dig into pg_stats for the column "from_name".
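For example, something like this shows the fields most relevant to the estimate (a sketch; adjust schemaname if the table is not in public):

```sql
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'   -- assumption: table lives in the public schema
  AND tablename  = 'transfers'
  AND attname    = 'from_name';
```

If 'some name here' is absent from most_common_vals, the planner falls back to a generic selectivity estimate, which can produce exactly this kind of row-count misestimate.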
For those who stumble across this question later, here is what I did to solve it:
SELECT transfer.from_name, transfer.to_name, transfer.value,
       transfer.address, address.other_info1, address.other_info2
FROM transfers transfer
LEFT JOIN LATERAL (
    SELECT * FROM corrected_addresses a_inner
    WHERE transfer.address = a_inner.address
    LIMIT 1
) AS address ON true
WHERE transfer.from_name = 'some name here';
I suppose the LIMIT 1 makes this a meaningfully different query from the plain join: the planner knows at most one row can come back per transfer, even if the view contains duplicates. In my case that's desirable, but it won't be for everyone. Still, why the index was used on the original table and not on the materialized view is beyond me, but if it works, it works.
That's a band-aid, not a solution. – Laurenz Albe, May 30, 2022 at 6:51
LIMIT 1 forces the index. Still unsure why the materialized view works differently, and the lateral join is uglier, but at least I have something workable here.
Did analyze transfer; change anything? Maybe even vacuum analyze transfer;?