I have an SQL query that executes slowly on PostgreSQL 13.15 with 128GB of memory. The query mainly performs a Bitmap Heap Scan, and I’ve noticed that many reads are coming from the heap instead of the cache. Here’s a link to the query and execution plan.
An index that frequently appears in the plan looks like this:
create index ix_contacts__addresses_value__normalized
on contacts__addresses using gin (company_id, field_name, field_id, value__normalized);
My memory settings:
• shared_buffers = 32GB
• work_mem = 64MB
• effective_cache_size = 96GB
• maintenance_work_mem = 1320MB
Questions:
- Why is the query reading so much from the heap?
- How can I configure PostgreSQL to better use memory (shared_buffers, work_mem, and other settings) to avoid unnecessary heap reads?
- Should I consider modifying the indexes to reduce execution time?
I would greatly appreciate any advice on optimizing caching and overall performance.
- Perhaps you should read about how bitmaps work in Postgres. Also "heap reads" <> disk I/O, as is evident from the hit ratio reported in the plan. – mustaccio, Sep 13, 2024 at 18:36
- @mustaccio Thank you for your response! I'll definitely dive deeper into understanding how bitmaps work in Postgres. Do you know if there's a way to fine-tune PostgreSQL settings to increase cache usage and reduce heap reads? Could PostgreSQL Warm-up help with this? We're using it but haven't noticed much effect so far. – Mykola Shved, Sep 14, 2024 at 9:24
- "The heap instead of the cache" -- that's a non sequitur. The heap instead of the index, or the disk instead of the cache: both heap and index pages can be read from the disk or from the cache; in your case it's mostly the latter. – mustaccio, Sep 14, 2024 at 13:23
- What makes you think it is reading from the heap so much? And which heap? "Heap" is a pretty generic term. – jjanes, Sep 17, 2024 at 15:27
1 Answer
Your first two questions don't quite make sense, for the reasons mustaccio outlined in the comments: "heap vs. index" and "cache vs. disk" are independent distinctions, and the hit ratio in your plan shows most pages are already coming from cache.
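To see for yourself where the pages come from, run the query with EXPLAIN (ANALYZE, BUFFERS) and look at the Buffers lines: "shared hit" counts pages found in shared_buffers, while "read" counts pages fetched from outside it (OS page cache or actual disk). A minimal sketch, with a made-up predicate standing in for your real query:
explain (analyze, buffers)
select *
from contacts__addresses
where company_id = 42;  -- placeholder; substitute your actual query
Each plan node then reports a line such as "Buffers: shared hit=12345 read=67"; a large hit count alongside a small read count means the data was already cached.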
For your 3rd question: multicolumn GIN indexes are not like multicolumn btree indexes. Each column has to be handled individually, and the results are then combined internally. So instead of jumping directly to the rows that meet all 4 conditions, the index first builds a list of all rows satisfying each separate condition and then determines which rows appear on all four lists. That is a lot of work if any of the conditions is not very selective.
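For illustration, a lookup shaped like the one below (values invented) forces the GIN index to build a posting list for each of the four conditions and intersect them, even when company_id alone would have narrowed the search to a handful of rows:
select *
from contacts__addresses
where company_id = 42
  and field_name = 'shipping'
  and field_id = 7
  and value__normalized @> '{"country": "US"}';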
You would probably be better served by a multicolumn btree index over (company_id, field_name, field_id), and then possibly a GIN index over just value__normalized, although I suspect that last one is not really needed.
But it might be better to incorporate the JSONB into an expressional btree index like (company_id, field_name, field_id, (value__normalized->>'country')). This would require you to rewrite the query to test with ->> rather than with @>.
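A sketch of both suggestions, with made-up index names and 'country' as the example JSONB key (adjust to whatever keys your query actually filters on):
create index ix_contacts__addresses_fields
    on contacts__addresses (company_id, field_name, field_id);
create index ix_contacts__addresses_country
    on contacts__addresses (company_id, field_name, field_id,
        (value__normalized->>'country'));
The expression index is only usable if the query filters with the same expression, for example:
select *
from contacts__addresses
where company_id = 42
  and field_name = 'shipping'
  and field_id = 7
  and value__normalized->>'country' = 'US';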
- Hey, thanks for your answer. Your advice with JSONB really boosted performance. It was exactly what I needed. Thanks again! – Mykola Shved, Sep 20, 2024 at 15:24