I have an SQL query that executes slowly on PostgreSQL 13.15 with 128GB of memory. The query mainly performs a Bitmap Heap Scan, and I’ve noticed that many reads are coming from the heap instead of the cache. Here’s a link to the query and execution plan.
An index that frequently appears in the plan looks like this:
create index ix_contacts__addresses_value__normalized
on contacts__addresses using gin (company_id, field_name, field_id, value__normalized);
My memory settings:
• shared_buffers = 32GB
• work_mem = 64MB
• effective_cache_size = 96GB
• maintenance_work_mem = 1320MB
Questions:
- Why is the query reading so much from the heap?
- How can I configure PostgreSQL to better use memory (shared_buffers, work_mem, and other settings) to avoid unnecessary heap reads?
- Should I consider modifying the indexes to reduce execution time?
I would greatly appreciate any advice on optimizing caching and overall performance.
- Perhaps you should read about how bitmaps work in Postgres. Also "heap reads" <> disk I/O, as is evident from the hit ratio reported in the plan. – mustaccio, Sep 13, 2024 at 18:36
- @mustaccio Thank you for your response! I'll definitely dive deeper into understanding how bitmaps work in Postgres. Do you know if there's a way to fine-tune PostgreSQL settings to increase cache usage and reduce heap reads? Could PostgreSQL Warm-up help with this? We're using it but haven't noticed much effect so far. – Mykola Shved, Sep 14, 2024 at 9:24
- "The heap instead of the cache" -- that's a non sequitur. The heap instead of the index, or the disk instead of the cache: both heap and index pages can be read from the disk or from the cache; in your case it's mostly the latter. – mustaccio, Sep 14, 2024 at 13:23
- What makes you think it is reading from the heap so much? And which heap? "Heap" is a pretty generic term. – jjanes, Sep 17, 2024 at 15:27
1 Answer
Your first two questions don't quite make sense, for the reasons mustaccio outlined in the comments: "heap vs. index" and "cache vs. disk" are independent distinctions, and the hit ratio in your plan shows most pages are already coming from cache.
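To see for yourself where the pages come from, run the query with EXPLAIN (ANALYZE, BUFFERS) and look at the Buffers lines: "shared hit" counts pages found in shared_buffers, while "read" counts pages fetched from outside it (OS page cache or actual disk). A minimal sketch, with a made-up predicate standing in for your real query:
explain (analyze, buffers)
select *
from contacts__addresses
where company_id = 42;  -- placeholder; substitute your actual query
Each plan node then reports a line such as "Buffers: shared hit=12345 read=67"; a large hit count alongside a small read count means the data was already cached.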
For your 3rd question: multicolumn GIN indexes are not like multicolumn btree indexes. Each column has to be handled individually, and the results are then combined internally. So instead of jumping directly to the rows that meet all 4 conditions, the index first builds a list of all rows satisfying each separate condition and then determines which rows appear on all four lists. That is a lot of work if any of the conditions is not very selective.
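For illustration, a lookup shaped like the one below (values invented) forces the GIN index to build a posting list for each of the four conditions and intersect them, even when company_id alone would have narrowed the search to a handful of rows:
select *
from contacts__addresses
where company_id = 42
  and field_name = 'shipping'
  and field_id = 7
  and value__normalized @> '{"country": "US"}';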
You would probably be better served by a multicolumn btree index over (company_id, field_name, field_id), and then possibly a GIN index over just value__normalized, although I suspect that last one is not really needed.
But it might be better to incorporate the JSONB into an expressional btree index like (company_id, field_name, field_id, (value__normalized->>'country')). This would require you to rewrite the query to test with ->> rather than with @>.
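A sketch of both suggestions, with made-up index names and 'country' as the example JSONB key (adjust to whatever keys your query actually filters on):
create index ix_contacts__addresses_fields
    on contacts__addresses (company_id, field_name, field_id);
create index ix_contacts__addresses_country
    on contacts__addresses (company_id, field_name, field_id,
        (value__normalized->>'country'));
The expression index is only usable if the query filters with the same expression, for example:
select *
from contacts__addresses
where company_id = 42
  and field_name = 'shipping'
  and field_id = 7
  and value__normalized->>'country' = 'US';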
- Hey, thanks for your answer. Your advice with JSONB really boosted performance. It was exactly what I needed. Thanks again! – Mykola Shved, Sep 20, 2024 at 15:24