
Table Definition

CREATE TABLE public.table_1 (
 id bigserial NOT NULL, 
 generation_date date NOT NULL, 
 "namespace" varchar(16) NOT NULL, 
 item_id int8 NOT NULL, 
 generated_on timestamptz NOT NULL, 
 col_type varchar(32) NOT NULL, 
 value_id varchar(32) NOT NULL, 
 col_identity varchar(64) NOT NULL, 
 col_hash varchar(128) NOT NULL, 
 col_id varchar(36) NOT NULL, 
 jsonb_col_1 jsonb NOT NULL, 
 jsonb_col_2 jsonb NOT NULL, 
 jsonb_col_3 jsonb NOT NULL, 
 jsonb_col_4 jsonb NOT NULL, 
 CONSTRAINT table_1_pkey PRIMARY KEY (id, generation_date)) 
 PARTITION BY RANGE (generation_date);
CREATE INDEX idx_table_1_eid_colid_genon_desc ON ONLY public.table_1 USING btree (item_id, col_identity, generated_on DESC) INCLUDE (namespace, col_type, value_id);
CREATE INDEX ix_table_1_eid_kpi_type ON ONLY public.table_1 USING btree (item_id, value_id);
CREATE INDEX idx_table_1_filters_gin ON ONLY public.table_1 USING gin (jsonb_col_1);
CREATE UNIQUE INDEX col_pkey ON ONLY public.table_1 USING btree (id, generation_date);

Table ANALYZE output:

analyzing "public.table_1"
"table_1": scanned 3000000 of 14190205 pages, containing 14746737 live rows and 0 dead rows; 3000000 rows in sample, 69753074 estimated total rows

Some meta information regarding cardinality:

The table public.table_1 is partitioned by generation_date; each date holds roughly 70-75M rows. It has the indexes shown above, and the statistics target for the item_id column is raised to 10,000.

NB:

  • Table is roughly 700GB on disk
  • namespace has only 1 unique value
  • col_identity has only 4 unique values
  • col_type has only 5 unique values
  • value_id has only 20 unique values
  • item_id has roughly 1.6M unique values
  • generated_on is a UTC timestamptz column (many distinct values, unevenly distributed) that shares the same value across rows for any given (item_id, col_identity) combination
    • Implying that there are multiple rows for each combination present in the table
  • work_mem is at 16MB

The goal is to achieve the below:

  • Fetch all rows for a given batch of item_id values at the maximum generated_on corresponding to the given col_identity (i.e. how do I efficiently get "the most recent generated_on" per (item_id, col_identity) combination?), with some extra filters on value_id and col_type.

QUERY:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, SETTINGS, FORMAT text)
SELECT table_1.*
FROM table_1
JOIN (
    SELECT
        item_id,
        col_identity,
        max(generated_on) AS max_generated_on
    FROM table_1
    WHERE
        table_1.item_id IN (1,2,3,4,....32)
        AND table_1.generation_date IN ('2025-09-14')
        AND table_1.col_identity = 'col_identity_value_1'
        AND table_1.value_id IN ('1', '2', '3', '4', '5', '6', '7', '8')
        AND table_1.col_type IN ('type_1', 'type_2')
    GROUP BY
        table_1.item_id,
        table_1.col_identity
) AS find_max_generated_on_subquery ON
    table_1.item_id = find_max_generated_on_subquery.item_id
    AND find_max_generated_on_subquery.col_identity = table_1.col_identity
    AND find_max_generated_on_subquery.max_generated_on = table_1.generated_on
WHERE
    table_1.generation_date IN ('2025-09-14')
    AND (
        (table_1.jsonb_col_1 ? 'key_1' AND table_1.jsonb_col_1 ? 'key_2')
        OR (table_1.jsonb_col_1 ? 'key_1' AND table_1.jsonb_col_1 ? 'key_2' AND table_1.jsonb_col_1 ? 'key_3')
    );

Explain Output:

Nested Loop  (cost=1.15..487724.16 rows=2 width=1459) (actual time=19.488..1787.995 rows=36323 loops=1)
  Output: table_1.*
  Buffers: shared hit=590 read=98
  I/O Timings: shared read=28.294
  ->  GroupAggregate  (cost=0.57..210296.26 rows=106648 width=45) (actual time=19.463..1763.775 rows=244 loops=1)
        Output: table_1_1.item_id, table_1_1.col_identity, max(table_1_1.generated_on)
        Group Key: table_1_1.item_id
        Buffers: shared hit=745 read=10084
        I/O Timings: shared read=1726.691
        ->  Index Scan using idx_table_1_eid_colid_genon_desc on public.table_1 table_1_1  (cost=0.57..208696.54 rows=106648 width=45) (actual time=2.985..1761.119 rows=34775 loops=1)
              Output: table_1_1.item_id, table_1_1.col_identity, table_1_1.generated_on
              Index Cond: ((table_1_1.item_id = ANY ('{1,2,3,..32}'::bigint[])) AND ((table_1_1.col_identity)::text = 'col_identity_value_1'::text))
              Filter: (((table_1_1.col_type)::text = ANY ('{type_1,type_2}'::text[])) AND (table_1_1.generation_date = '2025-09-14'::date) AND ((table_1_1.value_id)::text = ANY ('{1,2,3,4,5,6,7,8}'::text[])))
              Rows Removed by Filter: 1553
              Buffers: shared hit=745 read=10084
              I/O Timings: shared read=1726.691
  ->  Index Scan using table_1_p20250914_idx_table_1_eid_colid_genon_desc on public.table_1_p20250914 table_1  (cost=0.57..2.59 rows=1 width=1459) (actual time=0.011..0.077 rows=149 loops=244)
        Output: table_1.*
        Index Cond: ((table_1.item_id = table_1_1.item_id) AND ((table_1.col_identity)::text = (table_1_1.col_identity)::text) AND (table_1.generated_on = (max(table_1_1.generated_on))))
        Filter: ((table_1.jsonb_col_1 ? 'key_1'::text) AND (table_1.jsonb_col_1 ? 'key_2'::text) AND (table_1.generation_date = '2025-09-14'::date))
        Buffers: shared hit=10748
Settings: 
 enable_partitionwise_join = 'on', 
 maintenance_io_concurrency = '1', 
 effective_cache_size = '87801200kB', 
 enable_partitionwise_aggregate = 'on', 
 random_page_cost = '1',
 work_mem = '16MB'
Planning:
 Buffers: shared hit=141
Planning Time: 1.615 ms
Execution Time: 1789.477 ms

I am trying to understand the above explain output and tune the query planner so that it doesn't overestimate the row counts the way it currently does. Can someone share what can be checked / added / verified to improve planning and query execution? The query otherwise takes roughly 6-8 seconds.

asked Sep 14 at 16:35

1 Answer


Rather than using a self-join, you should try using the simpler DISTINCT ON clause proprietary to PostgreSQL:

SELECT DISTINCT ON (item_id, col_identity)
*
FROM table_1
WHERE ...
ORDER BY item_id, col_identity,
 generated_on DESC;

That should be able to use your index.
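
For example, adapted to the filters from the question (a sketch only: the IN list is abbreviated, and the jsonb_col_1 conditions are kept in an outer query so they are applied after the latest row per combination is picked, as in the original join; note that DISTINCT ON returns a single row per (item_id, col_identity)):

SELECT *
FROM (
    SELECT DISTINCT ON (item_id, col_identity) *
    FROM table_1
    WHERE generation_date = '2025-09-14'
      AND item_id IN (1, 2, 3, 4)  -- the full batch of 32 ids
      AND col_identity = 'col_identity_value_1'
      AND value_id IN ('1', '2', '3', '4', '5', '6', '7', '8')
      AND col_type IN ('type_1', 'type_2')
    ORDER BY item_id, col_identity, generated_on DESC
) AS latest
WHERE latest.jsonb_col_1 ? 'key_1'
  AND latest.jsonb_col_1 ? 'key_2';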

Setting the statistics target to 10000 is insanely high. Unless all your queries are long-running analytical ones, the increased planning time will be painful.
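
If that target was set per column, it can be dialed back with something like the following (the value 1000 is only an illustrative lower target; -1 resets the column to the default):

ALTER TABLE public.table_1 ALTER COLUMN item_id SET STATISTICS 1000;  -- or SET STATISTICS -1 for the default
ANALYZE public.table_1;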

With your query, most of the time is spent on I/O for the index scan. This could be reduced by rewriting the table in index order:

CLUSTER table_1 USING idx_table_1_eid_colid_genon_desc;

The bad estimates might be improved using extended statistics. But I see little point in trying to optimize a query that had better be rewritten.
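
For instance, a multi-column statistics object covering the correlated columns could look like this (a sketch only: the statistics name and column list are assumptions, and which statistics kinds actually help depends on how those columns correlate in your data):

CREATE STATISTICS table_1_item_colid_stats (ndistinct, dependencies)
    ON item_id, col_identity, value_id, col_type
    FROM public.table_1;

ANALYZE public.table_1;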

answered Sep 14 at 18:08
  • Thanks for stopping by. I'm curious: the query plan execution finished in 6 milliseconds, not the actual query, right? I was using DBeaver, and by the time all the rows were fetched the query took roughly 7-8 secs. Or is that because of transferring the data back to the client? Another novice question: should one worry about the difference between the planner's estimated rows and the actual rows, or does that depend on the underlying data distribution and is sometimes fine? Thanks again! Commented Sep 14 at 20:33
  • You edited the question and replaced the plan that took 6 milliseconds with one that took almost 2 seconds, so my answer matched your original question. Anyway, the new plan is more useful. Yes, you should worry about the difference between estimate and reality. Commented Sep 14 at 22:06
  • Thanks; I will try to re-write the query and test out your suggestions. Thanks for the help, a bit new to pg! Commented Sep 15 at 1:08
