Window function in Postgres not using index

Question 1

Let's assume the table, intermediate10, is ~2.2 TB in size.

The following query takes ~4 days to run on a pretty powerful DB box (32 CPUs, 256 GB RAM) that is configured to allow up to 32 parallel workers and has a sufficiently high work_mem:

create table subset as (
    select
        *
    from
    (
        select
            *,
            RANK() OVER (PARTITION BY col1, col2 ORDER BY random()) AS rankct
        from
            intermediate10
        where
            col3 <= 20
    ) a
    where rankct <= 50
)

I understand that there is an extraneous subquery above, but that is an artifact from some logic I had to remove before posting. Regardless, this does not materially change the query plan or its efficiency.

I have an index on intermediate10: CREATE INDEX ON intermediate10 (col1, col2);

but the query plan isn't using it:

Subquery Scan on a  (cost=842128231.97..882842032.15 rows=361900446 width=1350)
  Filter: (a.rankct <= 50)
  ->  WindowAgg  (cost=842128231.97..869270765.42 rows=1085701338 width=1358)
        ->  Sort  (cost=842128231.97..844842485.32 rows=1085701338 width=1350)
              Sort Key: intermediate10.col1, intermediate10.col2, (random())
              ->  Seq Scan on intermediate10  (cost=0.00..314458488.95 rows=1085701338 width=1350)
                    Filter: (col3 <= 20)

Interestingly, if the order by random() is removed, the query will at least parallelize:

WindowAgg  (cost=471738126.21..673065708.94 rows=1467031808 width=1350)
  ->  Gather Merge  (cost=471738126.21..647392652.30 rows=1467031808 width=1342)
        Workers Planned: 4
        ->  Sort  (cost=471737126.15..472654021.03 rows=366757952 width=1342)
              Sort Key: col1, col2
              ->  Parallel Seq Scan on intermediate10  (cost=0.00..297073917.52 rows=366757952 width=1342)

but having that random selection of the 50 in the "sample" is not negotiable.

Needless to say, a 4-day runtime for this is unacceptable.

How could this be optimized?

Question 2

Why do you think the index on col1 and col2 would be useful? What about col3?

Question 3

Admittedly, that index should probably be created, but even if I remove the filter on col3, the query planner still doesn’t use the index on col1 and col2.

Question 4

A partial index might help: (col1, col2) where (col3 <= 20)
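
As a minimal sketch of the partial index suggested here: the index name is hypothetical, and whether the planner actually picks the index depends on how selective col3 <= 20 turns out to be.

```sql
-- Hypothetical index name. Only rows with col3 <= 20 are indexed,
-- so the index is smaller than a full (col1, col2) index.
CREATE INDEX intermediate10_col1_col2_partial_idx
    ON intermediate10 (col1, col2)
    WHERE col3 <= 20;

-- The query's WHERE clause has to imply the index predicate before the
-- planner will consider the partial index; col3 <= 20 matches it exactly.
EXPLAIN
SELECT *,
       RANK() OVER (PARTITION BY col1, col2 ORDER BY random()) AS rankct
FROM intermediate10
WHERE col3 <= 20;
```

Plain EXPLAIN only plans the query, so it is cheap to run even against a table of this size.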

Answer 1 · jjanes · 2021-12-09 22:05:36Z

It probably doesn't use the index because the planner estimates that doing so would be slower. You can force it to use the index anyway (in versions recent enough to offer incremental sort, which means PostgreSQL 13 or later) by setting enable_seqscan=off. And in my tests, it actually is slower.
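
One way to try this without affecting other sessions is sketched below. It uses SET LOCAL inside a transaction so the setting reverts on ROLLBACK. Note that enable_seqscan = off does not forbid sequential scans; it only makes the planner treat them as very expensive.

```sql
BEGIN;

-- Applies only until the end of this transaction.
SET LOCAL enable_seqscan = off;

-- Plain EXPLAIN shows the plan and its cost estimate without running the query.
EXPLAIN
SELECT *,
       RANK() OVER (PARTITION BY col1, col2 ORDER BY random()) AS rankct
FROM intermediate10
WHERE col3 <= 20;

ROLLBACK;
```

Comparing the total cost of this plan with the default plan shows how much more expensive the planner believes the index path to be.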

It can read the index in logical order, but it has to read the entire index. And for every index entry, it has to jump to the table so that it can look up col3, so it can filter out the ones >20. So it will be reading the entire table, and doing so (unless the table is clustered in line with the index) in random order. This is a recipe for IO disaster.
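
To get a rough idea of how random that heap access would be, the planner statistics can be consulted. This is only a sketch: it assumes the table lives in the public schema and has been analyzed recently.

```sql
-- correlation ranges from -1 to +1. Values near -1 or +1 mean the physical
-- row order follows the column order; values near 0 mean an index scan on
-- that column will hop around the table.
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public'          -- assumption: adjust to the real schema
  AND tablename  = 'intermediate10'
  AND attname IN ('col1', 'col2');
```

The statistic is reported per column, so it is only an approximation of how well the table order matches the two-column index order.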

A more useful index might be on (col1,col2,col3). Then it can filter out the bad values on col3 without visiting the table, and do an index-only scan. The table should be well-vacuumed to make that happen.
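
A sketch of that setup follows. The index name is hypothetical, and the test query deliberately selects only indexed columns, because an index-only scan is possible only when every referenced column is in the index. With a `*` select list, the rows that survive the col3 filter still have to be fetched from the table.

```sql
-- Hypothetical index name.
CREATE INDEX intermediate10_col1_col2_col3_idx
    ON intermediate10 (col1, col2, col3);

-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM (ANALYZE) intermediate10;

-- Rough check of how much of the table is marked all-visible.
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'intermediate10';

-- References only indexed columns, so an index-only scan is at least possible.
EXPLAIN
SELECT col1, col2, col3
FROM intermediate10
WHERE col3 <= 20
ORDER BY col1, col2;
```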

Another possible index would be on (col3,col1,col2). For this one, it would only have to visit the logical part of the index holding values for col3<=20. We have no idea how many that is, though, since you didn't show an EXPLAIN (ANALYZE, BUFFERS) or give us other useful information. If it is small enough, this would be good. Then it likely needs to do a full sort of the surviving rows, as the inequality on col3 ruined whatever order would have been present on (col1,col2). But at least it could still be an index-only scan.
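
A sketch of this alternative, together with a way to find out how many rows survive the filter, is shown below. The index name is hypothetical. The planner estimates in the question (1,085,701,338 rows after the filter versus 1,467,031,808 rows in the unfiltered plan) hint that roughly 74% of the rows may pass, but those are only estimates.

```sql
-- Hypothetical index name.
CREATE INDEX intermediate10_col3_col1_col2_idx
    ON intermediate10 (col3, col1, col2);

-- Planner's estimate only; does not execute the query.
EXPLAIN
SELECT count(*) FROM intermediate10 WHERE col3 <= 20;

-- Actual row counts, timing, and buffer usage. ANALYZE really executes the
-- statement, so expect this to take a long time on a table of this size.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM intermediate10 WHERE col3 <= 20;
```

If most of the table does pass the filter, leading the index with col3 narrows the scan very little, and the full sort on (col1, col2) remains the dominant cost.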

Good points here and I appreciate it. If I remove the col3 filter, I still don't get any kind of index usage or parallelism. Interestingly, if I remove the ORDER BY random() from the window definition, it will spawn multiple workers (though still no index use): -> Parallel Seq Scan on intermediate10 (cost=0.00..297073917.52 rows=366757952 width=1342)

If the sample has to be rigorously randomized, then you have no choice but to visit every point which has a chance of being included. That rules out a lot of possibilities. The next option would be partitioning, but if your data is so large that you can't even build a new index, you certainly can't retrofit it into partitions!

Without the filter on col3, I get an index scan with an incremental sort. What version are you on? Is it at least 13? Even if you don't filter on col3, you are still selecting it (due to the * select list) and so can't get the index-only scan with the existing index.

I'm on 13.4 and it's not showing the index scan, unfortunately.
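
A few diagnostic queries may help narrow down the differences discussed in these comments. This is a sketch, not a diagnosis. One possible explanation for the lost parallelism is that random() is marked parallel restricted, which would keep a sort on it out of the parallel workers.

```sql
-- Incremental sort requires PostgreSQL 13 or later and can be switched off.
SHOW server_version;
SHOW enable_incremental_sort;

-- Upper bound on workers per Gather or Gather Merge node.
SHOW max_parallel_workers_per_gather;

-- provolatile: i = immutable, s = stable, v = volatile
-- proparallel: s = safe, r = restricted, u = unsafe
SELECT proname, provolatile, proparallel
FROM pg_proc
WHERE proname = 'random';
```

If enable_incremental_sort is on and the index scan still does not appear, the remaining explanation is cost: the planner rates the sequential scan plus full sort as cheaper on this table.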
