Let's assume the table, intermediate10, is ~2.2 TB in size.
The following query takes ~4 days to run on a fairly powerful DB box (32 CPUs, 256 GB RAM) that is configured to allow up to 32 parallel workers and has a sufficiently high work_mem:
create table subset as (
    select
        *
    from
        (
            select
                *,
                RANK() OVER (PARTITION BY col1, col2 ORDER BY random()) AS rankct
            from
                intermediate10
            where
                col3 <= 20
        ) a
    where rankct <= 50
);
I understand that there is an extraneous subquery above, but that is an artifact from some logic I had to remove before posting. Regardless, this does not materially change the query plan or its efficiency.
I have an index on intermediate10:
CREATE INDEX ON intermediate10 (col1, col2);
but the query plan isn't using it:
Subquery Scan on a  (cost=842128231.97..882842032.15 rows=361900446 width=1350)
  Filter: (a.rankct <= 50)
  ->  WindowAgg  (cost=842128231.97..869270765.42 rows=1085701338 width=1358)
        ->  Sort  (cost=842128231.97..844842485.32 rows=1085701338 width=1350)
              Sort Key: intermediate10.col1, intermediate10.col2, (random())
              ->  Seq Scan on intermediate10  (cost=0.00..314458488.95 rows=1085701338 width=1350)
                    Filter: (col3 <= 20)
Interestingly, if the order by random() is removed, the query will at least parallelize:
WindowAgg  (cost=471738126.21..673065708.94 rows=1467031808 width=1350)
  ->  Gather Merge  (cost=471738126.21..647392652.30 rows=1467031808 width=1342)
        Workers Planned: 4
        ->  Sort  (cost=471737126.15..472654021.03 rows=366757952 width=1342)
              Sort Key: col1, col2
              ->  Parallel Seq Scan on intermediate10  (cost=0.00..297073917.52 rows=366757952 width=1342)
but having that random selection of the 50 in the "sample" is not negotiable.
Needless to say, a 4-day runtime for this is unacceptable.
How could this be optimized?
1 Answer
It probably doesn't use the index because the planner estimates that doing so would be slower. You can force it to use the index anyway (in versions recent enough to offer incremental sorts, i.e. PostgreSQL 13 and later) by setting enable_seqscan=off. In my testing, it actually is slower.
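A minimal sketch of that experiment, assuming PostgreSQL 13 or later and the query from the question. Plain EXPLAIN only plans the statement and does not execute it, so it is cheap to run; note that enable_seqscan=off penalizes sequential scans heavily rather than forbidding them outright.

```sql
-- Discourage sequential scans for this session only, so the planner
-- falls back to the existing (col1, col2) index if it can.
SET enable_seqscan = off;

-- Plan only; nothing is executed.
EXPLAIN
SELECT *
FROM (
    SELECT *,
           RANK() OVER (PARTITION BY col1, col2 ORDER BY random()) AS rankct
    FROM intermediate10
    WHERE col3 <= 20
) a
WHERE rankct <= 50;

-- Restore the default for the rest of the session.
RESET enable_seqscan;
```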
It can read the index in logical order, but it has to read the entire index. And for every index entry, it has to jump to the table so that it can look up col3, so it can filter out the ones >20. So it will be reading the entire table, and doing so (unless the table is clustered in line with the index) in random order. This is a recipe for IO disaster.
A more useful index might be on (col1,col2,col3). Then it can filter out the bad values on col3 without visiting the table, and do an index-only scan. The table should be well-vacuumed to make that happen.
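As a rough sketch of what that could look like: the test query below is an assumption made for illustration. It selects only the three indexed columns, because an index-only scan is possible only when every column the query references is in the index; a `*` select list still has to fetch the rest of each row from the table.

```sql
-- Index covering the partition keys plus the filter column.
CREATE INDEX ON intermediate10 (col1, col2, col3);

-- Update the visibility map (and planner statistics) so that an
-- index-only scan can skip most heap fetches.
VACUUM (ANALYZE) intermediate10;

-- Illustrative query that touches only indexed columns; check whether
-- the plan shows "Index Only Scan".
EXPLAIN
SELECT col1, col2, col3
FROM intermediate10
WHERE col3 <= 20
ORDER BY col1, col2;
```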
Another possible index would be on (col3,col1,col2). For this one, it would only have to visit the logical part of the index holding values for col3<=20. We have no idea how many that is, though, since you didn't show an EXPLAIN (ANALYZE, BUFFERS) or give us other useful information. If it is small enough, this would be good. Then it likely needs to do a full sort of the surviving rows, as the inequality on col3 ruined whatever order would have been present on (col1,col2). But at least it could still be an index-only scan.
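A sketch of this alternative, together with one way to gather the missing information. The count query is an assumption chosen for illustration; keep in mind that EXPLAIN (ANALYZE, BUFFERS) actually executes the statement, so on a table this size even the count may take a long time.

```sql
-- Index led by the filter column, so only the col3 <= 20 range is read.
CREATE INDEX ON intermediate10 (col3, col1, col2);

-- Planner estimate only (not executed): how many rows it expects to survive.
EXPLAIN
SELECT col1, col2, col3
FROM intermediate10
WHERE col3 <= 20;

-- Actual row count, timing, and buffer usage. This runs the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM intermediate10
WHERE col3 <= 20;
```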
- Good points here and I appreciate it -- if I remove the col3 filter, I still don't get any kind of index usage or parallelism. Interestingly, if I remove the order by random() from within the partition clause, it will spawn multiple workers (though still no index use): -> Parallel Seq Scan on intermediate10 (cost=0.00..297073917.52 rows=366757952 width=1342) – kmypwn, Dec 10, 2021 at 3:53
- If the sample has to be rigorously randomized, then you have no choice but to visit every point which has a chance of being included. That rules out a lot of possibilities. The next option would be partitioning, but if your data is so large that you can't even build a new index, you certainly can't retrofit it into partitions! – jjanes, Dec 10, 2021 at 19:29
- Without the filter on col3, I get an index scan with an incremental sort. What version are you on? Is it at least 13? Even if you don't filter on col3, you are still selecting it (due to the * select list) and so can't get the index-only scan with the existing index. – jjanes, Dec 10, 2021 at 19:53
- I'm on 13.4 and it's not showing the index scan, unfortunately. – kmypwn, Dec 15, 2021 at 16:58
col1 and col2 would be useful? What about col3? (col1, col2) where (col3 <= 20)
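The fragment above is truncated, but it reads like a suggestion for a partial index. A sketch of that interpretation (an assumption, since the start of the comment is missing): the planner will consider such an index only when the query's WHERE clause implies the index predicate, as `col3 <= 20` does in the question, and with a `*` select list the matching rows still have to be fetched from the table.

```sql
-- Partial index: only rows with col3 <= 20 are indexed,
-- and they are stored in (col1, col2) order.
CREATE INDEX ON intermediate10 (col1, col2) WHERE col3 <= 20;
```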