I have a 200 GB database with 26 tables that I need to create indexes on.
Currently index creation takes a long time on the biggest tables, as they have ~12gb and 400,000,000 rows in them (>4 hours).
I've set maintenance_work_mem
to 100gb, and set max_parallel_workers
to 30.
Are there any other parameters I should tune to improve the index creation speed?
This is on AWS Aurora, using postgres-10.6, in case that makes a difference.
There is no one else using the database and downtime/full locks are fine.
-
Note: I've read this answer already, as well dba.stackexchange.com/questions/95496/…, but it's a bit outdated and does not have a complete answerJonLuca– JonLuca2019年04月09日 16:52:06 +00:00Commented Apr 9, 2019 at 16:52
1 Answer 1
This is on AWS Aurora, using postgres-10.6, in case that makes a difference.
Yes, this matters quite a lot. Native parallel btree index builds were introduced in v11, so your "max_parallel_workers" setting won't matter for index builds under v10.
Unless you upgrade, you will have to parallelize them yourself by opening multiple sessions in parallel and building one index in each one. You will probably want to lower "maintenance_work_mem" as well if you have parallel processes (either of the manual or the v11 variety) as each one can claim that much memory.
-
1He can't upgrade (yet). The latest Aurora Postgres 2.2 is based on Postgres 10.6. Their storage system has its own parallelization - independent from Postgres' parallel implementation. I am not sure how that interacts with
max_parallel_workers
exactly, but overall it massively outperforms RDS Postgres (even v11) in writing activity - with the notable exception of building indexes, where both are on par from what I have seen so far. Unfortunate for the particular task of the OP.Erwin Brandstetter– Erwin Brandstetter2019年04月09日 23:45:14 +00:00Commented Apr 9, 2019 at 23:45 -
1Ah, I didn't know the difference between Aurora and RDS on version availability. I would expect parallel workers to be beneficial only for their CPU parallelism, not IO parallelism. I was assuming he had enough IO available so that CPU would be limiting--although if it were to parallelize 30-ways, maybe that is not a good assumption.jjanes– jjanes2019年04月10日 14:30:27 +00:00Commented Apr 10, 2019 at 14:30
Explore related questions
See similar questions with these tags.