
I'm using pgloader to perform a one-time migration from MySQL to Postgres.

For that purpose, I want to temporarily configure Postgres specifically for that workload.

  • reduce WAL as much as possible: wal_level = minimal
  • archive_mode = off
  • Should I autovacuum = off ?

What else can I do to improve things without severely putting my data at risk?

E.g. I don't want to set fsync = off. Although I could re-do the migration if something went wrong, the additional time would still be unwanted.

Additional setting after feedback:

  • synchronous_commit = off (thanks @a_horse_with_no_name)
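
For completeness, this is roughly how I plan to apply and later revert these settings (a sketch only, assuming PostgreSQL 9.4+ so that ALTER SYSTEM is available, a local superuser connection, and a systemd-managed cluster; adjust paths and service names to your setup):

    # Apply the temporary, migration-only settings.
    # wal_level, archive_mode and max_wal_senders only take effect after a restart;
    # autovacuum and synchronous_commit would already be picked up by a reload.
    psql -U postgres -c "ALTER SYSTEM SET wal_level = 'minimal'"
    psql -U postgres -c "ALTER SYSTEM SET archive_mode = 'off'"
    psql -U postgres -c "ALTER SYSTEM SET max_wal_senders = 0"   # required for wal_level = minimal
    psql -U postgres -c "ALTER SYSTEM SET autovacuum = 'off'"
    psql -U postgres -c "ALTER SYSTEM SET synchronous_commit = 'off'"
    sudo systemctl restart postgresql   # placeholder for whatever restarts your cluster

    # After the migration, drop the overrides again and restart.
    psql -U postgres -c "ALTER SYSTEM RESET ALL"
    sudo systemctl restart postgresql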
asked Dec 27, 2016 at 16:28
    synchronous_commit=off and if you are on an older Postgres version most probably also need to increase checkpoint_segments Commented Dec 27, 2016 at 16:39
  • Dang, knew I forgot something. I'm using 9.6, updated the questions title. Also added your suggestion, thx! Commented Dec 28, 2016 at 9:42

1 Answer


I found that the following changes to postgresql.conf yielded the best results so far:

archive_mode = off # old: on
autovacuum = off # old: on
commit_delay = 100000 # old: 0
maintenance_work_mem = 1GB # old: 512MB
max_wal_size = 10GB # old: 1GB
synchronous_commit = off # old: on
wal_level = minimal # old: archive
work_mem = 1GB # old: 8MB

Also, I had quite some trouble getting the most out of a single pgloader invocation migrating all the tables at once. That would have been the most desirable way, because pgloader then also re-creates all the FKs between the imported tables.
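
For context, that single-invocation attempt looked roughly like this (a sketch; connection strings, database names and the exact WITH options are placeholders, not my real setup):

    # Hypothetical all-in-one load file driving a single pgloader run.
    cat > migrate_all.load <<'EOF'
    LOAD DATABASE
         FROM mysql://migration:secret@mysql-host/legacy_db
         INTO postgresql://postgres@localhost/target_db
    WITH include drop, create tables, create indexes,
         reset sequences, foreign keys
     SET maintenance_work_mem to '1GB',
         work_mem to '1GB';
    EOF
    pgloader migrate_all.load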

But because of the problems I was facing (pgloader hanging, not achieving the total-time performance I was aiming for), I settled on this approach:

  • run a separate pgloader instance for every individual table, in parallel, inside tmux (see the sketch after this list)
  • use tmux wait-for to wait for the longest-running import to finish
  • run a separate *.sql script for (almost) each table for post-processing (adding FKs, adding triggers), again in parallel inside tmux
  • run vacuumdb --jobs=24 --analyze
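
In case it helps anyone, the orchestration looked roughly like the following (a heavily simplified sketch; table names, file names and the "longest table" are made up for illustration):

    #!/usr/bin/env bash
    # One pgloader run per table, each in its own tmux window.
    tmux new-session -d -s migration

    for table in customers orders order_items; do   # placeholder table list
      tmux new-window -t migration -n "$table" \
        "pgloader ${table}.load; tmux wait-for -S ${table}-done"
    done

    # Block until the longest-running import (assumed here to be 'orders')
    # signals that it is done.
    tmux wait-for orders-done

    # Post-processing (FKs, triggers) per table, again in parallel windows.
    for table in customers orders order_items; do
      tmux new-window -t migration -n "post-$table" \
        "psql -d target_db -f post_${table}.sql; tmux wait-for -S post-${table}-done"
    done
    tmux wait-for post-orders-done

    # Finally, analyze everything with parallel jobs.
    vacuumdb --jobs=24 --analyze target_db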

To fit this purpose, the machine in question had been upgraded:

  • 14 cores (instead of 8)
  • 64GB RAM (instead of 32GB; but this upgrade will eventually stay)

Also, for each individual pgloader invocation I had a separate loader config with an individual batch size configuration; the default batch size of 25000 rows wouldn't work for the bigger tables. A per-table load file looked roughly like the sketch below.
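
(Again a sketch; the connection strings, the table name and the numbers are illustrative. batch rows and prefetch rows are the knobs I tuned per table, and FKs are deliberately skipped here because they are added afterwards by the post-processing scripts.)

    # Hypothetical per-table load file, here for a table called 'orders'.
    cat > orders.load <<'EOF'
    LOAD DATABASE
         FROM mysql://migration:secret@mysql-host/legacy_db
         INTO postgresql://postgres@localhost/target_db
    WITH include drop, create tables, create indexes,
         reset sequences, no foreign keys,
         batch rows = 10000,
         prefetch rows = 10000
    INCLUDING ONLY TABLE NAMES MATCHING 'orders';
    EOF
    pgloader orders.load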

In the end, for my purpose and workload, this cut down the time

  • from 4h38m (1h11m importing, 1h41m adding FKs, 1h22m vacuuming)
  • to 2h8m6s (74m30s importing, 25m50s adding FKs, 27m44s vacuuming)

(Actually, the latest runs came in slightly under 2h, but I don't have more detailed stats for them.)

answered Jan 16, 2017 at 8:52
