This morning I was involved in upgrading a PostgreSQL database on AWS RDS. We wanted to move from version 9.3.3 to version 9.4.4. We had "tested" the upgrade on a staging database, but the staging database is much smaller and doesn't use Multi-AZ. It turned out this test was pretty inadequate.
Our production database uses Multi-AZ. We've done minor version upgrades in the past, and in those cases RDS will upgrade the standby first and then promote it to master. Thus the only downtime incurred is ~60s during the failover.
We assumed the same would happen for the major version upgrade, but oh how wrong we were.
Some details about our setup:
- db.m3.large
- Provisioned IOPS (SSD)
- 300 GB storage, of which 139 GB is used
- We had outstanding RDS OS upgrades, which we wanted to batch with this upgrade to minimise downtime
Here are the RDS events logged while we performed the upgrade:
Database CPU was maxed out between about 08:44 and 10:27. A lot of this time seemed to be occupied by RDS taking a pre-upgrade and post-upgrade snapshot.
The AWS docs don't warn of such repercussions, although in hindsight an obvious flaw in our approach is clear from reading them: we didn't create a Multi-AZ copy of the production database and run the upgrade on it as a trial.
In general it was very frustrating because RDS gave us very little information about what it was doing and how long it was likely to take. (Again, doing a trial run would have helped...)
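For anyone wanting to script the trial run we skipped, here is a hedged sketch using the AWS CLI: restore the latest production snapshot into a throwaway Multi-AZ instance, then run the major version upgrade against that copy. The instance identifier and snapshot name (`trial-upgrade`, `prod-snapshot-latest`) are placeholders. The commands are built as strings and only echoed, so the snippet is safe as a dry run; paste them into a terminal to actually execute.

```shell
# Placeholder identifiers -- substitute your real snapshot and a scratch name.
RESTORE_CMD="aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier trial-upgrade \
  --db-snapshot-identifier prod-snapshot-latest \
  --multi-az"

# Upgrade the copy exactly as you would production, and time it.
UPGRADE_CMD="aws rds modify-db-instance \
  --db-instance-identifier trial-upgrade \
  --engine-version 9.4.4 \
  --allow-major-version-upgrade \
  --apply-immediately"

# Dry run: print the commands instead of executing them.
echo "$RESTORE_CMD"
echo "$UPGRADE_CMD"
```

Running the upgrade on the copy gives you both a duration estimate and a preview of the RDS events you will see in production.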
Apart from that, we want to learn from this incident so here are our questions:
- Is this kind of thing normal when doing a major version upgrade on RDS?
- If we wanted to do a major version upgrade in the future with minimal downtime, how would we go about it? Is there some kind of clever way to use replication to make it more seamless?
2 Answers
[UPDATE] Please note this answer is outdated: the question concerned an upgrade from version 9.3.3 to version 9.4.4. For current versions, please consider Dmitry Gusev's post below.
You can use the

pg_dumpall -f dump.sql

command, which will dump your entire cluster (all databases plus roles) to a single SQL file, in a form that lets you reconstruct it from scratch against another endpoint:

psql -h endpoint-host.com.br -f dump.sql
But to do that, you will need an EC2 instance with enough disk space to fit your database dump. You will also need to install the client tools (yum install postgresql94.x86_64) to be able to run the dump and restore commands.
See examples at PG Dumpall DOC.
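To make the recipe above concrete, here is a minimal sketch of the whole dump-and-restore cycle as run from an EC2 box. The hostnames are placeholders, and the commands are built as strings and only echoed (a dry run), so the snippet is safe to paste as-is.

```shell
# Placeholder endpoints -- substitute your real old and new RDS hosts.
SRC=old-93.example.rds.amazonaws.com
DST=new-94.example.rds.amazonaws.com

# Dump the entire cluster (every database plus the roles) into one SQL file.
DUMP_CMD="pg_dumpall -h $SRC -U postgres -f dump.sql"

# Replay the dump against the new 9.4 instance to rebuild it from scratch.
RESTORE_CMD="psql -h $DST -U postgres -f dump.sql"

# Dry run: print the commands instead of executing them.
echo "$DUMP_CMD"
echo "$RESTORE_CMD"
```

Both tools read the password from the PGPASSWORD environment variable or a ~/.pgpass file, which avoids putting credentials on the command line.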
Remember that to keep your data consistent, it is recommended (and in some cases mandatory) to shut down the systems that connect to the database during this maintenance window.
Also, if you need to speed things up, consider using pg_dump instead of pg_dumpall and taking advantage of the parallelism parameter (-j njobs), which sets the number of parallel jobs; for example, -j 8 will use up to 8 jobs. (Note that parallel dumps require the directory output format, -Fd.) By default both pg_dumpall and pg_dump use only 1. The drawback of using pg_dump instead of pg_dumpall is that you will need to run the command once for each database you have, and dump the ROLES (groups and users) separately.
See examples at PG Dump DOC and PG Restore DOC.
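A sketch of the parallel variant, with the same placeholder hosts. Note that pg_dump only accepts -j together with the directory output format (-Fd), and that roles are cluster-wide, so they still come from pg_dumpall with --globals-only. As before, the commands are echoed rather than executed.

```shell
# Placeholder endpoints and database name.
SRC=old-93.example.rds.amazonaws.com
DST=new-94.example.rds.amazonaws.com
DB=mydb

# Roles (groups/users) are cluster-wide, so dump them separately.
GLOBALS_CMD="pg_dumpall -h $SRC -U postgres --globals-only -f globals.sql"

# Parallel dump of one database; -Fd (directory format) is required for -j.
DUMP_CMD="pg_dump -h $SRC -U postgres -Fd -j 8 -f $DB.dir $DB"

# Restore the globals first, then the database with the same parallelism.
# -C makes pg_restore create the database before restoring into it.
RESTORE_GLOBALS_CMD="psql -h $DST -U postgres -f globals.sql"
RESTORE_CMD="pg_restore -h $DST -U postgres -d postgres -C -j 8 $DB.dir"

# Dry run: print the commands instead of executing them.
echo "$GLOBALS_CMD"
echo "$DUMP_CMD"
echo "$RESTORE_GLOBALS_CMD"
echo "$RESTORE_CMD"
```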
- To use the parallel feature you will need to use:
  pg_dump -h host -U user -W -Fd -f output_dir -j 8 database_name
  – Vinnix, Sep 10, 2016 at 16:50
- ... and to restore using parallelism:
  pg_restore -h host -d database_name -U user -W -C -j 8 output_dir
  – Vinnix, Sep 10, 2016 at 16:53
- Can't you just create a new RDS instance from a snapshot of your production env? – Learner, Jul 16, 2019 at 0:06
AWS also recommends logical replication to minimise downtime during a major version upgrade.
For upgrades from PostgreSQL 10.x to a newer version, native logical replication can be used; for older versions you can use pglogical (which also works on the latest versions). Both solutions are supported on RDS.
1. set up a new, empty RDS instance of the desired version
2. dump the schema from the existing instance in 3 parts (global objects, pre-data, and post-data)
3. apply the globals and pre-data to your new instance
4. set up logical replication using publication / subscription
5. apply the post-data schema changes once the databases are in sync, to enforce referential integrity
6. shut down the application that is writing to the old DB
7. wait for replication to finish
8. restart the application with the new connection string
Steps 6-8 depend on the application, of course; you may have other tools in your application stack for stopping writes.
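As a sketch of step 4 using native logical replication (PostgreSQL 10+): hostnames, database name, and user are placeholders, and on RDS you would additionally need rds.logical_replication = 1 in the source instance's parameter group. The psql invocations are echoed as a dry run rather than executed.

```shell
# Placeholder endpoints, database, and user.
SRC=old-db.example.rds.amazonaws.com
DST=new-db.example.rds.amazonaws.com
DB=mydb

# On the source instance: publish every table.
PUB_CMD="psql -h $SRC -U postgres -d $DB -c \"CREATE PUBLICATION upgrade_pub FOR ALL TABLES;\""

# On the target instance: subscribe; this also copies the initial table data.
SUB_CMD="psql -h $DST -U postgres -d $DB -c \"CREATE SUBSCRIPTION upgrade_sub CONNECTION 'host=$SRC dbname=$DB user=postgres' PUBLICATION upgrade_pub;\""

# Dry run: print the commands instead of executing them.
echo "$PUB_CMD"
echo "$SUB_CMD"
```

You can then watch pg_stat_subscription on the target to see when it has caught up before cutting the application over.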
You can find details with examples in this YouTube video: