I have a PostgreSQL 9.3 database running on an Ubuntu server.
About a month ago we had some problems with the server's hardware, which our VPS hosting company resolved fairly quickly, and everything seemed to be working fine.
We have backups running with Barman on a different server, and backups and restores were working (I checked).
The first sign of data corruption came a few days ago: I decided to do a full pg_dump of our DB, as I do every once in a while, and it failed ("invalid page header in block..."). The data seemed to have been corrupted a long time ago, around the time of the hardware problem (that was the date on the corrupted record). I tracked down the corrupted record, deleted it, and restored it manually.
After that I was able to do a full pg_dump.
To check for additional corruption, I've set up a different DB server from a backup and run pg_repack on all the tables to verify that I'm able to rebuild all the indexes and tables.
My questions are:
1. How can I be certain that I don't have any additional corruption in my DB?
2. What can I do to periodically check my data integrity?
3. What else can I do to verify the integrity of our DB besides dumping the whole DB and re-indexing it (which I already did)?
P.S. - I don't have block checksums enabled.
3 Answers
If you have initialized your cluster with checksums, you can verify them while backing up with pgBackRest.
PostgreSQL has supported page-level checksums since 9.3. If page checksums are enabled pgBackRest will validate the checksums for every file that is copied during a backup.
You will find more information about checksums here: https://www.postgresql.org/docs/current/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS
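For illustration, a minimal sketch of both steps, assuming example paths and a pgBackRest stanza named "main" that is already configured (none of these names come from the original answer):

```
# initialize a new cluster with page checksums enabled
# (data directory path is only an example)
initdb --data-checksums -D /var/lib/postgresql/9.3/main

# take a full backup with pgBackRest; page checksums are validated
# for every file copied ("main" is an assumed stanza name)
pgbackrest --stanza=main --type=full backup
```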
You can also run a pg_dump to /dev/null (if you don't have enough space to keep a logical dump). This should assure you that every data block can be read.
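A hedged example of such a throw-away dump (database name and user are assumptions):

```
# read every row of every table; the output is discarded,
# so this only verifies that the data can be read
pg_dump -U postgres mydb > /dev/null

# or, for every database in the cluster
pg_dumpall -U postgres > /dev/null
```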
The PGDG had a discussion this year because some wanted to activate checksums by default. You can find it here: http://www.postgresql-archive.org/Checksums-by-default-td5940158.html
If I may, there are now new tools for this:
- pg_verify_checksums (v11)
- pg_checksums (v12)
Both can run checks on an offline cluster, and pg_checksums can additionally enable or disable checksums on it. If you are worried about corruption, you can run either on a whole cluster or just portion(s) of it. Note that, from personal experience, pg_verify_checksums works fine on older versions, and pg_checksums is expected to have the same behavior.
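A minimal sketch of how these might be invoked (data directory paths are assumptions, and the server must be shut down cleanly before running them):

```
# PostgreSQL 11: verify checksums on a stopped cluster
pg_verify_checksums -D /var/lib/postgresql/11/main

# PostgreSQL 12+: the renamed tool can also enable/disable checksums offline
pg_checksums --check  -D /var/lib/postgresql/12/main
pg_checksums --enable -D /var/lib/postgresql/12/main
```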
Also, even though it is now quite old, I would advise reading this post: http://blog.ringerc.id.au/2012/10/avoiding-postgresql-database-corruption.html
As stated elsewhere, there are no corruption-specific tools in core PostgreSQL similar to those found in SQL Server (DBCC CHECKDB) or Oracle. That does not mean you cannot watch for corruption and try to prevent it as much as possible. A few thoughts:
- Use good hardware. This hardware should properly honor fsync. Not tunable in your case, but important nevertheless.
- Use checksums when you initialize your cluster. Sure, there is a little penalty for doing so, but it really is worth the hassle. You could create a new cluster, dump and reload your database into this new instance, and enjoy a more robust environment (see the sketch after this list). It is available since 9.3, so fine for you.
- Run regular, automated logical backups with pg_dump. It's good to have an automated physical backup with Barman, but this will not check corruption. Only pg_dump (which will read all rows in all tables) will give you insurance against table corruption. This is something you should definitely automate. If you don't have the space, redirect pg_dump to /dev/null. Corrupted indexes can always be rebuilt if you have a correct table. I think there are some index corruption finding tools out there.
- I also had a corrupted index which did not cause pg_dump to fail - only pg_repack/vacuum full were failing because of it - so I guess just doing a pg_dump is not enough (maybe automate a pg_dump and a pg_restore on a different server?). I will most probably re-create the cluster with checksums and restore it with pg_restore - my main problem is the downtime that comes along with it. Is it possible to enable checksums only on a slave server with streaming replication, or will replication not work in that case? – Rubinsh, May 15, 2017 at 8:33
- It's always good to have an automated backup/restore mechanism, and I can only encourage you to have one. If you want to replicate between a cluster without checksums and one with checksums and minimize downtime, you will have to go with logical replication (Slony, Londiste, Bucardo, ...). It will take longer to set up, but the switchover will be shorter. – KookieMonster, May 15, 2017 at 8:57
- When you're "pg_dumping", you read only data, not indexes. pg_restore will recreate your indexes from scratch, and you still won't know if an index is corrupted on your production server. Of course, having an automated backup/restore mechanism is good, but I suggest using a physical backup instead of a logical one. – Arkhena, May 16, 2017 at 7:11
- I agree with your first part; I even said "Only pg_dump [...] will give you insurance against table corruption". And I highly recommend having a physical backup solution such as the excellent Barman, but please plan a logical backup from time to time too if you care about corruption. – KookieMonster, May 16, 2017 at 7:25
If you pg_dump and later on pg_restore everything on a different server, I'd check that the SELECT count(*) FROM every table coincides with the source (and/or your expectations)... If that's meaningful, I'd assume that the restored database is not corrupt. Use this new one... If that's not meaningful...
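A minimal sketch of that comparison, assuming psql 9.6+ for \gexec and excluding only the system schemas (both assumptions on my part); run it against the source and the restored database and diff the output:

```
-- generate one "SELECT count(*)" statement per user table,
-- then execute them all with \gexec
SELECT format('SELECT %L AS table_name, count(*) FROM %I.%I;',
              schemaname || '.' || tablename, schemaname, tablename)
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1
\gexec
```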