What to do with WAL files for Postgres Slave reset

Question 1

Last night our PG slave ran out of disk space. After a lot of reconfiguring of disk space, new drives etc., it is now reporting the following error:

FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 00000001000018F70000008A has already been removed

From the reading around this I've done, it appears that the only solution is to re-sync the slave with pg_start_backup() et al. Based on this, I have a few questions.

Is there a better way of fixing the slave that I've simply missed or overlooked?
Do I need to clear out the WAL files on the slave and/or master prior to or during the backup?
Does pg_start_backup lock the database during this time?

As requested, the log file can be found at http://pastebin.com/9F8vJh6R. I have removed the rest of the file, as it's just 5 hours of the same repeated error.

Many thanks

Question 2

What happens if you start the slave? Why can it not resume at the point where it stopped?

Question 3

I had to restart it because the process stopped when the disk filled up. It then wouldn't start, for the same reason, which is where the new drive and the moving of WAL files came in. The process is running now and it did start to recover, but then it hit the error in the question and it just keeps logging that same error.

Question 4

Could you post more of the logs, from where it starts to recover to the error? If it's a lot of text, you could link to pastebin.com

Question 5

Added it to the question

Question 6

I think the cleanest and safest way is just to rebuild the slave completely. pg_start_backup() does not lock the DB. If you pass fast = TRUE to it, it forces an immediate checkpoint, and the extra I/O can slow down any concurrently executing queries. See postgresql.org/docs/9.4/static/…

Question 7

The message:

requested WAL segment 00000001000018F70000008A has already been removed

means that the master hasn't kept enough WAL history to bring the standby back up to date. Since you are using version 9.1, you can use pg_basebackup to create a new slave. We use a command like:

pg_basebackup -h masterhost -U postgres -D path --progress --verbose -c fast

This doesn't lock the master, and you don't have to rsync or call pg_start_backup() and friends.
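For an existing standby, the rebuild might look roughly like the sketch below. It only prints the commands unless DRY_RUN=0 is set. The data directory path, the pg_ctl calls and the ".old" suffix are assumptions for illustration, not something from this thread; masterhost and the pg_basebackup flags are the ones from the command above.

```shell
#!/bin/sh
# Sketch: rebuild a standby from scratch with pg_basebackup.
# The data directory path and pg_ctl usage are assumptions; adjust for your install.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rebuild_standby() {
    master="$1"   # master host name, e.g. masterhost
    pgdata="$2"   # standby data directory (hypothetical path)

    # Stop the broken standby if it is still running.
    run pg_ctl -D "$pgdata" stop -m fast
    # Move the old directory aside; pg_basebackup wants an empty target.
    run mv "$pgdata" "$pgdata.old"
    # Take a fresh base backup from the master.
    run pg_basebackup -h "$master" -U postgres -D "$pgdata" --progress --verbose -c fast
}

rebuild_standby masterhost /var/lib/postgresql/9.1/main
```

If disk space is tight, the old directory would have to be deleted rather than moved aside. The new directory still needs a recovery.conf before the standby is started.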

Question 8

You and that command might be the answer to my dreams... To clarify, can I run that on my existing slave? If so, would it be best to clear out the WAL archive and the data directory on the slave?

Question 9

If you have a WAL archive, you can try restore_command as Craig Ringer suggests. pg_basebackup creates an entirely new slave in an empty directory.

Question 10

If you have WAL archiving enabled on the master (archive_command is set and archive_mode is on), set a restore_command in your replica's recovery.conf to allow it to fetch WAL from the WAL archive.
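As a rough illustration, a recovery.conf with a restore_command could be generated like this. The host name, the postgres user and the archive path /mnt/wal_archive are placeholders, and the cp form assumes the archive is a plain directory that the replica can read.

```shell
#!/bin/sh
# Sketch: print a recovery.conf for a 9.1-style standby that can also
# fetch WAL from an archive. Host, user and archive path are placeholders.
make_recovery_conf() {
    master="$1"       # master host name
    archive_dir="$2"  # where archive_command stores WAL (assumed readable from the replica)

    cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$master user=postgres'
restore_command = 'cp $archive_dir/%f "%p"'
EOF
}

make_recovery_conf masterhost /mnt/wal_archive
```

Redirect the output into recovery.conf in the replica's data directory and restart the replica. If your archive_command compresses or ships WAL elsewhere, the restore_command has to mirror that instead of using cp.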

If there's no WAL archive, then there's no record of needed deltas between the master and the replica anymore. So you must resync them.
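To see which case you are in, you can check whether the segment named in the error is still in the archive. A minimal sketch, assuming the archive is a plain directory of uncompressed segment files (the /mnt/wal_archive path is made up):

```shell
#!/bin/sh
# Sketch: split a WAL segment file name into its parts and check whether
# an archive directory still holds it. The archive path is a placeholder.
describe_wal_segment() {
    seg="$1"
    # A WAL segment name is 24 hex digits: timeline, log id, segment number.
    [ "${#seg}" -eq 24 ] || return 1
    case "$seg" in *[!0-9A-F]*) return 1 ;; esac
    echo "timeline=$(printf '%s' "$seg" | cut -c1-8)" \
         "log=$(printf '%s' "$seg" | cut -c9-16)" \
         "segment=$(printf '%s' "$seg" | cut -c17-24)"
}

wal_in_archive() {
    # Succeeds if the named segment exists as a plain file in the archive directory.
    [ -f "$1/$2" ]
}

describe_wal_segment 00000001000018F70000008A
if wal_in_archive /mnt/wal_archive 00000001000018F70000008A; then
    echo "segment is archived: restore_command can fetch it"
else
    echo "segment not found in archive: a resync is needed"
fi
```

The replica needs every segment from the missing one onwards, so finding this one file is necessary but not sufficient.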

Typically this is done by taking a new pg_basebackup of the master to recreate the replica. If the database is big, though, it can be helpful to use rsync to resync the replica from the master, since rsync compares blocks and only transfers what differs. To do this, you:

1. pg_start_backup() on the master
2. Stop the replica if running
3. rsync the master to the replica
4. pg_stop_backup() on the master
5. Copy any additional files from pg_xlog on the master to the replica, up to the file reported by pg_stop_backup()
6. Start the replica

It's simpler if you have WAL archiving enabled, since you then don't have to manually copy WAL, you just set a restore_command on the replica.
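The rsync steps above can be sketched as a script run on the replica. It only prints the commands unless DRY_RUN=0 is set. The paths, the 'resync' backup label, the rsync options and the excluded files are assumptions, and the WAL copy is simplified to the whole pg_xlog directory rather than only the files up to the one pg_stop_backup() reports.

```shell
#!/bin/sh
# Sketch of the rsync resync steps; prints commands unless DRY_RUN=0.
# Paths, the backup label and the rsync options are assumptions.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

resync_with_rsync() {
    master="$1"        # master host name
    master_data="$2"   # data directory on the master
    replica_data="$3"  # data directory on the replica (this machine)

    # 1. Mark the start of the base backup on the master.
    run psql -h "$master" -U postgres -c "SELECT pg_start_backup('resync', true)"
    # 2. Stop the replica if it is running.
    run pg_ctl -D "$replica_data" stop -m fast
    # 3. Copy only what differs; leave the replica's own per-node files alone.
    run rsync -a --delete --checksum \
        --exclude=postmaster.pid --exclude=recovery.conf \
        "$master:$master_data/" "$replica_data/"
    # 4. Mark the end of the backup; this reports the last WAL file needed.
    run psql -h "$master" -U postgres -c "SELECT pg_stop_backup()"
    # 5. Copy WAL written during the backup (simplified: the whole pg_xlog).
    run rsync -a "$master:$master_data/pg_xlog/" "$replica_data/pg_xlog/"
    # 6. Start the replica again.
    run pg_ctl -D "$replica_data" start
}

resync_with_rsync masterhost /var/lib/postgresql/9.1/main /var/lib/postgresql/9.1/main
```

Review the printed commands before running with DRY_RUN=0. If postgresql.conf or pg_hba.conf differ between the two machines, they would need excluding as well.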

All sound too complicated? Use pg_basebackup.

As for your other questions:

NEVER delete WAL from the master. Ever. Extremely bad. Hands off pg_xlog.
pg_start_backup doesn't "lock" the database. It does prevent VACUUM from cleaning up dead rows, so it can increase bloat on high write activity tables, but that's about it.

Andomar · Accepted Answer · 2015-04-23 09:14:44Z
