So last night our PG Slave ran out of space after a lot of reconfiguring of disk space, new drives etc its now reporting the following error:
FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 00000001000018F70000008A has already been removed
From the reading around this I've done, it appears that the only solution is to re-sync the slave with pg_start_backup() et al. Based on this, I have a few questions.
- Is there a better way of fixing the slave that I've simply missed or overlooked?
- Do I need to clear out the WAL files on the slave and/or master prior or during the backup?
- Does pg_start_backup lock the database during this time?
As requested, the log file can be found: http://pastebin.com/9F8vJh6R, have removed the rest of the file as its just 5 hours of the same repeated error
Many thanks
2 Answers 2
The message:
requested WAL segment 00000001000018F70000008A has already been removed
Means that the master hasn't kept enough history to bring the standby back up to date. Since you are using version 9.1, you can use pg_basebackup to create a new slave. We use a command like:
pg_basebackup -h masterhost -U postgres -D path --progress --verbose -c fast
This doesn't lock the master, and you don't have to rsync
or call pg_start_backup()
and friends.
-
You and that command might be the answer to my dreams...To clarify, I can run that on my existing slave? If so, would it be best to clear out the WAL archive & the data directory on the slave?justcompile– justcompile2015年04月23日 09:41:09 +00:00Commented Apr 23, 2015 at 9:41
-
If you have a WAL achive, you can try
restore_command
as Craig Ringer suggests. Thepg_basebackup
creates an entirely new slave in an empty directory.Andomar– Andomar2015年04月23日 09:42:59 +00:00Commented Apr 23, 2015 at 9:42
If you have WAL archiving enabled on the master (archive_command
is set and archive_mode
is on
), set a restore_command
in your replica's recovery.conf
to allow it to fetch WAL from the WAL archive.
If there's no WAL archive, then there's no record of needed deltas between the master and the replica anymore. So you must resync them.
Typically this is done by making a new pg_basebackup
of the replica. If the replica is big, though, it can be helpful to use rsync
to resync the replica from the master doing block compares. To do this, you:
pg_start_backup()
on the master- Stop the replica if running
rsync
the master to the replicapg_stop_backup()
on the master- Copy any additional files from
pg_xlog
on the master to the replica, up to the file reported bypg_stop_backup()
- Start the replica
It's simpler if you have WAL archiving enabled, since you then don't have to manually copy WAL, you just set a restore_command
on the replica.
All sound too complicated? Use pg_basebackup
.
As for your other questions:
- NEVER delete WAL from the master. Ever. Extremely bad. Hands off
pg_xlog
. pg_start_backup
doesn't "lock" the database. It does preventVACUUM
from cleaning up dead rows, so it can increase bloat on high write activity tables, but that's about it.
pg_start_backup()
is not supposed to lock the DB, except when you passfast = TRUE
to it - this latter will cause a slowdown of any concurrently executing queries. See postgresql.org/docs/9.4/static/…