Zeroing WAL segments in Postgres

Question 1

We have a relatively low-volume Postgres database with continuous archiving set up to compress each WAL segment and send it to S3. Because the system is low-volume, it hits archive_timeout every 10 minutes or so and archives a mostly unused WAL segment, which used to compress very well because it was mostly zeroes.

However, Postgres recycles its WAL segments to avoid the cost of allocating new files at each WAL switch. That is useful under high load, but it means that after a burst of heavier-than-normal activity our WAL segment files are full of junk from previous segments and no longer compress well at all. We are storing many copies of this junk.
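The effect described above can be reproduced with a small sketch. It assumes a scaled-down 1 MiB "segment" (the Postgres default is 16 MiB), a hypothetical 4 kB of real WAL at the start, and random bytes as a worst-case stand-in for the junk left in a recycled file; real leftover WAL would probably compress somewhat better than random data.

```python
import os
import zlib

SEGMENT_SIZE = 1024 * 1024  # scaled down for the demo; the real default is 16 MiB
USED_BYTES = 4096           # assumed amount of real WAL written before the switch


def build_segment(used: bytes, tail_filler) -> bytes:
    """Return a fixed-size segment: the used prefix plus a generated tail."""
    return used + tail_filler(SEGMENT_SIZE - len(used))


def compressed_size(segment: bytes) -> int:
    """Size in bytes of the segment after zlib compression."""
    return len(zlib.compress(segment, 6))


used = os.urandom(USED_BYTES)                 # stand-in for real WAL records
fresh = build_segment(used, bytes)            # bytes(n) yields n zero bytes
recycled = build_segment(used, os.urandom)    # tail full of junk (worst case)

fresh_size = compressed_size(fresh)
recycled_size = compressed_size(recycled)
print(f"zeroed tail:   {fresh_size} bytes compressed")
print(f"recycled tail: {recycled_size} bytes compressed")
```

With a zeroed tail the archive cost is dominated by the used portion; with a junk tail it is close to the full segment size regardless of how little was written.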

Is there a way to reduce the amount of space we're using to keep our WAL archive? Some suboptimal possibilities:

Prevent Postgres from recycling the WAL segments somehow, so that it starts out with a zeroed file each time. The docs do not indicate that there is an option for this, but I might have missed it.
Have Postgres zero the WAL segment file when it starts or finishes using it. Again, the docs do not suggest this is possible.
Externally zero or remove some of the WAL segment files while they are not in use. Is there a safe way to determine which files these are?
Zero the unused portion of the segment before archiving it, using the output of pg_xlogdump to find where the junk starts. Possible, although I don't fancy it. At least by doing this in the archive command you can be sure that Postgres isn't going to reuse the file.
Only archive the used portion of the segment file, again by interpreting the output of pg_xlogdump somehow, and then pad it with zeroes during restore. This also sounds possible, although I don't really fancy it either.
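The last two possibilities can be sketched roughly as follows. This is only an illustration of the byte handling: `used_bytes` is a hypothetical input that would have to come from somewhere else (for example from interpreting pg_xlogdump output, which is not shown), and whether Postgres accepts a zero-padded segment on restore is assumed here rather than verified.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size


def zero_tail(segment: bytes, used_bytes: int) -> bytes:
    """Keep the full length but overwrite everything after used_bytes with zeroes."""
    if not 0 <= used_bytes <= len(segment):
        raise ValueError("used_bytes is outside the segment")
    return segment[:used_bytes] + bytes(len(segment) - used_bytes)


def strip_tail(segment: bytes, used_bytes: int) -> bytes:
    """Archive only the used prefix of the segment."""
    if not 0 <= used_bytes <= len(segment):
        raise ValueError("used_bytes is outside the segment")
    return segment[:used_bytes]


def pad_segment(stripped: bytes, segment_size: int = SEGMENT_SIZE) -> bytes:
    """On restore, pad a stripped segment back to full size with zeroes."""
    if len(stripped) > segment_size:
        raise ValueError("stripped data is larger than a segment")
    return stripped + bytes(segment_size - len(stripped))


# Tiny demo with a 32-byte "segment" of which 10 bytes are in use.
demo = b"WALRECORDS" + b"\xff" * 22
zeroed = zero_tail(demo, 10)
restored = pad_segment(strip_tail(demo, 10), segment_size=32)
print(zeroed == restored)
```

Both routes end up with the same bytes after restore; the difference is only whether the zeroes are stored (and compressed) in the archive or regenerated later.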

Comment

Interesting problem. May I ask what you are using continuous archiving for?

Comment

@dezso Despite the low churn, it's considered Very Important to reduce the risk of losing any of this data as far as possible and to have an audit trail of the changes that are made. The WAL archiving is a last line of defence (there are other mechanisms in play too), so keeping it cheap would be good.

Answer

Starting in version 9.4, Postgres automatically zeroes the tail end of the WAL file. (Actually it is only mostly zero: some block headers don't get zeroed, but the result is still very compressible.)
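To get a feel for why a mostly-zero tail is still very compressible, here is a rough sketch. The 8 kB block size and 24-byte header size are assumptions made for illustration, and the headers are filled with random bytes, which is harsher on the compressor than real block headers are likely to be.

```python
import os
import zlib

PAGE_SIZE = 8192            # assumed WAL block size
HEADER_SIZE = 24            # assumed per-block header size, for illustration only
SEGMENT_SIZE = 1024 * 1024  # scaled down for the demo; the real default is 16 MiB


def mostly_zero_tail(length: int) -> bytes:
    """Zero-filled buffer with a non-zero header at the start of every block."""
    out = bytearray(length)
    for start in range(0, length, PAGE_SIZE):
        end = min(start + HEADER_SIZE, length)
        out[start:end] = os.urandom(end - start)  # stand-in for a block header
    return bytes(out)


tail = mostly_zero_tail(SEGMENT_SIZE)
ratio = len(zlib.compress(tail)) / len(tail)
print(f"compressed to {ratio:.2%} of the original size")
```

Even with an incompressible header in every block, the compressed size is a small fraction of the segment.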

In version 9.2, there is a program named pg_clearxlogtail you can use. You can add it into your archive_command before the compression step.

If you are using 9.3, you are out of luck.

Note that checkpoints do not inherently cause log file switches. It is probably archive_timeout which is causing the switches.

Comment

D'oh. Yes, we're on 9.3, so have slipped through the crack between those two solutions. And yes, sorry, you're right it's the archive_timeout that causes the switches. Corrected the OP, thanks.

Answered by jjanes · Accepted Answer · 2017-08-03 18:12:59Z
