
Let me start with the caveat that I am still green with Postgres.

I am building a PostgreSQL 9.2 active/standby cluster on Debian wheezy for an application, based on the ClusterLabs pgsql cluster documentation.

In the lab I am able to get this working without a problem. But on the production cluster I'm building, I keep running into a problem.

I brought the database files over from the current single production Postgres server: I shut down Postgres, tarred up the data directory, and copied it over to the cluster's master node. I put the files in place, set the permissions, and was able to start Postgres on the master via corosync just fine.

In preparing the slave, I used the pg_basebackup tool to copy the database from the master, and this is where I keep having issues. At about 57% through the transfer, I see this error:

$ pg_basebackup -h db-master -U u_repl -D /db/data/postgresql/9.2/main/ -X stream -P
pg_basebackup: could not receive data from WAL stream: SSL connection has been closed unexpectedly
176472/176472 kB (100%), 1/1 tablespace
pg_basebackup: child process exited with error 1

And on the server, I see:

2016-04-06 21:05:31 UTC LOG: terminating walsender process due to replication timeout

But the transfer doesn't stop and keeps going to completion.

I found a question here on Stack Exchange about setting ssl_renegotiation_limit to 0, but that didn't make much difference.

Does anyone have any ideas? I am completely baffled as to why this would raise an error yet keep going just fine. It is the same procedure I used in the lab setup; the only difference is that the production database is much bigger.

Thoughts? Thank you kindly! -Peter

asked Apr 6, 2016 at 21:40

1 Answer


Many thanks to Albe Laurenz from the pgsql-admin mailing list.

The server error message means that the client did not send a status update within wal_sender_timeout milliseconds; see the PostgreSQL documentation for wal_sender_timeout.

The base backup needs to complete before this wal_sender_timeout period expires; otherwise the server resets the connection.

Side note: I am running 9.2, where this parameter is called replication_timeout (it was renamed wal_sender_timeout in 9.3).
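A minimal sketch of the corresponding setting in postgresql.conf on the master (the server that pg_basebackup connects to), assuming PostgreSQL 9.2. The value is in milliseconds and the default is 60000 (60 s); 0 disables the timeout entirely. The 10-minute figure below is only an illustrative value, not a recommendation from this thread. Note that disabling the timeout also means the master will no longer notice dead replication connections on its own, so a larger finite value may be the safer choice.

```
# postgresql.conf on the master (PostgreSQL 9.2); value in milliseconds.
# Option 1: disable the walsender timeout entirely.
replication_timeout = 0

# Option 2 (alternative): keep a timeout, but make it longer than the
# longest stall you expect during the base backup. Example value only.
#replication_timeout = 600000

# On 9.3 and later the same setting is named wal_sender_timeout:
#wal_sender_timeout = 0
```

This setting is read from the configuration file, so a reload (for example `SELECT pg_reload_conf();` as a superuser) should be enough; a full restart should not be required.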

Tombart
answered May 9, 2016 at 1:14
  • Set replication_timeout = 0 on PG 9.2.1 (effectively disabling it) and that worked! Commented Nov 29, 2018 at 23:22
