I have found some questions about the same error, but none of them answers my problem.

I have two PostgreSQL 11 clusters (A and B) that use the publication and subscription features (logical replication) to copy data from A to B.

A (source DB - publication) --------------> B (target DB - subscription)

This works fine, but often (not always), when the volume of data being inserted into a table on node A increases, the sender reports the following error:

"terminating walsender process due to replication timeout"

The data is currently being loaded at about 30K rows per second, continuously for hours, through the COPY command.

Earlier, wal_sender_timeout was set to 5 s and I saw this error much more often. I then increased it to 1 min and the error became less frequent. But I don't want to keep increasing it without understanding what causes it. I looked at walsender.c and found that the message comes from here:

if (wal_sender_timeout > 0 && last_processing >= timeout)
{
    /*
     * Since typically expiration of replication timeout means
     * communication problem, we don't send the error message to the
     * standby.
     */
    ereport(COMMERROR,
            (errmsg("terminating walsender process due to replication timeout")));

    WalSndShutdown();
}

But it is still not clear to me which parameter makes the sender assume that the receiver node is inactive and that it should therefore stop the walsender.
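
To make the condition concrete, here is a minimal C model of how I read the check. It is not the actual walsender code. It assumes that `timeout` in the snippet above is the time of the last reply received from the subscriber plus wal_sender_timeout, and that a keepalive is requested once half of that interval has passed without a reply. The type `ms_time` and the functions `keepalive_due` and `sender_timed_out` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Milliseconds since an arbitrary epoch; a stand-in for TimestampTz. */
typedef int64_t ms_time;

/*
 * Assumed model: once half of the timeout has elapsed since the last
 * reply from the receiver, the sender asks it to respond.
 */
bool keepalive_due(ms_time last_reply, ms_time now, int timeout_ms)
{
    /* A timeout of 0 disables the mechanism; no reply yet means nothing to measure. */
    if (timeout_ms <= 0 || last_reply <= 0)
        return false;
    return now >= last_reply + timeout_ms / 2;
}

/*
 * Assumed model: the sender gives up when the full timeout has elapsed
 * since the last reply from the receiver.
 */
bool sender_timed_out(ms_time last_reply, ms_time now, int timeout_ms)
{
    if (timeout_ms <= 0 || last_reply <= 0)
        return false;
    return now >= last_reply + timeout_ms;
}
```

With wal_sender_timeout at 1min, this model asks for a reply after 30 s of silence and gives up at 60 s. Under these assumptions, what matters is how long the subscriber goes without sending any feedback, not the volume of WAL as such.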

SourceDB

sourcedb=# show wal_sender_timeout;
 wal_sender_timeout
--------------------
 1min
(1 row)
sourcedb=# select * from pg_replication_slots;
   slot_name   |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin |  restart_lsn   | confirmed_flush_lsn
---------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+----------------+---------------------
 sub_target_DB | pgoutput | logical   |  16501 | sourcedb | f         | t      |      68229 |      |     98839088 | 116D0/C36886F8 | 116D0/C3E5D370
 

TargetDB

targetdb=# show wal_receiver_timeout;
 wal_receiver_timeout
----------------------
 1min
(1 row)
targetdb=# show wal_retrieve_retry_interval ;
 wal_retrieve_retry_interval
-----------------------------
 5s
(1 row)
targetdb=# show wal_receiver_status_interval;
 wal_receiver_status_interval
------------------------------
 2s
(1 row)
targetdb=# select * from pg_stat_subscription;
   subid    |    subname    |  pid  | relid |  received_lsn  |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
------------+---------------+-------+-------+----------------+-------------------------------+-------------------------------+----------------+-------------------------------
 2378695757 | sub_target_DB | 62371 |       | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09 | 2021-08-20 09:05:15.398471+09 | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09

Edit 1: Are there any disadvantages to setting wal_sender_timeout or wal_receiver_timeout to much higher values? I know that in case of an actual failure, WAL segments would keep piling up in the pg_wal directory of the sender. But is there a safe limit?
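
To put a number on how much WAL is piling up, the distance between two LSNs can be computed in bytes. An LSN printed as X/Y is a 64-bit byte position, with X the high 32 bits and Y the low 32 bits, both in hexadecimal. The sketch below is a small client-side illustration of that arithmetic; `parse_lsn` and `wal_bytes_between` are hypothetical helper names, and on the server the built-in pg_wal_lsn_diff function performs the same subtraction.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse an LSN of the form "hi/lo" (two hexadecimal numbers) into a
 * 64-bit byte position. Returns 0 on success, -1 on malformed input.
 */
int parse_lsn(const char *text, uint64_t *out)
{
    unsigned int hi, lo;
    int consumed = 0;

    /* %n records how many characters were read, so trailing junk is rejected. */
    if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2 || text[consumed] != '\0')
        return -1;

    *out = ((uint64_t) hi << 32) | lo;
    return 0;
}

/*
 * Bytes of WAL between two positions. Returns -1 if either string is
 * malformed or if "newer" lies before "older".
 */
int64_t wal_bytes_between(const char *older, const char *newer)
{
    uint64_t o, n;

    if (parse_lsn(older, &o) != 0 || parse_lsn(newer, &n) != 0 || n < o)
        return -1;
    return (int64_t) (n - o);
}
```

For example, wal_bytes_between("116D0/C36886F8", "116D1/2BA8F170"), using the restart_lsn and received_lsn shown above, gives 1749052024 bytes (about 1.6 GiB). Those two values were sampled on different nodes at different moments, so this only illustrates the calculation.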

Edit 2: I increased wal_sender_timeout to 5 min, and the error started appearing more frequently instead. Not only that, it even killed the active subscription and replication stopped; I had to restart it. So clearly, just increasing wal_sender_timeout has not helped.

asked Aug 20, 2021 at 0:11

1 Answer

I know this is an older question, but I was able to resolve this by increasing wal_sender_timeout.

If you're getting more errors after raising it, you might want to state the time unit explicitly in the parameter value (for example 5min or 300s), because a bare number is interpreted as milliseconds.
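
As a rough illustration of why the unit matters: as far as I know, a wal_sender_timeout value written without a unit is taken as milliseconds, and the accepted time units are ms, s, min, h and d. The C sketch below mimics that conversion in a simplified way; `timeout_to_ms` is a hypothetical helper, not PostgreSQL's own parser.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Convert a timeout setting such as "5", "5s" or "5min" to milliseconds.
 * A bare number is treated as milliseconds. Returns -1 if the value is
 * negative, missing, or has an unrecognised unit.
 */
long timeout_to_ms(const char *text)
{
    char *end;
    long value = strtol(text, &end, 10);

    if (end == text || value < 0)
        return -1;

    /* Allow spaces between the number and the unit. */
    while (*end == ' ')
        end++;

    if (*end == '\0' || strcmp(end, "ms") == 0)
        return value;
    if (strcmp(end, "s") == 0)
        return value * 1000L;
    if (strcmp(end, "min") == 0)
        return value * 60L * 1000L;
    if (strcmp(end, "h") == 0)
        return value * 3600L * 1000L;
    if (strcmp(end, "d") == 0)
        return value * 86400L * 1000L;

    return -1;
}
```

Under this model timeout_to_ms("5") is 5 ms, while timeout_to_ms("5s") is 5000 ms and timeout_to_ms("5min") is 300000 ms. An unrecognised suffix such as "5m" returns -1.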

answered Nov 14, 2022 at 22:07
