
I configured repmgr replication on node1 and node3 (primary and standby respectively), and the setup worked successfully: new records and objects created on the primary appeared on the standby as expected. But after a few weeks I noticed that replication was no longer working, even though some repmgr commands still return results as if replication were fine. I tried restarting the standby node and registering it again, but it didn't work.

How can I continue to replicate?

Here's the status of the nodes:

-bash-4.2$ psql -V
psql (PostgreSQL) 10.3

NODE1 - PRIMARY

-bash-4.2$ repmgr node check
Node "node1":
 Server role: OK (node is primary)
 Replication lag: OK (N/A - node is primary)
 WAL archiving: OK (0 pending archive ready files)
 Downstream servers: OK (this node has no downstream nodes)
 Replication slots: OK (node has no replication slots)
-bash-4.2$

NODE3 - STANDBY

-bash-4.2$ repmgr -f /etc/repmgr/10/repmgr.conf node check 
Node "node3":
 Server role: OK (node is standby)
 Replication lag: OK (0 seconds)
 WAL archiving: OK (0 pending archive ready files)
 Downstream servers: CRITICAL (1 of 1 downstream nodes not attached; missing: node3 (ID: 3))
 Replication slots: OK (node has no replication slots)
-bash-4.2$ repmgr node status 
Node "node3":
 PostgreSQL version: 10.3
 Total data size: 2393 MB
 Conninfo: host=node3 user=repmgr dbname=repmgr connect_timeout=2
 Role: standby
 WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)
 Archive command: /bin/true
 WALs pending archiving: 0 pending files
 Replication connections: 0 (of maximal 10)
 Replication slots: 0 (of maximal 10)
 Upstream node: node3 (ID: 3)
 Replication lag: 0 seconds
 Last received LSN: 4/AC000000
 Last replayed LSN: 4/AC000140
asked Apr 24, 2018 at 16:45

2 Answers


You should probably raise your WAL retention settings to keep more files around. It's also not a bad idea to set WAL files aside using archive_command, like this:

archive_command = 'test ! -f /postgres/archive/%f && cp -n %p /postgres/archive/%f'
wal_keep_segments = 256

Raise it high enough for your use case; 256 is just an example here, and the paths need adjusting to match your installation.
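
If you archive WAL segments like that, a standby can also fetch segments that have already been removed from the primary's pg_wal by reading them back from the archive. A minimal sketch for PostgreSQL 10, assuming the same /postgres/archive path as in the example above (repmgr writes the standby's recovery.conf during standby clone, so the line would be added there):

restore_command = 'cp /postgres/archive/%f %p'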

Secondly, use cluster show to verify the cluster is healthy; it gives a clearer overview than checking a single node.
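
For example (using the configuration file path from the question):

repmgr -f /etc/repmgr/10/repmgr.conf cluster show

On a healthy cluster every node is listed as running and the standby shows node1 as its upstream.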

Lastly: did you register the standby after cloning? You don't show this in your command list. After cloning you need to start the standby and then register it:

repmgr standby register

If it already existed in the repmgr.nodes table, add --force
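
Put together, a sketch of that registration step, using the configuration file path from the question, would be:

repmgr -f /etc/repmgr/10/repmgr.conf standby register --force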

answered Jun 13, 2018 at 12:49

Some of the WAL files needed for replication could no longer be found on the primary, so I reinstated the standby by cloning it again.

Commands run on the standby server:

 pg_ctl stop
 repmgr -f /etc/repmgr/10/repmgr.conf --force --rsync-only -h node1 -d repmgr -U repmgr --verbose standby clone
 pg_ctl start
 repmgr node status
 repmgr node check
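
Once the standby is cloned and started, you can confirm on the primary that it has reattached; pg_stat_replication is the standard PostgreSQL view for this (the user and database names here are the ones from the question):

 psql -U repmgr -d repmgr -c "SELECT application_name, state, replay_lsn FROM pg_stat_replication;"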
answered Apr 24, 2018 at 19:41
  • That is common practice and the only way to recreate the standby when you are lacking the needed WAL files. Good luck playing with repmgr, I really like it but the learning curve is a bit steep sometimes. Commented Sep 12, 2018 at 13:34
  • This really helped me rescue a downstream node. The command used in my case (as the PostgreSQL user) was: repmgr -f /etc/repmgr/12/repmgr.conf --force -h <<upstream>> -d repmgr -U repmgr --verbose standby clone Commented Feb 7, 2023 at 23:20
