1

I have setup streaming replication for my postgres servers. I am using PostgreSQL 9.2.6. Replication seems to be working fine. I am monitoring this with two ways using nagios:

  1. Log_delay and
  2. Byte_lag

I am very frequently getting critical alert for log_delay and in the same time byte_lag is not throwing up any alert. log_delay will be OK after 1 or 2 minutes. Can any one please suggest me If I am missing something in my setup?

Queries are given below.

Log_delay

SELECT 
 CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() 
 THEN 0 
 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) 
 END AS log_delay; 

Byte_lag

SELECT sent_offset - ( replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) 
 AS byte_lag 
FROM ( 
 SELECT client_addr,
 ('x' || lpad(split_part(sent_location, '/', 1), 8, '0'))::bit(32)::bigint 
 AS sent_xlog,
 ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint 
 AS replay_xlog,
 ('x' || lpad(split_part(sent_location, '/', 2), 8, '0'))::bit(32)::bigint 
 AS sent_offset,
 ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint 
 AS replay_offset 
FROM pg_stat_replication ) AS s;
András Váczi
31.8k13 gold badges103 silver badges152 bronze badges
asked Oct 11, 2015 at 2:07
0

1 Answer 1

2

Since you're using 9.2, it would probably be better to use pg_xlog_location_diff instead of your byte_lag function. Something like this is much clearer, less error prone, and should give you what you are looking for.

SELECT client_hostname,
 client_addr, 
 pg_xlog_location_diff(sent_location, replay_location) AS difference_in_bytes
FROM pg_stat_replication;

Alternatively, there is always the check_postgres.pl Nagios plugin. It has a huge number of implemented checks that can definitely help keep your PostgreSQL installation happy and healthy.

The particular function you would want for your needs is hot_standby_delay.

Once you have it installed, and the symlinks made, you could simply do something like this from your primary:

check_postgres_hot_standby_delay --dbport=5432,5432 --dbuser=postgres,postgres --dbhost=,10.0.1.2 --warning="1048576 and 2min" --critical="16777216 and 10min" -v

And get alerts if the bytes and the time falls out of your set thresholds.

answered Oct 22, 2015 at 19:18

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.