11

I'm looking at changing how backups are done and am wondering whether there is a way to determine which databases in a PostgreSQL cluster have not been changed recently.

Instead of using pg_dumpall, I'd like to use pg_dump and only dump those databases that have changed since the last backup (some databases don't get updated very often)-- the idea being that if nothing has changed then the current backup should still be good.

Does anyone know of a way to determine when a specific database was last updated/changed?

Thanks...

Update:

I was hoping not to have to write triggers all over the place, as I have no control over the creation of databases in one particular cluster (let alone the creation of db objects within a database).

Digging further, it looks like there is a correlation between the contents of the $PGDATA/global/pg_database file (specifically the second field) and the directory names under $PGDATA/base.

Going out on a limb, I'd guess that the second field of the pg_database file is the database oid and that each database has its own subdirectory under $PGDATA/base (with the oid for the subdirectory name). Is that correct? If so, is it reasonable to use the file timestamps from the files under $PGDATA/base/* as the trigger for needing a backup?

...or is there a better way?

Thanks again...

asked Aug 26, 2011 at 18:48
3
  • stackoverflow.com/questions/899203/… Commented Aug 26, 2011 at 19:00
  • Never assume that the current backup is good. You always want to take new backups on your regular schedule. Commented Aug 26, 2011 at 19:14
  • Sonu Singh - I can't control the addition of databases, let alone tables to this cluster so triggers won't work-- plus (to my knowledge) triggers won't catch ddl changes. mrdenny♦ - Correct. However, I'd like to avoid generating redundant incremental backups between the periodic full backups. Commented Aug 26, 2011 at 19:53

5 Answers

9

While using select datname, xact_commit from pg_stat_database; as suggested by @Jack Douglas doesn't quite work (apparently due to autovacuum), select datname, tup_inserted, tup_updated, tup_deleted from pg_stat_database does appear to work. Both DML and DDL changes will change the values of tup_* columns while a vacuum does not (vacuum analyze on the other hand...).

On the off chance that this may be useful for others, I'm including the backup script that I've put in place. This works for Pg 8.4.x but not for 8.2.x-- YMMV depending on the version of Pg used.

#!/usr/bin/env perl
=head1 Synopsis
pg_backup -- selectively backup a postgresql database cluster
=head1 Description
Perform backups (pg_dump*) of postgresql databases in a cluster on an
as needed basis.
For some database clusters, there may be databases that are:
 a. rarely updated/changed and therefore shouldn't require dumping as 
 often as those databases that are frequently changed/updated.
 b. are large enough that dumping them without need is undesirable.
The global data is always dumped without regard to whether any 
individual databases need backing up or not.
=head1 Usage
pg_backup [OPTION]...
General options:
 -F, --format=c|t|p output file format for data dumps 
 (custom, tar, plain text) (default is custom)
 -a, --all backup (pg_dump) all databases in the cluster 
 (default is to only pg_dump databases that have
 changed since the last backup)
 --backup-dir directory to place backup files in 
 (default is ./backups)
 -v, --verbose verbose mode
 --help show this help, then exit
Connection options:
 -h, --host=HOSTNAME database server host or socket directory
 -p, --port=PORT database server port number
 -U, --username=NAME connect as specified database user
 -d, --database=NAME connect to database name for global data
=head1 Notes
This utility has been developed against PostgreSQL version 8.4.x. Older 
versions of PostgreSQL may not work.
`vacuum` does not appear to trigger a backup unless there is actually 
something to vacuum whereas `vacuum analyze` appears to always trigger a 
backup.
=head1 Copyright and License
Copyright (C) 2011 by Gregory Siems
This library is free software; you can redistribute it and/or modify it 
under the same terms as PostgreSQL itself, either PostgreSQL version 
8.4 or, at your option, any later version of PostgreSQL you may have 
available.
=cut
use strict;
use warnings;
use Getopt::Long;
use Data::Dumper;
use POSIX qw(strftime);
my %opts = get_options();
my $connect_options = '';
$connect_options .= "--$_=$opts{$_} " for (qw(username host port));
my $shared_dump_args = ($opts{verbose})
 ? $connect_options . ' --verbose '
 : $connect_options;
my $backup_prefix = (exists $opts{host} && $opts{host} ne 'localhost')
 ? $opts{backup_dir} . '/' . $opts{host} . '-'
 : $opts{backup_dir} . '/';
do_main();
########################################################################
sub do_main {
 backup_globals();
 my $last_stats_file = $backup_prefix . 'last_stats';
 # get the previous pg_stat_database data
 my %last_stats;
 if ( -f $last_stats_file) {
 %last_stats = parse_stats (split "\n", slurp_file ($last_stats_file));
 }
 # get the current pg_stat_database data
 my $cmd = 'psql ' . $connect_options;
 $cmd .= " $opts{database} " if (exists $opts{database});
 $cmd .= "-Atc \"
 select date_trunc('minute', now()), datid, datname, 
 xact_commit, tup_inserted, tup_updated, tup_deleted 
 from pg_stat_database 
 where datname not in ('template0','template1','postgres'); \"";
 $cmd =~ s/\n\s+/ /g; # collapse the multi-line query onto a single line
 my @stats = `$cmd`;
 my %curr_stats = parse_stats (@stats);
 # do a backup if needed
 foreach my $datname (sort keys %curr_stats) {
 my $needs_backup = 0;
 if ($opts{all}) {
 $needs_backup = 1;
 }
 elsif ( ! exists $last_stats{$datname} ) {
 $needs_backup = 1;
 warn "no last stats for $datname\n" if ($opts{debug});
 }
 else {
 for (qw (tup_inserted tup_updated tup_deleted)) {
 if ($last_stats{$datname}{$_} != $curr_stats{$datname}{$_}) {
 $needs_backup = 1;
 warn "$_ stats do not match for $datname\n" if ($opts{debug});
 }
 }
 }
 if ($needs_backup) {
 backup_db ($datname);
 }
 else {
 chitchat ("Database \"$datname\" does not currently require backing up.");
 }
 }
 # update the pg_stat_database data
 open (my $fh, '>', $last_stats_file) or die "Could not open $last_stats_file for output. $!\n";
 print $fh @stats;
 close $fh;
}
sub parse_stats {
 my @in = @_;
 my %stats;
 chomp @in;
 foreach my $line (@in) {
 my @ary = split /\|/, $line;
 my $datname = $ary[2];
 next unless ($datname);
 foreach my $key (qw(tmsp datid datname xact_commit tup_inserted tup_updated tup_deleted)) {
 my $val = shift @ary;
 $stats{$datname}{$key} = $val;
 }
 }
 return %stats;
}
sub backup_globals {
 chitchat ("Backing up the global data.");
 my $backup_file = $backup_prefix . 'globals-only.backup.gz';
 my $cmd = 'pg_dumpall --globals-only ' . $shared_dump_args;
 $cmd .= " --database=$opts{database} " if (exists $opts{database});
 do_dump ($backup_file, "$cmd | gzip");
}
sub backup_db {
 my $database = shift;
 chitchat ("Backing up database \"$database\".");
 my $backup_file = $backup_prefix . $database . '-schema-only.backup.gz';
 do_dump ($backup_file, "pg_dump --schema-only --create --format=plain $shared_dump_args $database | gzip");
 $backup_file = $backup_prefix . $database . '.backup';
 do_dump ($backup_file, "pg_dump --format=". $opts{format} . " $shared_dump_args $database");
}
sub do_dump {
 my ($backup_file, $cmd) = @_;
 my $temp_file = $backup_file . '.new';
 warn "Command is: $cmd > $temp_file" if ($opts{debug});
 chitchat (`$cmd > $temp_file`);
 if ( -f $temp_file ) {
 chitchat (`mv $temp_file $backup_file`);
 }
}
sub chitchat {
 my @ary = @_;
 return unless (@ary);
 chomp @ary;
 my $first = shift @ary;
 my $now = strftime "%Y%m%d-%H:%M:%S", localtime;
 print +(join "\n ", "$now $first", @ary), "\n";
}
sub get_options {
 Getopt::Long::Configure('bundling');
 my %opts = ();
 GetOptions(
 "a" => \$opts{all},
 "all" => \$opts{all},
 "p=s" => \$opts{port},
 "port=s" => \$opts{port},
 "U=s" => \$opts{username},
 "username=s" => \$opts{username},
 "h=s" => \$opts{host},
 "host=s" => \$opts{host},
 "F=s" => \$opts{format},
 "format=s" => \$opts{format},
 "d=s" => \$opts{database},
 "database=s" => \$opts{database},
 "backup-dir=s" => \$opts{backup_dir},
 "help" => \$opts{help},
 "v" => \$opts{verbose},
 "verbose" => \$opts{verbose},
 "debug" => \$opts{debug},
 );
 # Does the user need help?
 if ($opts{help}) {
 show_help();
 }
 $opts{host} ||= $ENV{PGHOSTADDR} || $ENV{PGHOST} || 'localhost';
 $opts{port} ||= $ENV{PGPORT} || '5432';
 $opts{username} ||= $ENV{PGUSER} || $ENV{USER} || 'postgres';
 $opts{database} ||= $ENV{PGDATABASE} || $opts{username};
 $opts{backup_dir} ||= './backups';
 my %formats = (
 c => 'custom',
 custom => 'custom',
 t => 'tar',
 tar => 'tar',
 p => 'plain',
 plain => 'plain',
 );
 $opts{format} = (defined $opts{format})
 ? $formats{$opts{format}} || 'custom'
 : 'custom';
 warn Dumper \%opts if ($opts{debug});
 return %opts;
}
sub show_help {
 print `perldoc -F $0`;
 exit;
}
sub slurp_file { local (*ARGV, $/); @ARGV = shift; <> }
__END__

Update: the script has been put on github here.
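For readers who only want the core idea rather than the whole script, the comparison step can be sketched on its own. This is a minimal illustration in Python, not the script above: it assumes the same pipe-separated `psql -At` row layout produced by the query in the script, and the function names `parse_stats` and `databases_needing_backup` as well as the sample rows are made up for the example.

```python
from typing import Dict, Iterable, List

# Assumed column order, matching the pg_stat_database query in the script above.
FIELDS = ["tmsp", "datid", "datname", "xact_commit",
          "tup_inserted", "tup_updated", "tup_deleted"]
TUPLE_COUNTERS = ("tup_inserted", "tup_updated", "tup_deleted")


def parse_stats(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Parse pipe-separated rows into a dict keyed by database name."""
    stats: Dict[str, Dict[str, str]] = {}
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) != len(FIELDS):
            continue  # skip blank or malformed rows
        row = dict(zip(FIELDS, parts))
        stats[row["datname"]] = row
    return stats


def databases_needing_backup(last: Dict[str, Dict[str, str]],
                             curr: Dict[str, Dict[str, str]]) -> List[str]:
    """Return databases that are new or whose tup_* counters differ."""
    changed = []
    for datname in sorted(curr):
        prev = last.get(datname)
        # Any difference counts, so a counter reset also triggers a backup.
        if prev is None or any(prev[c] != curr[datname][c] for c in TUPLE_COUNTERS):
            changed.append(datname)
    return changed


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    last = parse_stats([
        "2011-08-31 17:00:00+00|16384|sales|900|10|5|1",
        "2011-08-31 17:00:00+00|16385|archive|40|7|0|0",
    ])
    curr = parse_stats([
        "2011-08-31 18:00:00+00|16384|sales|950|12|5|1",
        "2011-08-31 18:00:00+00|16385|archive|45|7|0|0",
        "2011-08-31 18:00:00+00|16386|newdb|3|1|0|0",
    ])
    print(databases_needing_backup(last, curr))
```

In the sample, `archive` has a higher xact_commit but unchanged tup_* counters, so it is skipped. That is the behaviour the answer relies on to ignore autovacuum-only activity.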

answered Aug 31, 2011 at 17:53
1
  • Quite nice code, thanks for sharing. BTW, it could be github'ed, don't you think so? :-) Commented Mar 15, 2012 at 11:19
2

It looks like you can use pg_stat_database to get a transaction count and check if this changes from one backup run to the next:

select datname, xact_commit from pg_stat_database;
  datname  | xact_commit
-----------+-------------
 template1 |           0
 template0 |           0
 postgres  |      136785

If someone has called pg_stat_reset you can't be certain if a db has changed or not, but you may consider it unlikely enough that that would happen, followed by exactly the right number of transactions to match your last reading.

--EDIT

see this SO question for why this might not work. Not sure why this might happen but enabling logging might shed some light....

answered Aug 27, 2011 at 9:07
5
  • If someone did call pg_stat_reset then the probability of the xact_commit value matching the previous one would be pretty low, no? So that certainly looks to catch the existence of DML changes. Now all I need is to catch if there have been DDL changes. Commented Aug 27, 2011 at 16:37
  • DDL is transactional in postgres - I'd expect the commit count to increase in that case too. Not checked though... Commented Aug 27, 2011 at 17:42
  • You Sir, are correct. I had forgotten about Pg DDL being transactional and a quick create table ... test does appear to increment xact_commit. Commented Aug 27, 2011 at 21:04
  • 1
    Further testing shows the xact_commit increasing even though there is no user activity going on-- autovacuum perhaps? Commented Aug 27, 2011 at 22:03
  • This definitely doesn't work for backup purposes. xact_commit increases very frequently, even when nobody is connected to the database. Commented Jun 19, 2018 at 16:27
1

From digging around the postgres docs and newsgroups:

txid_current() will give you a new xid. If you call the function again later and get an xid exactly one higher, you know that no transactions committed between the two calls. You may get false positives though, e.g. if someone else calls txid_current().

answered Aug 27, 2011 at 0:07
3
  • Thank you for the suggestion. I don't believe this will work, however, as txid_current() appears to operate at the cluster level rather than the database level. Commented Aug 27, 2011 at 1:52
  • I looked for some doc on that and couldn't find - do you have a link? Commented Aug 27, 2011 at 7:38
  • 1
    No link. I tested by switching between databases and running "select current_database(), txid_current();" and comparing the results. Commented Aug 27, 2011 at 16:30
0

Remember the timestamps on the files containing the DB data and check whether they have changed. If they have, there was a write.

Edit after WAL-hint: You should do this only after flushing the outstanding writes.
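A rough sketch of that check in Python, under some loud assumptions: each database in the default tablespace lives under `$PGDATA/base/<oid>` (the oids come from `select oid, datname from pg_database`), outstanding writes have already been flushed, and the function names `newest_mtime` and `changed_since` are invented for this example. Treat it as an illustration of the idea, not a vetted backup trigger.

```python
import os
from typing import Dict


def newest_mtime(directory: str) -> float:
    """Return the latest modification time (epoch seconds) of any file under directory."""
    newest = 0.0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                newest = max(newest, os.stat(os.path.join(root, name)).st_mtime)
            except FileNotFoundError:
                continue  # file was removed while scanning
    return newest


def changed_since(base_dir: str, oids: Dict[str, int], last_backup: float) -> Dict[str, bool]:
    """Map each database name to True if a file under base_dir/<oid> is newer than last_backup."""
    return {
        datname: newest_mtime(os.path.join(base_dir, str(oid))) > last_backup
        for datname, oid in oids.items()
    }
```

A call might look like `changed_since("/var/lib/pgsql/data/base", {"sales": 16385}, last_backup_epoch)`, where the path, the oid and the timestamp are placeholders. Maintenance activity can touch these files too, so a newer timestamp does not prove that user data changed.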

answered Aug 27, 2011 at 20:58
1
  • 2
    That's not really reliable. There could be changes that are not yet written (flushed) to the datafiles, i.e. they were only written to the WAL. Commented Aug 27, 2011 at 22:01
0

PostgreSQL 9.5 lets us track the last-modified (commit) timestamp. Check out this link: https://thirumal-opensource.blogspot.in/2017/03/to-track-last-modified-commit-or-get.html

answered Mar 31, 2017 at 11:48
