The PostgreSQL built-in backup mechanism isn't always suitable. Sometimes you want to put the application in a quiescent state because it has external data that you want to back up at the same moment as the PG data. But the only way to quiesce the application is to "lock" the database as well, and PG lacks a database-wide or cluster-wide locking mechanism. Putting PG into a read-only state would be one piece of the following solution:
- Quiesce application data (disable logins)
- Quiesce database (by making it read-only)
- Perform a PG checkpoint or pg_switch_xlog()
- Create a snapshot of the App and Data volumes
- Resume the database (make it RW again)
- Resume the application
- Backup the snapshots
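The sequence above can be sketched as a shell script. This is only an illustration: the service name, volume path, and SQL file names (app-logins, /dev/vg0/pgdata, make_readonly.sql, make_readwrite.sql) are hypothetical placeholders, not part of the original setup.

```shell
#!/bin/sh
set -e
systemctl stop app-logins                   # 1. quiesce application (disable logins)
psql -U postgres -f make_readonly.sql       # 2. quiesce database (make it read-only)
psql -U postgres -c 'CHECKPOINT;'           # 3. flush dirty buffers to disk
lvcreate -s -L 5G -n pgsnap /dev/vg0/pgdata # 4. snapshot the data volume
psql -U postgres -f make_readwrite.sql      # 5. resume the database (RW again)
systemctl start app-logins                  # 6. resume the application
# 7. back up the snapshot at leisure, then: lvremove -f /dev/vg0/pgsnap
```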
Comments:

- Erwin Brandstetter: I think this related answer should serve you well: dba.stackexchange.com/questions/11893/… (maybe duplicate?)
- Otheus: @ErwinBrandstetter thanks, it's in the right direction.
2 Answers
After culling answers elsewhere on the internet, I devised a solution. The other answers were each incomplete on their own, so I am presenting a complete one here in the hope that it will benefit others.
The Strategy
- Disable connections to the database (not the cluster).
- Set the database's `default_transaction_read_only` setting to `true`.
- Terminate the existing connections to that database.
- Re-enable (read-only) connections.
Once that is done, you would (in my solution):
- Perform a `CHECKPOINT` (I think this is the safest, but a `pg_switch_xlog()` would be appropriate for very high-load servers)
- Take the volume snapshot
- Reverse the previous steps. (But this is tricky!)
Pitfalls
- Terminating connections while they are mid-transaction is probably a bad idea. Better to kill only the idle connections, wait a few seconds, kill whatever has since become idle, and repeat until they are all gone.
- At some point, you'll have to kill open/hung queries or abort the backup.
- At the start of a transaction or session, PostgreSQL takes a kind of snapshot of the process table. You have to reset this snapshot every time you check whether unwanted processes are still there. See `pg_stat_clear_snapshot()`.
Restoring the read-write state is not so simple. If read-only connections now exist, you must terminate them in order for the new read-write status to take effect. But new connections might arrive while killing existing ones. So again, you must
- Disable connections to the database
- Change the `default_transaction_read_only` setting to `false`
- Kill the existing connections
- Re-enable (read-write) connections to the database
Alternate strategy
Another strategy is to change the permissions on the role used by the application. This can be quite messy and is less general.
For instance, you'd have to revoke/re-grant on not just tables, but sequences, large objects, and probably the schema itself. Further, what exactly is the behavior of existing connections when you change the access? Probably no impact, which means you also need to kill those backends. Finally, let's say the application has read-write access to most tables, but not to others in the schema. You'd have to make sure your re-granting doesn't include those objects as well.
Another possibility is to LOCK all the tables, by querying the catalog and executing a dynamic query. That seemed too perilous for my tastes.
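For reference, the dynamic-LOCK approach might look like the following sketch (the `public` schema filter is an illustrative assumption). The locks are released as soon as the DO block's transaction ends, which is exactly the peril: the snapshot would have to happen from within the same open transaction, and any objects the catalog query misses stay unlocked.

```sql
-- Sketch only: lock every table in the public schema in SHARE mode.
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT schemaname, tablename FROM pg_tables
             WHERE schemaname = 'public'
    LOOP
        EXECUTE format('LOCK TABLE %I.%I IN SHARE MODE',
                       r.schemaname, r.tablename);
    END LOOP;
END $$;
```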
Implementation
Pause_service
The database name is 'gitlabhq' and the application's username is 'gitlab'. Replace them with your own:
psql -Upostgres <<'PAUSE_DB'
-- 1. disable new connections
alter database gitlabhq with allow_connections = off;
-- 2. Make DB read-only
alter database gitlabhq set default_transaction_read_only = true;
-- 3. Unobtrusively but safely terminate current connections
DO $X$ BEGIN
-- kill open idle connections, try up to 9x. Last time, kill regardless
FOR i IN 1..10 LOOP
PERFORM pg_terminate_backend(pid) from pg_stat_activity where usename = 'gitlab'
and (i >= 10 OR state in ('idle', 'disabled' ));
PERFORM pg_stat_clear_snapshot();
EXIT WHEN NOT EXISTS ( select pid from pg_stat_activity where usename = 'gitlab' );
RAISE NOTICE 'pg backends still open: sleeping 2 seconds';
PERFORM pg_sleep(2);
PERFORM pg_stat_clear_snapshot();
END LOOP;
-- send notice if still open connections
IF EXISTS ( select pid from pg_stat_activity where usename = 'gitlab' ) THEN
RAISE NOTICE 'Hung backends. Backup might not be 100%% consistent';
END IF;
END;$X$;
-- 4. Allow read-only connections while checkpointing/snapshotting
alter database gitlabhq with allow_connections = on;
CHECKPOINT;
PAUSE_DB
Resume
alter database gitlabhq with allow_connections = off;
alter database gitlabhq set default_transaction_read_only = false;
SELECT pg_stat_clear_snapshot();
SELECT pg_terminate_backend(pid) from pg_stat_activity where usename = 'gitlab';
alter database gitlabhq with allow_connections = on;
There's a chance that in this last step you will kill long-running read-only SELECT queries, but in my experience such queries can run for minutes if not hours, and it's acceptable to kill them in order to ensure uptime for everyone else.
Comments:

- user1822: `alter database gitlabhq set default_transaction_read_only = true;` doesn't really help, because new connections can simply do `start transaction read write` or `set default_transaction_read_only = false;`
- Craig Ringer: ...and some clients, poolers, etc. will do so by default unless the app explicitly asks for a read-only transaction. So whether this works depends a lot on the app.
- Otheus: @a_horse_with_no_name If your app does this, then you'll have to keep the connections disallowed for that app.
- Otheus: @CraigRinger What poolers / apps do this?
- A. Scherbaum: Even if you disconnect all clients, the database is not read-only. Background jobs like VACUUM might still change the database!
I think it'd be desirable to have this functionality as an official PostgreSQL feature, personally.
Doing it simply
If you don't want to get your hands dirty with C coding for a PostgreSQL extension, you can just put a connection pooler in front of PostgreSQL. Like pgBouncer.
pgBouncer has the ability to pause application activity built in. Though to make it really useful you need to connect directly (not via pgBouncer) and cancel active connections once you've paused new ones from coming in. Just `select pg_terminate_backend(pid) from pg_stat_activity where pid <> pg_backend_pid();`.
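A possible pause/resume sequence, assuming PgBouncer's admin console listens on port 6432 with an admin user named pgbouncer, and a pooled database named gitlabhq (all three are configuration-dependent assumptions):

```shell
psql -p 6432 -U pgbouncer pgbouncer -c 'PAUSE gitlabhq;'   # stop handing out connections
# Cancel stragglers directly on the real server, bypassing the pooler:
psql -p 5432 -U postgres gitlabhq -c \
  "select pg_terminate_backend(pid) from pg_stat_activity where pid <> pg_backend_pid();"
# ... take the filesystem snapshot here ...
psql -p 6432 -U pgbouncer pgbouncer -c 'RESUME gitlabhq;'  # resume pooling
```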
Doing it right
If you're willing to get your hands dirty, though, it can be done with a C extension. The extension must:

- Be loaded in `shared_preload_libraries` so it can register a small static shared memory segment with a boolean flag like `db_is_locked`.
- Register a `ProcessUtility_hook` and an `ExecutorStart_hook` that test the is-locked flag in shmem and, if set, sleep in a `WaitLatch` loop until they see the flag has been cleared again. (You could possibly use a parser hook instead.)
- Provide two SQL-callable functions in C. One sets the flag. The other clears the flag and iterates through `PGPROC`, setting the latch of all user processes so they know to wake up immediately.
- Optionally provide a third function that, if the flag is set, iterates through `PGXACT` to find open write transactions and signals them to terminate.
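A minimal sketch of the hook part, written against the 9.6-era APIs. Hook signatures and `WaitLatch()` differ across versions, the flag is read here without an LWLock, and real code would reserve the shared memory via `RequestAddinShmemSpace()` plus a `shmem_startup_hook` rather than in `_PG_init()` — so treat this as an outline, not working code.

```c
#include "postgres.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "storage/latch.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

static ExecutorStart_hook_type prev_ExecutorStart = NULL;
static bool *db_is_locked = NULL;           /* flag lives in shared memory */

static void
locked_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    /* Sleep until another backend clears the flag and sets our latch. */
    while (db_is_locked != NULL && *db_is_locked)
    {
        WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, 1000L);
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }
    if (prev_ExecutorStart)
        prev_ExecutorStart(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);
}

void
_PG_init(void)
{
    bool found;

    /* Abbreviated: real code does this in a shmem-startup hook. */
    db_is_locked = ShmemInitStruct("db_lock_flag", sizeof(bool), &found);
    if (!found)
        *db_is_locked = false;

    prev_ExecutorStart = ExecutorStart_hook;
    ExecutorStart_hook = locked_ExecutorStart;
}
```

The SQL-callable set/clear functions and the PGPROC latch walk are omitted here; bdr_locks.c shows the full pattern.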
All this has already been implemented as part of the BDR extension, but it's part of a much larger system. You could quite likely extract the relevant parts into your own extension. See bdr_locks.c, bdr_commandfilter.c, bdr_executor.c, bdr.c, etc.
Note that this won't make PostgreSQL read-only on disk - the checkpointer will continue to run, the bgwriter will continue to run, the archiver will still run, etc. So it's not enough to let you make a DB backup without an atomic file system snapshot or pg_start_backup() / pg_stop_backup(). But it's fine for your use case: pausing application activity at the DB.
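For completeness, the base-backup bracket mentioned here looks like the following under the 9.6-era API (these functions were later renamed pg_backup_start()/pg_backup_stop() in PostgreSQL 15):

```sql
SELECT pg_start_backup('nightly-snapshot', true);  -- true = take a fast checkpoint
-- ... copy or snapshot the data directory here ...
SELECT pg_stop_backup();
```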
Comments:

- Otheus: The bouncer container might be the best option. I'm using this for a dockerized gitlab setup, and I want to eventually move the databases to their own server without drastically reconfiguring the docker setup.
- Otheus: Checking the shared memory segment for each process/executor might be a tad expensive, no? Wouldn't it be better simply to load the extension before the snapshot and unload it afterward?
- Craig Ringer: @Otheus Huh? No, really not. If you want to ensure reliable ordering you must have an LWLock or spinlock on the shmem flag, and acquire it before testing the flag. But that's still ridiculously cheap compared to all the things PostgreSQL does in execution of even a simple query. Trace the query execution path for `SELECT * FROM tablename` and you'll be amazed at what happens. Trust me, accessing a simple shmem segment that's preloaded in `shared_preload_libraries` costs nothing in comparison.
- Craig Ringer: @Otheus But more importantly, what you describe won't work. You don't have any way to load the extension into existing, running backends, unless you can intervene on each connection on the application side to run a LOAD or similar. And if you can do that, you can just pause all your queries app-side, so you don't need the extension at all. You need the extension preloaded and running in all backends, so it's already there and already testing the "read only" flag when you decide to actually set it.
- Craig Ringer: To give you an idea: if I `sudo perf probe -x /usr/pgsql-9.6/bin/postgres --add LWLockAcquire`, then in a `psql` session `select pg_backend_pid()` and trace that session with `sudo perf record -e probe_postgres:LWLockAcquire -p 26367`, run `SELECT * FROM some_empty_table`, and end the `perf` record session, that PostgreSQL backend has taken over 400 LWLocks. So really, a single shared-memory test is practically free. If you can show it makes a 0.1% difference in performance, I'll be impressed.