Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit 79e427d

Browse files
committed
Day 40
1 parent 1982b11 commit 79e427d

File tree

1 file changed

+153
-0
lines changed

1 file changed

+153
-0
lines changed
Lines changed: 153 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,153 @@
1+
Originally from: [tweet](https://twitter.com/samokhvalov/status/1721036531580973494), [LinkedIn post]().
2+
3+
---
4+
5+
# How to break a database, Part 2: Simulate infamous transaction ID wraparound
6+
7+
> I post a new PostgreSQL "howto" article every day. Join me in this
8+
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!
9+
10+
See also [Part 1: How to Corrupt](0039_how_to_break_a_database_part_1_how_to_corrupt.md).
11+
12+
## Straightforward simulation method
13+
14+
The method requires some time but works pretty well. It has been discussed in a couple of places:
15+
16+
- My tweet [A question from Telegram chat](https://twitter.com/samokhvalov/status/1415575072081809409)
17+
- Article
18+
[How to simulate the deadly, infamous, misunderstood & complex ‘Transaction Wraparound Problem’ in PostgreSQL](https://fatdba.com/2021/07/20/how-to-simulate-the-deadly-transaction-wraparound-problem-in-postgresql/)
19+
20+
First, open a "long-running transaction" (with normal XID assigned to it, as it would be a writing transaction) – and
21+
keep the session open (we'll use a named pipe a.k.a. FIFO to "hang" our `psql` session):
22+
23+
```bash
24+
mkfifo dummy
25+
26+
psql -Xc "
27+
set idle_in_transaction_session_timeout = 0;
28+
begin;
29+
select pg_current_xact_id()
30+
" \
31+
-f dummy &
32+
```
33+
34+
Make sure the session is open and in transaction:
35+
36+
```bash
37+
❯ psql -Xc "select state
38+
from pg_stat_activity
39+
where
40+
pid <> pg_backend_pid()
41+
and query ~ 'idle_in_tran'
42+
"
43+
state
44+
---------------------
45+
idle in transaction
46+
(1 row)
47+
```
48+
49+
Now, having an open long-running transaction with assigned XID, we just need to shift transaction ID at high speed – we
50+
can use the same function for it, and `pgbench` with multiple connections:
51+
52+
```bash
53+
pgbench -c8 -j8 -P60 -T36000 -rn \
54+
-f - <<< 'select pg_current_xact_id()'
55+
```
56+
57+
This should move the current XID at very high pace (100-200k TPS). And since we have an open a long-running
58+
transaction, `autovacuum` cannot operate.
59+
60+
While `pgbench` is running, we can observe the state of the database using a monitoring tool that includes XID/multiXID
61+
wraparound checks (every tool must, but not every tool does), or just some snippet – e.g.
62+
from [Managing Transaction ID Exhaustion (Wraparound) in PostgreSQL](https://crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql).
63+
64+
After a few hours:
65+
66+
```
67+
WARNING: database "nik" must be vacuumed within 39960308 transactions
68+
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
69+
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
70+
```
71+
72+
And if we proceed, then this:
73+
74+
```bash
75+
nik=# create table t3();
76+
ERROR: database is not accepting commands to avoid wraparound data loss in database "nik"
77+
HINT: Stop the postmaster and vacuum that database in single-user mode.
78+
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
79+
```
80+
81+
Good, the harm is done! It's time to release FIFO (this will close our long-lasting idle-in-transaction session):
82+
83+
```bash
84+
exec 3>dummy && exec 3>&-
85+
```
86+
87+
## How to escape – the basics
88+
89+
How to escape? We'll probably discuss this in a separate howto, but let's talk about some basics.
90+
91+
See real-life use cases to learn from the other people's mistakes:
92+
93+
- [Sentry's case](https://blog.sentry.io/transaction-id-wraparound-in-postgres/),
94+
- [Mailchimp's one](https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/)
95+
96+
are good lessons.
97+
98+
Escaping from this takes, in general case, takes a lot of time. Traditionally, as the `HINT` above suggests, it is done
99+
in the single-user mode. But I very much like the ideas by
100+
[@PostSQL](https://twitter.com/PostSQL)
101+
from the GCP team – see the talk [Do you vacuum every day?](https://youtube.com/watch?v=JcRi8Z7rkPg) and discussion in
102+
the `pgsql-hackers` mailing list
103+
[We should stop telling users to "vacuum that database in single-user mode"](https://postgresql.org/message-id/flat/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com).
104+
105+
## Alternative method to simulate wraparound
106+
107+
There is another method for simulation which works much faster – no need to wait many hours, we can use `pg_resetwal`'s
108+
option `-x` ([docs](https://postgresql.org/docs/current/app-pgresetwal.html)) to fast-forward the transaction ID. This
109+
method is described in [Transaction ID wraparound: a walk on the wild
110+
side](https://cybertec-postgresql.com/en/transaction-id-wraparound-a-walk-on-the-wild-side/). It's quite interesting
111+
and looks like this:
112+
113+
```bash
114+
pg_ctl stop -D $PGDATA
115+
116+
pg_resetwal -x 2137483648 -D testdb ### 2^31 - 10000000
117+
118+
dd if=/dev/zero of=$PGDATA/pg_xact/07F6 bs=8192 count=15
119+
```
120+
121+
## MultiXact ID wraparound
122+
123+
MultiXact ID is also possible:
124+
[Multixacts and Wraparound](https://postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND).
125+
This risk is often overlooked. But it exists, especially if you have FKs and/or use
126+
`SELECT ... FOR SHARE` (and `SELECT ... FOR UPDATE`) combined with substransactions).
127+
128+
Some articles to learn about it:
129+
130+
- [A foreign key pathology to avoid](https://thebuild.com/blog/2023/01/18/a-foreign-key-pathology-to-avoid/)
131+
- [Subtransactions considered harmful: Problem 3: unexpected use of Multixact IDs](https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful#problem-3-unexpected-use-of-multixact-ids)
132+
- [Notes on some PostgreSQL implementation details](https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/)
133+
134+
Simulation of MultiXact ID wraparound is also possible – for this, we need to create multiple overlapping transactions
135+
that lock the same rows (for example, using explicit `SELECT ... FOR SHARE`).
136+
137+
> 🎯 **TODO:** working example – left for future edits.
138+
139+
Everyone should not only monitor (with alerting) for traditional XID wraparound risks, but also for MultiXID wraparound
140+
risks. And it should be included in the snippets showing how much of the "capacity" is used.
141+
142+
> 🎯 **TODO:** snippet to show both XID w-d and MultiXID w-d, at both DB and table level
143+
> Obviously, multixid wraparrounds are encountered less often in the wild – I don't see people have `datminmxid` and
144+
> `relminmxid` used in the snippets.
145+
> Basic version:
146+
>
147+
> ```sql
148+
> select
149+
> datname,
150+
> age(datfrozenxid),
151+
> mxid_age(datminmxid)
152+
> from pg_database;
153+
> ```

0 commit comments

Comments
(0)

AltStyle によって変換されたページ (->オリジナル) /