I'm trying to understand whether the repeatable read isolation level is good enough for my scenario in an application that uses Postgres, but the docs make it difficult to tell which level is best suited.
The application receives an event from a message queue and then starts a transaction that reads the current value of a single row by its primary key, calculates what the new state of that row should be, and updates that row in the database.
Given that there are multiple instances of the application deployed, and that multiple events for the same primary key can arrive on the message queue at the same time, two such transactions could run concurrently against the same row. Is repeatable read isolation good enough for this case, or do I need to consider using serializable? My assumption is that if the first transaction commits its result to the database whilst the second transaction is still in progress, the second transaction will fail with a conflict because it sees that the row it was attempting to update was modified by the first transaction - is this correct?
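To make that assumption concrete, this is the interleaving I have in mind, sketched against a hypothetical table entity_state(id, state) - the table and column names are invented for illustration:
-- session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT state FROM entity_state WHERE id = 42;
-- session B
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT state FROM entity_state WHERE id = 42;
-- session A: computes its new state in application code, then
UPDATE entity_state SET state = 'state computed by A' WHERE id = 42;
COMMIT;
-- session B: I expect this UPDATE to block until A commits and then fail
-- with a serialization error, forcing B to retry
UPDATE entity_state SET state = 'state computed by B' WHERE id = 42;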
And a follow-up question: I'd like to understand how much more 'expensive' serializable transaction isolation is compared to repeatable read. I want to understand the underlying mechanism of what's going on within Postgres - i.e. is there locking going on, and how does that affect the performance of other queries running at the same time?
"calculates what the new state of that row should be after receiving an event from a message queue" What are the inputs to that calculation? – jjanes, Feb 5, 2021
3 Answers
REPEATABLE READ is sufficient for your case. It will by definition prevent a "lost update".
SERIALIZABLE is quite a bit more expensive than REPEATABLE READ, which is the "cheapest" of all isolation levels. More locks will be taken (SI locks that don't block anything, but can cause a transaction to abort), and these locks have to survive a commit. It is impossible to name a figure for how much more expensive it will be; that depends on your workload.
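As a rough way to observe those extra locks (a sketch, assuming some table test that the serializable transaction reads), you can look at pg_locks from another session while the transaction is open:
-- session A
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM test WHERE id = 1;
-- session B: the predicate (SIRead) locks taken by session A are visible here
SELECT locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE mode = 'SIReadLock';
These SIRead locks never block other statements; they only serve to detect access patterns that could produce a serialization anomaly, and they can be kept even after COMMIT until all potentially conflicting transactions have finished.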
According to this article, "repeatable read" is not the "cheapest" of the isolation levels, but rather the second cheapest (the cheapest being "read committed" instead): prisma.io/dataguide/postgresql/inserting-and-modifying-data/… – Venryx, Feb 3, 2022
This level is different from Read Committed in that a query in a repeatable read transaction sees a snapshot as of the start of the first non-transaction-control statement in the transaction, not as of the start of the current statement within the transaction. And acquiring a snapshot is not for free. – Laurenz Albe, Feb 3, 2022
Or are you saying that READ COMMITTED creates multiple snapshots within the same transaction, whereas REPEATABLE READ only makes one? (and since each snapshot takes time, READ COMMITTED can thus be slower) – Venryx, Feb 3, 2022
Yes, precisely. – Laurenz Albe, Feb 3, 2022
Ah. It would be nice to have benchmarks to confirm the perf difference, but I can see the rationale behind it. Thanks for mentioning. – Venryx, Feb 3, 2022
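A rough two-session sketch of the snapshot behaviour discussed in these comments, assuming a simple table t(x int) with a few rows in it:
-- session A
BEGIN ISOLATION LEVEL READ COMMITTED;
SELECT count(*) FROM t;   -- say this returns 10
-- session B (autocommit)
INSERT INTO t VALUES (1);
-- session A: each statement gets a fresh snapshot, so the new row is visible
SELECT count(*) FROM t;   -- returns 11
COMMIT;
-- session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM t;   -- say this returns 11
-- session B (autocommit)
INSERT INTO t VALUES (2);
-- session A: the snapshot from the first statement is reused, the new row is not visible
SELECT count(*) FROM t;   -- still returns 11
COMMIT;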
Why not stick with the default transaction isolation level of read committed, which everyone expects and understands best?
Use optimistic locking, for example
update <table>
set <column> = <new value>
where id = <id>
and <column> = <old value>
If that update fails, it means someone else got there before you and you should try again. Depending on your exact use case, there may be optimisations you can make to prevent this from happening a lot.
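As a concrete sketch of that template, using a hypothetical counters(id, i) table and checking the affected-row count to decide whether to retry:
SELECT i FROM counters WHERE id = 1;   -- suppose this returns 7
-- the application computes the new value (8) and then does a compare-and-set:
UPDATE counters
SET i = 8
WHERE id = 1
  AND i = 7;
-- if the UPDATE reports 0 affected rows, someone else changed the row first:
-- re-read it and try again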
Or maybe you can make your statement atomic depending on the exact function or expression you are updating your columns with?
update <table>
set <column> = <expression>(<column>)
where id = <id>
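For instance, an increment can be expressed as a single statement, so the read and the write cannot be separated by a concurrent writer (again using the hypothetical counters table):
UPDATE counters
SET i = i + 1
WHERE id = 1;
-- the new value is computed from the row version the UPDATE actually locks,
-- so concurrent increments are not lost even at read committed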
What if UPDATE doesn't fit a given use-case, and a DELETE followed by INSERT needs to be used? – payne, Jan 26, 2022
SELECT FOR UPDATE
I think this is the perfect solution for your use case. E.g. consider a simpler dummy case of multiple incrementer threads, which I believe models the question well:
CREATE TABLE "MyInt" ( i INTEGER NOT NULL )
INSERT INTO "MyInt" VALUES (0)
and then multiple parallel updaters:
SELECT * FROM "MyInt"
// set newI = i + 1 in your server code
UPDATE "MyInt" SET i = ${newI}
As it stands, many updates would be lost. But if you do instead:
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM "MyInt" FOR UPDATE;
-- set newI = i + 1 in your code
UPDATE "MyInt" SET i = ${newI};
COMMIT;
the updates won't be lost anymore, because the FOR UPDATE locks the row against other SELECT FOR UPDATE statements until the transaction commits, and other threads just wait. https://www.postgresql.org/docs/13/explicit-locking.html#LOCKING-ROWS documents:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.
I have tested this with this test code.
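Spelled out as a two-session timeline for the MyInt example (a sketch of the interleaving, not output of the test code):
-- session A
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM "MyInt" FOR UPDATE;   -- returns i = 0 and locks the row
-- session B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM "MyInt" FOR UPDATE;   -- blocks, because session A holds the row lock
-- session A
UPDATE "MyInt" SET i = 1;
COMMIT;
-- session B: the SELECT ... FOR UPDATE now returns the committed value i = 1,
-- so its increment is based on the latest value and nothing is lost
UPDATE "MyInt" SET i = 2;
COMMIT;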
Documentation quote that says that REPEATABLE READ would also be enough
FOR UPDATE + READ COMMITTED is the best approach I think, as it does the job and is a bit faster than REPEATABLE READ on my simple benchmark (4.2s vs 3.2s). But just for completeness, the fact that REPEATABLE READ alone also works is quite clear in the docs, to further confirm what Laurenz said: https://www.postgresql.org/docs/14/transaction-iso.html#XACT-REPEATABLE-READ
Applications using this level must be prepared to retry transactions due to serialization failures.
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the transaction start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the repeatable read transaction will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the repeatable read transaction can proceed with updating the originally found row. But if the first updater commits (and actually updated or deleted the row, not just locked it) then the repeatable read transaction will be rolled back with the message
ERROR: could not serialize access due to concurrent update
because a repeatable read transaction cannot modify or lock rows changed by other transactions after the repeatable read transaction began.
When an application receives this error message, it should abort the current transaction and retry the whole transaction from the beginning. The second time through, the transaction will see the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update.
When such an error is detected, you have to run ROLLBACK and try again.
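Sketched at the SQL level for the MyInt example (in practice the application drives this retry loop):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM "MyInt";
UPDATE "MyInt" SET i = ${newI};
-- if the UPDATE fails with "could not serialize access due to concurrent update":
ROLLBACK;
-- re-run the whole transaction; its new snapshot includes the other
-- transaction's committed change, so the recomputed value starts from that
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM "MyInt";
UPDATE "MyInt" SET i = ${newI};
COMMIT;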