I'm trying to understand whether the repeatable read isolation level is good enough for my scenario in an application that uses Postgres, but the docs make it difficult to tell which level is best suited.
The application receives an event from a message queue and then starts a transaction that reads the current value of a single row by its primary key, calculates what the new state of that row should be, and updates that row in the database.
Given that there are multiple instances of the application deployed, and that multiple events for the same primary key can arrive on the message queue at the same time, two such transactions could run concurrently against the same row. Is repeatable read isolation good enough for this case, or do I need to consider using serializable? My assumption is that if the first transaction commits its result to the database whilst the second transaction is still in progress, the second transaction will fail with a conflict because it sees that the row it was attempting to update was modified by the first transaction - is this correct?
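To make that assumption concrete, this is the interleaving I have in mind, sketched against a hypothetical table entity_state(id, state) - the table and column names are invented for illustration:
-- session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT state FROM entity_state WHERE id = 42;
-- session B
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT state FROM entity_state WHERE id = 42;
-- session A: computes its new state in application code, then
UPDATE entity_state SET state = 'state computed by A' WHERE id = 42;
COMMIT;
-- session B: I expect this UPDATE to block until A commits and then fail
-- with a serialization error, forcing B to retry
UPDATE entity_state SET state = 'state computed by B' WHERE id = 42;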
And a follow-up question: I'd like to understand how much more 'expensive' serializable transaction isolation is compared to repeatable read. I want to understand the underlying mechanism of what's going on within Postgres - i.e. is there locking going on, and how does that affect the performance of other queries running at the same time?
"calculates what the new state of that row should be after receiving an event from a message queue" What are the inputs to that calculation? – jjanes, Feb 5, 2021
3 Answers
REPEATABLE READ is sufficient for your case. It will by definition prevent a "lost update".
SERIALIZABLE is quite a bit more expensive than REPEATABLE READ, which is the "cheapest" of all isolation levels. More locks will be taken (SI locks that don't block anything, but can cause a transaction to abort), and these locks have to survive a commit. It is impossible to name a figure for how much more expensive it will be; that depends on your workload.
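As a rough way to observe those extra locks (a sketch, assuming some table test that the serializable transaction reads), you can look at pg_locks from another session while the transaction is open:
-- session A
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM test WHERE id = 1;
-- session B: the predicate (SIRead) locks taken by session A are visible here
SELECT locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE mode = 'SIReadLock';
These SIRead locks never block other statements; they only serve to detect access patterns that could produce a serialization anomaly, and they can be kept even after COMMIT until all potentially conflicting transactions have finished.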
According to this article, "repeatable read" is not the "cheapest" of the isolation levels, but rather the second cheapest (the cheapest being "read committed" instead): prisma.io/dataguide/postgresql/inserting-and-modifying-data/… – Venryx, Feb 3, 2022
This level is different from Read Committed in that a query in a repeatable read transaction sees a snapshot as of the start of the first non-transaction-control statement in the transaction, not as of the start of the current statement within the transaction. And acquiring a snapshot is not for free. – Laurenz Albe, Feb 3, 2022
Or are you saying that READ COMMITTED creates multiple snapshots within the same transaction, whereas REPEATABLE READ only makes one? (and since each snapshot takes time, READ COMMITTED can thus be slower) – Venryx, Feb 3, 2022
Yes, precisely. – Laurenz Albe, Feb 3, 2022
Ah. It would be nice to have benchmarks to confirm the perf difference, but I can see the rationale behind it. Thanks for mentioning. – Venryx, Feb 3, 2022
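A rough two-session sketch of the snapshot behaviour discussed in these comments, assuming a simple table t(x int) with a few rows in it:
-- session A
BEGIN ISOLATION LEVEL READ COMMITTED;
SELECT count(*) FROM t;   -- say this returns 10
-- session B (autocommit)
INSERT INTO t VALUES (1);
-- session A: each statement gets a fresh snapshot, so the new row is visible
SELECT count(*) FROM t;   -- returns 11
COMMIT;
-- session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM t;   -- say this returns 11
-- session B (autocommit)
INSERT INTO t VALUES (2);
-- session A: the snapshot from the first statement is reused, the new row is not visible
SELECT count(*) FROM t;   -- still returns 11
COMMIT;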
Why not stick with the default transaction isolation level of read committed, which everyone expects and understands best?
Use optimistic locking, for example
update <table>
set <column> = <new value>
where id = <id>
and <column> = <old value>
If that update fails, it means someone else got there before you and you should try again. Depending on your exact use case, there may be optimisations you can make to prevent this from happening a lot.
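As a concrete sketch of that template, using a hypothetical counters(id, i) table and checking the affected-row count to decide whether to retry:
SELECT i FROM counters WHERE id = 1;   -- suppose this returns 7
-- the application computes the new value (8) and then does a compare-and-set:
UPDATE counters
SET i = 8
WHERE id = 1
  AND i = 7;
-- if the UPDATE reports 0 affected rows, someone else changed the row first:
-- re-read it and try again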
Or maybe you can make your statement atomic depending on the exact function or expression you are updating your columns with?
update <table>
set <column> = <expression>(<column>)
where id = <id>
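For instance, an increment can be expressed as a single statement, so the read and the write cannot be separated by a concurrent writer (again using the hypothetical counters table):
UPDATE counters
SET i = i + 1
WHERE id = 1;
-- the new value is computed from the row version the UPDATE actually locks,
-- so concurrent increments are not lost even at read committed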
What if UPDATE doesn't fit a given use-case, and a DELETE followed by INSERT needs to be used? – payne, Jan 26, 2022
SELECT FOR UPDATE
I think this is the perfect solution for your use case. E.g. consider a simpler dummy case of multiple incrementer threads, which I believe models the question well:
CREATE TABLE "MyInt" ( i INTEGER NOT NULL )
INSERT INTO "MyInt" VALUES (0)
and then multiple parallel updaters:
SELECT * FROM "MyInt"
// set newI = i + 1 in your server code
UPDATE "MyInt" SET i = ${newI}
As it stands, many updates would be lost. But if you do instead:
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM "MyInt" FOR UPDATE;
-- set newI = i + 1 in your code
UPDATE "MyInt" SET i = ${newI};
COMMIT;
the updates won't be lost anymore, because the FOR UPDATE locks the row against other SELECT FOR UPDATE statements until the transaction commits, and other threads just wait. https://www.postgresql.org/docs/13/explicit-locking.html#LOCKING-ROWS documents:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.
I have tested this with this test code.
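Spelled out as a two-session timeline for the MyInt example (a sketch of the interleaving, not output of the test code):
-- session A
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM "MyInt" FOR UPDATE;   -- returns i = 0 and locks the row
-- session B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM "MyInt" FOR UPDATE;   -- blocks, because session A holds the row lock
-- session A
UPDATE "MyInt" SET i = 1;
COMMIT;
-- session B: the SELECT ... FOR UPDATE now returns the committed value i = 1,
-- so its increment is based on the latest value and nothing is lost
UPDATE "MyInt" SET i = 2;
COMMIT;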
Documentation quote that says that REPEATABLE READ would also be enough
FOR UPDATE + READ COMMITTED is the best approach I think, as it does the job and is a bit faster than REPEATABLE READ on my simple benchmark (4.2s vs 3.2s). But just for completeness, the fact that REPEATABLE READ alone also works is quite clear in the docs, to further confirm what Laurenz said: https://www.postgresql.org/docs/14/transaction-iso.html#XACT-REPEATABLE-READ
Applications using this level must be prepared to retry transactions due to serialization failures.
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the transaction start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the repeatable read transaction will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the repeatable read transaction can proceed with updating the originally found row. But if the first updater commits (and actually updated or deleted the row, not just locked it) then the repeatable read transaction will be rolled back with the message
ERROR: could not serialize access due to concurrent update
because a repeatable read transaction cannot modify or lock rows changed by other transactions after the repeatable read transaction began.
When an application receives this error message, it should abort the current transaction and retry the whole transaction from the beginning. The second time through, the transaction will see the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update.
When such an error is detected, you have to run ROLLBACK and try again.
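Sketched at the SQL level for the MyInt example (in practice the application drives this retry loop):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM "MyInt";
UPDATE "MyInt" SET i = ${newI};
-- if the UPDATE fails with "could not serialize access due to concurrent update":
ROLLBACK;
-- re-run the whole transaction; its new snapshot includes the other
-- transaction's committed change, so the recomputed value starts from that
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM "MyInt";
UPDATE "MyInt" SET i = ${newI};
COMMIT;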