
Recently I learnt that the serializable isolation level in Postgres is based on optimistic locking, so in theory transactions can run fully concurrently as long as they do not interfere with each other (e.g., they don't perform read-then-write operations on the same rows). In practice, however, the algorithm for detecting such interference may produce false positives. As the docs say, row-level predicate locks can be promoted to page-level locks if that is preferable from a resource-usage point of view.

This increases the chances of getting a serialization error. For example, if I try to update two different rows concurrently, and it turns out that those rows are stored on the same page and both transactions acquired a page-level lock, the one that commits later gets a serialization error.

I tried to address this by increasing max_pred_locks_per_transaction, max_pred_locks_per_relation and max_pred_locks_per_page, but no matter how large I make those values, I still get the error.
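(For reference, the current values of those settings can be checked with something like this:)

 -- list the predicate-lock related settings and their current values
 select name, setting
 from pg_settings
 where name like 'max_pred_locks%';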

For instance, let's look at an example that simulates 1k concurrent, independent money-transfer operations.

With the following config:

 enable_seqscan = off
 max_locks_per_transaction = 4192
 max_pred_locks_per_transaction = 4192
 max_pred_locks_per_relation = 4192
 max_pred_locks_per_page = 4192

Having the following table:

 create table if not exists accounts (
 id bigserial primary key,
 balance numeric(9, 2) not null
 );

When I execute the following queries:

 session #1> begin isolation level serializable;
 session #2> begin isolation level serializable;
 session #1> select id, balance from accounts where id in (select n from generate_series(1,1000,4) n); -- select from accts with id=1, 5, 9, ..., 997
 session #1> select id, balance from accounts where id in (select n+1 from generate_series(1,1000,4) n); -- select from accts with id=2, 6, 10, ..., 998
 session #2> select id, balance from accounts where id in (select n from generate_series(3,1000,4) n); -- select from accts with id=3, 7, 11, ..., 999
 session #2> select id, balance from accounts where id in (select n+1 from generate_series(3,1000,4) n); -- select from accts with id=4, 8, 12, ..., 1000
 session #3> select locktype, count(*) from pg_locks group by locktype;
 session #1> update accounts set balance = 50 where id in (select n from generate_series(1,1000,4) n);
 session #1> update accounts set balance = 50 where id in (select n+1 from generate_series(1,1000,4) n);
 session #2> update accounts set balance = 50 where id in (select n from generate_series(3,1000,4) n);
 session #2> update accounts set balance = 50 where id in (select n+1 from generate_series(3,1000,4) n);
 session #1> commit;
 session #2> commit;

The commit of the transaction in session #2 gets rejected:

 ERROR: could not serialize access due to read/write dependencies among transactions
 DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
 HINT: The transaction might succeed if retried.

And the select on pg_locks (executed in the middle, in session #3) returns the following:

 developer=# select locktype, count(*) from pg_locks group by locktype;
    locktype    | count 
 ---------------+-------
  page          |    14
  virtualxid    |     3
  tuple         |  1750
  transactionid |     1
  relation      |     7
 (5 rows)

There are 14 page-level predicate locks, even though only 1750 tuple-level predicate locks were acquired, meaning there was still room to allocate more tuple-level locks.
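(To double-check that these entries really are predicate locks rather than regular locks, I assume one can filter on the lock mode, roughly like this:)

 -- count only the SIRead (predicate) locks, grouped by lock granularity
 select locktype, count(*)
 from pg_locks
 where mode = 'SIReadLock'
 group by locktype;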

I understand that in certain cases a tuple lock gets promoted to a page lock, and that as a database user I must be prepared to retry such transactions. Nonetheless, it increases response time, and I'm wondering whether it's possible to set up the DB so that, for instance, with 1k concurrent updates it would still use tuple-level locks instead of going for page-level locks. Do I need to adjust some other configuration to achieve that? Thanks in advance!

asked Jun 16 at 15:11

1 Answer


You forgot to add the statements that populate accounts to make your example reproducible.
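For the sake of argument, I'll assume something like the following (my assumption, since the post doesn't show it):

 -- populate the table with 1000 accounts, ids 1 through 1000
 insert into accounts (balance)
 select 100.00 from generate_series(1, 1000);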

However, I suspect that the problem may be that your SELECT and/or UPDATE statements perform a sequential scan or an index scan on accounts. The documentation says the following about sequential scans:

A sequential scan will always necessitate a relation-level predicate lock. This can result in an increased rate of serialization failures. It may be helpful to encourage the use of index scans by reducing random_page_cost and/or increasing cpu_tuple_cost.
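If that is what happens in your case, the kind of settings the documentation has in mind would look roughly like this (the values are only examples, not tuned recommendations):

 -- make the planner favor index scans over sequential scans
 set random_page_cost = 1.1;  -- default is 4.0
 set cpu_tuple_cost = 0.05;   -- default is 0.01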

But it need not be a sequential scan that is your problem. If you read a row using an index scan, the predicate lock taken will lock the index page, so that PostgreSQL can detect if another transaction inserts a row that might match the first query's predicate. Your example looks tailored to fail along these lines.

You will have to investigate with EXPLAIN to understand what exactly causes the serialization error.
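For example, for the first SELECT from your question, something like:

 explain (analyze, buffers)
 select id, balance from accounts
 where id in (select n from generate_series(1, 1000, 4) n);

If the plan shows a sequential scan, that matches the relation-level predicate lock case quoted above; an index scan points at predicate locks on index pages.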

I assume that your example is artificial. For your real workload, I can give you the following recommendations:

  • don't read or modify more rows in a single transaction than is absolutely necessary for consistency

  • you can reduce the likelihood of false positive serialization errors caused by predicate locks on an index page by creating the index with a lower fillfactor (see the sketch after this list), but that is by no means free, because the index will become less efficient
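A rough sketch of what I mean, applied to your accounts_pkey index (the value 50 is just an example, not a recommendation):

 -- leave more free space per index page, so fewer rows share a page
 alter index accounts_pkey set (fillfactor = 50);
 -- rebuild the index so existing pages pick up the new fillfactor
 reindex index accounts_pkey;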

answered Jun 17 at 6:22
  • Yeah, I forgot to mention my INSERT statements. Nonetheless, when I run the first SELECT query with EXPLAIN ANALYZE, I get the following: Index Scan using accounts_pkey on accounts (cost=0.28..54.66 rows=1000 width=9) (actual time=0.607..5.415 rows=1000 loops=1), which means an Index Scan was used. And when I look into pg_locks concurrently from a different session, I can see that 7 page predicate locks were acquired by that select, whereas I would prefer tuple locks. My example is artificial, but the same thing happens when I try to run 1k concurrent transfers between 2 accounts. Commented Jun 20 at 10:13
  • I could address this with optimistic locking based on some version column, even at the read committed isolation level in this case, but I was just wondering whether it's doable and performant enough using the serializable isolation level, which is also based on optimistic locking. Commented Jun 20 at 10:16
  • The bigger the table gets, the less likely you are to hit the same index page in both transactions, particularly if the transactions are short. Commented Jun 20 at 10:31
  • Actually, I just learnt that in this case we can apply repeatable read. It prevents lost updates if we update the same rows in concurrent transactions. Commented Jun 25 at 14:31
  • I had wondered if I should suggest that. But your mix of SELECT and UPDATE didn't look like serializability could be guaranteed with REPEATABLE READ. If that isolation level's guarantees are good enough for your use case, great. Commented Jun 25 at 15:07
