Optimizing concurrent updates in Postgres

Question 1

I'm running concurrent Postgres queries like this:

UPDATE foo SET bar = bar + 1 WHERE baz = 1234

Each query affects the fixed K number of rows, and I can't find a way to enforce the order in which the rows are updated, I end up with deadlocks. Currently I fix the problem by enforcing the order by hand, but this means I have to execute many more queries than I normally would while also raising the search complexity from O(log N + K) to O(K log N).

Is there a way to improve performance without ending up vulnerable to deadlocks? I suspect that replacing the (baz) index with the (baz, id) index might work provided that Postgres updates the rows in the same order that it have scanned them, is this an approach worth pursuing?

Question 2

There is no ORDER BY in an SQL UPDATE command. Postgres updates rows in arbitrary order. See:

UPDATE with ORDER BY

To avoid deadlocks with absolute certainty, you could run your statements in serializable transaction isolation. But that's more expensive and you need to prepare to repeat commands after a serialization failure.

Your best course of action is probably to lock explicitly with SELECT ... ORDER BY ... FOR UPDATE in a subquery, CTE, or a standalone SELECT in a transaction - in default READ COMMITTED isolation level. Quoting Tom Lane on pgsql-general:

Should be all right --- the FOR UPDATE locking is always the last step in the SELECT pipeline.

This should do the job:

BEGIN;
SELECT 1
FROM foo 
WHERE baz = 1234
ORDER BY bar
FOR UPDATE;
UPDATE foo
SET bar = bar + 1
WHERE baz = 1234;
COMMIT;

A multicolumn index on (baz, bar) might be perfect for performance. But since bar is obviously updated a lot, a single-column index on just (baz) might be better overall. Depends on a couple of factors. How many rows per baz? Are HOT updates possible without the multicolumn index? ...

If baz is updated concurrently, there is still an unlikely corner case chance for conflicts. The manual:

It is possible for a SELECT command running at the READ COMMITTED transaction isolation level and using ORDER BY and a locking clause to return rows out of order. ...

Also, if you should have a unique constraint involving bar, consider a DEFERRABLE constraint to avoid unique violations within the same command. Related answer:

Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?

Question 3

IF I'm ordering by id or some other unique column instead of bar, there shouldn't be a corner case or a performance hit, right?

Question 4

@AlexeiAverchenko: Yes, a unique column that is never updated would be perfect for this - and a multicolumn index inlcluding this column in second position.

score 16 · Accepted Answer · 2014-06-17 12:12:15Z

There is no ORDER BY in an SQL UPDATE command. Postgres updates rows in arbitrary order. See:

UPDATE with ORDER BY

To avoid deadlocks with absolute certainty, you could run your statements in serializable transaction isolation. But that's more expensive and you need to prepare to repeat commands after a serialization failure.

Your best course of action is probably to lock explicitly with SELECT ... ORDER BY ... FOR UPDATE in a subquery, CTE, or a standalone SELECT in a transaction - in default READ COMMITTED isolation level. Quoting Tom Lane on pgsql-general:

Should be all right --- the FOR UPDATE locking is always the last step in the SELECT pipeline.

This should do the job:

BEGIN;
SELECT 1
FROM foo 
WHERE baz = 1234
ORDER BY bar
FOR UPDATE;
UPDATE foo
SET bar = bar + 1
WHERE baz = 1234;
COMMIT;

A multicolumn index on (baz, bar) might be perfect for performance. But since bar is obviously updated a lot, a single-column index on just (baz) might be better overall. Depends on a couple of factors. How many rows per baz? Are HOT updates possible without the multicolumn index? ...

If baz is updated concurrently, there is still an unlikely corner case chance for conflicts. The manual:

It is possible for a SELECT command running at the READ COMMITTED transaction isolation level and using ORDER BY and a locking clause to return rows out of order. ...

Also, if you should have a unique constraint involving bar, consider a DEFERRABLE constraint to avoid unique violations within the same command. Related answer:

Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?

IF I'm ordering by id or some other unique column instead of bar, there shouldn't be a corner case or a performance hit, right?
@AlexeiAverchenko: Yes, a unique column that is never updated would be perfect for this - and a multicolumn index inlcluding this column in second position.

Stack Exchange Network

Optimizing concurrent updates in Postgres

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions