I have a unique index on a column that I use for UPSERT. When I try to update that column with the expression `v = v + 1`, the unique index breaks:
```sql
CREATE TABLE test(v bigint, data jsonb DEFAULT '{}'::jsonb);
INSERT INTO test(v) SELECT vv FROM generate_series(0, 10000) AS vv;
CREATE UNIQUE INDEX uniq_ind ON test(v);
UPDATE test SET v = v + 1;
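The last statement fails with an error along these lines (the exact key value depends on the physical row order):

```
ERROR:  duplicate key value violates unique constraint "uniq_ind"
DETAIL:  Key (v)=(1) already exists.
```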
What I've tried:
- Using a deferred constraint, but it doesn't work with UPSERT.
- Using `CLUSTER` to order rows on disk, so the update runs in a specific order and doesn't break the index. The problem is I'd have to call `CLUSTER` before each query, which is very expensive.
- Implementing UPSERT by hand, which seems complicated and non-performant (the Postgres wiki agrees with me: https://wiki.postgresql.org/wiki/UPSERT#PostgreSQL_.28today.29).
- Using a multicolumn unique index on `(v, flag)`. For this I need to add a `flag` column and replace the index:

```sql
ALTER TABLE test ADD COLUMN flag bool DEFAULT false;
CREATE UNIQUE INDEX uniq_ind ON test(v, flag);
```
Then UPDATE and UPSERT look like:

```sql
-- UPDATE
UPDATE test SET v = v + 1, flag = true;
UPDATE test SET flag = false;

-- UPSERT
INSERT INTO test(v) VALUES (123)
ON CONFLICT (v, flag) DO UPDATE SET v = EXCLUDED.v;
```
But it costs twice as much as a simple update. For now it's the most suitable solution.
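For reference, the deferred-constraint variant I tried could look like this (a sketch; the constraint name is just illustrative). It fixes the plain UPDATE, because uniqueness is then checked only at commit, but PostgreSQL does not accept a deferrable unique constraint as an `ON CONFLICT` arbiter, so it doesn't help with UPSERT:

```sql
ALTER TABLE test ADD CONSTRAINT test_v_uniq UNIQUE (v)
    DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS test_v_uniq DEFERRED;
UPDATE test SET v = v + 1;  -- uniqueness checked only at COMMIT
COMMIT;
```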
What alternatives are there for the UPDATE case or the UPSERT case, so I can:
- Efficiently UPSERT rows (this operation is dominant).
- Update many records with expressions like `v = v + 1`.
Scale is around 1k-10k rows per update, and around 1m-10m records in the table.
1 Answer
I second Laurenz' comment: typically it's best to avoid such an update on a UNIQUE column to begin with.
If that's not possible, one workaround would be to order rows in a subquery and self-join:
```sql
UPDATE test t
SET    v = t.v + 1
FROM  (SELECT * FROM test ORDER BY v DESC) t_ordered  -- additional WHERE clauses?
WHERE  t_ordered.v = t.v;
```
db<>fiddle here
This usually works. But no rows are locked in the subquery, so the command is not bullet-proof against concurrent writes. If you want that, you'll have to write-lock the whole table, or use `SERIALIZABLE` transaction isolation. Either is expensive for concurrent access.
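To illustrate the table-lock variant (a sketch; `EXCLUSIVE` mode blocks concurrent writers until commit while still allowing plain reads):

```sql
BEGIN;
LOCK TABLE test IN EXCLUSIVE MODE;  -- blocks concurrent writers until COMMIT

UPDATE test t
SET    v = t.v + 1
FROM  (SELECT v FROM test ORDER BY v DESC) t_ordered
WHERE  t_ordered.v = t.v;
COMMIT;
```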
Also, you mentioned:

> 1k-10k rows per update. And around 1m-10m records in the table.
So the UPDATE can still conflict with rows that are not updated.
To update the whole table, consider dropping the UNIQUE constraint (or index) before the update and recreating it after, in the same transaction. That takes an exclusive write-lock, of course. But that seems OK while updating the whole table. Recreating the index is cheaper than incrementally updating all rows anyway, and you get a pristine (de-bloated, reindexed) unique index as a side effect.
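A sketch of that approach, using the index from the question:

```sql
BEGIN;
DROP INDEX uniq_ind;                      -- takes an exclusive lock on test
UPDATE test SET v = v + 1;                -- no per-row uniqueness checks now
CREATE UNIQUE INDEX uniq_ind ON test(v);  -- rebuilt from scratch, no bloat
COMMIT;
```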
Conflicts with FK constraints pointing to the UNIQUE column, though ...
- I've tried to simplify the example. Actually I use `row` and `col` for storing data for Excel-like table cells, so the real structure is something like `cells(row int, col int, sheet_id uuid, data text)`, with a UNIQUE index on `(col, row, sheet_id)`. Also I lazily write cells, so I am using UPSERT for this. – Olleggerr, Sep 14, 2021 at 11:49