I'm having a hard time figuring out exactly how to implement an 'insert if not found' operation. Consider the following: we have a table called artist with two columns, (name, id), where name is unique and id is a serial primary key. It's a contrived example, but it illustrates my problem.
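In DDL terms the table looks something like this (my sketch of the described schema, not the actual definition):

    CREATE TABLE artist (
        id   serial PRIMARY KEY,
        name text NOT NULL UNIQUE   -- 'insert if not found' keys on this column
    );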
The two sessions interleave like this (Session A on the left, Session B on the right):

    SESSION A                         SESSION B

                                      1. SELECT id FROM artist
                                         WHERE name = 'Bob';

    2. INSERT INTO artist (name)
       VALUES ('Bob')

                                      3. INSERT INTO artist (name)
                                         VALUES ('Bob')

                                      4. code that uses 'Bob'
                                         (e.g., a FK to Bob's ID)

                                      5. ??? Bob already exists, but we
                                         can't find it

    4. COMMIT
Session B begins by trying to find an artist called Bob, which fails. Session A then creates Bob. When Session B tries to insert an artist called Bob, that fails because it violates the unique constraint on name. But here's the bit I don't get -- if I change operation 3 to be a SELECT on artist, the table still looks empty! This is because I'm using the SERIALIZABLE isolation level (I'm reproducing this with BEGIN ISOLATION LEVEL SERIALIZABLE in two psql shells), so how can I handle this case?

It seems the only option I have is to abort the entire transaction and try again. If that is the case, should I throw my own 'could not serialize' exception, indicating that the application should retry? I already wanted this 'find-or-insert' in a PL/pgSQL function, where I would INSERT and, if that failed, SELECT -- but it seems impossible to find the conflicting row...
2 Answers
This is a bit of a FAQ. You'd find more information if you searched for ON DUPLICATE KEY UPDATE (the MySQL syntax), MERGE (the SQL-standard syntax), or UPSERT. It's surprisingly hard.
The best article I've seen on it yet is Depesz's "why is upsert so complicated". There's also the SO question Insert, on duplicate update (postgresql) which has suggestions but lacks explanation and discussion of the issues.
The short answer is that, yes:
It seems the only option I have is to abort the entire transaction and try again.
When using SERIALIZABLE transactions you just have to re-issue them when they fail. Which they will, by design, and much more frequently on Pg 9.1 and above because of greatly improved conflict detection. Upsert-like operations are very high-conflict, so you may end up retrying quite a bit. If you can do your upserts in READ COMMITTED transactions instead it will help, but you should still be prepared to retry, because there are some unavoidable race conditions.
Let the transaction fail with a unique violation when you insert the conflicting row. If you get a SQLSTATE 23505 (unique_violation) failure from the transaction and you know you were attempting an upsert, retry it. If you get a SQLSTATE 40001 (serialization_failure), you should also retry.
You fundamentally cannot do that retry within a PL/pgSQL function (without dirty hacks like dblink); it must be done application-side. If PostgreSQL had stored procedures with autonomous transactions it would be possible, but it doesn't. In READ COMMITTED mode you can check for conflicting inserts made since the transaction started, but not ones made after the statement that calls the PL/pgSQL function started, so even in READ COMMITTED your "detect conflict with SELECT" approach simply will not work.
Read depesz's article for a much better and more detailed explanation.
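To make the READ COMMITTED option concrete, the usual shape is the loop-and-trap pattern from the exception-trapping example in the PostgreSQL manual. The sketch below is my adaptation to the artist table from the question, not part of the original answer; it assumes a VOLATILE function running under READ COMMITTED.

    CREATE OR REPLACE FUNCTION get_or_create_artist(p_name text)
    RETURNS integer
    LANGUAGE plpgsql AS
    $$
    DECLARE
        v_id integer;
    BEGIN
        LOOP
            -- Look for an existing row first.
            SELECT id INTO v_id FROM artist WHERE name = p_name;
            IF FOUND THEN
                RETURN v_id;
            END IF;

            -- Not there: try to insert it.
            BEGIN
                INSERT INTO artist (name) VALUES (p_name) RETURNING id INTO v_id;
                RETURN v_id;
            EXCEPTION WHEN unique_violation THEN
                -- Another session inserted the same name concurrently;
                -- loop back and pick it up with the SELECT.
                NULL;
            END;
        END LOOP;
    END;
    $$;

Note that this only sidesteps the problem because the unique_violation is trapped before it aborts the transaction; a serialization_failure under SERIALIZABLE still aborts the whole transaction and can only be retried by the application, as described above.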
-
Oh, I am completely aware of that article -- that's why I said that I wanted to use a PL/pgSQL function. But I think I was expecting a serialization_failure, not a unique_violation. I will look into the READ COMMITTED behaviour more, though -- that is something I wasn't aware of! Also, if I have to handle this at the application level, is there a way to silence this stuff in the logs, or will my PG log contain these (rare) unique violations? – ocharles, Oct 14, 2012 at 10:00
-
@ocharles AFAIK there's no way to filter the logs as they're written, but you can add a log_line_prefix in postgresql.conf that logs the SQLSTATE, so you can filter them out in log searches / post-processing. – Craig Ringer, Oct 16, 2012 at 2:27
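(For reference, such a prefix might look like the following in postgresql.conf; the surrounding escapes are illustrative, but %e is the escape that expands to the SQLSTATE:)

    # timestamp, backend PID, then the SQLSTATE of any logged error
    log_line_prefix = '%m [%p] %e '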
In most applications I have worked with, this is something that is possible but rarely occurs given good transaction management. I strongly advise that transactions not be held open while communicating with the user; in most cases this results in sub-second transaction times. Optimistic locking is your friend here.
Your transactions now become:
- Users A and B search for Bob and don't find him.
- Both A and B try to add Bob.
- B's add arrives first and is committed.
- A's add arrives later and is dealt with appropriately (design decision).
- Both A and B can find Bob.
There is a chance of a race condition if both A and B submit their adds at the same time, but in practice this is highly unlikely. Depending on workflow, updates are more likely to encounter this problem. In that case, the last user to submit usually gets a "data updated by another user" type of error. If they get back the updated data, they can retry the update if appropriate. In cases where the second update does not conflict, it can be silently skipped or its changes applied, as appropriate.
Long-running transactions can cause data inconsistencies.
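For illustration, the optimistic-locking update described above often looks something like this (a sketch only; the version column and the literal values are hypothetical, not from the answer):

    -- The application remembers the version it read; the UPDATE only succeeds
    -- if nobody has changed the row in the meantime.
    UPDATE artist
    SET    name = 'Robert', version = version + 1
    WHERE  id = 42
    AND    version = 7;          -- the version the application originally read
    -- "UPDATE 0" means someone else changed the row first: re-read it and
    -- decide whether to retry, skip, or report a conflict to the user.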
-
Thanks Bill, but I already do this. My transactions are limited to running for around 50ms, but I am a purist and know that if something can go wrong it ultimately will -- probably when I'm not around to provide fixes! So I don't think this is a problem with transaction size, as I'm already managing that. – ocharles, Oct 15, 2012 at 9:56
-
@ocharles Always good when corner cases are handled. I've had more problems with deadlocks on parent-child updates; it's important to always lock in the same order. – BillThor, Oct 15, 2012 at 23:41