9

In Postgres, is it inefficient to perform an "INSERT... ON CONFLICT DO NOTHING" if >99.9% of the time the record exists and nothing is done?

E.g. let’s say I have a Node process which fetches 100k records which map to rows, of which 99,980 already exist. I could:

  • Insert them all, but do nothing on uniqueness constraint violation
  • Insert where not exists query
  • Select all, do de-duplication "client" side, then insert only the 20 new ones

The first approach is easiest but I wonder if it will be too inefficient for Postgres. I have read that it increments serial numbers in such a "do nothing" case, but would it be OK if I skipped using a serial and used my unique field as the primary (string) key instead?

asked Sep 17, 2021 at 4:51
4
  • Could you perhaps show us your table structure and even some sample data? Commented Sep 17, 2021 at 9:33
  • The two key fields are "ecosystem" and "package", which are unique together. Then there would be an "id" column which would either be an auto-incrementing serial, or a concatenation of ${ecosystem}-${package}. Commented Sep 17, 2021 at 14:08
  • 1
    Thanks for that - but it's best practice to put any relevant information into the question and then use the comments to inform the person who requested it. That way, the question remains the single source of truth. Also, there's a certain amount of cogntive dissonance in having to scroll down through the comments and then back up to the question itself. The actual table DDL would be great! :-) Commented Sep 17, 2021 at 14:13
  • 1
    So basically you have to compare 100k rows that are in the database against 100k rows that are client-side? So you have 100k rows that need to go through the network no matter what method you use? Commented Sep 17, 2021 at 14:28

1 Answer 1

4

INSERT ... ON CONFLICT is as efficient as it can be, but the question is beside the point.

It may well be that an UPDATE is faster, but that UPDATE wouldn't do the right thing for 0.1% of the rows, so it is not a solution.

Note: it is always possible to be faster if you don't have to be correct.

answered Sep 19, 2021 at 4:36

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.