In Postgres, is it inefficient to perform an "INSERT... ON CONFLICT DO NOTHING" if >99.9% of the time the record exists and nothing is done?
E.g. let’s say I have a Node process which fetches 100k records which map to rows, of which 99,980 already exist. I could:
- Insert them all, but do nothing on uniqueness constraint violation
- Insert where not exists query
- Select all, do de-duplication "client" side, then insert only the 20 new ones
The first approach is easiest, but I wonder if it will be too inefficient for Postgres. I have read that Postgres still consumes serial-sequence values in such a "do nothing" case, but would it be OK if I skipped using a serial and used my unique field as the primary (string) key instead?
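For concreteness, here is a minimal sketch of the first approach. The table and column names come from the comments below; everything else (types, sample values) is assumed:

```sql
-- Hypothetical table: "ecosystem" + "package" form the natural key and
-- serve directly as the primary key, so no serial sequence is consumed
-- on conflicting inserts.
CREATE TABLE packages (
    ecosystem text NOT NULL,
    package   text NOT NULL,
    PRIMARY KEY (ecosystem, package)
);

-- Multi-row insert; rows that violate the primary key are silently skipped.
INSERT INTO packages (ecosystem, package)
VALUES
    ('npm',  'left-pad'),
    ('pypi', 'requests')
ON CONFLICT (ecosystem, package) DO NOTHING;
```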
- Could you perhaps show us your table structure and even some sample data? — Vérace, Sep 17, 2021 at 9:33
- The two key fields are "ecosystem" and "package", which are unique together. Then there would be an "id" column which would either be an auto-incrementing serial, or a concatenation of `${ecosystem}-${package}`. — rgareth, Sep 17, 2021 at 14:08
- Thanks for that - but it's best practice to put any relevant information into the question and then use the comments to inform the person who requested it. That way, the question remains the single source of truth. Also, there's a certain amount of cognitive dissonance in having to scroll down through the comments and then back up to the question itself. The actual table DDL would be great! :-) — Vérace, Sep 17, 2021 at 14:13
- So basically you have to compare 100k rows that are in the database against 100k rows that are client-side? So you have 100k rows that need to go through the network no matter what method you use? — Daniel Vérité, Sep 17, 2021 at 14:28
1 Answer
`INSERT ... ON CONFLICT` is as efficient as it can be, but the question is beside the point. It may well be that an `UPDATE` is faster, but that `UPDATE` wouldn't do the right thing for 0.1% of the rows, so it is not a solution.

Note: it is always possible to be faster if you don't have to be correct.
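In the question's setting, the whole batch can be sent in one statement and Postgres skips the duplicates itself. A sketch, assuming the `packages` table from the comments (the `unnest` arrays stand in for the client-side batch):

```sql
-- Hypothetical batch upsert: ship all rows, let Postgres drop the
-- ~99,980 that already exist.
INSERT INTO packages (ecosystem, package)
SELECT e, p
FROM unnest(ARRAY['npm', 'pypi'], ARRAY['left-pad', 'requests']) AS t(e, p)
ON CONFLICT (ecosystem, package) DO NOTHING
RETURNING ecosystem, package;  -- yields only the rows actually inserted
```

The `RETURNING` clause is a convenient side effect here: conflicting rows are excluded from it, so the client learns which of the ~20 rows were genuinely new without a separate query.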