I'm trying to insert data from one PostgreSQL 9.3 table to another while removing duplicate entries.
I have two tables containing email addresses.
The main table contains emails and a tag for each email address. The combination (email, tag)
must be unique, and for this there is a constraint Unique(email, tag)
on it.
The second table get created on the fly from text files which contain only the email address. There are many duplicates there.
I need to import the data from the temp table into the main one without breaking the constraint above. For a certain file containing email addresses the tag is constant.
Table structures:
CREATE TABLE emails (
email character varying(128),
tag bigint,
CONSTRAINT "unique-tag-email" UNIQUE (email, tag) )
and
CREATE TABLE emails_temp (email character varying(128)
Here is my query:
insert into emails(tag,email)
select
655,t.email
from
emails_temp as t
where
not exists ( select email from emails where email = t.email )
Note: 655 is just a tag of a certain group of email addresses.
This is the error I get:
ERROR: duplicate key value violates unique constraint "unique-tag-email" SQL state: 23505 Detail: Key (email, tag)=([email protected], 655) already exists.
There are indeed, two email addresses [email protected] in the file.
Useless to say, because of this error, there is nothing added in the main table (email).
What am I doing wrong?
1 Answer 1
There are 3 possible kinds of duplicates:
Duplicates within the rows of the bulk insert.
Duplicates between inserted rows and existing rows.
Duplicates between inserted rows and concurrently inserted / updated rows from other transactions.
Just like I explained in this closely related answer:
But things have become easier for 2. and 3. since Postgres 9.5 introduced UPSERT (INSERT .. ON CONFLICT DO NOTHING
).
INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM emails_temp
ON CONFLICT (email) DO NOTHING;
If your duplicates only stem from duplicate entries in the source (1.), like you indicated, then all you need is DISTINCT
. Works in any version of Postgres:
INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM emails_temp;
-
Erwin, thanks for the idea. I read about it too, but I can't upgrade to PSQL 9.5. I am stuck with 9.3 for some other reasons which I can't control.Sorin– Sorin2017年01月10日 09:21:38 +00:00Commented Jan 10, 2017 at 9:21
-
@Sorin: Seems like all you need is
DISTINCT
, really. I added a bit above.Erwin Brandstetter– Erwin Brandstetter2017年01月10日 14:09:32 +00:00Commented Jan 10, 2017 at 14:09 -
1For previous versions without
ON CONFLICT
:INSERT INTO emails (tag,email) SELECT DISTINCT 655, email FROM emails_temp EXCEPT SELECT tag, email FROM emails WHERE tag = 655 ;
ypercubeᵀᴹ– ypercubeᵀᴹ2017年01月10日 15:25:30 +00:00Commented Jan 10, 2017 at 15:25
[email protected], 655
already exists, the constraint is violated and therefore the DML won't succeedinsert ... select distinct 655, t.email ...