
We have a table in the production DB that looks like this:

create table domains
(
    domain_id   serial not null
        constraint domains_pkey primary key,
    domain_name text not null
        constraint domains_domain_name_key unique
);

For some reason, we've got duplicates in that table, even though, in theory, this is impossible.

Here's a query that I use to find these duplicates:

SELECT md5(t1.domain_name), md5(t2.domain_name)
 FROM domains t1
 JOIN domains t2
 ON t1.domain_name = t2.domain_name
 WHERE t1.domain_id != t2.domain_id;

It returns 50 records, and there are over 7M total records in the table.

I understand how to get rid of them, but I would really like to know why this happened and how to prevent new duplicates.

PostgreSQL version: PostgreSQL 11.7 (Debian 11.7-0+deb10u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

asked May 28, 2020 at 12:00
  • Did you verify that the unique constraint is really still there? Maybe someone dropped it? Can you detect how old those duplicates are? Did you get any OS upgrades recently? Especially if libc was updated, this could mean changes to the collation code, which could be a reason for this. Commented May 28, 2020 at 12:11
  • 1. Yeah, the constraint and the index are there (see the catalog check sketched below). 2. The first duplicate appeared in 2018; there are some that appeared yesterday. 3. There have been a lot of updates since 2018. Commented May 28, 2020 at 12:13
  • Regarding the libc update: I am referring to something like this: simply.name/pg-lc-collate.html Commented May 28, 2020 at 12:14
  • If glibc version x has a different understanding of how to compare strings than version x + 1, then it could happen that a row created with x + 1 doesn't detect a string created with x as being the same. Here is another reference postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html and this: wiki.postgresql.org/wiki/Locale_data_changes Commented May 28, 2020 at 12:27
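
For reference, whether the constraint and its backing index are still in place can be confirmed from the system catalogs; a minimal sketch, using the object names from the DDL above:

SELECT conname, contype, conindid::regclass AS backing_index
  FROM pg_constraint
 WHERE conrelid = 'domains'::regclass;

-- and the index definitions on the table
SELECT indexname, indexdef
  FROM pg_indexes
 WHERE tablename = 'domains';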

1 Answer


It seems that the problem is caused by the glibc update issue that @a_horse_with_no_name highlighted in the comments: https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html
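
One way to confirm that the unique index itself was affected by the collation change is the amcheck contrib extension (available since PostgreSQL 10); bt_index_check verifies that the index entries are still ordered the way the current collation expects. A minimal sketch, assuming the index name from the table definition above:

CREATE EXTENSION IF NOT EXISTS amcheck;

-- light check: verifies B-tree ordering invariants (AccessShareLock only)
SELECT bt_index_check('domains_domain_name_key'::regclass);

-- stricter check that also verifies parent/child relationships (takes a ShareLock)
SELECT bt_index_parent_check('domains_domain_name_key'::regclass);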

I suppose it would be enough to get rid of the duplicates and reindex that table; there should be no new duplicates after that.
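
A minimal cleanup sketch, assuming the row with the lowest domain_id should be kept for each duplicated domain_name (adjust the tie-breaker if a different row has to survive). The duplicates must be removed first, since REINDEX would otherwise fail on the unique violation:

-- remove the newer duplicate(s) for each domain_name
DELETE FROM domains d1
 USING domains d2
 WHERE d1.domain_name = d2.domain_name
   AND d1.domain_id > d2.domain_id;

-- rebuild the unique index under the current collation rules
REINDEX INDEX domains_domain_name_key;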

answered May 28, 2020 at 13:04
