We have a table in the production DB that looks like this:
create table domains
(
    domain_id serial not null
        constraint domains_pkey
            primary key,
    domain_name text not null
        constraint domains_domain_name_key
            unique
);
For some reason, we've got duplicates in that table, even though, in theory, this is impossible.
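With that unique constraint in place, a second insert of the same value should be rejected outright (a minimal illustration; 'example.com' is just a placeholder value):

```sql
INSERT INTO domains (domain_name) VALUES ('example.com'); -- succeeds
INSERT INTO domains (domain_name) VALUES ('example.com');
-- ERROR: duplicate key value violates unique constraint "domains_domain_name_key"
```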
Here's a query that I use to find these duplicates:
SELECT md5(t1.domain_name), md5(t2.domain_name)
FROM domains t1
JOIN domains t2
ON t1.domain_name = t2.domain_name
WHERE t1.domain_id != t2.domain_id;
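The self-join above reports every duplicated pair twice (once in each direction). An equivalent way to list each duplicated value once, sketched here as an alternative, is a GROUP BY with a HAVING filter:

```sql
-- List each domain_name that occurs more than once,
-- together with how many rows share it.
SELECT domain_name, count(*) AS copies
FROM domains
GROUP BY domain_name
HAVING count(*) > 1;
```

Note that with a corrupted unique index this query may still have to scan all 7M rows, but it avoids the duplicated output of the self-join.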
It returns 50 records, and there are over 7M total records in the table.
I understand how to get rid of them. However, I would really like to know why this happened and how to prevent new duplicates.
PostgreSQL version: PostgreSQL 11.7 (Debian 11.7-0+deb10u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
-
Did you verify that the unique constraint is really still there? Maybe someone dropped it? Can you detect how old those duplicates are? Did you get any OS upgrades recently? Especially if libc was updated, this could mean changes to the collation code, which could be a reason for this. – user1822, May 28, 2020 at 12:11
-
1. Yeah, the constraint and the index are there. 2. The first duplicate appeared in 2018; there are some that appeared yesterday. 3. There were a lot of updates since 2018. – user578468, May 28, 2020 at 12:13
-
Regarding the libc update: I am referring to something like this: simply.name/pg-lc-collate.html – user1822, May 28, 2020 at 12:14
-
If glibc version x has a different understanding of how to compare strings than version x + 1, then it could happen that a row created with x + 1 doesn't detect the string created with x as being the same. Here is another reference: postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html and this: wiki.postgresql.org/wiki/Locale_data_changes – user1822, May 28, 2020 at 12:27
1 Answer
It seems that the problem is caused by the glibc update issue that @a_horse_with_no_name highlighted in the comments: https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html
I suppose it would be enough to get rid of the duplicates and reindex the table; there should be no new duplicates after that.
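As a sketch of that cleanup, assuming the row with the smallest domain_id is the one to keep and that nothing else references the duplicate rows:

```sql
BEGIN;

-- For each duplicated domain_name, delete every row except
-- the one with the smallest domain_id.
DELETE FROM domains d
USING domains keep
WHERE keep.domain_name = d.domain_name
 AND keep.domain_id < d.domain_id;

COMMIT;

-- Rebuild the indexes under the current collation rules so the
-- unique constraint is enforced correctly again.
REINDEX TABLE domains;
```

The REINDEX step matters: after a glibc collation change the existing btree index is internally mis-sorted, so merely deleting the duplicates still leaves an index that can miss future violations.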