Locate multiple duplicate columns in PostgreSQL table

Question 1

I am trying to report on duplicate records in a single table whose unique key is app_cao_number. A record counts as a duplicate if any of the following holds:

1. The Passport field is duplicated;
2. The ID field is duplicated; or
3. The Surname + FirstName combination is duplicated.

I can do this easily enough with three passes of the table using ORDER BY. But I am hoping to use a single SELECT statement, with subqueries, to do the job.

Starting with just finding duplicate IDs I have the following statement:

SELECT app_cao_number, app_id,
 (SELECT app_id FROM people p2 
 WHERE p2.app_id IS NOT null 
 AND p2.app_id <> ''
 AND p1.app_cao_number <> p2.app_cao_number 
 AND p1.app_id = p2.app_id 
 GROUP BY p2.app_id) AS DupId
FROM people p1
WHERE app_id IS NOT null
AND app_id <> ''

This appears to get me the results that I want, but it also includes rows that have a null DupId, despite my attempts to ignore blank and null values in the SELECT statement. Once this works I should be able to expand it to include the passport and name checks.

Please can someone explain why I have the following data output with nulls in the DupId column? Thank you.

[Screenshot of the query output]

Further: I thought it might be the GROUP BY clause, so I replaced it with DISTINCT (below), but this gave the same result.

(SELECT DISTINCT p2.app_id FROM people p2 
 WHERE p2.app_id IS NOT null 
 AND p2.app_id <> ''
 AND p1.app_cao_number <> p2.app_cao_number 
 AND p1.app_id = p2.app_id 
 ) AS DupId

UPDATE

sample fiddle

Question 2

Look at this model: do you need something like this?

fiddle

create table test (id int, value1 int, value2 int)

✓

insert into test values
(1,11,21),
(2,12,22),
(3,13,23),
(4,14,24),
(5,12,24),
(6,16,26),
(7,17,24),
(8,18,28)

8 rows affected

select t1.id as id,
       t2.id as dup_id,
       case when t1.value1 = t2.value1 then 'value 1'
            when t1.value2 = t2.value2 then 'value 2'
            else 'some error'
       end as dup_field,
       case when t1.value1 = t2.value1 then t1.value1::text
            when t1.value2 = t2.value2 then t1.value2::text
            else 'some error'
       end as dup_value
from test t1, test t2
where t1.id < t2.id
  and (   t1.value1 = t2.value1
       or t1.value2 = t2.value2 )

id | dup_id | dup_field | dup_value
-: | -----: | :-------- | :--------
 2 | 5 | value 1 | 12 
 4 | 5 | value 2 | 24 
 4 | 7 | value 2 | 24 
 5 | 7 | value 2 | 24

Question 3

Thanks Akina. I see what you mean about fiddle now - I've never used/seen that before. I'll try to use it in future. I've tried your solution - but cut it down to simplify it - and it still produces a lot of columns that are empty. Here is my code:

select t1.app_id id,
       t2.app_id dup_id,
       case when t1.app_id = t2.app_id then 'value 1'
            else 'some error'
       end dup_field
from sa_appl_contacts t1, sa_appl_contacts t2
where t1.app_cao_number < t2.app_cao_number
  and (t1.app_id = t2.app_id)

Question 4

@PaulPritchard Create a fiddle with YOUR sample data and add the link to your question text, together with the desired result for that data. "Here is my code": if you only need to search for duplicates in one field, you do not need CASE at all. "it still produces a lot of columns that are empty": your code cannot give NULL in any field. When creating the fiddle, use example data that reproduces the empty-field output for your query.

Question 5

Hi @Akina, here is my fiddle: dbfiddle.uk/… This now produces 'twice' the result I'm looking for. I haven't worried about the nulls here yet, I just want to get the output right.

Question 6

On looking further at my example, I see that it is now adding a row for each duplicate. I just need a single row.
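One possible reading of "a single row" is one row per duplicated value rather than one row per pair of records. A minimal sketch of that idea, using the test table from the answer above and assuming that listing the matching ids in an array is acceptable output:

```sql
-- Sketch: one row per duplicated value, with all ids that share it.
-- Uses the test table (id, value1, value2) from the answer above.
SELECT 'value 1' AS dup_field,
       value1    AS dup_value,
       array_agg(id ORDER BY id) AS ids
FROM test
GROUP BY value1
HAVING count(*) > 1

UNION ALL

SELECT 'value 2',
       value2,
       array_agg(id ORDER BY id)
FROM test
GROUP BY value2
HAVING count(*) > 1;
```

With the sample data this should collapse the pairs (4,5), (4,7) and (5,7) into a single row for value 24, alongside one row for value 12.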

Question 7

@PaulPritchard "here is my fiddle": it seems that you want something strange. Why do you output t1.passport ppt, t2.passport dup_ppt when the WHERE clause requires t1.passport = t2.passport? They are always equal. "This now produces 'twice' the result I'm looking for": be more precise. WHAT value in WHAT field are you talking about?

Question 8

If the subquery in the SELECT list returns no result, you will get a NULL value. You seem to expect that this would exclude the row from the result, but that is not the case.

What about a simple query like

SELECT app_id, count(app_cao_number)
FROM people
GROUP BY app_id HAVING count(app_cao_number) > 1;
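If the individual records are needed rather than just the duplicated values, the test can be moved from the SELECT list into the WHERE clause, where a failed match removes the row instead of producing NULL. A minimal sketch, assuming the people table and the text column app_id from the question:

```sql
-- Sketch: keep only rows whose app_id also appears on another record.
-- Table and column names are taken from the question.
SELECT p1.app_cao_number, p1.app_id
FROM people p1
WHERE p1.app_id IS NOT NULL
  AND p1.app_id <> ''
  AND EXISTS (
        SELECT 1
        FROM people p2
        WHERE p2.app_id = p1.app_id
          AND p2.app_cao_number <> p1.app_cao_number
      )
ORDER BY p1.app_id, p1.app_cao_number;
```

Because EXISTS is evaluated as a filter, rows without a matching record never reach the output, so no NULL placeholder column is needed.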

Question 9

Hi again @Laurenz. You helped me before recently. This is such a simple, elegant answer and works perfectly for the duplicate app_id problem. Thank you. However, extending this to find duplicates in the passport column doesn't look very easy. I've searched for multiple GROUP BY clauses and found nothing. But at least doing three passes over the data using this command will be very quick.
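One way to cover all three checks in a single statement, offered as a sketch rather than a tested solution, is to replace GROUP BY with window counts, each partitioned by a different key. The column names passport, surname and first_name are assumptions here (only app_cao_number and app_id are named in the question), and all of them are assumed to be text columns:

```sql
-- Sketch only: passport, surname and first_name are assumed column names;
-- app_cao_number and app_id come from the question.
SELECT app_cao_number, app_id, passport, surname, first_name,
       id_count > 1       AS dup_id,
       passport_count > 1 AS dup_passport,
       name_count > 1     AS dup_name
FROM (
    SELECT app_cao_number, app_id, passport, surname, first_name,
           -- FILTER leaves NULL and blank values uncounted,
           -- so they are never reported as duplicates
           count(*) FILTER (WHERE app_id <> '')
               OVER (PARTITION BY app_id) AS id_count,
           count(*) FILTER (WHERE passport <> '')
               OVER (PARTITION BY passport) AS passport_count,
           count(*) FILTER (WHERE surname <> '' AND first_name <> '')
               OVER (PARTITION BY surname, first_name) AS name_count
    FROM people
) AS counted
WHERE id_count > 1
   OR passport_count > 1
   OR name_count > 1
ORDER BY app_cao_number;
```

Each record appears at most once, with three boolean flags showing which rule it matched. The FILTER clause on window aggregates requires PostgreSQL 9.4 or later.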

Akina · Accepted Answer · 2019-12-06 07:47:58Z


Comments on Akina's accepted answer, in order:

Paul Pritchard, 2019-12-06 08:13:39 +00:00
Akina, 2019-12-06 08:43:11 +00:00
Paul Pritchard, 2019-12-06 09:11:43 +00:00
Paul Pritchard, 2019-12-06 09:18:17 +00:00
Akina, 2019-12-06 09:28:57 +00:00
