I have a table with many duplicate records . My table looks like this:
ID |First_Name | Last_Name |City
1 | Alan | Smith |Los Angeles
2 | Alan | Smith |
3 | Alan | Smith |New York
4 | Alan | Smith |
5 | Alan | Smith |Houston
I can find duplicate records with this query:
SELECT First_Name, Last_Name, COUNT(*)
FROM tab_Name
GROUP BY First_Name, Last_Name
HAVING COUNT(*) > 1
I'd like to remove duplicate records only under following 2 conditions:
- The First_Name and Last_Name are the same
- the city column is empty(null).
What's the correct way to achieve this? Please let me know if more explanation is needed. Thank you!
1 Answer 1
I think this is what you want...
I assume your table is called t
:
CREATE TABLE t
(
`ID` int PRIMARY KEY,
`First_Name` varchar(50),
`Last_Name` varchar(50),
`City` varchar(50)
) ;
INSERT INTO t
(`ID`, `First_Name`, `Last_Name`, `City`)
VALUES
(1, 'Alan', 'Smith', 'Los Angeles'),
(2, 'Alan', 'Smith', NULL),
(3, 'Alan', 'Smith', 'New York'),
(4, 'Alan', 'Smith', NULL),
(5, 'Alan', 'Smith', 'Houston')
;
This query DELETE
s the rows you don't want...
DELETE
FROM
t
WHERE
t.id IN
(
SELECT
*
FROM
(
SELECT
id
FROM
t AS t1
WHERE
City IS NULL
AND
EXISTS
(
SELECT 1
FROM t AS t2
WHERE t2.`First_Name` = t1.`First_Name`
AND t2.`Last_Name` = t1.`Last_Name`
AND t2.`City` IS NOT NULL
AND t2.`ID` <> t1.`ID`
)
) AS s0
) ;
SELECT * FROM t
ID | First_Name | Last_Name | City
-: | :--------- | :-------- | :---------- 1 | Alan | Smith | Los Angeles 3 | Alan | Smith | New York 5 | Alan | Smith | Houston
dbfiddle here
--
NOTEs:
The need of a subquery within of a subquery is a limitation of MySQL (most SQL databases don't need it). See You can't specify target table 'NAME' for update in FROM clause
I've assumed that you wouldn't want to delete any row if there is not one with a City. I.e.: if you have
(15, 'John', 'Doe', NULL)
and(16, 'John', 'Doe', NULL)
you leave both on the table.
(15, 'John', 'Doe', null)
and(16, 'John', 'Doe', null)
? Would you delete both of them? None? One? Which one?