
I have a table with many duplicate records. My table looks like this:

ID | First_Name | Last_Name | City
1  | Alan       | Smith     | Los Angeles
2  | Alan       | Smith     |
3  | Alan       | Smith     | New York
4  | Alan       | Smith     |
5  | Alan       | Smith     | Houston

I can find duplicate records with this query:

SELECT First_Name, Last_Name, COUNT(*) 
FROM tab_Name 
GROUP BY First_Name, Last_Name 
HAVING COUNT(*) > 1 
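To see exactly what that query reports, here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL. This is an assumption for illustration only: the table name tab_Name comes from the query above, and the empty City cells are stored as NULL.

```python
import sqlite3

# Hypothetical in-memory SQLite copy of the sample data above;
# the empty City cells are stored as NULL (None in Python).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab_Name (ID INTEGER PRIMARY KEY, First_Name TEXT, "
    "Last_Name TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO tab_Name VALUES (?, ?, ?, ?)",
    [
        (1, "Alan", "Smith", "Los Angeles"),
        (2, "Alan", "Smith", None),
        (3, "Alan", "Smith", "New York"),
        (4, "Alan", "Smith", None),
        (5, "Alan", "Smith", "Houston"),
    ],
)

# The duplicate-finding query: one result row per (First_Name, Last_Name)
# pair that occurs more than once, regardless of City.
dupes = conn.execute(
    """
    SELECT First_Name, Last_Name, COUNT(*)
    FROM tab_Name
    GROUP BY First_Name, Last_Name
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dupes)  # [('Alan', 'Smith', 5)]
```

City is not part of the GROUP BY, so all five Alan Smith rows are counted, including the three that have a City. The query identifies duplicated names rather than the specific rows to delete. The City condition therefore has to be applied separately.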

I'd like to remove duplicate records only under the following two conditions:

  • The First_Name and Last_Name are the same
  • The City column is empty (NULL).

What's the correct way to achieve this? Please let me know if more explanation is needed. Thank you!

asked May 30, 2017 at 22:49
  • What would you do if you have, for instance, (15, 'John', 'Doe', null) and (16, 'John', 'Doe', null)? Would you delete both of them? None? One? Which one? Commented May 30, 2017 at 22:58
  • Keep only one: the record with the lowest ID. In your example, (15, 'John', 'Doe', null) would be the only one that remains in the table. All other duplicate records should be deleted. Thanks! Commented May 30, 2017 at 23:06

1 Answer


I think this is what you want...

I assume your table is called t:

 CREATE TABLE t
 (
 `ID` int PRIMARY KEY, 
 `First_Name` varchar(50), 
 `Last_Name` varchar(50), 
 `City` varchar(50)
 ) ;
 INSERT INTO t
 (`ID`, `First_Name`, `Last_Name`, `City`)
 VALUES
 (1, 'Alan', 'Smith', 'Los Angeles'),
 (2, 'Alan', 'Smith', NULL),
 (3, 'Alan', 'Smith', 'New York'),
 (4, 'Alan', 'Smith', NULL),
 (5, 'Alan', 'Smith', 'Houston')
 ;

This query DELETEs the rows you don't want: every row whose City is NULL when another row with the same First_Name and Last_Name has a City.

 DELETE FROM t
 WHERE t.id IN
 (
     -- Extra derived table (s0) required by MySQL; see note 1
     SELECT *
     FROM
     (
         -- Rows with no City that have a same-name row with a City
         SELECT id
         FROM t AS t1
         WHERE City IS NULL
           AND EXISTS
           (
               SELECT 1
               FROM t AS t2
               WHERE t2.`First_Name` = t1.`First_Name`
                 AND t2.`Last_Name` = t1.`Last_Name`
                 AND t2.`City` IS NOT NULL
                 AND t2.`ID` <> t1.`ID`
           )
     ) AS s0
 ) ;
 SELECT * FROM t ;

 ID | First_Name | Last_Name | City
 -: | :--------- | :-------- | :----------
  1 | Alan       | Smith     | Los Angeles
  3 | Alan       | Smith     | New York
  5 | Alan       | Smith     | Houston

dbfiddle here
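The same logic can also be exercised outside MySQL. The sketch below ports it to SQLite through Python's sqlite3 module. This is an illustrative assumption, not the answer's tested MySQL setup, and the helper name remaining_after_delete is hypothetical. SQLite accepts the subquery on t directly, so the extra derived table that MySQL requires is left out:

```python
import sqlite3

def remaining_after_delete(rows):
    """Hypothetical helper: load rows into an in-memory SQLite table t,
    apply the answer's DELETE condition, and return the rows that are left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (ID INTEGER PRIMARY KEY, First_Name TEXT, "
        "Last_Name TEXT, City TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    # Delete a NULL-City row only if another row with the same name has a City.
    # SQLite has no MySQL-style restriction here, so no derived-table wrapper.
    conn.execute(
        """
        DELETE FROM t
        WHERE ID IN (
            SELECT ID
            FROM t AS t1
            WHERE City IS NULL
              AND EXISTS (
                  SELECT 1
                  FROM t AS t2
                  WHERE t2.First_Name = t1.First_Name
                    AND t2.Last_Name = t1.Last_Name
                    AND t2.City IS NOT NULL
                    AND t2.ID <> t1.ID
              )
        )
        """
    )
    return conn.execute("SELECT * FROM t ORDER BY ID").fetchall()

sample = [
    (1, "Alan", "Smith", "Los Angeles"),
    (2, "Alan", "Smith", None),
    (3, "Alan", "Smith", "New York"),
    (4, "Alan", "Smith", None),
    (5, "Alan", "Smith", "Houston"),
]
print(remaining_after_delete(sample))
# [(1, 'Alan', 'Smith', 'Los Angeles'), (3, 'Alan', 'Smith', 'New York'), (5, 'Alan', 'Smith', 'Houston')]
```

Rows 2 and 4 are removed because each has a NULL City and another Alan Smith row with a City exists. Rows that have a City are never touched by this condition.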

--

Notes:

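  1. The subquery within a subquery is needed because of a MySQL limitation (most other SQL databases don't need it): MySQL won't let a DELETE refer to its own target table in a plain subquery, and wrapping that subquery in a derived table (AS s0) is the usual workaround. See "You can't specify target table 'NAME' for update in FROM clause".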

  2. I've assumed that you don't want to delete any row when no row with the same name has a City. That is, if you have (15, 'John', 'Doe', NULL) and (16, 'John', 'Doe', NULL), both stay in the table.
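Note 2 differs from the rule the asker gave in the comments, which is to keep only the lowest ID among duplicates. If that rule is wanted as well, one possible extension is to also delete a NULL-City row when a same-name NULL-City row with a lower ID exists. The sketch below shows this in SQLite through Python's sqlite3 module rather than MySQL, and the helper name dedupe_keep_lowest_null is hypothetical. In MySQL, the correlated reference to the target table would presumably need the same derived-table workaround described in note 1.

```python
import sqlite3

def dedupe_keep_lowest_null(rows):
    """Hypothetical helper: load rows into an in-memory SQLite table t and
    delete NULL-City rows that are redundant under the 'keep lowest ID' rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (ID INTEGER PRIMARY KEY, First_Name TEXT, "
        "Last_Name TEXT, City TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    # Delete a NULL-City row if another row with the same name either has a
    # City, or also has a NULL City but a lower ID. The lowest-ID row of an
    # all-NULL group has no such partner, so it survives.
    conn.execute(
        """
        DELETE FROM t
        WHERE City IS NULL
          AND EXISTS (
              SELECT 1
              FROM t AS t2
              WHERE t2.First_Name = t.First_Name
                AND t2.Last_Name = t.Last_Name
                AND (t2.City IS NOT NULL OR t2.ID < t.ID)
          )
        """
    )
    return conn.execute("SELECT * FROM t ORDER BY ID").fetchall()

rows = [
    (1, "Alan", "Smith", "Los Angeles"),
    (2, "Alan", "Smith", None),
    (3, "Alan", "Smith", "New York"),
    (4, "Alan", "Smith", None),
    (5, "Alan", "Smith", "Houston"),
    (15, "John", "Doe", None),
    (16, "John", "Doe", None),
]
for row in dedupe_keep_lowest_null(rows):
    print(row)
# (1, 'Alan', 'Smith', 'Los Angeles')
# (3, 'Alan', 'Smith', 'New York')
# (5, 'Alan', 'Smith', 'Houston')
# (15, 'John', 'Doe', None)
```

For names that have at least one City, this behaves exactly like the query above. It only changes the all-NULL case, where row 15 is kept and row 16 is deleted.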

answered May 30, 2017 at 23:17