I'm having trouble writing a query for the table below.
CREATE TABLE `c_example` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`c_id` int(11) DEFAULT NULL,
`guid` varchar(32) DEFAULT NULL,
`provider_id` int(11) DEFAULT NULL,
`test_timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
) ENGINE=MyISAM AUTO_INCREMENT=5731 DEFAULT CHARSET=utf8;
In this table the data will look like
1,1,313f516ad35e477482aa435ae53c6b3a,1,2016年05月09日 22:53:14
2,1,313f516ad35e477482aa435ae53c6b3a,2,2016年05月09日 22:53:14
3,1,313f516ad35e477482aa435ae53c6b3a,3,2016年05月09日 22:53:14
4,1,f76d2c14a86a42378badd04e4051e569,1,2016年05月10日 22:53:14
5,1,f76d2c14a86a42378badd04e4051e569,2,2016年05月10日 22:53:14
6,1,f76d2c14a86a42378badd04e4051e569,3,2016年05月10日 22:53:14
7,2,c040cecfbff648b2a3ecbd9d0688efec,1,2016年05月11日 22:53:14
8,2,c040cecfbff648b2a3ecbd9d0688efec,2,2016年05月11日 22:53:14
9,2,c040cecfbff648b2a3ecbd9d0688efec,3,2016年05月11日 22:53:14
10,2,79c87ba31df14ec89f0da39e98b6a088,1,2016年05月12日 22:53:14
11,2,79c87ba31df14ec89f0da39e98b6a088,2,2016年05月12日 22:53:14
12,2,79c87ba31df14ec89f0da39e98b6a088,3,2016年05月12日 22:53:14
Now, I want to delete all the rows where the unique combination of c_id, guid that has a count> 1 and delete the ones with the oldest timestamp, so I'm left with.
4,1,f76d2c14a86a42378badd04e4051e569,1,2016年05月10日 22:53:14
5,1,f76d2c14a86a42378badd04e4051e569,2,2016年05月10日 22:53:14
6,1,f76d2c14a86a42378badd04e4051e569,3,2016年05月10日 22:53:14
10,2,79c87ba31df14ec89f0da39e98b6a088,1,2016年05月12日 22:53:14
11,2,79c87ba31df14ec89f0da39e98b6a088,2,2016年05月12日 22:53:14
12,2,79c87ba31df14ec89f0da39e98b6a088,3,2016年05月12日 22:53:14
I'm having trouble coming up with the delete statement.
1 Answer 1
To solve this problem I did the following.
Created the table as per the question above. I then populated it with this data (a bit extra added for testing purposes).
INSERT INTO c_example
VALUES
(1, 1, '313f516ad35e477482aa435ae53c6b3a', 1, '2016-05-09 21:53:14'),
(2, 1, '313f516ad35e477482aa435ae53c6b3a', 2, '2016-05-09 21:53:14'),
(3, 1, '313f516ad35e477482aa435ae53c6b3a', 3, '2016-05-09 21:53:14'),
(4, 1, 'f76d2c14a86a42378badd04e4051e569', 1, '2016-05-10 21:53:14'),
(5, 1, 'f76d2c14a86a42378badd04e4051e569', 2, '2016-05-10 21:53:14'),
(6, 1, 'f76d2c14a86a42378badd04e4051e569', 3, '2016-05-10 21:53:14'),
(7, 2, 'c040cecfbff648b2a3ecbd9d0688efec', 1, '2016-05-11 21:53:14'),
(8, 2, 'c040cecfbff648b2a3ecbd9d0688efec', 2, '2016-05-11 21:53:14'),
(9, 2, 'c040cecfbff648b2a3ecbd9d0688efec', 3, '2016-05-11 21:53:14'),
(10, 2, '79c87ba31df14ec89f0da39e98b6a088', 1, '2016-05-12 21:53:14'),
(11, 2, '79c87ba31df14ec89f0da39e98b6a088', 2, '2016-05-12 21:53:14'),
(12, 2, '79c87ba31df14ec89f0da39e98b6a088', 3, '2016-05-12 21:53:14'),
(13, 3, '79c87ba31df14ec89f0da39e98b6a088', 3, '2016-05-12 21:53:14'),
(14, 4, '79c87ba31df14ec89f0da39e98b6a088', 3, '2016-05-12 21:53:14');
First, we need to obtain the c_id
s and test_timestamp
s of the records we are going to keep:
SELECT c_id, MAX(test_timestamp) AS mydate1
FROM c_example t1
GROUP BY t1.c_id
ORDER BY c_id, MAX(test_timestamp)
Result:
+------+---------------------+
| c_id | mydate1 |
+------+---------------------+
| 1 | 2016年05月10日 21:53:14 |
| 2 | 2016年05月12日 21:53:14 |
| 3 | 2016年05月12日 21:53:14 |
| 4 | 2016年05月12日 21:53:14 |
+------+---------------------+
Now, we use these to find the id
s corresponding to records with with these c_id
s and test_timestamp
s.
SELECT t1.id, t1.c_id, t1.guid, t1.test_timestamp FROM c_example t1
JOIN
(
SELECT c_id, MAX(test_timestamp) AS mydate1
FROM c_example t1
GROUP BY t1.c_id
ORDER BY c_id, MAX(test_timestamp)
) t2 ON t1.c_id = t2.c_id AND t1.test_timestamp = t2.mydate1
Result:
+----+------+----------------------------------+---------------------+
| id | c_id | guid | test_timestamp |
+----+------+----------------------------------+---------------------+
| 4 | 1 | f76d2c14a86a42378badd04e4051e569 | 2016年05月10日 21:53:14 |
| 5 | 1 | f76d2c14a86a42378badd04e4051e569 | 2016年05月10日 21:53:14 |
| 6 | 1 | f76d2c14a86a42378badd04e4051e569 | 2016年05月10日 21:53:14 |
| 10 | 2 | 79c87ba31df14ec89f0da39e98b6a088 | 2016年05月12日 21:53:14 |
| 11 | 2 | 79c87ba31df14ec89f0da39e98b6a088 | 2016年05月12日 21:53:14 |
| 12 | 2 | 79c87ba31df14ec89f0da39e98b6a088 | 2016年05月12日 21:53:14 |
| 13 | 3 | 79c87ba31df14ec89f0da39e98b6a088 | 2016年05月12日 21:53:14 |
| 14 | 4 | 79c87ba31df14ec89f0da39e98b6a088 | 2016年05月12日 21:53:14 |
+----+------+----------------------------------+---------------------+
These are the records we want to keep. You could just SELECT t1.id
for only the id
- I just selected the extra fields for clarity.
So, you might think that this query would work (it does in PostgreSQL)
DELETE FROM c_example
WHERE c_example.id NOT IN
(
SELECT t1.id FROM c_example t1
JOIN
(
SELECT c_id, MAX(test_timestamp) AS mydate1
FROM c_example t1
GROUP BY t1.c_id
ORDER BY c_id, MAX(test_timestamp)
) t2 ON t1.c_id = t2.c_id AND t1.test_timestamp = t2.mydate1
);
But, due to the delightful (ahem...) quirks of MySQL's brand of SQL, instead of working, one is presented with this error message:
ERROR 1093 (HY000): You can't specify target table 'c_example' for update in FROM clause
As it was put so aptly here "Sometimes I wonder what drugs the MySQL devs are on..." (found while searching for a solution - also to be found in that thread).
So, you have to add a further level of nesting to the query and finally, a query which works:
DELETE FROM c_example
WHERE c_example.id NOT IN
(
SELECT t3.id FROM
(
SELECT t1.id, t1.c_id, t1.guid, t1.test_timestamp FROM c_example t1
JOIN
(
SELECT c_id, MAX(test_timestamp) AS mydate1
FROM c_example t1
GROUP BY t1.c_id
ORDER BY c_id, MAX(test_timestamp)
) t2 ON t1.c_id = t2.c_id AND t1.test_timestamp = t2.mydate1
)AS t3
);
--
Result:
Query OK, 6 rows affected (0.66 sec)
And run the SELECT
to verify.
mysql> SELECT * FROM c_example;
+----+------+----------------------------------+-------------+---------------------+
| id | c_id | guid | provider_id | test_timestamp |
+----+------+----------------------------------+-------------+---------------------+
| 4 | 1 | f76d2c14a86a42378badd04e4051e569 | 1 | 2016年05月10日 21:53:14 |
| 5 | 1 | f76d2c14a86a42378badd04e4051e569 | 2 | 2016年05月10日 21:53:14 |
| 6 | 1 | f76d2c14a86a42378badd04e4051e569 | 3 | 2016年05月10日 21:53:14 |
| 10 | 2 | 79c87ba31df14ec89f0da39e98b6a088 | 1 | 2016年05月12日 21:53:14 |
| 11 | 2 | 79c87ba31df14ec89f0da39e98b6a088 | 2 | 2016年05月12日 21:53:14 |
| 12 | 2 | 79c87ba31df14ec89f0da39e98b6a088 | 3 | 2016年05月12日 21:53:14 |
| 13 | 3 | 79c87ba31df14ec89f0da39e98b6a088 | 3 | 2016年05月12日 21:53:14 |
| 14 | 4 | 79c87ba31df14ec89f0da39e98b6a088 | 3 | 2016年05月12日 21:53:14 |
+----+------+----------------------------------+-------------+---------------------+
8 rows in set (0.00 sec)
Which should be the right answer - if this doesn't solve your issue, please clarify the question.
INSERT
statements?(c_id, provider_id)
, not unique combination of(c_id, guid)
.(c_id, guid)
- I managed to do it in 3 separate statements, looping through the results of the previous statements. In keep thinking I should be able to do it in one statement.