Can someone help me with a MySQL query to delete all rows beyond the first n entries, ordered by date?
I.e. say I have 1200 rows of data with a timestamp column. I need to order them by date and keep only the first 200 rows.
If I have only 199 rows of data, then I need to preserve them all.
2 Answers
This will keep the first 200 rows (and possibly a few more, with identical timestamps):
DELETE t
FROM
    tableX AS t
  JOIN
    ( SELECT timestampColumn AS ts
      FROM tableX
      ORDER BY ts ASC
      LIMIT 1 OFFSET 199
    ) tlimit
    ON t.timestampColumn > tlimit.ts
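To see how the cutoff behaves at the edges (fewer than 200 rows, tied timestamps), here is a minimal sketch that can be run locally. It is an assumption-laden stand-in, not the MySQL statement itself: it uses Python's built-in sqlite3 module with an in-memory database, integer timestamps, and a scalar subquery in place of the MySQL-only DELETE ... JOIN form. The helper names make_table, keep_oldest and count_rows are made up for illustration.

```python
import sqlite3


def make_table(timestamps):
    """Create an in-memory tableX holding the given integer timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (id INTEGER PRIMARY KEY, timestampColumn INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tableX (timestampColumn) VALUES (?)",
        [(t,) for t in timestamps],
    )
    return conn


def keep_oldest(conn, n):
    """Delete rows whose timestamp is later than the n-th oldest one.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM tableX
        WHERE timestampColumn > (
            SELECT timestampColumn
            FROM tableX
            ORDER BY timestampColumn ASC
            LIMIT 1 OFFSET ?
        )
        """,
        (n - 1,),  # OFFSET n-1 picks the n-th oldest row as the cutoff
    )
    conn.commit()
    return cur.rowcount


def count_rows(conn):
    """Return the number of rows currently in tableX."""
    return conn.execute("SELECT COUNT(*) FROM tableX").fetchone()[0]
```

With 1200 distinct timestamps, keep_oldest(conn, 200) leaves 200 rows. With only 199 rows the subquery finds no cutoff row, the comparison is never true, and nothing is deleted. Rows that share the cutoff timestamp are all kept, which is the "possibly a few more" case mentioned above.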
Additional question: Keep the first 200 rows for every user (by the uid column):
DELETE tt
FROM
    ( SELECT DISTINCT uid      -- these 3 lines can be replaced
      FROM tableX              -- with: UserTable AS du
    ) AS du                    -- a table that you probably have
  JOIN
    tableX AS tt
    ON  tt.uid = du.uid
    AND tt.timestampColumn >
        ( SELECT timestampColumn AS ts
          FROM tableX
          WHERE uid = du.uid
          ORDER BY ts ASC
          LIMIT 1 OFFSET 199
        )
An index on (uid, timestampColumn) will be useful with a big table.
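The per-user variant can be tried out the same way. This is only a sketch under assumptions: SQLite via Python's sqlite3 module instead of MySQL, integer timestamps, a correlated scalar subquery instead of the multi-table DELETE, and made-up names (make_table, keep_oldest_per_user, idx_uid_ts). It also creates the (uid, timestampColumn) index so that each per-user cutoff lookup can use it.

```python
import sqlite3


def make_table(rows):
    """Create an in-memory tableX from (uid, timestamp) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX ("
        "id INTEGER PRIMARY KEY, uid INTEGER, timestampColumn INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tableX (uid, timestampColumn) VALUES (?, ?)", rows
    )
    return conn


def keep_oldest_per_user(conn, n):
    """For every uid, delete rows later than that user's n-th oldest timestamp.

    Returns the number of rows deleted.
    """
    # Composite index so each per-user cutoff lookup avoids a full scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_uid_ts "
        "ON tableX (uid, timestampColumn)"
    )
    cur = conn.execute(
        """
        DELETE FROM tableX
        WHERE timestampColumn > (
            SELECT inner_t.timestampColumn
            FROM tableX AS inner_t
            WHERE inner_t.uid = tableX.uid
            ORDER BY inner_t.timestampColumn ASC
            LIMIT 1 OFFSET ?
        )
        """,
        (n - 1,),  # the n-th oldest row of each user is that user's cutoff
    )
    conn.commit()
    return cur.rowcount
```

Users with fewer than n rows have no cutoff row, so their rows are left alone, just as in the single-table case.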
- This looks awesome. What if I don't have 200 entries? – Quintin Par, Jan 22, 2012 at 12:36
- Then the subquery will return 0 rows, the join will be empty, and thus no rows at all will be deleted. Note that this will keep the 200 oldest rows. If you want to keep the 200 newest, you'll have to change ASC to DESC and > to <. – ypercubeᵀᴹ, Jan 22, 2012 at 19:53
- Which one do you think will be faster? Yours or @Ladadadada's? – Quintin Par, Jan 23, 2012 at 4:10
- Perhaps you can test with your data and tell us? There is not going to be much difference, I think. – ypercubeᵀᴹ, Jan 23, 2012 at 12:37
- If I had to do this for all users, how should I change the query? I.e. each row has a uid column and I need to delete entries beyond the first 200 for each user. – Quintin Par, Jan 23, 2012 at 16:09
If you have an AUTO_INCREMENT primary key, and we can assume that you don't update the timestamp, then this should work and be reasonably fast:
DELETE FROM `table` WHERE id <
    ( SELECT MIN(id) FROM
        ( SELECT id FROM `table` ORDER BY date DESC LIMIT 200 )
    AS newest ) ;
Given the assumptions above, ordering by id rather than date would give exactly the same results. I have also assumed you want to delete anything older than the newest 200, so the ORDER BY is DESC.
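As a rough check of the id-based idea (keep the newest 200, delete everything with a smaller id), here is a small sketch. It assumes SQLite through Python's sqlite3 module rather than MySQL, a hypothetical table called items with an integer date column, and ids that grow in the same order as the dates. The function names are invented for the example.

```python
import sqlite3


def make_items(num_rows):
    """Create an in-memory items table whose ids and dates increase together."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, date INTEGER)")
    conn.executemany(
        "INSERT INTO items (date) VALUES (?)",
        [(d,) for d in range(1, num_rows + 1)],
    )
    return conn


def keep_newest_by_id(conn, n):
    """Delete every row whose id is below the smallest id of the newest n rows.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM items
        WHERE id < (
            SELECT MIN(id) FROM (
                SELECT id FROM items ORDER BY date DESC LIMIT ?
            )
        )
        """,
        (n,),
    )
    conn.commit()
    return cur.rowcount
```

If the table holds n rows or fewer, the smallest id among the "newest n" is the smallest id overall, so no row qualifies for deletion.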
- If I had to do this for all users how should I change the query? I.e. each row has a uid column and I need to delete entries greater than 200 for each user. – Quintin Par, Jan 23, 2012 at 8:28
- I don't think I can figure out a way to do that in a single SQL query (although it probably is possible). I would just do a GROUP BY uid query to find which users have more than 200 notifications and then do a loop in whatever language you are using to delete all but the most recent 200 for those users. – Ladadadada, Jan 23, 2012 at 10:35
- The problem with the programming language approach is that I will potentially end up iterating millions of users. – Quintin Par, Jan 23, 2012 at 12:10
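The GROUP BY plus loop approach described in the comments could look roughly like the sketch below. Everything here is an assumption for illustration: SQLite via Python's sqlite3 module, a hypothetical notifications table with id, uid and an integer date column, and the function names. Note that the loop only visits users who actually exceed the limit, not every user. MySQL does not accept LIMIT directly inside an IN subquery, so there the inner SELECT would need the extra derived-table wrapping used in the answer.

```python
import sqlite3


def make_notifications(rows):
    """Create an in-memory notifications table from (uid, date) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications ("
        "id INTEGER PRIMARY KEY, uid INTEGER, date INTEGER)"
    )
    conn.executemany(
        "INSERT INTO notifications (uid, date) VALUES (?, ?)", rows
    )
    return conn


def trim_per_user(conn, n):
    """Keep only the n most recent notifications of each user.

    Returns the total number of rows deleted.
    """
    # Step 1: find the users that have more than n rows.
    heavy_users = [
        uid
        for (uid,) in conn.execute(
            "SELECT uid FROM notifications GROUP BY uid HAVING COUNT(*) > ?",
            (n,),
        )
    ]
    # Step 2: for each such user, delete all but the n most recent rows.
    deleted = 0
    for uid in heavy_users:
        cur = conn.execute(
            """
            DELETE FROM notifications
            WHERE uid = ?
              AND id NOT IN (
                  SELECT id FROM notifications
                  WHERE uid = ?
                  ORDER BY date DESC, id DESC
                  LIMIT ?
              )
            """,
            (uid, uid, n),
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted
```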