I'm trying to delete a bunch of rows from a table matching a query. The general form of my query is:
DELETE FROM mytable WHERE _id IN (SELECT _id FROM mytable WHERE ...);
The _id column is a SERIAL PRIMARY KEY.
If I run the inner SELECT query on its own it runs in about 1 second and returns about 100,000 rows. But when I add the DELETE call onto it, it seems to just sit and chug away. And chug away. I let it run for about a minute or so, and then cancelled it assuming no progress was being made.
I added EXPLAIN ANALYZE onto the front of it to see if I could check what was taking so long, but that call also hangs for a very long period of time.
Can anybody tell me what could be going on such that it's very quick to identify the rows I'd like to delete but takes an indeterminate amount of time to delete them?
Update: The full query is as follows.
DELETE FROM cards
WHERE _id IN (
SELECT _id
FROM cards
LEFT JOIN game_results ON game_results.card_id = cards._id
WHERE NOT available
AND game_id IS NULL
)
A simplified version of the tables would be:
CREATE TABLE cards (_id INTEGER PRIMARY KEY, available BOOLEAN);
CREATE TABLE games (_id INTEGER PRIMARY KEY);
CREATE TABLE game_results (game_id INTEGER REFERENCES games, card_id INTEGER REFERENCES cards);
I'm sure this query would eventually finish running, I'm just surprised at how long it's taking given that retrieving all of the IDs to delete is so quick.
Solution:
The problem was three-fold!
1. I had one of the earlier (very slow!) queries running that had been triggered by an aborted script. The query was still running even though the script had been aborted. As a result, operations like trying to drop indexes were blocked waiting for that query to finish. Manually cancelling it using pg_cancel_backend allowed me to experiment with dropping indexes and foreign key constraints.
2. Foreign key constraints seem to be the issue slowing everything down. The fact that the game_results table referenced cards made the delete take forever. The funny thing is, I was only deleting cards explicitly unused in game_results, but of course that didn't stop the foreign key checks from happening on each delete! That made things very slow. I dropped the foreign key constraint before running the delete and that sped things up to a level I felt was appropriate.
3. For whatever reason, the second delete query in the accepted answer ran much quicker than the first.
By combining these three things I was able to do the whole delete in a few seconds, whereas before I combined these three factors my queries were running 5-10 minutes (before I cancelled them!).
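The steps described in the solution can be sketched as follows. The session PID and constraint name below are hypothetical placeholders, not values from the original post; and note that in many cases an alternative to dropping the constraint is simply indexing the referencing column, since PostgreSQL does not automatically create an index on the referencing side of a foreign key.

```sql
-- 1. Find and cancel the stuck query (the PID 12345 is hypothetical):
SELECT pid, state, query_start, query
FROM pg_stat_activity
WHERE state <> 'idle';

SELECT pg_cancel_backend(12345);

-- 2. Drop the foreign key constraint before the big delete
--    (constraint name is hypothetical; check \d game_results in psql):
ALTER TABLE game_results DROP CONSTRAINT game_results_card_id_fkey;

-- ... run the DELETE ...

-- 3. Re-add the constraint afterwards:
ALTER TABLE game_results
  ADD CONSTRAINT game_results_card_id_fkey
  FOREIGN KEY (card_id) REFERENCES cards (_id);
```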
Thanks to all for the help!
3 Answers
Assuming that the columns are not nullable, the query can be simplified (without the self join) to use either NOT IN or NOT EXISTS:
DELETE FROM cards
WHERE available = FALSE
AND _id NOT IN
( SELECT card_id
FROM game_results
) ;
DELETE FROM cards AS c
WHERE c.available = FALSE
AND NOT EXISTS
( SELECT *
FROM game_results AS gr
WHERE gr.card_id = c._id
) ;
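To illustrate why the "not nullable" assumption matters: if game_results.card_id could contain a NULL, the NOT IN form would silently delete nothing, because `x NOT IN (..., NULL)` never evaluates to true under SQL's three-valued logic. The NOT EXISTS form has no such trap. A sketch (the inserted row is hypothetical):

```sql
-- Suppose game_results contained a row with a NULL card_id:
--   INSERT INTO game_results (game_id, card_id) VALUES (1, NULL);
-- Then for every candidate row,
--   _id NOT IN (SELECT card_id FROM game_results)
-- evaluates to NULL (not true), so the NOT IN delete removes zero rows.
-- The NOT EXISTS anti-join still behaves as intended:
SELECT c._id
FROM cards AS c
WHERE NOT EXISTS (
  SELECT 1 FROM game_results AS gr WHERE gr.card_id = c._id
);
```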
Still, deleting 100K rows cannot be instantaneous. Performance may be affected by a lack (or excess) of indexes, triggers, cascading deletes, etc.
You can use EXPLAIN (only, without ANALYZE) to get the execution plan of a DELETE. See the EXPLAIN documentation for details, especially if you want to try ANALYZE or other options of EXPLAIN:
Important: Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on an INSERT, UPDATE, DELETE, CREATE TABLE AS, or EXECUTE statement without letting the command affect your data, use this approach: BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;
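Applied to this question's delete, that transaction-wrapped approach looks like the sketch below (the plan output will vary; the BUFFERS option assumes PostgreSQL 9.0 or later):

```sql
BEGIN;

-- BUFFERS shows I/O work, which is often the culprit in slow deletes.
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM cards AS c
WHERE NOT c.available
  AND NOT EXISTS (
    SELECT 1 FROM game_results AS gr WHERE gr.card_id = c._id
  );

ROLLBACK;  -- nothing is actually deleted
```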
- Is there a way to peek into the performance including all of those factors? All of the refactors of the query seem to still take a very long time, so that's telling me that some kind of index or trigger issue may be occupying a lot of time. – aardvarkk, Mar 9, 2017 at 23:20
- @aardvarkk can you test without the indexes, or clone the table and test the delete on the clone? – Jack Douglas, Mar 9, 2017 at 23:24
- @JackDouglas Yep. I'm actually just trying to drop the indexes right now... and the first one I'm trying to drop is taking forever also! Interesting. – aardvarkk, Mar 9, 2017 at 23:25
- @aardvarkk DROP INDEX should be quick if it is not blocked. You can check: wiki.postgresql.org/wiki/Lock_Monitoring – Jack Douglas, Mar 9, 2017 at 23:27
- @aardvarkk there is a good chance that is what was blocking the delete too then – Jack Douglas, Mar 9, 2017 at 23:32
Check nothing is blocking the DELETE:
Looking at pg_locks shows you what locks are granted and what processes are waiting for locks to be acquired. A good query to start looking for lock problems:
SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;
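A slightly fuller variant of that lock query is sketched below. It joins pg_locks to pg_stat_activity so you can also see which session is waiting and what statement it is running (this assumes PostgreSQL 9.2 or later, where pg_stat_activity exposes pid and query columns):

```sql
-- Show ungranted lock requests together with the waiting session's query.
SELECT l.relation::regclass AS relation,
       l.mode,
       a.pid,
       a.state,
       a.query
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE NOT l.granted;
```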
This isn't ideal.
DELETE FROM cards
WHERE _id IN (
SELECT _id
FROM cards
LEFT JOIN game_results ON game_results.card_id = cards._id
WHERE NOT available
AND game_id IS NULL
);
Instead, try DELETE ... USING:
DELETE FROM cards
USING game_results AS gr
WHERE cards._id = gr.card_id
AND NOT cards.available
AND gr.game_id IS NULL;
Yeah, I was writing a similar comment. I'm trying to find any "cards" that were never used in a game by doing the left join to the game results and then checking for empty rows. – aardvarkk 1 min ago
If that's what you're trying to do, go for this using NOT EXISTS:
DELETE FROM cards
WHERE NOT available
AND NOT EXISTS (
SELECT *
FROM game_results
WHERE game_results.card_id = cards._id
);
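Before running a large delete like this, it can be worth doing a dry run to confirm the row count matches expectations (a sketch, not part of the original answer):

```sql
-- Dry run: count the rows the NOT EXISTS delete would remove.
SELECT count(*)
FROM cards
WHERE NOT available
  AND NOT EXISTS (
    SELECT 1
    FROM game_results
    WHERE game_results.card_id = cards._id
  );
```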
- I'm not sure if this is an equivalent rewrite. USING is equivalent to an INNER JOIN of the two tables. The original has a LEFT JOIN. – ypercubeᵀᴹ, Mar 9, 2017 at 23:06
- Yeah, I was writing a similar comment. I'm trying to find any "cards" that were never used in a game by doing the left join to the game results and then checking for empty rows. In fact, I think I can avoid using the games table altogether. Let me update. – aardvarkk, Mar 9, 2017 at 23:08
- So then why do an IN, not a NOT EXISTS? – Evan Carroll, Mar 9, 2017 at 23:09
- @EvanCarroll Great question! That could be a possible solution! I haven't used that structure before. I'll look into it and see what I can find. – aardvarkk, Mar 9, 2017 at 23:10
- @aardvarkk check that out – Evan Carroll, Mar 9, 2017 at 23:10
- Why not simply DELETE FROM mytable WHERE ...? Why the subquery?
- The SELECT runs quite quickly. Is it possible that somehow the DELETE is re-evaluating the inner SELECT for each row? If so, that'd take a very long time.
- A DELETE is different than a SELECT. It has to actually write to the disk and mark all those (100k) rows as deleted. And update all indexes on the table. And cascade deletes to other tables (if there are FKs with such actions). And call triggers if you have any on the table.
- Why use IN() when you could be doing DELETE FROM mytable USING (PostgreSQL-specific syntax that should be faster and safer)?
- EXPLAIN ANALYZE actually executes the query; if you want to see the plan without executing it, use EXPLAIN.