I'm trying to delete a bunch of rows from a table matching a query. The general form of my query is:
DELETE FROM mytable WHERE _id IN (SELECT _id FROM mytable WHERE ...);
The _id column is a SERIAL PRIMARY KEY.
If I run the inner SELECT query on its own it runs in about 1 second and returns about 100,000 rows. But when I add the DELETE call onto it, it seems to just sit and chug away. And chug away. I let it run for about a minute or so, and then cancelled it assuming no progress was being made.
I added EXPLAIN ANALYZE onto the front of it to see if I could check what was taking so long, but that call also hangs for a very long period of time.
Can anybody tell me what could be going on such that it's very quick to identify the rows I'd like to delete but takes an indeterminate amount of time to delete them?
Update: The full query is as follows.
DELETE FROM cards
WHERE _id IN (
SELECT _id
FROM cards
LEFT JOIN game_results ON game_results.card_id = cards._id
WHERE NOT available
AND game_id IS NULL
)
A simplified version of the tables would be:
CREATE TABLE cards (_id INTEGER PRIMARY KEY, available BOOLEAN);
CREATE TABLE games (_id INTEGER PRIMARY KEY);
CREATE TABLE game_results (game_id INTEGER REFERENCES games, card_id INTEGER REFERENCES cards);
I'm sure this query would eventually finish running, I'm just surprised at how long it's taking given that retrieving all of the IDs to delete is so quick.
Solution:
The problem was three-fold!
1. I had one of the earlier (very slow!) queries running that had been triggered by an aborted script. The query was still running even though the script had been aborted. As a result, operations like trying to drop indexes were blocked waiting for that query to finish. Manually cancelling it using pg_cancel_backend allowed me to experiment with dropping indexes and foreign key constraints.
2. Foreign key constraints seem to be the issue slowing everything down. The fact that the game_results table referenced cards made the delete take forever. The funny thing is, I was only deleting cards explicitly unused in game_results, but of course that didn't stop the foreign key checks from happening on each delete! That made things very slow. I dropped the foreign key constraint before running the delete and that sped things up to a level I felt was appropriate.
3. For whatever reason, the second delete query in the accepted answer ran much quicker than the first.
By combining these three things I was able to do the whole delete in a few seconds, whereas before I combined these three factors my queries were running 5-10 minutes (before I cancelled them!).
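The steps described in the solution can be sketched as follows. The session PID and constraint name below are hypothetical placeholders, not values from the original post; and note that in many cases an alternative to dropping the constraint is simply indexing the referencing column, since PostgreSQL does not automatically create an index on the referencing side of a foreign key.

```sql
-- 1. Find and cancel the stuck query (the PID 12345 is hypothetical):
SELECT pid, state, query_start, query
FROM pg_stat_activity
WHERE state <> 'idle';

SELECT pg_cancel_backend(12345);

-- 2. Drop the foreign key constraint before the big delete
--    (constraint name is hypothetical; check \d game_results in psql):
ALTER TABLE game_results DROP CONSTRAINT game_results_card_id_fkey;

-- ... run the DELETE ...

-- 3. Re-add the constraint afterwards:
ALTER TABLE game_results
  ADD CONSTRAINT game_results_card_id_fkey
  FOREIGN KEY (card_id) REFERENCES cards (_id);
```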
Thanks to all for the help!
3 Answers
Assuming that the columns are not nullable, the query can be simplified (without the self join) to use either NOT IN or NOT EXISTS:
DELETE FROM cards
WHERE available = FALSE
AND _id NOT IN
( SELECT card_id
FROM game_results
) ;
DELETE FROM cards AS c
WHERE c.available = FALSE
AND NOT EXISTS
( SELECT *
FROM game_results AS gr
WHERE gr.card_id = c._id
) ;
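To illustrate why the "not nullable" assumption matters: if game_results.card_id could contain a NULL, the NOT IN form would silently delete nothing, because `x NOT IN (..., NULL)` never evaluates to true under SQL's three-valued logic. The NOT EXISTS form has no such trap. A sketch (the inserted row is hypothetical):

```sql
-- Suppose game_results contained a row with a NULL card_id:
--   INSERT INTO game_results (game_id, card_id) VALUES (1, NULL);
-- Then for every candidate row,
--   _id NOT IN (SELECT card_id FROM game_results)
-- evaluates to NULL (not true), so the NOT IN delete removes zero rows.
-- The NOT EXISTS anti-join still behaves as intended:
SELECT c._id
FROM cards AS c
WHERE NOT EXISTS (
  SELECT 1 FROM game_results AS gr WHERE gr.card_id = c._id
);
```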
Still, deleting 100K rows cannot be instantaneous. Performance may be affected by a lack (or excess) of indexes, triggers, cascading deletes, etc.
You can use EXPLAIN (only, without ANALYZE) to get the execution plan of a DELETE. See the EXPLAIN documentation for details, especially if you want to try ANALYZE or other options of EXPLAIN:
Important: Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on an INSERT, UPDATE, DELETE, CREATE TABLE AS, or EXECUTE statement without letting the command affect your data, use this approach: BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;
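Applied to this question's delete, that transaction-wrapped approach looks like the sketch below (the plan output will vary; the BUFFERS option assumes PostgreSQL 9.0 or later):

```sql
BEGIN;

-- BUFFERS shows I/O work, which is often the culprit in slow deletes.
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM cards AS c
WHERE NOT c.available
  AND NOT EXISTS (
    SELECT 1 FROM game_results AS gr WHERE gr.card_id = c._id
  );

ROLLBACK;  -- nothing is actually deleted
```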
- Is there a way to peek into the performance including all of those factors? All of the refactors of the query seem to still take a very long time, so that's telling me that some kind of index or trigger issue may be occupying a lot of time. – aardvarkk, Mar 9, 2017 at 23:20
- @aardvarkk can you test without the indexes, or clone the table and test the delete on the clone? – Jack Douglas, Mar 9, 2017 at 23:24
- @JackDouglas Yep. I'm actually just trying to drop the indexes right now... and the first one I'm trying to drop is taking forever also! Interesting. – aardvarkk, Mar 9, 2017 at 23:25
- @aardvarkk DROP INDEX should be quick if it is not blocked. You can check: wiki.postgresql.org/wiki/Lock_Monitoring – Jack Douglas, Mar 9, 2017 at 23:27
- @aardvarkk there is a good chance that is what was blocking the delete too then – Jack Douglas, Mar 9, 2017 at 23:32
Check nothing is blocking the DELETE:
Looking at pg_locks shows you what locks are granted and what processes are waiting for locks to be acquired. A good query to start looking for lock problems:
SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;
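A slightly fuller variant of that lock query is sketched below. It joins pg_locks to pg_stat_activity so you can also see which session is waiting and what statement it is running (this assumes PostgreSQL 9.2 or later, where pg_stat_activity exposes pid and query columns):

```sql
-- Show ungranted lock requests together with the waiting session's query.
SELECT l.relation::regclass AS relation,
       l.mode,
       a.pid,
       a.state,
       a.query
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE NOT l.granted;
```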
This isn't ideal.
DELETE FROM cards
WHERE _id IN (
SELECT _id
FROM cards
LEFT JOIN game_results ON game_results.card_id = cards._id
WHERE NOT available
AND game_id IS NULL
);
Instead, try DELETE ... USING:
DELETE FROM cards
USING game_results AS gr
WHERE cards._id = gr.card_id
AND NOT cards.available
AND gr.game_id IS NULL;
Yeah, I was writing a similar comment. I'm trying to find any "cards" that were never used in a game by doing the left join to the game results and then checking for empty rows. – aardvarkk 1 min ago
If that's what you're trying to do, go for this using NOT EXISTS:
DELETE FROM cards
WHERE NOT available
AND NOT EXISTS (
SELECT *
FROM game_results
WHERE game_results.card_id = cards._id
);
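Before running a large delete like this, it can be worth doing a dry run to confirm the row count matches expectations (a sketch, not part of the original answer):

```sql
-- Dry run: count the rows the NOT EXISTS delete would remove.
SELECT count(*)
FROM cards
WHERE NOT available
  AND NOT EXISTS (
    SELECT 1
    FROM game_results
    WHERE game_results.card_id = cards._id
  );
```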
- I'm not sure if this is an equivalent rewrite. USING is equivalent to an INNER JOIN of the two tables. The original has a LEFT JOIN. – ypercubeᵀᴹ, Mar 9, 2017 at 23:06
- Yeah, I was writing a similar comment. I'm trying to find any "cards" that were never used in a game by doing the left join to the game results and then checking for empty rows. In fact, I think I can avoid using the games table altogether. Let me update. – aardvarkk, Mar 9, 2017 at 23:08
- So then why do an IN, not a NOT EXISTS? – Evan Carroll, Mar 9, 2017 at 23:09
- @EvanCarroll Great question! That could be a possible solution! I haven't used that structure before. I'll look into it and see what I can find. – aardvarkk, Mar 9, 2017 at 23:10
- @aardvarkk check that out – Evan Carroll, Mar 9, 2017 at 23:10
- Why not simply DELETE FROM mytable WHERE ...? Why the subquery?
- The SELECT runs quite quickly. Is it possible that somehow the DELETE is re-evaluating the inner SELECT for each row? If so, that'd take a very long time.
- A DELETE is different than a SELECT. It has to actually write to the disk and mark all those (100k) rows as deleted. And update all indexes on the table. And cascade deletes to other tables (if there are FKs with such actions). And call triggers if you have any on the table.
- Why use IN() when you could be doing DELETE FROM mytable USING (PostgreSQL-specific syntax that should be faster and safer)?
- EXPLAIN ANALYZE actually executes the query; if you want to see the plan without executing it, use EXPLAIN.