5

I'm trying to delete a bunch of rows from a table matching a query. The general form of my query is:

DELETE FROM mytable WHERE _id IN (SELECT _id FROM mytable WHERE ...);

The _id column is a SERIAL PRIMARY KEY.

If I run the inner SELECT query on its own it runs in about 1 second and returns about 100,000 rows. But when I add the DELETE call onto it, it seems to just sit and chug away. And chug away. I let it run for about a minute or so, and then cancelled it assuming no progress was being made.

I added EXPLAIN ANALYZE onto the front of it to see if I could check what was taking so long, but that call also hangs for a very long period of time.

Can anybody tell me what could be going on such that it's very quick to identify the rows I'd like to delete but takes an indeterminate amount of time to delete them?

Update: The full query is as follows.

DELETE FROM cards 
WHERE _id IN (
 SELECT _id
 FROM cards
 LEFT JOIN game_results ON game_results.card_id = cards._id 
 WHERE NOT available
 AND game_id IS NULL
)

A simplified version of the tables would be:

CREATE TABLE cards (_id INTEGER PRIMARY KEY, available BOOLEAN);
CREATE TABLE games (_id INTEGER PRIMARY KEY);
CREATE TABLE game_results (game_id INTEGER REFERENCES games, card_id INTEGER REFERENCES cards);

I'm sure this query would eventually finish running, I'm just surprised at how long it's taking given that retrieving all of the IDs to delete is so quick.

Solution:

The problem was three-fold!

  1. I had one of the earlier (very slow!) queries running that had been triggered by an aborted script. The query was still running even though the script had been aborted. As such, operations like trying to drop indexes were locked waiting for the query to finish. Manually cancelling that query using pg_cancel_backend allowed me to experiment with dropping indexes and foreign key constraints.

  2. Foreign key constraints seem to be the issue slowing everything down. The fact that the game_results table referenced cards made the delete take forever. The funny thing is, I was only deleting cards explicitly unused in the game_results, but of course that didn't stop the foreign key checks from happening upon delete! That made things very slow. I dropped the foreign key constraint before running the delete and that sped things up to a level I felt was appropriate.

  3. For whatever reason, the second delete query in the accepted answer ran much quicker than the first.

By combining these three things I was able to do the whole delete in a few seconds, whereas until I had combined these three factors together my queries were running to 5-10 minutes (before I was cancelling them!).

Thanks to all for the help!

asked Mar 9, 2017 at 22:35
11
  • 1
    Is this the actual query? If yes, why not the simpler: DELETE FROM mytable WHERE ...? Why the subquery? Commented Mar 9, 2017 at 22:44
  • @ypercubeTM The subquery joins to a separate table to check for conditions. It seemed a bit irrelevant to put the details in there because that inner SELECT runs quite quickly. Is it possible that somehow the DELETE is re-evaluating the inner SELECT for each row? If so, that'd take a very long time. Commented Mar 9, 2017 at 22:46
  • 1
    I'm not sure what the actual problem is - if there is one. A DELETE is different than a SELECT. It has to actually write to the disk and mark all those (100k) rows as deleted. And update all indexes in the table. And cascade delete in other tables (if there are FKs with such actions). And call triggers if you have on the table. Commented Mar 9, 2017 at 22:47
  • 2
    @aardvarkk I read @ypercubeTM to be asking why you're using a subquery in an IN() when you could be doing DELETE FROM mytable USING (PostgreSQL-specific syntax that should be faster and safer). Commented Mar 9, 2017 at 23:04
  • 3
    EXPLAIN ANALYZE actually executes the query — if you want to see the plan without executing, use EXPLAIN Commented Mar 9, 2017 at 23:25

3 Answers 3

8

Assuming that the columns are not nullable, the query can be simplified (without the self join) to using either NOT IN or NOT EXISTS:

DELETE FROM cards 
WHERE available = FALSE 
 AND _id NOT IN 
 ( SELECT card_id
 FROM game_results 
 ) ;
DELETE FROM cards AS c 
WHERE c.available = FALSE
 AND NOT EXISTS 
 ( SELECT * 
 FROM game_results AS gr
 WHERE gr.card_id = c._id
 ) ;

Still, deleting 100K rows can not be instantaneous. Performance may be affected by lack (or excess) of indexes, triggers, cascade deletes, etc.

You can use EXPLAIN (only, without ANALYZE) to get the execution plan of a DELETE. See EXPLAIN for details, especially if you want to try ANALYZE or other options of EXPLAIN:

Important: Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on an INSERT, UPDATE, DELETE, CREATE TABLE AS, or EXECUTE statement without letting the command affect your data, use this approach:

BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;
answered Mar 9, 2017 at 23:15
7
  • Is there a way to peek into the performance including all of those factors? All of the refactors of the query seem to still take a very long time, so that's telling me that some kind of index or trigger issue may be occupying a lot of time. Commented Mar 9, 2017 at 23:20
  • @aardvarkk can you test without the indexes, or clone the table and test the delete on the clone? Commented Mar 9, 2017 at 23:24
  • @JackDouglas Yep. I'm actually just trying to drop the indexes right now... and the first one I'm trying to drop is taking forever also! Interesting. Commented Mar 9, 2017 at 23:25
  • 1
    @aardvarkk DROP INDEX should be quick if it is not blocked. You can check: wiki.postgresql.org/wiki/Lock_Monitoring Commented Mar 9, 2017 at 23:27
  • 2
    @aardvarkk there is a good chance that is what was blocking the delete too then Commented Mar 9, 2017 at 23:32
5

Check nothing is blocking the DELETE:

Looking at pg_locks shows you what locks are granted and what processes are waiting for locks to be acquired. A good query to start looking for lock problems:

SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;
answered Mar 9, 2017 at 23:34
2

This isn't ideal.

DELETE FROM cards 
WHERE _id IN (
 SELECT _id
 FROM cards
 LEFT JOIN game_results ON game_results.card_id = cards._id 
 WHERE NOT available
 AND game_id IS NULL
);

Instead try DELETE ... USING

DELETE FROM cards
USING game_results AS gr
WHERE _id = gr.card_id
 AND NOT gr.available
 AND gr.game_id IS NULL;

Yeah, I was writing a similar comment. I'm trying to find any "cards" that were never used in a game by doing the left join to the game results and then checking for empty rows. – aardvarkk 1 min ago

If that's what you're trying to do go for this using NOT EXISTS..

DELETE FROM cards
WHERE NOT available
AND NOT EXISTS (
 SELECT *
 FROM game_results
 WHERE game_results.card_id = _id 
);
answered Mar 9, 2017 at 23:02
7
  • 1
    I'm not sure if this is equivalent rewrite. USING is equivalent to INNER JOIN of the two tables. The original has LEFT JOIN. Commented Mar 9, 2017 at 23:06
  • Yeah, I was writing a similar comment. I'm trying to find any "cards" that were never used in a game by doing the left join to the game results and then checking for empty rows. In fact, I think I can avoid using the games table altogether. Let me update. Commented Mar 9, 2017 at 23:08
  • So then why do an IN, not a NOT EXISTS? Commented Mar 9, 2017 at 23:09
  • @EvanCarroll Great question! That could be a possible solution! I haven't used that structure before. I'll look into it and see what I can find. Commented Mar 9, 2017 at 23:10
  • @aardvarkk check that out Commented Mar 9, 2017 at 23:10

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.