I have two tables on MySQL:
- HISTORY_TABLE (4 million rows, column1 indexed)
- EXTRACT_TABLE (600K rows, column1 indexed)
For every EXTRACT_TABLE.column1 value there are multiple HISTORY_TABLE.column1 rows.
Objective: I want to delete all HISTORY_TABLE records that match this criterion:
HISTORY_TABLE.column1 = EXTRACT_TABLE.column1
I tried two different approaches:
- A simple query that matches the criterion (10 hours)
- Copying the NOT EXISTS rows into a new table (2+ days)
I found via searching that deletion can be done in chunks with a stored procedure, but I don't know how.
Is there an example of how to use a cursor over EXTRACT_TABLE to read in chunks and delete from the other table? Or is there another way to accomplish my objective?
1 Answer
Since you are deleting a large percentage of the table, it would be better to build a new table with the rows to keep:
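CREATE TABLE new LIKE real;
INSERT INTO new
    SELECT real.*              -- only real's columns; SELECT * would also pull in extract's
        FROM real
        LEFT JOIN extract ON ...
        WHERE ... IS NULL;     -- keep only the rows with no match in extract
RENAME TABLE real TO old,
             new TO real;      -- swap tables (sort of)
DROP TABLE old;                -- clean up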
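Applied to the tables in the question, the rebuild might look like the sketch below. HISTORY_NEW and HISTORY_OLD are placeholder names made up for illustration, and the sketch assumes that column1 is the only join condition and that nothing else writes to HISTORY_TABLE while the copy runs.

```sql
-- Empty copy with the same columns and indexes (placeholder name).
CREATE TABLE HISTORY_NEW LIKE HISTORY_TABLE;

-- Copy only the rows to keep: those with no matching column1 in EXTRACT_TABLE.
INSERT INTO HISTORY_NEW
    SELECT h.*
        FROM HISTORY_TABLE AS h
        LEFT JOIN EXTRACT_TABLE AS e ON e.column1 = h.column1
        WHERE e.column1 IS NULL;

-- Swap the tables in one statement, then drop the old data.
RENAME TABLE HISTORY_TABLE TO HISTORY_OLD,
             HISTORY_NEW TO HISTORY_TABLE;
DROP TABLE HISTORY_OLD;
```

Rows whose column1 is NULL never match the join, so they are kept, which is the same outcome a DELETE on column1 equality would give.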
Even that may be too invasive. See the following for techniques for deleting "in chunks". They won't finish "fast", but they won't have nearly as much impact on other activity. http://mysql.rjweb.org/doc.php/deletebig
(FOREIGN KEYs and TRIGGERs are likely to cause trouble.)
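For the "in chunks" route, here is a minimal sketch of a stored procedure, using the table names from the question. It is not taken from the linked page. The procedure name, the 10000-row chunk size, and the one-second pause are all assumptions to tune, and it assumes autocommit is on so that each DELETE commits on its own.

```sql
-- DELIMITER is a mysql client command; it lets the ; inside the body pass through.
DELIMITER //

CREATE PROCEDURE delete_history_in_chunks()   -- hypothetical name
BEGIN
    DECLARE v_deleted INT DEFAULT 1;

    -- Repeat until a pass deletes nothing.
    WHILE v_deleted > 0 DO
        DELETE FROM HISTORY_TABLE
         WHERE EXISTS (SELECT 1
                         FROM EXTRACT_TABLE
                        WHERE EXTRACT_TABLE.column1 = HISTORY_TABLE.column1)
         LIMIT 10000;                 -- assumed chunk size
        SET v_deleted = ROW_COUNT();  -- rows removed by the DELETE just above
        DO SLEEP(1);                  -- assumed pause so other sessions get a turn
    END WHILE;
END //

DELIMITER ;

CALL delete_history_in_chunks();
```

Each pass starts scanning HISTORY_TABLE from the beginning again, so later chunks may get slower as more non-matching rows have to be skipped. The linked page describes walking the table in PRIMARY KEY ranges instead, which I would expect to scale better; that is not shown here because the question does not say what the primary key is.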
- Do you think LEFT JOIN will be faster than EXISTS (this took me forever)? Let me know your thoughts. Thanks. – user3861709, Nov 23, 2019 at 1:56
- @user3861709 - In some situations, the Optimizer will turn one into the other. (I don't have the details.) But such 'rewriting' would be done only if the Optimizer is reasonably sure that it helps performance. – Rick James, Nov 23, 2019 at 3:11
- @user3861709 - What takes a long time is saving the old copies of millions of rows in case there is a crash and the data needs to be rolled back. My INSERT approach avoids that. – Rick James, Nov 23, 2019 at 3:12
- @RickJames The only suggestion is that RENAME will fail if the table has triggers. – Kondybas, Nov 24, 2019 at 11:13
- @Kondybas - Thanks. I updated my blog. – Rick James, Nov 24, 2019 at 20:07