I have an INNODB table that's > 93 million rows. A lot of the data is considered "temp" data and is governed by an "is_active" flag of 1/0. When a user updates a form, new data is written with an "is_active=1" and the previous active records are updated to "is_active=0".
We wanted to move the data to a new table to clean things up, so we ran a statement like:
INSERT INTO tblNew (a, b, c)
SELECT a, b, c FROM tblOld WHERE is_active=1
This ran overnight, and when I looked in the morning I noticed there were a bunch of processes backed up in SHOW PROCESSLIST, so I did a KILL on the process ID, which started the ROLLBACK and brought the server down for another 10 hours... on a production box, of course.
I've been reading a lot on how you can try to repair, etc., and have been doing that all day, but I'm wondering: is there any kind of option I could have added to avoid the need for a rollback on failure? Or is there a strategy to commit or flush every X number of rows, etc.?
I was trying this...
INSERT INTO tblNew (a, b, c)
SELECT a, b, c FROM tblOld WHERE is_active=1 AND pkID > 0 AND pkID < 1000000
Here pkID is the primary key. I would run it in groups of 550k - 1M rows and raise the PK number range each run. There's an index on the PK and on is_active, yet I noticed run times increased each run, from 30 seconds to over 5 minutes a run by the time it was in the 20M range. Any idea why this would take longer each run when it's the same number of rows of work?
So, in summary, two questions:
1. Can I do something to keep a huge rollback from happening if I stop the process?
2. Why did inserting the same number of rows, chunked on the PK (an indexed column), take progressively longer per run?
1 Answer
What percentage of the table is is_active normally?
The following should avoid having to do massive updates to flip that flag:
- Consider abandoning the is_active flag in the main table. Instead, have a separate table with a PRIMARY KEY matching the main table, plus (optionally) a timestamp (I'll get back to that in a minute).
- Instead of setting is_active, INSERT a row in the new table.
- Instead of turning off is_active, DELETE the row.
- To check for [in]active, use LEFT JOIN plus WHERE id IS [NOT] NULL (see the sketch below).
The TIMESTAMP is to deal with programming errors that insert a row but somehow forget about it. Also add a user_id column for debugging, if relevant.
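A minimal sketch of that layout, assuming hypothetical table and column names (tblMain, tblActive, activated_at are illustrative, not from the answer):

-- Side table whose mere presence of a row means "active".
CREATE TABLE tblActive (
    id INT UNSIGNED NOT NULL,       -- matches tblMain's PRIMARY KEY
    activated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    user_id INT UNSIGNED NULL,      -- optional, for debugging
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- "Activate": INSERT a row instead of UPDATE ... SET is_active=1
INSERT INTO tblActive (id, user_id) VALUES (12345, 7);

-- "Deactivate": DELETE the row instead of UPDATE ... SET is_active=0
DELETE FROM tblActive WHERE id = 12345;

-- Active rows: a plain JOIN
SELECT m.* FROM tblMain m JOIN tblActive a ON a.id = m.id;

-- Inactive rows: the LEFT JOIN ... IS NULL pattern
SELECT m.*
FROM tblMain m
LEFT JOIN tblActive a ON a.id = m.id
WHERE a.id IS NULL;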
- About 30% are currently is_active=0. We do a cleanup periodically where I DELETE FROM all the inactive rows. I moved to INNODB a few months ago so I could do this without locking the table for selects. I also partition based on another ID: IDs of 0 are in their own partition, then I go up by 100k for each additional partition. This table stores the responses (form fields) for individual form entries. So if we have 300k forms submitted, we'd have unfinished forms with ID 0 and is_active 0, and the other partitions would contain the various form entries. The delete-on-edit is what we may do. – Don (May 15, 2015 at 23:10)
- If you are walking through based on PRIMARY KEY, do it only 1000 ids at a time. But there are probably lots of gaps? If so, this blog discusses how to do 1000 at a time efficiently, even with gaps. – Rick James (May 16, 2015 at 2:45)
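The gap-tolerant chunking looks roughly like this (a sketch of the general technique the comment refers to, reusing the poster's tblOld/tblNew/pkID names; the @start and @stop variables are assumptions). Rather than stepping the PK range by a fixed amount, find where the next 1000 real ids end:

-- @start is the lower bound of the current chunk (first run: 0).
-- The 1001st id at or after @start is the exclusive upper bound, so each
-- chunk covers ~1000 existing rows no matter how many gaps there are.
SELECT pkID INTO @stop
FROM tblOld
WHERE pkID >= @start
ORDER BY pkID
LIMIT 1000, 1;
-- (The final, partial chunk needs special handling: this SELECT returns no row.)

INSERT INTO tblNew (a, b, c)
SELECT a, b, c
FROM tblOld
WHERE is_active = 1
  AND pkID >= @start
  AND pkID < @stop;

SET @start = @stop;   -- next chunk picks up where this one ended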
- And be sure to COMMIT after each chunk. This will keep from gathering up a huge rollback log. – Rick James (May 16, 2015 at 2:45)
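Putting the two comments together, the copy loop might be wrapped in a stored procedure along these lines (a hedged sketch, not from the thread; the procedure name is made up, and a NULL stop_id is treated as the signal that only a final partial chunk remains):

DELIMITER //
CREATE PROCEDURE copy_active_rows()
BEGIN
  DECLARE start_id BIGINT DEFAULT 0;
  DECLARE stop_id BIGINT;

  copy_loop: LOOP
    SET stop_id = NULL;
    -- Exclusive upper bound of the next ~1000-row chunk (stays NULL once rows run out).
    SELECT pkID INTO stop_id
    FROM tblOld
    WHERE pkID >= start_id
    ORDER BY pkID
    LIMIT 1000, 1;

    IF stop_id IS NULL THEN
      -- Final partial chunk: copy whatever is left, commit, and stop.
      INSERT INTO tblNew (a, b, c)
      SELECT a, b, c FROM tblOld
      WHERE is_active = 1 AND pkID >= start_id;
      COMMIT;
      LEAVE copy_loop;
    END IF;

    INSERT INTO tblNew (a, b, c)
    SELECT a, b, c FROM tblOld
    WHERE is_active = 1 AND pkID >= start_id AND pkID < stop_id;
    COMMIT;   -- close out this chunk's undo log before starting the next
    SET start_id = stop_id;
  END LOOP;
END//
DELIMITER ;

CALL copy_active_rows();

Each COMMIT finalizes that chunk, so a KILL mid-run would roll back at most ~1000 rows instead of tens of millions.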