How to update a (very) large table without locking in MySQL

Question 1

I have a large table (58+ million records) which represents a relation between two records (player and target) in a second table.

Unfortunately, whoever designed our schema didn't think things through properly, and opted to use usernames to represent this relation, instead of the numerical ids for the user records. As things progressed (like they usually do), our usernames are no longer a valid, unique representation of a player so I need to convert these relations to use numerical ids.

Adding the fields without locking was easy thanks to Percona Toolkit, which offers pt-online-schema-change that can ALTER on a live table. Populating the tables, however might be trickier.

The tables looks like this (I've stripped the creates of non-relevant fields), with the two unpopulated fields being player_id and target_id:

CREATE TABLE `player_targets` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `player` varchar(20) NOT NULL,
 `player_id` int(10) unsigned DEFAULT NULL,
 `target` varchar(20) NOT NULL,
 `target_id` int(10) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=58000000 DEFAULT CHARSET=latin1;
CREATE TABLE 'player_uuids' (
 `id`int(11) NOT NULL AUTO_INCREMENT,
 `username` varchar(20) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=600000 DEFAUL CHARSET=latin1;

I was planning on populating the two new fields with a query like this one:

UPDATE player_targets t
INNER JOIN player_uuids u1
 ON u1.username = t.player
INNER JOIN player_uuids u2
 ON u2.username = t.target
SET
 t.player_id = u1.id,
 t.target_id = u2.id
WHERE
 t.player_id IS NULL
 OR t.player_id IS NULL;

With the table storing relations being MyISAM, my interpretation of the documentation is that the UPDATE-query will lock the table until it's finished with all the rows. As the table is large, this will likely not work very well in a live environment.

What would the best approach for this be? writing a script to iterate over batches of relations? Changing the table engine to InnoDB (the table is read-heavy, which I believe is the reason for it being MyISAM)?

Question 2

If your queries need to change to refer to the new fields anyways, why not put the properly-normalized relation into a new table?

Question 3

That's certainly an option - it does raise the concern of making our, already very complex, queries even more complex, with the added downside that we'll have to maintain both this poorly formatted schema and queries for a long time to come. Am of the mindset that it's better to fix it properly now, than later :) I also have the benefit of scheduled downtime for some other parts of this overhaul, but am worried I won't have time to populate the entire table during that time, and wanted to pre-populate as much of it as possible.

Question 4

No what I'm saying is, if you are migrating to int columns in the same table, in order to use it you need to update all the queries anyways. There's no extra work involved in putting it into a separate table.

Question 5

I understood that - there is however extra work in putting it in a separate table, as that will require further joins on all queries that access the table. Sure, we will be updating the queries anyway, but do we really want to paint ourselves even further into a corner by keeping useless legacy data around?

Question 6

It's no extra work - instead of adding fields to the existing table, you make a new table that only has the fields you need. Then the current table goes away and you have the same number of JOINs.

Question 7

You can delete or update by one row in loop statement. In this case there will be no blocking but it will work a little slower. Use limit 1 to do this in procedure:

 PROCEDURE myProcedure()
 BEGIN
 DECLARE c int; -- to calculated affected rows
 set c = 0;
 ml:LOOP
 UPDATE player_targets t
 INNER JOIN player_uuids u1
 ON u1.username = t.player
 INNER JOIN player_uuids u2
 ON u2.username = t.target
 SET
 t.player_id = u1.id,
 t.target_id = u2.id
 WHERE
 t.player_id IS NULL
 OR t.player_id IS NULL 
 LIMIT 1;
 -- check if the loop has completed 
 IF ROW_COUNT() = 0 THEN
 LEAVE ml;
 END IF; 
 set c = c + 1;
 IF c MOD 100 = 0 THEN
 SELECT CONCAT(c, ' row(s) updated');
 END IF;
 END LOOP; 
 SELECT CONCAT(c, ' row(s) updated; The statement has completed');
 END

And call procedure:

Call myProcedure();

Question 8

Yes, iterate over batches. This blog provides some suggestions on how to do it.

It is couched in how to chunk up big DELETEs, but the principles work for big UPDATEs, like yours.

It shows you how to walk through using (typically) the PRIMARY KEY and use LIMIT to decide on the next chunk. No step has to scan the entire table as that would defeat the purpose of chunking.

Question 9

What would the best approach for this be? writing a script to iterate over batches of relations? Changing the table engine to InnoDB (the table is read-heavy, which I believe is the reason for it being MyISAM)?

Change the table to InnoDB. Not just to eliminate table locking reasons. But, for crash recovery, referential integrity and a bunch of other stuff. See this answer for the nitty gritty.

Why would you want a work around? MyISAM is fine to use, but you should have a reason. So you think your DB is read-heavy. Try the hardened default that MySQL provides and if that fails your use-case then you may want to try something else MySQL offers. I wouldn't even want to consider something before trying InnoDB. That's the easy solution: try a different storage engine.

yikekas yikekas 111 bronze badge · Answer 1 · 2017-12-10 08:21:22Z

You can delete or update by one row in loop statement. In this case there will be no blocking but it will work a little slower. Use limit 1 to do this in procedure:

 PROCEDURE myProcedure()
 BEGIN
 DECLARE c int; -- to calculated affected rows
 set c = 0;
 ml:LOOP
 UPDATE player_targets t
 INNER JOIN player_uuids u1
 ON u1.username = t.player
 INNER JOIN player_uuids u2
 ON u2.username = t.target
 SET
 t.player_id = u1.id,
 t.target_id = u2.id
 WHERE
 t.player_id IS NULL
 OR t.player_id IS NULL 
 LIMIT 1;
 -- check if the loop has completed 
 IF ROW_COUNT() = 0 THEN
 LEAVE ml;
 END IF; 
 set c = c + 1;
 IF c MOD 100 = 0 THEN
 SELECT CONCAT(c, ' row(s) updated');
 END IF;
 END LOOP; 
 SELECT CONCAT(c, ' row(s) updated; The statement has completed');
 END

And call procedure:

Call myProcedure();

Rick James Rick James 80.7k5 gold badges52 silver badges119 bronze badges · Answer 2 · 2015-02-07 22:11:59Z

Yes, iterate over batches. This blog provides some suggestions on how to do it.

It is couched in how to chunk up big DELETEs, but the principles work for big UPDATEs, like yours.

It shows you how to walk through using (typically) the PRIMARY KEY and use LIMIT to decide on the next chunk. No step has to scan the entire table as that would defeat the purpose of chunking.

Evan Carroll Evan Carroll 65.7k50 gold badges259 silver badges511 bronze badges · Answer 3 · 2017-01-06 03:47:03Z

What would the best approach for this be? writing a script to iterate over batches of relations? Changing the table engine to InnoDB (the table is read-heavy, which I believe is the reason for it being MyISAM)?

Change the table to InnoDB. Not just to eliminate table locking reasons. But, for crash recovery, referential integrity and a bunch of other stuff. See this answer for the nitty gritty.

Why would you want a work around? MyISAM is fine to use, but you should have a reason. So you think your DB is read-heavy. Try the hardened default that MySQL provides and if that fails your use-case then you may want to try something else MySQL offers. I wouldn't even want to consider something before trying InnoDB. That's the easy solution: try a different storage engine.

Stack Exchange Network

How to update a (very) large table without locking in MySQL

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

How to update a (very) large table without locking in MySQL

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions