(Mysql) Using Memory engine for 4.3 million row table?

Question 1

My new company stores an intermediate data for our loading process in a table using the memory engine. However a quite simple update (set a.id=b.id) is taking something like six hours to run.

I've checked how big the tables are and in this import they have a about 4.3 million rows of data in them. I'm certain the problem is the memory engine and the lack of index on id (there is a PK but its across 5 other attributes) based on HASH.

I'm 95% certain I need to change the table engine and index type but I want to make sure before I push for the change.

EDIT: Here is the statement that is is so slow:

select @Extraction_id := id from extraction;
update items as a
set a.extraction_id=@Extraction_id;

Edit 2: create statement of table

CREATE TABLE `items` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `extraction_id` char(12) NOT NULL,
 `a` char(6) NOT NULL,
 `b` char(10) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `c` date NOT NULL,
 `d` varchar(100) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `e` varchar(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `f` decimal(17,6) DEFAULT NULL,
 `g` decimal(17,6) DEFAULT NULL,
 `h` varchar(150) DEFAULT NULL,
 `i` varchar(150) DEFAULT NULL,
 PRIMARY KEY (`id`,`extraction_id`,`a`,`b`,`c,`d`),
 KEY `fk_extraction` (`extraction_id`),
) ENGINE=MEMORY AUTO_INCREMENT=2343054016 DEFAULT CHARSET=latin1;

Question 2

Looks like that updates all 4.3 million rows of the table?? That's a costly task, and it often implies faulty schema design.

Question 3

@RickJames across 3 different tables two of them are these massive tables which takes forever to process.

Question 4

As RickJames said that is a costly operation. Why would you need to update 4 million rows, with the same extraction_id? You have your @Extraction_id value from the previous select. Just use it where you need it, don't update million of rows with it.

Question 5

@ypercubeTM yes but the problem is this query sits in a file that checks that the extraction id hasn't been used before in the production database, if it has it gives it a new one but this means it has to update all of the other tables that have the same conflicting extraction id. The extraction id is set outside of SQL on the raw csv's we get so we have to change it in code.

Question 6

I still don't understand why you have to update all the rows of the table. If the id had been used, why insert 4 million rows with that id in the first place (and not insert them with the new one)? Or even better, don't have an extraction_id in that items table at all. This looks more like a design issue, as Rick has pointed.

Question 7

ENGINE=MEMORY is not designed for big tables. InnoDB is well designed for virtually all use cases; switch to it.

It sounds like you need more than an ENGINE and index type change. Please provide SHOW CREATE TABLE for the relevant tables, and the statement that is so slow.

set a.id=b.id?? -- By convention, DBAs call the PRIMARY KEY id. It is very strange to be setting the PK of one table to the PK of another.

Chunking

One approach to UPDATEing 4M rows will less impact is to "chunk" up the problem. I discuss details here.

Question 8

"set a.id=b.id" is part of the processes clash checking where if the ID has already been used before it creates a new one and applies to to the various import tables.

Question 9

I've tested using InnoDB on a test database using 1m rows and InnoDB takes even longer, so I'm not sure thats going to save us.

Question 10

"ENGINE=MEMORY is not designed for big tables" - in mos serious scenarios you would not consider this a large table. Is MySql really that limited? In Memory Storage is used with hundreds of gigabytes RAM by competitors.

Question 11

"It is very strange to be setting the PK of one table to the PK of another." - no. Horizontal partitioning (splitting fields of a logical table to multiple database tables) is one valid scenario. They obviously all share the same ID key. One of them will be leading (mandatory). I use this quite regularly.

Question 12

@TomTom - And I recommend it -- but only rarely. I am wondering if this is an "over-simplification" of the real problem. And one of those id's is not really spelled id.

Rick James Rick James 80.7k5 gold badges52 silver badges119 bronze badges · Accepted Answer · 2018-01-22 12:26:08Z

0

ENGINE=MEMORY is not designed for big tables. InnoDB is well designed for virtually all use cases; switch to it.

It sounds like you need more than an ENGINE and index type change. Please provide SHOW CREATE TABLE for the relevant tables, and the statement that is so slow.

set a.id=b.id?? -- By convention, DBAs call the PRIMARY KEY id. It is very strange to be setting the PK of one table to the PK of another.

Chunking

One approach to UPDATEing 4M rows will less impact is to "chunk" up the problem. I discuss details here.

Share

Improve this answer

edited Jan 23, 2018 at 2:37

answered Jan 22, 2018 at 12:26

Rick James's user avatar

Rick James Rick James

80.7k5 gold badges52 silver badges119 bronze badges

5

"set a.id=b.id" is part of the processes clash checking where if the ID has already been used before it creates a new one and applies to to the various import tables.

Exostrike
– Exostrike

2018年01月22日 12:38:01 +00:00
Commented Jan 22, 2018 at 12:38
I've tested using InnoDB on a test database using 1m rows and InnoDB takes even longer, so I'm not sure thats going to save us.

Exostrike
– Exostrike

2018年01月22日 13:12:17 +00:00
Commented Jan 22, 2018 at 13:12
"ENGINE=MEMORY is not designed for big tables" - in mos serious scenarios you would not consider this a large table. Is MySql really that limited? In Memory Storage is used with hundreds of gigabytes RAM by competitors.

TomTom
– TomTom

2018年01月22日 13:38:54 +00:00
Commented Jan 22, 2018 at 13:38
1

"It is very strange to be setting the PK of one table to the PK of another." - no. Horizontal partitioning (splitting fields of a logical table to multiple database tables) is one valid scenario. They obviously all share the same ID key. One of them will be leading (mandatory). I use this quite regularly.

TomTom
– TomTom

2018年01月22日 13:39:00 +00:00
Commented Jan 22, 2018 at 13:39
@TomTom - And I recommend it -- but only rarely. I am wondering if this is an "over-simplification" of the real problem. And one of those id's is not really spelled id.

Rick James
– Rick James

2018年01月23日 02:30:40 +00:00
Commented Jan 23, 2018 at 2:30

Add a comment |

Stack Exchange Network

(Mysql) Using Memory engine for 4.3 million row table?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

(Mysql) Using Memory engine for 4.3 million row table?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions