0

My new company stores an intermediate data for our loading process in a table using the memory engine. However a quite simple update (set a.id=b.id) is taking something like six hours to run.

I've checked how big the tables are and in this import they have a about 4.3 million rows of data in them. I'm certain the problem is the memory engine and the lack of index on id (there is a PK but its across 5 other attributes) based on HASH.

I'm 95% certain I need to change the table engine and index type but I want to make sure before I push for the change.

EDIT: Here is the statement that is is so slow:

select @Extraction_id := id from extraction;
update items as a
set a.extraction_id=@Extraction_id;

Edit 2: create statement of table

CREATE TABLE `items` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `extraction_id` char(12) NOT NULL,
 `a` char(6) NOT NULL,
 `b` char(10) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `c` date NOT NULL,
 `d` varchar(100) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `e` varchar(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 `f` decimal(17,6) DEFAULT NULL,
 `g` decimal(17,6) DEFAULT NULL,
 `h` varchar(150) DEFAULT NULL,
 `i` varchar(150) DEFAULT NULL,
 PRIMARY KEY (`id`,`extraction_id`,`a`,`b`,`c,`d`),
 KEY `fk_extraction` (`extraction_id`),
) ENGINE=MEMORY AUTO_INCREMENT=2343054016 DEFAULT CHARSET=latin1;
asked Jan 22, 2018 at 11:58
8
  • 1
    Looks like that updates all 4.3 million rows of the table?? That's a costly task, and it often implies faulty schema design. Commented Jan 22, 2018 at 13:02
  • @RickJames across 3 different tables two of them are these massive tables which takes forever to process. Commented Jan 22, 2018 at 13:04
  • As RickJames said that is a costly operation. Why would you need to update 4 million rows, with the same extraction_id? You have your @Extraction_id value from the previous select. Just use it where you need it, don't update million of rows with it. Commented Jan 22, 2018 at 13:07
  • @ypercubeTM yes but the problem is this query sits in a file that checks that the extraction id hasn't been used before in the production database, if it has it gives it a new one but this means it has to update all of the other tables that have the same conflicting extraction id. The extraction id is set outside of SQL on the raw csv's we get so we have to change it in code. Commented Jan 22, 2018 at 13:15
  • I still don't understand why you have to update all the rows of the table. If the id had been used, why insert 4 million rows with that id in the first place (and not insert them with the new one)? Or even better, don't have an extraction_id in that items table at all. This looks more like a design issue, as Rick has pointed. Commented Jan 22, 2018 at 13:23

1 Answer 1

0

ENGINE=MEMORY is not designed for big tables. InnoDB is well designed for virtually all use cases; switch to it.

It sounds like you need more than an ENGINE and index type change. Please provide SHOW CREATE TABLE for the relevant tables, and the statement that is so slow.

set a.id=b.id?? -- By convention, DBAs call the PRIMARY KEY id. It is very strange to be setting the PK of one table to the PK of another.

Chunking

One approach to UPDATEing 4M rows will less impact is to "chunk" up the problem. I discuss details here.

answered Jan 22, 2018 at 12:26
5
  • "set a.id=b.id" is part of the processes clash checking where if the ID has already been used before it creates a new one and applies to to the various import tables. Commented Jan 22, 2018 at 12:38
  • I've tested using InnoDB on a test database using 1m rows and InnoDB takes even longer, so I'm not sure thats going to save us. Commented Jan 22, 2018 at 13:12
  • "ENGINE=MEMORY is not designed for big tables" - in mos serious scenarios you would not consider this a large table. Is MySql really that limited? In Memory Storage is used with hundreds of gigabytes RAM by competitors. Commented Jan 22, 2018 at 13:38
  • 1
    "It is very strange to be setting the PK of one table to the PK of another." - no. Horizontal partitioning (splitting fields of a logical table to multiple database tables) is one valid scenario. They obviously all share the same ID key. One of them will be leading (mandatory). I use this quite regularly. Commented Jan 22, 2018 at 13:39
  • @TomTom - And I recommend it -- but only rarely. I am wondering if this is an "over-simplification" of the real problem. And one of those id's is not really spelled id. Commented Jan 23, 2018 at 2:30

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.