
I need to load 4 million rows of data into a MySQL InnoDB table using LOAD DATA INFILE and would like to know if there are server configuration options I can tweak to speed up the load.

It took me 15 minutes to load 2 million rows, which I found disappointing for LOAD DATA INFILE. My statement looks like this:

LOAD DATA LOCAL INFILE 'path/file.csv' INTO TABLE table FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (column1, column2, etc);

RolandoMySQLDBA
asked Apr 21, 2015 at 1:55

2 Answers


Although LOAD DATA INFILE can work against InnoDB, there are many ways InnoDB can be pushed to its limits, at which point swapping and bottlenecks take over.

Here is a Pictorial Representation of InnoDB (from Percona CTO Vadim Tkachenko)

[Image: InnoDB Plumbing]

The bottlenecks would come from going through the following structures:

  • InnoDB Buffer Pool
  • Transaction Logs (ib_logfile0, ib_logfile1)
  • Double Write Buffer
  • Insert Buffer
  • One Rollback Segment
  • Log Buffer
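
To see how these structures are configured on a particular server, most of them can be tied to a system variable and checked in one query. The following is a minimal sketch in Python that only assembles the SQL text. The mapping is an assumption based on MySQL 5.x variable names and is not taken from the original answer; the helper name `build_show_variables` is made up for illustration. The rollback segment is left out because its variable name differs between releases.

```python
# Assumed mapping from the InnoDB structures listed above to the system
# variables that size or toggle them (MySQL 5.x names; newer releases may
# have renamed or deprecated some of these).
INNODB_VARIABLES = {
    "InnoDB Buffer Pool": "innodb_buffer_pool_size",
    "Transaction Logs": "innodb_log_file_size",
    "Double Write Buffer": "innodb_doublewrite",
    "Insert Buffer": "innodb_change_buffering",
    "Log Buffer": "innodb_log_buffer_size",
}


def build_show_variables(variables=INNODB_VARIABLES):
    """Return one SHOW GLOBAL VARIABLES statement covering every variable."""
    names = ", ".join("'{}'".format(name) for name in variables.values())
    return "SHOW GLOBAL VARIABLES WHERE Variable_name IN ({});".format(names)


if __name__ == "__main__":
    # Paste the printed statement into the mysql client.
    print(build_show_variables())
```

Which values are worth changing depends on available RAM and on the workload, so treat the output as a starting point for investigation rather than a tuning recipe.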

Here are some of my past posts where I discuss LOAD DATA INFILE with InnoDB

SUGGESTION #1

Break up the file into 20 smaller files.

Instead of one LOAD DATA INFILE against a 2 million row file, perform 20 LOAD DATA INFILE against 20 files, each with 100 thousand rows.

The Benefit : Less pressure against the InnoDB Plumbing
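
A rough sketch of the splitting step in Python, under a few assumptions: rows are separated by newlines with no embedded line breaks inside fields, the first line is a header, and paths use forward slashes. The function names `split_csv` and `load_statement` and the `chunk_NNN.csv` naming are hypothetical. The header is dropped while splitting, so the generated statements omit IGNORE 1 LINES.

```python
import itertools
import os


def split_csv(source_path, out_dir, rows_per_chunk=100_000, skip_header=True):
    """Split source_path into files of at most rows_per_chunk lines each.

    Returns the chunk paths in the order they were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    # newline="" keeps the original line terminators untouched.
    with open(source_path, "r", newline="") as src:
        if skip_header:
            next(src, None)  # discard the header line once
        while True:
            lines = list(itertools.islice(src, rows_per_chunk))
            if not lines:
                break
            name = "chunk_{:03d}.csv".format(len(chunk_paths) + 1)
            path = os.path.join(out_dir, name)
            with open(path, "w", newline="") as dst:
                dst.writelines(lines)
            chunk_paths.append(path)
    return chunk_paths


def load_statement(path, table, columns):
    """Build a LOAD DATA statement for one header-less chunk."""
    return (
        "LOAD DATA LOCAL INFILE '{}' INTO TABLE {} "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' ({});"
    ).format(path, table, ", ".join(columns))
```

With the default of 100,000 rows per chunk, a 2 million row file yields 20 chunks. Running each generated statement separately (with autocommit on) also means each chunk commits on its own rather than as one large transaction.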

SUGGESTION #2 (Optional)

answered Apr 21, 2015 at 2:44

I'll bet that you are currently I/O bound. This means that nothing can speed it up. (And Rolando's suggestions may be futile.)

Let's look deeper. Is this LOAD a recurring task? If so, how often? Is everything blocked waiting for the table to be reloaded? Simple solution: load into a different table, then do a double RENAME TABLE to swap it in. Only milliseconds of downtime.
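
The load-then-swap idea could look roughly like the sequence below, sketched in Python that only builds the SQL text. It assumes a full reload (the new table replaces the old contents entirely); the `_new` and `_old` suffixes and the function name `swap_statements` are made up for illustration.

```python
def swap_statements(table, csv_path, columns):
    """Return the SQL steps for loading into a shadow table and swapping it in."""
    new_table = table + "_new"  # hypothetical shadow table name
    old_table = table + "_old"  # hypothetical name for the retired table
    load = (
        "LOAD DATA LOCAL INFILE '{path}' INTO TABLE {new} "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
        "IGNORE 1 LINES ({cols});"
    ).format(path=csv_path, new=new_table, cols=", ".join(columns))
    return [
        # Empty copy with the same columns and indexes as the live table.
        "CREATE TABLE {new} LIKE {live};".format(new=new_table, live=table),
        # The slow part runs while readers still use the live table.
        load,
        # Both renames happen in one statement, so the swap is atomic.
        "RENAME TABLE {live} TO {old}, {new} TO {live};".format(
            live=table, old=old_table, new=new_table
        ),
        "DROP TABLE {old};".format(old=old_table),
    ]


if __name__ == "__main__":
    for step in swap_statements("mytable", "path/file.csv", ["column1", "column2"]):
        print(step)
```

The load itself takes just as long as before; the gain is that queries against the live table are not blocked while it runs.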

Is the data coming from another machine? Use the network for the "input" side of the LOAD rather than having the one disk fighting for reads versus writes.

Do you have a lot of indexes? There are several directions to take this question. Let's see SHOW CREATE TABLE before barking up these tree(s).

Does the entire load need to be a single transaction? Multiple transactions may be faster because they avoid overflowing the log file. (I've seen a 2x speedup.)

answered Apr 21, 2015 at 3:48
