Some context/background on my issue. I'm a sys admin asked to perform some DBA functions etc. I have a very large 2TB data file to load into a mysql table that will end up being used in read-only mode. I'm transitioning from a mysql 5.5 myisam table on Ubuntu 14.04 to a mysql 5.7 innodb table on Ubuntu 18.04 (different systems). The table is partitioned to keep memory usage reasonable within system RAM.
Using myisam, this large load would take a couple of days but that was expected and no issues with the undo or redo logs etc. Since myisam is being depreciated, I tried loading this data into an InnoDB table. Here are the system specs:
2 x 12-core CPUs 512GB system RAM. mysql data is currently on 6TB hard drive
Over the weekend, I let it load for 3-days into a brand new mysql install with these config params:
datadir = /export/home/data/mysql tmpdir = /export/home/SSD1/tmp skip-external-locking key_buffer_size = 16M max_allowed_packet = 256M thread_stack = 192K thread_cache_size = 8 max_heap_table_size = 16G; tmp_table_size = 4G query_cache_limit = 1M query_cache_size = 16M log_error = /var/log/mysql/error.log expire_logs_days = 10 max_binlog_size = 100M innodb_file_per_table=1 innodb_file_format=barracuda innodb_strict_mode=1 innodb_buffer_pool_size = 300G innodb_log_file_size = 200M innodb_log_buffer_size = 100M innodb_flush_log_at_trx_commit = 2
I was monitoring the load and noticed that it had written about 2TB of mysql data files in the database directory which was expected. However, the ibdata1 file had grown from 32M to 330GB. I'm assuming that most of this space is for the undo log transactions? During the load, I ran "show engine innodb status" and noticed this for the undo entries:
TRANSACTIONS
Trx id counter 46678
Purge done for trx's n:o < 46678 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421569004301048, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 18023, ACTIVE 418094 sec inserting
mysql tables in use 1, locked 1
100 lock struct(s), heap size 8400, 0 row lock(s), undo log entries 16758402383
Since this table will be mostly used in read-only like it was when it was a MyISAM table, are there any optimizations that would help with the data load? Are there any InnoDB optimizations for a read-only table?
Besides explicitly moving the undo logs to a separate disk/directory, is there anything to be done to improve the performance of the undo logs?
Here's the create table:
CREATE TABLE
Orthologs_New
(
GeneID1
varchar(50) NOT NULL,
GeneID2
varchar(50) NOT NULL,
OrgID1
varchar(50) NOT NULL,
OrgID2
varchar(50) NOT NULL,
PValue
double DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY KEY (OrgID1,OrgID2)
PARTITIONS 100 */
I'm not a DBA, so I'm a bit lost here. Thanks in advance!
1 Answer 1
URGENT: If your MyISAM table is 2TB, it may not fit as InnoDB on a 6TB drive. Even if it does fit, there won't be room to do any non-trivial ALTER
.
URGENT: innodb_file_per_table = 1
needed to be set before creating the table. Please elaborate on how you converted from MyISAM to InnoDB; I don't have an obvious answer to the "330GB". There is probably a .frm
file for the table; is there a .ibd
file? If not, then somehow the file_per_table
failed.
Readonly InnoDB loves innodb_buffer_pool_size = 300G
. With 512G of RAM, this is viable and would help some:
innodb_buffer_pool_size = 400G
innodb_buffer_pool_instances = 16
will help a little.
You mentioned partitioning. It usually does not help performance; in fact, it sometimes hurts performance. Please provide SHOW CREATE TABLE
, the number of partitions, and typical queries. In particular, PARTITION BY KEY (OrgID1,OrgID2)
is no better than having INDEX(OrgID1,OrgID2)
.
(Minor warning: The need for this is going away: innodb_file_format=barracuda
.)
More discussion of the engine conversion: http://mysql.rjweb.org/doc.php/myisam2innodb
Based on the provided schema, I recommend these changes for the loading:
InnoDB really likes to have a PK, so add this to the table definition:
id INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY(id)
Get rid of the partitioning.
How much precision do you need in
PValue
?DOUBLE
provides about 16 significant digits at a cost of 8 bytes.FLOAT
gives 7 for 4 bytes. If 7 is 'good enough', consider whether it is worth saving the ~100GB of space that might save.After loading, add the following index. What is the typical
WHERE
clause? (I sense that it is always `WHERE OrgID1='...' AND OrgID2 = '...'.)ALTER TABLE Orthologs_New ADD INDEX(OrgID1,OrgID2);
If you will periodically be reloading the info, then I can give you a recommendation on how to replace the table using RENAME TABLE
.
Is it the case that this pair (OrgID1,OrgID2)
is Unique? If so, the above PK + INDEX would be better done by simply
PRIMARY KEY(OrgID1,OrgID2)
That would slightly speed up SELECTs
. But then the LOAD DATA
may slow down unless the data comes in in that order. If the source data is in that order the LOAD
should be fast. If not, then it must jump around, which will be very heavy on I/O -- depending on the relative size of innodb_buffer_pool_size
(400G or whatever) versus the ultimate size of the table (4TB or whatever). If possible, sort the data by that pair of values before doing the LOAD
.
This may be a faster way to load:
CREATE TABLE ... ENGINE=MyISAM; -- with no keys
LOAD DATA INFILE ...
ALTER TABLE Orthologs_New
ADD (key(s) as discussed above)
ENGINE=InnoDB;
The rationale is that the LOAD
will be quite fast, then the ALTER
will rebuild the table, add the indexes, etc.
If, instead, the 4 columns are unique, then make them the PRIMARY KEY
and arrange them in the PK based on what the common queries will be. Possibly this will eliminate your need for an SP building memory tables. (I am more interested in seeing the queries that it supports than in seeing the SP itself.)
-
The old MyISAM database is on another server. This new server is brand new, so I set the innodb_file_per_table=1. This new server is currently a test server I'm using for testing this load to get an idea of what extra resources InnoDB needs over MyISAM. I read that InnoDB could be 2X MyISAM. I've updated my original question with the table definition. I killed the load on Monday and today (Thurs), it's still trying to clean up after itself, assuming purging undo logs?qman777– qman7772021年08月13日 06:22:37 +00:00Commented Aug 13, 2021 at 6:22
-
@qman777 - Yes, 2x-3x is typical. I have added a bunch to my Answer. Check the size of the file
ibdata1
. It may have bloated.Rick James– Rick James2021年08月13日 15:29:49 +00:00Commented Aug 13, 2021 at 15:29 -
Rick, thanks for your suggestions, this is great insight. 1) we just recently removed the pvalue requirement, so that would help immensely in reducing the database size. 2) after 4 days of killing the load, the job is still "canceling". Should I set the fast shutdown and stop the server to start over? 3) I love the idea of using MyISAM first, but isn't that going away soon? 4) Yes, we reload this table twice per year. The data is growing quite fast and we're considering moving this to a No-SQL DB. 5) just realized we have an index on each of the varchar fields too.qman777– qman7772021年08月13日 19:37:34 +00:00Commented Aug 13, 2021 at 19:37
-
@qman777 - 1) Don't need
pvalue
? This makes me wonder what the table is for. 2) Try it. You have nothing to lose, since the table will be rebuilt anyway. 3) 8.0 is trying hard to get rid of MyISAM. MariaDB will hang onto it for some time. 4) Let me know if you want the RENAME tip. (I can't address NoSQL.) 5) Let's see the queries that need those indexes. Rebuilding these indexes is very time-consuming; maybe you don't need all the indexes, or one index can serve multiple uses.Rick James– Rick James2021年08月13日 21:16:15 +00:00Commented Aug 13, 2021 at 21:16 -
After the load finally cleaned up, I ran innochecksum --page-type-summary ibdata1 and these are the big hitters: 21393069 Undo log page 190752 Freshly allocated page 1311 Insert buffer bitmap 98 System page 1 Transaction system page 3 File Space Header 1310 Extent descriptor page Undo page type: 21392561 insert, 508 update, 0 other Undo page state: 0 active, 189 cached, 0 to_free, 415 to_purge, 0 prepared, 21392465 otherqman777– qman7772021年08月16日 23:29:36 +00:00Commented Aug 16, 2021 at 23:29
LOAD DATA INFILE
? Reading output frommysqldump
? Programatically?ALTER TABLE .. ENGINE=InnoDB
? Other?