I have two percona servers, one running with TokuDB tables and the other with InnoDB tables.
On each server, i have create a single database, with a single table in it containing 10 millions records using sysbench. But using the query below, i noticed that the database on the TOkuDB server was way too large compared to the database on the server with InnoDb tables. The TOkuDB table has a clustering index, and the InnoDB table has a covered index (covering all fields except for the primary key).
So here is the table structure for the InnoDB table:
CREATE TABLE `sbtest1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`k` int(10) unsigned NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`),
KEY `covered` (`pad`,`k`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4
MAX_ROWS=1000000 ROW_FORMAT=COMPRESSED
And the data size is shown below with the main interest being on database subtest, which contains one InnoDD table with 10 million records:
mysql> SELECT table_schema "Data Base Name",
-> sum( data_length + index_length ) / 1024 / 1024 "Data Base Size in MB",
-> (index_length) / 1024 / 1024 "Index data in MB"
-> FROM information_schema.TABLES GROUP BY table_schema ;
+--------------------+----------------------+------------------+
| Data Base Name | Data Base Size in MB | Index data in MB |
+--------------------+----------------------+------------------+
| information_schema | 0.00976563 | 0.00000000 |
| mysql | 0.77099419 | 0.00390625 |
| performance_schema | 0.00000000 | 0.00000000 |
| sbtest | 1203.36718750 | 67.36718750 |
| test | 0.03906250 | 0.00000000 |
+--------------------+----------------------+------------------+
5 rows in set (0.02 sec)
The second table is a TokuDB table also with 10 million records and running on server 2.
CREATE TABLE `sbtest1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`k` int(10) unsigned NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`),
CLUSTERING KEY `pad` (`pad`)
) ENGINE=TokuDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4 MAX_ROWS=1000000 ROW_FORMAT=TOKUDB_QUICKLZ
Data size
mysql> SELECT table_schema "Data Base Name",
-> sum( data_length + index_length ) / 1024 / 1024 "Data Base Size in MB",
-> (index_length) / 1024 / 1024 "Index data in MB"
-> FROM information_schema.TABLES GROUP BY table_schema ;
+--------------------+----------------------+------------------+
| Data Base Name | Data Base Size in MB | Index data in MB |
+--------------------+----------------------+------------------+
| information_schema | 0.00976563 | 0.00000000 |
| mysql | 0.77224922 | 0.00390625 |
| performance_schema | 0.00000000 | 0.00000000 |
| sbtest | 12293.65114307 | 5340.57617188 |
| test | 0.00000000 | 0.00000000 |
+--------------------+----------------------+------------------+
5 rows in set (0.02 sec)
As can seen, the TokuDB is using Quicklz compression, which should give slightly better compression that the InnoDB compressed mechanism. So why does the TOkuDB table appear to have way much more data than the InnoDB table?
Both servers had the same amount of memory and disc space and hardware before the tables were created.
Here is the data size from the datadir:
TokuDB
-rw-rw---- 1 mysql mysql 12M Feb 17 14:48 ibdata1
-rw-rw---- 1 mysql mysql 2.9G Feb 17 12:30 _sbtest_sbtest1_key_pad_25fbaee_3_1d_B_0.tokudb
-rw-rw---- 1 mysql mysql 3.1G Feb 17 12:27 _sbtest_sbtest1_main_11_1_1d_B_0.tokudb
InnoDB
-rw-rw---- 1 mysql mysql 332M Feb 17 14:48 ibdata1
-rw-rw---- 1 mysql mysql 2.3G Feb 17 12:38 sbtest1.ibd
1 Answer 1
First, let's create a table using the TokuDB engine:
CREATE TABLE `toku_test_table` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`data` VARCHAR(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=TokuDB COMPRESSION='quicklz';
In this example, we create a table called toku_test_table using the TokuDB engine with the quicklz compression option.
Now, let's create a table using the InnoDB engine:
CREATE TABLE `innodb_test_table` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`data` VARCHAR(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
In this example, we create a table called innodb_test_table using the InnoDB engine with the COMPRESSED row format.
Please note that the actual data size and compression ratios will depend on the specific data you insert into these tables and the configuration of your database server.
To further compare the data size between the two tables, you can use the SHOW TABLE STATUS command:
SHOW TABLE STATUS LIKE 'toku_test_table';
SHOW TABLE STATUS LIKE 'innodb_test_table';
This will provide information about the data size and index size for each table.
pad
andc
all set to the same value? Or 'random' garbage? (This significantly impacts compressibility.) AlsoCHAR
is not realistic, especially with utf8mb4. -- I point this out because "realism" is a common flaw in benchmarks.