TokuDB vs InnoDB Database data size

Question 1

I have two percona servers, one running with TokuDB tables and the other with InnoDB tables.

On each server, i have create a single database, with a single table in it containing 10 millions records using sysbench. But using the query below, i noticed that the database on the TOkuDB server was way too large compared to the database on the server with InnoDb tables. The TOkuDB table has a clustering index, and the InnoDB table has a covered index (covering all fields except for the primary key).

So here is the table structure for the InnoDB table:

 CREATE TABLE `sbtest1` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `k` int(10) unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`),
 KEY `covered` (`pad`,`k`,`c`)
 ) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4 
 MAX_ROWS=1000000 ROW_FORMAT=COMPRESSED

And the data size is shown below with the main interest being on database subtest, which contains one InnoDD table with 10 million records:

mysql> SELECT table_schema "Data Base Name",
 -> sum( data_length + index_length ) / 1024 / 1024 "Data Base Size in MB",
 -> (index_length) / 1024 / 1024 "Index data in MB"
 -> FROM information_schema.TABLES GROUP BY table_schema ;
+--------------------+----------------------+------------------+
| Data Base Name | Data Base Size in MB | Index data in MB |
+--------------------+----------------------+------------------+
| information_schema | 0.00976563 | 0.00000000 |
| mysql | 0.77099419 | 0.00390625 |
| performance_schema | 0.00000000 | 0.00000000 |
| sbtest | 1203.36718750 | 67.36718750 |
| test | 0.03906250 | 0.00000000 |
+--------------------+----------------------+------------------+
5 rows in set (0.02 sec)

The second table is a TokuDB table also with 10 million records and running on server 2.

CREATE TABLE `sbtest1` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `k` int(10) unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`),
 CLUSTERING KEY `pad` (`pad`)
) ENGINE=TokuDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4 MAX_ROWS=1000000 ROW_FORMAT=TOKUDB_QUICKLZ

Data size

mysql> SELECT table_schema "Data Base Name",
 -> sum( data_length + index_length ) / 1024 / 1024 "Data Base Size in MB",
 -> (index_length) / 1024 / 1024 "Index data in MB"
 -> FROM information_schema.TABLES GROUP BY table_schema ;
+--------------------+----------------------+------------------+
| Data Base Name | Data Base Size in MB | Index data in MB |
+--------------------+----------------------+------------------+
| information_schema | 0.00976563 | 0.00000000 |
| mysql | 0.77224922 | 0.00390625 |
| performance_schema | 0.00000000 | 0.00000000 |
| sbtest | 12293.65114307 | 5340.57617188 |
| test | 0.00000000 | 0.00000000 |
+--------------------+----------------------+------------------+
5 rows in set (0.02 sec)

As can seen, the TokuDB is using Quicklz compression, which should give slightly better compression that the InnoDB compressed mechanism. So why does the TOkuDB table appear to have way much more data than the InnoDB table?

Both servers had the same amount of memory and disc space and hardware before the tables were created.

Here is the data size from the datadir:

TokuDB
-rw-rw---- 1 mysql mysql 12M Feb 17 14:48 ibdata1
-rw-rw---- 1 mysql mysql 2.9G Feb 17 12:30 _sbtest_sbtest1_key_pad_25fbaee_3_1d_B_0.tokudb
-rw-rw---- 1 mysql mysql 3.1G Feb 17 12:27 _sbtest_sbtest1_main_11_1_1d_B_0.tokudb
InnoDB
-rw-rw---- 1 mysql mysql 332M Feb 17 14:48 ibdata1
-rw-rw---- 1 mysql mysql 2.3G Feb 17 12:38 sbtest1.ibd

Question 2

Did you check real file sizes in datadir?

Question 3

Yes i did. And the TokuDB has a large main and index table file. The index file is as large as the entire InnoDB. I have updated my question to add this info as well

Question 4

Were pad and c all set to the same value? Or 'random' garbage? (This significantly impacts compressibility.) Also CHAR is not realistic, especially with utf8mb4. -- I point this out because "realism" is a common flaw in benchmarks.

Question 5

pad and c have data generated by sysbench. But i did update some values in pad in both tables, and the updates were the same in both tables. And why is utf8mb4 char not realistic? How does that impact benchmark results if both tables are using utf8mb4?

Question 6

First, let's create a table using the TokuDB engine:

CREATE TABLE `toku_test_table` (
 `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
 `data` VARCHAR(255) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=TokuDB COMPRESSION='quicklz';

In this example, we create a table called toku_test_table using the TokuDB engine with the quicklz compression option.

Now, let's create a table using the InnoDB engine:

CREATE TABLE `innodb_test_table` (
 `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
 `data` VARCHAR(255) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
In this example, we create a table called innodb_test_table using the InnoDB engine with the COMPRESSED row format.

Please note that the actual data size and compression ratios will depend on the specific data you insert into these tables and the configuration of your database server.

To further compare the data size between the two tables, you can use the SHOW TABLE STATUS command:

SHOW TABLE STATUS LIKE 'toku_test_table';
SHOW TABLE STATUS LIKE 'innodb_test_table';

This will provide information about the data size and index size for each table.

warashi nguyen warashi nguyen 2231 silver badge6 bronze badges · Answer 1 · 2023-07-23 22:06:05Z

First, let's create a table using the TokuDB engine:

CREATE TABLE `toku_test_table` (
 `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
 `data` VARCHAR(255) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=TokuDB COMPRESSION='quicklz';

In this example, we create a table called toku_test_table using the TokuDB engine with the quicklz compression option.

Now, let's create a table using the InnoDB engine:

CREATE TABLE `innodb_test_table` (
 `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
 `data` VARCHAR(255) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
In this example, we create a table called innodb_test_table using the InnoDB engine with the COMPRESSED row format.

Please note that the actual data size and compression ratios will depend on the specific data you insert into these tables and the configuration of your database server.

To further compare the data size between the two tables, you can use the SHOW TABLE STATUS command:

SHOW TABLE STATUS LIKE 'toku_test_table';
SHOW TABLE STATUS LIKE 'innodb_test_table';

This will provide information about the data size and index size for each table.

Stack Exchange Network

TokuDB vs InnoDB Database data size

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

TokuDB vs InnoDB Database data size

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions