Sunday, April 22, 2018
MyRocks, malloc and fragmentation -- a strong case for jemalloc
While trying to reproduce a MyRocks performance problem, I ran a test using a 4gb block cache with both jemalloc and glibc malloc. The test server uses Ubuntu 16.04, which has glibc 2.23 today. The table below lists the VSZ and RSS values for the mysqld process after a test table has been loaded. RSS with glibc malloc is 2.6x larger than with jemalloc (12.4gb vs 4.8gb). MyRocks and RocksDB are much harder on an allocator than InnoDB, and this test shows the value of jemalloc.
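On Linux, VSZ and RSS for a process such as mysqld can be read from /proc/<pid>/status, where the kernel reports them as VmSize and VmRSS in kB. The sketch below shows one way to collect them, assuming Linux and read access to the target process. It is not necessarily how the numbers in the table were gathered. The uses_jemalloc helper is a hypothetical heuristic that looks for a mapped libjemalloc shared library, so it will miss a jemalloc that was statically linked into mysqld.

```python
import os
import sys


def read_vm_sizes(pid: int) -> dict:
    """Return VmSize (VSZ) and VmRSS (RSS) in GiB for a Linux process.

    Reads /proc/<pid>/status, where both fields are reported in kB.
    """
    sizes = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                # value looks like "   123456 kB"
                kb = int(value.split()[0])
                sizes[key] = kb / (1024 * 1024)
    return sizes


def uses_jemalloc(pid: int) -> bool:
    """Heuristic only: True if a libjemalloc shared object is mapped.

    Returns False for a statically linked jemalloc.
    """
    with open(f"/proc/{pid}/maps") as f:
        return any("libjemalloc" in line for line in f)


if __name__ == "__main__":
    # Pass the mysqld pid as the first argument; defaults to this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    sizes = read_vm_sizes(pid)
    print(f"VSZ(gb) {sizes['VmSize']:.1f}  RSS(gb) {sizes['VmRSS']:.1f}  "
          f"jemalloc mapped: {uses_jemalloc(pid)}")
```

Usage: `python3 vmsizes.py $(pidof mysqld)` after the load finishes. `ps -o vsz,rss -p <pid>` reports the same two values in KiB.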
I am not sure that it is possible to use a large RocksDB block cache with glibc malloc, where large means that it gets about 80% of RAM.
I previously shared results for MySQL and for MongoDB. There have been improvements over the past few years to make glibc malloc perform better on many-core servers. I don't know whether that work also made it better at avoiding fragmentation.
VSZ(gb)  RSS(gb)  malloc
7.9      4.8      jemalloc-3.6.0
13.6     12.4     glibc-2.23
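A quick check of the arithmetic behind the 2.6x figure, using only the numbers in the table. The rss_beyond_cache helper is a hypothetical addition that assumes the 4gb block cache was full after the load. The remainder it reports covers all other mysqld memory, not only allocator fragmentation.

```python
# Values copied from the table above (gb) for mysqld after the load.
RESULTS = {
    "jemalloc-3.6.0": {"vsz_gb": 7.9, "rss_gb": 4.8},
    "glibc-2.23": {"vsz_gb": 13.6, "rss_gb": 12.4},
}
BLOCK_CACHE_GB = 4.0  # block cache size used in the test


def ratio(metric: str) -> float:
    """Return the glibc-2.23 value divided by the jemalloc-3.6.0 value."""
    return RESULTS["glibc-2.23"][metric] / RESULTS["jemalloc-3.6.0"][metric]


def rss_beyond_cache(allocator: str) -> float:
    """RSS not explained by the block cache, assuming the cache is full.

    This includes all other mysqld memory, not only fragmentation.
    """
    return RESULTS[allocator]["rss_gb"] - BLOCK_CACHE_GB


if __name__ == "__main__":
    print(f"RSS ratio glibc/jemalloc: {ratio('rss_gb'):.1f}x")  # 2.6x
    print(f"VSZ ratio glibc/jemalloc: {ratio('vsz_gb'):.1f}x")  # 1.7x
    for name in RESULTS:
        print(f"{name}: {rss_beyond_cache(name):.1f}gb of RSS beyond the block cache")
```

Under that assumption, jemalloc adds about 0.8gb beyond the cache while glibc malloc adds about 8.4gb.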
2 comments:
Have you tried huge pages via https://github.com/facebook/rocksdb/wiki/Allocating-Some-Indexes-and-Bloom-Filters-using-Huge-Page-TLB and also tuned arena_block_size? Moving indexes to huge pages should reduce your fragmentation. arena_block_size also helps force use of the RocksDB private allocator instead of malloc/jemalloc.
I am interested in reading results from such tuning, but I won't run those tests.
I am wary of depending on huge pages. I have smart friends who prefer we don't use them on production servers. I am also wary of tuning malloc. We already have too much tuning in RocksDB, so I don't want to extend that cost to more things.
I have yet to repeat tests to determine the impact of arena_block_size. It is 8mb on MyRocks in Percona Server 5.7.21. That seems large enough to avoid fragmentation, but I don't have time to determine the impact of changing it. Eventually we run out of time and HW for running perf tests and need systems that don't require so much tuning.