I'm not seeing the Postgres write performance increases I thought I would with a single SSD vs a hardware RAID 10 array of (16) 15k RPM SAS drives.
I have a Dell R820 with a PERC H700 hardware RAID card and 16 15k RPM SAS drives in a RAID 10 array, as well as an 800GB Intel s3700 SSD. The server has 128GB of RAM and 64 cores of Xeon E5-4640 at 2.40GHz, running CentOS 6.4 and Postgres 9.2.4.
I'm using pgbench to compare the SAS drives in a RAID 10 array to the single SSD.
15k RPM SAS RAID 10 Results
    pgbench -U postgres -p 5432 -T 50 -c 10 pgbench
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 1
    query mode: simple
    number of clients: 10
    number of threads: 1
    duration: 50 s
    number of transactions actually processed: 90992
    tps = 1819.625430 (including connections establishing)
    tps = 1821.417384 (excluding connections establishing)
Single Intel s3700 SSD Results
    pgbench -U postgres -p 5444 -T 50 -c 10 pgbench
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 1
    query mode: simple
    number of clients: 10
    number of threads: 1
    duration: 50 s
    number of transactions actually processed: 140597
    tps = 2811.687286 (including connections establishing)
    tps = 2814.578386 (excluding connections establishing)
In real world usage we have a very write-intensive process that takes about 7 minutes to complete, and the RAID 10 array and SSD are within 10 or 15 seconds of each other.
I expected far better performance from the SSD.
Here are Bonnie++ results for the SSD:
    Version 1.96         ------Sequential Output------ --Sequential Input- --Random-
    Concurrency   1      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine         Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    openlink2.rady  252G   532  99 375323  97 183855  45  1938  99 478149  54 +++++ +++
    Latency              33382us   82425us     168ms   12966us   10879us   10208us
    Version 1.96         ------Sequential Create------ --------Random Create--------
    openlink2.radyn.com  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                  files   /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                     16   5541  46 +++++ +++ +++++ +++ 18407  99 +++++ +++ +++++ +++
    Latency               1271us    1055us    1157us     456us      20us     408us
Here are Bonnie++ results for the RAID 10 15k RPM drives:
    Version 1.96         ------Sequential Output------ --Sequential Input- --Random-
    Concurrency   1      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine         Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    openlink2.rady  252G   460  99 455060  98 309526  56  2156  94 667844  70 197.9  85
    Latency              37811us   62175us     393ms   75392us     169ms   17633us
    Version 1.96         ------Sequential Create------ --------Random Create--------
    openlink2.radyn.com  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                  files   /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                     16  12045  95 +++++ +++ +++++ +++ 16851  98 +++++ +++ +++++ +++
    Latency               7879us     504us     555us     449us      24us     377us
Here are dd results for the SSD:
    dd if=/dev/zero of=/path/on/ssd bs=1M count=4096 conv=fdatasync,notrunc
    4294967296 bytes (4.3 GB) copied, 12.7438 s, 337 MB/s
And here are dd results for the RAID 10 15k RPM drives:
    dd if=/dev/zero of=/path/on/array bs=1M count=4096 conv=fdatasync,notrunc
    4294967296 bytes (4.3 GB) copied, 8.45972 s, 508 MB/s
I'd post the Postgres config, but since it's clear the SSD isn't outperforming the RAID 10 array, it doesn't seem applicable.
So is the SSD performing as it should be?
Or is the RAID 10 with 16 fast drives just so good that it outperforms a single SSD? A RAID 10 array of the SSDs would be awesome, but at 2,000ドル each, the 8,000ドル price tag is hard to justify (unless we were confident of seeing the 2x to 5x real-world gains we were hoping for).
================= Update =================
It turns out we have 16 SAS drives in the array, not 8, so the combined throughput of the array is considerably higher than I originally reported.
Here are iozone benchmarks which shed more light. The RAID10 array produces better results pretty consistently. 4 or 8 of the SSD's in RAID 10 would likely beat the SAS array (at a steep price, of course).
SSD benchmark http://pastebin.com/vEMHCQhR
16 drive RAID-10 benchmark http://pastebin.com/LQNrm7tT
Here is the Postgres config for the SSD, in case anyone sees any room for improvement to take advantage of the SSD http://pastebin.com/Qsb3Ks7Y
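For what it's worth, two settings that are commonly adjusted for SSD-backed clusters are `random_page_cost` and `effective_io_concurrency`. The values below are not taken from the linked config; they are illustrative assumptions to benchmark against your own workload, not recommendations:

```
# postgresql.conf fragment -- illustrative SSD-oriented values, shown as a
# sketch only; validate each change with pgbench before keeping it.
random_page_cost = 1.1         # default 4.0 assumes spinning disks; on an SSD
                               # random reads cost nearly the same as sequential
effective_io_concurrency = 4   # an SSD services more concurrent requests than
                               # a single spindle
```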
3 Answers
I'm not really sure this is a problem in itself, because, as you can see, a single SSD drive can outperform an 8 disk RAID 10 setup in many tests.
Almost all the tests point to better performance from the single SSD drive:
- better latencies
- lower CPU usage (if I'm reading correctly, in some cases it's 44% vs 95%)
- transactions per second are about 55% higher
- total transactions processed are higher by the same ~55%
The SSD was outperformed in only one case: sequential writes. That pattern is more typical of batch loads than of an OLTP-style workload, so if your writes are mostly sequential, maybe a single SSD isn't the right solution for you right now.
And we're not talking about Fusion-IO drives here (which I suspect might deliver the next level you're expecting, but at a next-level price).
Speaking as a DBA who has had to work with poor storage over the years, this is a fair advance in technology, and SSDs seem to work properly; then again, maybe I have set my expectations too low.
I would expect to see more improvements from your SSD in testing the benchmark with more threads and with higher concurrency, as this is where the SSDs shine. So if you could repeat your tests with way more clients and more threads, I'd be curious about that comparison result.
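A minimal sketch of such a higher-concurrency run; the client and thread counts below are illustrative values derived from the core count, not tuned recommendations, and port 5444 is the SSD cluster from the question:

```shell
# Scale pgbench concurrency to the machine rather than hard-coding 10 clients.
# -c = client connections, -j = pgbench worker threads, -T = duration (seconds).
CLIENTS=$(( $(nproc) * 2 ))   # oversubscribe connections relative to cores
THREADS=$(nproc)              # one pgbench worker thread per core
pgbench -U postgres -p 5444 -T 300 -c "$CLIENTS" -j "$THREADS" pgbench
```

With 64 cores that would drive 128 clients through 64 worker threads, which should expose the SSD's advantage at queue depths a single spindle array can't sustain.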
The SSD is performing admirably, considering it's going up against the combined bandwidth of 16 SAS drives (I incorrectly said 8 originally). The iozone benchmarks, especially viewed in an Excel chart, reveal that a single SSD isn't keeping up in most ways, sometimes by a wide margin. Do you think 4 or 8 SSDs in RAID 10 would be faster than the 16 SAS drives in RAID 10 for sequential writes? — user1517922, Jun 26, 2013 at 16:58
A couple of possibilities:
- Your machine has a lot of RAM. It is possible that this is fulfilling a large portion of the I/O requests from cache, which would smooth over any differences in performance to a greater or lesser extent.
- If your I/O workload is primarily sequential then a HDD array with a large stripe size will give good sequential throughput. With a 256k stripe you can get 600MB/sec off an array of 10 15k drives, which is faster than the listed write performance of a single S3700.
We do have a lot of RAM, and Postgres is set up to use about 32GB of it (plenty to fit our entire 8GB database and all indexes into RAM), so read performance is essentially the same. I think it will take at least a 4-SSD RAID 10 array or PCIe flash to beat the write performance we have now. — user1517922, Jun 26, 2013 at 16:54
Instead of putting all your eggs in one basket (all SSD or all HDD), you should consider a hybrid disk layout. What do I mean?
- HDDs are great for writes, especially with write caching enabled
- SSDs are very good for reads, OK for random writes, but kind of sluggish on sequential writes
- It's obvious that time has to be spent writing to the transaction logs in the `pg_xlog` folder, and transaction logs are always written sequentially.
SUGGESTION
Perhaps you should mount `pg_xlog` on the RAID 10 SAS drives. This may help cut down on the number of SAS drives you actually need. Everything else can remain on SSD. That way:
- Your data is ready for fast reads
- The writing to transaction logs can be on separate, yet faster, HDDs.
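A sketch of relocating `pg_xlog` with a symlink; the paths and service name below are assumptions, so substitute your actual data directory and SAS mount point. The cluster must be stopped first, since moving WAL files under a running server will corrupt it:

```shell
# Assumed locations -- adjust for your installation.
PGDATA=/var/lib/pgsql/9.2/data     # PostgreSQL data directory (assumption)
SAS_WAL=/mnt/sas_raid10/pg_xlog    # target directory on the RAID 10 array (assumption)

service postgresql stop                      # never move WAL while the server runs
mv "$PGDATA/pg_xlog" "$SAS_WAL"              # move existing WAL segments to the SAS array
ln -s "$SAS_WAL" "$PGDATA/pg_xlog"           # leave a symlink behind in the data directory
chown -h postgres:postgres "$PGDATA/pg_xlog" # symlink itself must be owned by postgres
service postgresql start
```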
Give it a Try !!!
UPDATE 2013年06月27日 07:38 EDT
Here is a chart of all the folders within the PostgreSQL data directory:
| Item | Description |
| --- | --- |
| `PG_VERSION` | A file containing the major version number of PostgreSQL |
| `base` | Subdirectory containing per-database subdirectories |
| `global` | Subdirectory containing cluster-wide tables, such as `pg_database` |
| `pg_clog` | Subdirectory containing transaction commit status data |
| `pg_multixact` | Subdirectory containing multitransaction status data (used for shared row locks) |
| `pg_notify` | Subdirectory containing LISTEN/NOTIFY status data |
| `pg_stat_tmp` | Subdirectory containing temporary files for the statistics subsystem |
| `pg_subtrans` | Subdirectory containing subtransaction status data |
| `pg_tblspc` | Subdirectory containing symbolic links to tablespaces |
| `pg_twophase` | Subdirectory containing state files for prepared transactions |
| `pg_xlog` | Subdirectory containing WAL (Write Ahead Log) files |
Since moving `pg_xlog` to SAS improved things 10%, try moving other parts of the ACID machinery to SAS as well. Perhaps moving `pg_twophase`, `pg_multixact`, and `pg_clog` may help too.
Thanks. Moving `pg_xlog` to the RAID 10 array did increase performance of the SSD Postgres cluster, by about 10%. — user1517922, Jun 26, 2013 at 16:52
Try `pg_test_fsync`, and rather than using `dd`, use `sysbench`'s disk I/O tests.
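Illustrative invocations of both; the flags shown match the tool versions of that era (pg_test_fsync ships with PostgreSQL, sysbench 0.4.x syntax), so check your installed versions before relying on them:

```shell
# Measure raw fsync rates where the WAL would live; -f points pg_test_fsync
# at a file on the device under test (path reuses the one from the question).
pg_test_fsync -f /path/on/ssd/test.out

# Random-write file I/O with sysbench; run the same three steps against the
# SAS mount to compare. 4G file set matches the dd test size above.
sysbench --test=fileio --file-total-size=4G prepare
sysbench --test=fileio --file-total-size=4G --file-test-mode=rndwr \
         --max-time=60 --max-requests=0 run
sysbench --test=fileio --file-total-size=4G cleanup
```

`pg_test_fsync` is the more relevant of the two here, since WAL flush rate, not raw bandwidth, is what bounds pgbench's TPC-B-style commits.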