Scaling Digg and Other Web Applications
Todd Hoff
Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call.
In this first part of the post Joe shares some timeless wisdom that you may or may not have read before. I of course take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me however was how Joe thought MemcacheDB Will be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. Well learn why Joe is so excited by MemcacheDB at the end of the post.
Impressive Stats
Scaling Strategies
IO. Bottlenecks aren't in the language when you are handling so many simultaneous requests. Making PHP 300% faster won't matter. Don't optimize PHP by using single quotes instead of double quotes when
the database is pegged.
- This approach pushes chunks of processing to another service and let's that service schedule the processing on a grid of processors.
- It's faster and more responsive than cron and only slightly less responsive than real-time.
- For example, issuing 5 synchronous database requests slows you down. Do them in parallel.
- Digg uses Gearman. An example use is to get a permalink. Three operations are done parallel: get the current logged, get the permalink, and grab the comments. All three are then combined to return a combined single answer to the client. It's also used for site crawling and logging. It's a different way of thinking.
- See Flickr - Do the Essential Work Up-front and Queue the Rest and The Canonical Cloud Architecture for more information.
- denormalize
- avoid joins
- avoid large scans across databases by partitioning
- cache
- add read slaves
- don't use NFS
Miscellaneous
decisions. Trust that people have it handled and they'll take care of it. Cuts down on meetings because you know people will do the job right.
/1.0/service/id/xml. Version both internal and external services.
MemcacheDB: Evolutionary Step for Code, Revolutionary Step for Performance
Imagine Kevin Rose, the founder of Digg, who at the time of this presentation had 40,000 followers. If Kevin diggs just once a day that's 40,000 writes. As the most active diggers are the most followed it becomes a huge performance bottleneck. Two problems appear.You can't update 40,000 follower accounts at once. Fortunately the queuing system we talked about earlier takes care of that.
The second problem is the huge number of writes that happen. Digg has a write problem. If the average user has 100 followers that’s 300 million diggs day. That's 3,000 writes per second, 7GB of storage per day, and 5TB of data spread across 50 to 60 servers.
With such a heavy write load MySQL wasn’t going to work for Digg. That’s where MemcacheDB comes in. In Initial tests on a laptop MemcacheDB was able to handle 15,000 writes a second. MemcacheDB's own benchmark shows it capable of 23,000 writes/second and 64,000 reads/second. At those write rates it's easy to see why Joe was so excited about MemcacheDB's ability to handle their digg deluge.
What is MemcacheDB? It's a distributed key-value storage system designed for persistent. It is NOT a cache solution, but a persistent storage engine for fast and reliable key-value based object storage and retrieval. It conforms to memcache protocol(not completed, see below), so any memcached client can have connectivity with it. MemcacheDB uses Berkeley DB as a storing backend, so lots of features including transaction and replication are supported.
Before you get too excited keep in mind this is a key-value store. You read and write records by a single key. There aren't multiple indexes and there's no SQL. That's why it can be so fast.
Digg uses MemcacheDB to scale out the huge number of writes that happen when data is denormalized. Remember it's a key-value store. The value is usually a complete application level object merged together from a possibly large number of normalized tables. Denormalizing introduces redundancies because you are keeping copies of data in multiple records instead of just one copy in a nicely normalized table. So denormalization means a lot more writes as data must be copied to all the records that contain a copy. To keep up they needed a database capable of handling their write load. MemcacheDB has the performance, especially when you layer memcached's normal partitioning scheme on top.
I asked Joe why he didn't turn to one of the in-memory data grid solutions? Some of the reasons were:
So it's an evolutionary step for code and a revolutionary step for performance. Digg is looking at using MemcacheDB across the board.