Stack Exchange’s Architecture in Bullet Points
Kyle Brandt
I thought that, as a break from the normal prose, some of our readers might enjoy a short overview of the Stack Exchange Network (including Stack Overflow, Server Fault, and Super User) from a technical view:
Traffic:
- 95 Million Page Views a Month
- 800 HTTP requests a second
- 180 DNS requests a second
- 55 Megabits per second
Data Centers:
- 1 Rack with Peak Internet in OR (Hosts our chat and Data Explorer)
- 2 Racks with Peer 1 in NY (Hosts the rest of the Stack Exchange Network)
Production Servers*:
- 12 Web Servers (Windows Server 2008 R2)
- 2 Database Servers (Windows Server 2008 R2 and SQL Server 2008 R2)
- 2 Load Balancers (Ubuntu Server and HAProxy)
- 2 Caching Servers (Redis on CentOS)
- 1 Router / Firewall (Ubuntu Server)
- 3 DNS Servers (Bind on CentOS)
Software and Technologies Used:
- C# / .NET
- Windows Server 2008 R2
- SQL Server 2008 R2
- Ubuntu Server
- CentOS
- HAProxy for load balancing
- Redis for caching
- CruiseControl.NET for builds
- Lucene.NET for search
- Bacula for backups
- Nagios (with n2rrd and drraw plugins) for monitoring
- Splunk for logs
- SQL Monitor from Red Gate for SQL Server monitoring
- Mercurial / Kiln for source control
- Bind for DNS
Developers and System Administrators:
- 14 Developers
- 2 System Administrators
*(excludes fail over and management servers)
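As a rough illustration of what the traffic and server counts above imply, here is a back-of-the-envelope sketch. It assumes the 800 requests/s and 55 Mbit/s figures describe the same moment, that all bandwidth is HTTP traffic, and that the load balancers spread requests evenly across the 12 web servers. None of that is stated in the post, so treat the results as ballpark numbers only.

```python
# Figures taken from the post above.
HTTP_REQUESTS_PER_SEC = 800
BANDWIDTH_MBIT_PER_SEC = 55
WEB_SERVERS = 12


def requests_per_web_server(total_rps, servers):
    """Requests per second per server, assuming an even split (an assumption)."""
    return total_rps / servers


def avg_bytes_per_request(mbit_per_sec, rps):
    """Average bytes per request, assuming all bandwidth is HTTP (an assumption)."""
    bytes_per_sec = mbit_per_sec * 1_000_000 / 8
    return bytes_per_sec / rps


per_server = requests_per_web_server(HTTP_REQUESTS_PER_SEC, WEB_SERVERS)
per_request = avg_bytes_per_request(BANDWIDTH_MBIT_PER_SEC, HTTP_REQUESTS_PER_SEC)

print(f"{per_server:.1f} requests/s per web server")
print(f"{per_request / 1000:.1f} kB per request on average")
```

Under those assumptions each web server handles on the order of 67 requests a second, and the average request moves under 9 kB.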
Are you going to post details on the hardware of the various machines?
You can find that detail at http://blog.serverfault.com/post/1432571770/ (there have been some DB upgrades since that post, however).
What are you guys using for version control? I know, boring, but I’m curious.
Oops, forgot about source control 🙂 Updated the post: it is Mercurial (and many use the Kiln interface).
Where is the “updated post”, I don’t see any mention of VCS in the above blog post. 😉
What’s the Ubuntu Server for? HAProxy?
Why Redis on CentOS and not Ubuntu like HAProxy?
Hi Darren,
I actually ran into a problem with Ubuntu Server where it would boot into ‘runlevel unknown’, which has made me a little wary of Ubuntu Server (I was able to fix it, but the problem has been around for a few releases). For my monitoring and backup systems I really favor Ubuntu, as it has a package for everything and I can install new tools easily without hunting for repositories or compiling the programs. For things like the router and HAProxy I will probably opt for CentOS going forward. However, I don’t feel so strongly about this that I felt the need to rebuild the HAProxy servers. That reminds me, I totally forgot to mention our 3 Bind DNS servers. Time to update the post!
If you like Ubuntu’s ‘package for everything’, you might like Arch Linux. It’s a little lighter than Ubuntu, and everything in their repos is up-to-date.
For production, I’d rather stick with a widely accepted stable release than the newest version deemed stable by the package maintainer.
Looks pretty cool to me!
Built in MVC, not web forms, right?
Yup, this is correct, we’re moving to MVC3 now (like right now, chat is already running on it).
Architecture still seems to be favoring scaling-up vs scaling-out, is that correct?
Does the team use any key-value store systems for analytics processing?
Is there any CDN in front of things at all? I’m assuming no, but wanted to verify.
No, we do not use a CDN. All static content is served off the sstatic.net domain hosted in our datacenters.
@Kyle,
Can you expand on the http requests (800/s) vs page views (95m/mo) figures a bit? Is 800/s a peak figure? I’m just curious because 95,000,000/(seconds in a month) is more like 36 hits/second. Is 96% of your traffic really API or Ajax calls? (It’s also possible I’m an idiot.)
Every asset served and AJAX request is also an HTTP request. ‘Page views’ doesn’t include these types of requests.
But really, 96% of your hits are not actual pageviews?
Hey Michael, you might wanna take a look at this: http://www.quantcast.com/p-c1rF4kxgLUzNc Click on ‘month’ and select ‘impressions’ from the left select box
Hi Michael,
Each pageview is going to make multiple HTTP requests. These include requests to our cookie-less sstatic.net static content (if that data has not been cached on the client), as well as things like Ajax requests and API requests. But yes, 800 req/s is the peak number of requests (actually a little higher). From a sysadmin perspective I am mostly interested in my peak figures, as this is what we need to make sure we can handle.
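The arithmetic behind this exchange can be sketched in a few lines. This assumes a 30-day month and compares the monthly average page view rate with the peak request rate, so the ratio overstates how many requests a typical page view triggers.

```python
# Figures quoted in the post and in this comment thread.
PAGE_VIEWS_PER_MONTH = 95_000_000
PEAK_HTTP_REQUESTS_PER_SEC = 800

# Assumption: a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

# Average page views per second over the whole month.
avg_page_views_per_sec = PAGE_VIEWS_PER_MONTH / SECONDS_PER_MONTH

# Peak requests per average page view. Mixing peak and average
# inflates this ratio; it is an upper-bound style estimate.
requests_per_page_view = PEAK_HTTP_REQUESTS_PER_SEC / avg_page_views_per_sec

# Share of requests that are not themselves page views.
non_page_view_share = 1 - avg_page_views_per_sec / PEAK_HTTP_REQUESTS_PER_SEC

print(f"{avg_page_views_per_sec:.1f} page views/s on average")
print(f"{requests_per_page_view:.1f} requests per page view (peak vs. average)")
print(f"{non_page_view_share:.1%} of requests are assets, Ajax, or API calls")
```

That works out to roughly 37 page views a second on average and about 22 requests per page view, which is plausible once static assets, Ajax, and API calls are counted.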
Thanks for clarifying, Kyle. It’s clearly an impressive system–good work!
Are you guys still using LINQ to SQL?
I’d like to know the answer to that too!
Yes, where it makes sense – in other places we’re using raw SQL.
Awesome! Amazing how few servers considering the popularity/throughput – Good admin! 🙂
What does Bacula back up in your architecture?
nicely done. Not into the MS stack but nice job anyway.
I bet those are beefy servers whose specs have been conveniently left out. Look, we’re only on two servers and it only cost us $100K + hardware costs. And I’m guessing $100K is nowhere near the actual cost since you have to pay by CPU or CAL. Pooh on the open source guys, they need a bunch of $10K servers to get the same results.
I’m guessing you have not seen statistics for some sites in similar usage to SO – You would be surprised…
I know it is kind of late to ask, but could you elaborate on how we would be surprised?
Is the amount of hardware high or low compared to other sites? What is the usage scenario on these other sites: infrequent updates and relatively static page growth, or many small updates every second (like on SO)? How is the search being used? Thanks for any feedback.
We’re very open about the hardware…we simply didn’t repeat it in this blog post. You can find the complete specs in an earlier post on this very blog: http://blog.serverfault.com/post/1432571770/
dead link
Thanks for the heads up – at some point that link format died. I’ve updated to the current link here: http://blog.serverfault.com/2010/10/29/1432571770/
I am curious about how you are using Bacula for backup. Are you backing up to tape or disk to tape? I assume the actual backup server is running Linux.
We do both. We back up to disk for fast retrieval and to tape for historical archiving.
Really like this post. Generous gift out to the community.
Hi there, you have 2 HAProxy load balancers but then say “* excludes fail over”. I assume 1 at each DC? Do you have any failover for the load balancers, and how do you manage them (heartbeat?)? Why not use Peer1’s ‘community’ load balancer? Any issues using your own?
Question: Why prefer splunk over developing your own solution?
My guess is that it’s substantially cheaper (currently) to use an existing tool that has support than it is to roll their own
Log analysis and event correlation is a difficult problem and massive effort to build yourself. Why not default to a pre-existing product? Splunk seems affordable.
Hi Kyle thanks for posting. Would love to know how you decided on Redis vs. the alternatives like Memcached, Microsoft’s alternative, etc?
Also what are some of your use cases for Splunk? Love it.
Do you use ASP.NET MVC?
-One- firewall? What happens if it breaks or gets rebooted? Have you considered dual ASAs in stateful active/passive failover, or something like CARP at least?
Fine print at the bottom: *(excludes fail over and management servers)
🙂
95M page views per month on only 12 app servers and 2 DB servers is truly impressive! Nice work!
More info: http://highscalability.com/blog/2011/3/3/stack-overflow-architecture-update-now-at-95-million-page-vi.html
Are you guys using any sort of message queuing for background tasks?
For example, sending emails, resizing images (if you needed to do this), etc. Anything that doesn’t have to be done when a web server request is made.
Kyle, how are you making your SQL Servers highly available? Are they set up in a cluster or are you using mirroring?
Very good information, thanks! It is nice to see stuff that can be applied to the not google/facebook scale websites as well!
Thanks for the post. I don’t know if it is right to ask, but what best practices do you follow when calculating the badges and so on? Is it a background task, or is it done when people click on the points link or profile page?
Awesome! I’m not a system administrator. Why do 180 DNS requests a second need 3 DNS servers?
Awesome! What are the page views a month now, in 2012?
Best,
Do you use 12 physical web servers? If so, what hardware configuration do they have?
Is the application centralized? How do the servers communicate with the application?
Would be great if you updated this to current…
That was fun. The servers and how do they work?