Stack Exchange’s Architecture in Bullet Points

Kyle Brandt

I thought that, as a break from the normal prose, some of our readers might enjoy a short overview of the Stack Exchange Network (including Stack Overflow, Server Fault, and Super User) from a technical view:

Traffic:

95 Million Page Views a Month
800 HTTP requests a second
180 DNS requests a second
55 Megabits per second

Data Centers:

1 Rack with Peak Internet in OR (Hosts our chat and Data Explorer)
2 Racks with Peer 1 in NY (Hosts the rest of the Stack Exchange Network)

Production Servers*:

12 Web Servers (Windows Server 2008 R2)
2 Database Servers (Windows Server 2008 R2 and SQL Server 2008 R2)
2 Load Balancers (Ubuntu Server and HAProxy)
2 Caching Servers (Redis on CentOS)
1 Router / Firewall (Ubuntu Server)
3 DNS Servers (Bind on CentOS)

Software and Technologies Used:

C# / .NET
Windows Server 2008 R2
SQL Server 2008 R2
Ubuntu Server
CentOS
HAProxy for load balancing
Redis for caching
CruiseControl.NET for builds
Lucene.NET for search
Bacula for backups
Nagios (with n2rrd and drraw plugins) for monitoring
Splunk for logs
SQL Monitor from Red Gate for SQL Server monitoring
Mercurial / Kiln for source control
Bind for DNS

Developers and System Administrators:

14 Developers
2 System Administrators

*(excludes fail over and management servers)

Posted by Kyle Brandt (@kylembrandt) on February 11th, 2011
Filed under Hardware, Software


phsr

Are you going to post details on the hardware of the various machines?
- Kyle Brandt
  
  You can find that detail at http://blog.serverfault.com/post/1432571770/ (there have been some DB upgrades since that post, however).
Chamiltongt

What are you guys using for version control? I know, boring, but I’m curious.
- Kyle Brandt
  
  Oops, forgot about source control 🙂 Updated the post: it is Mercurial (and many use the Kiln interface).
  - Guest
    
    Where is the “updated post”, I don’t see any mention of VCS in the above blog post. 😉
Josh Kodroff

What’s the Ubuntu Server for? HAProxy?
Darren Newton

Why Redis on CentOS and not Ubuntu like HAProxy?
- Kyle Brandt
  
  Hi Darren,
  
  I actually ran into a problem with Ubuntu Server where it would boot into ‘runlevel unknown’, which has made me a little wary of Ubuntu Server (I was able to fix it, but the problem has been around for a few releases). For my monitoring and backup systems I really favor Ubuntu, as it has a package for everything and I can install new tools easily without hunting for repositories or compiling the programs. For things like the router and HAProxy I will probably opt for CentOS going forward. However, I don’t feel so strongly about this that I felt the need to rebuild the HAProxy servers. That reminds me, I totally forgot to mention our 3 Bind DNS servers; time to update the post!
  - Charlie
    
    If you like Ubuntu’s ‘package for everything’, you might like Arch Linux. It’s a little lighter than Ubuntu, and everything in their repos is up-to-date.
    - RyanD
      
      for production, I’d rather stick with a widely accepted stable release than the newest version deemed stable by the package maintainer.
Matt Phillips

Looks pretty cool to me!
masebase

Built in MVC, not web forms, right?
- Nick Craver
  
  Yup, this is correct, we’re moving to MVC3 now (like right now, chat is already running on it).
Faizan Javed Ph.D.

Architecture still seems to be favoring scaling-up vs scaling-out, is that correct?

Does the team use any key-value store systems for analytics processing?
wdh

Is there any CDN in front of things at all? I’m assuming no, but wanted to verify.
- George Beech
  
  No we do not use a CDN. All static content is served off the sstatic.net domain hosted in our datacenters.
Michael Haren

@Kyle,

Can you expand on the http requests (800/s) vs page views (95m/mo) figures a bit? Is 800/s a peak figure? I’m just curious because 95,000,000/(seconds in a month) is more like 36 hits/second. Is 96% of your traffic really API or Ajax calls? (It’s also possible I’m an idiot.)
- Alex Kahn
  
  Every asset served and AJAX request is also an HTTP request. ‘Page views’ doesn’t include these types of requests.
  - Michael Haren
    
    But really, 96% of your hits are not actual pageviews?
    - Richard
      
      Hey Michael, you might wanna take a look at this: http://www.quantcast.com/p-c1rF4kxgLUzNc Click on ‘month’ and select ‘impressions’ from the left select box
- Kyle Brandt
  
  Hi Michael,
  
  Each pageview is going to make multiple HTTP requests. These include requests to our cookie-less sstatic.net static content (if that data has not been cached on the client) as well as things like Ajax requests and API requests. But yes, 800 req/s is the peak number of requests (actually a little higher). From a sysadmin perspective I am mostly interested in my peak figures, as this is what we need to make sure we can handle.
  - Michael Haren
    
    Thanks for clarifying, Kyle. It’s clearly an impressive system–good work!
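
The arithmetic in this thread can be checked with a short sketch. It assumes a 30-day month (the post does not say how a month is counted) and treats 800 requests per second as the peak figure Kyle describes; the variable names are illustrative only.

```python
# Rough check of the page-view vs. HTTP-request figures discussed above.
# Assumption: a 30-day month; the post does not define "month".
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

page_views_per_month = 95_000_000   # from the post
peak_http_requests_per_sec = 800    # from the post (a peak figure, per Kyle)

# Average page views per second over the whole month.
avg_page_views_per_sec = page_views_per_month / SECONDS_PER_MONTH

# Share of requests that would not be page views, if the monthly average
# page-view rate is compared directly against the peak request rate.
non_page_view_share = 1 - avg_page_views_per_sec / peak_http_requests_per_sec

print(f"average page views/s: {avg_page_views_per_sec:.1f}")
print(f"non-page-view share:  {non_page_view_share:.1%}")
```

Under these assumptions the result is close to Michael's estimate of about 36 page views per second and roughly 96%. Because this compares a monthly average against a peak, it likely overstates the share of requests that are not page views.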
Scott R. Jones

Are you guys still using LINQ to SQL?
- Kyle West
  
  I’d like to know the answer to that too!
- Stack Exchange
  
  Yes, where it makes sense – in other places we’re using raw SQL.
wil

Awesome! Amazing how few servers considering the popularity/throughput – Good admin! 🙂
ahm

What does Bacula back up in your architecture?
render

nicely done. Not into the MS stack but nice job anyway.
Linux Thegreat

I bet those are beefy servers whose specs have been conveniently left out. Look we’re only on two servers and it only cost us $100K + hardware costs. And I’m guessing $100K is nowhere near the actual cost since you have to pay by CPU or CAL. Pooh on the open source guys, they need a bunch of $10K servers to get the same results.
- wil
  
  I’m guessing you have not seen statistics for some sites in similar usage to SO – You would be surprised…
  - sm
    
    I know it is kind of late to ask it – but could you elaborate how we would be surprised?
    
    Is the amount of hardware high or low compared to other sites? What is the usage scenario on these other sites: seldom updates and a relatively static page growth, or many small updates every second (like on SO)? How is the search being used? Thanks for any feedback.
- Nick Craver
  
  We’re very open about the hardware…we simply didn’t repeat it in this blog post. You can find the complete specs in an earlier post on this very blog: http://blog.serverfault.com/post/1432571770/
  - Demetri Obenour
    
    dead link
    - Nick Craver
      
      Thanks for the heads up – at some point that link format died. I’ve updated to the current link here: http://blog.serverfault.com/2010/10/29/1432571770/
Railmeat

I am curious about how you are using Bacula for backup. Are you backing up to tape or disk to tape? I assume the actual backup server is running Linux.
- George Beech
  
  We do both. We back up to disk for fast retrieval and to tape for historical archiving.
James

Really like this post. Generous gift out to the community.
Nick

Hi there, you have 2 HAProxy load balancers but then say * excludes fail over. I assume 1 at each DC? Do you have any failover for the load balancers, and how do you manage them (heartbeat?)? Why not use Peer1’s ‘community’ load balancer? Any issues using your own?
NotNow

Question: Why prefer splunk over developing your own solution?
- Warren
  
  My guess is that it’s substantially cheaper (currently) to use an existing tool that has support than it is to roll their own
- Matt
  
  Log analysis and event correlation is a difficult problem and massive effort to build yourself. Why not default to a pre-existing product? Splunk seems affordable.
Matt

Hi Kyle thanks for posting. Would love to know how you decided on Redis vs. the alternatives like Memcached, Microsoft’s alternative, etc?

Also what are some of your use cases for Splunk? Love it.
Iamxjb

Do you use ASP.NET MVC?
pauska

-One- firewall? What happens if it breaks or gets rebooted? Have you considered dual ASAs in stateful active/passive failover, or something like CARP at least?
- George Beech
  
  Fine print at the bottom: *(excludes fail over and management servers)
  
  🙂
Todd Huss

95M page views per month on only 12 app servers and 2 DB servers is truly impressive! Nice work!
Iamxjb

More info: http://highscalability.com/blog/2011/3/3/stack-overflow-architecture-update-now-at-95-million-page-vi.html
Diego Barros

Are you guys using any sort of message queuing for background tasks?

For example, sending emails, resizing images (if you needed to do this), etc. Anything that doesn’t have to be done when a web server request is made.
Richard

Kyle, how are you making your SQL Servers highly available? Are they set up in a cluster or are you using mirroring?
Anonymous

Very good information, thanks! It is nice to see stuff that can be applied to the not google/facebook scale websites as well!
Joy George

Thanks for the post. I don’t know if it is right to ask, but what best practices do you follow when calculating the badges and all? Is it a background task, or is it done when people click on the points link or profile page?
Kaye

Awesome! I’m not a system administrator. Why do 180 DNS requests a second need 3 DNS servers?
Deivid Martins

Awesome! What are the page views a month now, in 2012?

Best,
Novkovski Stevo Bato

Do you use 12 physical web servers? If so, what hardware configuration do they have?
Erick Machado Alvarez

Is the application centralized? How do the servers communicate with the application?
Josh

Would be great if you updated this to current…
shahyad

That was fun. The servers and how do they work?
