
There appear to be several options available to programs that handle large numbers of socket connections (such as web services, p2p systems, etc).

  1. Spawn a separate thread to handle I/O for each socket.
  2. Use the select system call to multiplex the I/O into a single thread.
  3. Use the poll system call to multiplex the I/O (replacing the select).
  4. Use the epoll system calls to avoid having to repeatedly pass socket fds across the user/kernel boundary.
  5. Spawn a number of I/O threads that each multiplex a relatively small set of the total number of connections using the poll API.
  6. As per #5 except using the epoll API to create a separate epoll object for each independent I/O thread.

On a multicore CPU I would expect that #5 or #6 would have the best performance, but I don't have any hard data backing this up. Searching the web turned up a page describing the author's experience testing approaches #2, #3, and #4 above. Unfortunately, that page appears to be around seven years old, with no obvious recent updates to be found.

So my question is which of these approaches have people found to be most efficient and/or is there another approach that works better than any of those listed above? References to real life graphs, whitepapers and/or web available writeups will be appreciated.

latonz
asked Sep 27, 2008 at 1:12
  • I think this is a solved problem and the answer is here - kegel.com/c10k.html Commented Sep 27, 2008 at 1:26

4 Answers


Speaking from my experience running large IRC servers, we used to use select() and poll() (because epoll()/kqueue() weren't available). At around 700 simultaneous clients, the server would be using 100% of a CPU (the IRC server wasn't multithreaded). Interestingly, though, the server would still perform well. At around 4,000 clients, the server would start to lag.

The reason for this was that at around 700-ish clients, when we got back from select() there would typically be just one client ready for processing, and the for() loop scanning to find out which client it was would eat up most of the CPU. As we got more clients, each call to select() would return more and more ready clients, so the cost of the scan was amortised over more useful work and we became more efficient per client.

Moving to epoll()/kqueue(), similarly spec'd machines would trivially deal with 10,000 clients, and some (admittedly more powerful machines, but still machines that would be considered tiny by today's standards) have held 30,000 clients without breaking a sweat.

Experiments I've seen with SIGIO seem to suggest it works well for applications where latency is extremely important, where there are only a few active clients doing very little individual work.

I'd recommend using epoll()/kqueue() over select()/poll() in almost any situation. I've not experimented with splitting clients between threads. To be honest, I've never found a service that needed enough optimisation work on the front-end client processing to justify experimenting with threads.

answered Dec 30, 2008 at 11:32


I have spent the last two years working on that specific issue (for the G-WAN web server, which comes with many benchmarks and charts exposing all this).

The model that works best under Linux is epoll with one event queue (and, for heavy processing, several worker threads).

If you have little processing (low processing latency), then using one thread will be faster than using several threads.

The reason for this is that epoll does not scale on multi-core CPUs (using several concurrent epoll queues for connection I/O in the same user-mode application will just slow down your server).

I have not looked seriously at epoll's code in the kernel (I have only focused on user mode so far), but my guess is that the epoll implementation in the kernel is crippled by locks.

This is why using several threads quickly hits a wall.

It goes without saying that such a poor state of things should not last if Linux wants to keep its position as one of the best performing kernels.

answered Jun 22, 2011 at 12:45



From my experience, you'll get the best performance with #6.

I also recommend you look into libevent to deal with abstracting some of these details away. At the very least, you'll be able to see some of their benchmark results.

Also, how many sockets are you talking about? Your approach probably doesn't matter much until you start getting to at least a few hundred sockets.

answered Sep 30, 2008 at 20:46



I use epoll() extensively, and it performs well. I routinely have thousands of sockets active, and have tested with up to 131,072 sockets; epoll() handled it every time.

I use multiple threads, each of which polls on a subset of the sockets. This complicates the code, but takes full advantage of multi-core CPUs.

answered Sep 29, 2008 at 16:44

