
There appear to be several options available to programs that handle large numbers of socket connections (such as web services, p2p systems, etc).

  1. Spawn a separate thread to handle I/O for each socket.
  2. Use the select system call to multiplex the I/O into a single thread.
  3. Use the poll system call to multiplex the I/O (replacing the select).
  4. Use the epoll system calls to avoid having to repeatedly pass socket fds across the user/kernel boundary.
  5. Spawn a number of I/O threads that each multiplex a relatively small set of the total number of connections using the poll API.
  6. As per #5 except using the epoll API to create a separate epoll object for each independent I/O thread.

On a multicore CPU I would expect that #5 or #6 would have the best performance, but I don't have any hard data backing this up. Searching the web turned up a page describing the author's experience testing approaches #2, #3, and #4 above. Unfortunately, that page appears to be around seven years old, with no obvious recent updates to be found.

So my question is which of these approaches have people found to be most efficient and/or is there another approach that works better than any of those listed above? References to real life graphs, whitepapers and/or web available writeups will be appreciated.

latonz
asked Sep 27, 2008 at 1:12
  • I think this is a solved problem and the answer is here - kegel.com/c10k.html Commented Sep 27, 2008 at 1:26

4 Answers


Speaking from my experience running large IRC servers, we used to use select() and poll() (because epoll()/kqueue() weren't available). At around 700 simultaneous clients, the server would be using 100% of a CPU (the IRC server wasn't multithreaded). Interestingly, though, the server would still perform well. At around 4,000 clients, the server would start to lag.

The reason for this was that at around 700-ish clients, when we got back from select() there would typically be just one client ready for processing, and the for() loop scanning to find out which client it was would eat up most of the CPU. As we got more clients, each call to select() would return more and more ready clients, so the cost of the scan was amortised over more useful work and we became more efficient per client.

Moving to epoll()/kqueue(), similarly spec'd machines would trivially deal with 10,000 clients, and some (admittedly more powerful machines, but still machines that would be considered tiny by today's standards) have held 30,000 clients without breaking a sweat.

Experiments I've seen with SIGIO seem to suggest it works well for applications where latency is extremely important, where there are only a few active clients doing very little individual work.

I'd recommend using epoll()/kqueue() over select()/poll() in almost any situation. I've not experimented with splitting clients between threads. To be honest, I've never found a service that needed enough optimisation work on the front-end client processing to justify experimenting with threads.

answered Dec 30, 2008 at 11:32


I have spent the last two years working on that specific issue (for the G-WAN web server, which comes with many benchmarks and charts exposing all this).

The model that works best under Linux is epoll with one event queue (and, for heavy processing, several worker threads).

If you have little processing (low processing latency), then using one thread will be faster than using several threads.

The reason for this is that epoll does not scale on multi-core CPUs (using several concurrent epoll queues for connection I/O in the same user-mode application will just slow down your server).

I have not looked seriously at epoll's code in the kernel (I have only focused on user mode so far), but my guess is that the epoll implementation in the kernel is crippled by locks.

This is why using several threads quickly hits a wall.

It goes without saying that such a poor state of things should not last if Linux wants to keep its position as one of the best performing kernels.

answered Jun 22, 2011 at 12:45



From my experience, you'll get the best performance with #6.

I also recommend you look into libevent to deal with abstracting some of these details away. At the very least, you'll be able to see some of their benchmark results.

Also, how many sockets are you talking about? Your approach probably doesn't matter much until you start getting to at least a few hundred sockets.

answered Sep 30, 2008 at 20:46



I use epoll() extensively, and it performs well. I routinely have thousands of sockets active, and have tested with up to 131,072 sockets; epoll() handled it every time.

I use multiple threads, each of which polls on a subset of the sockets. This complicates the code, but takes full advantage of multi-core CPUs.

answered Sep 29, 2008 at 16:44

