Socket call giving duplicate file descriptors on two or more threads executing at the same time (race condition)

Question 1

I have looked and haven't seen this answered. I have a multi-threaded C++ networked server application. Multiple threads use a networking class for different tasks, each on its own specific port number. The client can and does connect and disconnect at different times depending on the user's needs. When the client connects, the networking threads all connect at basically the same time. Sometimes two threads make the socket() call and both get the same file descriptor. Then the bind() call fails with ADDRESS ALREADY IN USE... thanks for not getting me an unused address, socket call, lol.

To be clear, this is not a TIME_WAIT issue, so SO_REUSEADDR won't work here. The old sockets have been successfully closed. This is a race condition on connecting, where two threads are executing at, or super close to, the same time using the same networking class and getting the same file descriptor from the socket() call.

The only thing I have found so far is to call netstat from within C++ to search for unused socket addresses. It seems like there could still be a timing race condition with that approach. I have more than five networking threads all opening sockets. This is not my design, and I can't change it this late in the game due to risk management. Also, I have a requirement that the connections succeed 100% of the time, not 99%.

Other than an external program like netstat and searching, is there another way to solve this issue? And if I do have to use netstat, does anyone have reliable code using this method?

I appreciate your time, thank you.

EDIT1: The OS is Linux. I am assuming you are right that the socket API is thread-safe, thank you. What I am clearer on now is that this only happens when all of the threads get a reset by peer. So the threads are all shutting down and restarting their sockets close to each other. I am 100% sure the file descriptor is the same, as per my logs, which are all over the place in debug and give me more than enough variable values. The file descriptor is 10 in the error condition, but 10 was used in another thread that has shut down its socket fd. So I was wrong in saying it was two threads during the startup phase. One has shut down and let go of fd 10, and another is starting and has gotten fd 10. Then the bind fails on the thread that is starting up. I can't post the code because of rules, sorry.

EDIT2: In between the socket() call and the bind() call, I do use the setsockopt with SO_LINGER with it on and 0 seconds.
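For readers following along, here is a minimal sketch of the socket(), setsockopt(SO_LINGER), bind() sequence described in the edits above. It is not the asker's code, which was never posted. The function name `open_bound_socket`, the choice of TCP over IPv4, and binding to INADDR_ANY are all assumptions made for illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>

// Hypothetical helper: creates a TCP socket, enables SO_LINGER with a
// 0-second timeout, then binds to the given port on all interfaces.
// Returns the bound descriptor, or -1 on any failure.
int open_bound_socket(std::uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // "On" with 0 seconds, as described in EDIT2.
    linger lin{};
    lin.l_onoff = 1;
    lin.l_linger = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof lin) < 0) {
        close(fd);
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // assumed bind address
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0) {
        close(fd);  // the only place this function closes the descriptor
        return -1;
    }
    return fd;
}
```

Passing port 0 lets the kernel pick a free port, which is convenient for trying the sketch out without colliding with a real service.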

Question 2

Without much experience, my first thought: Can't you put a mutex lock on the call to socket, such that two threads cannot call the function at the same time?

Question 3

You need to use some form of thread synchronization (mutex or the like). There are a number of ways to solve this. The quickest would be to declare a global mutex and lock before calling socket.

Question 4

What OS? What addresses are you binding to? (Are you certain the FDs are the same?) Can you show some code?

Question 5

socket() is returning the same file descriptor when called from different threads? socket() is fully multi-thread safe. Post your code.

Question 6

What you are saying is impossible. A socket() call cannot return the same descriptor as another one that is still open, no matter how close together the two calls are. It is a 100% thread-safe function. Synchronization is not the issue here.
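One way to check this claim is a small experiment along these lines: several threads call socket() at roughly the same time, keep their descriptors open, and the results are compared afterwards. This is only a sketch; the helper names (`open_sockets_concurrently`, `all_valid_and_distinct`, `close_all`) are made up for illustration.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <set>
#include <thread>
#include <vector>

// Opens one TCP socket per thread, all at roughly the same time, and returns
// the descriptors. A slot stays -1 if socket() failed in that thread.
std::vector<int> open_sockets_concurrently(std::size_t thread_count) {
    std::vector<int> fds(thread_count, -1);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < thread_count; ++i) {
        // Each thread writes only its own slot, so no lock is needed here.
        threads.emplace_back([&fds, i] {
            fds[i] = socket(AF_INET, SOCK_STREAM, 0);
        });
    }
    for (auto& t : threads) t.join();
    return fds;
}

// True if every descriptor is valid and no value appears twice.
bool all_valid_and_distinct(const std::vector<int>& fds) {
    std::set<int> unique(fds.begin(), fds.end());
    return unique.size() == fds.size() && unique.count(-1) == 0;
}

// Closes every valid descriptor in the list exactly once.
void close_all(const std::vector<int>& fds) {
    for (int fd : fds) {
        if (fd >= 0) close(fd);
    }
}
```

As long as none of the descriptors is closed in between, every thread should end up with a different number. Duplicates only become possible once some descriptor has been closed and its number returned to the pool.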

Question 8

EDIT:

As others have stated, socket() is completely thread-safe and thus does not need synchronization. The culprit is something else, and this answer should be discarded.

As stated in the comments on the question, I would also suggest synchronization using mutexes. However, if you only need this function within one process, you should consider using a CRITICAL_SECTION instead. The reason is that it is significantly faster than a 'regular' mutex; the only downside is that it cannot be shared between processes.

Here's my suggested function:

    #include <winsock2.h>
    #include <windows.h>

    static CRITICAL_SECTION SOCKET_MUTEX;

    // Serializes calls to socket() through a process-local critical section.
    SOCKET createSocket(int af, int type, int protocol) {
        SOCKET result;
        EnterCriticalSection(&SOCKET_MUTEX);  // these functions take a pointer
        result = socket(af, type, protocol);
        LeaveCriticalSection(&SOCKET_MUTEX);
        return result;
    }

This code requires you to call the following once before using the createSocket function:

    InitializeCriticalSection(&SOCKET_MUTEX);

Question 9

socket() is fully multithread-safe. Protecting it with a mutex won't accomplish anything.

Question 10

Could you point me to the documentation that says it's thread-safe?

Question 11

socket() is required by POSIX to not only be reentrant and multithread-safe, but also async-signal-safe: pubs.opengroup.org/onlinepubs/9699919799/functions/…

Question 12

I still appreciate the effort, thank you. Knowledge still pushes me further toward the fix.

David Schwartz · Accepted Answer · 2015-11-17 19:15:12Z

You have a bug where you close the same socket twice. The sequence of events goes like this:

1. Your code is using some socket, say 10.
2. You get a connection reset by peer and close socket 10.
3. Some thread calls socket, it gets socket 10.
4. Some other thread still thinking it's using the original socket also discovers that the connection is dead and closes socket 10 not realizing what happened in step 2. (For example, maybe it calls send and gets an error because the new socket 10 isn't connected. So it "handles" the error by closing the new socket 10. Oops.)
5. Some other thread calls socket, it gets socket 10.
6. You notice that at step 3 and 5 you got the same socket.
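The sequence above can be replayed in miniature on a single thread. The sketch below assumes Linux, where a new descriptor gets the lowest free number, and assumes nothing else in the process opens or closes descriptors while it runs. `ReuseDemo` and `replay_double_close` are hypothetical names, and the stale copy of the descriptor stands in for the second thread.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>

// Descriptor numbers observed at steps 1, 3 and 5 of the sequence.
struct ReuseDemo {
    int first;   // step 1: the original socket
    int second;  // step 3: socket() after the legitimate close
    int third;   // step 5: socket() after the stale close
};

ReuseDemo replay_double_close() {
    ReuseDemo r{};
    r.first = socket(AF_INET, SOCK_STREAM, 0);   // step 1
    int stale_copy = r.first;  // another component remembers the number

    close(r.first);                              // step 2: legitimate close
    r.second = socket(AF_INET, SOCK_STREAM, 0);  // step 3: number is reused

    close(stale_copy);  // step 4: stale close hits the NEW socket by mistake
    r.third = socket(AF_INET, SOCK_STREAM, 0);   // step 5: same number again

    close(r.third);  // clean up the only descriptor that is still open
    return r;
}
```

Under those assumptions all three fields hold the same number, even though no two socket() calls ever returned a descriptor that was still open. Whoever holds `second` is left with a number that now refers to a different socket.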

You can prove that this is the problem by adding logging to all your calls to close a socket. There will be a close between the two socket calls.
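A logging wrapper along the following lines is one way to collect that evidence. It is a sketch: `logged_close` and its `where` label are hypothetical, and the thread identifier is just a hash of `std::thread::id`, good enough to tell threads apart in a log.

```cpp
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <functional>
#include <thread>

// Hypothetical drop-in for close(): logs the descriptor, the call site label,
// the calling thread and the result, then returns what close() returned.
int logged_close(int fd, const char* where) {
    int rc = close(fd);
    int saved_errno = errno;  // capture before fprintf can disturb it

    std::size_t tid = std::hash<std::thread::id>{}(std::this_thread::get_id());
    std::fprintf(stderr, "close(fd=%d) at %s thread=%zu rc=%d error=%s\n",
                 fd, where, tid, rc,
                 rc == 0 ? "none" : std::strerror(saved_errno));

    errno = saved_errno;  // leave errno as close() set it for the caller
    return rc;
}
```

If every close in the program goes through a wrapper like this, two log lines for the same descriptor number with no socket() call by the same owner in between point straight at the component doing the stale close. A close that fails with EBADF is another hint that something closed the descriptor earlier.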

The solution is to make sure that some logical entity in your code always owns a socket that you are using and that only that entity calls close on the socket. No other code can do anything to that socket without coordination of that owning entity.

If, for example, you have separated sending and receiving code, you need to make sure that neither piece can call close on the socket unless the other piece has been fully shut down.
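One way to express that single-owner rule in C++ is a small move-only wrapper, sketched below. `OwnedSocket` is a hypothetical class, not something from the asker's code base. It only guarantees that the descriptor is closed at most once through this object; it does not by itself coordinate sending and receiving threads, which still need to be stopped before the owner calls `reset()`.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <utility>

// Hypothetical single-owner wrapper: exactly one object holds the descriptor,
// and only that object ever calls close() on it.
class OwnedSocket {
public:
    OwnedSocket() = default;
    explicit OwnedSocket(int fd) : fd_(fd) {}

    // Copying would create two owners, so it is disabled.
    OwnedSocket(const OwnedSocket&) = delete;
    OwnedSocket& operator=(const OwnedSocket&) = delete;

    // Moving transfers ownership and leaves the source empty.
    OwnedSocket(OwnedSocket&& other) noexcept
        : fd_(std::exchange(other.fd_, -1)) {}
    OwnedSocket& operator=(OwnedSocket&& other) noexcept {
        if (this != &other) {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }

    ~OwnedSocket() { reset(); }

    int get() const { return fd_; }
    bool valid() const { return fd_ >= 0; }

    // Closes the descriptor at most once; later calls are no-ops.
    void reset() {
        if (fd_ >= 0) {
            close(fd_);
            fd_ = -1;
        }
    }

private:
    int fd_ = -1;
};
```

Other code would receive the raw number from `get()` for send and receive calls but would never call close() on it. Errors seen by that code are reported to the owner, which decides when to call `reset()`.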
