I have been banging my head against a socket issue for the last two weeks, to no avail. I have a setup of 12 'client' machines and one server machine. The server is given a large task, splits it into 12 smaller tasks, and then distributes them to the 12 clients. The clients churn away, and once they finish their task, they are supposed to let the server know via socket communication. For some reason, this has only been working spottily or not at all (both the server and the clients just sit in their while loops).
Here is the code on the server:
socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
socket.bind(('localhost', RandomPort))
socket.listen(0)
socket.settimeout(0.9)
# [Give all the clients their tasks, then do the following:]
while 1:
    data = 'None'
    IP = [0,0]
    try:
        Client, IP = socket.accept()
        data = Client.recv(1024)
        if data == 'Done':
            Client.send('Thanks')
        for ClientIP in ClientIPList():
            if ClientIP == IP[0] and data == 'Done':
                FinishedCount += 1
            if FinishedCount == 12:
                break
    except:
        pass
Here is the code on all the clients:
# [Receive task from server and execute. Once finished, do the following:]
while 1:
    try:
        f = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        f.connect((IPofServer, RandomPort))
        f.settimeout(0.5)
        f.send('Done')
        data = f.recv(1024)
        if data == 'Thanks':
            f.shutdown(socket.SHUT_RDWR)
            f.close()
            break
    except:
        f.close()
        time.sleep(2+random.random()*5)
I have used Wireshark and found that the packets are flying around. Yet the "FinishedCount" never seems to increase... Is there anything glaringly wrong that I have missed in setting this up? This is my first exposure to sockets.
Thank you all for your help in advance!
EDIT: I've made the following change to the code:
On the server, socket.listen(0) is now socket.listen(5).
5 Answers
Alright, this took me a while, but I think I figured out what was causing this:
- glglgl's answer is correct: binding to 'localhost' makes the server accept connections only from the machine itself, not from other machines on the network. This was the main culprit.
- Increasing the listen backlog (the number of pending connections allowed in the queue) from 0 to 5 reduced the likelihood of getting a "connection refused" error on the client side.
I made the mistake of assuming that socket connections in an infinite while loop can be shut down infinitely fast. However, having an infinite while loop on both sides sometimes caused a client to be counted twice, because the while loops were not synchronized. This, of course, made the client-agnostic finishedCount increase twice, which led the server to believe all clients were done when they weren't. Using chown's code (thank you, chown!), this can be dealt with like this:
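The fix boils down to counting each client at most once, which can be isolated from the networking. A minimal Python 3 sketch, assuming clients are identified by IP address as in the code here; record_done and the sample addresses are hypothetical, and the payload is bytes because Python 3 sockets return bytes:

```python
def record_done(finished, ip, data, client_ips):
    """Mark a client as finished; repeated 'Done' messages from one IP are ignored.

    finished   -- set of IPs already counted (mutated in place)
    ip         -- address returned by accept()
    data       -- bytes received from the client
    client_ips -- the IPs we expect to hear from
    Returns the number of distinct clients that have finished so far.
    """
    if data == b'Done' and ip in client_ips:
        finished.add(ip)  # a set ignores duplicates, so a retry is never counted twice
    return len(finished)


# Hypothetical client list; the first client retries and is seen twice.
clients = {'10.0.0.1', '10.0.0.2'}
seen = set()
for ip, msg in [('10.0.0.1', b'Done'), ('10.0.0.1', b'Done'), ('10.0.0.2', b'Done')]:
    count = record_done(seen, ip, msg, clients)
print(count)  # 2
```

A set plays the role of the FINISHEDCLIENTS list with its "not in" check, with the duplicate test built in.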
def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((HOST, PORT))
    sock.listen(0)
    FINISHEDCLIENTS = []
    while 1:
        data = 'None'
        IP = [0, 0]
        try:
            client, ip = sock.accept()
            data = client.recv(1024)
            print "%s: Server recieved: '%s'" % (time.ctime(), data)
            if data == 'Done':
                print "%s: Server sending: 'Thanks'" % time.ctime()
                client.send('Thanks')
                if ip[0] in CLIENT_IPS and ip[0] not in FINISHEDCLIENTS:
                    FINISHEDCLIENTS.append(ip[0])
                    if len(FINISHEDCLIENTS) == 12:
                        #raise MyException
                        break
        except Exception, e:
            print "%s: Server Exception - %s" % (time.ctime(), e)
On the client side, I changed the code to this (where, of course, RandomPort is the same as the one used in the server script above):
SentFlag = 0
data = 'no'
while SentFlag == 0:
    try:
        f = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        f.connect((IPofServer, RandomPort))
        f.settimeout(20)
        f.send('Done')
        data = f.recv(1024)
        if data == 'Thanks':
            f.shutdown(socket.SHUT_RDWR)
            f.close()
            SentFlag = 1
    except:
        f.close()
        time.sleep(2*random.random())
PS: My understanding of .shutdown() vs .close() is that .close() closes the connection, but not necessarily the socket if it is engaged in another communication, while .shutdown() shuts down the socket no matter what else it is doing. I don't have any proof for this, though.
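The difference can be observed directly by giving one connected socket two descriptors. A minimal Python 3 sketch, assuming socket.socketpair() is available (it is on Unix, and on Windows since Python 3.5); the helper name is hypothetical. Closing one descriptor delivers no end-of-file to the peer while another descriptor still refers to the socket, whereas shutdown() acts on the connection itself:

```python
import socket


def close_vs_shutdown():
    """Return (eof_after_close, eof_after_shutdown) as seen by the peer b."""
    a, b = socket.socketpair()
    a_dup = a.dup()              # second descriptor for the same underlying socket
    b.settimeout(0.2)

    a.close()                    # releases one descriptor; a_dup keeps the socket open
    try:
        eof_after_close = b.recv(1) == b''
    except socket.timeout:
        eof_after_close = False  # nothing arrived: the connection is still up

    a_dup.shutdown(socket.SHUT_WR)  # ends the sending side regardless of other descriptors
    eof_after_shutdown = b.recv(1) == b''

    a_dup.close()
    b.close()
    return eof_after_close, eof_after_shutdown


print(close_vs_shutdown())  # (False, True)
```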
I think that should do it - thank you all again for helping fix this code!
Your server has two bugs:
First, this will break out of the inner for loop, not the while loop:
if FinishedCount == 12:
    break
Your while loop has no termination condition.
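A small, network-free Python 3 sketch of why the inner break is not enough: it leaves the for loop, and the while loop simply starts another pass. The function and its limit parameter are made up for the demo (limit only exists so the buggy variant terminates):

```python
def passes_until_done(limit, also_break_while):
    """Count how many times the while loop runs before it stops."""
    passes = 0
    count = 0
    done = False
    while passes < limit:
        passes += 1
        for _ in range(3):
            count += 1
            if count >= 2:
                done = True
                break      # leaves the for loop only
        if also_break_while and done:
            break          # a second break is needed to leave the while loop


    return passes


print(passes_until_done(5, also_break_while=False))  # 5: the while loop kept going
print(passes_until_done(5, also_break_while=True))   # 1
```

Replacing the for loop with a single membership test (if ip in client_ips:) removes the nesting altogether, so one break is enough.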
Second, this pattern:
try:
    ...
except:
    pass
Should never be used. You're swallowing up every single exception and ignoring it. That is bad practice, and will lead to bugs. It should be:
try:
    ...
except OneExceptionIWantToIgnore:
    pass
except:
    raise
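In this server, the exception that is actually expected is the accept() timeout set with settimeout(). A minimal Python 3 sketch that ignores only that one exception and lets everything else surface; accept_once is a hypothetical helper:

```python
import socket


def accept_once(server_sock):
    """Wait for one connection; return None if the accept() timeout expires.

    Only socket.timeout is treated as expected. Any other error propagates
    instead of being silently swallowed by a bare except.
    """
    try:
        conn, addr = server_sock.accept()
    except socket.timeout:
        return None
    return conn, addr


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port for the demo
server.listen(5)
server.settimeout(0.1)
print(accept_once(server))      # None: nobody connected within 0.1 s
server.close()
```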
Fix those two and get back to us with results.
3 Comments
except: raise is the same as not having an unconditional except: in there at all. But this answer is correct about the break not ending the while loop.
...socket.timeout: timed out error. I tried setting the socket to non-blocking, and that resulted in the [Errno 11] Resource temporarily unavailable error. I originally wrote the except that way to avoid the time-outs; I expect those to happen while the clients are not yet communicating with the server.
I believe the issue here is the use of RandomPort. Each client and the server need to be sending/receiving on the same port for this to work. Also, the loop for ClientIP in ClientIPList(): if ClientIP == IP[0] and data == 'Done': is redundant. It can be replaced with if ip[0] in clientIpList: placed inside the if data == 'Done': block above it.
A few other thoughts: never give a variable the same name as something you have imported (like socket = socket.socket(..)), because then you will not be able to use the imported library anymore. And unless the client and server are both running on the same system or within the same subnet, settimeout(0.5) is way too short.
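A short Python 3 sketch of the name-shadowing problem. The function is hypothetical; it rebinds the module-level name the way the question's top-level code does, then restores the module afterwards only so the demo is safe to run:

```python
import socket


def shadow_demo():
    """Rebind 'socket' as the question does, then try to create a second socket."""
    global socket                # act on the module-level name, like top-level code
    saved = socket               # keep the module so it can be restored
    socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # name now holds an instance
    try:
        socket.socket(socket.AF_INET, socket.SOCK_STREAM)       # module attributes are gone
        return None
    except AttributeError as exc:
        return str(exc)
    finally:
        socket.close()           # close the instance that shadowed the module
        socket = saved           # restore the module for the rest of the program


print(shadow_demo())  # prints the AttributeError message
```

Naming the instance sock, as in the scripts below, avoids the problem entirely.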
I merged your code with some example code from the Python socket documentation and came up with something that works, which you should be able to adapt easily for your needs. Here are the scripts; the output from running the server and 12 clients is pasted below.
server.py:
#!/usr/bin/python
# server.py

import sys
import socket
import time

HOST = ''
PORT = 50008
CLIENT_IPS = ["10.10.1.11"]

## No longer necessary if the nested loop isn't needed
#class MyException(Exception):
#    pass

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((HOST, PORT))
    sock.listen(0)
    finishedCount = 0
    while 1:
        data = 'None'
        IP = [0, 0]
        try:
            client, ip = sock.accept()
            data = client.recv(1024)
            print "%s: Server recieved: '%s'" % (time.ctime(), data)
            if data == 'Done':
                print "%s: Server sending: 'Thanks'" % time.ctime()
                client.send('Thanks')
                if ip[0] in CLIENT_IPS:
                    finishedCount += 1
                    print "%s: Finished Count: '%d'" % (time.ctime(), finishedCount)
                    if finishedCount == 12:
                        #raise MyException
                        break
        except Exception, e:
            print "%s: Server Exception - %s" % (time.ctime(), e)
        #except MyException:
        #    print "%s: All clients accounted for. Server ending, goodbye!" % time.ctime()
        #    break

    # close down the socket, ignore closing exceptions
    try:
        sock.close()
    except:
        pass
    print "%s: All clients accounted for. Server ending, goodbye!" % time.ctime()

if __name__ == '__main__':
    sys.exit(main())
client.py:
#!/usr/bin/python
# client.py

import sys
import time
import socket
import random

HOST = '10.10.1.11'
PORT = 50008

def main(n):
    while 1:
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect((HOST, PORT))
            s.send('Done')
            print "%s: Client %d: Sending - 'Done'.." % (time.ctime(), n)
            data = s.recv(1024)
            print "%s: Client %d: Recieved - '%s'" % (time.ctime(), n, data)
            if data == 'Thanks':
                break
        except Exception, e:
            print "%s: Client %d: Exception - '%s'" % (time.ctime(), n, e)
            time.sleep(2 + random.random() * 5)
        finally:
            try:
                s.shutdown(socket.SHUT_RDWR)
            except:
                pass
            finally:
                try:
                    s.close()
                except:
                    pass
    print "%s: Client %d: Finished, goodbye!" % (time.ctime(), n)

if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1].isdigit():
        sys.exit(main(int(sys.argv[1])))
Output from running 12 Clients:
[ 10:52 [email protected] ~/SO/python ]$ for x in {1..12}; do ./client.py $x && sleep 2; done
Fri Nov 18 10:52:44 2011: Client 1: Sending - 'Done'..
Fri Nov 18 10:52:44 2011: Client 1: Recieved - 'Thanks'
Fri Nov 18 10:52:44 2011: Client 1: Finished, goodbye!
Fri Nov 18 10:52:46 2011: Client 2: Sending - 'Done'..
Fri Nov 18 10:52:46 2011: Client 2: Recieved - 'Thanks'
Fri Nov 18 10:52:46 2011: Client 2: Finished, goodbye!
Fri Nov 18 10:52:48 2011: Client 3: Sending - 'Done'..
Fri Nov 18 10:52:48 2011: Client 3: Recieved - 'Thanks'
Fri Nov 18 10:52:48 2011: Client 3: Finished, goodbye!
Fri Nov 18 10:52:50 2011: Client 4: Sending - 'Done'..
Fri Nov 18 10:52:50 2011: Client 4: Recieved - 'Thanks'
Fri Nov 18 10:52:50 2011: Client 4: Finished, goodbye!
Fri Nov 18 10:52:52 2011: Client 5: Sending - 'Done'..
Fri Nov 18 10:52:52 2011: Client 5: Recieved - 'Thanks'
Fri Nov 18 10:52:52 2011: Client 5: Finished, goodbye!
Fri Nov 18 10:52:54 2011: Client 6: Sending - 'Done'..
Fri Nov 18 10:52:54 2011: Client 6: Recieved - 'Thanks'
Fri Nov 18 10:52:54 2011: Client 6: Finished, goodbye!
Fri Nov 18 10:52:56 2011: Client 7: Sending - 'Done'..
Fri Nov 18 10:52:56 2011: Client 7: Recieved - 'Thanks'
Fri Nov 18 10:52:56 2011: Client 7: Finished, goodbye!
Fri Nov 18 10:52:58 2011: Client 8: Sending - 'Done'..
Fri Nov 18 10:52:58 2011: Client 8: Recieved - 'Thanks'
Fri Nov 18 10:52:58 2011: Client 8: Finished, goodbye!
Fri Nov 18 10:53:01 2011: Client 9: Sending - 'Done'..
Fri Nov 18 10:53:01 2011: Client 9: Recieved - 'Thanks'
Fri Nov 18 10:53:01 2011: Client 9: Finished, goodbye!
Fri Nov 18 10:53:03 2011: Client 10: Sending - 'Done'..
Fri Nov 18 10:53:03 2011: Client 10: Recieved - 'Thanks'
Fri Nov 18 10:53:03 2011: Client 10: Finished, goodbye!
Fri Nov 18 10:53:05 2011: Client 11: Sending - 'Done'..
Fri Nov 18 10:53:05 2011: Client 11: Recieved - 'Thanks'
Fri Nov 18 10:53:05 2011: Client 11: Finished, goodbye!
Fri Nov 18 10:53:07 2011: Client 12: Sending - 'Done'..
Fri Nov 18 10:53:07 2011: Client 12: Recieved - 'Thanks'
Fri Nov 18 10:53:07 2011: Client 12: Finished, goodbye!
[ 10:53 [email protected] ~/SO/python ]$
Output from server running at the same time:
[ 10:52 [email protected] ~/SO/python ]$ ./server.py
Fri Nov 18 10:52:44 2011: Server recieved: 'Done'
Fri Nov 18 10:52:44 2011: Server sending: 'Thanks'
Fri Nov 18 10:52:44 2011: Finished Count: '1'
Fri Nov 18 10:52:46 2011: Server recieved: 'Done'
Fri Nov 18 10:52:46 2011: Server sending: 'Thanks'
Fri Nov 18 10:52:46 2011: Finished Count: '2'
Fri Nov 18 10:52:48 2011: Server recieved: 'Done'
Fri Nov 18 10:52:48 2011: Server sending: 'Thanks'
Fri Nov 18 10:52:48 2011: Finished Count: '3'
Fri Nov 18 10:52:50 2011: Server recieved: 'Done'
Fri Nov 18 10:52:50 2011: Server sending: 'Thanks'
Fri Nov 18 10:52:50 2011: Finished Count: '4'
Fri Nov 18 10:52:52 2011: Server recieved: 'Done'
Fri Nov 18 10:52:52 2011: Server sending: 'Thanks'
Fri Nov 18 10:52:52 2011: Finished Count: '5'
Fri Nov 18 10:52:54 2011: Server recieved: 'Done'
Fri Nov 18 10:52:54 2011: Server sending: 'Thanks'
Fri Nov 18 10:52:54 2011: Finished Count: '6'
Fri Nov 18 10:52:56 2011: Server recieved: 'Done'
Fri Nov 18 10:52:56 2011: Server sending: 'Thanks'
Fri Nov 18 10:52:56 2011: Finished Count: '7'
Fri Nov 18 10:52:58 2011: Server recieved: 'Done'
Fri Nov 18 10:52:58 2011: Server sending: 'Thanks'
Fri Nov 18 10:52:58 2011: Finished Count: '8'
Fri Nov 18 10:53:01 2011: Server recieved: 'Done'
Fri Nov 18 10:53:01 2011: Server sending: 'Thanks'
Fri Nov 18 10:53:01 2011: Finished Count: '9'
Fri Nov 18 10:53:03 2011: Server recieved: 'Done'
Fri Nov 18 10:53:03 2011: Server sending: 'Thanks'
Fri Nov 18 10:53:03 2011: Finished Count: '10'
Fri Nov 18 10:53:05 2011: Server recieved: 'Done'
Fri Nov 18 10:53:05 2011: Server sending: 'Thanks'
Fri Nov 18 10:53:05 2011: Finished Count: '11'
Fri Nov 18 10:53:07 2011: Server recieved: 'Done'
Fri Nov 18 10:53:07 2011: Server sending: 'Thanks'
Fri Nov 18 10:53:07 2011: Finished Count: '12'
Fri Nov 18 10:53:07 2011: All clients accounted for. Server ending, goodbye!
[ 10:53 [email protected] ~/SO/python ]$
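The scripts above are Python 2 (print statements, str payloads). Under Python 3, send() and recv() work with bytes, so 'Done' becomes b'Done'. A minimal single-process sketch of the same Done/Thanks exchange, assuming Python 3, with the server running in a thread on the loopback interface. Unlike the scripts above, it counts messages rather than IP addresses, because every client here connects from 127.0.0.1; the names serve, notify and run are hypothetical:

```python
import socket
import threading


def serve(listener, expected, results):
    """Accept 'expected' connections and answer each b'Done' with b'Thanks'."""
    done = 0
    while done < expected:
        conn, addr = listener.accept()
        with conn:
            if conn.recv(1024) == b'Done':   # Python 3 sockets carry bytes, not str
                conn.sendall(b'Thanks')
                done += 1
    results.append(done)


def notify(port):
    """Client side: report completion and return the server's reply."""
    with socket.create_connection(('127.0.0.1', port), timeout=5) as s:
        s.sendall(b'Done')
        return s.recv(1024)


def run(n_clients):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))   # any free port; every client must use this same port
    listener.listen(5)
    port = listener.getsockname()[1]
    results = []
    server = threading.Thread(target=serve, args=(listener, n_clients, results))
    server.start()
    replies = [notify(port) for _ in range(n_clients)]
    server.join()
    listener.close()
    return results[0], replies


print(run(3))  # (3, [b'Thanks', b'Thanks', b'Thanks'])
```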
3 Comments
...socket.close() and it explicitly states that calling .shutdown() has additional effects. Just .close() definitely doesn't "do it for you".
Calling listen(0) requests a backlog of zero, so you are much more likely to get "connection refused" errors. Also, the server-side socket is never closed. Get rid of the try/excepts for now so you can see what the real problems are; otherwise, handle explicit socket.error exceptions.
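A client-side sketch of that advice, assuming Python 3, where socket.error is an alias of OSError. send_done and its parameters are hypothetical (timeout=20 mirrors the settimeout(20) used above). Only socket-level errors (refused connections, timeouts, resets) are caught and retried; programming errors still raise:

```python
import socket
import time


def send_done(host, port, attempts=5, delay=1.0):
    """Tell the server we are done; return True once it answers b'Thanks'."""
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=20) as s:
                s.sendall(b'Done')
                if s.recv(1024) == b'Thanks':
                    return True
        except OSError as exc:          # socket.error in Python 2 terms
            print('attempt failed: %s' % exc)
        time.sleep(delay)
    return False


# Demo: find a port with nothing listening on it, so every attempt is refused.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.bind(('127.0.0.1', 0))
free_port = probe.getsockname()[1]
probe.close()
print(send_done('127.0.0.1', free_port, attempts=2, delay=0))  # False
```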
If you do
socket.bind(('localhost', RandomPort))
your server machine will only accept connections from itself, i.e. localhost.
Instead, do
socket.bind(('', RandomPort))
to listen on all interfaces.
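The difference shows up in the address the socket actually ends up bound to. A minimal Python 3 sketch; bound_address is a hypothetical helper, and it uses 127.0.0.1 directly because that is what 'localhost' normally resolves to for an AF_INET socket:

```python
import socket


def bound_address(host):
    """Bind a TCP socket to (host, any free port) and report the local address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, 0))
        return s.getsockname()[0]
    finally:
        s.close()


print(bound_address('127.0.0.1'))  # 127.0.0.1: reachable only from this machine
print(bound_address(''))           # 0.0.0.0: all IPv4 interfaces
```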
1 Comment
AF_INET6 and/or getaddrinfo() for new applications.
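A minimal Python 3 sketch of the getaddrinfo() approach for a listening socket: ask for passive addresses of any family and use the first one that works, falling back from IPv6 to IPv4 if needed. open_listener is a hypothetical helper:

```python
import socket


def open_listener(port, backlog=5):
    """Listen on the first passive address getaddrinfo() offers (IPv6 or IPv4)."""
    last_error = None
    for family, type_, proto, _, addr in socket.getaddrinfo(
            None, port, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, socket.AI_PASSIVE):
        try:
            s = socket.socket(family, type_, proto)
        except OSError as exc:       # e.g. IPv6 not supported on this host
            last_error = exc
            continue
        try:
            s.bind(addr)
            s.listen(backlog)
            return s
        except OSError as exc:
            last_error = exc
            s.close()
    raise last_error or OSError('no usable address')


listener = open_listener(0)   # port 0: let the OS choose, just for the demo
print(listener.family, listener.getsockname())
listener.close()
```

Python 3.8 and later also provide socket.create_server(), which wraps a similar setup.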
print data: nothing, it never actually prints. In fact, the server loop fails at socket.accept.
...RandomPort the same?