I'm trying to make a simple proxy server in Python using the socket library, mostly to understand how it works. I'm a noob both at programming and networking, so please be nice if my questions are totally dumb.
At the moment I've set up a proxy server which intercepts requests from the browser, prints them and sends back a sample "Hello" response. It seems to work, so next I'm going to complete it by adding a client which forwards the request to the web server, receives the response and sends it back to the browser.
However, I have two doubts about this first (seemingly working) part of the code:
import socket, threading

serv_port = 50007

def cthread(conn):
    # set timeout
    conn.settimeout(1)
    # receive data until null data is received or socket times out
    req = b''
    while True:
        try:
            req_pack = conn.recv(4096)
            if not req_pack:
                break
            else:
                req += req_pack
        except socket.timeout:
            break
    print req  # print request to stdout
    # send a sample response
    conn.send(b'HTTP/1.1 200 OK\n\n<h1>Hello</h1>')  # QUESTION 1
    conn.close()  # QUESTION 2

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('', serv_port))
server.listen(5)

while True:
    # get conn to client
    conn, addr = server.accept()
    # spawn thread
    conn_thread = threading.Thread(target=cthread, args=(conn,))
    conn_thread.start()
My first question is about threading. Should I leave everything as it is, or should I, after receiving a request, spawn a different thread to deal with the response part?
Sample code of what I mean:
# in function cthread
    print req
    # spawn thread
    resp_thread = threading.Thread(target=respthread, args=(conn, req))
    resp_thread.start()

def respthread(conn, req):
    # do everything that's to be done to get response, send it to browser
    conn.send(b'HTTP/1.1 200 OK\n\n<h1>Hello</h1>')  # sample response
    conn.close()
Another question: is it correct to close the connection to the browser (conn) after each response has been sent, or does that slow things down too much, and is it possible to do without closing it?
2 Answers
First off, you need to add error handling around the socket functions. Any of them can fail, so it is safer to catch the exceptions they raise.
Here is what I mean:
try:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('', serv_port))
    server.listen(5)
except socket.error as error:
    # errno and strerror are available on both Python 2 and Python 3
    print("Received error #%s: %s\n" % (error.errno, error.strerror))
    sys.exit(-1)
The same applies to socket.accept:
try:
    conn, addr = server.accept()
except socket.error as error:
    print("Received error #%s: %s\n" % (error.errno, error.strerror))
    sys.exit(-1)
When a socket call fails, it raises an exception (socket.error, which is an alias of OSError on Python 3) that carries the error code reported by the underlying system call and a message to go along with it.
I added a sys.exit call that ends the program with exit code -1. A nonzero exit code tells whatever launched the program that it failed (on POSIX systems -1 is reported as 255). To use this, you will have to import sys.
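The error handling described above could be wrapped in a small helper. This is only a sketch for Python 3: the name open_listener and its parameters are made up for illustration, and it exits with status 1 rather than -1, which is an arbitrary choice.

```python
import socket
import sys


def open_listener(port, host="127.0.0.1", backlog=5):
    """Return a listening TCP socket, or exit with a message on failure.

    Hypothetical helper: the name and defaults are not from the question.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((host, port))
        server.listen(backlog)
    except OSError as error:  # socket.error is an alias of OSError on Python 3
        server.close()
        print("Socket error #%s: %s" % (error.errno, error.strerror),
              file=sys.stderr)
        sys.exit(1)  # any nonzero status signals failure
    return server
```

Calling open_listener(50007) twice without closing the first socket should hit the except branch, because the port is already in use.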
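In this part of a line
threading.Thread(target=cthread, args=(conn,))
the trailing comma after conn is correct and should stay. args must be a tuple, and the comma is what makes (conn,) a one-element tuple; without it, (conn) is just conn in parentheses.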
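A quick way to see what the comma in args does. This is a sketch for Python 3 in which the string "fake-connection" stands in for a real socket object and the handler only records its argument:

```python
import threading

received = []


def cthread(conn):
    # Stand-in for the real handler: just record what was passed in.
    received.append(conn)


conn = "fake-connection"  # placeholder for a real socket object

assert isinstance((conn,), tuple)     # trailing comma: one-element tuple
assert not isinstance((conn), tuple)  # no comma: just the object itself

t = threading.Thread(target=cthread, args=(conn,))
t.start()
t.join()
```

After join() returns, received holds the single value that was passed through args.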
According to the Python docs on socket.settimeout, the value argument can be a nonnegative float expressing seconds, or None. You are passing the integer 1, which is accepted and treated as 1.0 seconds, so that call is fine as written.
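A small check of how settimeout treats its argument, as a sketch for Python 3. The variable names are only for illustration:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

sock.settimeout(1)                    # an int is accepted
timeout_from_int = sock.gettimeout()  # reported back as a float

sock.settimeout(None)                 # None switches back to blocking mode
timeout_from_none = sock.gettimeout()

sock.close()
```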
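Also, I believe that you should allow more time before a timeout. The timeout applies to each recv call separately, and recv(4096) returns as soon as any data is available, but on a slow connection even the first bytes might take more than 1 second to arrive.
Maybe 3 seconds would work.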
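One possible variation on the receive loop, sketched for Python 3: keep the timeout as a safety net, but stop as soon as the blank line that ends the HTTP headers has arrived, so a normal request does not have to wait for the timeout. The function name read_request is made up here, and the sketch assumes the request has no body (it would cut off a POST payload).

```python
import socket


def read_request(conn, timeout=3.0, bufsize=4096):
    """Collect bytes until the header terminator, EOF, or a timeout."""
    conn.settimeout(timeout)  # applies to each recv() call separately
    req = b""
    while b"\r\n\r\n" not in req:
        try:
            chunk = conn.recv(bufsize)
        except socket.timeout:
            break  # nothing arrived within `timeout` seconds
        if not chunk:
            break  # peer closed the connection
        req += chunk
    return req
```

In cthread this would replace the while loop, for example req = read_request(conn).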
Python 2 is more lenient than Python 3 about inconsistent indentation (such as mixed tabs and spaces), but you still want it to be perfect.
In the while loop of cthread, your indentation is messed up:
 def cthread(conn):  # extra space
    #set timeout
    conn.settimeout(1)
    #receive data until null data is received or socket times out
    req = b''
    while True:
        try:
              req_pack = conn.recv(4096)  # extra 2 spaces (1/2 tab)
            if not req_pack:
                break
            else:
                req += req_pack
        except socket.timeout:
                      break  # extra 10 spaces (2 1/2 tabs)
   print req #print request to stdout  # missing space
   #send a sample response
   conn.send(b'HTTP/1.1 200 OK\n\n<h1>Hello</h1>')  # missing space
   conn.close()  # missing space
Indent each level with either 1 tab or 4 spaces, and don't mix the two (PEP 8 recommends 4 spaces).
Now for your questions.
I'm a little confused by the first question. Right now in your code, when you receive a connection, you already handle both the receiving and the sending in a separate thread. This is fine, and is a good idea; there is no need for a second thread just for the response.
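The one-thread-per-connection layout can be sketched like this in Python 3. The names handle_client and serve are made up for illustration, and the handler reads with a single recv call only to keep the example short:

```python
import socket
import threading


def handle_client(conn):
    """Receive one request and send one response on the same thread."""
    try:
        request = conn.recv(4096)
        print(request)  # print request to stdout
        conn.sendall(b"HTTP/1.1 200 OK\r\n\r\n<h1>Hello</h1>")
    finally:
        conn.close()


def serve(server):
    """Accept connections forever, starting one thread per connection."""
    while True:
        conn, addr = server.accept()
        threading.Thread(target=handle_client, args=(conn,),
                         daemon=True).start()
```

serve expects a socket that is already bound and listening, such as the server object from the question.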
As for the second question: once you close a connection, there is no connection left to slow down. And yes, closing it is fine. Since your server isn't going to do anything more after it has sent its response, there is no point in keeping the connection open. It would just bog your server down in the long run.
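If the connection is closed after every response, it may help to say so in the response itself, since HTTP/1.1 clients assume the connection stays open unless told otherwise. A sketch for Python 3, where build_response is a hypothetical helper and the header set is deliberately minimal:

```python
def build_response(body):
    """Build a minimal HTTP/1.1 response that announces the close."""
    headers = [
        b"HTTP/1.1 200 OK",
        b"Content-Type: text/html",
        b"Content-Length: " + str(len(body)).encode("ascii"),
        b"Connection: close",  # tell the browser not to reuse the socket
    ]
    return b"\r\n".join(headers) + b"\r\n\r\n" + body
```

The handler would then call conn.sendall(build_response(b"<h1>Hello</h1>")) before conn.close().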
You don't get real multithreading in CPython: because of the global interpreter lock (GIL), only one thread executes Python bytecode at a time.
In your case the work is mostly I/O, so you can get good concurrency from green threads or an event loop instead. That is what Tulip (included in Python 3.4 as asyncio) provides, as do libraries like Twisted, eventlet, or gevent. Most of these libraries ship with example code similar to what you are trying to achieve.
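For comparison, here is roughly what the "Hello" server could look like on asyncio. This is a sketch assuming Python 3.7 or later; handle and main are illustrative names, and the port is the one from the question.

```python
import asyncio

BODY = b"<h1>Hello</h1>"
RESPONSE = (b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: " + str(len(BODY)).encode("ascii") + b"\r\n"
            b"Connection: close\r\n\r\n" + BODY)


async def handle(reader, writer):
    """Read the request headers, print them, send the sample response."""
    request = await reader.readuntil(b"\r\n\r\n")
    print(request.decode("latin-1"))
    writer.write(RESPONSE)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main(host="127.0.0.1", port=50007):
    """Serve until cancelled; each connection runs as its own task."""
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

To start it, call asyncio.run(main()). All connections share one thread, and the event loop switches between them whenever one is waiting on the network.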