I was learning socket programming and tried to write a basic HTTP client of my own. Everything seems to go fine, but I am not receiving any data. Can you please tell me what I am missing?
CODE
import socket

def create_socket():
    return socket.socket( socket.AF_INET, socket.SOCK_STREAM )

def remove_socket(sock):
    sock.close()
    del sock

sock = create_socket()
print "Connecting"
sock.connect( ('en.wikipedia.org', 80) )
print "Sending Request"
print sock.sendall ('''GET /wiki/List_of_HTTP_header_fields HTTP/1.1
Host: en.wikipedia.org
Connection: close
User-Agent: Web-sniffer/1.0.37 (+http://web-sniffer.net/)
Accept-Encoding: gzip
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7
Cache-Control: no-cache
Accept-Language: de,en;q=0.7,en-us;q=0.3
Referer: d_r_G_o_s
''')
print "Receving Reponse"
while True:
    content = sock.recv(1024)
    if content:
        print content
    else:
        break
print "Completed"
OUTPUT
Connecting
Sending Request
298
Receving Reponse
Completed
I was expecting it to show me the HTML content of the Wikipedia page. :'(
Also, it would be great if somebody could recommend web resources or books where I can read in detail about Python socket programming for writing an HTTP client.
Thanks!
ANSWER
For a minimal HTTP client, you definitely shouldn't send Accept-Encoding: gzip -- the server will most likely reply with a gzipped response you won't be able to make much sense of by eye. :)
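If you do ask for gzip, the body has to be decompressed before it is readable. Below is a minimal sketch in Python 3 (the code in this thread is Python 2). It assumes you have already separated the body from the headers and read the Content-Encoding header value. `decode_body` is a hypothetical helper name, not part of any library.

```python
import gzip


def decode_body(body, content_encoding):
    """Return the body decompressed if the server gzipped it (hypothetical helper)."""
    # Content-coding tokens are case-insensitive, so normalise before comparing.
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    # Any other (or missing) encoding is returned untouched in this sketch.
    return body


# Round trip: compress some HTML the way a server might, then decode it.
compressed = gzip.compress(b"<html>hello</html>")
print(decode_body(compressed, "gzip"))  # b'<html>hello</html>'
```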
You aren't sending the final double \r\n (the empty line that marks the end of the headers), nor are you actually terminating your lines with \r\n as the spec requires (unless you happen to develop on Windows with Windows line endings, but that's just luck and not programming per se).
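The framing rule is that every request line and header line ends in \r\n, and one more \r\n (an empty line) closes the header block, so the request as a whole ends in \r\n\r\n. Below is a small Python 3 sketch that builds such a request as bytes. `build_request` is a hypothetical helper. Encoding as ASCII is an assumption that holds for the plain headers used here.

```python
def build_request(method, path, host, headers=None):
    """Build a raw HTTP/1.1 request as bytes (hypothetical helper)."""
    lines = ["%s %s HTTP/1.1" % (method, path), "Host: %s" % host]
    for name, value in (headers or {}).items():
        lines.append("%s: %s" % (name, value))
    # Join the lines with CRLF, end the last header with CRLF, then add one
    # more CRLF: that empty line tells the server the headers are finished.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/wiki/List_of_HTTP_header_fields",
                        "en.wikipedia.org", {"Connection": "close"})
print(request)
```

The resulting bytes can be passed straight to `sock.sendall(request)` on a Python 3 socket.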
Also, del sock in your remove_socket function does not do what you think it does.
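The reason is that `del` only unbinds a name in the current scope. Inside `remove_socket`, that name is the parameter, which is a local name, so the caller's `sock` is unaffected. The `close()` call is what actually releases the socket. Below is a Python 3 sketch that shows this. No network access is needed because the socket is never connected.

```python
import socket


def remove_socket(sock):
    sock.close()  # this is the line that does the real work
    del sock      # only removes the local name 'sock' inside this function


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
remove_socket(sock)
# The caller's name is still bound to the same (now closed) socket object.
print(sock.fileno())  # -1 on Python 3 once the socket is closed
```

In Python 3 you would more typically let a `with` block close the socket for you, e.g. `with socket.create_connection((host, 80)) as sock: ...`, and drop the helper entirely.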
Anyway -- this works:
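import socket

sock = socket.socket()
sock.connect(('en.wikipedia.org', 80))
# Terminate every request line with CRLF, as HTTP requires.
# sendall (unlike send) keeps writing until the whole string has gone out.
for line in (
    "GET /wiki/List_of_HTTP_header_fields HTTP/1.1",
    "Host: en.wikipedia.org",
    "Connection: close",
):
    sock.sendall(line + "\r\n")
# A bare CRLF (the empty line) ends the header block.
sock.sendall("\r\n")
while True:
    content = sock.recv(1024)
    if content:
        print content
    else:
        break
sock.close()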
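Printing each 1024-byte chunk as it arrives is fine for eyeballing, but a chunk boundary can fall anywhere, even in the middle of a header. A sturdier pattern is to collect everything until the server closes the connection (which it should, since the request says Connection: close), and then split the headers from the body at the first empty line. The sketch below is written for Python 3. `read_all` and `split_response` are hypothetical helper names. The sketch deliberately ignores chunked transfer encoding, which an HTTP/1.1 server may still use for the body, and it keeps only the last value of a repeated header.

```python
import socket


def read_all(sock, bufsize=1024):
    """Read until the peer closes the connection (recv returns b'')."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


# Offline demo: a connected socket pair stands in for the server.
server, client = socket.socketpair()
server.sendall(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>")
server.close()
status, headers, body = split_response(read_all(client))
client.close()
print(status, headers["content-type"], body)
```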
EDIT: As for resources/books/reference -- for a reference HTTP client implementation, look at Python's very own httplib.py. :)
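In Python 3, httplib was renamed http.client. For comparison with the hand-rolled socket version, below is a minimal sketch of the same GET made through it. `fetch` is a hypothetical wrapper, not part of the module. Be aware that Wikipedia may answer a plain-HTTP request on port 80 with a redirect to HTTPS rather than with the page itself.

```python
import http.client


def fetch(host, path, port=80, timeout=10):
    """GET host:port/path and return (status, reason, headers, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # http.client writes the CRLFs, the Host header and the terminating
        # empty line itself.
        conn.request("GET", path)
        resp = conn.getresponse()
        # read() also undoes chunked transfer encoding if the server used it.
        return resp.status, resp.reason, resp.getheaders(), resp.read()
    finally:
        conn.close()


# Usage (needs network access):
# status, reason, headers, body = fetch("en.wikipedia.org",
#                                       "/wiki/List_of_HTTP_header_fields")
```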
COMMENTS
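On del sock in the remove_socket function: doing it there, just before the function exits, does nothing worthwhile, because it only removes the function's local name while the caller still holds its own reference. (You very rarely actually need del in Python code anyway.)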
'\r\n')? Also, after the headers you should have a single empty line; this tells the server that the headers are done.