My code :-
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.python.org" , 80))
s.sendall(b"GET https://www.python.org HTTP/1.0\n\n")
print(s.recv(4096))
s.close()
Why the output shows me this:-
b'HTTP/1.1 500 Domain Not Found\r\nServer: Varnish\r\nRetry-After: 0\r\ncontent-type: text/html\r\nCache-Control: private, no-cache\r\nconnection: keep-alive\r\nContent-Length: 179\r\nAccept-Ranges: bytes\r\nDate: 2017年7月11日 15:23:55 GMT\r\nVia: 1.1 varnish\r\nConnection: close\r\n\r\n\n\n\nFastly error: unknown domain \n\n\nFastly error: unknown domain: . Please check that this domain has been added to a service.'
How can I fix it?
2 Answers 2
This is wrong on multiple levels:
- to access a HTTPS resource you need to create a TLS connection (i.e. ssl_wrap on top of an existing TCP connection, with proper certificate checking etc) and then send the HTTP request. Of course the TCP connection in this case should go to port 443(https) not 80 (http).
- the HTTP request should only contain the path, not the full URL
- the line end must be \r\n not \n
- you better send a Host header too since many severs require it
And that's only the request. Properly handling the response is a different topic.
I really really recommend to use an existing library like requests. HTTP(S) is considerably more complex as most think who only had a look at a few traffic captures.
5 Comments
print(requests.get('https://www.python.org').content). Or are you asking how to fix your code: I don't think it is worth since too much is wrong.import requests
x = requests.get('https://www.python.org')
print x.text
With the requests library, HTTPS requests are very simple! If you're doing this with raw sockets, you have to do a lot more work to negotiate a cipher and etc. Try the above code (python 2.7).
I would also note that, in my experience, Python is excellent for doing things quickly. If you are learning about networking and cryptography, try writing a HTTPS client on your own using sockets. If you want to automate something quickly, use the tools that are available to you. I almost always use requests for this type of task. As an additional note, if you're interested in parsing HTML content, check out the PyQuery library. I've used it to automate interaction with many web services.
GET https://www.python.org-- I think you want "GET /" instead.