I'm investigating whether I can make a single HTTP request in Python that retrieves both the HTML and the HTTP headers, instead of having to make 2 separate calls.
Does anyone know of a good way to do this?
Also, what are the performance differences between the different methods of making these requests, e.g. urllib2, HTTPConnection, etc.?
2 Answers
Just use urllib2.urlopen(). The HTML can be retrieved by calling the read() method of the returned object, and the headers are available in the headers attribute.
>>> import urllib2
>>> f = urllib2.urlopen('http://www.google.com')
>>> print f.headers
Date: Fri, 08 Jun 2012 12:57:25 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Connection: close
>>> print f.read()
<!doctype html><html itemscope itemtype="http://schema.org/WebPage"><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
... etc ...
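For anyone on Python 3, where urllib2 was split into urllib.request and urllib.error, the same idea might look roughly like the sketch below. The function name fetch_headers_and_body and the timeout default are my own choices for illustration, not anything from the standard library.

```python
# Python 3 sketch: urllib2.urlopen lives on as urllib.request.urlopen.
from urllib.request import urlopen


def fetch_headers_and_body(url, timeout=10):
    """Return (headers, body) obtained from a single GET request to url.

    The function name and the timeout default are illustrative assumptions.
    """
    # Only one request is sent; the response object carries both parts.
    with urlopen(url, timeout=timeout) as response:
        # Copy the headers into a plain dict. If a header is repeated
        # (e.g. Set-Cookie), only its last value survives in the dict.
        headers = dict(response.headers.items())
        # Read the whole body as raw bytes from the same response.
        body = response.read()
    return headers, body
```

Calling fetch_headers_and_body('http://www.google.com') should give back the headers and the page bytes together. Decoding the bytes is left to the caller, since the charset is usually announced in the Content-Type header.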
Comments
If you use HTTPResponse, you can get the headers and the content with two function calls, but that still makes only one trip to the server.
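To illustrate the HTTPResponse route, here is a minimal sketch written for Python 3, where httplib was renamed http.client. The helper name fetch_with_httpconnection and its parameters are hypothetical; only HTTPConnection, request, getresponse, getheaders and read come from the standard library.

```python
# Python 3 sketch: Python 2's httplib module is http.client here.
from http.client import HTTPConnection


def fetch_with_httpconnection(host, path="/", port=80, timeout=10):
    """Return (status, headers, body) using one GET request.

    The helper name and its defaults are illustrative assumptions.
    """
    conn = HTTPConnection(host, port, timeout=timeout)
    try:
        # The single request sent to the server.
        conn.request("GET", path)
        # getresponse() reads the status line and the headers.
        response = conn.getresponse()
        # Two separate calls, but both work on the same response:
        headers = response.getheaders()  # list of (name, value) tuples
        body = response.read()           # rest of that response, as bytes
    finally:
        conn.close()
    return response.status, headers, body
```

Something like fetch_with_httpconnection('www.google.com') would be the equivalent of the urlopen call above. This lower-level API does not follow redirects for you, so a 301 or 302 status comes back as-is.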