I want to make a POST request to an HTTPS site that should respond with a .csv file. I have this Python code:
import shutil
import urllib
import urllib2

url = 'https://www.site.com/servlet/datadownload'
values = {
    'val1': '123',
    'val2': 'abc',
    'val3': '1b3',
}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
myfile = open('file.csv', 'wb')
shutil.copyfileobj(response.fp, myfile)
myfile.close()
But I'm getting the error:
BadStatusLine: '' (in httplib.py)
I've tried the post request with the Chrome Extension: Advanced REST client (screenshot) and that works fine.
What could be the problem and how could I solve it? (Is it because of the HTTPS?)
EDIT, refactored code:
try:
    # conn = httplib.HTTPSConnection(host="www.site.com", port=443)
    # => Gives a BadStatusLine: '' error
    conn = httplib.HTTPConnection("www.site.com")
    params = urllib.urlencode({'val1': '123', 'val2': 'abc', 'val3': '1b3'})
    conn.request("POST", "/nps/servlet/exportdatadownload", params)
    content = conn.getresponse()
    print content.reason, content.status
    print content.read()
    conn.close()
except:
    import sys
    print sys.exc_info()[:2]
Output:
Found 302
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="https://www.site.com/nps/servlet/exportdatadownload">here</A>.<P>
<HR>
<ADDRESS>Oracle-Application-Server-10g/10.1.3.5.0 Oracle-HTTP-Server Server at mp-www1.mrco.be Port 7778</ADDRESS>
</BODY></HTML>
What am I doing wrong?
4 Answers
Is there a reason you've got to use urllib? Requests is simpler, better in almost every way, and abstracts away some of the cruft that makes urllib hard to work with.
As an example, I'd rework your example as something like:
import requests
resp = requests.post(url, data=values, allow_redirects=True)
At this point, the response from the server is available in resp.text, and you can do what you'd like with it. If requests wasn't able to POST properly (because you need a custom SSL certificate, for example), it should give you a nice error message that tells you why.
Even if you can't do this in your production environment, do this in a local shell to see what error messages you get from requests, and use that to debug urllib.
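For the download itself, a minimal sketch of that idea might look like the following. It assumes the server really returns the CSV as the body of the POST response. download_csv and its session parameter are names made up for this illustration, and writing resp.content (bytes) instead of resp.text is my own choice so the file is saved unmodified.

```python
def download_csv(url, values, path, session=None):
    """POST form values to url and save the raw response body to path.

    Hypothetical helper for illustration; not part of requests itself.
    """
    if session is None:
        # Third-party dependency, imported here so a stand-in session
        # object can be passed in without requests being installed.
        import requests
        session = requests.Session()

    resp = session.post(url, data=values, allow_redirects=True, timeout=30)
    resp.raise_for_status()  # raise on 4xx/5xx instead of saving an error page

    with open(path, "wb") as out:
        out.write(resp.content)  # bytes, so the CSV is written as received
    return resp.status_code
```

Something like download_csv('https://www.site.com/servlet/datadownload', values, 'file.csv') would then write the file. One thing to keep in mind: when requests follows a 302 redirect it re-issues the request as a GET, so posting straight to the https:// URL avoids losing the form data along the way.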
2 Comments
ConnectionError: HTTPSConnectionPool(host='www.site.com', port=443): Max retries exceeded with url: /nps/servlet/exportdatadownload/ (Caused by <class 'httplib.BadStatusLine'>: '')
When I browse to https://www.site.com/nps/servlet/exportdatadownload?val1=123&val2=abc&val3=1b3, the Excel file is downloaded automatically, but still no success with a Python script...
BadStatusLine means that the server sent back an HTTP status that Python doesn't understand (and it understands all the "normal" ones). From a command line, can you do a curl -I https://site.com (with whatever the real URL is there) and paste the results? If you don't have curl, you can also use hurl.it (in which case I'm just interested in the first paragraph of the response).
The BadStatusLine: '' (in httplib.py) suggests that something else might be going on here. This may happen when the server sends no reply back at all, and just closes the connection.
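To see that failure mode in isolation, here is a small self-contained sketch (written for Python 3, where httplib is called http.client). It starts a throwaway local socket server that hangs up without sending a status line, and shows the client raising a BadStatusLine. The helper names are invented for this demo, and the real server may of course be failing for a different reason.

```python
import http.client
import socket
import threading


def hang_up_without_replying(listener):
    """Accept one connection and close it without sending any response."""
    conn, _ = listener.accept()
    with conn:
        conn.shutdown(socket.SHUT_WR)  # send EOF right away: no status line at all
        while conn.recv(65536):        # drain the request until the client closes
            pass


def request_against_silent_server():
    """Return the exception the client raises, or None if none was raised."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=hang_up_without_replying, args=(listener,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("POST", "/servlet/datadownload", body="val1=123")
        client.getresponse()           # reads an empty status line
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()                 # lets the server thread finish draining
        worker.join()
        listener.close()
    return None


if __name__ == "__main__":
    print(type(request_against_silent_server()).__name__)
```

On Python 3.5 and later the exception raised here is http.client.RemoteDisconnected, which is a subclass of BadStatusLine.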
As you mentioned that you're using an SSL connection, this might be particularly interesting to debug (with curl -v URL if you want).
If you find that curl -2 URL (which forces the use of SSLv2) works while curl -3 URL (SSLv3) doesn't, you may want to take a look at issue #13636 and possibly #11220 on the Python bug tracker. Depending on your Python version and a possibly misconfigured web server, this might be causing the problem: the SSL defaults changed in v2.7.3.
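Recent Python and OpenSSL builds generally no longer offer SSLv2 or SSLv3, so the curl -2 / curl -3 comparison can't be reproduced literally there. A rough analogue, sketched below for Python 3.7+, pins the client to one TLS version at a time and reports whether the handshake succeeds. pinned_context and probe are hypothetical helper names, and this is only a diagnostic aid under those assumptions, not a fix.

```python
import socket
import ssl


def pinned_context(version):
    """Build a client context that will only negotiate the given TLS version."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host, version, port=443, timeout=10):
    """Return the negotiated protocol name, or a 'failed: ...' string.

    Hypothetical helper: it only attempts the TLS handshake and sends
    no HTTP request.
    """
    ctx = pinned_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version()
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return "failed: %s" % exc
```

Calling probe('www.site.com', ssl.TLSVersion.TLSv1_2) and then the same with ssl.TLSVersion.TLSv1_3 would show which of those versions the server accepts.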
import httplib
import urllib

# _certfile must be defined elsewhere: the path to a PEM-formatted certificate file
conn = httplib.HTTPSConnection(host='www.site.com', port=443, cert_file=_certfile)
params = urllib.urlencode({'cmd': 'token', 'device_id_st': 'AAAA-BBBB-CCCC',
                           'token_id_st': 'DDDD-EEEE_FFFF', 'product_id': 'Unit Test',
                           'product_ver': "1.6.3"})
# The request path needs a leading slash
conn.request("POST", "/servlet/datadownload", params)
response = conn.getresponse()
content = response.read()
# print response.status, response.reason
conn.close()
5 Comments
httplib.HTTPSConnection('www.site.com'). When I print content.status I get Found 302. And printing the content itself, I get HTML code with The document has moved <A HREF="https://www.site.com/servlet/exportdatadownload">here</A>.<P> But how do I get the found file?
https://google.com, it feels like you have some sort of server/destination issue.
httplib.HTTPSConnection(host="www.google.com", port=443) gives a Not Found 404 output and httplib.HTTPConnection("www.google.com") gives Service Unavailable 503
/servlet/datadownload URL on Google's website, hence the error. Now I am confident your server is the issue. Try to read something simple, like a static HTML page (that you can access via a browser).
The server may not like the missing headers, particularly User-Agent and Content-Type. The Chrome screenshot shows what is used for these. Maybe try adding the headers:
import httplib, urllib
host = 'www.site.com'
url = '/servlet/datadownload'
values = {
    'val1': '123',
    'val2': 'abc',
    'val3': '1b3',
}
headers = {
    'User-Agent': 'python',
    'Content-Type': 'application/x-www-form-urlencoded',
}
values = urllib.urlencode(values)
conn = httplib.HTTPSConnection(host)
conn.request("POST", url, values, headers)
response = conn.getresponse()
data = response.read()
print 'Response: ', response.status, response.reason
print 'Data:'
print data
This is untested code, and you may want to experiment by adding other header values to match your screenshot. Hope it helps.
Can you try
https_handler = urllib2.HTTPSHandler(1)
opener = urllib2.build_opener(https_handler)
response = opener.open(req)
in place of response = urllib2.urlopen(req)? You should still get the error, but that should turn on debugging in the HTTPS response, which means the response will be printed, and you can then use it to help track down what isn't working. If for some odd reason it's using another handler, just try the same thing with urllib2.HTTPHandler(1) or whatever handler is relevant.
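For anyone on Python 3, where urllib2 became urllib.request, the same debugging trick can be sketched like this. build_debug_opener and post_with_debug are illustrative names, not part of the standard library; HTTPHandler, HTTPSHandler and build_opener are the real urllib.request pieces being used.

```python
import urllib.parse
import urllib.request


def build_debug_opener():
    """Return an opener whose HTTP and HTTPS handlers have debugging on."""
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
    )


def post_with_debug(url, values, timeout=30):
    """POST form-encoded values through the debugging opener.

    Hypothetical helper: the caller is expected to read and close
    the returned response.
    """
    data = urllib.parse.urlencode(values).encode("ascii")
    req = urllib.request.Request(url, data)  # supplying data makes this a POST
    return build_debug_opener().open(req, timeout=timeout)
```

With debuglevel=1 the underlying http.client connection prints the request it sends and the status line and headers it receives to stdout, which is usually enough to see where the exchange breaks.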