HTTPS POST request Python

Question 1

I want to make a POST request to an HTTPS site that should respond with a .csv file. I have this Python code:

import shutil, urllib, urllib2

url = 'https://www.site.com/servlet/datadownload'
values = {
 'val1' : '123',
 'val2' : 'abc',
 'val3' : '1b3',
}
data = urllib.urlencode(values)
req = urllib2.Request(url,data)
response = urllib2.urlopen(req)
myfile = open('file.csv', 'wb')
shutil.copyfileobj(response.fp, myfile)
myfile.close()

But I'm getting the error:

BadStatusLine: '' (in httplib.py)

I've tried the post request with the Chrome Extension: Advanced REST client (screenshot) and that works fine.

What could be the problem, and how can I solve it? (Is it because of the HTTPS?)

EDIT, refactored code:

try:
 # With HTTPS, the next line gives a BadStatusLine: '' error:
 #conn = httplib.HTTPSConnection(host="www.site.com", port=443)
 conn = httplib.HTTPConnection("www.site.com")
 params = urllib.urlencode({'val1':'123','val2':'abc','val3':'1b3'})
 conn.request("POST", "/nps/servlet/exportdatadownload", params)
 content = conn.getresponse()
 print content.reason, content.status
 print content.read()
 conn.close()
except:
 import sys
 print sys.exc_info()[:2]

Output:

Found 302
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="https://www.site.com/nps/servlet/exportdatadownload">here</A>.<P>
<HR>
<ADDRESS>Oracle-Application-Server-10g/10.1.3.5.0 Oracle-HTTP-Server Server at mp-www1.mrco.be Port 7778</ADDRESS>
</BODY></HTML>

What am I doing wrong?

Question 2

What version of Python are you using? I would check this answer to see if httplib is working OK with HTTPS. I can't try out your code right now, but another piece of advice would be to use a friendlier library for your requests, called... requests.

Question 3

What do you get if you use

https_handler = urllib2.HTTPSHandler(1)
opener = urllib2.build_opener(https_handler)
response = opener.open(req)

in place of response = urllib2.urlopen(req)? You should still get the error, but this turns on debugging for the HTTPS exchange, so the response is printed and you can use it to track down what isn't working. If for some odd reason another handler is being used, try the same thing with urllib2.HTTPHandler(1) or whichever handler is relevant.
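For anyone following along on Python 3, where urllib2 was folded into urllib.request, a minimal sketch of the same debugging idea might look like this. The helper names build_debug_opener and build_post_request are made up for illustration, and the URL and form values are just the placeholders from the question.

```python
import urllib.parse
import urllib.request


def build_debug_opener():
    """Return an opener whose handlers print the raw HTTP exchange to stdout."""
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
    )


def build_post_request(url, values):
    """Encode a dict as a form body; supplying data makes urllib send a POST."""
    data = urllib.parse.urlencode(values).encode("ascii")
    return urllib.request.Request(url, data=data)


opener = build_debug_opener()
req = build_post_request(
    "https://www.site.com/servlet/datadownload",  # placeholder URL from the question
    {"val1": "123", "val2": "abc", "val3": "1b3"},
)
# response = opener.open(req)  # uncomment to send the request and see the debug output
```

Nothing is sent until opener.open(req) is called, so the request can be inspected first.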

Question 4

I noticed that you are using urllib and urllib2 at the same time. Is that intentional?

Question 5

You should post the site.

Question 6

Is there a reason you've got to use urllib? Requests is simpler, better in almost every way, and abstracts away some of the cruft that makes urllib hard to work with.

As an example, I'd rework your example as something like:

import requests
resp = requests.post(url, data=values, allow_redirects=True)

At this point, the response from the server is available in resp.text, and you can do what you'd like with it. If requests wasn't able to POST properly (because you need a custom SSL certificate, for example), it should give you a nice error message that tells you why.

Even if you can't do this in your production environment, do this in a local shell to see what error messages you get from requests, and use that to debug urllib.

Question 7

The same error: BadStatusLine:

ConnectionError: HTTPSConnectionPool(host='www.site.com', port=443): Max retries exceeded with url: /nps/servlet/exportdatadownload/ (Caused by <class 'httplib.BadStatusLine'>: '')

When I browse to https://www.site.com/nps/servlet/exportdatadownload?val1=123&val2=abc&val3=1b3, the Excel file is downloaded automatically, but still no success with a Python script...

Question 8

BadStatusLine means that the server sent back an HTTP status that Python doesn't understand (and it understands all the "normal" ones). From a command line, can you run curl -I https://site.com (with whatever the real URL is there) and paste the results? If you don't have curl, you can also use hurl.it (in which case I'm just interested in the first paragraph of the response).

Question 9

The BadStatusLine: '' (in httplib.py) gives away that there might be something else going on here. This may happen when the server sends no reply back at all, and just closes the connection.
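To see that behaviour in isolation, here is a rough sketch for Python 3, where httplib is named http.client. It starts a throwaway local server that reads the request and then closes the socket without answering. The function names are invented for this illustration. On Python 3.5 and later the exception raised is RemoteDisconnected, which is a subclass of BadStatusLine.

```python
import http.client
import socket
import threading


def _read_request_then_hang_up(listener):
    """Accept one connection, read the request headers, then close without replying."""
    conn, _ = listener.accept()
    received = b""
    while b"\r\n\r\n" not in received:
        chunk = conn.recv(4096)
        if not chunk:
            break
        received += chunk
    conn.close()


def probe_silent_server():
    """Send a GET to a local server that never answers; return the exception raised."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=_read_request_then_hang_up, args=(listener,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        worker.join()
        listener.close()
    return None


error = probe_silent_server()
print(type(error).__name__, "is a BadStatusLine:", isinstance(error, http.client.BadStatusLine))
```

If the real server produces the same exception, the likely cause is a dropped connection (for example a failed SSL negotiation) rather than an unusual status code.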

As you mentioned that you're using an SSL connection, this might be particularly interesting to debug (with curl -v URL if you want). If you find out that curl -2 URL (which forces the use of SSLv2) seems to work, while curl -3 URL (SSLv3) doesn't, you may want to take a look at issue #13636 and possibly #11220 on the Python bug tracker. Depending on your Python version and a possibly misconfigured web server, this might be causing the problem: the SSL defaults changed in v2.7.3.
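The same kind of experiment can be run from Python by pinning the protocol version on the connection. This is only a sketch and assumes Python 3.7 or later, which exposes ssl.TLSVersion. SSLv2 and SSLv3 are usually not available in current builds, so it pins TLS 1.2 as a stand-in. The helper name connection_with_tls_version is hypothetical and the host is the placeholder from the question.

```python
import http.client
import ssl


def connection_with_tls_version(host, version):
    """Return an unopened HTTPS connection restricted to one TLS protocol version."""
    context = ssl.create_default_context()
    context.minimum_version = version
    context.maximum_version = version
    return http.client.HTTPSConnection(host, context=context), context


conn, context = connection_with_tls_version("www.site.com", ssl.TLSVersion.TLSv1_2)
# conn.request("POST", "/servlet/datadownload", body, headers)  # nothing is sent until here
```

Trying a few versions in turn shows whether the server only completes the handshake with one of them.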

Question 10

import httplib, urllib

# _certfile must hold the path to your client certificate file
conn = httplib.HTTPSConnection(host='www.site.com', port=443, cert_file=_certfile)
params = urllib.urlencode({'cmd': 'token', 'device_id_st': 'AAAA-BBBB-CCCC',
                           'token_id_st': 'DDDD-EEEE_FFFF', 'product_id': 'Unit Test',
                           'product_ver': '1.6.3'})
conn.request("POST", "/servlet/datadownload", params)  # the path needs a leading slash
response = conn.getresponse()
print response.status, response.reason
content = response.read()
conn.close()

Question 11

I've tried your code, but adapted the first line to just httplib.HTTPSConnection('www.site.com'). When I print content.reason and content.status I get Found 302. And printing the content itself, I get HTML code with The document has moved <A HREF="https://www.site.com/servlet/exportdatadownload">here</A>.<P> But how do I get the found file?
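A 302 means the file is at the address given in the response's Location header, and httplib does not follow it for you. One way to handle that by hand is sketched below for Python 3 (http.client). The function names are made up, and repeating the POST at the new location is an assumption about what this particular server expects.

```python
import http.client
import urllib.parse

REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(current_url, status, location):
    """Return the URL to retry, or None when the response is not a redirect."""
    if status not in REDIRECT_CODES or not location:
        return None
    # urljoin also copes with relative Location values such as "/other/path"
    return urllib.parse.urljoin(current_url, location)


def post_following_redirects(url, values, max_hops=5):
    """POST the form and re-send it to each Location until a non-redirect arrives."""
    body = urllib.parse.urlencode(values)
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    for _ in range(max_hops):
        parts = urllib.parse.urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc)
        else:
            conn = http.client.HTTPConnection(parts.netloc)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("POST", path, body, headers)
            response = conn.getresponse()
            payload = response.read()
            target = next_url(url, response.status, response.getheader("Location"))
        finally:
            conn.close()
        if target is None:
            return response.status, payload
        url = target
    raise RuntimeError("too many redirects")


# Example call (placeholder URL and values from the question; needs network access):
# status, payload = post_following_redirects(
#     "http://www.site.com/nps/servlet/exportdatadownload",
#     {"val1": "123", "val2": "abc", "val3": "1b3"})
```

In this thread the redirect only points from the HTTP address to the HTTPS one, so following it leads back to the original BadStatusLine problem.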

Question 12

I've edited my question with more information and with your code.

Question 13

Try the URL https://google.com; it feels like you have some sort of server/destination issue.

Question 14

httplib.HTTPSConnection(host="www.google.com", port=443) gives a Not Found 404 output and httplib.HTTPConnection("www.google.com") gives Service Unavailable 503.

Question 15

That's good. There is no /servlet/datadownload URL on Google's website, hence the error. Now I am confident your server is the issue. Try to read something simple, like a static HTML page (one that you can access via a browser).

Question 16

The server may not like the missing headers, particularly user-agent and content-type. The Chrome image shows what is used for these. Maybe try adding the headers:

import httplib, urllib
host = 'www.site.com'
url = '/servlet/datadownload'
values = {
 'val1' : '123',
 'val2' : 'abc',
 'val3' : '1b3',
}
headers = {
 'User-Agent': 'python',
 'Content-Type': 'application/x-www-form-urlencoded',
}
values = urllib.urlencode(values)
conn = httplib.HTTPSConnection(host)
conn.request("POST", url, values, headers)
response = conn.getresponse()
data = response.read()
print 'Response: ', response.status, response.reason
print 'Data:'
print data

This is untested code, and you may want to experiment by adding other header values to match your screenshot. Hope it helps.
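For readers on Python 3, where httplib became http.client and urllib.urlencode moved to urllib.parse, a rough equivalent that also writes the response to disk could look like the following. The function names are invented for this sketch, and the host, path and values are the placeholders from the question.

```python
import http.client
import shutil
import urllib.parse


def build_form_post(values, user_agent="python"):
    """Return the URL-encoded body and the headers for a form POST."""
    body = urllib.parse.urlencode(values)
    headers = {
        "User-Agent": user_agent,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return body, headers


def save_body(response, dest_path):
    """Copy a file-like response to disk in chunks instead of reading it all at once."""
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)


def download_csv(host, path, values, dest_path):
    """POST the form over HTTPS and write a 200 response to dest_path; return the status."""
    body, headers = build_form_post(values)
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("POST", path, body, headers)
        response = conn.getresponse()
        if response.status == 200:
            save_body(response, dest_path)
        return response.status
    finally:
        conn.close()


# Example call (needs network access):
# download_csv("www.site.com", "/servlet/datadownload",
#              {"val1": "123", "val2": "abc", "val3": "1b3"}, "file.csv")
```

Only a 200 response is written to the file, so an error page or redirect body does not end up in file.csv.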

Dan (1,382 reputation; 1 gold badge, 9 silver badges, 18 bronze badges) · Accepted Answer · 2013-03-04 23:51:38Z

