
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Sending binary data with a POST request in httplib can cause Unicode exceptions
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Jiri.Horky, bero, cyrus, eric.araujo, ezio.melotti, orsenthil, santoso.wijaya, ssbarnea, terry.reedy
Priority: normal Keywords: patch

Created on 2011-04-21 13:42 by bero, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                                           Uploaded                       Description
python-2.7.1-fix-httplib-UnicodeDecodeError.patch   bero, 2011-04-21 13:42         Proposed fix
data                                                Jiri.Horky, 2011-05-15 18:29   binary data that triggers the problem
Messages (17)
msg134211 - (view) Author: Bernhard Rosenkraenzer (bero) Date: 2011-04-21 13:42
Sending e.g. a JPEG file with a httplib POST request (e.g. through mechanize) can result in an error like this:
 File "/usr/lib64/python2.7/httplib.py", line 947, in request
 self._send_request(method, url, body, headers)
 File "/usr/lib64/python2.7/httplib.py", line 988, in _send_request
 self.endheaders(body)
 File "/usr/lib64/python2.7/httplib.py", line 941, in endheaders
 self._send_output(message_body)
 File "/usr/lib64/python2.7/httplib.py", line 802, in _send_output
 msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 2566: invalid start byte
The code triggering this is the attempt to merge the msg and message_body into a single request in httplib.py lines 791+
The patch I'm attaching treats an invalid string of unknown encoding (e.g. binary data wrapped as string) like something that isn't a string.
Works for me with the patch.
msg134824 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-04-30 00:11
Did you run the httplib test with your patch? Interactively
>>> from test.test_httplib import test_main as f; f()
(verbose mode, over 40 tests)
In 3.x, the patch would be to http/client.py, line 802 in 3.2 release
if isinstance(message_body, str) # becomes
if isinstance(message_body, bytes)
Will this be an issue in 3.x?
msg134840 - (view) Author: Bernhard Rosenkraenzer (bero) Date: 2011-04-30 06:57
Not sure how to get it into verbose mode (I presume you don't mean "python -v"), but normal mode (22 tests) works fine:
Python 2.7.1 (r271:86832, Apr 22 2011, 13:40:40)
[GCC 4.6.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from test.test_httplib import test_main as f
>>> f()
test_auto_headers (test.test_httplib.HeaderTests) ... ok
test_ipv6host_header (test.test_httplib.HeaderTests) ... ok
test_putheader (test.test_httplib.HeaderTests) ... ok
test_responses (test.test_httplib.OfflineTest) ... ok
test_bad_status_repr (test.test_httplib.BasicTest) ... ok
test_chunked (test.test_httplib.BasicTest) ... ok
test_chunked_head (test.test_httplib.BasicTest) ... ok
test_epipe (test.test_httplib.BasicTest) ... ok
test_filenoattr (test.test_httplib.BasicTest) ... ok
test_host_port (test.test_httplib.BasicTest) ... ok
test_incomplete_read (test.test_httplib.BasicTest) ... ok
test_negative_content_length (test.test_httplib.BasicTest) ... ok
test_partial_reads (test.test_httplib.BasicTest) ... ok
test_read_head (test.test_httplib.BasicTest) ... ok
test_response_headers (test.test_httplib.BasicTest) ... ok
test_send (test.test_httplib.BasicTest) ... ok
test_send_file (test.test_httplib.BasicTest) ... ok
test_status_lines (test.test_httplib.BasicTest) ... ok
testTimeoutAttribute (test.test_httplib.TimeoutTest)
This will prove that the timeout gets through ... ok
test_attributes (test.test_httplib.HTTPSTimeoutTest) ... ok
testHTTPConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok
testHTTPSConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok
----------------------------------------------------------------------
Ran 22 tests in 0.004s
OK
Not sure if this is an issue with 3.x - I haven't used 3.x so far.
msg135290 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-05-06 13:08
Hello Bernhard,
I tried a POST of a JPEG file through urllib2 (which uses httplib internally). It goes through the code you pointed out, and I don't see any problem: I am able to POST binaries using httplib.
I am also surprised by the UnicodeDecodeError being raised. The POST data is a string (8-bit string) in Python 2.7, and that portion of the code has no problem creating the content.
You will get a UnicodeDecodeError only if you explicitly pass a Unicode object as data, never when you pass a string or binary string.
Perhaps mechanize is doing something wrong here and sending a Unicode object.
So this really does not look like a bug to me.
(Also, a note on the patch: it tries to silence the error, which is the wrong thing to do.)
If you can provide a simple snippet that reproduces this error, feel free to reopen this. I am closing this as 'works for me'.
Thanks.
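To illustrate the explanation above: in Python 2.7, concatenating a unicode object with an 8-bit str makes the interpreter decode the str with the ASCII codec first, and that is where a byte such as 0xff fails. The following is only a sketch, not httplib code. It performs the decode step explicitly so that it also runs on Python 3, and `concat_like_py2` is a hypothetical helper name introduced for the illustration.

```python
def concat_like_py2(msg, message_body):
    """Mimic Python 2's `msg += message_body` for a byte-string body.

    Hypothetical helper: Python 2 decodes the byte string with the ASCII
    codec before joining it to a unicode object; this does so explicitly.
    """
    if isinstance(msg, bytes):
        # Both operands are byte strings: plain concatenation, no decoding.
        return msg + message_body
    # msg is text, so the body is decoded first; non-ASCII bytes fail here.
    return msg + message_body.decode("ascii")


# Made-up payload that starts like a JPEG file (0xff 0xd8).
jpeg_like_body = b"\xff\xd8\xff\xe0 binary payload"

# Byte-string headers plus a binary body: works.
ok = concat_like_py2(b"POST /test HTTP/1.1\r\n\r\n", jpeg_like_body)

# Text (unicode) headers plus the same body: raises UnicodeDecodeError.
try:
    concat_like_py2(u"POST /test HTTP/1.1\r\n\r\n", jpeg_like_body)
    failed = False
except UnicodeDecodeError:
    failed = True
```

Under these assumptions the binary body itself is harmless. The failure needs a unicode object on the other side of the concatenation.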
msg136043 - (view) Author: Jiri Horky (Jiri.Horky) Date: 2011-05-15 18:29
I have the same problem as the original submitter.
The reason it previously worked for you was probably because you didn't utilize a "right" unicode string in the urllib2.request. The following code will raise the exception (I enclose the data file for completeness, but it fails with basically any binary data).
It works fine with Python 2.6.6, but fails with Python 2.7.1.
{{{
import urllib2

# Read the payload as raw bytes; basically any binary data triggers the failure.
with open("data", "rb") as f:
    mydata = f.read()

# This fails:
url = unicode('http://localhost/test')
# This works:
# url = str('http://localhost/test')
# This also works:
# url = unicode('http://localhost')

req = urllib2.Request(url, data=mydata)
urllib2.urlopen(req)
}}}
msg136060 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-05-16 01:51
The bug was about sending binary "data" via httplib. In the example you
wrote, you are sending a unicode "url" and experiencing a failure for
certain examples.
In 2.7, URLs should be of type str. We don't have a function that
deals with unicode URLs separately, and sending a unicode URL is an
error.
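A caller that may receive unicode URLs can convert them before building the request. The sketch below assumes Python 2.7 semantics, where `bytes` is an alias for `str`; `to_byte_url` is a hypothetical helper and not part of urllib2 or httplib. On Python 3 the same code returns `bytes`, which `urllib.request` does not accept as a URL, so treat it as a 2.7-only workaround.

```python
def to_byte_url(url, encoding="ascii"):
    """Return url as a byte string (the `str` type on Python 2.7).

    Hypothetical helper for illustration only.
    """
    if isinstance(url, bytes):
        # Already a byte string: pass it through unchanged.
        return url
    # Raises UnicodeEncodeError for non-ASCII URLs instead of guessing.
    return url.encode(encoding)


url = to_byte_url(u"http://localhost/test")
```

Encoding with ASCII is a deliberate choice here: a URL with non-ASCII characters fails at the call site instead of deep inside the request machinery.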
msg138056 - (view) Author: Ion Scerbatiuc (cyrus) Date: 2011-06-10 08:48
Hello,
I would like to subscribe to the issue. The problem seems to indeed exist in Python 2.7. 
What I'm doing is proxying HTTP requests (using Django); the PUT / POST requests work fine on Python 2.6 but fail on 2.7 with the error already presented in bero's first message.
I'm using httplib2 and the code looks like
{{
import httplib
import httplib2

http = httplib2.Http(timeout=5)
try:
    resp, content = http.request(
        request_url, method,
        body=body, headers=headers)
except (AttributeError, httplib.ResponseNotReady), e:
    # ...
}}
Body is the result of Django's request.read(), which in fact contains the binary data from the PUT / POST request.
The full stack trace is:
{{
Traceback:
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
 111. response = callback(request, *callback_args, **callback_kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/auth.py" in _decorated_view
 33. return view(request, *args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
 39. resp = view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
 52. return view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/views.py" in dispatch
 55. original=request.build_absolute_uri())
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/handlers/its.py" in proxy
 51. body=body, headers=headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in request
 1129. (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _request
 901. (response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _conn_request
 862. conn.request(method, request_uri, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in request
 941. self._send_request(method, url, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in _send_request
 975. self.endheaders(body)
File "/usr/local/lib/python2.7/httplib.py" in endheaders
 937. self._send_output(message_body)
File "/usr/local/lib/python2.7/httplib.py" in _send_output
 795. msg += message_body
}}
msg138059 - (view) Author: Ion Scerbatiuc (cyrus) Date: 2011-06-10 09:06
Hello again,
After some digging I found that the "real" problem was that the provided URL was a unicode string, so the concatenation failed. Maybe this is not a big deal, but I think we should at least add a proper assertion or some other check for the provided URL, because the error encountered is confusing to say the least.
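The kind of check suggested above could be sketched in caller code as follows. This is an illustration under Python 2.7 semantics (`bytes` is `str` there) and not a change to httplib; `require_byte_string` is a hypothetical name.

```python
def require_byte_string(name, value):
    """Fail early, with a readable message, if value is not a byte string.

    Hypothetical caller-side check; httplib performs no such validation.
    """
    if not isinstance(value, bytes):
        raise TypeError(
            "%s must be a byte string, got %s" % (name, type(value).__name__)
        )
    return value


checked_url = require_byte_string("url", b"http://localhost/test")
```

Run before the request is built, a check like this turns the late UnicodeDecodeError into an immediate TypeError that names the offending argument.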
msg138128 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-10 18:00
Ion, as you perhaps noticed, posting a message 'subscribes' you (puts you on the nosy list). One can also add oneself as nosy with the little button under it without saying anything.
This should be reopened because we do not change error classes in bugfix releases (ie, future 2.7.x releases) because that can break code -- unless the error class is contrary to the doc and we decide the doc is right. Even as a new feature, a change is dubious and carefully to be considered.
msg138142 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-10 23:45
should *not* be reopened. Sorry for omission of 'not'.
msg138908 - (view) Author: Sorin Sbarnea (ssbarnea) * Date: 2011-06-24 11:00
Can we get more info regarding the resolution of this bug? Because of this bug, httplib can no longer be used to send binary data. It breaks other modules, one example being PyAMF (which communicates only using binary data).
msg138914 - (view) Author: Sorin Sbarnea (ssbarnea) * Date: 2011-06-24 11:25
There is another problem that makes this even more critical: OS X 10.7 includes Python 2.7.1 as the *default* interpreter.
So we'll need both a fix for the future and a workaround.
BTW, the hack with sys.setdefaultencoding cannot be used if you really send binary data.
msg138952 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-06-24 15:27
Sorin, can you please open another report with more details on how some condition in httplib breaks PyAMF? We will see to it that it is fixed. Commenting on a closed, invalid issue is confusing.
msg138954 - (view) Author: Sorin Sbarnea (ssbarnea) * Date: 2011-06-24 15:40
Added as bug http://bugs.python.org/issue12398 
msg138972 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-24 18:22
Sorin, this is an issue that claimed a bug, not a bug. The resolution is that the claim appears false because the problem arose from using a unicode rather than a bytes url. The error message may be confusing, but the error class cannot be changed. Senthil says that he *did* send non-ascii bytes with no problem.
msg139110 - (view) Author: Sorin Sbarnea (ssbarnea) * Date: 2011-06-25 19:54
I have to add some details here. First, this bug has nothing to do with the URL; it reproduces with normal URLs.
Still, the problem with the line "msg += message_body" is quite complex when combined with Python 2.7:
type(msg) is unicode
type(message_body) is str ... even when I tried to manually force Python to use bytes. It seems that in 2.7 bytes is an alias for str. Because of this, the code fails only on 2.7, since it tries to convert binary data to a unicode string.
If I am not mistaken, the code will work with Python 3.x, because there bytes is not str.
msg139116 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-06-25 20:22
Hi Sorin,
On Sat, Jun 25, 2011 at 07:54:24PM +0000, sorin wrote:
> type(message_body) is str ... even if I tried to manually force
> Python to use bytes. It seems that in 2.7 bytes are alias to str.
> Due to this the code will fail to run only on 2.7 because it will
> try to convert binary data to unicode string.
I'm a bit confused here. You encode a string to bytes and decode it back
to str; one does not force bytes to str. If you use str or bytes
consistently in Python 2.7, you won't face the problem.
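One way to follow the advice above, assuming a text object reached the request through header names or values (the thread does not establish where `msg` became unicode), is to encode every header to a byte string before calling httplib. `encode_headers` is a hypothetical helper, the header values are made-up examples, and the sketch targets Python 2.7 semantics; Python 3's `http.client` expects text headers instead.

```python
def encode_headers(headers, encoding="ascii"):
    """Return a copy of headers with every name and value as a byte string.

    Hypothetical helper for illustration only.
    """
    def as_bytes(item):
        # Byte strings pass through; text is encoded explicitly.
        return item if isinstance(item, bytes) else item.encode(encoding)

    return dict((as_bytes(k), as_bytes(v)) for k, v in headers.items())


# Illustrative header values mixing text and native string literals.
headers = encode_headers(
    {u"Content-Type": u"application/x-amf", "Accept": "*/*"}
)
```

With the URL, headers, and body all byte strings, the `msg += message_body` concatenation never has to decode the binary body.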
History
Date                 User            Action  Args
2022-04-11 14:57:16  admin           set     github: 56107
2011-07-04 16:16:35  eric.araujo     set     messages: - msg134878
2011-06-25 20:22:42  orsenthil       set     messages: + msg139116
2011-06-25 19:54:23  ssbarnea        set     messages: + msg139110
2011-06-24 18:22:47  terry.reedy     set     messages: + msg138972
2011-06-24 15:40:23  ssbarnea        set     messages: + msg138954
2011-06-24 15:27:10  orsenthil       set     messages: + msg138952
2011-06-24 11:25:05  ssbarnea        set     messages: + msg138914
2011-06-24 11:00:41  ssbarnea        set     nosy: + ssbarnea; messages: + msg138908
2011-06-10 23:45:50  terry.reedy     set     messages: + msg138142
2011-06-10 18:00:15  terry.reedy     set     messages: + msg138128
2011-06-10 09:06:29  cyrus           set     messages: + msg138059
2011-06-10 08:48:13  cyrus           set     nosy: + cyrus; messages: + msg138056
2011-05-16 01:51:57  orsenthil       set     messages: + msg136060
2011-05-15 18:29:59  Jiri.Horky      set     files: + data; nosy: + Jiri.Horky; messages: + msg136043
2011-05-06 13:08:19  orsenthil       set     status: open -> closed; messages: + msg135290; assignee: orsenthil; resolution: works for me; stage: test needed -> resolved
2011-04-30 16:30:08  eric.araujo     set     nosy: + eric.araujo; messages: + msg134878
2011-04-30 06:57:04  bero            set     messages: + msg134840
2011-04-30 00:11:31  terry.reedy     set     nosy: + terry.reedy; messages: + msg134824; stage: test needed
2011-04-21 17:37:57  santoso.wijaya  set     nosy: + santoso.wijaya
2011-04-21 13:44:13  ezio.melotti    set     nosy: + orsenthil, ezio.melotti
2011-04-21 13:42:33  bero            create
