Created on 2011-06-24 15:39 by ssbarnea, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | ||
|---|---|---|
| File name | Uploaded | Description |
| urllib2.patch | vstinner, 2011-09-22 23:54 | |
| Messages (19) | |||
|---|---|---|---|
| msg138953 | Author: Sorin Sbarnea (ssbarnea) * | Date: 2011-06-24 15:39 | |
It looks like the Python 2.7 changes introduced some important bugs into httplib due to implicit str-unicode encoding/decoding. One clear example is that the PyAMF library doesn't work with Python 2.7 because it is not able to generate binary data POST responses. Please check http://dev.pyamf.org/ticket/823 (partial traceback, full in the ticket above):

    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 937, in endheaders
      self._send_output(message_body)
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 795, in _send_output
      msg += message_body
|
|||
| msg138971 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-06-24 18:18 | |
If this worked in 2.6 and fails in 2.7, it would probably be helpful if we can determine what change broke it. I believe hg has some sort of 'bisect' support that might make this not too onerous to do. Senthil (or someone) will eventually either figure out the problem or do the bisect, but if you want to speed things along you could do the bisect. |
|||
| msg138975 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2011-06-24 19:07 | |
A crash is a segfault or equivalent. Python 2.6 only gets security fixes. PyAMF does not run on Python 3. Hence a problem with PyAMF is no evidence of a problem with 3.x. Separate tests/examples would be needed.

Changes are not bugs unless they introduce a discrepancy between code and doc. Please post a self-contained example that exhibits the behavior that you consider a problem. It should not just be a repeat of #11898. Then quote the section of the docs that says (or suggests) that the behavior should be different from what it is.

The PyAMF site says "PyAMF requires Python 2.4 or newer. Python 3.0 isn’t supported yet." Since 3.0 was deprecated 2 years ago with the release of 3.1, I strongly suspect that the statement was written before 2.7 was released a year ago. Library developers should not make open-ended promises like 'or newer' -- certainly not without testing and revising as necessary with each new Python version.

If PyAMF was broken by planned, announced, and documented changes in 2.7, that is too bad, but it is a year too late to change 2.7. Like all new versions, it had public beta and release candidate phases when people could test their packages and make comments. I believe what David is getting at is finding out for sure whether the change was intended or not.

The quote from the link you provide

    >msg += message_body

appears to be the programming error, already explained in #11898, where msg is unicode and message_body is bytes with non-ascii bytes.

    >>> u'a'+'\xf0'
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

This is exactly the same error message that followed in the link, except for the position of the non-ascii byte. The fix is to not do the above. |
|||
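For readers without a Python 2 interpreter, the failure in the message above can be approximated in Python 3 by making the implicit step explicit. This is only a sketch: `concat_like_py2` is a hypothetical helper, not part of httplib, and it assumes that Python 2 decoded the byte string with the default ASCII codec during `unicode + str`, as the error message indicates.

```python
def concat_like_py2(text, raw):
    """Mimic Python 2's `unicode + str` by decoding the bytes as ASCII first.

    Hypothetical helper for illustration only; not part of httplib.
    """
    return text + raw.decode("ascii")


# Pure-ASCII bytes concatenate without complaint...
ascii_result = concat_like_py2("a", b"bc")

# ...but a single non-ASCII byte makes the same call fail.
try:
    concat_like_py2("a", b"\xf0")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

print(ascii_result)
print(type(failure).__name__)
```

The point of the sketch is that whether the concatenation succeeds depends on the data, which is why such code can appear to work until a binary payload arrives.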
| msg138977 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2011-06-24 19:47 | |
Did things like "u'a'+'\xf0'" work in 2.6- (with implicit latin-1 decoding)? (I do not have 2.6 loaded.) The doc for seq+seq (concatenation) in the language reference section 5.6. Binary arithmetic operations says that both sequences must be the same type. In the Library manual, 5.6. Sequence Types, the footnote for seq+seq makes no mention of a special exception for (some) mixed unicode/byte concatenations. I think footnote 6 about string+string should both note the exception and its limitation (and if the limitation was changed in 2.7, say so). (In any case, the exception was removed in Py3, so *this* is not a Py3 issue.) |
|||
| msg138989 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-06-24 21:41 | |
Many applications and libraries say "Python X.Y or newer", and it is one of the strengths of Python that this will often be true. That's what our backward compatibility policy is about, and that's why the fact that it isn't true for 2.x->3.x is such a big deal. As far as I can see there was no deprecation involved here, so "announced" is not a factor, I think. We won't be sure until we know what changed. All that said, it is quite possible (even likely, given #11898) that the pyamf code contains a bug and only worked by accident, and is now failing because some other bug in Python was fixed. Again, we won't know until we have a complete diagnosis of the cause of the change in behavior. |
|||
| msg139103 | Author: Sorin Sbarnea (ssbarnea) * | Date: 2011-06-25 17:10 | |
You are right, I debugged the problem a little more and discovered at least one bug in PyAMF. Still, I was surprised to find something very strange: it looks like BytesIO.getvalue() returns `str` even though the documentation says it returns `bytes`. Should I file another bug?

    Python 2.7.1 (r271:86832, Jun 13 2011, 14:28:51)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 23351500)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import io
    >>> a = io.BytesIO()
    >>> a
    <_io.BytesIO object at 0x10f9453b0>
    >>> a.getvalue()
    ''
    >>> print type(a.getvalue())
    <type 'str'>
    >>>
|
|||
| msg139107 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-06-25 18:26 | |
No, that's correct. In python 2.x the 'bytes' stuff is just a portability aid. In 2.x, bytes and string are the same type. In Python 3 they aren't, so by using the 'fake' classes in python2 you can often make your code work correctly on both python2 and python3. So, can this issue be closed, or do you think there still might be a valid backward compatibility issue? |
|||
| msg139108 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2011-06-25 18:31 | |
In 2.7, bytes is an alias for str to aid porting to 3.x.

    >>> bytes is str
    True
    >>> type(bytes())
    <type 'str'>

I suspect the doc uses 'bytes' rather than 'str' because it was backported from 3.x. Perhaps it should be changed but I do not know the policy on using the alias in 2.6/7 docs. I presume in 2.7 io.BytesIO is similar, if not equivalent to io.StringIO, but it is not an alias. Again, it was added so 2.7 code could use a bytes memory buffer that would remain bytes in 3.x and not become unicode text, like StringIO does. |
|||
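For contrast with the 2.7 sessions above, here is a short Python 3 sketch of the same checks. It is an illustration written for Python 3 only, and the sample payload is made up. It shows that the two buffer classes hand back distinct types there, which is the distinction the `bytes` alias in 2.x was meant to prepare code for.

```python
import io

# A bytes buffer returns bytes from getvalue() on Python 3.
byte_buf = io.BytesIO()
byte_buf.write(b"\xff\x00binary")
payload = byte_buf.getvalue()

# A text buffer returns str (unicode text) from getvalue().
text_buf = io.StringIO()
text_buf.write("text")
text = text_buf.getvalue()

print(type(payload).__name__, type(text).__name__)
print(bytes is str)  # the alias relationship from 2.x no longer holds
```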
| msg139265 | Author: Sorin Sbarnea (ssbarnea) * | Date: 2011-06-27 12:59 | |
Here is a test file that will replicate the problem. I added it as a gist so it could support contributions ;) Py <2.7 works; Py ==2.7 fails; Py >=3.0 works after minor changes required by py3k. https://gist.github.com/1047551 |
|||
| msg139268 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-06-27 13:37 | |

    rdmurray>python2.6 py27-str-unicode-bytes.py
    type(b)=<type 'str'>
    Traceback (most recent call last):
      File "py27-str-unicode-bytes.py", line 17, in <module>
        unicode_str += b # this line will throw UnicodeDecodeError on Python 2.7
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 4: ordinal not in range(128)

And of course it doesn't work earlier than 2.6 since the b'' notation isn't supported before 2.6. |
|||
| msg139269 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-06-27 13:41 | |
To clarify: if I convert your program to use plain strings on pre-2.6 versions, it still fails with a UnicodeDecodeError, as one would expect. bytes are strings in 2.x. |
|||
| msg139271 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-06-27 13:48 | |
And finally, your program does *not* succeed on Python3, except in the trivial sense that on python3 you never attempt to add the string and bytes data. It is exactly this kind of programming error that Python3 is designed to avoid: instead of sometimes getting a UnicodeDecodeError depending on what is in the "bytes" string, you *always* get a "Can't convert 'bytes' object to str implicitly" error when you attempt to add string and bytes. |
|||
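The Python 3 behaviour described in the message above can be checked with a small sketch. `try_concat` is a hypothetical helper introduced only for this illustration, and the exact wording of the error message varies between 3.x releases, so the sketch records only the exception type.

```python
def try_concat(text, raw):
    """Return the name of the exception raised by `text + raw`, or None."""
    try:
        text + raw
    except Exception as exc:
        return type(exc).__name__
    return None


# Empty, pure-ASCII, and non-ASCII byte strings are all rejected the same way.
samples = (b"", b"ascii", b"\xff\xfe")
outcomes = [try_concat("header: ", raw) for raw in samples]
print(outcomes)
```

Unlike the Python 2 case, the outcome does not depend on the content of the byte string, so the mistake surfaces on the first test run rather than on the first binary payload.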
| msg139272 | Author: Sorin Sbarnea (ssbarnea) * | Date: 2011-06-27 13:53 | |
Right, so you have some binary data and you want to send it to `httplib`. This worked in the past when `msg` was a non-unicode string, but starting with Python 2.7 this became a unicode string, so when you try to append the `message` it will fail because it will try to decode it. |
|||
| msg139283 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-06-27 14:36 | |
But senthil already demonstrated in the previous issue that it does not become a unicode string unless you use unicode input. You also claimed that your test program here succeeded in python2.6, but it does not. This casts a little bit of doubt on your claim that there is a regression. Can you produce a minimal example of using httplib that demonstrates the regression? |
|||
| msg139304 | Author: Sorin Sbarnea (ssbarnea) * | Date: 2011-06-27 15:54 | |
I updated the gist and made a minimal test https://gist.github.com/1047551 |
|||
| msg144427 | Author: Adam Cohen (Adam.Cohen) | Date: 2011-09-22 22:11 | |
I encountered this issue as well. "params" is simply a bytestring, with no encoding. Workaround/proper solution is to cast the string as a bytearray with bytearray(params). |
|||
| msg144433 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-09-22 23:54 | |
Here is a patch for httplib encoding HTTP headers to ISO-8859-1, as done in Python 3 (see HTTPConnection.putheader() from http.client). urllib is not affected by this issue because it does already encode Unicode, but encodes to ASCII instead of ISO-8859-1. Related commit in Python 3:

    changeset: 67720:b3cadf5cf742
    user: Armin Ronacher <armin.ronacher@active-4.com>
    date: Sat Jan 22 13:44:22 2011 +0000
    files: Lib/http/client.py Lib/test/test_httpservers.py Misc/NEWS
    description: To match the behaviour of HTTP server, the HTTP client library now also encodes headers with iso-8859-1 (latin1) encoding. It was already doing that for incoming headers which makes this behaviour now consistent in both incoming and outgoing direction.
|
|||
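The idea behind the patch, encoding text headers to bytes before they are joined with a binary body, can be sketched as follows. This is not the contents of urllib2.patch, which is not shown in this thread; `encode_header` and `build_request` are hypothetical names, the header values are made up, and the sketch is written for Python 3.

```python
def encode_header(name, value):
    """Hypothetical helper: build one header line as bytes.

    Text values are encoded as ISO-8859-1, following the approach the
    message above describes; byte values pass through unchanged.
    """
    if isinstance(name, str):
        name = name.encode("ascii")
    if isinstance(value, str):
        value = value.encode("iso-8859-1")
    return name + b": " + value + b"\r\n"


def build_request(header_lines, body):
    """Join already-encoded header lines with a binary body."""
    return b"".join(header_lines) + b"\r\n" + body


headers = [
    encode_header("Host", "example.com:8080"),  # text value, as in the report
    encode_header("X-Name", "caf\u00e9"),        # non-ASCII text value
]
request = build_request(headers, b"\xff\xfe binary payload")
print(request)
```

Because every header is converted to bytes up front, the buffer never becomes a text string, so appending a binary body cannot trigger an implicit decode.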
| msg175727 | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2012-11-17 08:09 | |
I'm running into this on 2.7.3 with code that worked fine on 2.6.5. The problem appears to be caused by a 'Host' http header that has a unicode type for the hostname:port value. Encoding header values makes sense though I haven't yet examined the patch in detail. |
|||
| msg370428 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2020-05-31 12:34 | |
Python 2.7 is no longer supported. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:18 | admin | set | github: 56607 |
| 2020-05-31 12:34:31 | serhiy.storchaka | set | status: open -> closed; nosy: + serhiy.storchaka; messages: + msg370428; resolution: out of date; stage: test needed -> resolved |
| 2012-11-18 15:23:13 | eric.araujo | set | nosy: + aronacher |
| 2012-11-17 08:09:11 | gregory.p.smith | set | nosy: + gregory.p.smith; messages: + msg175727 |
| 2011-09-22 23:54:34 | vstinner | set | files: + urllib2.patch; keywords: + patch; messages: + msg144433 |
| 2011-09-22 22:11:22 | Adam.Cohen | set | nosy: + Adam.Cohen; messages: + msg144427 |
| 2011-08-07 06:07:40 | orsenthil | set | assignee: orsenthil; nosy: + orsenthil |
| 2011-07-04 16:15:31 | eric.araujo | set | nosy: + eric.araujo |
| 2011-07-03 21:04:48 | thijs | set | nosy: + thijs |
| 2011-06-27 15:54:00 | ssbarnea | set | messages: + msg139304 |
| 2011-06-27 14:36:25 | r.david.murray | set | messages: + msg139283 |
| 2011-06-27 13:53:51 | ssbarnea | set | messages: + msg139272 |
| 2011-06-27 13:48:16 | r.david.murray | set | messages: + msg139271 |
| 2011-06-27 13:41:29 | r.david.murray | set | messages: + msg139269 |
| 2011-06-27 13:38:59 | vstinner | set | nosy: + vstinner |
| 2011-06-27 13:37:24 | r.david.murray | set | messages: + msg139268 |
| 2011-06-27 12:59:56 | ssbarnea | set | messages: + msg139265 |
| 2011-06-25 18:31:39 | terry.reedy | set | messages: + msg139108 |
| 2011-06-25 18:26:37 | r.david.murray | set | messages: + msg139107 |
| 2011-06-25 17:10:24 | ssbarnea | set | messages: + msg139103 |
| 2011-06-24 21:41:59 | r.david.murray | set | messages: + msg138989 |
| 2011-06-24 19:47:35 | terry.reedy | set | messages: + msg138977 |
| 2011-06-24 19:07:58 | terry.reedy | set | stage: test needed; type: crash -> behavior; versions: - Python 3.1, Python 3.2, Python 3.3, Python 3.4 |
| 2011-06-24 19:07:26 | terry.reedy | set | nosy: + terry.reedy; messages: + msg138975 |
| 2011-06-24 18:18:03 | r.david.murray | set | nosy: + r.david.murray; messages: + msg138971 |
| 2011-06-24 15:39:53 | ssbarnea | create | |