This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.
Created on 2013-11-10 09:20 by ncoghlan, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| issue19543.patch | djmitche, 2015-04-14 02:08 | issue19543.patch |
| issue19543_blacklist_transforms_py27.patch | serhiy.storchaka, 2015-04-28 16:13 | |
| issue19543_unicode_decode.patch | serhiy.storchaka, 2015-05-31 17:59 | |
| Messages (19) | |||
|---|---|---|---|
| msg202512 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2013-11-10 09:20 | |
The long discussion in issue 7475 and some subsequent discussions I had with Armin Ronacher have made it clear to me that the key distinction between the codec systems in Python 2 and Python 3 is the following difference in the type signatures of various operations:

Python 2 (8 bit str):
  codecs module: object <-> object
  convenience methods: basestring <-> basestring
  available codecs: unicode <-> str, str <-> str, unicode <-> unicode

Python 3 (Unicode str):
  codecs module: object <-> object
  convenience methods: str <-> bytes
  available codecs: str <-> bytes, bytes <-> bytes, str <-> str

The significant distinction is the fact that, in Python 2, the convenience methods covered all standard library codecs, but for Python 3, the codecs module needs to be used directly for the bytes <-> bytes codecs and the one str <-> str codec (since those codecs no longer satisfy the constraints of the text model related convenience methods).

After attempting to implement a 2to3 fixer for these non-Unicode codecs in issue 17823, I realised that wouldn't really work properly (since it's a data driven error based on the behaviour of the named codec), so I'm rejecting that proposal and replacing it with this one for additional Py3k warnings in Python 2.7.7.

My proposal is to take the following cases and make them produce warnings under Python 2.7.7 when Py3k warnings are enabled (remember, these are the 2.7 types, not the 3.x ones):

- the str.encode method is called (redirect to codecs.encode to handle arbitrary input types in a forward compatible way)
- the unicode.decode method is called (redirect to codecs.decode to handle arbitrary input types)
- PyUnicode_AsEncodedString produces something other than an 8-bit string (redirect to codecs.encode for arbitrary output types)
- PyUnicode_Decode produces something other than a unicode string (redirect to codecs.decode for arbitrary output types)

For the latter two cases, issue 17828 includes updates to the Python 3 error messages to similarly redirect to the convenience functions in the codecs module. However, the removed convenience methods will continue to simply trigger AttributeError in Python 3 with no special casing. |
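As an illustration of the Python 3 side of the type signatures above, here is a minimal sketch (assuming Python 3.4 or later, where the `hex` alias is available) of reaching the bytes <-> bytes and str <-> str codecs through the `codecs` module, and of the convenience method refusing the same codec. The variable names are only for this example.

```python
import codecs

# bytes <-> bytes: go through the codecs module rather than a str/bytes method.
hex_bytes = codecs.encode(b"hello", "hex")
round_trip = codecs.decode(hex_bytes, "hex")

# str <-> str: rot_13 is the one text-to-text codec mentioned above.
rotated = codecs.encode("hello", "rot_13")

# The str.encode convenience method only accepts text encodings in Python 3,
# so asking it for "hex" fails and the error points back at codecs.encode().
try:
    "hello".encode("hex")
    method_error = None
except LookupError as exc:
    method_error = str(exc)

print(hex_bytes, round_trip, rotated)
print(method_error)
```

Code written this way against `codecs.encode()` and `codecs.decode()` should also run on Python 2.7, which is the forward compatible spelling the proposed warnings would point people towards.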
|||
| msg202519 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2013-11-10 10:35 | |
Just thinking the first case might get quite a few false positives. Maybe that would still be acceptable, I dunno.
> - the str.encode method is called (redirect to codecs.encode to handle arbitrary input types in a forward compatible way)
I guess you are trying to catch cases like this, which I have come across quite a few times:
data.encode("hex") # data is a byte string
But I think you would also catch cases that depend on Python 2 "str" objects automatically converting to Unicode. Here are some examples taken from real code:
file_name.encode("utf-8") # File name parameter may be str or unicode
# Code meant to be compatible with both Python 2 and 3:
"""<?xml . . . encoding="iso-8859-1"?>""".encode("iso-8859-1")
("data %s\n" % len(...)).encode("ascii")
|
|||
| msg202520 - (view) | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2013-11-10 11:40 | |
On 10.11.2013 10:20, Nick Coghlan wrote:
>
> The long discussion in issue 7475 and some subsequent discussions I had with Armin Ronacher have made it clear to me that the key distinction between the codec systems in Python 2 and Python 3 is the following differences in type signatures of various operations:
>
> Python 2 (8 bit str):
>
> codecs module: object <-> object
> convenience methods: basestring <-> basestring
> available codecs: unicode <-> str, str <-> str, unicode <-> unicode
>
> Python 3 (Unicode str):
>
> codecs module: object <-> object
> convenience methods: str <-> bytes
> available codecs: str <-> bytes, bytes <-> bytes, str <-> str
>
> The significant distinction is the fact that, in Python 2, the convenience methods covered all standard library codecs, but for Python 3, the codecs module needs to be used directly for the bytes <-> bytes codecs and the one str <-> str codec (since those codecs no longer satisfy the constraints of the text model related convenience methods).

Please remember that the codec sub-system is extensible. It's easily possible to add more codecs via registered codec search functions. Whatever you add as a warning has to be aware of the fact that there may be codecs in the system that are not part of the stdlib and which can potentially implement codecs that use other type combinations than the ones you listed above. |
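To make the extensibility point concrete, the following is a rough Python 3 sketch of a third-party codec registered through a search function. The codec name `x_reverse` and the helper functions are invented for this example and are not part of the standard library. It is a str <-> str codec, so `codecs.encode()` accepts it while the `str.encode()` convenience method rejects its output type.

```python
import codecs


def _reverse_encode(text, errors="strict"):
    # Codec functions return a (result, length_consumed) pair.
    return text[::-1], len(text)


def _reverse_decode(text, errors="strict"):
    return text[::-1], len(text)


def _search(name):
    # Hypothetical search function: only answers for the made-up codec name.
    if name == "x_reverse":
        return codecs.CodecInfo(_reverse_encode, _reverse_decode, name="x_reverse")
    return None


codecs.register(_search)

# The general codecs machinery places no constraint on the types involved.
reversed_text = codecs.encode("abc", "x_reverse")
restored_text = codecs.decode(reversed_text, "x_reverse")

# The convenience method checks the output type, so a str result is refused.
try:
    "abc".encode("x_reverse")
    method_error_type = None
except TypeError as exc:
    method_error_type = type(exc).__name__

print(reversed_text, restored_text, method_error_type)
```

The check that fires here lives in the convenience method, not in the codec registry, which is why a warning placed at the same spot would not interfere with codecs like this one when they are called through the `codecs` module.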
|||
| msg202527 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2013-11-10 14:23 | |
Martin: you're right, it wouldn't be feasible to check for the 8-bit str encoding case, since the types of string literals will implicitly change between the two versions. However, the latter three cases would be feasible to check (the unicode.decode one is particularly pernicious, since it's the culprit that can lead to UnicodeEncodeErrors on a decoding operation as Python implicitly tries to encode a Unicode string as ASCII).

MAL: The latter two Py3k warnings would be in the same place as the corresponding output type errors in Python 3 (i.e. all in unicodeobject.c), so they would never trigger for the general codecs machinery. Python 2 actually already has output type checks in the same place as the proposed warnings, it just only checks for "basestring" rather than anything more specific. Those two warnings would just involve adding the more restrictive Py3k-style check when -3 was enabled.

A Py3k warning for unicode.decode is just a straight "this method won't be there any more in Python 3" warning, since there's no way for the conversion from Python 2 to Python 3 to implicitly replace a Unicode string with 8-bit data the way string literals switch from 8-bit data to Unicode text. |
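The unicode.decode pitfall described above cannot be reproduced directly on Python 3, since the method no longer exists there. The sketch below only emulates it, under the assumption that Python 2 first encodes the Unicode string with the default ASCII encoding and then decodes the result. The function name is hypothetical.

```python
def py2_style_unicode_decode(text, encoding):
    # Assumed emulation of Python 2's unicode.decode(): an implicit
    # ASCII *encode* happens before the requested decode.
    ascii_bytes = text.encode("ascii")
    return ascii_bytes.decode(encoding)


# Pure ASCII text survives the hidden round trip unnoticed.
plain = py2_style_unicode_decode("abc", "utf-8")

# Non-ASCII text fails in the hidden encode step, so a "decode" call
# surfaces an encoding error.
try:
    py2_style_unicode_decode("caf\u00e9", "utf-8")
    error_name = None
except UnicodeError as exc:
    error_name = type(exc).__name__

print(plain, error_name)
```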
|||
| msg240830 - (view) | Author: Dustin J. Mitchell (djmitche) * | Date: 2015-04-14 02:08 | |
This fixes the four cases Nick referred to, although not in the functions expected:

- str.encode no longer exists
- unicode.decode no longer exists
- encoders used by str.encode must return bytes (does not apply to codecs.encode)
- decoders used by unicode.decode must return unicode (does not apply to codecs.decode)
 |
|||
| msg241996 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2015-04-25 06:48 | |
For the warnings, it's actually bytes.decode() (et al) that are expected to return str, and str.encode() that's expected to return bytes. The codecs themselves remain free to do what they want (hence the recommendation to use codecs.encode() and codecs.decode() to invoke arbitrary codecs without the type constraints of the builtins). Aside from that, those 2 warnings and the unicode.decode() one look good.
However, the "bytes.encode()" warning is the one we determined couldn't be treated as a warning as "text".encode("ascii") is valid in both Python 2 & 3, it just does a str->str conversion in 2.x, and a str -> bytes conversion in 3.x.
|
|||
| msg242122 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-04-27 15:24 | |
I think we should just backport issue19619 and issue20404. |
|||
| msg242188 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-04-28 16:13 | |
Here is a patch that backports issue19619 and issue20404, changing the exception to a Py3k warning, and makes the necessary changes in other modules and tests.

$ ./python -3
Python 2.7.10rc0 (2.7:4234b0dd2a54+, Apr 28 2015, 16:51:51)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'abcd'.decode('hex')
__main__:1: DeprecationWarning: 'hex' is not a text encoding; use codecs.decode() to handle arbitrary codecs
'\xab\xcd'
 |
|||
| msg243053 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-05-13 08:07 | |
What would you say about this, Benjamin? |
|||
| msg244456 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-05-30 11:34 | |
Nick, Benjamin, could you please look at the patch? |
|||
| msg244458 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2015-05-30 13:59 | |
Serhiy's patch looks to me like it would pragmatically cover all the cases most likely to affect porting efforts: using the standard library bytes<->bytes codecs through the convenience methods.
For a Python 2 backport, I'm slightly more concerned with exposing the argument in the constructor signature, but see value in being consistent with Python 3 if anyone decides to use this to check a custom codec.
The main advantage this approach has over the typecheck based approach is that it can correctly warn about "data".encode("hex"), while a typecheck based approach can't distinguish that from "text".encode("ascii")
|
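The difference between a flag based check and a typecheck can be sketched on Python 3 roughly as follows. This is not the committed 2.7 patch. It assumes the private `_is_text_encoding` attribute that CPython sets on `CodecInfo` objects, which is an implementation detail, and the helper name is invented for the example.

```python
import codecs
import warnings


def warn_if_non_text_encoding(encoding):
    # Hypothetical helper: look at the codec's own flag instead of the
    # types of the input or output. Returns True when a warning was issued.
    info = codecs.lookup(encoding)
    # Assumption: codecs that do not set the private flag count as text encodings.
    if getattr(info, "_is_text_encoding", True):
        return False
    warnings.warn(
        "%r is not a text encoding; use codecs.decode() to handle "
        "arbitrary codecs" % encoding,
        DeprecationWarning,
        stacklevel=2,
    )
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    hex_flagged = warn_if_non_text_encoding("hex")      # bytes <-> bytes codec
    ascii_flagged = warn_if_non_text_encoding("ascii")  # ordinary text encoding

print(hex_flagged, ascii_flagged, len(caught))
```

Because the decision depends only on the named codec, `"data".encode("hex")` and `"text".encode("ascii")` can be told apart even though both calls start from the same input type.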
|||
| msg244547 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2015-05-31 17:21 | |
New changeset cf6e782a7f94 by Serhiy Storchaka in branch '2.7': Issue #19543: Emit deprecation warning for known non-text encodings. https://hg.python.org/cpython/rev/cf6e782a7f94 |
|||
| msg244551 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-05-31 17:59 | |
The committed patch covers a large part of this issue, but not all of it.
The following patch emits a Py3k warning for unicode.decode(). For now unicode(u'a', 'ascii') is forbidden, but u'a'.decode('ascii') is allowed in 2.7.
The risk of false positives with this patch is lower than with emitting a warning on str.encode(), but higher than with the just-committed patch.
|
|||
| msg244554 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2015-05-31 19:05 | |
New changeset 0347f6e14ad6 by Tal Einat in branch '3.5': Issue #19543: Implementation of isclose as per PEP 485 https://hg.python.org/cpython/rev/0347f6e14ad6 |
|||
| msg244555 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2015-05-31 19:16 | |
New changeset 5c8c123943cf by Tal Einat in branch 'default': Issue #19543: Implementation of isclose as per PEP 485 https://hg.python.org/cpython/rev/5c8c123943cf |
|||
| msg244570 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2015-06-01 03:33 | |
The last two commit notifications were intended for issue #24270. |
|||
| msg255832 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2015-12-03 18:48 | |
New changeset c89a0f24d5f6 by Serhiy Storchaka in branch '2.7': Issue #19543: Added Py3k warning for decoding unicode. https://hg.python.org/cpython/rev/c89a0f24d5f6 |
|||
| msg260105 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-02-11 14:30 | |
Is there something left to do with this issue? |
|||
| msg260155 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2016-02-12 02:55 | |
I think so - if anyone spots another place a Py3k warning could be usefully emitted, it can be handled as a new issue. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:53 | admin | set | github: 63742 |
| 2016-02-12 02:55:33 | ncoghlan | set | status: open -> closed; resolution: fixed; messages: + msg260155; stage: resolved |
| 2016-02-11 14:30:05 | serhiy.storchaka | set | messages: + msg260105 |
| 2015-12-25 19:10:51 | serhiy.storchaka | set | stage: patch review -> (no value) |
| 2015-12-03 18:48:15 | python-dev | set | messages: + msg255832 |
| 2015-06-01 03:33:34 | ncoghlan | set | messages: + msg244570 |
| 2015-05-31 19:16:37 | python-dev | set | messages: + msg244555 |
| 2015-05-31 19:05:33 | python-dev | set | messages: + msg244554 |
| 2015-05-31 17:59:54 | serhiy.storchaka | set | files: + issue19543_unicode_decode.patch; messages: + msg244551 |
| 2015-05-31 17:21:40 | python-dev | set | nosy: + python-dev; messages: + msg244547 |
| 2015-05-30 13:59:13 | ncoghlan | set | messages: + msg244458 |
| 2015-05-30 11:34:15 | serhiy.storchaka | set | messages: + msg244456 |
| 2015-05-13 08:07:26 | serhiy.storchaka | set | nosy: + benjamin.peterson; messages: + msg243053 |
| 2015-04-28 16:13:19 | serhiy.storchaka | set | files: + issue19543_blacklist_transforms_py27.patch; messages: + msg242188 |
| 2015-04-27 15:24:11 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg242122 |
| 2015-04-25 06:48:19 | ncoghlan | set | messages: + msg241996 |
| 2015-04-15 02:31:44 | ned.deily | set | stage: needs patch -> patch review |
| 2015-04-14 02:08:45 | djmitche | set | files: + issue19543.patch; nosy: + djmitche; messages: + msg240830; keywords: + patch |
| 2013-11-11 14:54:57 | brett.cannon | set | nosy: + brett.cannon |
| 2013-11-10 14:23:29 | ncoghlan | set | messages: + msg202527 |
| 2013-11-10 11:40:06 | lemburg | set | nosy: + lemburg; messages: + msg202520 |
| 2013-11-10 10:35:40 | martin.panter | set | nosy: + martin.panter; messages: + msg202519 |
| 2013-11-10 09:20:47 | ncoghlan | create | |