Created on 2012-01-31 17:27 by kennyluck, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (7) | |||
|---|---|---|---|
| msg152399 - (view) | Author: Kang-Hao (Kenny) Lu (kennyluck) | Date: 2012-01-31 17:27 | |
In Python 3.2.2 (I don't have an earlier version to test with),
>>> "\udc80".encode("utf-8")
UnicodeEncodeError: *utf-8* codec can't encode character '\udc80'...
but
>>> b"\xff".decode("utf-8")
UnicodeDecodeError: *utf8* codec can't decode byte 0xff in position 0
and the table on the documentation of the codec module suggests *utf_8* as the name of the codec, which I believe to be equivalent to "utf_8" because '-' is not a valid character of an identifier.
Can we at least make the above two consistent? I would go for "utf-8", which was probably introduced for rejecting surrogates, but "utf8" has been there for years. What do we do? I am happy to submit patches for all branches. These are one-liners anyway.
The backward compatibility risk should be pretty low, as you usually don't retrieve the encoding from these errors, and I don't see any use of PyUnicode(Encode|Decode)Error_GetEncoding in trunk, although I'm using it for issue #12892.
Also, "latin_1" displays as *latin-1* but "iso2022-jp" displays as *iso2022_jp*. I care less about this nit though.
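To check whether the two error paths report the same codec name on a given interpreter, the `encoding` attribute of the raised exceptions can be compared. This is a minimal sketch; the helper name `reported_codec` is made up for illustration, and the values printed depend on the Python version in use.

```python
def reported_codec(operation):
    """Run `operation` and return the codec name stored on the Unicode error it raises.

    Returns None if no UnicodeEncodeError/UnicodeDecodeError is raised.
    """
    try:
        operation()
    except (UnicodeEncodeError, UnicodeDecodeError) as exc:
        # `exc.encoding` is the same name that appears in the error message.
        return exc.encoding
    return None


# A lone surrogate cannot be encoded as UTF-8.
enc_name = reported_codec(lambda: "\udc80".encode("utf-8"))
# 0xff is never a valid byte in UTF-8.
dec_name = reported_codec(lambda: b"\xff".decode("utf-8"))

print("encoder reports:", enc_name)
print("decoder reports:", dec_name)
print("consistent:", enc_name == dec_name)
```

On an interpreter with the inconsistency described above, the last line would print `consistent: False`.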
|
|||
| msg152421 - (view) | Author: Kang-Hao (Kenny) Lu (kennyluck) | Date: 2012-02-01 00:42 | |
> and the table on the documentation of the codec module suggests *utf_8*
> as the name of the codec, which I believe to be equivalent to "utf_8"
> because '-' is not a valid character of an identifier.

typo: equivalent to "utf_8" → equivalent to "utf-8".
|
|||
| msg153308 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-02-14 00:17 | |
New changeset c861c0a7f40c by Victor Stinner in branch '3.2':
Issue #13913: normalize utf-8 codec name in UTF-8 decoder
http://hg.python.org/cpython/rev/c861c0a7f40c

New changeset af1a9508f7fa by Victor Stinner in branch 'default':
(Merge 3.2) Issue #13913: normalize utf-8 codec name in UTF-8 decoder
http://hg.python.org/cpython/rev/af1a9508f7fa
|
|||
| msg153309 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-02-14 00:19 | |
Use codecs.lookup(alias).name to get the normalized name of a codec. Examples:
>>> import codecs
>>> codecs.lookup('utf-8').name
'utf-8'
>>> codecs.lookup('iso-8859-1').name
'iso8859-1'
>>> codecs.lookup('latin1').name
'iso8859-1'
>>> codecs.lookup('iso2022_jp').name
'iso2022_jp'
All issues appear to be addressed, so I am closing the issue. Thanks for the report!
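Building on the lookups above, two aliases can be tested for equivalence by comparing their normalized names. This is only a sketch; `normalized_name` and `same_codec` are hypothetical helper names, not part of the `codecs` module.

```python
import codecs


def normalized_name(alias):
    """Return the normalized codec name for `alias`, or None if it is unknown."""
    try:
        return codecs.lookup(alias).name
    except LookupError:
        # codecs.lookup() raises LookupError for unknown encodings.
        return None


def same_codec(alias_a, alias_b):
    """Return True if both aliases resolve to the same registered codec name."""
    name_a = normalized_name(alias_a)
    name_b = normalized_name(alias_b)
    return name_a is not None and name_a == name_b


for pair in [("utf8", "UTF-8"), ("latin1", "iso-8859-1"), ("utf-8", "latin1")]:
    print(pair, same_codec(*pair))
```

Comparing normalized names avoids hard-coding any particular spelling such as "utf8" or "utf_8".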
|
|||
| msg153417 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2012-02-15 17:09 | |
You need to update test_pep3120: http://www.python.org/dev/buildbot/all/builders/AMD64%20Gentoo%20Wide%203.2/builds/910/steps/test/logs/stdio/text |
|||
| msg153437 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-02-15 21:25 | |
New changeset 5b8f146103fa by Victor Stinner in branch '3.2':
Issue #13913: Fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/5b8f146103fa

New changeset 170a224ce01e by Victor Stinner in branch 'default':
(Merge 3.2) Issue #13913: Fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/170a224ce01e
|
|||
| msg153446 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-02-15 22:44 | |
New changeset 824ddf6a30f2 by Victor Stinner in branch '3.2':
Issue #13913: Another fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/824ddf6a30f2

New changeset 2cfba214c243 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #13913: Another fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/2cfba214c243
|
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:26 | admin | set | github: 58121 |
| 2012-02-15 22:44:40 | python-dev | set | messages: + msg153446 |
| 2012-02-15 21:25:02 | python-dev | set | messages: + msg153437 |
| 2012-02-15 17:09:06 | eric.araujo | set | nosy: + eric.araujo; messages: + msg153417 |
| 2012-02-14 00:19:45 | vstinner | set | status: open -> closed; nosy: + vstinner; messages: + msg153309; resolution: fixed |
| 2012-02-14 00:17:36 | python-dev | set | nosy: + python-dev; messages: + msg153308 |
| 2012-02-04 08:21:56 | eric.araujo | set | priority: normal -> low; type: behavior -> enhancement; versions: - Python 2.7, Python 3.2 |
| 2012-02-01 00:42:29 | kennyluck | set | messages: + msg152421 |
| 2012-01-31 17:28:42 | kennyluck | set | type: behavior |
| 2012-01-31 17:27:56 | kennyluck | create | |