Created on 2011-10-18 22:44 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| cp65001.py | vstinner, 2011-10-18 22:44 | |
| Messages (10) | |||
|---|---|---|---|
| msg145871 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-10-18 22:44 | |
Thanks to #12281, it is now trivial to implement any Windows code page in Python.
I don't know if existing code pages (e.g. cp932) should use codecs.code_page_encode()/codecs.code_page_decode() on Windows, or continue to use the (portable) Python code.
Users want the code page 65001, even if I consider that it is useless to set the ANSI code page to 65001 in a console (see issue #1602), but that's a different story.
The attached patch implements this code page. |
|||
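As a rough illustration of the approach described in msg145871, the sketch below binds a code page number to `codecs.code_page_encode()` and `codecs.code_page_decode()` with `functools.partial`. The helper name `make_code_page_codec` is hypothetical and chosen only for this example. Both `codecs` functions exist only on Windows, so the sketch raises `LookupError` on other platforms.

```python
import codecs
import functools


def make_code_page_codec(code_page):
    """Return (encode, decode) callables for a Windows code page.

    Hypothetical helper for illustration. Raises LookupError on
    platforms where the Windows-only codecs helpers are missing.
    """
    if not hasattr(codecs, "code_page_encode"):
        raise LookupError(
            "code page %d is only available on Windows" % code_page
        )
    # Bind the code page number as the first positional argument.
    encode = functools.partial(codecs.code_page_encode, code_page)
    decode = functools.partial(codecs.code_page_decode, code_page)
    return encode, decode


if __name__ == "__main__":
    try:
        encode, decode = make_code_page_codec(65001)
    except LookupError as exc:
        print("not supported here:", exc)
    else:
        # Both callables return a (result, length) tuple.
        data, length = encode("h\u00e9llo", "strict")
        print(data, length)
```

A full codec module would also need incremental and stream classes plus a `getregentry()` function; those are left out here to keep the sketch short.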
| msg145872 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-10-18 22:46 | |
> Users want the code page 65001
See issues #6058, #7441 and #10920. |
|||
| msg145891 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2011-10-19 07:53 | |
We shouldn't use the MS codec if we have our own, as they may differ.
As for the 65001 bug: is that actually solved by this codec? |
|||
| msg145894 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-10-19 08:15 | |
> We shouldn't use the MS codec if we have our own, as they may differ.
Ok, I agree. The MS codec has a nice replacement behaviour (it searches for a
similar glyph): cp1252 encodes Ł to b'L' for example. Our codec raises a
UnicodeEncodeError on u'\u0141'.encode('cp1252').
> As for the 65001 bug: is that actually solved by this codec?
Sorry, which bug?
See the tests using CP_UTF8 in test_codecs. Depending on the Windows version,
you don't get the same behaviour on surrogates. Before Windows Vista,
surrogates were always encoded, whereas you can now choose the behaviour using
the Python error handler:
    if self.vista_or_later():
        tests.append(('\udc80', 'strict', None))  # None=UnicodeEncodeError
        tests.append(('\udc80', 'ignore', b''))
        tests.append(('\udc80', 'replace', b'?'))
    else:
        tests.append(('\udc80', 'strict', b'\xed\xb2\x80'))
|
|||
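The behaviour of Python's own codecs mentioned in msg145894 can be checked on any platform with a small sketch. It uses the portable `cp1252` and `utf-8` codecs, not the Windows code page functions, so it says nothing about what the MS codec returns. The helper name `try_encode` is hypothetical. The `surrogatepass` handler is used here only to show which bytes a lone surrogate maps to when it is encoded anyway.

```python
def try_encode(text, encoding, errors="strict"):
    """Return the encoded bytes, or None if UnicodeEncodeError is raised."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None


# Python's cp1252 table has no entry for U+0141 (Ł): strict mode fails.
cp1252_strict = try_encode("\u0141", "cp1252")              # None
cp1252_replace = try_encode("\u0141", "cp1252", "replace")  # b'?'

# A lone surrogate with the portable utf-8 codec and several handlers.
utf8_strict = try_encode("\udc80", "utf-8", "strict")       # None
utf8_ignore = try_encode("\udc80", "utf-8", "ignore")       # b''
utf8_replace = try_encode("\udc80", "utf-8", "replace")     # b'?'
utf8_pass = try_encode("\udc80", "utf-8", "surrogatepass")  # b'\xed\xb2\x80'

print(cp1252_strict, cp1252_replace)
print(utf8_strict, utf8_ignore, utf8_replace, utf8_pass)
```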
| msg145901 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-10-19 11:58 | |
> I consider that it is useless to set the ANSI code page to 65001 in a console
I did more tests on the Windows console, focused on output, see:
http://bugs.python.org/issue1602#msg145898
I was wrong, it *is* useful to change the code page to 65001. Even if we have fully Unicode compliant sys.stdout and sys.stderr, setting the code page to CP_UTF8 (65001) does still improve Unicode support in some cases:
- if the output (stdout and/or stderr) is redirected
- if you encode Unicode to the console code page yourself and write directly to sys.stdout.buffer and sys.stderr.buffer |
|||
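The second case in msg145901, encoding the text yourself and writing bytes to the underlying binary stream, might look like the sketch below. The function name `write_encoded` and the default `errors="replace"` are assumptions made for this example, not something taken from the issue.

```python
def write_encoded(binary_stream, text, encoding, errors="replace"):
    """Encode text ourselves and write the bytes to a binary stream.

    Hypothetical helper: binary_stream is expected to behave like
    sys.stdout.buffer (it needs write() and flush()).
    """
    data = text.encode(encoding, errors)
    binary_stream.write(data)
    binary_stream.flush()
    return len(data)
```

A caller could pass `sys.stdout.buffer` as the stream and `sys.stdout.encoding` as the encoding; with the console set to code page 65001 that encoding would be the one this issue adds.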
| msg145922 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2011-10-19 17:11 | |
>> As for the 65001 bug: is that actually solved by this codec?
>
> Sorry, which bug?
#6501 and friends (isn't it interesting that the issue of code page 65001 is reported as bug 6501?) |
|||
| msg145932 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-10-19 18:25 | |
> > Sorry, which bug?
> #6501 and friends
Hum, this particular issue, #6501, doesn't concern the code page 65001. The typical use case (issues #7441 and #10920) is:
------------
C:\victor\cpython>chcp 65001
Page de codes active : 65001
C:\victor\cpython>pcbuild\python_d.exe
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001
------------
The console and console output code pages may be changed by something else. The current workaround is to set the PYTHONIOENCODING environment variable to utf-8, but as explained in msg132831, the workaround is not applicable if Python is embedded or if the program has been frozen by cx-freeze ("cx-freeze deliberately sets Py_IgnoreEnvironmentFlag").
--
The issue #6501 was a bug in io.device_encoding(). I fixed it in Python 3.3 and I have been waiting... for 5 months... for Graham Dumpleton before backporting the fix. The issue also suggests not failing if the encoding cannot be found (I dislike this idea). |
|||
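The PYTHONIOENCODING workaround from msg145932 can be tried out with a sketch like the following, which starts a child interpreter with the variable set and reports the encoding its `sys.stdout` ends up with. The function name `child_stdout_encoding` is hypothetical. As the message notes, this only helps when the interpreter actually reads its environment, so it does not cover embedded or frozen programs that ignore environment variables.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and return the
    normalized codec name of the child's sys.stdout encoding."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    name = result.stdout.decode("ascii").strip()
    # Normalize spellings such as "UTF-8" or "utf8" to the codec's name.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```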
| msg146463 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011-10-26 23:42 | |
New changeset 0eac706d82d1 by Victor Stinner in branch 'default':
Fix the issue number of my cp65001 commit: 13247 => issue #13216
http://hg.python.org/cpython/rev/0eac706d82d1 |
|||
| msg146464 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-10-26 23:43 | |
New changeset 2cad20e2e588 by Victor Stinner in branch 'default':
Close #13247: Add cp65001 codec, the Windows UTF-8 (CP_UTF8)
http://hg.python.org/cpython/rev/2cad20e2e588 |
|||
| msg146466 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-10-26 23:44 | |
Lib/encodings/cp65001.py uses a little trick to mark the codec as specific to Windows:
-----------------
if not hasattr(codecs, 'code_page_encode'):
    raise LookupError("cp65001 encoding is only available on Windows")
----------------- |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:22 | admin | set | github: 57425 |
| 2011-10-26 23:44:52 | vstinner | set | messages: + msg146466 |
| 2011-10-26 23:43:05 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg146464 |
| 2011-10-26 23:42:43 | python-dev | set | nosy: + python-dev; messages: + msg146463 |
| 2011-10-19 18:25:32 | vstinner | set | messages: + msg145932 |
| 2011-10-19 17:11:14 | loewis | set | messages: + msg145922 |
| 2011-10-19 11:58:41 | vstinner | set | messages: + msg145901 |
| 2011-10-19 08:15:55 | vstinner | set | messages: + msg145894 |
| 2011-10-19 07:53:43 | loewis | set | messages: + msg145891 |
| 2011-10-18 22:46:07 | vstinner | set | messages: + msg145872 |
| 2011-10-18 22:44:30 | vstinner | create | |