Created on 2012-11-11 22:14 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| workaround_codeset.patch | vstinner, 2012-11-11 23:34 | |
| force_ascii.patch | vstinner, 2012-11-12 14:40 | |
| Messages (11) | |||
|---|---|---|---|
| msg175401 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-11-11 22:14 | |
On FreeBSD and OpenIndiana, sys.getfilesystemencoding() is 'ascii' when the locale is not set, whereas the locale encoding is ISO-8859-1. This inconsistency causes various issues. For example, os.fsencode(sys.argv[1]) fails if the argument is not ASCII, because sys.argv is decoded from the locale encoding (by _Py_char2wchar()). sys.getfilesystemencoding() is 'ascii' because nl_langinfo(CODESET) is used to get the locale encoding, and nl_langinfo(CODESET) announces ASCII (or an alias of this encoding). Python should detect this case and set sys.getfilesystemencoding() to 'iso8859-1' if the locale encoding is 'iso8859-1' while nl_langinfo(CODESET) announces ASCII. For example, we can decode b'\xe9' with mbstowcs() and check whether it fails or whether the result is U+00E9. |
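
The failure described in this message can be reproduced without touching the locale at all. The following is only a sketch: `roundtrip_argument` is a hypothetical helper, and it assumes that command line arguments are decoded with one codec and re-encoded with another, both using the `surrogateescape` error handler as `os.fsencode()` does on POSIX.

```python
def roundtrip_argument(raw: bytes, argv_encoding: str, fs_encoding: str) -> bytes:
    """Decode bytes the way sys.argv would be decoded, then re-encode them
    the way os.fsencode() would (hypothetical helper, for illustration)."""
    text = raw.decode(argv_encoding, "surrogateescape")
    return text.encode(fs_encoding, "surrogateescape")


# Same codec on both sides: the undecodable byte survives as a surrogate.
consistent = roundtrip_argument(b"caf\xe9", "ascii", "ascii")

# Mismatch reported here: argv decoded as ISO-8859-1, filesystem encoding ASCII.
try:
    roundtrip_argument(b"caf\xe9", "iso8859-1", "ascii")
    mismatch_error = None
except UnicodeEncodeError as exc:
    mismatch_error = exc

print(consistent)
print(type(mismatch_error).__name__)
```

With a consistent pair of codecs the bytes round-trip unchanged; with the mismatched pair the re-encoding step raises, because U+00E9 is a real character rather than an escaped byte.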
|||
| msg175408 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-11-11 23:34 | |
The attached patch works around the CODESET issue on OpenIndiana and FreeBSD. If the LC_CTYPE locale is "C" and nl_langinfo(CODESET) returns ASCII (or an alias of this encoding), b"\xE9" is decoded from the locale encoding: if the result is U+00E9, the patched Python uses ISO-8859-1. (If decoding fails, the locale encoding really is ASCII and the workaround is not used.)

If the result is different (b'\xe9' is not decoded from the locale encoding to U+00E9), a ValueError is raised. I wrote this test to detect bugs. I hope that our buildbots will validate the code. We may choose a different behaviour (e.g. keep ASCII).

Example on FreeBSD 8.2, original Python 3.4:

    $ ./python
    >>> import sys, locale
    >>> sys.getfilesystemencoding()
    'ascii'
    >>> locale.getpreferredencoding()
    'US-ASCII'

Example on FreeBSD 8.2, patched Python 3.4:

    $ ./python
    >>> import sys, locale
    >>> sys.getfilesystemencoding()
    'iso8859-1'
    >>> locale.getpreferredencoding()
    'iso8859-1'
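
The decision rule of workaround_codeset.patch, as described in this message, could be sketched in Python roughly as follows. This is not the patch itself (which is C code): `pick_encoding`, `is_ascii_alias` and the `probe` callback are hypothetical names, and `probe` stands in for the mbstowcs() call, returning the decoded string or None when decoding fails.

```python
import codecs


def is_ascii_alias(codeset: str) -> bool:
    """Return True if Python resolves the codeset name to its ASCII codec."""
    try:
        return codecs.lookup(codeset).name == "ascii"
    except LookupError:
        return False


def pick_encoding(lc_ctype: str, codeset: str, probe) -> str:
    """Sketch of the workaround: trust CODESET unless the C locale
    announces ASCII but actually decodes the byte 0xE9 to U+00E9."""
    if lc_ctype != "C" or not is_ascii_alias(codeset):
        return codeset            # workaround does not apply
    decoded = probe(b"\xe9")      # stand-in for mbstowcs()
    if decoded is None:
        return "ascii"            # the locale encoding really is ASCII
    if decoded == "\xe9":
        return "iso8859-1"        # CODESET lied, locale is ISO-8859-1
    raise ValueError("unexpected result when decoding b'\\xe9'")


# A probe that behaves like the FreeBSD C locale described in this issue.
print(pick_encoding("C", "US-ASCII", lambda b: b.decode("iso8859-1")))
```

Passing the probe as a parameter keeps the sketch testable on any platform, since the real mbstowcs() behaviour depends on the operating system.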
|||
| msg175410 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-11-11 23:54 | |
Some tests are failing with the patch:

    ======================================================================
    FAIL: test_undecodable_env (test.test_subprocess.POSIXProcessTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/home/haypo/prog/python/default/Lib/test/test_subprocess.py", line 1606, in test_undecodable_env
        self.assertEqual(stdout.decode('ascii'), ascii(value))
    AssertionError: "'abc\\xff'" != "'abc\\udcff'"
    - 'abc\xff'
    ?      ^
    + 'abc\udcff'
    ?      ^^^

    ======================================================================
    FAIL: test_strcoll_with_diacritic (test.test_locale.TestEnUSCollation)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/home/haypo/prog/python/default/Lib/test/test_locale.py", line 364, in test_strcoll_with_diacritic
        self.assertLess(locale.strcoll('\xe0', 'b'), 0)
    AssertionError: 126 not less than 0

    ======================================================================
    FAIL: test_strxfrm_with_diacritic (test.test_locale.TestEnUSCollation)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/home/haypo/prog/python/default/Lib/test/test_locale.py", line 367, in test_strxfrm_with_diacritic
        self.assertLess(locale.strxfrm('\xe0'), locale.strxfrm('b'))
    AssertionError: '\xe0' not less than 'b'
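
The two strings compared in the first failure (test_undecodable_env) can be reproduced with plain codecs. This is an illustrative sketch, not the test itself: it assumes the test expected the byte 0xFF to be escaped as a lone surrogate (ASCII with `surrogateescape`), while the patched interpreter decoded it as ISO-8859-1.

```python
raw = b"abc\xff"

# Decoded as ISO-8859-1: 0xFF becomes the real character U+00FF.
as_latin1 = raw.decode("iso8859-1")

# Decoded as ASCII with surrogateescape: 0xFF becomes the lone surrogate U+DCFF.
as_ascii = raw.decode("ascii", "surrogateescape")

print(ascii(as_latin1))  # 'abc\xff'
print(ascii(as_ascii))   # 'abc\udcff'
```

The two printed values match the two sides of the AssertionError in the first traceback.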
|||
| msg175446 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-11-12 14:40 | |
Hijacking locale.getpreferredencoding() may be dangerous. I attached a new patch, force_ascii.patch, which uses a different approach: be stricter than mbstowcs() and force the ASCII encoding when:

- the LC_CTYPE locale is C
- nl_langinfo(CODESET) is ASCII or an alias of ASCII
- mbstowcs() is able to decode non-ASCII characters |
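
The three conditions of force_ascii.patch listed in this message can be sketched as a predicate plus a decoding step. The names `should_force_ascii` and `decode_argument` are hypothetical, the real patch is C code, and the default locale encoding used here is only an assumption for the demonstration.

```python
def should_force_ascii(lc_ctype: str, codeset_is_ascii: bool,
                       mbstowcs_decodes_non_ascii: bool) -> bool:
    """All three conditions from the message must hold."""
    return lc_ctype == "C" and codeset_is_ascii and mbstowcs_decodes_non_ascii


def decode_argument(raw: bytes, force_ascii: bool,
                    locale_encoding: str = "iso8859-1") -> str:
    """Decode a command line argument, treating non-ASCII bytes as raw
    data (lone surrogates) when ASCII is forced."""
    encoding = "ascii" if force_ascii else locale_encoding
    return raw.decode(encoding, "surrogateescape")


force = should_force_ascii("C", True, True)
arg = decode_argument(b"caf\xe9", force)

# With ASCII forced on both sides, re-encoding restores the original bytes.
restored = arg.encode("ascii", "surrogateescape")
print(ascii(arg), restored)
```

Because decoding and the filesystem encoding now agree on ASCII, the byte 0xE9 is kept as an escaped surrogate and can be encoded back without error.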
|||
| msg176434 | Author: Jesús Cea Avión (jcea) * (Python committer) | Date: 2012-11-26 17:38 | |
Victor, any progress on this? |
|||
| msg176436 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-11-26 17:56 | |
> Victor, any progress on this?

We have two options, and I don't know which one is better (safer). Does the terminal handle non-ASCII characters with a C locale on FreeBSD or Solaris? |
|||
| msg176869 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-12-04 02:23 | |
New changeset c25635b137cc by Victor Stinner in branch 'default': Issue #16455: On FreeBSD and Solaris, if the locale is C, the http://hg.python.org/cpython/rev/c25635b137cc |
|||
| msg176870 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-12-04 02:30 | |
> We have two options, I don't know which one is the best (safer).

Forcing ASCII is safer. Python should announce that it does not "understand" non-ASCII bytes on the command line. I also chose this option because isalpha(0xe9) returns 0 (even though mbstowcs(0xe9) returns L"\xe9"): FreeBSD doesn't consider U+00E9 a letter in the C locale, so Python should also treat this byte as raw data. |
|||
| msg176871 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-12-04 02:32 | |
> New changeset c25635b137cc by Victor Stinner in branch 'default':
> Issue #16455: On FreeBSD and Solaris, if the locale is C, the
> http://hg.python.org/cpython/rev/c25635b137cc

This changeset should fix #16218 on FreeBSD and Solaris (these OSes should now correctly decode undecodable command line arguments). |
|||
| msg178864 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-01-03 00:24 | |
New changeset c256764e2b3f by Victor Stinner in branch '3.2': Issue #16455: On FreeBSD and Solaris, if the locale is C, the http://hg.python.org/cpython/rev/c256764e2b3f

New changeset 5bb289e4fb35 by Victor Stinner in branch '3.3': (Merge 3.2) Issue #16455: On FreeBSD and Solaris, if the locale is C, the http://hg.python.org/cpython/rev/5bb289e4fb35 |
|||
| msg178866 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2013-01-03 00:41 | |
I backported the fix to Python 3.2 and 3.3 because I consider it important enough. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:38 | admin | set | github: 60659 |
| 2013-01-03 01:07:03 | vstinner | set | status: open -> closed; resolution: fixed |
| 2013-01-03 00:41:24 | vstinner | set | messages: + msg178866; versions: + Python 3.2, Python 3.3 |
| 2013-01-03 00:24:06 | python-dev | set | messages: + msg178864 |
| 2012-12-04 02:32:10 | vstinner | set | messages: + msg176871 |
| 2012-12-04 02:30:31 | vstinner | set | messages: + msg176870 |
| 2012-12-04 02:24:48 | vstinner | set | title: sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set -> Decode command line arguments from ASCII on FreeBSD and Solaris if the locale is C |
| 2012-12-04 02:23:05 | python-dev | set | nosy: + python-dev; messages: + msg176869 |
| 2012-11-26 17:56:28 | vstinner | set | messages: + msg176436 |
| 2012-11-26 17:38:26 | jcea | set | messages: + msg176434 |
| 2012-11-12 14:49:40 | jcea | set | nosy: + jcea |
| 2012-11-12 14:40:44 | vstinner | set | files: + force_ascii.patch; messages: + msg175446 |
| 2012-11-11 23:54:58 | vstinner | set | messages: + msg175410 |
| 2012-11-11 23:34:24 | vstinner | set | files: + workaround_codeset.patch; keywords: + patch; messages: + msg175408 |
| 2012-11-11 22:14:15 | vstinner | create | |