This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Decode command line arguments from ASCII on FreeBSD and Solaris if the locale is C
Type: Stage:
Components: Unicode Versions: Python 3.2, Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, jcea, python-dev, vstinner
Priority: normal Keywords: patch

Created on 2012-11-11 22:14 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded
workaround_codeset.patch vstinner, 2012-11-11 23:34
force_ascii.patch vstinner, 2012-11-12 14:40
Messages (11)
msg175401 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-11-11 22:14
On FreeBSD and OpenIndiana, sys.getfilesystemencoding() is 'ascii' when the locale is not set, whereas the locale encoding is ISO-8859-1.
This inconsistency causes various issues. For example, os.fsencode(sys.argv[1]) fails if the argument is not ASCII, because the items of sys.argv are decoded from the locale encoding (by _Py_char2wchar()).
sys.getfilesystemencoding() is 'ascii' because nl_langinfo(CODESET) is used to get the locale encoding, and nl_langinfo(CODESET) announces ASCII (or an alias of this encoding).
Python should detect this case and set sys.getfilesystemencoding() to 'iso8859-1' if the locale encoding is 'iso8859-1' whereas nl_langinfo(CODESET) announces ASCII. We can, for example, decode b'\xe9' with mbstowcs() and check whether it fails or whether the result is U+00E9.
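The mismatch described above can be reproduced on any platform by applying the two codecs by hand. This is only a sketch: the byte string and the variable names are illustrative, and the explicit decode/encode calls are assumed stand-ins for what _Py_char2wchar() and os.fsencode() do on an affected system.

```python
# Bytes as they might arrive on the command line (illustrative value).
raw_arg = b"caf\xe9"

# Assumption: the C library decodes the argument as ISO-8859-1,
# which is what happens to sys.argv on the affected platforms.
argv_item = raw_arg.decode("iso8859-1")

# The filesystem encoding is 'ascii' with the surrogateescape handler.
# U+00E9 is neither ASCII nor an escaped surrogate, so encoding fails.
try:
    argv_item.encode("ascii", "surrogateescape")
    roundtrip_ok = True
except UnicodeEncodeError:
    roundtrip_ok = False

# If both directions use the same codec, the bytes survive a round trip.
consistent = raw_arg.decode("ascii", "surrogateescape")
restored = consistent.encode("ascii", "surrogateescape")

print(roundtrip_ok, restored == raw_arg)
```

The failure comes only from decoding with one codec and encoding with another; either codec is fine as long as it is used in both directions.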
msg175408 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-11-11 23:34
Attached patch works around the CODESET issue on OpenIndiana and FreeBSD. If the LC_CTYPE locale is "C" and nl_langinfo(CODESET) returns ASCII (or an alias of this encoding), b"\xE9" is decoded from the locale encoding: if the result is U+00E9, the patched Python uses ISO-8859-1. (If decoding fails, the locale encoding really is ASCII and the workaround is not used.)
If the result is different (b'\xe9' is not decoded from the locale encoding to U+00E9), a ValueError is raised. I wrote this test to detect bugs. I hope that our buildbots will validate the code. We may choose a different behaviour (e.g. keep ASCII).
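The decision logic of this first patch could be sketched in Python roughly as follows. This is not the patch itself, which is C code: pick_encoding, its parameters and the probe callable (a stand-in for mbstowcs()) are hypothetical names introduced here for illustration.

```python
def pick_encoding(lc_ctype, codeset_is_ascii, probe):
    """Sketch of the workaround's decision.

    probe(data) is a hypothetical stand-in for mbstowcs(): it returns the
    decoded str, or raises UnicodeDecodeError when decoding fails.
    Returns None when the workaround does not apply.
    """
    if lc_ctype != "C" or not codeset_is_ascii:
        return None
    try:
        result = probe(b"\xe9")
    except UnicodeDecodeError:
        # The locale encoding really is ASCII: keep it.
        return "ascii"
    if result == "\xe9":
        # The byte 0xE9 maps to U+00E9: the locale behaves like ISO-8859-1.
        return "iso8859-1"
    raise ValueError("unexpected result for b'\\xe9': %a" % result)


# A C library that decodes like ISO-8859-1 versus one that is strict ASCII.
print(pick_encoding("C", True, lambda data: data.decode("iso8859-1")))
print(pick_encoding("C", True, lambda data: data.decode("ascii")))
```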
Example on FreeBSD 8.2, original Python 3.4:
$ ./python
>>> import sys, locale
>>> sys.getfilesystemencoding()
'ascii'
>>> locale.getpreferredencoding()
'US-ASCII'
Example on FreeBSD 8.2, patched Python 3.4:
$ ./python 
>>> import sys, locale
>>> sys.getfilesystemencoding()
'iso8859-1'
>>> locale.getpreferredencoding()
'iso8859-1'
msg175410 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-11-11 23:54
Some tests are failing with the patch:
======================================================================
FAIL: test_undecodable_env (test.test_subprocess.POSIXProcessTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/haypo/prog/python/default/Lib/test/test_subprocess.py", line 1606, in test_undecodable_env
    self.assertEqual(stdout.decode('ascii'), ascii(value))
AssertionError: "'abc\\xff'" != "'abc\\udcff'"
- 'abc\xff'
? ^
+ 'abc\udcff'
? ^^^

======================================================================
FAIL: test_strcoll_with_diacritic (test.test_locale.TestEnUSCollation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/haypo/prog/python/default/Lib/test/test_locale.py", line 364, in test_strcoll_with_diacritic
    self.assertLess(locale.strcoll('\xe0', 'b'), 0)
AssertionError: 126 not less than 0

======================================================================
FAIL: test_strxfrm_with_diacritic (test.test_locale.TestEnUSCollation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/haypo/prog/python/default/Lib/test/test_locale.py", line 367, in test_strxfrm_with_diacritic
    self.assertLess(locale.strxfrm('\xe0'), locale.strxfrm('b'))
AssertionError: '\xe0' not less than 'b'
msg175446 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-11-12 14:40
Hijacking locale.getpreferredencoding() may be dangerous. I attached a
new patch, force_ascii.patch, which uses a different approach: be
stricter than mbstowcs() and force the ASCII encoding when:
 - the LC_CTYPE locale is C
 - nl_langinfo(CODESET) is ASCII or an alias of ASCII
 - mbstowcs() is able to decode non-ASCII characters
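The three conditions can be sketched as a small Python predicate. This illustrates the idea only and is not the C code from force_ascii.patch: should_force_ascii, is_ascii_alias and the probe callable (a stand-in for mbstowcs()) are hypothetical names, and Python's codec alias table is assumed to be a reasonable approximation of "an alias of ASCII".

```python
import codecs


def is_ascii_alias(codeset):
    """Return True if Python's codec registry maps the name to ASCII."""
    try:
        return codecs.lookup(codeset).name == "ascii"
    except LookupError:
        return False


def should_force_ascii(lc_ctype, codeset, probe):
    """Return True when all three conditions listed above hold.

    probe(data) is a hypothetical stand-in for mbstowcs(): it returns a
    str, or raises UnicodeDecodeError when the bytes cannot be decoded.
    """
    if lc_ctype != "C":
        return False
    if not is_ascii_alias(codeset):
        return False
    try:
        probe(b"\xe9")
    except UnicodeDecodeError:
        # The C library is already strict, nothing needs to be forced.
        return False
    return True


# The C library announces US-ASCII but still decodes the byte 0xE9.
print(should_force_ascii("C", "US-ASCII", lambda data: data.decode("iso8859-1")))
```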
msg176434 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-11-26 17:38
Victor, any progress on this?
msg176436 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-11-26 17:56
> Victor, any progress on this?
We have two options, and I don't know which one is better (safer). Does
the terminal handle non-ASCII characters with a C locale on FreeBSD or
Solaris?
msg176869 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-12-04 02:23
New changeset c25635b137cc by Victor Stinner in branch 'default':
Issue #16455: On FreeBSD and Solaris, if the locale is C, the
http://hg.python.org/cpython/rev/c25635b137cc 
msg176870 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-12-04 02:30
> We have two options, I don't know which one is the best (safer).
Force ASCII is safer. Python should announce that it does not "understand" non-ASCII bytes on the command line. I also chose this option because isalpha(0xe9) returns 0 (even if mbstowcs(0xe9) returns L"\xe9"): FreeBSD doesn't consider U+00E9 as a letter in the C locale, so Python should also consider this byte as raw data.
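Treating the byte as raw data can be illustrated with the surrogateescape error handler. The sketch below is an analogy only: Python's str.isalpha() uses the Unicode database rather than the C locale, so it does not reproduce the C isalpha() call mentioned above, and the variable names are illustrative.

```python
raw = b"\xe9"

# Interpreted as ISO-8859-1, the byte becomes the letter U+00E9.
as_latin1 = raw.decode("iso8859-1")

# With ASCII forced, the byte is kept as an escaped lone surrogate (U+DCE9),
# which is not a letter and only records the original byte value.
as_forced_ascii = raw.decode("ascii", "surrogateescape")

print(as_latin1.isalpha(), as_forced_ascii.isalpha())

# The original byte is recovered unchanged when encoding back.
print(as_forced_ascii.encode("ascii", "surrogateescape") == raw)
```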
msg176871 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-12-04 02:32
> New changeset c25635b137cc by Victor Stinner in branch 'default':
> Issue #16455: On FreeBSD and Solaris, if the locale is C, the
> http://hg.python.org/cpython/rev/c25635b137cc
This changeset should fix #16218 on FreeBSD and Solaris (these OSes should now correctly decode undecodable command line arguments).
msg178864 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-03 00:24
New changeset c256764e2b3f by Victor Stinner in branch '3.2':
Issue #16455: On FreeBSD and Solaris, if the locale is C, the
http://hg.python.org/cpython/rev/c256764e2b3f
New changeset 5bb289e4fb35 by Victor Stinner in branch '3.3':
(Merge 3.2) Issue #16455: On FreeBSD and Solaris, if the locale is C, the
http://hg.python.org/cpython/rev/5bb289e4fb35 
msg178866 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-01-03 00:41
I backported the fix to Python 3.2 and 3.3 because I consider it important enough.
History
Date User Action Args
2022-04-11 14:57:38 admin set github: 60659
2013-01-03 01:07:03 vstinner set status: open -> closed; resolution: fixed
2013-01-03 00:41:24 vstinner set messages: + msg178866; versions: + Python 3.2, Python 3.3
2013-01-03 00:24:06 python-dev set messages: + msg178864
2012-12-04 02:32:10 vstinner set messages: + msg176871
2012-12-04 02:30:31 vstinner set messages: + msg176870
2012-12-04 02:24:48 vstinner set title: sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set -> Decode command line arguments from ASCII on FreeBSD and Solaris if the locale is C
2012-12-04 02:23:05 python-dev set nosy: + python-dev; messages: + msg176869
2012-11-26 17:56:28 vstinner set messages: + msg176436
2012-11-26 17:38:26 jcea set messages: + msg176434
2012-11-12 14:49:40 jcea set nosy: + jcea
2012-11-12 14:40:44 vstinner set files: + force_ascii.patch; messages: + msg175446
2012-11-11 23:54:58 vstinner set messages: + msg175410
2012-11-11 23:34:24 vstinner set files: + workaround_codeset.patch; keywords: + patch; messages: + msg175408
2012-11-11 22:14:15 vstinner create
