Created on 2017-06-25 16:58 by mattheww, last changed 2022-04-11 14:58 by admin.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 14925 | open | gordonmessmer, 2019-07-24 02:57 | |
| Messages (8) | |||
|---|---|---|---|
| msg296828 | Author: Matthew Woodcraft (mattheww) | Date: 2017-06-25 16:58 | |
I have a system where the default locale is C.UTF-8, and en_US.UTF-8 is
not installed.
But locale.normalize() unhelpfully converts "C.UTF-8" to "en_US.UTF-8".
So the following crashes for me:
python3.6 -c "import locale;locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))"
Similarly getdefaultlocale() returns ('en_US', 'UTF-8'), so this crashes too:
export LANG=C.UTF-8
unset LC_CTYPE
unset LC_ALL
unset LANGUAGE
python3.6 -c "import locale;locale.setlocale(locale.LC_ALL, locale.getdefaultlocale())"
This behaviour is caused by a locale_alias entry in Lib/locale.py.
https://bugs.python.org/issue20076 documents its addition but doesn't
provide a rationale.
I can see that it might be helpful to provide such a conversion if
C.UTF-8 doesn't exist and en_US.UTF-8 does, but the current code is
breaking modern correctly-configured systems for the benefit of old
misconfigured ones (C.UTF-8 shouldn't really be in the environment if it
isn't available on the system, after all).
|
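A minimal sketch for checking how a given interpreter behaves, assuming only the standard `locale` module. The helper names `c_utf8_is_aliased` and `try_setlocale` are hypothetical and exist only for illustration. Results depend on the Python version and on which locales are installed, so no output is shown.

```python
import locale


def c_utf8_is_aliased():
    """Return True if locale.normalize() rewrites "C.UTF-8" to something else."""
    return locale.normalize("C.UTF-8") != "C.UTF-8"


def try_setlocale(value):
    """Try setlocale(LC_ALL, value), then restore the previous setting.

    Returns the string reported by setlocale(), or None if locale.Error
    was raised. `value` may be a string or a (language, encoding) tuple.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query only, nothing changes
    try:
        return locale.setlocale(locale.LC_ALL, value)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)


print("normalize('C.UTF-8') ->", locale.normalize("C.UTF-8"))
print("tuple form  ->", try_setlocale(("C", "UTF-8")))
print("string form ->", try_setlocale("C.UTF-8"))
```

If the tuple form prints `None` while the string form succeeds, the alias table is the likely cause on that system.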
|||
| msg297342 | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2017-06-30 02:20 | |
I'm honestly not sure how our Python level locale handling really works (I've mainly worked on the lower level C locale manipulation), so adding folks to the nosy list based on #20076 and #29571. I agree we shouldn't be aliasing C.UTF-8 to en_US.UTF-8 though - we took en_US.UTF-8 out of the locale coercion fallback list in PEP 538 because it wasn't really right. |
|||
| msg302981 | Author: Matthew Woodcraft (mattheww) | Date: 2017-09-25 22:37 | |
I've investigated a bit more.
First, I've tried with Python 3.7.0a1. As you'd expect, PEP 537 means this behaviour now also occurs when no locale environment variables at all are set.
Second, I've looked through locale.py a bit. I believe what it calls the "aliasing engine" is applied for:
- getlocale()
- getdefaultlocale()
- setlocale() when passed a tuple, but not when passed a string
This leads to some rather odd results. With 3.7.0a1 and no locale environment variables:
>>> import locale
>>> locale.getlocale()
('en_US', 'UTF-8')
# getlocale() is lying: the effective locale is really C.UTF-8
>>> sorted("abcABC", key=locale.strxfrm)
['A', 'B', 'C', 'a', 'b', 'c']
Third, I've checked on a system which does have en_US.UTF-8 installed, and (as you'd expect) instead of crashing it gives wrong results:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))
'en_US.UTF-8'
>>> locale.getlocale()
('en_US', 'UTF-8')
# now getlocale() is telling the truth, and the user isn't getting the
# collation they requested
>>> sorted("abcABC", key=locale.strxfrm)
['a', 'A', 'b', 'B', 'c', 'C']
|
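The gap between the reported and the effective locale can be inspected directly. The sketch below assumes only the standard `locale` module; `effective_locale` and `reported_locale` are hypothetical helper names. Calling `setlocale()` with the locale argument omitted queries the C library without changing anything and without an alias lookup, whereas `getlocale()` parses and normalizes that string.

```python
import locale


def effective_locale(category=locale.LC_CTYPE):
    """Return the C library's own name for the current locale (query only)."""
    return locale.setlocale(category)


def reported_locale(category=locale.LC_CTYPE):
    """Return the (language, encoding) tuple that getlocale() derives."""
    return locale.getlocale(category)


# Pick up the environment's locale for LC_CTYPE, if it is available.
try:
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    pass  # keep whatever is currently active

print("effective:", effective_locale())
print("reported: ", reported_locale())
```

When the two lines disagree about the language part, the alias table has rewritten the name somewhere along the way.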
|||
| msg302982 | Author: Matthew Woodcraft (mattheww) | Date: 2017-09-25 22:39 | |
(For PEP 537 please read PEP 538, sorry) |
|||
| msg347520 | Author: Gordon Messmer (gordonmessmer) * | Date: 2019-07-09 05:44 | |
> I can see that it might be helpful to provide such a conversion if
> C.UTF-8 doesn't exist and en_US.UTF-8 does
That can't happen. The "C" locale describes the behavior defined in the ISO C standard. It's built-in to glibc (and should be for all other libc implementations). All other locales require external support (i.e. /usr/lib/locale/<locale>)
https://www.gnu.org/software/libc/manual/html_node/Standard-Locales.html#Standard-Locales
|
|||
| msg347521 | Author: Gordon Messmer (gordonmessmer) * | Date: 2019-07-09 06:10 | |
> I agree we shouldn't be aliasing C.UTF-8 to en_US.UTF-8 though
What can we do about reverting that change? Python's current behavior causes unexpected exceptions, especially in containers. I'm currently debugging test failures in a Python application that occur in Fedora rawhide containers. Those containers don't have any locales installed. The test software saves its current locale, changes the locale in order to run a test, and then restores the original. Because Python is incorrectly reporting the original locale as "en_US", restoring the original fails.
|
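One way a save/restore pattern like this can sidestep the alias table is to save the raw string returned by `setlocale()` in query mode instead of the tuple from `getlocale()`. This is a sketch under that assumption, not the code of the application described above; `temporary_locale` is a hypothetical name.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(name, category=locale.LC_ALL):
    """Switch to `name` for the duration of the block, then restore."""
    # Query mode: returns the C library's own string, with no alias lookup.
    saved = locale.setlocale(category)
    try:
        yield locale.setlocale(category, name)
    finally:
        # Hand the same string back unchanged so the restore round-trips.
        locale.setlocale(category, saved)


with temporary_locale("C") as active:
    print("inside block:", active)
print("restored:", locale.setlocale(locale.LC_ALL))
```

Because the saved value is passed back as a string, `setlocale()` does not normalize it, so the restore does not depend on en_US.UTF-8 being installed.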
|||
| msg347528 | Author: Miro Hrončok (hroncok) * | Date: 2019-07-09 08:42 | |
>> C.UTF-8 doesn't exist and en_US.UTF-8 does
> That can't happen
It certainly can. Take for example RHEL 7 or 6.
|
|||
| msg348367 | Author: Gordon Messmer (gordonmessmer) * | Date: 2019-07-24 04:24 | |
As an example, let's consider dnf's i18n setup:
    try:
        dnf.pycomp.setlocale(locale.LC_ALL, '')
    except locale.Error:
        # default to C.UTF-8 or C locale if we got a failure.
        try:
            dnf.pycomp.setlocale(locale.LC_ALL, 'C.UTF-8')
            os.environ['LC_ALL'] = 'C.UTF-8'
        except locale.Error:
            dnf.pycomp.setlocale(locale.LC_ALL, 'C')
            os.environ['LC_ALL'] = 'C'
If setting the environment-specified locale fails, dnf will attempt to set the locale to C.UTF-8, and if that fails it will set the locale to C. This seems like an ideal process. If the expected locale is missing, dnf will attempt to at least use UTF-8, before falling back to the C locale. Unfortunately, because of the alias, this process will be unable to set the 'C.UTF-8' locale on systems which do not have the 'en_US' locale installed. This renders system support for 'C.UTF-8' unusable when no locales are installed.
|
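The fallback chain described above can be sketched generically with the standard library alone. This is not dnf's code and does not reproduce `dnf.pycomp.setlocale`, whose internals are not shown in this thread; `set_first_available` is a hypothetical helper. It passes plain strings to `locale.setlocale()`, which, per msg302981, is the form that is not run through the alias table.

```python
import locale


def set_first_available(candidates, category=locale.LC_ALL):
    """Set the first locale in `candidates` that the C library accepts.

    Each candidate is a plain string; "" means "use the environment".
    Returns the string reported by setlocale() for the one that worked.
    """
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            continue  # not installed here, try the next candidate
    raise locale.Error("none of the candidate locales is available")


# Environment first, then C.UTF-8, then plain C as the last resort.
print(set_first_available(["", "C.UTF-8", "C"]))
```

Ending the list with "C" means the final fallback needs no installed locale data.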
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:48 | admin | set | github: 74940 |
| 2019-07-29 15:54:44 | vstinner | set | nosy: - vstinner |
| 2019-07-24 04:24:56 | gordonmessmer | set | messages: + msg348367 |
| 2019-07-24 02:57:25 | gordonmessmer | set | keywords: + patch; stage: patch review; pull_requests: + pull_request14696 |
| 2019-07-11 20:17:50 | Jeffrey.Kintscher | set | nosy: + Jeffrey.Kintscher |
| 2019-07-09 08:42:49 | hroncok | set | nosy: + vstinner, hroncok; messages: + msg347528; versions: + Python 3.8 |
| 2019-07-09 06:10:15 | gordonmessmer | set | messages: + msg347521 |
| 2019-07-09 05:44:38 | gordonmessmer | set | nosy: + gordonmessmer; messages: + msg347520 |
| 2017-09-25 22:39:15 | mattheww | set | messages: + msg302982 |
| 2017-09-25 22:37:16 | mattheww | set | messages: + msg302981; versions: + Python 3.7 |
| 2017-06-30 02:20:13 | ncoghlan | set | nosy: + lemburg, benjamin.peterson, serhiy.storchaka; messages: + msg297342 |
| 2017-06-29 18:35:14 | r.david.murray | set | nosy: + ncoghlan |
| 2017-06-25 16:58:59 | mattheww | create | |