This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: locale.normalize() and getdefaultlocale() convert C.UTF-8 to en_US.UTF-8
Type:
Stage: patch review
Components:
Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.4, Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Jeffrey.Kintscher, benjamin.peterson, gordonmessmer, hroncok, lemburg, mattheww, ncoghlan, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2017-06-25 16:58 by mattheww, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL Status Linked
PR 14925 open gordonmessmer, 2019-07-24 02:57
Messages (8)
msg296828 - Author: Matthew Woodcraft (mattheww) Date: 2017-06-25 16:58
I have a system where the default locale is C.UTF-8, and en_US.UTF-8 is
not installed.
But locale.normalize() unhelpfully converts "C.UTF-8" to "en_US.UTF-8".
So the following crashes for me:
 python3.6 -c "import locale;locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))"
Similarly getdefaultlocale() returns ('en_US', 'UTF-8'), so this crashes too:
 export LANG=C.UTF-8
 unset LC_CTYPE
 unset LC_ALL
 unset LANGUAGE
 python3.6 -c "import locale;locale.setlocale(locale.LC_ALL, locale.getdefaultlocale())"
This behaviour is caused by a locale_alias entry in Lib/locale.py .
https://bugs.python.org/issue20076 documents its addition but doesn't
provide a rationale.
I can see that it might be helpful to provide such a conversion if
C.UTF-8 doesn't exist and en_US.UTF-8 does, but the current code is
breaking modern correctly-configured systems for the benefit of old
misconfigured ones (C.UTF-8 shouldn't really be in the environment if it
isn't available on the system, after all).
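The aliasing described above can be checked directly. The following is a minimal sketch that assumes only the standard library; the helper name `language_is_rewritten` is made up for illustration, and its result for "C.UTF-8" depends on the locale_alias table shipped with the interpreter that runs it.

```python
import locale


def language_is_rewritten(name):
    """Return True if locale.normalize() changes the language part of name."""
    normalized = locale.normalize(name)
    # Compare only the part before the encoding, so that pure spelling
    # changes to the encoding (e.g. "utf8" -> "UTF-8") are not counted.
    return normalized.split(".")[0] != name.split(".")[0]


if __name__ == "__main__":
    for candidate in ("C", "C.UTF-8"):
        print(candidate, "->", locale.normalize(candidate),
              "(language rewritten: %s)" % language_is_rewritten(candidate))
```

On an affected interpreter the second line would be expected to show the language part changing from C to en_US, as reported above.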
msg297342 - Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2017-06-30 02:20
I'm honestly not sure how our Python level locale handling really works (I've mainly worked on the lower level C locale manipulation), so adding folks to the nosy list based on #20076 and #29571.
I agree we shouldn't be aliasing C.UTF-8 to en_US.UTF-8 though - we took en_US.UTF-8 out of the locale coercion fallback list in PEP 538 because it wasn't really right.
msg302981 - Author: Matthew Woodcraft (mattheww) Date: 2017-09-25 22:37
I've investigated a bit more.
First, I've tried with Python 3.7.0a1 . As you'd expect, PEP 537 means
this behaviour now also occurs when no locale environment variables at
all are set.
Second, I've looked through locale.py a bit. I believe what it calls the
"aliasing engine" is applied for:
 - getlocale()
 - getdefaultlocale()
 - setlocale() when passed a tuple, but not when passed a string
This leads to some rather odd results.
With 3.7.0a1 and no locale environment variables:
 >>> import locale
 >>> locale.getlocale()
 ('en_US', 'UTF-8')
 # getlocale() is lying: the effective locale is really C.UTF-8
 >>> sorted("abcABC", key=locale.strxfrm)
 ['A', 'B', 'C', 'a', 'b', 'c']
Third, I've checked on a system which does have en_US.UTF-8 installed,
and (as you'd expect) instead of crashing it gives wrong results:
 >>> import locale
 >>> locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))
 'en_US.UTF-8'
 >>> locale.getlocale()
 ('en_US', 'UTF-8')
 # now getlocale() is telling the truth, and the user isn't getting the
 # collation they requested
 >>> sorted("abcABC", key=locale.strxfrm)
 ['a', 'A', 'b', 'B', 'c', 'C']
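One way to see the mismatch between the locale actually in effect and what getlocale() reports is to print both side by side. This is only a sketch: `raw_and_parsed` is a hypothetical helper, and it relies on setlocale() returning the current setting as a string when called with just a category.

```python
import locale


def raw_and_parsed(category):
    """Return (string reported by setlocale(), tuple reported by getlocale())."""
    # With no second argument, setlocale() only queries the current setting.
    raw = locale.setlocale(category)
    try:
        # getlocale() parses the name, which is where the aliasing engine
        # described above comes into play.
        parsed = locale.getlocale(category)
    except ValueError:
        # getlocale() raises ValueError when it cannot parse the name.
        parsed = None
    return raw, parsed


if __name__ == "__main__":
    raw, parsed = raw_and_parsed(locale.LC_COLLATE)
    print("setlocale() reports:", raw)
    print("getlocale() reports:", parsed)
```

If the two lines disagree about the language, the string from setlocale() is the one that reflects the locale actually in effect.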
msg302982 - Author: Matthew Woodcraft (mattheww) Date: 2017-09-25 22:39
(For PEP 537 please read PEP 538, sorry)
msg347520 - Author: Gordon Messmer (gordonmessmer) * Date: 2019-07-09 05:44
> I can see that it might be helpful to provide such a conversion if
> C.UTF-8 doesn't exist and en_US.UTF-8 does
That can't happen. The "C" locale describes the behavior defined in the ISO C standard. It's built into glibc (and should be for all other libc implementations). All other locales require external support (i.e. /usr/lib/locale/<locale>).
https://www.gnu.org/software/libc/manual/html_node/Standard-Locales.html#Standard-Locales 
msg347521 - Author: Gordon Messmer (gordonmessmer) * Date: 2019-07-09 06:10
> I agree we shouldn't be aliasing C.UTF-8 to en_US.UTF-8 though
What can we do about reverting that change? Python's current behavior causes unexpected exceptions, especially in containers.
I'm currently debugging test failures in a Python application that occur in Fedora rawhide containers. Those containers don't have any locales installed. The test software saves its current locale, changes the locale in order to run a test, and then restores the original. Because Python is incorrectly reporting the original locale as "en_US", restoring the original fails.
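A save-and-restore pattern that sidesteps the reported name might look like the sketch below. It is an assumption-laden illustration, not the code of the application in question: `temporary_locale` is a hypothetical helper, and it relies on the observation earlier in this thread that setlocale() does not apply the aliasing when it is passed a string.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch category to name for the duration of a with-block."""
    # Query the current setting as the string the C library reports,
    # rather than the (language, encoding) tuple from getlocale().
    saved = locale.setlocale(category)
    try:
        # Yield the name the C library actually selected.
        yield locale.setlocale(category, name)
    finally:
        # Restore from the saved string, so no tuple is ever rebuilt
        # into a locale name on the way back.
        locale.setlocale(category, saved)


if __name__ == "__main__":
    with temporary_locale(locale.LC_COLLATE, "C") as active:
        print("inside the block:", active)
    print("after the block:", locale.setlocale(locale.LC_COLLATE))
```

The design choice is to keep the saved value opaque: whatever string the C library handed out is handed back unchanged.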
msg347528 - Author: Miro Hrončok (hroncok) * Date: 2019-07-09 08:42
>> C.UTF-8 doesn't exist and en_US.UTF-8 does
> That can't happen
It certainly can. Take for example RHEL 7 or 6.
msg348367 - Author: Gordon Messmer (gordonmessmer) * Date: 2019-07-24 04:24
As an example, let's consider dnf's i18n setup:
 try:
     dnf.pycomp.setlocale(locale.LC_ALL, '')
 except locale.Error:
     # default to C.UTF-8 or C locale if we got a failure.
     try:
         dnf.pycomp.setlocale(locale.LC_ALL, 'C.UTF-8')
         os.environ['LC_ALL'] = 'C.UTF-8'
     except locale.Error:
         dnf.pycomp.setlocale(locale.LC_ALL, 'C')
         os.environ['LC_ALL'] = 'C'
If setting the environment-specified locale fails, dnf will attempt to set the locale
to C.UTF-8, and if that fails it will set the locale to C. This seems like an ideal
process. If the expected locale is missing, dnf will attempt to at least use UTF-8,
before falling back to the C locale.
Unfortunately, because of the alias, this process will be unable to set the 'C.UTF-8'
locale on systems which do not have the 'en_US' locale installed. This renders
system support for 'C.UTF-8' unusable when no locales are installed.
History
Date                 User               Action  Args
2022-04-11 14:58:48  admin              set     github: 74940
2019-07-29 15:54:44  vstinner           set     nosy: - vstinner
2019-07-24 04:24:56  gordonmessmer      set     messages: + msg348367
2019-07-24 02:57:25  gordonmessmer      set     keywords: + patch
                                                stage: patch review
                                                pull_requests: + pull_request14696
2019-07-11 20:17:50  Jeffrey.Kintscher  set     nosy: + Jeffrey.Kintscher
2019-07-09 08:42:49  hroncok            set     nosy: + vstinner, hroncok
                                                messages: + msg347528
                                                versions: + Python 3.8
2019-07-09 06:10:15  gordonmessmer      set     messages: + msg347521
2019-07-09 05:44:38  gordonmessmer      set     nosy: + gordonmessmer
                                                messages: + msg347520
2017-09-25 22:39:15  mattheww           set     messages: + msg302982
2017-09-25 22:37:16  mattheww           set     messages: + msg302981
                                                versions: + Python 3.7
2017-06-30 02:20:13  ncoghlan           set     nosy: + lemburg, benjamin.peterson, serhiy.storchaka
                                                messages: + msg297342
2017-06-29 18:35:14  r.david.murray     set     nosy: + ncoghlan
2017-06-25 16:58:59  mattheww           create
