
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: locale documentation doesn't mention that LC_CTYPE is changed at startup
Type: behavior
Stage: resolved
Components: Documentation, Unicode
Versions: Python 3.2, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Arfrever, alexis, ezio.melotti, georg.brandl, lemburg, loewis, ned.deily, petri.lehtinen, pitrou, python-dev, r.david.murray, sdaoden, vstinner
Priority: high
Keywords: patch

Created on 2009-06-05 10:56 by ned.deily, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
locale_doc.patch vstinner, 2012-06-05 12:02
Messages (27)
msg88932 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-06-05 10:56
In the Library Reference section 22.2.1 for locale, it states:
"Initially, when a program is started, the locale is the C locale, no 
matter what the user’s preferred locale is. The program must explicitly 
say that it wants the user’s preferred locale settings by calling 
setlocale(LC_ALL, '')."
This is the case for python2.x:
$ export LANG=en_US.UTF-8
$ python2.5
Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>> 
but not for 3.1:
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>> 
Either the code is incorrect in 3.1 or the documentation should be 
updated.
msg89016 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-06-06 21:00
Confirmed for 3.1; 3.0 still returns (None, None).
msg89077 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-06-08 13:29
Deferring to Martin which one is correct :)
msg89084 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-06-08 16:01
This is definitely a bug in 3.1, for the same reason that a C program
uses the C locale until an explicit setlocale is done: otherwise, a
non-locale-aware program can run into bugs resulting from locale issues
when run under a different locale than that of the program author.
I have a memory of this being reported before somewhere and someone
tracking it down to a change in python initialization, but I can't find
a bug report and my google-fu is failing me.
msg89088 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-08 16:17
For some reason only LC_CTYPE is affected:
>>> locale.getlocale(locale.LC_CTYPE)
('fr_FR', 'UTF8')
>>> locale.getlocale(locale.LC_MESSAGES)
(None, None)
>>> locale.getlocale(locale.LC_TIME)
(None, None)
>>> locale.getlocale(locale.LC_NUMERIC)
(None, None)
>>> locale.getlocale(locale.LC_COLLATE)
(None, None)
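The per-category check above can also be done without getlocale()'s name parsing. Here is a minimal sketch. The helper current_locale_settings is hypothetical. It only queries each category's raw setting with locale.setlocale(category), which changes nothing when no locale argument is given:

```python
import locale

# LC_MESSAGES is not defined on every platform (e.g. Windows),
# so each category constant is looked up defensively.
CATEGORY_NAMES = ("LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_NUMERIC",
                  "LC_MONETARY", "LC_MESSAGES")


def current_locale_settings():
    """Return {category name: current setting} without changing anything.

    locale.setlocale(category) with no second argument only queries
    the current value for that category.
    """
    settings = {}
    for name in CATEGORY_NAMES:
        category = getattr(locale, name, None)
        if category is not None:
            settings[name] = locale.setlocale(category)
    return settings


if __name__ == "__main__":
    for name, value in current_locale_settings().items():
        print(f"{name:12} {value}")
```

Under a UTF-8 LANG on Python 3, you would expect LC_CTYPE to show the user's locale and the other categories to show "C". The exact strings depend on the platform, the environment and the Python version.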
msg89089 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-06-08 16:22
Ah, I can tell you exactly why that is, then. I noticed this in
pythonrun.c while grepping the source:
#ifdef HAVE_SETLOCALE
    /* Set up the LC_CTYPE locale, so we can obtain
       the locale's charset without having to switch
       locales. */
    setlocale(LC_CTYPE, "");
#endif
SVN blames Martin in r56922, so this case is assigned appropriately. 
Perhaps changing only LC_CTYPE is safe? I must admit to ignorance as to
what all the LC variables mean/control.
msg89090 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-08 16:26
It would still be better if it was unset afterwards. Third-party
extensions could have LC_CTYPE-dependent behaviour.
msg89101 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-06-08 19:39
> It would still be better if it was unset afterwards. Third-party
> extensions could have LC_CTYPE-dependent behaviour.
In principle, they could, yes - but what specific behavior might that
be? What will change is character classification, which I consider
fairly harmless. Also, multi-byte conversion routines will change, which
is the primary reason for leaving it modified.
msg89102 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-08 19:43
> In principle, they could, yes - but what specific behavior might that
> be? What will change is character classification, which I consider
> fairly harmless. Also, multi-byte conversion routines will change, which
> is the primary reason for leaving it modified.
Ok, so I suppose we could leave the code as-is.
msg89120 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-06-08 21:51
Since it controls what is considered to be whitespace, it is possible
this will lead to subtle bugs, but I agree that it seems relatively
benign, especially considering 3.x's unicode orientation. So, this
becomes a doc bug...
msg89136 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-06-09 07:10
To add a little bit more analysis: posix.device_encoding requires that
the LC_CTYPE is set. Setting it just in this function would not be
possible, as setlocale is not thread-safe.
So for 3.1, it seems that Python must set LC_CTYPE. If somebody can
propose a patch that avoids that for 3.2, I'd certainly be in favor.
msg127180 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-27 11:40
> To add a little bit more analysis: posix.device_encoding requires that
> the LC_CTYPE is set. Setting it just in this function would not be
> possible, as setlocale is not thread-safe.
open() indirectly (through locale.getpreferredencoding()) changes the locale temporarily (it sets LC_CTYPE to "") if the file is not a TTY. If it is a TTY, device_encoding() calls nl_langinfo(CODESET) without changing the current locale. If setlocale() is not thread-safe, we may have a problem here. See also #11022: a report from a user who did not understand why setlocale() does not affect the encoding used by open() (TextIOWrapper). A quick solution is to call locale.getpreferredencoding(False), which doesn't change the locale.
Do you really need os.device_encoding()? If we change TextIOWrapper to call locale.getpreferredencoding(False), os.device_encoding() and locale.getpreferredencoding(False) will give the same result. The exception is Windows, where os.device_encoding() uses GetConsoleCP() if fd==0 and GetConsoleOutputCP() if fd in (1, 2). But we can call GetConsoleCP() and GetConsoleOutputCP() directly in initstdio(). If someone closes sys.std* and recreates them later, os.device_encoding() can be used explicitly to keep the previous behaviour.
> It would still be better if it was unset afterwards. Third-party
> extensions could have LC_CTYPE-dependent behaviour.
If Python is embedded, it should not change the locale. Even if it is not embedded, it may be better to never set LC_CTYPE.
It is too late to touch such a critical point in Python 3.2, but we may change it in Python 3.3.
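Victor's proposal can be sketched as follows: take the encoding from os.device_encoding() for terminals, and fall back to locale.getpreferredencoding(False) otherwise. The helper name encoding_for_fd is hypothetical. This is only an illustration, not what TextIOWrapper does in any particular CPython release:

```python
import locale
import os


def encoding_for_fd(fd):
    """Choose a text encoding for file descriptor fd (illustrative helper).

    os.device_encoding() returns an encoding only when fd is connected to
    a terminal, and None otherwise. The fallback,
    locale.getpreferredencoding(False), normally reports the encoding of
    the current LC_CTYPE setting and does not call setlocale().
    """
    return os.device_encoding(fd) or locale.getpreferredencoding(False)
```

Neither branch calls setlocale(), so this choice avoids the thread-safety concern Martin raised about temporarily switching LC_CTYPE.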
msg127262 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-01-28 09:27
Python can be embedded into other applications and unconditionally
changing the locale (esp. the LC_CTYPE) is not good practice, since
it's not thread-safe and affects the entire process. An application
may have set LC_CTYPE (or the locale) to something completely
different.
If at all, Python should be more careful using this call (pseudo
code):
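saved = setlocale(LC_CTYPE, NULL);   /* result may be overwritten by the next call */
saved = saved ? strdup(saved) : NULL;
lc_ctype = saved;
if (saved == NULL || strcmp(saved, "") == 0 || strcmp(saved, "C") == 0) {
    env_lc_ctype = setlocale(LC_CTYPE, "");
    lc_ctype = env_lc_ctype ? strdup(env_lc_ctype) : NULL;
    if (saved != NULL)
        setlocale(LC_CTYPE, saved);  /* restore the original setting */
}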
Then use lc_ctype to figure out encodings, etc.
While this is not thread-safe, it at least reverts the change back
to the original setting and only applies the change if needed. That's
still not optimal, but better than nothing.
A clean alternative would be adding LC_* variable parsing code to
Python to avoid the setlocale() call altogether.
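At the Python level, the save/query/restore idea in the pseudo code above might look like the sketch below. The function name environment_ctype_locale is hypothetical. Like the pseudo code, it only asks the environment when LC_CTYPE is still the default ("C", or "POSIX" on some systems). It is still not thread-safe, because setlocale() is process-wide:

```python
import locale


def environment_ctype_locale():
    """Return the LC_CTYPE locale name the environment asks for.

    If the application already chose a non-default LC_CTYPE, that value
    is returned untouched. Otherwise setlocale(LC_CTYPE, "") resolves the
    environment and the original setting is restored afterwards.
    Returns None if the C library rejects the environment's locale.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, changes nothing
    if saved not in ("C", "POSIX"):
        return saved  # already set by the application: leave it alone
    try:
        return locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return None
    finally:
        # Put the original value back even if the query failed.
        locale.setlocale(locale.LC_CTYPE, saved)
```

Other threads, or C extensions running concurrently, can still observe the temporary change between the two setlocale() calls.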
msg127265 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-01-28 09:33
> A clean alternative would be adding LC_* variable parsing code to
> Python to avoid the setlocale() call altogether.
That would be highly non-portable, and repeat the mistakes of
getdefaultlocale.
msg127283 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-01-28 11:05
Martin v. Löwis wrote:
>> A clean alternative would be adding LC_* variable parsing code to
>> Python to avoid the setlocale() call altogether.
>
> That would be highly non-portable, and repeat the mistakes of
> getdefaultlocale.
You say that often, but I don't really know why. It's certainly portable
between various Unix platforms, perhaps not Windows, but then i18n
on Windows is a different story altogether.
BTW: On Windows, you can make setlocale() work per thread
using _configthreadlocale()
(http://msdn.microsoft.com/de-de/library/26c0tb7x(v=vs.80).aspx).
Perhaps we ought to expose this in _locale and use it in
getdefaultlocale() on Windows to query the locale settings
via the pseudocode I posted.
msg127347 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-01-28 21:22
>> That would be highly non-portable, and repeat the mistakes of
>> getdefaultlocale.
> 
> You say that often, but I don't really know why. It's certainly portable
> between various Unix platforms, perhaps not Windows, but then i18n
> on Windows is a different story altogether.
No, it's absolutely not portable across Unix platforms. Looking at
LANG or LC_ALL does *not* allow you to infer the region name, or
the locale's character set. For example, using glibc, in some
installations, /etc/locale.alias is considered to map a value of LANG
to the final locale name. As an option, glibc also considers a
LOCALE_ALIAS_PATH that may point to a (colon-separated) path of
files to search for locale aliases.
Other systems may use other databases to map a locale name to locale
properties.
Unless you know exactly what version of C library is running on
a system, parsing environment variables yourself is doomed to fail.
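To make the disagreement concrete, "LC_* variable parsing" would amount to something like the hypothetical helper below. It applies the POSIX precedence: LC_ALL, then LC_<category>, then LANG. It is exactly what Martin warns about. It never consults /etc/locale.alias or LOCALE_ALIAS_PATH, so an alias comes back unresolved, and it learns nothing about the locale's character set:

```python
import os


def naive_locale_name(category="LC_CTYPE", environ=None):
    """Resolve a locale name from environment variables only (illustrative).

    Precedence: LC_ALL > LC_<category> > LANG; empty values count as
    unset, and the fallback is "C". Alias databases are NOT consulted,
    so the result may differ from what setlocale() would actually pick.
    """
    if environ is None:
        environ = os.environ
    for var in ("LC_ALL", category, "LANG"):
        value = environ.get(var)
        if value:
            return value
    return "C"
```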
msg127350 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2011-01-28 21:36
Martin v. Löwis:
It seems that your web browser replaces ", " with ",\t" in the title (where "\t" is a tab character) each time you add a comment.
msg127351 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-01-28 21:38
More likely, it's my email reader. Sorry about that.
msg127417 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-01-29 13:51
User lemburg pointed me to this, but no, I've posted msg127416 to Issue 11022.
msg141830 - Author: Alexis Metaireau (alexis) * (Python triager) Date: 2011-08-09 15:53
Might it be useful to state in the documentation that getlocale() is not intended to be used to get information about the system's locale?
This is not currently explained, so it's a bit confusing to have getlocale() return (None, None) even when your locale is set.
msg141847 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-08-10 00:24
This issue is about the fact that it doesn't return (None, None). We should probably decide what we are going to do about that before changing the docs if they need it.
msg141872 - Author: Alexis Metaireau (alexis) * (Python triager) Date: 2011-08-10 16:05
I see two different things here:
1) the fact that getlocale() doesn't return (None, None) on some Python versions
2) the fact that having it return (None, None) by default is a bit misleading, as users may think that getlocale() is tied to environment variables. That's what was at the origin of #12699
My last remark is about the second point. Should I start a new issue for this?
msg141890 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-08-11 01:25
Yes a new issue would be more appropriate.
msg147174 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-11-06 19:48
If the thread safety of setlocale() is a problem, does anybody know how portable uselocale() is? It sets the locale of the current thread only, so it's safe to temporarily change the locale and then set it back.
msg162340 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-06-05 12:02
> Either the code is incorrect in 3.1
> or the documentation should be updated.
Leaving LC_CTYPE unchanged at Python startup (that is, using the "C" locale, which is ASCII in most
cases) would be a major change in Python 3. I don't
want to change this. You would see a lot of mojibake in your GUIs and get a lot of ugly surrogate characters in filenames (because of PEP 383) if we no longer set LC_CTYPE to the user's preferred encoding at startup.
Setting LC_CTYPE to the user's preferred encoding is just very
convenient. It helps Python communicate with the user through the console,
with the filesystem, and with subprocesses (command-line arguments), etc.
For example, if your current LC_CTYPE locale is C (ASCII), you cannot pass
non-ASCII characters (such as characters the user typed in your GUI) to a
subprocess: you get a UnicodeEncodeError.
So it's just a documentation issue: see my attached patch.
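Both failure modes Victor mentions can be reproduced without a GUI. The sketch below uses an explicit "ascii" codec to stand in for the encoding a "C" LC_CTYPE gives on many systems. That is an assumption: the codec Python actually picks depends on the platform and version. The helper can_encode is hypothetical. The second half shows the PEP 383 surrogate escaping behind the "ugly surrogate characters" in filenames:

```python
def can_encode(text, encoding="ascii"):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Text typed by a user: fine in UTF-8, not representable in ASCII.
user_text = "café"
print(can_encode(user_text))            # False: "ascii" stands in for "C"
print(can_encode(user_text, "utf-8"))   # True

# PEP 383: undecodable bytes (e.g. in a filename) become lone surrogates
# U+DC80..U+DCFF, so the original bytes can be recovered on encoding.
raw_name = b"caf\xe9.txt"               # byte 0xE9 is not valid ASCII
name = raw_name.decode("ascii", "surrogateescape")
print(ascii(name))                      # 'caf\udce9.txt'
print(name.encode("ascii", "surrogateescape") == raw_name)  # True
```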
msg162355 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-06-05 16:24
LGTM
msg162380 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-06-05 23:39
New changeset 113cdce4663c by Victor Stinner in branch 'default':
Close #6203: Document that Python 3 sets LC_CTYPE at startup to the user's preferred locale encoding
http://hg.python.org/cpython/rev/113cdce4663c 
History
Date User Action Args
2022-04-11 14:56:49 admin set github: 50452
2012-06-05 23:39:39 python-dev set status: open -> closed
    nosy: + python-dev
    messages: + msg162380
    resolution: fixed
    stage: needs patch -> resolved
2012-06-05 16:24:06 ned.deily set messages: + msg162355
2012-06-05 12:03:57 vstinner set title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> locale documentation doesn't mention that LC_CTYPE is changed at startup
    components: + Unicode
    versions: + Python 3.2
2012-06-05 12:02:58 vstinner set files: + locale_doc.patch
    keywords: + patch
    messages: + msg162340
2011-11-06 19:48:10 petri.lehtinen set nosy: + petri.lehtinen
    messages: + msg147174
2011-08-11 01:25:33 r.david.murray set messages: + msg141890
2011-08-10 16:05:48 alexis set messages: + msg141872
    title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior
2011-08-10 00:24:02 r.david.murray set messages: + msg141847
2011-08-09 15:53:51 alexis set nosy: + alexis
    messages: + msg141830
2011-08-05 21:34:37 ned.deily link issue12699 superseder
2011-01-29 13:51:48 sdaoden set nosy: + sdaoden
    messages: + msg127417
2011-01-28 21:38:45 loewis set nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray
    messages: + msg127351
2011-01-28 21:36:54 Arfrever set nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray
    messages: + msg127350
2011-01-28 21:22:14 loewis set nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray
    messages: + msg127347
    title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior
2011-01-28 15:01:17 Arfrever set nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray
    title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior
2011-01-28 11:05:45 lemburg set nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray
    messages: + msg127283
2011-01-28 09:33:39 loewis set nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray
    messages: + msg127265
    title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior
2011-01-28 09:27:54 lemburg set nosy: + lemburg
    messages: + msg127262
2011-01-27 16:58:10 Arfrever set nosy: + Arfrever
2011-01-27 11:40:07 vstinner set nosy: + vstinner
    messages: + msg127180
    versions: + Python 3.3, - Python 3.2
2010-10-29 10:07:21 admin set assignee: georg.brandl -> docs@python
2009-12-30 01:46:52 r.david.murray set versions: + Python 3.2, - Python 3.1
2009-06-09 10:43:42 pitrou set assignee: georg.brandl
2009-06-09 07:10:25 loewis set assignee: loewis -> (no value)
    messages: + msg89136
2009-06-08 21:51:50 r.david.murray set priority: release blocker -> high
    messages: + msg89120
    components: - Library (Lib)
    nosy: loewis, georg.brandl, pitrou, ned.deily, ezio.melotti, r.david.murray
2009-06-08 19:43:09 pitrou set messages: + msg89102
    title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior
2009-06-08 19:39:29 loewis set messages: + msg89101
    title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior
2009-06-08 16:26:25 pitrou set messages: + msg89090
2009-06-08 16:22:10 r.david.murray set messages: + msg89089
2009-06-08 16:17:53 pitrou set nosy: + pitrou
    messages: + msg89088
2009-06-08 16:01:05 r.david.murray set priority: normal -> release blocker
    nosy: + r.david.murray
    messages: + msg89084
    stage: needs patch
2009-06-08 13:29:54 georg.brandl set assignee: georg.brandl -> loewis
    messages: + msg89077
    nosy: + loewis
2009-06-06 21:00:39 ezio.melotti set priority: normal
    nosy: + ezio.melotti
    messages: + msg89016
    components: + Library (Lib)
2009-06-05 10:56:37 ned.deily create
