Created on 2009-06-05 10:56 by ned.deily, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| locale_doc.patch | vstinner, 2012-06-05 12:02 | |
| Messages (27) | |||
|---|---|---|---|
| msg88932 - (view) | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2009-06-05 10:56 | |
In the Library Reference section 22.2.1 for locale, it states:
"Initially, when a program is started, the locale is the C locale, no
matter what the user’s preferred locale is. The program must explicitly
say that it wants the user’s preferred locale settings by calling
setlocale(LC_ALL, '')."
This is the case for python2.x:
$ export LANG=en_US.UTF-8
$ python2.5
Python 2.5.4 (r254:67916, Feb 17 2009, 20:16:45)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>>
but not for 3.1:
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>>
Either the code is incorrect in 3.1 or the documentation should be
updated.
|
|||
| msg89016 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2009-06-06 21:00 | |
Confirmed for 3.1; 3.0 still returns (None, None). |
|||
| msg89077 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2009-06-08 13:29 | |
Deferring to Martin which one is correct :) |
|||
| msg89084 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2009-06-08 16:01 | |
This is definitely a bug in 3.1, for the same reason that a C program uses the C locale until an explicit setlocale is done: otherwise, a non-locale-aware program can run into bugs resulting from locale issues when run under a different locale than that of the program author. I have a memory of this being reported before somewhere and someone tracking it down to a change in Python initialization, but I can't find a bug report and my google-fu is failing me. |
|||
| msg89088 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-06-08 16:17 | |
For some reason only LC_CTYPE is affected:
>>> locale.getlocale(locale.LC_CTYPE)
('fr_FR', 'UTF8')
>>> locale.getlocale(locale.LC_MESSAGES)
(None, None)
>>> locale.getlocale(locale.LC_TIME)
(None, None)
>>> locale.getlocale(locale.LC_NUMERIC)
(None, None)
>>> locale.getlocale(locale.LC_COLLATE)
(None, None)
|
|||
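For readers who want to repeat the per-category check from msg89088, here is a minimal sketch. It queries each category with `locale.setlocale(category, None)`, which only reads the current setting and changes nothing. The helper name `startup_locales` is illustrative and not part of the `locale` module. The result depends on the platform, the Python version and the environment, so no output is shown.

```python
import locale


def startup_locales():
    """Return {category name: current setting} without changing the locale."""
    names = ["LC_CTYPE", "LC_COLLATE", "LC_TIME",
             "LC_NUMERIC", "LC_MONETARY", "LC_MESSAGES"]
    result = {}
    for name in names:
        # LC_MESSAGES does not exist on every platform (e.g. Windows).
        category = getattr(locale, name, None)
        if category is None:
            continue
        # Passing None as the locale only queries the current setting.
        result[name] = locale.setlocale(category, None)
    return result


if __name__ == "__main__":
    for name, value in startup_locales().items():
        print(f"{name}: {value}")
```

Run in a fresh interpreter, this should show whether LC_CTYPE differs from the other categories, which is the asymmetry reported in msg89088.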
| msg89089 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2009-06-08 16:22 | |
Ah, I can tell you exactly why that is, then. I noticed this in
pythonrun.c while grepping the source:
#ifdef HAVE_SETLOCALE
/* Set up the LC_CTYPE locale, so we can obtain
   the locale's charset without having to switch
   locales. */
setlocale(LC_CTYPE, "");
#endif
SVN blames Martin in r56922, so this case is assigned appropriately.
Perhaps changing only LC_CTYPE is safe? I must admit to ignorance as to
what all the LC variables mean/control.
|
|||
| msg89090 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-06-08 16:26 | |
It would still be better if it was unset afterwards. Third-party extensions could have LC_CTYPE-dependent behaviour. |
|||
| msg89101 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009-06-08 19:39 | |
> It would still be better if it was unset afterwards. Third-party
> extensions could have LC_CTYPE-dependent behaviour.
In principle, they could, yes - but what specific behavior might that
be? What will change is character classification, which I consider
fairly harmless. Also, multi-byte conversion routines will change, which
is the primary reason for leaving it modified.
|
|||
| msg89102 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-06-08 19:43 | |
> In principle, they could, yes - but what specific behavior might that
> be? What will change is character classification, which I consider
> fairly harmless. Also, multi-byte conversion routines will change, which
> is the primary reason for leaving it modified.
Ok, so I suppose we could leave the code as-is.
|
|||
| msg89120 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2009-06-08 21:51 | |
Since it controls what is considered to be whitespace, it is possible this will lead to subtle bugs, but I agree that it seems relatively benign, especially considering 3.x's Unicode orientation. So, this becomes a doc bug... |
|||
| msg89136 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009-06-09 07:10 | |
To add a little bit more analysis: posix.device_encoding requires that LC_CTYPE is set. Setting it just in this function would not be possible, as setlocale is not thread-safe. So for 3.1, it seems that Python must set LC_CTYPE. If somebody can propose a patch that avoids that for 3.2, I'd certainly be in favor. |
|||
| msg127180 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-01-27 11:40 | |
> To add a little bit more analysis: posix.device_encoding requires that
> the LC_CTYPE is set. Setting it just in this function would not be
> possible, as setlocale is not thread-safe.
open() indirectly (via locale.getpreferredencoding()) changes the locale temporarily (it sets LC_CTYPE to "") if the file is not a TTY. If it is a TTY, device_encoding() calls nl_langinfo(CODESET) without changing the current locale. If setlocale() is not thread-safe, we may have a problem here.
See also #11022: a report from a user who did not understand why setlocale() doesn't affect the encoding used by open() (TextIOWrapper). A quick solution is to call locale.getpreferredencoding(False), which doesn't change the locale.
Do you really need os.device_encoding()? If we change TextIOWrapper to call locale.getpreferredencoding(False), os.device_encoding() and locale.getpreferredencoding(False) will give the same result. Except on Windows: os.device_encoding() uses GetConsoleCP() if fd==0 and GetConsoleOutputCP() if fd in (1, 2). But we can use GetConsoleCP() and GetConsoleOutputCP() directly in initstdio(). If someone closes the sys.std* streams and recreates them later, os.device_encoding() can be used explicitly to keep the previous behaviour.
> It would still be better if it was unset afterwards. Third-party
> extensions could have LC_CTYPE-dependent behaviour.
If Python is embedded, it should not change the locale. Even if it is not embedded, it is maybe better to never set LC_CTYPE. It is too late to touch such a critical point in Python 3.2, but we may change it in Python 3.3.
|
|||
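The "quick solution" mentioned in msg127180 can be tried directly. A minimal sketch, assuming only the documented `do_setlocale` parameter of `locale.getpreferredencoding()`; the wrapper name `current_encoding` is illustrative. The returned name varies between systems, so no output is shown.

```python
import codecs
import locale


def current_encoding():
    """Encoding of the LC_CTYPE locale that is already active."""
    # do_setlocale=False asks for the encoding of the current locale
    # instead of temporarily switching LC_CTYPE to "" first.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    name = current_encoding()
    # codecs.lookup() normalizes platform spellings to Python's codec name.
    print(name, "->", codecs.lookup(name).name)
```

Because this call never invokes setlocale(), it sidesteps the thread-safety concern raised above, at the price of reporting whatever locale happens to be active.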
| msg127262 - (view) | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2011-01-28 09:27 | |
Python can be embedded into other applications and unconditionally
changing the locale (esp. the LC_CTYPE) is not good practice, since
it's not thread-safe and affects the entire process. An application
may have set LC_CTYPE (or the locale) to something completely
different.
If at all, Python should be more careful using this call (pseudo
code):
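/* Needs <locale.h>, <stdlib.h> and <string.h>. setlocale() may return a
   pointer to static storage, so copy each result before the next call. */
char *cur = setlocale(LC_CTYPE, NULL);
char *lc_ctype = cur ? strdup(cur) : NULL;
if (lc_ctype == NULL || strcmp(lc_ctype, "") == 0 || strcmp(lc_ctype, "C") == 0) {
    char *env = setlocale(LC_CTYPE, "");             /* query the environment */
    char *env_lc_ctype = env ? strdup(env) : NULL;
    setlocale(LC_CTYPE, lc_ctype ? lc_ctype : "C");  /* restore the original */
    free(lc_ctype);
    lc_ctype = env_lc_ctype;
}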
Then use lc_ctype to figure out encodings, etc.
While this is not thread-safe, it at least reverts the change back
to the original setting and only applies the change if needed. That's
still not optimal, but better than nothing.
A clean alternative would be adding LC_* variable parsing code to
Python to avoid the setlocale() call altogether.
|
|||
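The save, query, restore pattern from msg127262 can also be sketched in Python as a context manager. This is only an illustration: `temporary_locale` is a hypothetical helper, not something the `locale` module provides, and like the C version it is not thread-safe because setlocale() affects the whole process.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch one locale category for the duration of a with-block."""
    saved = locale.setlocale(category, None)   # query only, no change
    try:
        yield locale.setlocale(category, name)
    finally:
        locale.setlocale(category, saved)      # always restore


if __name__ == "__main__":
    try:
        # "" asks the C library for the locale named by the environment.
        with temporary_locale(locale.LC_CTYPE, "") as env_ctype:
            print("LC_CTYPE from the environment:", env_ctype)
    except locale.Error as exc:
        print("environment locale not supported:", exc)
    print("LC_CTYPE afterwards:", locale.setlocale(locale.LC_CTYPE, None))
```

The restore step runs even if the body raises, which is the property the pseudocode above is after; the window in which other threads see the changed locale remains.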
| msg127265 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2011-01-28 09:33 | |
> A clean alternative would be adding LC_* variable parsing code to
> Python to avoid the setlocale() call altogether.
That would be highly non-portable, and repeat the mistakes of getdefaultlocale.
|
|||
| msg127283 - (view) | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2011-01-28 11:05 | |
Martin v. Löwis wrote:
>> A clean alternative would be adding LC_* variable parsing code to
>> Python to avoid the setlocale() call altogether.
>
> That would be highly non-portable, and repeat the mistakes of
> getdefaultlocale.
You say that often, but I don't really know why. It's certainly portable
between various Unix platforms, perhaps not Windows, but then i18n
on Windows is a different story altogether.
BTW: For Windows, you can adjust setlocale() to work thread-based using
_configthreadlocale() (http://msdn.microsoft.com/de-de/library/26c0tb7x(v=vs.80).aspx).
Perhaps we ought to expose this in _locale and use it in getdefaultlocale()
on Windows to query the locale settings via the pseudocode I posted.
|
|||
| msg127347 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2011-01-28 21:22 | |
>> That would be highly non-portable, and repeat the mistakes of
>> getdefaultlocale.
>
> You say that often, but I don't really know why. It's certainly portable
> between various Unix platforms, perhaps not Windows, but then i18n
> on Windows is a different story altogether.
No, it's absolutely not portable across Unix platforms. Looking at LANG or LC_ALL does *not* allow you to infer the region name, or the locale's character set.
For example, using glibc, in some installations, /etc/locale.alias is consulted to map a value of LANG to the final locale name. As an option, glibc also considers a LOCALE_ALIAS_PATH that may point to a (colon-separated) path of files to search for locale aliases. Other systems may use other databases to map a locale name to locale properties.
Unless you know exactly what version of C library is running on a system, parsing environment variables yourself is doomed to fail.
|
|||
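To make the objection in msg127347 concrete, here is a deliberately naive sketch of the kind of environment parsing being argued against. The function `naive_codeset_from_env` is hypothetical and written only for illustration; it is not how Python determines the encoding. It can only report a codeset that is spelled out in the variable, so a value such as `de_DE`, or an alias resolved through the C library's own database, yields nothing.

```python
import os


def naive_codeset_from_env(environ=os.environ):
    """Guess the codeset by parsing locale variables (the fragile way)."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for variable in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(variable)
        if value:
            break
    else:
        return None                       # none of the variables is set
    value = value.split("@", 1)[0]        # drop a modifier such as "@euro"
    _, dot, codeset = value.partition(".")
    return codeset if dot else None


if __name__ == "__main__":
    print(naive_codeset_from_env({"LANG": "en_US.UTF-8"}))             # UTF-8
    print(naive_codeset_from_env({"LANG": "de_DE.ISO-8859-15@euro"}))  # ISO-8859-15
    print(naive_codeset_from_env({"LANG": "de_DE"}))                   # None
```

The last call is the problem case: the C library still knows which character set `de_DE` uses on that system, but the variable alone does not say.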
| msg127350 - (view) | Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) | Date: 2011-01-28 21:36 | |
Martin v. Löwis: It seems that your web browser replaces ", " with ",\t" in the title (where "\t" is a tab character) each time you add a comment. |
|||
| msg127351 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2011-01-28 21:38 | |
More likely, it's my email reader. Sorry about that. |
|||
| msg127417 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-01-29 13:51 | |
User lemburg pointed me to this, but no, I've posted msg127416 to Issue 11022. |
|||
| msg141830 - (view) | Author: Alexis Metaireau (alexis) * (Python triager) | Date: 2011-08-09 15:53 | |
Maybe it would be useful to specify in the documentation that getlocale() is not intended to report the locale of the system? This is not explained currently, so it is a bit confusing that getlocale() returns (None, None) even if you have your locale set. |
|||
| msg141847 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-08-10 00:24 | |
This issue is about the fact that it doesn't return (None, None). We should probably decide what we are going to do about that before changing the docs if they need it. |
|||
| msg141872 - (view) | Author: Alexis Metaireau (alexis) * (Python triager) | Date: 2011-08-10 16:05 | |
I see two different things here:
1) the fact that getlocale() doesn't return (None, None) on some Python versions
2) the fact that having it return (None, None) by default is a bit misleading, as users may think that getlocale() is tied to environment variables. That's what was at the origin of #12699.
My last remark is about the second point. Should I start a new issue for this?
|
|||
| msg141890 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-08-11 01:25 | |
Yes, a new issue would be more appropriate. |
|||
| msg147174 - (view) | Author: Petri Lehtinen (petri.lehtinen) * (Python committer) | Date: 2011-11-06 19:48 | |
If the thread safety of setlocale() is a problem, does anybody know how portable uselocale() is? It sets the locale of the current thread only, so it's safe to temporarily change the locale and then set it back. |
|||
| msg162340 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-06-05 12:02 | |
> Either the code is incorrect in 3.1
> or the documentation should be updated.
Leaving LC_CTYPE unchanged (using the "C" locale, which is ASCII in most cases) at Python startup would be a major change in Python 3. I don't want to change this. You would see a lot of mojibake in your GUIs and get a lot of ugly surrogate characters in filenames (because of PEP 383) if we no longer set LC_CTYPE to the user's preferred encoding at startup.
Setting LC_CTYPE to the user's preferred encoding is just very convenient and helps Python to speak to the user through the console, to the filesystem, to pass arguments on the command line of a subprocess, etc. For example, you cannot pass non-ASCII characters to a subprocess (characters written by the user in your GUI) if your current LC_CTYPE locale is C (ASCII): you get a Unicode encode error.
So it's just a documentation issue: see my attached patch.
|
|||
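The two symptoms described in msg162340 can be reproduced without touching the locale at all, by using the ASCII codec explicitly. A minimal sketch: the file name is made up for the example, and the `surrogateescape` error handler is the one defined by PEP 383.

```python
name_on_disk = "café.txt".encode("utf-8")   # bytes as a filesystem might store them

# Decoding with ASCII and the surrogateescape handler maps each
# undecodable byte to a lone surrogate code point.
as_seen_by_python = name_on_disk.decode("ascii", errors="surrogateescape")
print(ascii(as_seen_by_python))             # 'caf\udcc3\udca9.txt'

# The mapping round-trips, so the original bytes are not lost.
assert as_seen_by_python.encode("ascii", errors="surrogateescape") == name_on_disk

# Encoding real non-ASCII text with the strict ASCII codec fails outright.
try:
    "café".encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)     # ordinal not in range(128)
```

The first half shows the "ugly surrogate characters in filenames"; the second half is the Unicode encode error that appears when non-ASCII text has to be encoded under an ASCII locale encoding.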
| msg162355 - (view) | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2012-06-05 16:24 | |
LGTM |
|||
| msg162380 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-06-05 23:39 | |
New changeset 113cdce4663c by Victor Stinner in branch 'default':
Close #6203: Document that Python 3 sets LC_CTYPE at startup to the user's preferred locale encoding
http://hg.python.org/cpython/rev/113cdce4663c
|
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:49 | admin | set | github: 50452 |
| 2012-06-05 23:39:39 | python-dev | set | status: open -> closed nosy: + python-dev messages: + msg162380 resolution: fixed stage: needs patch -> resolved |
| 2012-06-05 16:24:06 | ned.deily | set | messages: + msg162355 |
| 2012-06-05 12:03:57 | vstinner | set | title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> locale documentation doesn't mention that LC_CTYPE is changed at startup components: + Unicode versions: + Python 3.2 |
| 2012-06-05 12:02:58 | vstinner | set | files: + locale_doc.patch keywords: + patch messages: + msg162340 |
| 2011-11-06 19:48:10 | petri.lehtinen | set | nosy: + petri.lehtinen messages: + msg147174 |
| 2011-08-11 01:25:33 | r.david.murray | set | messages: + msg141890 |
| 2011-08-10 16:05:48 | alexis | set | messages: + msg141872 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
| 2011-08-10 00:24:02 | r.david.murray | set | messages: + msg141847 |
| 2011-08-09 15:53:51 | alexis | set | nosy: + alexis messages: + msg141830 |
| 2011-08-05 21:34:37 | ned.deily | link | issue12699 superseder |
| 2011-01-29 13:51:48 | sdaoden | set | nosy: + sdaoden messages: + msg127417 |
| 2011-01-28 21:38:45 | loewis | set | nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages: + msg127351 |
| 2011-01-28 21:36:54 | Arfrever | set | nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages: + msg127350 |
| 2011-01-28 21:22:14 | loewis | set | nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages: + msg127347 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
| 2011-01-28 15:01:17 | Arfrever | set | nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
| 2011-01-28 11:05:45 | lemburg | set | nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages: + msg127283 |
| 2011-01-28 09:33:39 | loewis | set | nosy: lemburg, loewis, georg.brandl, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, r.david.murray messages: + msg127265 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
| 2011-01-28 09:27:54 | lemburg | set | nosy: + lemburg messages: + msg127262 |
| 2011-01-27 16:58:10 | Arfrever | set | nosy: + Arfrever |
| 2011-01-27 11:40:07 | vstinner | set | nosy: + vstinner messages: + msg127180 versions: + Python 3.3, - Python 3.2 |
| 2010-10-29 10:07:21 | admin | set | assignee: georg.brandl -> docs@python |
| 2009-12-30 01:46:52 | r.david.murray | set | versions: + Python 3.2, - Python 3.1 |
| 2009-06-09 10:43:42 | pitrou | set | assignee: georg.brandl |
| 2009-06-09 07:10:25 | loewis | set | assignee: loewis -> (no value) messages: + msg89136 |
| 2009-06-08 21:51:50 | r.david.murray | set | priority: release blocker -> high messages: + msg89120 components: - Library (Lib) nosy: loewis, georg.brandl, pitrou, ned.deily, ezio.melotti, r.david.murray |
| 2009-06-08 19:43:09 | pitrou | set | messages: + msg89102 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
| 2009-06-08 19:39:29 | loewis | set | messages: + msg89101 title: 3.x locale does not default to C, contrary to the documentation and to 2.x behavior -> 3.x locale does not default to C, contrary to the documentation and to 2.x behavior |
| 2009-06-08 16:26:25 | pitrou | set | messages: + msg89090 |
| 2009-06-08 16:22:10 | r.david.murray | set | messages: + msg89089 |
| 2009-06-08 16:17:53 | pitrou | set | nosy: + pitrou messages: + msg89088 |
| 2009-06-08 16:01:05 | r.david.murray | set | priority: normal -> release blocker nosy: + r.david.murray messages: + msg89084 stage: needs patch |
| 2009-06-08 13:29:54 | georg.brandl | set | assignee: georg.brandl -> loewis messages: + msg89077 nosy: + loewis |
| 2009-06-06 21:00:39 | ezio.melotti | set | priority: normal nosy: + ezio.melotti messages: + msg89016 components: + Library (Lib) |
| 2009-06-05 10:56:37 | ned.deily | create | |