Issue 7442: _localemodule.c: str2uni() with different LC_NUMERIC and LC_CTYPE


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/51691

classification

Title:	_localemodule.c: str2uni() with different LC_NUMERIC and LC_CTYPE
Type:	behavior	Stage:	needs patch
Components:	Versions:	Python 3.2

process

Dependencies:	Superseder:
Status:	closed	Resolution:	wont fix
Assigned To:	Nosy List:	eric.smith, loewis, mark.dickinson, mcepl, skrah, vstinner
Priority:	normal	Keywords:	patch

Created on 2009-12-05 10:44 by skrah, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
set_ctype_before_mbstowcs.patch	skrah, 2010-02-13 14:05	review
inconsistent_locale_encodings.py	vstinner, 2013-10-22 13:54
mbstowcs_l.c	skrah, 2013-10-22 19:35

Messages (35)
msg95988	Author: Stefan Krah (skrah) * (Python committer)	Date: 2009-12-05 10:44
Hi, the following works in 2.7 but not in 3.x:

>>> import locale
>>> from decimal import *
>>> locale.setlocale(locale.LC_NUMERIC, 'fi_FI')
'fi_FI'
>>> format(Decimal('1000'), 'n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/decimal.py", line 3632, in __format__
    spec = _parse_format_specifier(specifier, _localeconv=_localeconv)
  File "/usr/lib/python3.2/decimal.py", line 5628, in _parse_format_specifier
    _localeconv = _locale.localeconv()
  File "/usr/lib/python3.2/locale.py", line 111, in localeconv
    d = _localeconv()
ValueError: Cannot convert byte to string
msg96008	Author: Stefan Krah (skrah) * (Python committer)	Date: 2009-12-05 21:03
This fails in _localemodule.c: str2uni(). mbstowcs(NULL, s, 0) is LC_CTYPE sensitive, but LC_CTYPE is UTF-8 in my terminal. If I set LC_CTYPE and LC_NUMERIC together, things work.

This raises the question: if LC_CTYPE and LC_NUMERIC differ (and since they are separate entities I assume they may differ), what is the correct way to convert the separator and the decimal point?

a) call setlocale(LC_CTYPE, setlocale(LC_NUMERIC, NULL)) before mbstowcs. This is not really an option.
b) use some kind of _mbstowcs_l (http://msdn.microsoft.com/en-us/library/k1f9b8cy(VS.80).aspx), which takes a locale parameter. But I can't find such a thing on Linux.
msg96534	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2009-12-17 21:02
I'm failing to reproduce this (with py3k) on OS X:

Python 3.2a0 (py3k:76866:76867, Dec 17 2009, 09:19:26)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> from decimal import *
>>> locale.setlocale(locale.LC_NUMERIC, 'fi_FI')
'fi_FI'
>>> format(Decimal('1000'), 'n')
'1.000'

The locale command, from the same Terminal prompt, gives me:

LANG="en_IE.UTF-8"
LC_COLLATE="en_IE.UTF-8"
LC_CTYPE="en_IE.UTF-8"
LC_MESSAGES="en_IE.UTF-8"
LC_MONETARY="en_IE.UTF-8"
LC_NUMERIC="en_IE.UTF-8"
LC_TIME="en_IE.UTF-8"
LC_ALL=

Just to be clear, is it true that you still get the same result without involving Decimal at all? That is, am I correct in assuming that:

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, 'fi_FI')
'fi_FI'
>>> locale.localeconv()

also gives you that ValueError?
msg96535	Author: Mark Dickinson (mark.dickinson) * (Python committer)	Date: 2009-12-17 21:07
What are the multibyte strings that mbstowcs is failing to convert? On my machine, the separators come out as plain ASCII '.' (for thousands) and ',' (for the decimal point).
msg96544	Author: Eric V. Smith (eric.smith) * (Python committer)	Date: 2009-12-18 00:35
I can reproduce it on a Fedora (fc6) Linux box. It's not a decimal problem, but a plain locale problem:

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, 'fi_FI')
'fi_FI'
>>> locale.localeconv()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/python/py3k/Lib/locale.py", line 111, in localeconv
    d = _localeconv()
ValueError: Cannot convert byte to string
>>>

Here's the contents of the struct lconv as returned by localeconv():

(gdb) p *l
$1 = {decimal_point = 0xb7b54020 ",", thousands_sep = 0xb7b54022 " ",
  grouping = 0xb7b54024 "\003\003", int_curr_symbol = 0x998858 "",
  currency_symbol = 0x998858 "", mon_decimal_point = 0x998858 "",
  mon_thousands_sep = 0x998858 "", mon_grouping = 0x998858 "",
  positive_sign = 0x998858 "", negative_sign = 0x998858 "",
  int_frac_digits = 127 '\177', frac_digits = 127 '\177',
  p_cs_precedes = 127 '\177', p_sep_by_space = 127 '\177',
  n_cs_precedes = 127 '\177', n_sep_by_space = 127 '\177',
  p_sign_posn = 127 '\177', n_sign_posn = 127 '\177',
  int_p_cs_precedes = 127 '\177', int_p_sep_by_space = 127 '\177',
  int_n_cs_precedes = 127 '\177', int_n_sep_by_space = 127 '\177',
  int_p_sign_posn = 127 '\177', int_n_sign_posn = 127 '\177'}

The problem is thousands_sep:

(gdb) p l->thousands_sep
$2 = 0xb7b54022 " "
(gdb) p (unsigned char)l->thousands_sep[0]
$3 = 160 ' '
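The byte value 160 (0xA0) reported for thousands_sep can be examined without touching any locale. The following is a minimal sketch in pure Python, assuming that the codec names "iso8859-15" and "utf-8" stand in for the LC_NUMERIC and LC_CTYPE encodings involved; the variable names are illustrative only and nothing here calls into _localemodule.c.

```python
# The single byte that gdb printed for l->thousands_sep[0].
raw_separator = bytes([160])

# Assumption: the LC_NUMERIC locale uses ISO-8859-15. There, 0xA0 is a
# complete character: U+00A0, the non-breaking space.
decoded = raw_separator.decode("iso8859-15")

# Assumption: the LC_CTYPE locale uses UTF-8. A lone 0xA0 is a
# continuation byte with no lead byte, so strict decoding is rejected.
try:
    raw_separator.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

Under these assumptions the same byte is valid in one encoding and invalid in the other, which is consistent with the conversion error in the traceback.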
msg96556	Author: Stefan Krah (skrah) * (Python committer)	Date: 2009-12-18 09:38
Yes, it's a problem in _localemodule.c. This situation always occurs when LC_NUMERIC is something like ISO8859-15, LC_CTYPE is UTF-8 AND the decimal point or separator are in the range 128-255. Then mbstowcs tries to decode the character according to LC_CTYPE and finds that the character is not valid UTF-8:

static PyObject*
str2uni(const char* s)
{
#ifdef HAVE_BROKEN_MBSTOWCS
    size_t needed = strlen(s);
#else
    size_t needed = mbstowcs(NULL, s, 0);
#endif

I can't see a portable way to fix this except:

  block threads
  set temporary LC_CTYPE
  call mbstowcs
  restore LC_CTYPE
  unblock threads

I don't think this issue is important enough to do that. What I do in cdecimal is raise an error "Invalid separator or unsupported combination of LC_NUMERIC and LC_CTYPE".
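The set/call/restore part of the sequence above can be sketched at the Python level. This is only an illustration under stated assumptions: the helper names temporary_ctype and localeconv_with_numeric_ctype are hypothetical, the sketch is not the patch attached to this issue, and it does not block other threads, so the locale change is visible process-wide while the block runs.

```python
import locale
from contextlib import contextmanager


@contextmanager
def temporary_ctype(name):
    """Switch LC_CTYPE to `name` for the duration of the block (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    locale.setlocale(locale.LC_CTYPE, name)
    try:
        yield
    finally:
        # Always restore the previous LC_CTYPE, even if the body raised.
        locale.setlocale(locale.LC_CTYPE, saved)


def localeconv_with_numeric_ctype():
    """Call localeconv() while LC_CTYPE matches LC_NUMERIC (hypothetical helper)."""
    numeric = locale.setlocale(locale.LC_NUMERIC)  # current LC_NUMERIC name
    with temporary_ctype(numeric):
        return locale.localeconv()
```

Because setlocale() affects the whole process, any thread that consults LC_CTYPE inside the block sees the temporary value, which is the reason the steps above start with blocking threads.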
msg96557	Author: Stefan Krah (skrah) * (Python committer)	Date: 2009-12-18 09:47
Changed title (was: decimal.py: format failure with locale specifier)
msg99317	Author: Stefan Krah (skrah) * (Python committer)	Date: 2010-02-13 14:05
I have a patch that fixes this specific issue. Probably there are similar issues in other places, e.g. when LC_TIME and LC_CTYPE differ. I suspect that this is related: http://bugs.python.org/issue5905
msg190055	Author: Mark Lawrence (BreamoreBoy) *	Date: 2013-05-26 03:40
Could we have a patch review, please? Also note that #5905 has been closed.
msg200892	Author: Matej Cepl (mcepl) *	Date: 2013-10-22 08:28
Is this the same issue as the python-3.3.2 test failure I get on RHEL-6?

test_locale (test.test_format.FormatTest) ... ERROR
test_non_ascii (test.test_format.FormatTest) ... test test_format failed
'\u20ac=%f' % (1.0,) =? '\u20ac=1.000000' ... yes
ok

======================================================================
ERROR: test_locale (test.test_format.FormatTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/matej/build/Extras/python3/Python-3.Lib/test/test_format.py", line 295, in test_locale
    localeconv = locale.localeconv()
  File "/home/matej/build/Extras/python3/Python-3.Lib/locale.py", line 111, in localeconv
    d = _localeconv()
UnicodeDecodeError: 'locale' codec can't decode byte 0xe2 in position 0: Invalid or incomplete multibyte or wide character

----------------------------------------------------------------------
msg200912	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-10-22 10:29
Matej Cepl <report@bugs.python.org> wrote:
> Is this the same as when tests with python-3.3.2 fails on me with RHEL-6?

If LC_CTYPE is UTF-8 and LC_NUMERIC something like ISO-8859-2 then it's the same issue.
msg200916	Author: Matej Cepl (mcepl) *	Date: 2013-10-22 11:31
Hmm, so with this patch

diff -up Python-3.Lib/test/test_format.py.fixFormatTest Python-3.Lib/test/test_format.py
--- Python-3.Lib/test/test_format.py.fixFormatTest	2013-10-22 10:05:12.253426746 +0200
+++ Python-3.Lib/test/test_format.py	2013-10-22 10:16:58.510530570 +0200
@@ -288,7 +288,7 @@ class FormatTest(unittest.TestCase):
     def test_locale(self):
         try:
             oldloc = locale.setlocale(locale.LC_ALL)
-            locale.setlocale(locale.LC_ALL, '')
+            locale.setlocale(locale.LC_ALL, 'ps_AF')
         except locale.Error as err:
             self.skipTest("Cannot set locale: {}".format(err))
         try:

(or any other explicit locale, I have also tried en_IE) the test doesn't fail. Using Python-3.3.2 on RHEL-6 (kernel 2.6.32-358.23.2.el6.i686).
msg200917	Author: Matej Cepl (mcepl) *	Date: 2013-10-22 11:34
To #200912: now, system locale is UTF-8 all the way:

santiago:python3 (el6) $ locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC=en_IE.utf8
LC_TIME=en_IE.utf8
LC_COLLATE="en_US.utf8"
LC_MONETARY=en_IE.utf8
LC_MESSAGES="en_US.utf8"
LC_PAPER=en_IE.utf8
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT=en_IE.utf8
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
santiago:python3 (el6) $
msg200927	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-10-22 12:00
Matej Cepl <report@bugs.python.org> wrote:
> To #200912: now, system locale is UTF-8 all the way:
> santiago:python3 (el6) $ locale
> LANG=en_US.utf8
> LC_CTYPE="en_US.utf8"
> LC_NUMERIC=en_IE.utf8
> LC_TIME=en_IE.utf8
> LC_COLLATE="en_US.utf8"
> LC_MONETARY=en_IE.utf8
> LC_MESSAGES="en_US.utf8"
> LC_PAPER=en_IE.utf8
> LC_NAME="en_US.utf8"
> LC_ADDRESS="en_US.utf8"
> LC_TELEPHONE="en_US.utf8"
> LC_MEASUREMENT=en_IE.utf8
> LC_IDENTIFICATION="en_US.utf8"
> LC_ALL=
> santiago:python3 (el6) $

The test passes here with these values (Debian). What is the output of:

a) locale.localeconv()
b) locale.setlocale(locale.LC_ALL, '')
   locale.localeconv()
msg200928	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-10-22 12:12
This issue is very close to the issue #13706 which I solved with the new function PyUnicode_DecodeLocale(): see get_locale_info() in Python/formatter_unicode.c. We might copy/paste the code, or we should maybe add a private API to get locale information: get_locale_info() => _Py_get_locale_info() and expose the LocaleInfo structure. It may be added to unicodeobject.h for example, there is already a function related to locales: _PyUnicode_InsertThousandsGrouping().
msg200929	Author: Matej Cepl (mcepl) *	Date: 2013-10-22 12:17
>>> import locale
>>> locale.localeconv()
{'p_cs_precedes': 127, 'n_sep_by_space': 127, 'n_sign_posn': 127, 'n_cs_precedes': 127, 'grouping': [], 'positive_sign': '', 'mon_grouping': [], 'p_sep_by_space': 127, 'mon_thousands_sep': '', 'currency_symbol': '', 'mon_decimal_point': '', 'int_curr_symbol': '', 'thousands_sep': '', 'frac_digits': 127, 'int_frac_digits': 127, 'negative_sign': '', 'decimal_point': '.', 'p_sign_posn': 127}
>>> locale.setlocale(locale.LC_ALL, '')
'LC_CTYPE=en_US.utf8;LC_NUMERIC=en_IE.utf8;LC_TIME=en_IE.utf8;LC_COLLATE=en_US.utf8;LC_MONETARY=en_IE.utf8;LC_MESSAGES=en_US.utf8;LC_PAPER=en_IE.utf8;LC_NAME=en_US.utf8;LC_ADDRESS=en_US.utf8;LC_TELEPHONE=en_US.utf8;LC_MEASUREMENT=en_IE.utf8;LC_IDENTIFICATION=en_US.utf8'
>>> locale.localeconv()
{'p_cs_precedes': 1, 'n_sep_by_space': 0, 'n_sign_posn': 1, 'n_cs_precedes': 1, 'grouping': [3, 3, 0], 'positive_sign': '', 'mon_grouping': [3, 3, 0], 'p_sep_by_space': 0, 'mon_thousands_sep': ',', 'currency_symbol': '€', 'mon_decimal_point': '.', 'int_curr_symbol': 'EUR ', 'thousands_sep': ',', 'frac_digits': 2, 'int_frac_digits': 2, 'negative_sign': '-', 'decimal_point': '.', 'p_sign_posn': 1}
>>>
msg200933	Author: Matej Cepl (mcepl) *	Date: 2013-10-22 12:33
Perhaps the glibc version is relevant as well? glibc-2.12-1.107.el6_4.5.i686
msg200944	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-10-22 13:22
Matej Cepl <report@bugs.python.org> wrote:
> >>> import locale
> >>> locale.localeconv()
> {'p_cs_precedes': 127, 'n_sep_by_space': 127, 'n_sign_posn': 127, 'n_cs_precedes': 127, 'grouping': [], 'positive_sign': '', 'mon_grouping': [], 'p_sep_by_space': 127, 'mon_thousands_sep': '', 'currency_symbol': '', 'mon_decimal_point': '', 'int_curr_symbol': '', 'thousands_sep': '', 'frac_digits': 127, 'int_frac_digits': 127, 'negative_sign': '', 'decimal_point': '.', 'p_sign_posn': 127}
> >>> locale.setlocale(locale.LC_ALL, '')
> 'LC_CTYPE=en_US.utf8;LC_NUMERIC=en_IE.utf8;LC_TIME=en_IE.utf8;LC_COLLATE=en_US.utf8;LC_MONETARY=en_IE.utf8;LC_MESSAGES=en_US.utf8;LC_PAPER=en_IE.utf8;LC_NAME=en_US.utf8;LC_ADDRESS=en_US.utf8;LC_TELEPHONE=en_US.utf8;LC_MEASUREMENT=en_IE.utf8;LC_IDENTIFICATION=en_US.utf8'
> >>> locale.localeconv()
> {'p_cs_precedes': 1, 'n_sep_by_space': 0, 'n_sign_posn': 1, 'n_cs_precedes': 1, 'grouping': [3, 3, 0], 'positive_sign': '', 'mon_grouping': [3, 3, 0], 'p_sep_by_space': 0, 'mon_thousands_sep': ',', 'currency_symbol': '€', 'mon_decimal_point': '.', 'int_curr_symbol': 'EUR ', 'thousands_sep': ',', 'frac_digits': 2, 'int_frac_digits': 2, 'negative_sign': '-', 'decimal_point': '.', 'p_sign_posn': 1}

These look normal. I'm puzzled, because that's what is going on in the test as well. Do you get the failure when running the test in isolation:

./python -m test test_format

If this passes, there might be some interaction with other tests. If it doesn't pass, I would step through the test in gdb (break PyLocale_localeconv) and see which member of struct lconv is the troublemaker.
msg200954	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-10-22 13:54
Title: _localemodule.c: str2uni() with different LC_NUMERIC and LC_CTYPE

Oh, I just realized that the issue is a LC_NUMERIC using an encoding A with a LC_CTYPE using an encoding B. It looks like the glibc does not support this setup, at least for the fi_FI locale which has a non-ASCII thousands separator (non-breaking space: U+00A0).

Try the attached inconsistent_locale_encodings.py script (it uses locale names for Fedora 19, you may have to adapt it to your OS).

Output on Fedora 19:

fi_FI numeric (ISO-8859-1) with fr_FR.utf8 ctype (UTF-8)
UnicodeDecodeError: 'locale' codec can't decode byte 0xa0 in position 0: Virheellinen tai epätäydellinen monitavumerkki tai leveä merkki

fi_FI@euro numeric (ISO-8859-15) with fr_FR.utf8 ctype (UTF-8)
UnicodeDecodeError: 'locale' codec can't decode byte 0xa0 in position 0: Virheellinen tai epätäydellinen monitavumerkki tai leveä merkki

fi_FI.iso88591 numeric (ISO-8859-1) with fr_FR.utf8 ctype (UTF-8)
UnicodeDecodeError: 'locale' codec can't decode byte 0xa0 in position 0: Virheellinen tai epätäydellinen monitavumerkki tai leveä merkki

fi_FI.iso885915@euro numeric (ISO-8859-15) with fr_FR.utf8 ctype (UTF-8)
UnicodeDecodeError: 'locale' codec can't decode byte 0xa0 in position 0: Virheellinen tai epätäydellinen monitavumerkki tai leveä merkki

fi_FI.utf8 numeric (UTF-8) with fr_FR.utf8 ctype (UTF-8)
{'grouping': [3, 3, 0], 'p_cs_precedes': 0, 'mon_thousands_sep': '\xa0', 'decimal_point': ',', 'n_sep_by_space': 1, 'n_sign_posn': 1, 'mon_decimal_point': ',', 'frac_digits': 2, 'positive_sign': '', 'mon_grouping': [3, 3, 0], 'n_cs_precedes': 0, 'thousands_sep': '\xa0', 'p_sep_by_space': 1, 'p_sign_posn': 1, 'int_frac_digits': 2, 'currency_symbol': '€', 'negative_sign': '-', 'int_curr_symbol': 'EUR '}
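A probe of this kind can be sketched as follows. This is not the attached inconsistent_locale_encodings.py, whose contents are not shown in this thread; the function name probe_localeconv and its return convention are assumptions made for illustration. It sets LC_CTYPE and LC_NUMERIC independently, calls locale.localeconv(), and restores the previous locale afterwards.

```python
import locale


def probe_localeconv(numeric_name, ctype_name):
    """Try localeconv() with the given LC_NUMERIC / LC_CTYPE pair (hypothetical helper).

    Returns the localeconv() dict on success, or a short string describing
    why the probe was skipped or failed.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query the current settings
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, ctype_name)
            locale.setlocale(locale.LC_NUMERIC, numeric_name)
        except locale.Error as exc:
            # The locale is not installed on this system.
            return "skipped: %s" % exc
        try:
            return locale.localeconv()
        except ValueError as exc:
            # UnicodeDecodeError is a subclass of ValueError, so both
            # error forms reported in this issue are caught here.
            return "failed: %s" % exc
    finally:
        locale.setlocale(locale.LC_ALL, saved)
```

On a system that has the locales named in the output above, a call such as probe_localeconv("fi_FI", "fr_FR.utf8") would exercise the mixed-encoding case; whether it fails depends on the interpreter version and the C library.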
msg200957	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-10-22 14:19
msg95988> Hi, the following works in 2.7 but not in 3.x: ...

Sure it works, because Python 2 passes the raw byte string; it does not try to decode it. But did you try to display the result in a terminal, for example?

Example with Python 2 in a UTF-8 terminal:

$ python
Python 2.7.5 (default, Oct 8 2013, 12:19:40)
[GCC 4.8.1 20130603 (Red Hat 4.8.1-1)] on linux2
>>> import locale
>>> # set the locale encoding to UTF-8
... locale.setlocale(locale.LC_CTYPE, 'fr_FR.utf8')
'fr_FR.utf8'
>>> # set the thousand separator to U+00A0
... locale.setlocale(locale.LC_NUMERIC, 'fi_FI')
'fi_FI'
>>> locale.getlocale(locale.LC_CTYPE)
('fr_FR', 'UTF-8')
>>> locale.getlocale(locale.LC_NUMERIC)
('fi_FI', 'ISO8859-15')
>>> locale.format('%d', 123456, True)
'123\xa0456'
>>> print(locale.format('%d', 123456, True))
123�456

Mojibake! � means that b'\xA0' cannot be decoded from the locale encoding (UTF-8).

There is probably the same issue with a LC_MONETARY using a different encoding than LC_CTYPE.

> I suspect that this is related: #5905

It is unrelated: time.strftime() uses the LC_CTYPE, but Python was using the wrong encoding. Python used the locale encoding read at startup, whereas the current locale encoding must be used. This issue is specific to LC_NUMERIC with a LC_CTYPE using a different encoding.

> If I set LC_CTYPE and LC_NUMERIC together, things work.

Sure, because in this case, LC_NUMERIC produces data in the same encoding as LC_CTYPE.

> call setlocale(LC_CTYPE, setlocale(LC_NUMERIC, NULL)) before
> mbstowcs. This is not really an option.

Setting a locale is process-wide and should be avoided. FYI locale.getpreferredencoding() temporarily changes the LC_CTYPE by default; it only uses the current LC_CTYPE if you pass False. open() temporarily changed LC_CTYPE because of that in Python 3.0-3.2 (see issue #11022).

The following PostgreSQL issue looks to be the same as this Python issue:
http://www.postgresql.org/message-id/20100422015552.4B7E07541D0@cvs.postgresql.org

The fix temporarily changes the LC_CTYPE encoding:

#ifdef WIN32
    setlocale(LC_CTYPE, locale_monetary);
#endif

(I don't know why the code is specific to Windows.)
msg200960	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-10-22 14:28
STINNER Victor <report@bugs.python.org> wrote:
> The following PostgreSQL issue looks to be the same as this Python issue:
> http://www.postgresql.org/message-id/20100422015552.4B7E07541D0@cvs.postgresql.org
>
> The fix temporarily changes the LC_CTYPE encoding:

So does my patch. :)
msg200961	Author: Matej Cepl (mcepl) *	Date: 2013-10-22 14:29
What I did:

1) ran the build (it is a build of the Fedora Rawhide python3 package on RHEL), and saw it fail in this test.
2) see below

santiago:python3 (el6) $ cd Python-3.3.2/build/optimized/
santiago:optimized (el6) $ ./python -m test test_format
[1/1] test_format
1 test OK.
santiago:optimized (el6) $

Oh well
msg200968	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-10-22 15:28
Victor, thanks for the comments. I also think we should set LC_CTYPE closer to the actual call to mbstowcs(), otherwise there are many API calls in between. So it should happen somewhere in PyUnicode_DecodeLocaleAndSize(). Perhaps we can create _PyUnicode_DecodeLocaleAndSize() which would take an LC_CTYPE parameter?
msg200969	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-10-22 15:32
Matej Cepl <report@bugs.python.org> wrote:
> santiago:optimized (el6) $ ./python -m test test_format
> [1/1] test_format
> 1 test OK.

It looks like some other test in the test suite did not restore a changed locale. If you're motivated enough, you could check whether it still happens in 3.4 and open a new issue (it's unrelated to this one).
msg200970	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-10-22 15:48
"So it should happen somewhere in PyUnicode_DecodeLocaleAndSize(). Perhaps we can create _PyUnicode_DecodeLocaleAndSize() which would take an LC_CTYPE parameter?"

For this issue, it means that Python localeconv() would have to change the LC_CTYPE locale many times, once for each monetary and each numeric value. I prefer your patch :-)
msg200974	Author: Matej Cepl (mcepl) *	Date: 2013-10-22 17:22
On 22/10/13 17:32, Stefan Krah wrote:
> It looks like some other test in the test suite did not restore a changed
> locale. If you're motivated enough, you could try if it still happens in
> 3.4 and open a new issue (it's unrelated to this one).

A very plain git checkout followed by ./configure && make && make test leads to another failed test, but not this one (issue #19353). So either we do something wrong in all those Fedora patches, or this has been fixed since 3.3.2. Curiouser and curiouser.
msg200976	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-10-22 17:55
STINNER Victor <report@bugs.python.org> wrote:
> "So it should happen somewhere in PyUnicode_DecodeLocaleAndSize(). Perhaps we can create _PyUnicode_DecodeLocaleAndSize() which would take an LC_CTYPE parameter?"
>
> For this issue, it means that Python localeconv() will have to change the LC_CTYPE locale many time, for each monetary and each number value. I prefer your patch :-)

Windows and OS X have mbstowcs_l(), which takes a locale arg. Linux doesn't (as far as I can see). I agree this solution is ugly, but it probably won't have an impact on benchmarks (naively assuming that setlocale() is fast).
msg200983	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-10-22 19:34
Here's a quick-and-dirty version of mbstowcs_l(). The difference in the benchmark to the plain mbstowcs() is 0.82s vs 0.55s. In the context of a Python function this is unlikely to be measurable.
msg202117	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-04 11:02
@Stefan: Did you see my comments on Rietveld? http://bugs.python.org/review/7442/#ps1473
msg202127	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-11-04 13:09
Yes, I saw the comments. I'm still wondering if we should just write an mbstowcs_l() function instead. Even then, there would still be a small chance that a C extension that creates its own thread picks up the wrong LC_CTYPE.
msg202132	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-04 13:27
What is this locale_t type used for the locale parameter of mbstowcs_l()? Are you sure that it is a string? According to this patch, it looks like a structure: http://www.winehq.org/pipermail/wine-cvs/2010-May/067264.html
msg202145	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-11-04 15:39
STINNER Victor <report@bugs.python.org> wrote:
> What is this locale_t type used for the locale parameter of mbstowcs_l()?
> Are you sure that it is a string? According to this patch, it looks like a structure:
> http://www.winehq.org/pipermail/wine-cvs/2010-May/067264.html

Yes, the string was mainly for benchmarking. FreeBSD seems to have thread-safe versions (xlocale.h) that take a locale_t as the extra parameter.
msg229318	Author: Stefan Krah (skrah) * (Python committer)	Date: 2014-10-14 17:15
Well, I originally opened this issue but personally I'm not that bothered by it any more. Victor, do you want to keep it open?
msg229338	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2014-10-14 19:30
The issue has a workaround: use LC_NUMERIC and LC_CTYPE locales which use the same encoding. To avoid issues, it's probably safer to only use UTF-8 locales, which are now available on modern Linux distros.

I don't like the idea of calling setlocale() just for this corner case, because it changes the locale for all threads. Even if Python is protected by the GIL, Python can be embedded, or modules written in C may spawn threads which don't care about the GIL. Usually, if it can fail, it will fail :-)

I see that various people contributed to the issue, but it looks like the only user asking for the feature is Stefan Krah. I prefer to close the issue and wait until more users ask for it before considering the patch again, or find a different way to implement the feature (support LC_NUMERIC and LC_CTYPE locales using a different encoding).

To be clear, I'm closing the issue right now.
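An application that relies on this workaround could check at runtime whether its two locale categories report the same encoding. The following is a heuristic sketch only: the function name numeric_encoding_matches_ctype is hypothetical, and it trusts the encoding names that locale.getlocale() derives from the locale names, which may not reflect what the C library actually does.

```python
import locale


def numeric_encoding_matches_ctype():
    """Heuristic: does LC_NUMERIC report the same encoding as LC_CTYPE?"""
    numeric_encoding = locale.getlocale(locale.LC_NUMERIC)[1]
    if numeric_encoding is None:
        # The "C" locale reports no encoding; its decimal point is ASCII
        # and it has no thousands separator, so there is nothing to clash.
        return True
    ctype_encoding = locale.getlocale(locale.LC_CTYPE)[1]
    return numeric_encoding == ctype_encoding
```

With the settings shown earlier in this thread (LC_CTYPE reported as UTF-8 and LC_NUMERIC as ISO8859-15) the check would return False, signalling that the two categories should be aligned before calling locale.localeconv().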
msg309770	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2018-01-10 15:46
> I prefer to close the issue and wait until more users ask for it before considering the patch again, or find a different way to implement the feature (support LC_NUMERIC and LC_CTYPE locales using a different encoding).

Here we are: https://bugs.python.org/issue31900

History
Date	User	Action	Args
2022-04-11 14:56:55	admin	set	github: 51691
2018-01-10 15:46:42	vstinner	set	messages: + msg309770
2014-10-14 19:30:57	vstinner	set	status: open -> closed resolution: wont fix messages: + msg229338
2014-10-14 17:15:25	skrah	set	messages: + msg229318 versions: + Python 3.2, - Python 3.3, Python 3.4
2014-02-03 15:47:57	BreamoreBoy	set	nosy: - BreamoreBoy
2013-11-04 15:39:02	skrah	set	messages: + msg202145
2013-11-04 13:27:41	vstinner	set	messages: + msg202132
2013-11-04 13:09:39	skrah	set	messages: + msg202127
2013-11-04 11:02:09	vstinner	set	messages: + msg202117
2013-10-22 19:35:57	skrah	set	files: + mbstowcs_l.c
2013-10-22 19:34:06	skrah	set	messages: + msg200983
2013-10-22 17:55:08	skrah	set	messages: + msg200976
2013-10-22 17:22:11	mcepl	set	messages: + msg200974
2013-10-22 15:48:48	vstinner	set	messages: + msg200970
2013-10-22 15:32:13	skrah	set	messages: + msg200969
2013-10-22 15:28:43	skrah	set	messages: + msg200968
2013-10-22 14:29:19	mcepl	set	messages: + msg200961
2013-10-22 14:28:48	skrah	set	messages: + msg200960
2013-10-22 14:19:06	vstinner	set	messages: + msg200957
2013-10-22 13:54:51	vstinner	set	files: + inconsistent_locale_encodings.py messages: + msg200954
2013-10-22 13:22:16	skrah	set	messages: + msg200944
2013-10-22 12:33:54	mcepl	set	messages: + msg200933
2013-10-22 12:17:22	mcepl	set	messages: + msg200929
2013-10-22 12:12:53	vstinner	set	nosy: + vstinner messages: + msg200928 versions: + Python 3.3, Python 3.4, - Python 3.1, Python 3.2
2013-10-22 12:00:34	skrah	set	messages: + msg200927
2013-10-22 11:34:03	mcepl	set	messages: + msg200917
2013-10-22 11:31:38	mcepl	set	messages: + msg200916
2013-10-22 10:29:10	skrah	set	messages: + msg200912
2013-10-22 08:28:54	mcepl	set	nosy: + mcepl messages: + msg200892
2013-05-26 03:40:38	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg190055
2010-02-13 14:05:58	skrah	set	files: + set_ctype_before_mbstowcs.patch keywords: + patch messages: + msg99317
2009-12-19 21:40:37	pitrou	set	priority: normal nosy: + loewis type: behavior stage: needs patch
2009-12-18 09:47:04	skrah	set	messages: + msg96557 title: decimal.py: format failure with locale specifier -> _localemodule.c: str2uni() with different LC_NUMERIC and LC_CTYPE
2009-12-18 09:38:04	skrah	set	messages: + msg96556
2009-12-18 00:35:59	eric.smith	set	nosy: + eric.smith messages: + msg96544
2009-12-17 21:07:00	mark.dickinson	set	messages: + msg96535
2009-12-17 21:02:04	mark.dickinson	set	messages: + msg96534
2009-12-05 21:03:37	skrah	set	messages: + msg96008
2009-12-05 10:44:18	skrah	create
