
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: TestEnUSCollation.test_strxfrm() fails on Solaris
Type: Stage:
Components: Unicode Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder: test_local.TestEnUSCollection failures on Solaris 10
View: 16258
Assigned To: Nosy List: ezio.melotti, jcea, loewis, python-dev, skrah, vstinner
Priority: normal Keywords:

Created on 2011-11-20 23:58 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
strxfrm.c vstinner, 2011-11-21 02:09
localeconv_wchar.c vstinner, 2011-12-08 01:23
Messages (24)
msg148017 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-20 23:58
I added a test in _PyUnicode_CheckConsistency() (in debug mode) to ensure that all characters of a string are in the range U+0000-U+10FFFF. Locale tests are now failing on Solaris:
-----------------------------------
[ 28/361] test__locale
Assertion failed: maxchar <= 0x10FFFF, file Objects/unicodeobject.c, line 408
Fatal Python error: Aborted
Current thread 0x00000001:
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 134 in test_float_parsing
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 385 in _executeTestPart
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 440 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 492 in __call__
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/runner.py", line 168 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1368 in _run_suite
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1402 in run_unittest
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 139 in test_main
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 1203 in runtest_inner
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 906 in runtest
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 709 in main
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/__main__.py", line 13 in <module>
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 73 in _run_code
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 160 in _run_module_as_main
*** Error code 134
-----------------------------------
The problem is that strxfrm() and wcsxfrm() return strange results for the string "a" and the English locale (e.g. en_US.UTF-8).
strxfrm(buffer, "a\0", 100) returns 21 (bytes) but only 2 bytes are written ("\x01\x00"). The next bytes are unchanged.
wcsxfrm(buffer, L"a\0", 100) returns 7 (characters). The 7 characters are written, but they are in the range U+1010101..U+1010163, whereas the maximum character of Unicode 6.0 is U+10FFFF (U+101xxxx vs U+10xxxx).
Output of the attached program, strxfrm.c, on OpenSolaris:
-----------------------------------
strxfrm: len=21
0x01
0x00
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
wcsxfrm: len=7
U+1010163
U+1010101
U+1010103
U+1010101
U+1010103
U+1010101
U+1010101
-----------------------------------
I don't know if it's normal that wcsxfrm() writes characters in the range U+1010101..U+1010163.
Is Python supposed to support characters outside U+0000-U+10FFFF range? chr(0x10FFFF+1) raises a ValueError.
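For reference, a minimal Python sketch of the boundary mentioned above. The helper name `try_chr` is invented for this illustration; the sample values are the upper limit U+10FFFF and two of the wcsxfrm() results listed in this message.

```python
import sys


def try_chr(value):
    """Return chr(value), or None if Python rejects the code point."""
    try:
        return chr(value)
    except ValueError:
        return None


# sys.maxunicode is 0x10FFFF on Python 3.3 and later.
print(hex(sys.maxunicode))

# 0x10FFFF is the last accepted value; the wcsxfrm() results are above it.
for value in (0x10FFFF, 0x10FFFF + 1, 0x1010101, 0x1010163):
    status = "accepted" if try_chr(value) is not None else "rejected"
    print(f"U+{value:X}: {status}")
```

The sketch only shows what chr() accepts. It says nothing about what the C-level constructors did at the time of this report.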
msg148019 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-21 00:11
New changeset 31baf1363ba1 by Victor Stinner in branch 'default':
Issue #13441: Disable temporary strxfrm() tests on Solaris
http://hg.python.org/cpython/rev/31baf1363ba1 
msg148026 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-21 02:05
> Is Python supposed to support characters outside U+0000-U+10FFFF range?
If not, PyUnicode_FromUnicode(), PyUnicode_FromWideChar() and PyUnicode_FromKindAndData() should be patched to raise an error if a bigger character is encountered.
msg148027 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-21 02:09
> strxfrm(buffer, "a\0", 100) returns 21 (bytes) but only 2 bytes
> are written ("\x01\x00"). The next bytes are unchanged.
Whoops, it was a bug in my program. I attached the fixed version. The correct program writes:
----
strxfrm: len=21
0x01
0x01
0x63
0x01
0x01
0x01
0x01
0x01
0x03
0x01
0x01
0x01
0x01
0x01
0x03
0x01
0x01
0x01
0x01
0x01
0x01
wcsxfrm: len=7
U+1010163
U+1010101
U+1010103
U+1010101
U+1010103
U+1010101
U+1010101
----
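One pattern can be read off the corrected output: the 21 strxfrm() bytes seem to line up with the low 24 bits of the 7 wcsxfrm() values, three bytes per wide character. This is only an observation from the numbers printed above, not documented Solaris behaviour. The helper name `pack_groups` is made up for this sketch.

```python
# Values copied from the corrected strxfrm.c output above.
strxfrm_bytes = [
    0x01, 0x01, 0x63,
    0x01, 0x01, 0x01,
    0x01, 0x01, 0x03,
    0x01, 0x01, 0x01,
    0x01, 0x01, 0x03,
    0x01, 0x01, 0x01,
    0x01, 0x01, 0x01,
]
wcsxfrm_chars = [
    0x1010163, 0x1010101, 0x1010103, 0x1010101,
    0x1010103, 0x1010101, 0x1010101,
]


def pack_groups(data, size=3):
    """Pack consecutive groups of `size` bytes into big-endian integers."""
    return [
        int.from_bytes(bytes(data[i:i + size]), "big")
        for i in range(0, len(data), size)
    ]


packed = pack_groups(strxfrm_bytes)
low_24_bits = [char & 0xFFFFFF for char in wcsxfrm_chars]

# Compare the two views of the collation key.
print("groups match low 24 bits:", packed == low_24_bits)
# Every wide character is above the Unicode maximum U+10FFFF.
print("all above U+10FFFF:", all(char > 0x10FFFF for char in wcsxfrm_chars))
```

If the pattern holds in general, the wcsxfrm() values are better understood as opaque collation weights than as code points, which fits the assertion failure reported in the first message.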
msg148028 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-21 02:17
New changeset 78123afb3ea4 by Victor Stinner in branch 'default':
Issue #13441: Disable temporary localeconv() tests on Solaris
http://hg.python.org/cpython/rev/78123afb3ea4 
msg148034 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-11-21 12:14
> Is Python supposed to support characters outside U+0000-U+10FFFF range?
No, they should be rejected. Allowing them in some specific places might cause them to leak somewhere else and cause problems, so I'd rather stick with that range and reject all the chars >U+10FFFF everywhere.
msg148038 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-21 13:32
New changeset a19dad38d4e8 by Victor Stinner in branch 'default':
Issue #13441: _PyUnicode_CheckConsistency() dumps the string if the maximum
http://hg.python.org/cpython/rev/a19dad38d4e8 
msg148039 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-21 13:32
> No, they should be rejected. Allowing them in some specific
> places might cause them to leak somewhere else and cause problems,
> so I'd rather stick with that range and reject all the chars
> >U+10FFFF everywhere.
That's why I added a (debug) check to reject them. I don't think that your UTF-8 encoder supports such characters, for example. All functions assume that the maximum character is U+10FFFF.
If they should be rejected, one solution is to modify strxfrm() to return a list of integers (code points) instead of a string.
msg148046 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-21 14:41
New changeset d1b3b1d00811 by Victor Stinner in branch 'default':
Another temporary hack to debug the issue #13441
http://hg.python.org/cpython/rev/d1b3b1d00811 
msg148048 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-21 14:44
I dumped some values to try to debug this issue. Last failure in test__locale.test_lc_numeric_basic() on localeconv():
----------------------------
[ 25/361] test_float
Decode localeconv() decimal_point: {0x2c} (len=1)
Decode localeconv() thousands_sep: {0x2e} (len=1)
Decode localeconv() int_curr_symbol: {} (len=0)
Decode localeconv() currency_symbol: {} (len=0)
Decode localeconv() mon_decimal_point: {} (len=0)
Decode localeconv() mon_thousands_sep: {} (len=0)
Decode localeconv() positive_sign: {} (len=0)
Decode localeconv() negative_sign: {} (len=0)
...
[100/361] test__locale
Decode localeconv() decimal_point: {0x2c} (len=1)
Decode localeconv() thousands_sep: {0xa0} (len=1)
Invalid Unicode string! {U+30000020} (len=1)
Fatal Python error: Aborted
----------------------------
msg148051 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-21 15:01
New changeset acda16de630c by Victor Stinner in branch 'default':
Remove temporary hacks for the issue #13441
http://hg.python.org/cpython/rev/acda16de630c 
msg148054 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-21 15:10
Here is a more complete output. localeconv() fails in the hu_HU locale for the "thousands_sep" field: localeconv() returns b'\xa0', which is decoded as the wchar_t* string {U+30000020} (len=1). This is an invalid character.
-----------------------------------
[ 54/361/3] test__locale
Decode wchar_t {U+0043} (len=1)
SET LOCALE "es_UY"
SET LOCALE "fr_FR"
SET LOCALE "fi_FI"
SET LOCALE "es_CO"
SET LOCALE "pt_PT"
SET LOCALE "it_IT"
SET LOCALE "et_EE"
SET LOCALE "es_PY"
SET LOCALE "no_NO"
SET LOCALE "nl_NL"
SET LOCALE "lv_LV"
SET LOCALE "el_GR"
SET LOCALE "be_BY"
SET LOCALE "fr_BE"
SET LOCALE "ro_RO"
SET LOCALE "ru_UA"
SET LOCALE "ru_RU"
SET LOCALE "es_VE"
SET LOCALE "ca_ES"
SET LOCALE "se_NO"
SET LOCALE "es_EC"
SET LOCALE "id_ID"
SET LOCALE "ka_GE"
SET LOCALE "es_CL"
SET LOCALE "hu_HU"
SET LOCALE -> hu_HU
Decode wchar_t {U+0068 U+0075 U+005f U+0048 U+0055} (len=5)
SET LOCALE "hu_HU"
SET LOCALE -> hu_HU
Decode wchar_t {U+0068 U+0075 U+005f U+0048 U+0055} (len=5)
Decode wchar_t {U+002c} (len=1)
Decode localeconv() decimal_point: {0x2c} (len=1)
Decode wchar_t {U+002c} (len=1)
Decode localeconv() thousands_sep: {0xa0} (len=1)
Decode wchar_t {U+30000020} (len=1)
Invalid Unicode string! {U+30000020} (len=1)
Fatal Python error: Aborted
Current thread 0x00000001:
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 105 in test_lc_numeric_basic
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 385 in _executeTestPart
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 440 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 492 in __call__
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/runner.py", line 168 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1368 in _run_suite
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1402 in run_unittest
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 141 in test_main
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 1203 in runtest_inner
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 906 in runtest
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 709 in main
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/__main__.py", line 13 in <module>
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 73 in _run_code
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 160 in _run_module_as_main
*** Error code 134
-----------------------------------
msg148061 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-21 17:04
New changeset d6d15fcf5eb6 by Victor Stinner in branch 'default':
Issue #13441: Reenable strxfrm() tests on Solaris
http://hg.python.org/cpython/rev/d6d15fcf5eb6 
msg148106 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-22 02:25
New changeset 6f9af4e3c1db by Victor Stinner in branch 'default':
Issue #13441: Disable temporary the check on the maximum character until
http://hg.python.org/cpython/rev/6f9af4e3c1db 
msg149014 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-08 01:23
localeconv_wchar.c: test program to dump the thousands separator on a locale specified on the command line. I wrote this program to try to reproduce the hu_HU issue, but I cannot reproduce it on OpenIndiana. I only have UTF-8 locales on my OpenIndiana VM, whereas the issue looks to be specific to an ISO-8859-?? encoding (b'\xA0' is not decodable from UTF-8).
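The encoding remark can be checked from Python without a Solaris machine. The sketch below uses Python's own codecs, not the platform mbstowcs(), so it only shows what a correct decoding of b'\xa0' would look like. `decode_separator` is a hypothetical helper name.

```python
def decode_separator(raw, encoding):
    """Decode a localeconv()-style byte string, or return None on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


raw = b"\xa0"  # thousands_sep reported for hu_HU in this issue

for encoding in ("iso8859-1", "iso8859-2", "utf-8"):
    text = decode_separator(raw, encoding)
    if text is None:
        print(f"{encoding}: not decodable")
    else:
        print(f"{encoding}: U+{ord(text):04X}")
```

In both ISO-8859 encodings the byte maps to U+00A0 (no-break space), a valid code point, while U+30000020 is far outside the Unicode range.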
msg149022 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-08 12:16
See also the issue #7442.
msg149033 - Author: Stefan Krah (skrah) * (Python committer) Date: 2011-12-08 13:31
localeconv_wchar.c runs fine on Ubuntu with hu_HU and fi_FI.
I tried on OpenSolaris, but I only have UTF-8 locales. The package
with ISO locales seems to be SUNWlang-cs-extra, but Oracle took down
http://pkg.opensolaris.org/release/ .
msg149058 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-08 22:41
New changeset 93bab8400ca5 by Victor Stinner in branch 'default':
Issue #13441: Log the locale when localeconv() fails
http://hg.python.org/cpython/rev/93bab8400ca5 
msg149059 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-08 22:42
Changeset 489ea02ed351 changed PyUnicode_FromWideChar() and PyUnicode_FromUnicode(): they now raise a ValueError if a character is not in the range [U+0000; U+10ffff].
test__locale errors:
======================================================================
ERROR: test_float_parsing (test.test__locale._LocaleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 134, in test_float_parsing
 if localeconv()['decimal_point'] != '.':
ValueError: character U+30000020 is not in range [U+0000; U+10ffff]
======================================================================
ERROR: test_lc_numeric_basic (test.test__locale._LocaleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 105, in test_lc_numeric_basic
 li_radixchar = localeconv()[lc]
ValueError: character U+30000020 is not in range [U+0000; U+10ffff]
======================================================================
ERROR: test_lc_numeric_localeconv (test.test__locale._LocaleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 91, in test_lc_numeric_localeconv
 self.numeric_tester('localeconv', localeconv()[lc], lc, loc)
ValueError: character U+30000020 is not in range [U+0000; U+10ffff]
======================================================================
ERROR: test_lc_numeric_nl_langinfo (test.test__locale._LocaleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 79, in test_lc_numeric_nl_langinfo
 self.numeric_tester('nl_langinfo', nl_langinfo(li), lc, loc)
ValueError: character U+30000020 is not in range [U+0000; U+10ffff]
----------------------------------------------------------------------
If the issue is specific to the hu_HU locale, a possible workaround is to skip this locale on Solaris. I changed the test to display the locale on failure.
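The range check described at the top of this message can be modelled in pure Python. This is a rough sketch, not the C code from changeset 489ea02ed351: the function name `check_code_points` is hypothetical, and only the error text is copied from the tracebacks above.

```python
MAX_UNICODE = 0x10FFFF  # upper bound quoted in the error message


def check_code_points(code_points):
    """Build a str from integers, rejecting values outside U+0000..U+10FFFF."""
    for code_point in code_points:
        if not 0 <= code_point <= MAX_UNICODE:
            raise ValueError(
                "character U+%x is not in range [U+0000; U+10ffff]" % code_point
            )
    return "".join(chr(code_point) for code_point in code_points)


# A valid decimal point (U+002C) passes through unchanged.
print(repr(check_code_points([0x2C])))

# The value produced for thousands_sep on Solaris is rejected.
try:
    check_code_points([0x30000020])
except ValueError as exc:
    print("ValueError:", exc)
```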
msg149064 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-09 00:18
New changeset 87c6be1e393a by Victor Stinner in branch 'default':
Issue #13441: Don't test the hu_HU locale on Solaris to workaround a mbstowcs()
http://hg.python.org/cpython/rev/87c6be1e393a 
msg149081 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-09 09:28
New changeset 2a2d0872d993 by Victor Stinner in branch 'default':
Issue #13441: Skip some locales (e.g. cs_CZ and hu_HU) on Solaris to workaround
http://hg.python.org/cpython/rev/2a2d0872d993 
msg149085 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-09 10:29
New changeset 7ffe3d304487 by Victor Stinner in branch 'default':
Issue #13441: Enable the workaround for Solaris locale bug
http://hg.python.org/cpython/rev/7ffe3d304487 
msg149086 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-09 10:34
I collected the list of locales triggering the mbstowcs() bug, thanks to my previous commit:
 * hu_HU (ISO8859-2): character U+30000020
 * de_AT (ISO8859-1): character U+30000076
 * cs_CZ (ISO8859-2): character U+30000020
 * sk_SK (ISO8859-2): character U+30000020
 * pl_PL (ISO8859-2): character U+30000020
 * fr_CA (ISO8859-1): character U+30000020
Hum, the bug occurs maybe on all locales... I suppose that all "xx_XX" locales use an encoding different than UTF-8 and that the bug is specific to encodings different than UTF-8.
I don't understand why locale.strxfrm('à') doesn't crash anymore.
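A skip list built from the locales above might look like the following sketch. It is not the code committed to test__locale.py: the set name, the helper `should_test_locale`, and the `platform` parameter are assumptions made for illustration.

```python
import sys

# Locale names taken from the list in this message (hypothetical constant).
SOLARIS_BROKEN_LOCALES = {"hu_HU", "de_AT", "cs_CZ", "sk_SK", "pl_PL", "fr_CA"}


def should_test_locale(name, platform=sys.platform):
    """Return False for locales known to trigger the bug on Solaris."""
    if platform.startswith("sunos") and name in SOLARIS_BROKEN_LOCALES:
        return False
    return True


candidates = ["en_US", "hu_HU", "fr_FR", "cs_CZ"]

# On Solaris (sys.platform == "sunos5") the affected locales are dropped.
print([loc for loc in candidates if should_test_locale(loc, platform="sunos5")])
# On other platforms every candidate is kept.
print([loc for loc in candidates if should_test_locale(loc, platform="linux")])
```

Passing the platform as a parameter keeps the helper testable on machines that are not running Solaris.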
msg149091 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-09 13:21
The Solaris buildbot is green, let's close it. I didn't report the bug upstream. Feel free to report it to Oracle!
History
Date User Action Args
2022-04-11 14:57:24 admin set github: 57650
2012-10-17 14:35:41 jcea set nosy: + jcea; superseder: test_local.TestEnUSCollection failures on Solaris 10
2011-12-09 13:21:17 vstinner set status: open -> closed; resolution: fixed; messages: + msg149091
2011-12-09 10:34:18 vstinner set messages: + msg149086
2011-12-09 10:29:16 python-dev set messages: + msg149085
2011-12-09 09:28:41 python-dev set messages: + msg149081
2011-12-09 00:18:20 python-dev set messages: + msg149064
2011-12-08 22:42:10 vstinner set messages: + msg149059
2011-12-08 22:41:00 python-dev set messages: + msg149058
2011-12-08 13:31:31 skrah set messages: + msg149033
2011-12-08 12:16:17 vstinner set messages: + msg149022
2011-12-08 10:20:00 skrah set nosy: + skrah
2011-12-08 01:23:28 vstinner set files: + localeconv_wchar.c; messages: + msg149014
2011-11-22 02:25:52 python-dev set messages: + msg148106
2011-11-21 17:04:19 python-dev set messages: + msg148061
2011-11-21 15:12:05 pitrou set nosy: - pitrou
2011-11-21 15:10:15 vstinner set messages: + msg148054
2011-11-21 15:01:15 python-dev set messages: + msg148051
2011-11-21 14:44:12 vstinner set messages: + msg148048
2011-11-21 14:41:04 python-dev set messages: + msg148046
2011-11-21 13:32:32 vstinner set messages: + msg148039
2011-11-21 13:32:04 python-dev set messages: + msg148038
2011-11-21 12:14:39 ezio.melotti set messages: + msg148034
2011-11-21 02:17:06 python-dev set messages: + msg148028
2011-11-21 02:09:27 vstinner set files: - strxfrm.c
2011-11-21 02:09:16 vstinner set files: + strxfrm.c; messages: + msg148027
2011-11-21 02:05:09 vstinner set messages: + msg148026
2011-11-21 00:11:53 python-dev set nosy: + python-dev; messages: + msg148019
2011-11-20 23:58:15 vstinner create
