Message148017
| Author |
vstinner |
| Recipients |
ezio.melotti, loewis, pitrou, vstinner |
| Date |
2011-11-20.23:58:14 |
| Message-id |
<1321833496.0.0.544021070577.issue13441@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
I added a test in _PyUnicode_CheckConsistency() (in debug mode) to ensure that all characters of a string are in the range U+0000-U+10FFFF. Locale tests are now failing on Solaris:
-----------------------------------
[ 28/361] test__locale
Assertion failed: maxchar <= 0x10FFFF, file Objects/unicodeobject.c, line 408
Fatal Python error: Aborted
Current thread 0x00000001:
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 134 in test_float_parsing
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 385 in _executeTestPart
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 440 in run
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 492 in __call__
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/runner.py", line 168 in run
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1368 in _run_suite
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1402 in run_unittest
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 139 in test_main
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 1203 in runtest_inner
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 906 in runtest
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 709 in main
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/__main__.py", line 13 in <module>
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 73 in _run_code
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 160 in _run_module_as_main
*** Error code 134
-----------------------------------
The problem is that strxfrm() and wcsxfrm() return unexpected results for the string "a" in an English locale (e.g. en_US.UTF-8).
strxfrm(buffer, "a\0", 100) returns 21 (bytes), but only 2 bytes are written ("\x01\x00"); the remaining bytes of the buffer are left unchanged.
wcsxfrm(buffer, L"a\0", 100) returns 7 (characters) and writes all 7, but they are in the range U+1010101..U+1010163, whereas the maximum code point in Unicode 6.0 is U+10FFFF (U+101xxxx vs. U+10xxxx).
Output of the attached program, strxfrm.c, on OpenSolaris:
-----------------------------------
strxfrm: len=21
0x01
0x00
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
0xff
wcsxfrm: len=7
U+1010163
U+1010101
U+1010103
U+1010101
U+1010103
U+1010101
U+1010101
-----------------------------------
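The attached strxfrm.c is not reproduced in this message. As a rough sketch only, a program that prints output in the format above could look like the following. The names dump_strxfrm(), dump_wcsxfrm(), run_demo() and BUF_LEN are hypothetical and are not taken from the attachment; the 0xff pre-fill of the byte buffer is an assumption made so that bytes strxfrm() did not write stay visible.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

#define BUF_LEN 100  /* same size as the buffer in the calls quoted above */

/* Transform src with strxfrm() and print as many bytes as the call
 * claims to need. The buffer is pre-filled with 0xff (an assumption of
 * this sketch) so that bytes the call never wrote show up as 0xff. */
size_t dump_strxfrm(const char *src)
{
    unsigned char buffer[BUF_LEN];
    assert(src != NULL);
    memset(buffer, 0xff, sizeof(buffer));

    size_t len = strxfrm((char *)buffer, src, sizeof(buffer));
    printf("strxfrm: len=%zu\n", len);
    for (size_t i = 0; i < len && i < BUF_LEN; i++)
        printf("0x%02x\n", buffer[i]);
    return len;
}

/* Same idea for wcsxfrm(): print each transformed element as U+XXXX. */
size_t dump_wcsxfrm(const wchar_t *src)
{
    wchar_t buffer[BUF_LEN];
    assert(src != NULL);
    wmemset(buffer, L'\0', BUF_LEN);

    size_t len = wcsxfrm(buffer, src, BUF_LEN);
    printf("wcsxfrm: len=%zu\n", len);
    for (size_t i = 0; i < len && i < BUF_LEN; i++)
        printf("U+%04lX\n", (unsigned long)buffer[i]);
    return len;
}

/* Hypothetical driver: select the collation locale, then dump both
 * transformations of "a". Returns -1 if the locale is unavailable. */
int run_demo(const char *locale_name)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL) {
        fprintf(stderr, "locale %s is not available\n", locale_name);
        return -1;
    }
    dump_strxfrm("a");
    dump_wcsxfrm(L"a");
    return 0;
}
```

Calling run_demo("en_US.UTF-8") from main() would exercise the locale discussed here; the actual values printed depend on the platform's collation tables.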
I don't know whether it is normal for wcsxfrm() to write characters in the range U+1010101..U+1010163.
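For illustration, the kind of range check that trips on these values can be sketched as follows. This is not the code of _PyUnicode_CheckConsistency(); first_out_of_range() and MAX_UNICODE are hypothetical names used only in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define MAX_UNICODE 0x10FFFFUL  /* highest valid Unicode code point */

/* Return the index of the first element of buf that lies outside
 * U+0000..U+10FFFF, or len if every element is in range. Casting to
 * unsigned long also makes negative values (on platforms where
 * wchar_t is signed) compare as out of range. */
size_t first_out_of_range(const wchar_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if ((unsigned long)buf[i] > MAX_UNICODE)
            return i;
    }
    return len;
}
```

Applied to the wcsxfrm() output shown above, such a check would flag the very first element, since 0x1010163 is greater than 0x10FFFF.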
Is Python supposed to support characters outside the U+0000-U+10FFFF range? chr(0x10FFFF+1) raises a ValueError. |
|
History
| Date | User | Action | Args |
|---|---|---|---|
| 2011-11-20 23:58:16 | vstinner | set | recipients: + vstinner, loewis, pitrou, ezio.melotti |
| 2011-11-20 23:58:16 | vstinner | set | messageid: <1321833496.0.0.544021070577.issue13441@psf.upfronthosting.co.za> |
| 2011-11-20 23:58:15 | vstinner | link | issue13441 messages |
| 2011-11-20 23:58:14 | vstinner | create | |