Message105656
| Author |
vstinner |
| Recipients |
stutzbach, theller, vstinner |
| Date |
2010-05-13.21:10:19 |
| Message-id |
<1273785022.02.0.930366439933.issue8670@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Support for characters outside the Unicode BMP (code points > 0xffff) is incomplete in narrow builds (sizeof(Py_UNICODE) == 2) of Python 2:
$ ./python
Python 2.7b2+ (trunk:81139M, May 13 2010, 18:45:37)
>>> x=u'\U00010000'
>>> x[0], x[1]
(u'\ud800', u'\udc00')
>>> len(x)
2
>>> ord(x)
Traceback (most recent call last):
...
TypeError: ord() expected a character, but string of length 2 found
>>> unichr(0x10000)
Traceback (most recent call last):
...
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
Python 3 handles this better, although len() still reports two code units:
$ ./python
Python 3.2a0 (py3k:81137:81138, May 13 2010, 18:50:51)
>>> x='\U00010000'
>>> x[0], x[1]
('\ud800', '\udc00')
>>> len(x)
2
>>> ord(x)
65536
>>> chr(0x10000)
'\U00010000'
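The two sessions above depend on how a code point above 0xffff maps to a UTF-16 surrogate pair. The following is a minimal sketch of that arithmetic in pure Python. The helper names to_surrogates() and from_surrogates() are hypothetical and chosen for illustration; they are not CPython functions.

```python
def to_surrogates(cp):
    """Split a code point above 0xFFFF into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in the supplementary planes")
    offset = cp - 0x10000
    # The top 10 bits go into the high surrogate, the low 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10000)
print(hex(high), hex(low))         # 0xd800 0xdc00
print(from_surrogates(high, low))  # 65536
```

This matches the sessions: x[0] and x[1] are the two surrogates, and the Python 3 ord() result of 65536 is the combined value.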
As for this issue, the problem is in the u_set() function. It should use PyUnicode_AsWideChar(), but PyUnicode_AsWideChar() doesn't support surrogates, whereas PyUnicode_FromWideChar() does. |
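To illustrate what surrogate support would mean when converting narrow-build data to wide characters, here is a rough Python sketch that merges surrogate pairs in a sequence of 16-bit code units into full code points. It assumes a target where each wide character can hold a full code point (for example a 4-byte wchar_t). The function name utf16_units_to_code_points() is hypothetical, and this is not the actual C implementation.

```python
def utf16_units_to_code_points(units):
    """Merge surrogate pairs in a list of 16-bit code units into code points.

    Lone surrogates are passed through unchanged (an assumption made here
    for simplicity; a real converter might reject them instead).
    """
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the pair into one code point above 0xFFFF.
            low = units[i + 1]
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            result.append(unit)
            i += 1
    return result


# 'A', U+10000 as a surrogate pair, 'B'
print(utf16_units_to_code_points([0x41, 0xD800, 0xDC00, 0x42]))
# [65, 65536, 66]
```

The output has three entries instead of four, so a caller has to size the destination buffer by code points rather than by code units.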
|
History
| Date | User | Action | Args |
|---|---|---|---|
| 2010-05-13 21:10:22 | vstinner | set | recipients: + vstinner, theller, stutzbach |
| 2010-05-13 21:10:22 | vstinner | set | messageid: <1273785022.02.0.930366439933.issue8670@psf.upfronthosting.co.za> |
| 2010-05-13 21:10:20 | vstinner | link | issue8670 messages |
| 2010-05-13 21:10:20 | vstinner | create | |