[Python-Dev] len(chr(i)) = 2?
Victor Stinner
victor.stinner at haypocalc.com
Thu Nov 25 22:39:00 CET 2010
On Friday 19 November 2010 23:25:03 you wrote:
> > Python is unclear about non-BMP characters: narrow build was called
> > "ucs2" for long time, even if it is UTF-16 (each character is encoded to
> > one or two UTF-16 words).
>
> No, no, no :-)
>
> UCS2 and UCS4 are more appropriate than "narrow" and "wide" or even
> "UTF-16" and "UTF-32".
OK, that is fair for Python 2:
$ ./python
Python 2.7.0+ (release27-maint:84618M, Sep 8 2010, 12:43:49)
>>> import sys; sys.maxunicode
65535
>>> x=u'\U0010ffff'; len(x)
2
>>> ord(x)
...
TypeError: ord() expected a character, but string of length 2 found
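For reference, the two code units that len() reports here are a UTF-16 surrogate pair, and ord() would have to combine them to return one code point. A minimal sketch of that arithmetic follows; the helper names combine_surrogates and split_code_point are made up for illustration and are not part of Python:

```python
def split_code_point(cp):
    """Split a non-BMP code point into a (high, low) UTF-16 surrogate pair."""
    if not (0x10000 <= cp <= 0x10FFFF):
        raise ValueError("not a non-BMP code point: %#x" % cp)
    offset = cp - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def combine_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF):
        raise ValueError("not a high surrogate: %#x" % high)
    if not (0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a low surrogate: %#x" % low)
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = split_code_point(0x10FFFF)
print(hex(high), hex(low))            # 0xdbff 0xdfff
print(combine_surrogates(high, low))  # 1114111
```

The helpers work on plain integers, so they do not depend on whether the interpreter is a narrow or a wide build.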
But Python 3 does use UTF-16 for narrow builds:
$ ./python
Python 3.2a3+ (py3k:86396:86399M, Nov 10 2010, 15:24:09)
>>> import sys; sys.maxunicode
65535
>>> c=chr(0x10ffff); len(c)
2
>>> ord(c)
1114111
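To check the UTF-16 length of a string without relying on how the interpreter was built, one option is to encode it explicitly and count 16-bit units. This is only a sketch, and utf16_length is a hypothetical helper name, not an existing function:

```python
def utf16_length(s):
    """Return the number of 16-bit code units needed to store s as UTF-16."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one code unit.
    return len(s.encode("utf-16-le")) // 2


c = chr(0x10ffff)
print(utf16_length(c))    # 2: one non-BMP character, two UTF-16 words
print(utf16_length("a"))  # 1: BMP characters need a single word
print(ord(c))             # 1114111
```

On a narrow build this gives the same number as len(c) above. On a build where len(c) is 1, it still reports 2.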
Victor