Message81053
Author: lemburg
Recipients: amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date: 2009-02-03.13:18:19
Message-id: <4988441A.1010607@egenix.com>
In-reply-to: <200902031413.56264.victor.stinner@haypocalc.com>
Content:
On 2009-02-03 14:14, STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> amaury> Since r56395, ord() and chr() accept and return surrogate pairs
> amaury> even in narrow builds.
>
> Note: my examples use Python 2.x.
>
>> The goal is to remove most differences between narrow and wide unicode
>> builds (except for string lengths, indices or slices)
>
> It would be nice to get the same behaviour in Python 2.x and 3.x to help
> migration from Python2 to Python3 ;-)
>
> The unichr() documentation (in Python 2.x) is correct. But I would appreciate
> support for surrogates in unichr(), which would also mean changing ord()'s behaviour.
This is not possible for unichr() in Python 2.x, since applications
always expect len(unichr(x)) == 1.
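The length concern can be made concrete with a small Python 3 sketch. It emulates what chr() returns on a narrow (UCS2) build after r56395: one 16-bit unit for BMP code points and a surrogate pair for anything above 0xFFFF. `narrow_chr` is a hypothetical helper, not part of Python; since Python 3.3 (PEP 393) strings always hold full code points, so the pair is assembled by hand from lone surrogates.

```python
def narrow_chr(cp: int) -> str:
    """Hypothetical helper: emulate chr() on a narrow build.

    BMP code points give one 16-bit unit; code points above 0xFFFF
    give a UTF-16 surrogate pair, i.e. a string of length 2.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range: %r" % (cp,))
    if cp <= 0xFFFF:
        return chr(cp)
    v = cp - 0x10000              # 20-bit value split across the pair
    high = 0xD800 + (v >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return chr(high) + chr(low)


# Code written for Python 2 assumes len(unichr(x)) == 1 for every x.
for cp in (0x41, 0xFFFF, 0x10000, 0x1D400):
    print(hex(cp), len(narrow_chr(cp)))
```

The loop reports a length of 2 for U+10000 and U+1D400, which is exactly the case that breaks code relying on len(unichr(x)) == 1. The pair also matches what the UTF-16 codec produces: `narrow_chr(0x1D400).encode("utf-16-le", "surrogatepass")` equals `"\U0001D400".encode("utf-16-le")`.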
Changing ord() in Python 2.x would be easier, since this would only
extend the range of returned values on UCS2 builds.
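Why the ord() change is the safer one can be sketched the same way. The hypothetical `narrow_ord` below (an illustration, not a Python API) keeps the existing result for every one-unit string and only adds a new case, a well-formed surrogate pair, whose result lies above 0xFFFF. Callers that only ever pass single characters see no difference.

```python
def narrow_ord(s: str) -> int:
    """Hypothetical helper: an ord() extended to accept surrogate pairs."""
    if len(s) == 1:
        return ord(s)  # unchanged behaviour for single units
    if len(s) == 2:
        high, low = ord(s[0]), ord(s[1])
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            # Recombine the 10 + 10 payload bits and re-add the 0x10000 offset.
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    raise TypeError("expected a character or a surrogate pair, got %r" % (s,))


print(narrow_ord("A"))                  # 65, same as ord("A")
print(hex(narrow_ord("\ud835\udc00")))  # 0x1d400, beyond the UCS2 range
```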
>> To address this problem, I suggest changing all functions in
>> unicodectype.c so that they accept Py_UCS4 characters (instead of
>> Py_UNICODE).
>
> Why? Using surrogates, you can use a 16-bit Py_UNICODE to store non-BMP
> characters (code points > 0xffff).
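What is at stake in this exchange can be illustrated with a short Python 3 sketch using the standard `unicodedata` module. Storage is not the issue: a surrogate pair holds the character fine. Classification is: a character-type function that receives one 16-bit unit at a time only sees half of the pair, and each half has category `Cs` (surrogate), whereas the full code point U+1D400 (MATHEMATICAL BOLD CAPITAL A) is an uppercase letter. The hand-written pair below is for illustration; it relies on Python 3.3+ strings holding full code points.

```python
import unicodedata

full = "\U0001D400"    # MATHEMATICAL BOLD CAPITAL A, outside the BMP
pair = "\ud835\udc00"  # the same character as two UTF-16 code units

# Classifying the whole code point (what a Py_UCS4 argument allows).
print(unicodedata.category(full), full.isalpha())

# Classifying each 16-bit unit on its own (what a Py_UNICODE argument sees
# when a non-BMP character is stored as a surrogate pair).
for unit in pair:
    print(hex(ord(unit)), unicodedata.category(unit), unit.isalpha())
```

Both representations encode to the same UTF-16 bytes, yet only the full code point is reported as a letter.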
>
> --
>
> I can open a new issue if you agree that we can change the unichr() / ord()
> behaviour on narrow builds. Should we ask on the mailing list?