Message 124864 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	belopolsky
Recipients	Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, georg.brandl, lemburg, loewis, pitrou, rhettinger, stutzbach, vstinner
Date	2010-12-29 18:31:28
SpamBayes Score	1.110223e-16
Marked as misclassified	No
Message-id	<AANLkTi=-AX41M4RRv4K047obyrF0NXqLGhnLpXCzdi-S@mail.gmail.com>
In-reply-to	<4CF18525.1050103@egenix.com>

Content
On Sat, Nov 27, 2010 at 5:24 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
..
> Perhaps we should allow ord() to work on surrogates in
> UCS4 builds as well. That would reduce the number of
> surprises.
>
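The suggestion above implies that narrow builds already let ord() accept a two-unit surrogate pair. The snippet below is a minimal sketch of what extending that to every build could look like. `ord_allowing_pairs` is a hypothetical name, not a Python builtin, and the sketch assumes Python 3.3 or later, where a str holds one code point per character.

```python
def ord_allowing_pairs(s: str) -> int:
    """Hypothetical ord() variant: accepts a single character, or a
    high/low surrogate pair, which it combines into one code point."""
    if len(s) == 1:
        return ord(s)  # the built-in ord() already accepts a lone surrogate
    if len(s) == 2:
        hi, lo = ord(s[0]), ord(s[1])
        if 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF:
            # Standard UTF-16 surrogate-pair decoding formula
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
    raise TypeError(f"expected a character or a surrogate pair, got {s!r}")

print(hex(ord_allowing_pairs('\uD800\uDC00')))  # 0x10000
print(hex(ord_allowing_pairs('\U0001F600')))    # 0x1f600
```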
This is an interesting idea; however, having surrogates in UCS4 builds
will sooner or later lead to surprises such as:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
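The surprise can be reproduced on any Python 3.3+ interpreter. Since PEP 393, every build stores strings the way the old wide build did. This is only a sketch, and the variable names are illustrative. ord() reports a lone surrogate's code point without complaint, yet the strict UTF-8 codec refuses to encode it.

```python
lone = '\ud800'  # a lone high surrogate is still a legal str value

# ord() happily reports its code point...
code_point = ord(lone)

# ...but the strict UTF-8 codec refuses to encode it.
try:
    lone.encode('utf-8')
    reason = None
except UnicodeEncodeError as exc:
    reason = exc.reason  # the text after "position 0:" in the traceback

print(hex(code_point), reason)  # 0xd800 surrogates not allowed
```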
I thought UCS4 (or more properly, UTF-32) did not allow encoding of
surrogate code points.
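Later Python versions agree with this reading of UTF-32. From Python 3.4 onward, the strict UTF-32 codecs reject surrogate code points, and only the 'surrogatepass' error handler lets them through. The sketch below assumes Python 3.4+ and uses the little-endian codec so the output carries no byte-order mark.

```python
lone = '\ud800'

# Strict UTF-32 encoding rejects the surrogate code point.
try:
    lone.encode('utf-32-le')
    strict_result = 'encoded'
except UnicodeEncodeError:
    strict_result = 'rejected'

# 'surrogatepass' writes it out as the 32-bit value 0x0000D800 anyway...
passed = lone.encode('utf-32-le', 'surrogatepass')

# ...and the same handler is needed to read it back.
round_trip = passed.decode('utf-32-le', 'surrogatepass')

print(strict_result, passed)  # rejected b'\x00\xd8\x00\x00'
```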
It is somewhat bothersome that a valid string literal such as
'\uD800\uDC00' in a narrow build is subtly invalid in a wide build. It
would probably be better if '\uD800\uDC00' were either rejected on a
wide build, or interpreted as a single character so that
True
on any build.
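On Python 3.3 and later, PEP 393 removed the narrow/wide distinction. Every build now behaves like the old wide build, so '\uD800\uDC00' is two code points and does not compare equal to '\U00010000'. Neither option proposed above was adopted as the default.

The sketch below opts in explicitly to either outcome, and `join_surrogate_pairs` is a hypothetical helper, not a standard function. Encoding with 'surrogatepass' turns the string into raw UTF-16 code units. A strict UTF-16 decode then merges each well-formed pair into one code point and raises on a lone surrogate. That mirrors the "interpreted as a single character" and "rejected" alternatives, respectively.

```python
def join_surrogate_pairs(s: str) -> str:
    """Hypothetical helper: replace each high/low surrogate pair in s
    with the single code point it encodes. A lone surrogate makes the
    strict UTF-16 decode raise UnicodeDecodeError."""
    raw = s.encode('utf-16-le', 'surrogatepass')  # surrogates pass through as code units
    return raw.decode('utf-16-le')                # strict decode combines valid pairs

pair = '\uD800\uDC00'
print(len(pair), pair == '\U00010000')      # 2 False (Python 3.3+)

joined = join_surrogate_pairs(pair)
print(len(joined), joined == '\U00010000')  # 1 True
```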

History
Date	User	Action	Args
2010-12-29 18:31:30	belopolsky	set	recipients: + belopolsky, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, Rhamphoryncus, pitrou, vstinner, eric.smith, stutzbach, ezio.melotti
2010-12-29 18:31:28	belopolsky	link	issue10542 messages
2010-12-29 18:31:28	belopolsky	create
