Message 169283 - Python tracker


In-reply-to
Author	serhiy.storchaka
Recipients	Brian.Merrell, belopolsky, ezio.melotti, merrellb, petri.lehtinen, pitrou, rhettinger, serhiy.storchaka, tchrist, vstinner
Date	2012-08-28.13:58:28
Message-id	<1346162309.53.0.965792304651.issue11489@psf.upfronthosting.co.za>

Content

> It's Unicode that considers unpaired surrogates invalid, not UTF-8 by itself.
It's UTF-8 too. See RFC 3629:
 The definition of UTF-8 prohibits encoding character numbers between
 U+D800 and U+DFFF, which are reserved for use with the UTF-16
 encoding form (as surrogate pairs) and do not directly represent
 characters. When encoding in UTF-8 from UTF-16 data, it is necessary
 to first decode the UTF-16 data to obtain character numbers, which
 are then encoded in UTF-8 as described above.
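For illustration, here is a minimal sketch of how this rule shows up in CPython 3, assuming the standard `utf-8` and `utf-16-le` codecs and the `surrogatepass` error handler. The helper names (`encodes_strictly`, `decodes_strictly`) and the sample code point U+1F600 are made up for this example and are not part of the issue.

```python
# Sketch only: helper names and sample values are hypothetical.

def encodes_strictly(text: str) -> bool:
    """Return True if the strict UTF-8 codec accepts `text`."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


def decodes_strictly(data: bytes) -> bool:
    """Return True if the strict UTF-8 codec accepts `data`."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A lone (unpaired) surrogate: the strict UTF-8 codec refuses to encode it.
lone_surrogate = "\ud800"
print(encodes_strictly(lone_surrogate))

# The surrogatepass handler forces the three-byte form that RFC 3629 prohibits;
# the strict decoder rejects those bytes again.
raw = lone_surrogate.encode("utf-8", "surrogatepass")
print(raw, decodes_strictly(raw))

# The path the RFC describes: decode the UTF-16 data first (here the
# surrogate pair D83D DE00, little-endian) to get the character number,
# then encode that character number in UTF-8.
utf16_data = b"\x3d\xd8\x00\xde"
decoded = utf16_data.decode("utf-16-le")
utf8_data = decoded.encode("utf-8")
print(hex(ord(decoded)), utf8_data)
```

The last step produces one four-byte UTF-8 sequence for the single code point, rather than two three-byte sequences for the two surrogates.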

History
Date	User	Action	Args
2012-08-28 13:58:29	serhiy.storchaka	set	recipients: + serhiy.storchaka, rhettinger, belopolsky, pitrou, vstinner, ezio.melotti, merrellb, Brian.Merrell, petri.lehtinen, tchrist
2012-08-28 13:58:29	serhiy.storchaka	set	messageid: <1346162309.53.0.965792304651.issue11489@psf.upfronthosting.co.za>
2012-08-28 13:58:29	serhiy.storchaka	link	issue11489 messages
2012-08-28 13:58:28	serhiy.storchaka	create
