Message169283
| Author |
serhiy.storchaka |
| Recipients |
Brian.Merrell, belopolsky, ezio.melotti, merrellb, petri.lehtinen, pitrou, rhettinger, serhiy.storchaka, tchrist, vstinner |
| Date |
2012-08-28.13:58:28 |
| Message-id |
<1346162309.53.0.965792304651.issue11489@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
> It's Unicode that considers unpaired surrogates invalid, not UTF-8 by itself.
It's UTF-8 too. See RFC 3629:
The definition of UTF-8 prohibits encoding character numbers between
U+D800 and U+DFFF, which are reserved for use with the UTF-16
encoding form (as surrogate pairs) and do not directly represent
characters. When encoding in UTF-8 from UTF-16 data, it is necessary
to first decode the UTF-16 data to obtain character numbers, which
are then encoded in UTF-8 as described above. |
|
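The RFC passage can be checked against a concrete codec. The sketch below assumes Python 3, whose strict UTF-8 codec rejects surrogate code points in both directions; the helper names `encodes_strictly`, `decodes_strictly` and `utf16le_to_utf8` are hypothetical and exist only for this illustration. It also shows the route the RFC describes for UTF-16 input: decode the surrogate pair to a character number first, then encode that number as UTF-8.

```python
def encodes_strictly(text: str) -> bool:
    """Return True if the strict UTF-8 codec accepts `text`."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def decodes_strictly(data: bytes) -> bool:
    """Return True if the strict UTF-8 codec accepts `data`."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def utf16le_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16-LE to character numbers, then encode those as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# A lone high surrogate (U+D800) is not encodable as UTF-8.
lone_surrogate_ok = encodes_strictly("\ud800")

# The three bytes a naive encoder would emit for U+D800 are not valid UTF-8.
naive_bytes_ok = decodes_strictly(b"\xed\xa0\x80")

# U+1F600 is the surrogate pair D83D DE00 in UTF-16; decoding first yields
# one character number, which becomes a single four-byte UTF-8 sequence.
pair_as_utf8 = utf16le_to_utf8(b"\x3d\xd8\x00\xde")

# The "surrogatepass" error handler is an explicit opt-out of the rule.
passed_through = "\ud800".encode("utf-8", "surrogatepass")
```

The strict codec refuses the lone surrogate, while `surrogatepass` emits the naive three-byte form on request, so output containing surrogates has to be asked for explicitly.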