Message248392
| Author |
serhiy.storchaka |
| Recipients |
ezio.melotti, gvanrossum, kennyluck, lemburg, loewis, mjpieters, pitrou, python-dev, serhiy.storchaka, tchrist, vstinner |
| Date |
2015-08-11.06:04:30 |
| Message-id |
<1439273071.36.0.963320980607.issue12892@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
There are two reasons:
1. UTF-16 and UTF-32 are based on 2- and 4-byte units. If the surrogateescape error handler supported UTF-16 and UTF-32, encoding could produce data that can't be decoded back correctly. For example, '\udcac \udcac' -> b'\xac\x20\x00\xac' -> '\u20ac\uac00' == '€가' (decoded as UTF-16-LE).
2. ASCII bytes (0x00-0x7F) can't be escaped with surrogateescape, which only maps bytes 0x80-0xFF to U+DC80-U+DCFF. UTF-16 and UTF-32 data can contain illegal sequences made of ASCII bytes (b'\xD8\x00' in UTF-16-BE, b'abcd' in UTF-32). For the same reason, surrogateescape is not compatible with UTF-7 and CP037. |
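Both points can be checked with a short sketch. The helper `naive_escape_encode_utf16le` is hypothetical and not part of any codec: it only imitates what a UTF-16-LE encoder would emit if each escaped surrogate U+DCxx were written back as a single raw byte. The second half uses the real `ascii` codec to show that only bytes 0x80-0xFF have an escaped form.

```python
def naive_escape_encode_utf16le(text):
    """Hypothetical encoder: UTF-16-LE plus a surrogateescape-style rule."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Escaped byte: emit one raw byte, which breaks 2-byte alignment.
            out.append(cp - 0xDC00)
        else:
            out += ch.encode("utf-16-le")
    return bytes(out)


# Point 1: the output is valid UTF-16-LE, but for different text.
original = "\udcac \udcac"
data = naive_escape_encode_utf16le(original)
decoded = data.decode("utf-16-le")
print(data)                       # b'\xac \x00\xac'
print(decoded == "\u20ac\uac00")  # True
print(decoded == original)        # False

# Point 2: a non-ASCII byte round-trips through its escaped form ...
escaped = b"\xac".decode("ascii", "surrogateescape")
roundtrip = escaped.encode("ascii", "surrogateescape")

# ... but there is no escaped form for an ASCII byte such as 0x41.
try:
    "\udc41".encode("ascii", "surrogateescape")
    ascii_escape_rejected = False
except UnicodeEncodeError:
    ascii_escape_rejected = True
print(ascii_escape_rejected)      # True
```

The decoder has no way to tell that b'\xac\x20' was meant as an escaped byte followed by half of a space, because the pair is itself a valid UTF-16-LE code unit.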
|
History
|
|---|
| Date |
User |
Action |
Args |
| 2015-08-11 06:04:31 | serhiy.storchaka | set | recipients:
+ serhiy.storchaka, lemburg, gvanrossum, loewis, mjpieters, pitrou, vstinner, ezio.melotti, python-dev, tchrist, kennyluck |
| 2015-08-11 06:04:31 | serhiy.storchaka | set | messageid: <1439273071.36.0.963320980607.issue12892@psf.upfronthosting.co.za> |
| 2015-08-11 06:04:31 | serhiy.storchaka | link | issue12892 messages |
| 2015-08-11 06:04:30 | serhiy.storchaka | create |
|