Message195886
| Author |
vstinner |
| Recipients |
Arfrever, a.badger, abadger1999, benjamin.peterson, ezio.melotti, lemburg, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner |
| Date |
2013-08-22.13:18:23 |
| Message-id |
<1377177503.29.0.600213251678.issue18713@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
> The surrogateescape error handler is dangerous with utf-16/32. It can produce globally invalid output.
I don't understand, can you give an example? With the surrogateescape error handler, encoding can generate an invalid byte string with any encoding, not only UTF-16/32. Example with UTF-8:
>>> b"a\xffb".decode("utf-8", "surrogateescape")
'a\udcffb'
>>> 'a\udcffb'.encode("utf-8", "surrogateescape")
b'a\xffb'
>>> b'a\xffb'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
So str.encode("utf-8", "surrogateescape") can also produce an invalid UTF-8 sequence. |
|
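The interactive session above can be wrapped in a small helper for experimenting with other inputs. This is only a sketch: `roundtrip_utf8` is a hypothetical name, not part of the standard library or of the issue under discussion. It decodes bytes with surrogateescape, re-encodes them the same way, and then checks whether the result is valid under the strict error handler.

```python
def roundtrip_utf8(raw):
    """Round-trip bytes through str using surrogateescape (hypothetical helper).

    Returns (text, encoded, strict_ok), where strict_ok tells whether the
    re-encoded bytes decode as UTF-8 with the default strict error handler.
    """
    # Undecodable bytes become lone surrogates in the range U+DC80..U+DCFF.
    text = raw.decode("utf-8", "surrogateescape")
    # Encoding with the same handler turns those surrogates back into raw bytes.
    encoded = text.encode("utf-8", "surrogateescape")
    try:
        encoded.decode("utf-8")  # strict decoding
        strict_ok = True
    except UnicodeDecodeError:
        strict_ok = False
    return text, encoded, strict_ok


text, encoded, strict_ok = roundtrip_utf8(b"a\xffb")
print(ascii(text), encoded, strict_ok)
```

For the input `b"a\xffb"` the original bytes come back unchanged, and `strict_ok` is `False`, matching the traceback shown in the message.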
History
| Date | User | Action | Args |
|---|---|---|---|
| 2013-08-22 13:18:23 | vstinner | set | recipients: + vstinner, lemburg, ncoghlan, pitrou, abadger1999, benjamin.peterson, ezio.melotti, a.badger, Arfrever, r.david.murray, serhiy.storchaka |
| 2013-08-22 13:18:23 | vstinner | set | messageid: <1377177503.29.0.600213251678.issue18713@psf.upfronthosting.co.za> |
| 2013-08-22 13:18:23 | vstinner | link | issue18713 messages |
| 2013-08-22 13:18:23 | vstinner | create | |