Message148603
| Author |
vstinner |
| Recipients |
ezio.melotti, gvanrossum, lemburg, loewis, tchrist, vstinner |
| Date |
2011-11-29.20:42:29 |
| SpamBayes Score |
0.0002851445 |
| Marked as misclassified |
No |
| Message-id |
<1322599350.13.0.163750536411.issue12892@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Python 3.3 has a strange behaviour:
>>> '\uDBFF\uDFFF'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'
>>> '\U0010ffff'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'
I would expect text.encode(encoding).decode(encoding) == text, or else an encode or decode error.
So I agree that the encoder should reject lone surrogates. |
|
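A minimal sketch of the round-trip check described above, assuming a Python version (3.4 or later, as far as I know) where the strict UTF-16 encoder rejects surrogate code points and the 'surrogatepass' error handler is accepted by the UTF-16 codecs. The helper name round_trip is hypothetical and only for illustration:

```python
def round_trip(text, encoding, errors="strict"):
    """Encode text and decode it again with the same codec and error handler."""
    return text.encode(encoding, errors).decode(encoding, errors)


# Two separate surrogate code points, not one astral character.
pair = "\uDBFF\uDFFF"

# With the strict handler the encoder is expected to refuse the surrogates.
try:
    round_trip(pair, "utf-16-le")
    strict_rejected = False
except UnicodeEncodeError:
    strict_rejected = True

# 'surrogatepass' lets the surrogates through, which reproduces the
# behaviour reported above: the pair comes back as a single U+10FFFF.
lossy = round_trip(pair, "utf-16-le", "surrogatepass")

print(strict_rejected)
print(ascii(lossy), lossy == pair)
```

With 'surrogatepass' the round trip is not the identity, because the decoder cannot tell two surrogate code points apart from a genuine surrogate pair. That is the reason for wanting an error from the strict encoder.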