Message172558
| Author |
doerwalter |
| Recipients |
Marcus.Gröber, doerwalter, ezio.melotti, lovelylain, serhiy.storchaka, vstinner |
| Date |
2012-10-10.09:49:29 |
| Message-id |
<1349862569.84.0.138655370126.issue15278@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
> >>> codecs.utf_8_decode('\u20ac'.encode('utf8')[:2])
> ('', 0)
>
> Oh... codecs.CODEC_decode are incremental decoders? I completely misunderstood this.
No, those functions are not decoders; they're just helper functions used to implement the real incremental decoders. That's why they're undocumented.
Whether codecs.utf_8_decode() returns partial results or raises an exception depends on the final argument::
   >>> import codecs
   >>> s = '\u20ac'.encode('utf8')[:2]
   >>> codecs.utf_8_decode(s, 'strict')
   ('', 0)
   >>> codecs.utf_8_decode(s, 'strict', False)
   ('', 0)
   >>> codecs.utf_8_decode(s, 'strict', True)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data
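
To illustrate why the helper returns a (text, bytes consumed) pair, here is a rough sketch of how a caller could carry unconsumed bytes over to the next chunk. The function name decode_chunks and the pending buffer are made up for this example; this is not the code in encodings/utf_8.py, only an approximation of the idea:

```python
import codecs

def decode_chunks(chunks, errors='strict'):
    """Hypothetical helper: decode an iterable of UTF-8 byte chunks."""
    pending = b''   # bytes left over from the previous chunk
    pieces = []
    for chunk in chunks:
        data = pending + chunk
        # final=False: an incomplete trailing sequence is not an error,
        # it is simply left unconsumed.
        text, consumed = codecs.utf_8_decode(data, errors, False)
        pieces.append(text)
        pending = data[consumed:]
    # final=True: anything still pending is now treated as truncated input.
    text, _ = codecs.utf_8_decode(pending, errors, True)
    pieces.append(text)
    return ''.join(pieces)

euro = '\u20ac'.encode('utf8')   # three bytes
# Split the character across two chunks; it is still decoded correctly.
print(ascii(decode_chunks([euro[:2], euro[2:]])))   # prints '\u20ac'
```

If the input ends after the first two bytes, the last call (with final set to True) raises UnicodeDecodeError, matching the session above. The same behaviour should be available through the documented interface, codecs.getincrementaldecoder('utf-8'), whose decode() method takes a final argument.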
If you look at encodings/utf_8.py, you see that the stateless decoder calls codecs.utf_8_decode() with final=True::
   def decode(input, errors='strict'):
       return codecs.utf_8_decode(input, errors, True)
so the stateless decoder *will* raise an exception for incomplete input. The incremental decoder simply passes on the final argument given to its decode() method. |
|