Message161054
| Author |
serhiy.storchaka |
| Recipients |
serhiy.storchaka |
| Date |
2012-05-18.14:46:35 |
| Message-id |
<1337352396.54.0.468921009595.issue14850@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
codecs.charmap_decode behaves differently depending on whether the decoding table is an exact str or an instance of a str subclass.
>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
...
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', S('\uFFFE'))))
('\ufffe', 1)
This happens because the charmap decoder (the function PyUnicode_DecodeCharmap in Objects/unicodeobject.c) uses different algorithms for exact strings and for all other objects.
Should this be fixed? If so, what should `codecs.charmap_decode(b'\x00', 'replace', {0:'\uFFFE'})` return? What should `codecs.charmap_decode(b'\x00', 'replace', {0:0xFFFE})` return? |
|
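To compare the cases raised in the message side by side, a minimal sketch along the following lines may help. It decodes the same byte with each of the four table kinds mentioned (exact str, str subclass, dict mapping to a str, dict mapping to an int). The names `tables` and `results` are illustrative only and not part of the report, and the printed values depend on the interpreter version, so no expected output is given.

```python
import codecs


class S(str):
    """A trivial str subclass, as in the report."""


# The four kinds of decoding table discussed in the message.
# These labels are illustrative, not taken from CPython.
tables = {
    "exact str": "\ufffe",
    "str subclass": S("\ufffe"),
    "dict -> str": {0: "\ufffe"},
    "dict -> int": {0: 0xFFFE},
}

# Decode the same single byte with each table, using the 'replace' handler.
results = {
    name: codecs.charmap_decode(b"\x00", "replace", table)
    for name, table in tables.items()
}

# Print each result in ASCII-safe form so the code points are visible.
for name, (text, consumed) in results.items():
    print(f"{name:13} -> {ascii(text)} (consumed {consumed})")
```

If all four lines print the same decoded character, the interpreter treats U+FFFE consistently across table types; any difference reproduces the inconsistency described in the message.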
History
| Date | User | Action | Args |
|---|---|---|---|
| 2012-05-18 14:46:36 | serhiy.storchaka | set | recipients: + serhiy.storchaka |
| 2012-05-18 14:46:36 | serhiy.storchaka | set | messageid: <1337352396.54.0.468921009595.issue14850@psf.upfronthosting.co.za> |
| 2012-05-18 14:46:35 | serhiy.storchaka | link | issue14850 messages |
| 2012-05-18 14:46:35 | serhiy.storchaka | create | |