Message138244
| Author |
vstinner |
| Recipients |
amaury.forgeotdarc, loewis, ocean-city, vstinner |
| Date |
2011-06-13 13:52:52 |
| Message-id |
<1307973176.03.0.353214642985.issue12281@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Patch version 3:
- add unit tests for code pages 932 and 1252, CP_UTF7 and CP_UTF8
- fix the encode/decode flags for CP_UTF7/CP_UTF8
- fix the encoding name reported in UnicodeDecodeError; also support the "CP_UTF7" and "CP_UTF8" code page names
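As a rough illustration of what such tests check, here is a minimal sketch. It assumes Python's portable cp932, cp1252 and utf-8 codecs as stand-ins; it does not call the Windows code page functions touched by the patch, and the names SAMPLES, check_round_trip and error_encoding_name are hypothetical.

```python
# Stand-ins only: these are Python's own codecs, not the Windows code page
# API that the patch modifies.
SAMPLES = [
    # (encoding, text, expected bytes)
    ("cp932", "\u30c6\u30b9\u30c8", b"\x83\x65\x83\x58\x83\x67"),  # katakana "tesuto"
    ("cp1252", "\xe9\u20ac", b"\xe9\x80"),                          # e-acute, euro sign
    ("utf-8", "\xe9\u20ac", b"\xc3\xa9\xe2\x82\xac"),
]


def check_round_trip(encoding, text, raw):
    """Return True if text and raw map to each other in both directions."""
    return text.encode(encoding) == raw and raw.decode(encoding) == text


def error_encoding_name(raw, encoding):
    """Return the encoding name carried by UnicodeDecodeError, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.encoding
    return None


results = {enc: check_round_trip(enc, text, raw) for enc, text, raw in SAMPLES}
```

The second helper mirrors the "encoding name" fix: a test can assert that the exception names the codec that was actually requested.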
TODO:
- The decoder (when handling errors) doesn't support multibyte characters, e.g. b"\xC3\xA9\xFF" is not decoded correctly using "replace" (insize is fixed to 1)
- The encoder doesn't support surrogate pairs, but the result with UTF-8 looks correct
- The UTF-7 decoder is not strict, e.g. b'[+/]' is decoded to '[]' in strict mode
- The UTF-8 encoder is not strict, e.g. it replaces surrogates with U+FFFD
- Use final in decode_mbcs_errors(): a multibyte character may be split between two chunks of INT_MAX bytes
- Implement the optimizations Martin suggested? |
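For comparison, the behaviour these TODO items aim for can be sketched with Python's built-in UTF-8 codec. This is only a reference point under that assumption, not the code page codec itself; decode_in_chunks and encodes_strictly are hypothetical helper names.

```python
import codecs

# 1. A valid two-byte sequence followed by an invalid byte: "replace" should
#    keep the multibyte character and substitute only the bad byte.
decoded = b"\xC3\xA9\xFF".decode("utf-8", "replace")  # e-acute + U+FFFD


# 2. A multibyte character split across two chunks: the decoder must buffer
#    the incomplete tail until the last chunk is flagged as final.
def decode_in_chunks(chunks, encoding="utf-8", errors="strict"):
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    parts = []
    for index, chunk in enumerate(chunks):
        final = index == len(chunks) - 1
        parts.append(decoder.decode(chunk, final))
    return "".join(parts)


split_result = decode_in_chunks([b"abc\xc3", b"\xa9def"])


# 3. A strict encoder rejects a lone surrogate instead of replacing it.
def encodes_strictly(text, encoding="utf-8"):
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```

With final set only on the last chunk, a truncated sequence at the very end of the input still raises in strict mode, while a sequence that merely straddles a chunk boundary decodes normally.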
|
History
|
|---|
| Date |
User |
Action |
Args |
| 2011-06-13 13:52:56 | vstinner | set | recipients:
+ vstinner, loewis, amaury.forgeotdarc, ocean-city |
| 2011-06-13 13:52:56 | vstinner | set | messageid: <1307973176.03.0.353214642985.issue12281@psf.upfronthosting.co.za> |
| 2011-06-13 13:52:55 | vstinner | link | issue12281 messages |
| 2011-06-13 13:52:55 | vstinner | create |
|