Message135772
| Author |
vstinner |
| Recipients |
belopolsky, ezio.melotti, georg.brandl, lemburg, moese, phr, vstinner |
| Date |
2011-05-11.12:55:37 |
| SpamBayes Score |
0.0006877316 |
| Marked as misclassified |
No |
| Message-id |
<1305118551.23.0.106349254941.issue2857@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
utf_8_java.patch: Implement "utf-8-java" encoding.
* It has no alias
* 'a\0b'.encode('utf-8-java') returns b'a\xc0\x80b'
* b'a\xc0\x80b'.decode('utf-8-java') returns 'a\x00b'
* I added some tests to the utf-8 codec (test_invalid, test_null_byte)
* I added many tests for the utf-8-java codec
* I chose to copy utf8_code_length as utf8java_code_length instead of adding extra "if" tests, to avoid slowing down the UTF-8 codec
* Decoder: 2-byte sequences may be *a little bit* slower for UTF-8:
"if ((s[1] & 0xc0) != 0x80)"
is replaced by
"if ((ch <= 0x007F && (ch != 0x0000 || !java)) || ch > 0x07FF)"
* Encoder: encoding characters in U+0000-U+007F may be *a little bit* slower for UTF-8: I added a (ch == 0x00 && java) test
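To make the behaviour listed above concrete, here is a rough pure-Python sketch. It is not the patch itself (the patch is C code in the codec) and it only covers the U+0000 handling described in this message. The names encode_utf8_java, decode_utf8_java and two_byte_is_invalid are hypothetical helpers made up for illustration.

```python
def encode_utf8_java(text: str) -> bytes:
    """Hypothetical helper: UTF-8, except U+0000 becomes the overlong b'\\xc0\\x80'."""
    # In valid UTF-8 the byte 0x00 only ever encodes U+0000,
    # so a byte-level replace cannot touch other characters.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_utf8_java(data: bytes) -> str:
    """Hypothetical helper: accept b'\\xc0\\x80' as U+0000, otherwise strict UTF-8."""
    # 0xC0 never appears in valid UTF-8, so b'\xc0\x80' can only be
    # the overlong null sequence.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


def two_byte_is_invalid(ch: int, java: bool) -> bool:
    """Python transcription of the decoder condition quoted above.

    ch is the code point decoded from a 2-byte sequence; java selects
    the utf-8-java behaviour.
    """
    return (ch <= 0x007F and (ch != 0x0000 or not java)) or ch > 0x07FF


print(encode_utf8_java("a\0b"))
print(repr(decode_utf8_java(b"a\xc0\x80b")))
```

With java set, the only overlong 2-byte sequence that passes the check is the one decoding to U+0000; every other code point at or below U+007F is still rejected, as in plain UTF-8.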
For the doc, I just added a "utf-8-java" line to the codec list, but I did not add a paragraph explaining how this codec differs from utf-8. Does anyone have a suggestion? |
|