Message102209
| Author |
ezio.melotti |
| Recipients |
dangra, ezio.melotti, lemburg, sjmachin |
| Date |
2010-04-02 22:27:14 |
| Message-id |
<1270247238.98.0.791005996157.issue8271@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Here's a new patch. It should be complete, but I want to test it some more before committing.
I decided to follow RFC 3629, putting 0 instead of 5/6 for bytes in the range F5-FD (we can always put them back in the unlikely case that the Unicode Consortium changes its mind) and also for other invalid ranges (e.g. C0-C1). This led to some simplification in the code.
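To illustrate the idea of storing 0 for invalid start bytes, here is a minimal Python sketch of a start-byte length table that follows RFC 3629. It is not the code from the patch: the names `utf8_sequence_length` and `UTF8_LENGTH_TABLE` are hypothetical and only show how a 0 entry can mark a byte that can never begin a valid sequence.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the expected UTF-8 sequence length for a start byte.

    A result of 0 means the byte cannot start a valid sequence
    under RFC 3629 (hypothetical helper, for illustration only).
    """
    if not 0 <= lead_byte <= 0xFF:
        raise ValueError("lead_byte must be in range 0-255")
    if lead_byte <= 0x7F:
        return 1  # ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4
    # 80-BF are continuation bytes, C0-C1 could only start overlong
    # encodings, and F5-FF would encode values above U+10FFFF.
    return 0


# Full 256-entry lookup table built from the function above.
UTF8_LENGTH_TABLE = [utf8_sequence_length(b) for b in range(256)]
```

With a table like this, a decoder can treat every 0 entry the same way (report an invalid start byte) instead of handling the 5- and 6-byte forms as special cases.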
I also found out that, according to RFC 3629, surrogates are considered invalid and can't be encoded or decoded, but the UTF-8 codec actually does it. I included tests and a fix, but I left them commented out because this is outside the scope of this patch and probably needs a discussion on python-dev. |
|
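The surrogate question raised in the message can be checked with a short sketch like the one below. It assumes a current Python 3 interpreter, where the strict UTF-8 codec rejects surrogates and the `surrogatepass` error handler opts back in to the permissive behavior; the interpreter under discussion in the message behaved differently. The helper names are hypothetical.

```python
LONE_SURROGATE = "\ud800"          # U+D800, a high surrogate on its own
SURROGATE_BYTES = b"\xed\xa0\x80"  # the three bytes a permissive encoder emits for U+D800


def strict_encode_rejects(text: str) -> bool:
    """Return True if the strict UTF-8 codec refuses to encode text."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def strict_decode_rejects(data: bytes) -> bool:
    """Return True if the strict UTF-8 codec refuses to decode data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# The permissive round trip is still available on request.
permissive = LONE_SURROGATE.encode("utf-8", "surrogatepass")
```

Under these assumptions both helpers return True for the surrogate inputs, while `permissive` holds the same three bytes as `SURROGATE_BYTES`.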