| Author | Rhamphoryncus |
|---|---|
| Recipients | Rhamphoryncus, ezio.melotti, lemburg |
| Date | 2008-07-12.19:03:49 |
| SpamBayes Score | 0.012099987 |
| Marked as misclassified | No |
| Message-id | <1215889432.77.0.572868807141.issue3297@psf.upfronthosting.co.za> |
| In-reply-to | |
| Content | |
|---|---|

Marc, perhaps Unicode has refined their definitions since you last looked? Valid UTF-8 *cannot* contain surrogates[1]. If it does, you have CESU-8[2][3], not UTF-8.

So there are two bugs: first, the UTF-8 codec should refuse to load surrogates. Second, since the original bug showed up before the .pyc is created, something in the parse/compilation/whatever stage is producing CESU-8.

[1] 4th bullet point of D92 in http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
[2] http://unicode.org/reports/tr26/
[3] http://en.wikipedia.org/wiki/CESU-8

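
To make the UTF-8 vs. CESU-8 distinction concrete, here is a minimal sketch. It assumes a current Python 3 interpreter, whose strict UTF-8 codec rejects surrogates; that is not the behaviour of the codec discussed in this issue. The helper names `cesu8_encode` and `is_strict_utf8` are made up for illustration and are not part of any library.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text the CESU-8 way.

    Code points above U+FFFF are split into a UTF-16 surrogate pair,
    and each surrogate is then written as its own 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            for unit in (high, low):
                # "surrogatepass" lets the codec emit the 3-byte form
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def is_strict_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ch = "\U00010000"
utf8_bytes = ch.encode("utf-8")    # 4 bytes: F0 90 80 80
cesu8_bytes = cesu8_encode(ch)     # 6 bytes: ED A0 80 ED B0 80
print(is_strict_utf8(utf8_bytes), is_strict_utf8(cesu8_bytes))
```

Under that assumption, the six-byte form is refused by the strict decoder, which is the "refuse to load surrogates" behaviour asked for above.
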
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2008-07-12 19:03:53 | Rhamphoryncus | set | spambayes_score: 0.0121 -> 0.012099987 recipients: + Rhamphoryncus, lemburg, ezio.melotti |
| 2008-07-12 19:03:52 | Rhamphoryncus | set | spambayes_score: 0.0121 -> 0.0121 messageid: <1215889432.77.0.572868807141.issue3297@psf.upfronthosting.co.za> |
| 2008-07-12 19:03:50 | Rhamphoryncus | link | issue3297 messages |
| 2008-07-12 19:03:49 | Rhamphoryncus | create | |