Message141938
Author: tchrist
Recipients: belopolsky, ezio.melotti, georg.brandl, lemburg, moese, phr, tchrist, vstinner
Date: 2011-08-12 02:41:15
Message-id: <1313116876.94.0.050147310014.issue2857@psf.upfronthosting.co.za>
Please do not call this "utf-8-java". It is called "cesu-8" per UTR #26 at:
http://unicode.org/reports/tr26/
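CESU-8 is *not* a valid Unicode Transformation Format and should not be called UTF-8. It is a real pain in the butt, caused by people who misunderstood Unicode and mis-encoded UCS-2 into UTF-8, screwing it up: each half of a UTF-16 surrogate pair gets its own 3-byte sequence instead of the single 4-byte sequence that UTF-8 requires for a supplementary character. I understand the need to be able to read it, but call it what it is, please.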
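To make the difference concrete, here is a minimal Python sketch of what reading and writing CESU-8 involves. The helpers `cesu8_encode` and `cesu8_decode` are hypothetical names for illustration only, not a codec in Python's standard library. The sketch assumes well-formed input and leans on the stdlib `surrogatepass` error handler to emit and accept the 3-byte surrogate sequences that strict UTF-8 forbids. It also ignores Java's "modified UTF-8" habit of writing U+0000 as `C0 80`, which plain CESU-8 does not do.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (hypothetical helper, not a stdlib codec)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Supplementary character: split it into a UTF-16 surrogate pair...
            cp -= 0x10000
            high = chr(0xD800 + (cp >> 10))
            low = chr(0xDC00 + (cp & 0x3FF))
            # ...and write each surrogate as its own 3-byte sequence.
            out += (high + low).encode("utf-8", "surrogatepass")
        else:
            # BMP characters come out byte-for-byte identical to UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


def cesu8_decode(data: bytes) -> str:
    """Decode CESU-8 bytes (hypothetical helper, not a stdlib codec)."""
    # Accept the 3-byte surrogate sequences as lone surrogate code points.
    raw = data.decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 so adjacent surrogates recombine into one
    # supplementary character; an unpaired surrogate raises UnicodeDecodeError.
    return raw.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    s = "\U0001F600"  # U+1F600, a character outside the BMP
    print(s.encode("utf-8").hex(" "))  # f0 9f 98 80        (UTF-8: 4 bytes)
    print(cesu8_encode(s).hex(" "))    # ed a0 bd ed b8 80  (CESU-8: 6 bytes)
    assert cesu8_decode(cesu8_encode(s)) == s
```

Python's strict `utf-8` codec raises `UnicodeDecodeError` on those six CESU-8 bytes, which is the practical reason the two encodings should not share a name.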
Despite the talk about Lucene, I note that the Perl port of Lucene uses real UTF-8, not CESU-8.