Message143310
| Author | ezio.melotti |
| Recipients | akuchling, belopolsky, eric.araujo, ezio.melotti, georg.brandl, rhettinger, terry.reedy |
| Date | 2011-09-01.08:04:08 |
| Message-id | <1314864252.85.0.599694103576.issue4153@psf.upfronthosting.co.za> |
After the recent discussions on python-dev, I went through the Unicode HOWTO and fixed a few things. Then I found this issue, so I'm attaching the patch here.
The patch mostly addresses markup issues, but it also removes uses of the term 'byte string'.
A few more things that should be done:
* clarify some more terms (e.g. code points, code units, characters, and possibly scalar values);
* mention the differences between narrow and wide builds, including:
- a discussion about the UCS-2/UTF-16 implementation of narrow builds;
- something about surrogates and surrogate pairs;
- effects of slicing and indexing on narrow builds;
- functions/methods that (don't) accept non-BMP chars on narrow builds;
* something about Unicode support in the re module (this can probably wait until after 'regex' is included).
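As an illustration of the code point / code unit distinction and the surrogate-pair behavior of narrow builds listed above, here is a minimal sketch. It assumes Python 3.3 or later, where PEP 393 removed the narrow/wide split and every str is indexed by code point, so it only *simulates* what a narrow (UTF-16) build would store. The helper names `utf16_code_units` and `surrogate_pair` are hypothetical, chosen for this example; they are not part of any stdlib API.

```python
def utf16_code_units(s):
    """Return the 16-bit code units a narrow (UTF-16) build would store for s.

    Hypothetical helper for illustration only.
    """
    data = s.encode("utf-16-le")  # little-endian, no BOM
    # Every 2 bytes of the encoded data form one code unit.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def surrogate_pair(cp):
    """Split a non-BMP code point (U+10000..U+10FFFF) into (high, low) surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: %#x" % cp)
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


s = "a\U0001D11E"  # 'a' + U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP)
print(len(s))                                     # 2 code points
print([hex(u) for u in utf16_code_units(s)])      # ['0x61', '0xd834', '0xdd1e']
print([hex(u) for u in surrogate_pair(0x1D11E)])  # ['0xd834', '0xdd1e']
```

On an actual narrow build, `len(s)` would report 3 for the same string and `s[1]` would return the lone high surrogate `'\ud834'`, which is how indexing and slicing can split a surrogate pair.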
Also, the codecs documentation has a section about Unicode and encodings that might be moved to the HOWTO.