Message93274
| Author |
lemburg |
| Recipients |
christoph, ezio.melotti, gvanrossum, lemburg, markon, nickd, nnorwitz, pitrou, r.david.murray, rhettinger, twb |
| Date |
2009-09-29.10:40:54 |
| Message-id |
<4AC1E435.9030908@egenix.com> |
| In-reply-to |
<1254219647.05.0.244296326279.issue7008@psf.upfronthosting.co.za> |
| Content |
Christoph Burgmer wrote:
>
> Christoph Burgmer <cburgmer@ira.uka.de> added the comment:
>
> I admit I don't fully understand the semantics of capwords().
string.capwords() is an old function from the days before Unicode.
The function is basically defined by its implementation.
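
To illustrate "defined by its implementation", here is a minimal sketch of what capwords() amounts to: split, capitalize each piece, and join again. The name capwords_sketch is hypothetical, and the actual stdlib code may differ in detail between versions.

```python
import string


def capwords_sketch(s, sep=None):
    """Hypothetical re-statement of string.capwords(), for illustration only."""
    # With sep=None, split() drops leading/trailing whitespace and
    # collapses runs of whitespace, so the result is joined by single spaces.
    pieces = s.split(sep)
    # str.capitalize() uppercases the first character and lowercases the rest.
    return (sep or " ").join(piece.capitalize() for piece in pieces)


if __name__ == "__main__":
    for sample in ["hello   wORLD", "  leading and trailing  "]:
        print(repr(capwords_sketch(sample)), repr(string.capwords(sample)))
```

Note that "words" here are just whatever str.split() returns, with no linguistic notion of a word involved.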
> But from
> what I believe what it should do, this function could be happily
> replaced by the word-breaking algorithm as defined in
> http://www.unicode.org/reports/tr29/.
>
> This algorithm should be implemented anyway, to properly solve
> issue6412.
Simple word breaking would be nice to have in Python as a new
Unicode method, e.g. .splitwords().
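
As a rough idea of what "simple" could mean, here is a naive sketch built on the re module. The function name splitwords_naive is hypothetical, and this is not the TR29 algorithm: it only collects runs of word characters.

```python
import re

# In Python 3, \w matches Unicode word characters for str patterns.
_WORD_RE = re.compile(r"\w+")


def splitwords_naive(text):
    """Hypothetical helper: return runs of word characters, nothing more."""
    return _WORD_RE.findall(text)


if __name__ == "__main__":
    print(splitwords_naive("Hello, wörld"))
    # The apostrophe is treated as a separator here, which is one of the
    # cases a real word-boundary algorithm has to handle differently.
    print(splitwords_naive("can't"))
```
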
Note, however, that word boundaries are just as complicated as casing:
there are lots of special cases in different languages or locales
(see the notes after the word boundary rules in TR29). |
|
History
|
|---|
| Date |
User |
Action |
Args |
| 2009-09-29 10:40:56 | lemburg | set | recipients:
+ lemburg, gvanrossum, nnorwitz, rhettinger, pitrou, christoph, ezio.melotti, r.david.murray, markon, twb, nickd |
| 2009-09-29 10:40:54 | lemburg | link | issue7008 messages |
| 2009-09-29 10:40:54 | lemburg | create |
|