Message298322
| Author |
Guillaume Sanchez |
| Recipients |
Arfrever, Guillaume Sanchez, Nicholas.Cole, benjamin.peterson, eric.araujo, ezio.melotti, inigoserna, lemburg, loewis, poq, r.david.murray, serhiy.storchaka, tchrist, terry.reedy, vstinner, zeha |
| Date |
2017-07-13.23:47:06 |
| Message-id |
<1499989626.46.0.535744055477.issue12568@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Hello,
I'm coming here from bugs.python.org/issue30717. I have a PR pending review (https://github.com/python/cpython/pull/2673) that adds a function to break Unicode strings into grapheme clusters (i.e. what one would intuitively call "a character"). It is based on the grapheme cluster breaking algorithm from Unicode TR29 (UAX #29, Text Segmentation).
Let me know if this is of any relevance.
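For readers who want to experiment without building the PR branch, here is a rough sketch of the idea in pure Python. It is not the PR's implementation and not the full TR29 algorithm: it only attaches characters with a nonzero canonical combining class to the preceding character, so it ignores Hangul syllables, emoji sequences, CRLF and the other TR29 rules. The name split_combining_sequences is hypothetical and chosen only for this illustration.

```python
import unicodedata


def split_combining_sequences(text):
    """Group each base character with the combining marks that follow it.

    Simplified illustration only (hypothetical helper, not part of the
    stdlib or of the PR): it looks at the canonical combining class and
    nothing else, so it covers just a small subset of the TR29 rules.
    """
    clusters = []
    for ch in text:
        # A nonzero combining class means the character is a combining
        # mark, so it is appended to the cluster currently being built.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters


print(split_combining_sequences("lo\u0309\u0301l"))
```

For strings made of base letters plus combining accents, such as the ones in the demo below, this groups the characters the same way. Anything beyond that needs the real break rules and the grapheme break property data.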
Quick demo:
>>> import unicodedata
>>> a = unicodedata.break_graphemes("lol")
>>> list(a)
['l', 'o', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309l"))
['l', 'ỏ', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309\u0301l"))
['l', 'ỏ́', 'l']
>>> list(unicodedata.break_graphemes("lo\u0301l"))
['l', 'ó', 'l']
>>> list(unicodedata.break_graphemes(""))
[] |
|