
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author Guillaume Sanchez
Recipients Arfrever, Guillaume Sanchez, Nicholas.Cole, benjamin.peterson, eric.araujo, ezio.melotti, inigoserna, lemburg, loewis, poq, r.david.murray, serhiy.storchaka, tchrist, terry.reedy, vstinner, zeha
Date 2017-07-13.23:47:06
Message-id <1499989626.46.0.535744055477.issue12568@psf.upfronthosting.co.za>
In-reply-to
Content
Hello,
I come from bugs.python.org/issue30717. I have a PR pending review (https://github.com/python/cpython/pull/2673) that adds a function for breaking Unicode strings into grapheme clusters (i.e. what one would intuitively call "a character"). It is based on the grapheme cluster breaking algorithm from TR29.
Let me know if this is of any relevance.
Quick demo:
>>> import unicodedata
>>> a = unicodedata.break_graphemes("lol")
>>> list(a)
['l', 'o', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309l"))
['l', 'ỏ', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309\u0301l"))
['l', 'ỏ́', 'l']
>>> list(unicodedata.break_graphemes("lo\u0301l"))
['l', 'ó', 'l']
>>> list(unicodedata.break_graphemes(""))
[]
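For readers without the PR applied, the behaviour in the demo can be roughly approximated in pure Python. This is only a sketch under a strong simplifying assumption: it attaches characters with a non-zero canonical combining class to the preceding character, which covers the cases above but is not the full TR29 algorithm (it ignores Hangul syllables, emoji sequences, CR LF, and so on) and is not the implementation in the PR. The name simple_grapheme_clusters is hypothetical.

```python
import unicodedata


def simple_grapheme_clusters(text):
    """Yield approximate grapheme clusters from text.

    Simplification (assumption, not TR29): a character whose canonical
    combining class is non-zero is glued to the cluster before it.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) != 0:
            # Combining mark: extend the current cluster.
            cluster += ch
        else:
            # Base character (or a leading mark): start a new cluster.
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster


print(list(simple_grapheme_clusters("lo\u0309\u0301l")))
```

The generator form mirrors the demo, where the result is wrapped in list() before being displayed.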
History
Date User Action Args
2017-07-13 23:47:06 Guillaume Sanchez set recipients: + Guillaume Sanchez, lemburg, loewis, terry.reedy, vstinner, benjamin.peterson, ezio.melotti, eric.araujo, Arfrever, r.david.murray, inigoserna, zeha, poq, Nicholas.Cole, tchrist, serhiy.storchaka
2017-07-13 23:47:06 Guillaume Sanchez set messageid: <1499989626.46.0.535744055477.issue12568@psf.upfronthosting.co.za>
2017-07-13 23:47:06 Guillaume Sanchez link issue12568 messages
2017-07-13 23:47:06 Guillaume Sanchez create
