This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Request for grapheme support in Python re lib
Type: enhancement Stage:
Components: Regular Expressions Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, Socob, ezio.melotti, gvanrossum, mcepl, mrabarnett, tchrist
Priority: normal Keywords:

Created on 2011-08-11 19:59 by tchrist, last changed 2022-04-11 14:57 by admin.

Messages (3)
msg141924 - (view) Author: Tom Christiansen (tchrist) Date: 2011-08-11 19:59
Without proper grapheme support in the regular expression library, it is impossible to process Unicode correctly. At the very least, one needs support for the \X escape, which matches an extended grapheme cluster per UTS#18. This escape is supported by many regex libraries, including Perl's own and of course PCRE (and thence PHP), the standard ICU library, and Matthew Barnett's replacement regex library for Python.
How do you process a string by graphemes if you cannot split on \X? How can you avoid splitting a grapheme into silly pieces if you cannot match one? How do I match the letter O no matter what diacritics have been applied to it otherwise? A match of (?=O)\X against an NFD string is by far the simplest and best way.
This is necessary for a wide variety of reasons. Adding \pM and \PM goes a little way, but not far enough, because that is not how grapheme clusters are defined. You need a proper \X.
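To illustrate the point about \pM, here is a minimal sketch that uses only the standard library's unicodedata module. It is not \X: it merely glues combining marks (general category M) onto the preceding code point, which is roughly what a \PM\pM* pattern would match. The helper name approx_clusters and the sample string are made up for illustration. It handles the "letter O with any diacritics" case on NFD text, but it should not be expected to handle the other rules that extended grapheme clusters involve (Hangul syllable sequences, CRLF, and so on), which is why a proper \X is being requested.

```python
import unicodedata


def approx_clusters(text):
    """Group each code point with the combining marks that follow it.

    Hypothetical helper: this approximates grapheme clusters as
    "one non-mark followed by zero or more marks". It is NOT a full
    implementation of extended grapheme clusters.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me all start with "M".
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Decompose first so that precomposed letters expose their base letter.
sample = unicodedata.normalize("NFD", "\u00d6ko O\u0302\u0303!")
clusters = approx_clusters(sample)

# Rough stand-in for matching (?=O)\X: clusters whose base letter is "O".
o_clusters = [c for c in clusters if c.startswith("O")]
```

With this sample, the two clusters built on a capital O are kept together with their diacritics, while the lowercase "o" is a separate cluster and is not selected.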
msg142114 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-08-15 10:45
As I said on #12734 and #12731, if the 'regex' module addresses this issue, we should just wait until we include it in the stdlib.
msg143041 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2011-08-26 21:23
Again, I would be disappointed if the re (_sre) module could not be fixed. It is a reasonable feature request.
History
Date User Action Args
2022-04-11 14:57:20 admin set github: 56942
2018-09-09 17:47:14 mcepl set nosy: + mcepl
2017-07-24 02:20:12 Socob set nosy: + Socob
2013-07-10 19:09:54 terry.reedy set versions: + Python 3.4, - Python 3.3
2011-08-26 21:23:45 gvanrossum set nosy: + gvanrossum; messages: + msg143041
2011-08-15 10:45:43 ezio.melotti set messages: + msg142114
2011-08-13 00:57:51 mrabarnett set nosy: + mrabarnett
2011-08-12 18:05:36 eric.araujo set versions: + Python 3.3, - Python 3.2
2011-08-12 18:03:52 Arfrever set nosy: + Arfrever
2011-08-12 00:18:31 ezio.melotti set nosy: + ezio.melotti
2011-08-11 19:59:53 tchrist create