Grapheme clusters, a.k.a.real characters

Marko Rauhamaa marko at pacujo.net
Tue Jul 18 13:01:10 EDT 2017


Chris Angelico <rosuav at gmail.com>:
> what you're more likely to want is "match the letter á", and you don't
> care whether it's represented as U+0061 U+0301 or as U+00E1. That's
> where Unicode normalization comes in.

Yes. Also, not every letter can be normalized to a single codepoint so
NFC is not a way out. For example,
 re.match("^[q̈]$", "q̈")
returns None regardless of normalization.
Marko


More information about the Python-list mailing list

AltStyle によって変換されたページ (->オリジナル) /