python3, regular expression and bytes text

MRAB python at mrabarnett.plus.com
Sat Oct 12 16:15:26 EDT 2019


On 2019-10-12 20:48, Serhiy Storchaka wrote:
> 12.10.19 21:08, Eko palypse wrote:
>> So how can I make it work with utf8 encoded text?
>
> You cannot. First, \w in re.LOCALE works only when the text is encoded
> with the locale encoding (cp1252 in your case). Second, re.LOCALE
> supports only 8-bit charsets. So even if you set the utf-8 locale, it
> would not help.
>
> Regular expressions with re.LOCALE are slow. It may be more efficient to
> decode the text and use a Unicode regular expression.
>
+1
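As an illustration of Serhiy's point, here is a minimal sketch (the sample string is invented for this example) comparing UTF-8 bytes matched with a bytes pattern against the same text decoded first. Without re.LOCALE, `\w` in a bytes pattern matches only ASCII word characters, so each multi-byte UTF-8 letter splits a word apart; after decoding, a str pattern's `\w` is Unicode-aware by default.

```python
import re

# UTF-8 text arriving as raw bytes (e.g. read from a file opened in "rb" mode).
data = "naïve café Straße".encode("utf-8")

# A bytes pattern works byte by byte: without re.LOCALE, \w matches only
# ASCII [a-zA-Z0-9_], so the multi-byte letters ï, é and ß break the words.
byte_words = re.findall(rb"\w+", data)

# Decoding first lets a str pattern use the default Unicode-aware \w.
text = data.decode("utf-8")
str_words = re.findall(r"\w+", text)

print(byte_words)  # [b'na', b've', b'caf', b'Stra', b'e']
print(str_words)   # ['naïve', 'café', 'Straße']
```

The locale-dependent variant (`re.LOCALE` on a bytes pattern) is deliberately left out, since its result depends on the machine's current locale.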
It's best to treat re.LOCALE as meant for legacy encodings that use (or 
used) 8 bits per character. Wherever possible, decode to Unicode and 
work with that instead.
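A sketch of that advice for legacy 8-bit data, assuming the encoding is known in advance (cp1252 here, matching the original poster's setup). `find_words` is a hypothetical helper, not part of the `re` module. The last part checks that Python 3.6 and later reject `re.LOCALE` on a str pattern, so once the text is decoded, the default Unicode matching is the way to go.

```python
import re


def find_words(data, encoding):
    """Hypothetical helper: decode bytes, then match with a str pattern.

    `encoding` must be the encoding the bytes were actually written in;
    the caller is assumed to know it (e.g. "cp1252" for legacy data).
    """
    text = data.decode(encoding)
    # For str patterns, \w is Unicode-aware by default; no re.LOCALE needed.
    return re.findall(r"\w+", text)


# Legacy 8-bit data: in cp1252 every character is exactly one byte.
legacy = "café au lait".encode("cp1252")
print(find_words(legacy, "cp1252"))  # ['café', 'au', 'lait']

# The same helper handles UTF-8 input once the right encoding is passed.
print(find_words("Straße".encode("utf-8"), "utf-8"))  # ['Straße']

# Since Python 3.6, re.LOCALE is accepted only with bytes patterns.
try:
    re.compile(r"\w+", re.LOCALE)
    locale_rejected = False
except ValueError:
    locale_rejected = True
print(locale_rejected)  # True
```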


More information about the Python-list mailing list
