Message 68905 - Python tracker



Author	pitrou
Recipients	gvanrossum, humitos, pitrou, sven.siegmund
Date	2008-06-28.20:27:23
Message-id	<1214684844.73.0.533760513068.issue2834@psf.upfronthosting.co.za>

Content

Uh, actually, it works if you specify re.UNICODE. If you don't, the
getlower() function in _sre.c falls back to the plain ASCII algorithm.
>>> pat = re.compile('Á', re.IGNORECASE | re.UNICODE)
>>> pat.match('á')
<_sre.SRE_Match object at 0xb7c66c28>
>>> pat.match('Á')
<_sre.SRE_Match object at 0xb7c66cd0>
I wonder if re.UNICODE shouldn't be the default in Py3k, at least when
the pattern is a string and not a bytes object. There may also be a
re.ASCII flag for those cases where people want to fall back to the old
behaviour.
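
For comparison, here is a minimal sketch of the same check, assuming a
Python 3 interpreter, where str patterns use Unicode case folding by
default and an re.ASCII flag is available. The helper name matches() is
hypothetical and only there to keep the example short.

```python
import re

# Assumption: Python 3, where str patterns default to Unicode matching.
def matches(pattern, text, flags=0):
    """Hypothetical helper: True if `pattern` matches at the start of `text`."""
    return re.compile(pattern, flags).match(text) is not None

upper, lower = "\u00c1", "\u00e1"  # 'Á' and 'á'

# Case-insensitive match with the default (Unicode) semantics for str patterns.
unicode_default = matches(upper, lower, re.IGNORECASE)

# Passing re.UNICODE explicitly, as in the session above.
explicit_unicode = matches(upper, lower, re.IGNORECASE | re.UNICODE)

# re.ASCII restricts case folding to ASCII letters, so 'Á' no longer matches 'á'.
ascii_only = matches(upper, lower, re.IGNORECASE | re.ASCII)

# Plain ASCII letters still fold under re.ASCII.
ascii_letters = matches("A", "a", re.IGNORECASE | re.ASCII)

print(unicode_default, explicit_unicode, ascii_only, ascii_letters)
```

The re.ASCII case mirrors the "plain ASCII algorithm" fallback described
above: only the letters A-Z are folded when ignoring case.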

History
Date	User	Action	Args
2008-06-28 20:27:24	pitrou	set	spambayes_score: 0.00419008 -> 0.004190082 recipients: + pitrou, gvanrossum, humitos, sven.siegmund
2008-06-28 20:27:24	pitrou	set	spambayes_score: 0.00419008 -> 0.00419008 messageid: <1214684844.73.0.533760513068.issue2834@psf.upfronthosting.co.za>
2008-06-28 20:27:24	pitrou	link	issue2834 messages
2008-06-28 20:27:23	pitrou	create
