
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: re.IGNORECASE does not match literal "_" (underscore)
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: re.sub confusion between count and flags args (#11957)
Assigned To:
Nosy List: RobM, effbot, ezio.melotti, mrabarnett, pitrou
Priority: normal
Keywords:

Created on 2011-04-28 17:02 by RobM, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg134700 - Author: Robert Meerman (RobM) Date: 2011-04-28 17:02
Regular expressions that are written to match literal underscores ("_", ASCII
ordinal 95) and specify `re.IGNORECASE` during compilation do not consistently
match underscores: it seems some occurrences are matched, but others are not.
The following session log shows the problem:
 Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
 [GCC 4.4.3] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import re
 >>> subject = "[Conclave-Mendoi]_ef_-_a_tale_of_memories_00-12_H264"
 >>> print subject.encode("base64") # In case my environment encoding is to blame
 W0NvbmNsYXZlLU1lbmRvaV1fZWZfLV9hX3RhbGVfb2ZfbWVtb3JpZXNfMDAtMTJfSDI2NA==
 >>> re.sub("_", "X", subject) # No flags, does what I expect
 '[Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264'
 >>> 
 >>> re.sub("_", "X", subject, re.IGNORECASE) # Misses some matches
 '[Conclave-Mendoi]XefX-_a_tale_of_memories_00-12_H264'
 >>> 
 >>> re.sub("_", "X", subject, re.IGNORECASE | re.LOCALE) # Misses fewer matches
 '[Conclave-Mendoi]XefX-XaXtaleXofXmemories_00-12_H264'
 >>> 
 >>> re.sub("_", "X", subject, re.IGNORECASE | re.LOCALE | re.UNICODE) # Works OK
 '[Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264'
 >>> 
 >>> re.sub("_", "X", subject, re.IGNORECASE | re.UNICODE) # Works OK
 '[Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264'
 >>> 
 >>> type(subject) # Don't think this is a unicode string
 <type 'str'>
 >>> 
Since my `subject` variable is of type `str` and contains only ASCII characters,
I do not believe that the `re.UNICODE` flag should be required.
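For comparison, a minimal sketch in Python 3 (an assumption: the report itself uses Python 2.6) of two ways to apply the flag so that it cannot be mistaken for a count. Passing `flags=` by keyword relies on the `flags` parameter that `re.sub` gained in Python 2.7; on 2.6, compiling the pattern first is the way to combine a flag with substitution.

```python
import re

subject = "[Conclave-Mendoi]_ef_-_a_tale_of_memories_00-12_H264"

# Pass the flag by keyword so it cannot land in the count slot.
by_keyword = re.sub("_", "X", subject, flags=re.IGNORECASE)

# Or compile first: re.compile takes flags as its second argument,
# and Pattern.sub(repl, string, count=0) has no flags parameter at all.
pattern = re.compile("_", re.IGNORECASE)
by_pattern = pattern.sub("X", subject)

print(by_keyword)
print(by_keyword == by_pattern)
```

Both forms replace all eight underscores without `re.UNICODE`; for a `str` pattern in Python 3, Unicode matching is already the default.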
msg134716 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2011-04-28 19:54
help(re.sub) says:
 sub(pattern, repl, string, count=0)
and re.IGNORECASE has a value of 2.
Therefore this:
 re.sub("_", "X", subject, re.IGNORECASE)
is telling it to replace at most 2 occurrences of "_".
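The same reading explains every line of the session log: each flag combination is just an integer, and `re.sub` used it as the maximum number of replacements. A minimal Python 3 sketch using the flag values `re.IGNORECASE == 2`, `re.LOCALE == 4` and `re.UNICODE == 32`; it passes each value as `count=` explicitly, because Python 3.13 deprecates passing `count` positionally.

```python
import re

subject = "[Conclave-Mendoi]_ef_-_a_tale_of_memories_00-12_H264"
total = subject.count("_")  # 8 underscores in the subject

# The four "flags" expressions from the session log, read as plain integers.
combos = [
    re.IGNORECASE,                           # 2
    re.IGNORECASE | re.LOCALE,               # 2 | 4 = 6
    re.IGNORECASE | re.LOCALE | re.UNICODE,  # 2 | 4 | 32 = 38
    re.IGNORECASE | re.UNICODE,              # 2 | 32 = 34
]

results = {}
for flags in combos:
    n = int(flags)
    # Passing the value as count reproduces what the positional call did.
    results[n] = re.sub("_", "X", subject, count=n)
    replaced = total - results[n].count("_")
    print(n, replaced, results[n])  # only min(n, total) underscores change
```

Counts 2 and 6 reproduce the two "misses matches" lines; 38 and 34 exceed the eight underscores, which is why those calls appeared to work.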
msg134717 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-28 20:49
Closing as invalid.
I wonder if it would be better to have count as a keyword-only argument though, since this problem seems to come up pretty often and it's not easy to debug.
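A minimal sketch of the keyword-only idea, written as a hypothetical wrapper `sub_kwonly` (not part of the `re` module): the bare `*` in the signature makes `count` and `flags` keyword-only, so the call from this report fails loudly instead of silently limiting the number of replacements.

```python
import re

def sub_kwonly(pattern, repl, string, *, count=0, flags=0):
    """Hypothetical re.sub variant: count and flags must be given by keyword."""
    return re.sub(pattern, repl, string, count=count, flags=flags)

# Correct usage: the flag is named, so it cannot be read as a count.
print(sub_kwonly("_", "X", "a_b_c", flags=re.IGNORECASE))  # aXbXc

# The mistake from this report now raises instead of replacing only 2 matches.
try:
    sub_kwonly("_", "X", "a_b_c", re.IGNORECASE)
except TypeError as exc:
    print("rejected:", exc)
```

Python 3.13 later took a softer route in `re` itself: passing `count` and `flags` positionally to `re.sub` still works but emits a `DeprecationWarning`.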
msg134723 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2011-04-28 22:21
I don't know how much code that might break. It might not be that much; I can't remember when I last used re.sub without the default count.
msg134752 - Author: Robert Meerman (RobM) Date: 2011-04-29 11:53
Oh, that's embarrassing. :-)
Could a type check be used to alert the user to their mistake? I suppose that would require re.IGNORECASE (et al.) to be of some new type (presumably subclassed from int).
(Thanks for the quick response, and sorry to have wasted your time.)
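In Python 3.6 and later the flags already have a dedicated type: they are members of `re.RegexFlag`, an `enum.IntFlag` and therefore a subclass of `int`. A hypothetical wrapper `checked_sub` (not part of `re`) can use that type to catch a flag passed in the `count` position, roughly as suggested here:

```python
import re

def checked_sub(pattern, repl, string, count=0, flags=0):
    """Hypothetical re.sub wrapper that rejects a regex flag used as count."""
    # Flags are re.RegexFlag members, so they can be told apart from a
    # plain int count even though both behave as integers.
    if isinstance(count, re.RegexFlag):
        raise TypeError("count received a regex flag; pass it as flags=... instead")
    return re.sub(pattern, repl, string, count=count, flags=flags)

print(checked_sub("_", "X", "a_b_c", 1))                     # aXb_c
print(checked_sub("_", "X", "a_b_c", flags=re.IGNORECASE))  # aXbXc

try:
    checked_sub("_", "X", "a_b_c", re.IGNORECASE)
except TypeError as exc:
    print("rejected:", exc)
```

The check only helps when the flag object itself is passed; a flag already converted with `int()` looks like any other count.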
msg134831 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-30 02:24
See also #11957.
History
Date                 User          Action  Args
2022-04-11 14:57:16  admin         set     github: 56156
2014-10-29 16:15:11  vstinner      set     superseder: re.sub confusion between count and flags args
                                           resolution: not a bug -> duplicate
2011-04-30 02:24:10  ezio.melotti  set     messages: + msg134831
2011-04-29 11:53:32  RobM          set     messages: + msg134752
2011-04-28 22:21:58  mrabarnett    set     messages: + msg134723
2011-04-28 20:49:19  ezio.melotti  set     status: open -> closed
                                           resolution: not a bug
                                           messages: + msg134717
                                           stage: resolved
2011-04-28 19:54:48  mrabarnett    set     nosy: + mrabarnett
                                           messages: + msg134716
2011-04-28 17:02:55  RobM          create
