This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: re module: strange behaviour of space inside {m, n}
Type: behavior
Stage: resolved
Components: Library (Lib), Regular Expressions
Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, mrabarnett, pitrou, roysmith, serhiy.storchaka, sjmachin
Priority: normal
Keywords:

Created on 2011-02-12 23:19 by sjmachin, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg128472 - Author: John Machin (sjmachin) Date: 2011-02-12 23:19
A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. There is no documentation of the behaviour of r"b{1, 3}\Z" -- it matches the LITERAL TEXT "b{1, 3}" in normal mode and "b{1,3}" in verbose mode.
# paste the following at the interactive prompt:
import re
pat = r"b{1, 3}\Z"
bool(re.match(pat, "bb")) # False
bool(re.match(pat, "b{1, 3}")) # True
bool(re.match(pat, "bb", re.VERBOSE)) # False
bool(re.match(pat, "b{1, 3}", re.VERBOSE)) # False
bool(re.match(pat, "b{1,3}", re.VERBOSE)) # True
Suggested changes, in decreasing order of preference:
(1) Ignore leading/trailing spaces when parsing the m and n components of {m,n}
(2) Raise an exception if the exact syntax is not followed
(3) Document the existing behaviour
Note: deliberately matching the literal text would be expected to be done by escaping the left brace:
pat2 = r"b\{1, 3}\Z"
bool(re.match(pat2, "b{1, 3}")) # True
and this is not prevented by the suggested changes.
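Until any such change is made, the effect of suggestion (1) can be approximated in user code. The following is a minimal sketch, not part of the re module: the helper name `strip_quantifier_spaces` is hypothetical, and the rewrite is deliberately naive. It skips a brace preceded by a single backslash, but it does not understand character classes, doubled backslashes, or the `{,n}` form.

```python
import re

# Hypothetical helper, not part of re. Matches "{m}", "{m,}" or "{m,n}" with
# optional spaces or tabs around the numbers, unless the opening brace is
# immediately preceded by a backslash.
_SPACED_QUANTIFIER = re.compile(
    r"(?<!\\)\{[ \t]*(\d+)[ \t]*(?:(,)[ \t]*(\d*)[ \t]*)?\}"
)

def strip_quantifier_spaces(pattern):
    """Return pattern with whitespace removed from inside {m,n} quantifiers."""
    def tighten(m):
        # Groups 2 and 3 are None when the quantifier has the plain "{m}" form.
        return "{" + m.group(1) + (m.group(2) or "") + (m.group(3) or "") + "}"
    return _SPACED_QUANTIFIER.sub(tighten, pattern)

pat = strip_quantifier_spaces(r"b{1, 3}\Z")
print(pat)                              # b{1,3}\Z
print(bool(re.match(pat, "bb")))        # True
print(bool(re.match(pat, "b{1, 3}")))   # False

# The escaped form from pat2 is left untouched, so it still matches the literal text.
pat2 = strip_quantifier_spaces(r"b\{1, 3}\Z")
print(bool(re.match(pat2, "b{1, 3}")))  # True
```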
msg176812 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-12-02 22:28
Interesting.
In my regex module (http://pypi.python.org/pypi/regex) I have:
bool(regex.match(pat, "bb", regex.VERBOSE)) # True
bool(regex.match(pat, "b{1,3}", regex.VERBOSE)) # False
because I thought that when the VERBOSE flag is turned on it should ignore whitespace except when it's inside a character class, so "b{1, 3}" would be treated as "b{1,3}".
Apparently re has another exception.
msg176813 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-02 22:40
$ echo 'bbbbbaaa' | grep -o 'b\{1,3\}a'
bbba
$ echo 'bbbbbaaa' | grep -o 'b\{1, 3\}a'
grep: Invalid content of \{\}
$ echo 'bbbbbaaa' | egrep -o 'b{1,3}a'
bbba
$ echo 'bbbbbaaa' | egrep -o 'b{1, 3}a'
$ echo 'bbb{1, 3}aa' | LC_ALL=C egrep -o 'b{1, 3}a'
b{1, 3}a
I.e. grep raises an error, while egrep silently falls back to the literal meaning. I don't know what the relevant standards say about this.
msg176819 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-12-03 00:10
The question is whether re should always treat 'b{1, 3}a' as a literal, even with the VERBOSE flag.
I've checked with Perl 5.14.2, and it agrees with re: adding a space _always_ makes it a literal, even with the 'x' flag (/b{1, 3}a/x is treated as /b\{1,3}a/).
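The re side of that comparison can be checked with a short sketch like the one below. It only exercises Python's re module (nothing here runs Perl), and it assumes the parsing behaviour described in this issue.

```python
import re

# With VERBOSE, the space is dropped but the braces stay literal, so the
# spaced pattern should behave like the explicitly escaped one.
spaced = re.compile(r"b{1, 3}a", re.VERBOSE)
escaped = re.compile(r"b\{1,3}a")

for text in ("bbba", "b{1,3}a", "b{1, 3}a"):
    print(repr(text), bool(spaced.fullmatch(text)), bool(escaped.fullmatch(text)))
# 'bbba' False False
# 'b{1,3}a' True True
# 'b{1, 3}a' False False
```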
msg180700 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-26 19:01
Then let's leave everything as is.
History
Date                 User              Action  Args
2022-04-11 14:57:12  admin             set     github: 55413
2014-09-14 19:40:45  serhiy.storchaka  set     status: pending -> closed; resolution: rejected; stage: resolved
2013-10-27 17:27:19  serhiy.storchaka  set     status: open -> pending
2013-02-11 20:02:21  roysmith          set     nosy: + roysmith
2013-01-26 19:01:21  serhiy.storchaka  set     messages: + msg180700
2012-12-03 00:10:46  mrabarnett        set     messages: + msg176819
2012-12-02 22:40:27  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg176813
2012-12-02 22:28:55  mrabarnett        set     messages: + msg176812
2012-12-02 21:53:47  serhiy.storchaka  set     nosy: + mrabarnett; type: behavior; components: + Library (Lib), Regular Expressions; versions: + Python 3.2, Python 3.3, Python 3.4, - Python 3.1
2011-02-18 19:55:44  terry.reedy       set     nosy: + ezio.melotti, pitrou
2011-02-12 23:19:56  sjmachin          create
