
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: In re's positive lookbehind assertion repetition works
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 3.4, Python 3.5, Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: BreamoreBoy, ezio.melotti, mrabarnett, py.user, serhiy.storchaka, tim.peters
Priority: normal
Keywords:

Created on 2012-04-01 08:07 by py.user, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (11)
msg157264 - Author: py.user (py.user) * Date: 2012-04-01 08:07
>>> import re
>>> re.search(r'(?<=a){100,200}bc', 'abc', re.DEBUG)
max_repeat 100 200 
 assert -1 
 literal 97 
literal 98 
literal 99 
<_sre.SRE_Match object at 0xb7429f38>
>>> re.search(r'(?<=a){100,200}bc', 'abc', re.DEBUG).group()
'bc'
>>>
I expected "nothing to repeat"
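For context, a minimal sketch of which patterns CPython's parser rejects with "nothing to repeat" and which it accepts. The helper name compile_error is hypothetical, and the exact wording of the error message may vary between Python versions.

```python
import re

def compile_error(pattern):
    """Return the message raised by re.compile, or None if the pattern compiles.

    Hypothetical helper, used only for this illustration.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# A quantifier with nothing before it, or one applied directly to a
# position anchor such as ^, is rejected by the parser.
print(compile_error(r'*a'))
print(compile_error(r'^*a'))

# A quantified lookbehind assertion is accepted.
print(compile_error(r'(?<=a){100,200}bc'))
```

The first two calls are expected to report an error mentioning "nothing to repeat", while the last one should print None.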
msg221588 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-26 01:33
Can someone comment on this regex problem, please? Regexes are just not my cup of tea.
msg221594 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-06-26 07:24
Technically this is not a bug.
msg221601 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2014-06-26 10:25
Lookarounds can contain capture groups:
>>> import re
>>> re.search(r'a(?=(.))', 'ab').groups()
('b',)
>>> re.search(r'(?<=(.))b', 'ab').groups()
('a',)
so lookarounds that are optional or can have no repeats might have a use.
I'm not sure whether it's useful to repeat them more than once, but that's another matter.
I'd say that it's not a bug.
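As a rough illustration of the point about optional lookarounds, the sketch below makes a capturing lookbehind optional. The pattern is an assumption chosen for this example, not one taken from the thread; group 1 ends up recording whether the preceding "a" was present.

```python
import re

# Assumed example pattern: the lookbehind is optional, so "b" can match
# with or without a preceding "a".
pattern = re.compile(r'(?<=(a))?b')

with_a = pattern.search('ab')
without_a = pattern.search('b')

# Both searches match just "b"; only the captured group differs.
print(with_a.group(), with_a.groups())
print(without_a.group(), without_a.groups())
```

When the "a" is there, group 1 should hold it; when it is absent, the lookbehind is skipped and group 1 should be None.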
msg221631 - Author: py.user (py.user) * Date: 2014-06-26 19:01
>>> m = re.search(r'(?<=(a)){10}bc', 'abc', re.DEBUG)
max_repeat 10 10 
 assert -1 
 subpattern 1 
 literal 97 
literal 98 
literal 99 
>>> m.group()
'bc'
>>>
>>> m.groups()
('a',)
>>>
It works like there are 10 letters "a" before letter "b".
msg221633 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2014-06-26 19:16
Lookarounds can capture, but they don't consume. That lookbehind is matching the same part of the string every time.
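A small sketch of "capture but don't consume", contrasting the repeated lookbehind from the report with an ordinary consuming repetition. The variable names are made up for this example.

```python
import re

text = 'abc'

# The lookbehind is zero-width: each of the 10 repetitions re-checks the
# single "a" before "b" and leaves the current position unchanged.
lookbehind = re.search(r'(?<=(a)){10}bc', text)
print(lookbehind.span())   # span of the overall match ("bc")
print(lookbehind.span(1))  # span of the captured "a"

# A consuming repetition really does need ten "a" characters.
consuming_short = re.search(r'a{10}bc', text)
consuming_long = re.search(r'a{10}bc', 'a' * 10 + 'bc')
print(consuming_short)
print(consuming_long.span())
```

The captured span should sit just before the overall match rather than inside it, and the consuming pattern should fail on 'abc' because there is only one "a" to consume.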
msg221635 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2014-06-26 19:24
I would not call this a bug - it's just usually a silly thing to do ;-)
Note, e.g., that p{N} is shorthand for writing p N times. For example, p{4} is much the same as pppp (but not exactly so in all cases; e.g., if `p` happens to contain a capturing group, the numbering of all capturing groups will differ between those two spellings).
A successful assertion generally matches an empty string (does not advance the position being looked at in the target string). So, e.g., if we're at some point in the target string where
(?<=a)
matches, then
(?<=a)(?<=a)
will also match at the same point, and so will
(?<=a)(?<=a)(?<=a)
and
(?<=a)(?<=a)(?<=a)(?<=a)
and so on & so on. The position in the target string never changes, so each redundant assertion succeeds too. So (?<=a){N} _should_ match there too.
> It works like there are 10 letters "a" before letter "b".
It's much more like you're asking whether "a" appears before "b", but are rather pointlessly asking the same question 10 times ;-)
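The equivalence described in this message can be checked with a short sketch. The helper match_span is hypothetical, and the test strings are chosen only for illustration.

```python
import re

def match_span(pattern, text):
    """Return the span of the first match, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.span() if m else None

# One assertion, three concatenated assertions, and a repeated assertion.
patterns = [r'(?<=a)bc', r'(?<=a)(?<=a)(?<=a)bc', r'(?<=a){3}bc']

for p in patterns:
    # 'abc' has an "a" before "bc"; 'xbc' does not.
    print(p, match_span(p, 'abc'), match_span(p, 'xbc'))

# The caveat about group numbering: p{4} keeps one group, pppp creates four.
repeated_groups = re.compile(r'(a){4}').groups
spelled_out_groups = re.compile(r'(a)(a)(a)(a)').groups
print(repeated_groups, spelled_out_groups)
```

All three spellings should agree on both inputs, since each one asks the same zero-width question at the same position.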
msg221639 - Author: py.user (py.user) * Date: 2014-06-26 19:49
Tim Peters wrote:
> (?<=a)(?<=a)(?<=a)(?<=a)
There are four different points.
If a1 before a2 and a2 before a3 and a3 before a4 and a4 before something.
Otherwise repetition of assertion has no sense. If it has no sense, there should be an exception.
msg221646 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2014-06-26 20:52
>> (?<=a)(?<=a)(?<=a)(?<=a)
> There are four different points.
> If a1 before a2 and a2 before a3 and a3 before a4 and a4
> before something.
Sorry, that view doesn't make any sense. A successful lookbehind assertion matches the empty string. Same as the regexp
()()()()
matches 4 empty strings (and all the _same_ empty string) at any point.
> Otherwise repetition of assertion has no sense.
As I said before, it's "usually a silly thing to do". It does make sense, just not _useful_ sense - it's "silly" ;-)
> If it has no sense, there should be an exception.
Why? Code like
 i += 0
is usually pointless too, but it's not up to a programming language to force you to code only useful things.
It's easy to write regexps that are pointless. For example, the regexp
(?=a)b
can never succeed. Should that raise an exception? Or should the regexp
(?=a)a
raise an exception because the (?=a) part is redundant? Etc.
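A quick sketch of the two "pointless" patterns above, showing that both compile without complaint. The sample strings are assumptions picked for this illustration.

```python
import re

# Requires the next character to be "a" and also to be "b": never matches.
never = re.compile(r'(?=a)b')

# The lookahead repeats what the literal "a" already demands.
redundant = re.compile(r'(?=a)a')

never_results = [never.search(s) for s in ('ab', 'ba', 'b', 'a')]
print(never_results)

# The redundant pattern finds the same matches as plain "a".
print(redundant.findall('banana'), re.findall(r'a', 'banana'))
```

Neither pattern raises at compile time; the first simply never matches, and the second behaves like the plain literal.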
msg221666 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2014-06-26 23:42
BTW, note that the idea "successful lookaround assertions match an empty string" isn't just a figure of speech: it's the literal truth, and - indeed - is key to understanding what happens here. You can see this by adding some capturing groups around the assertions. Like so:
m = re.search("((?<=a))((?<=a))((?<=a))((?<=a))b", "xab")
Then
[m.span(i) for i in range(1, 5)]
produces
[(2, 2), (2, 2), (2, 2), (2, 2)]
That is, each assertion matched (the same) empty string immediately preceding "b" in the target string.
This makes perfect sense - although it may not be useful. So I think this report should be closed with "so if it bothers you, don't do it" ;-)
msg221673 - Author: py.user (py.user) * Date: 2014-06-27 04:48
Tim Peters wrote:
> Should that raise an exception?
>i += 0
>(?=a)b
>(?=a)a
These are different cases. The first is very special. The second and third are special too, but with different assertion contents they can do useful work.
Whereas "(?=any contents){N}a" never uses the "{N}" part in any useful manner.
> So I think this report should be closed
I looked into Perl's behaviour today; it works like Python. It's not an error there.
History
Date                 User              Action  Args
2022-04-11 14:57:28  admin             set     github: 58665
2014-11-01 18:20:29  serhiy.storchaka  set     status: open -> closed; resolution: not a bug; stage: resolved
2014-06-27 04:48:54  py.user           set     messages: + msg221673
2014-06-26 23:42:32  tim.peters        set     messages: + msg221666
2014-06-26 20:52:25  tim.peters        set     messages: + msg221646
2014-06-26 19:49:53  py.user           set     messages: + msg221639
2014-06-26 19:24:17  tim.peters        set     nosy: + tim.peters; messages: + msg221635
2014-06-26 19:16:23  mrabarnett        set     messages: + msg221633
2014-06-26 19:01:46  py.user           set     messages: + msg221631
2014-06-26 10:25:14  mrabarnett        set     messages: + msg221601
2014-06-26 07:24:26  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg221594
2014-06-26 01:33:37  BreamoreBoy       set     nosy: + BreamoreBoy; messages: + msg221588; versions: + Python 2.7, Python 3.4, Python 3.5, - Python 3.2
2012-04-01 08:07:50  py.user           create
