Created on 2012-04-01 08:07 by py.user, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (11) | |||
|---|---|---|---|
| msg157264 | Author: py.user (py.user) | Date: 2012-04-01 08:07 | |
>>> import re
>>> re.search(r'(?<=a){100,200}bc', 'abc', re.DEBUG)
max_repeat 100 200
  assert -1
    literal 97
literal 98
literal 99
<_sre.SRE_Match object at 0xb7429f38>
>>> re.search(r'(?<=a){100,200}bc', 'abc', re.DEBUG).group()
'bc'
>>>
I expected a "nothing to repeat" error.
|
|||
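For contrast, here is a minimal sketch of which patterns the parser does reject with "nothing to repeat". It assumes a recent CPython 3 release; the helper `compile_error` is hypothetical and exists only for this illustration.

```python
import re

def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# A quantifier with nothing in front of it is rejected by the parser.
leading_star = compile_error(r'*a')

# A quantifier applied to a bare position anchor is rejected as well.
repeated_anchor = compile_error(r'^*')

# A quantified lookbehind compiles: the parser treats it as a repeatable item.
repeated_lookbehind = compile_error(r'(?<=a){100,200}bc')

print(leading_star)
print(repeated_anchor)
print(repeated_lookbehind)
```

The first two calls report an error message, while the quantified lookbehind from the report compiles without complaint.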
| msg221588 | Author: Mark Lawrence (BreamoreBoy) | Date: 2014-06-26 01:33 | |
Can someone comment on this regex problem, please? Regexes are just not my cup of tea. |
|||
| msg221594 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2014-06-26 07:24 | |
Technically this is not a bug. |
|||
| msg221601 | Author: Matthew Barnett (mrabarnett) (Python triager) | Date: 2014-06-26 10:25 | |
Lookarounds can contain capture groups:
>>> import re
>>> re.search(r'a(?=(.))', 'ab').groups()
('b',)
>>> re.search(r'(?<=(.))b', 'ab').groups()
('a',)
so lookarounds that are optional or can have no repeats might have a use.
I'm not sure whether it's useful to repeat them more than once, but that's another matter.
I'd say that it's not a bug.
|
|||
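A small sketch of how an optional lookaround with a capture group might be put to use. The pattern `a(?=(b))?` is a hypothetical example, not one taken from the thread: it always matches "a" and records in group 1 whether a "b" follows.

```python
import re

# Hypothetical pattern: match "a", and optionally note a following "b".
pattern = re.compile(r'a(?=(b))?')

followed = pattern.search('ab')      # the lookahead succeeds, group 1 is set
not_followed = pattern.search('ac')  # the lookahead is skipped, group 1 is unset

print(followed.group(), followed.groups())
print(not_followed.group(), not_followed.groups())
```

In both cases only "a" is matched; the lookahead contributes a capture but no text.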
| msg221631 | Author: py.user (py.user) | Date: 2014-06-26 19:01 | |
>>> m = re.search(r'(?<=(a)){10}bc', 'abc', re.DEBUG)
max_repeat 10 10
  assert -1
    subpattern 1
      literal 97
literal 98
literal 99
>>> m.group()
'bc'
>>>
>>> m.groups()
('a',)
>>>
It works like there are 10 letters "a" before letter "b".
|
|||
| msg221633 | Author: Matthew Barnett (mrabarnett) (Python triager) | Date: 2014-06-26 19:16 | |
Lookarounds can capture, but they don't consume. That lookbehind is matching the same part of the string every time. |
|||
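The spans make the "capture, but don't consume" point visible. This is a sketch using the pattern from the earlier message; the variable names are only illustrative.

```python
import re

pattern = re.compile(r'(?<=(a)){10}bc')
m = pattern.search('abc')

whole_span = m.span()    # where "bc" was matched
group_span = m.span(1)   # where the captured "a" sits

# The match starts right after the single "a": no input was consumed
# by any of the ten repetitions of the lookbehind.
print(whole_span, group_span)

# Without an "a" before "bc", the lookbehind fails and so does the search.
no_match = pattern.search('xbc')
print(no_match)
```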
| msg221635 | Author: Tim Peters (tim.peters) (Python committer) | Date: 2014-06-26 19:24 | |
I would not call this a bug - it's just usually a silly thing to do ;-)
Note, e.g., that p{N} is shorthand for writing p N times. For example, p{4} is much the same as pppp (but not exactly so in all cases; e.g., if `p` happens to contain a capturing group, the numbering of all capturing groups will differ between those two spellings).
A successful assertion generally matches an empty string (does not advance the position being looked at in the target string). So, e.g., if we're at some point in the target string where
(?<=a)
matches, then
(?<=a)(?<=a)
will also match at the same point, and so will
(?<=a)(?<=a)(?<=a)
and
(?<=a)(?<=a)(?<=a)(?<=a)
and so on & so on. The position in the target string never changes, so each redundant assertion succeeds too. So (?<=a){N} _should_ match there too.
> It works like there are 10 letters "a" before letter "b".
It's much more like you're asking whether "a" appears before "b", but are rather pointlessly asking the same question 10 times ;-)
|
|||
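A quick way to check the equivalence described above is to compare the quantified form with the spelled-out form on a few inputs. This is a sketch; the sample strings and the helper `span_of` are arbitrary choices made for illustration.

```python
import re

repeated = re.compile(r'(?<=a){4}b')
spelled_out = re.compile(r'(?<=a)(?<=a)(?<=a)(?<=a)b')

def span_of(pattern, text):
    """Return the span of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.span() if m else None

samples = ['xab', 'ab', 'xb', 'b', 'aab']
comparison = [(s, span_of(repeated, s), span_of(spelled_out, s)) for s in samples]

for text, quantified, explicit in comparison:
    print(text, quantified, explicit)
```

Both spellings agree on every sample: they match "b" exactly where a single "a" precedes it, and nowhere else.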
| msg221639 | Author: py.user (py.user) | Date: 2014-06-26 19:49 | |
Tim Peters wrote:
> (?<=a)(?<=a)(?<=a)(?<=a)
There are four different points.
If a1 before a2 and a2 before a3 and a3 before a4 and a4 before something.
Otherwise repetition of assertion has no sense.
If it has no sense, there should be an exception. |
|||
| msg221646 | Author: Tim Peters (tim.peters) (Python committer) | Date: 2014-06-26 20:52 | |
>> (?<=a)(?<=a)(?<=a)(?<=a)
> There are four different points.
> If a1 before a2 and a2 before a3 and a3 before a4 and a4
> before something.
Sorry, that view doesn't make any sense. A successful lookbehind assertion matches the empty string. Same as the regexp
()()()()
matches 4 empty strings (and all the _same_ empty string) at any point.
> Otherwise repetition of assertion has no sense.
As I said before, it's "usually a silly thing to do". It does make sense, just not _useful_ sense - it's "silly" ;-)
> If it has no sense, there should be an exception.
Why? Code like
i += 0
is usually pointless too, but it's not up to a programming language to force you to code only useful things. It's easy to write regexps that are pointless. For example, the regexp
(?=a)b
can never succeed. Should that raise an exception? Or should the regexp
(?=a)a
raise an exception because the (?=a) part is redundant? Etc. |
|||
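Both claims in the message above can be checked directly. This is a sketch; the list of test strings is an arbitrary assumption and is of course not an exhaustive proof that `(?=a)b` can never match.

```python
import re

# Four empty groups all match the same empty string at position 0.
m = re.match(r'()()()()', 'xyz')
empty_spans = [m.span(i) for i in range(1, 5)]
print(empty_spans)

# (?=a)b requires the next character to be both "a" and "b",
# so it compiles without error but finds nothing in these strings.
never = re.compile(r'(?=a)b')
candidates = ['ab', 'ba', 'b', 'a', 'abab', '']
hits = [s for s in candidates if never.search(s)]
print(hits)
```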
| msg221666 | Author: Tim Peters (tim.peters) (Python committer) | Date: 2014-06-26 23:42 | |
BTW, note that the idea "successful lookaround assertions match an empty string" isn't just a figure of speech: it's the literal truth, and - indeed - is key to understanding what happens here. You can see this by adding some capturing groups around the assertions. Like so:
m = re.search("((?<=a))((?<=a))((?<=a))((?<=a))b", "xab")
Then
[m.span(i) for i in range(1, 5)]
produces
[(2, 2), (2, 2), (2, 2), (2, 2)]
That is, each assertion matched (the same) empty string immediately preceding "b" in the target string.
This makes perfect sense - although it may not be useful. So I think this report should be closed with "so if it bothers you, don't do it" ;-)
|
|||
| msg221673 | Author: py.user (py.user) | Date: 2014-06-27 04:48 | |
Tim Peters wrote:
> Should that raise an exception?
>i += 0
>(?=a)b
>(?=a)a
These are different cases. The first is very special. The second and third are special too, but with different contents in the assertion they can do useful work.
While "(?=any contents){N}a" never uses the "{N}" part in any useful manner.
> So I think this report should be closed
I looked into Perl's behaviour today; it works like Python. It's not an error there.
|
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:28 | admin | set | github: 58665 |
| 2014-11-01 18:20:29 | serhiy.storchaka | set | status: open -> closed; resolution: not a bug; stage: resolved |
| 2014-06-27 04:48:54 | py.user | set | messages: + msg221673 |
| 2014-06-26 23:42:32 | tim.peters | set | messages: + msg221666 |
| 2014-06-26 20:52:25 | tim.peters | set | messages: + msg221646 |
| 2014-06-26 19:49:53 | py.user | set | messages: + msg221639 |
| 2014-06-26 19:24:17 | tim.peters | set | nosy: + tim.peters; messages: + msg221635 |
| 2014-06-26 19:16:23 | mrabarnett | set | messages: + msg221633 |
| 2014-06-26 19:01:46 | py.user | set | messages: + msg221631 |
| 2014-06-26 10:25:14 | mrabarnett | set | messages: + msg221601 |
| 2014-06-26 07:24:26 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg221594 |
| 2014-06-26 01:33:37 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg221588; versions: + Python 2.7, Python 3.4, Python 3.5, - Python 3.2 |
| 2012-04-01 08:07:50 | py.user | create | |