Created on 2010-07-06 10:23 by acooke, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded by | Uploaded on |
| re_getwidth.patch | serhiy.storchaka | 2014-10-11 18:36 |
| re_forbid_some_groupref_in_lookbehind-2.7.patch | serhiy.storchaka | 2014-11-30 15:30 |
| re_forbid_groupref_in_lookbehind-2.7.patch | serhiy.storchaka | 2014-11-30 17:55 |
| re_forbid_groupref_in_lookbehind-2.7_2.patch | serhiy.storchaka | 2014-11-30 19:36 |
| Messages (21) | |||
|---|---|---|---|
| msg109382 - (view) | Author: andrew cooke (acooke) | Date: 2010-07-06 10:23 | |
from re import compile
# these work as expected
assert compile('(a)b(?<=b)(c)').match('abc')
assert not compile('(a)b(?<=c)(c)').match('abc')
assert compile('(a)b(?=c)(c)').match('abc')
assert not compile('(a)b(?=b)(c)').match('abc')
# but when you add groups, you get bugs
assert not compile('(?:(a)|(x))b(?<=(?(2)x|c))c').match('abc') # matches!
assert not compile('(?:(a)|(x))b(?<=(?(2)b|x))c').match('abc')
assert compile('(?:(a)|(x))b(?<=(?(2)x|b))c').match('abc') # fails!
assert not compile('(?:(a)|(x))b(?<=(?(1)c|x))c').match('abc') # matches!
assert compile('(?:(a)|(x))b(?<=(?(1)b|x))c').match('abc') # fails!
# but lookahead works as expected
assert compile('(?:(a)|(x))b(?=(?(2)x|c))c').match('abc')
assert not compile('(?:(a)|(x))b(?=(?(2)c|x))c').match('abc')
assert compile('(?:(a)|(x))b(?=(?(2)x|c))c').match('abc')
assert not compile('(?:(a)|(x))b(?=(?(1)b|x))c').match('abc')
assert compile('(?:(a)|(x))b(?=(?(1)c|x))c').match('abc')
# these are similar but, in my opinion, shouldn't even compile
# (group used before defined)
assert not compile('(a)b(?<=(?(2)x|c))(c)').match('abc') # matches!
assert not compile('(a)b(?<=(?(2)b|x))(c)').match('abc')
assert not compile('(a)b(?<=(?(1)c|x))(c)').match('abc') # matches!
assert compile('(a)b(?<=(?(1)b|x))(c)').match('abc') # fails!
assert compile('(a)b(?=(?(2)x|c))(c)').match('abc')
assert not compile('(a)b(?=(?(2)b|x))(c)').match('abc')
assert compile('(a)b(?=(?(1)c|x))(c)').match('abc')
# this is the error we should see above
try:
    compile('(a)\\2(b)')
except Exception:
    pass
else:
    # reached only if compile() did not raise
    assert False, 'expected error'
|
|||
| msg109383 - (view) | Author: andrew cooke (acooke) | Date: 2010-07-06 10:30 | |
I hope the above is clear enough (you need to stare at the regexps for a time) - basically, lookback with a group conditional does not behave as expected (it appears to be evaluated as lookahead?). Also, some patterns compile that probably shouldn't. According to the docs, the re package only supports lookback on expressions whose length is known. So I guess it's also possible that (?(n)pat1|pat2) should always fail that, even when len(pat1) = len(pat2)? Also, the generally excellent unit tests for the re package don't have much coverage for lookback (I am writing my own regexp lib and it passes all the re unit tests but had a similar bug - that's how I found this one...). |
|||
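To make the fixed-width restriction mentioned in msg109383 concrete, here is a minimal sketch, assuming a current CPython `re` module. The helper name `compiles` is hypothetical and exists only for this illustration; it is not part of `re`.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error.

    Hypothetical helper for illustration only.
    """
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# Both alternatives are two characters wide, so the lookbehind has a known width.
print(compiles(r'(?<=ab|cd)e'))

# A repeated subpattern has no single width, so compilation is rejected.
print(compiles(r'(?<=a+)b'))

# Alternatives of different widths (1 and 2) are rejected as well.
print(compiles(r'(?<=a|bc)d'))
```

The check happens when the pattern is compiled, not when it is matched, which is why the question in this message is about whether a conditional such as `(?(n)pat1|pat2)` should pass or fail that check.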
| msg109387 - (view) | Author: andrew cooke (acooke) | Date: 2010-07-06 13:08 | |
If it's any help, these are the equivalent tests as I think they should be (you'll need to translate engine(parse(... to compile(...) http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/engine/backtrack/_test/engine.py?r=fc52f6959a0cfabdddc6960f47d7380128bb3584#284 |
|||
| msg109388 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2010-07-06 13:30 | |
Thanks very much for the reports. > So I guess it's also possible that (?(n)pat1|pat2) should always fail that, even when len(pat1) = len(pat2)? Yes, this seems likely to me. Possibly even the compile stage should fail, though I've no idea how feasible it is to make that happen. Unfortunately I'm not sure that any of the currently active Python developers is particularly well versed in the intricacies of the re module. The most realistic option here may be just to document the restrictions on lookbehind assertions more clearly. Unless you're able to provide a patch? |
|||
| msg109389 - (view) | Author: andrew cooke (acooke) | Date: 2010-07-06 13:47 | |
I thought someone was working on the re module these days? I thought I'd seen some issues with patches etc? Anyway, short term, sorry - no patch. Medium/long term, yes it's possible, but please don't rely on it. The simplest way to document it is as you suggest, I think - just extend the qualifier on lookback requiring fixed length to exclude references to groups (it does seem to *bind* groups correctly on lookback, so there's no need to exclude them completely). |
|||
| msg109390 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2010-07-06 13:56 | |
> I thought someone was working on the re module these days? Well, there's issue 2636. It doesn't seem likely that that work will land in core Python any time soon, though. |
|||
| msg109399 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2010-07-06 15:52 | |
Should a regex compile if a group is referenced before it's defined? Consider this: (?:(?(2)(a)|(b)))+ Other regex implementations permit forward references to groups. BTW, I had a look at the re module, found it too difficult, and so started on my own implementation of the matching engine (available on PyPI). |
|||
| msg109400 - (view) | Author: andrew cooke (acooke) | Date: 2010-07-06 16:02 | |
Ah good point, thanks. |
|||
| msg227743 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014-09-28 00:12 | |
Given the comment from Matthew Barnett in msg109399 "...I had a look at the re module, found it too difficult..." can this be closed as "won't fix"? |
|||
| msg229102 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-10-11 18:36 | |
Here is a patch which fixes lookbacks with group references and with group conditionals. I have used Andrew's examples as the base for tests. |
|||
| msg229917 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-10-24 12:03 | |
The patch also fixes issue814253. If there are no objections I'll commit it soon. |
|||
| msg230351 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-10-31 16:26 | |
If there are no objections I'm going to commit the patch soon. |
|||
| msg230825 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2014-11-07 19:49 | |
New changeset fac649bf2d10 by Serhiy Storchaka in branch '2.7': Issues #814253, #9179: Group references and conditional group references now https://hg.python.org/cpython/rev/fac649bf2d10 New changeset 9fcf4008b626 by Serhiy Storchaka in branch '3.4': Issues #814253, #9179: Group references and conditional group references now https://hg.python.org/cpython/rev/9fcf4008b626 New changeset 60fccf0aad83 by Serhiy Storchaka in branch 'default': Issues #814253, #9179: Group references and conditional group references now https://hg.python.org/cpython/rev/60fccf0aad83 |
|||
| msg231889 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-11-30 15:30 | |
The more I think about it, the more I doubt. This patch added a behavior that is incompatible with the regex module. The regex module processes lookbehind assertions in the opposite direction, from right to left. This allows it to work with lookbehind assertions of non-fixed length. But the side effect is that in regex a group reference in a lookbehind assertion can refer only to a group defined to the right in the same lookbehind assertion (or defined to the left outside it). In re, a group reference in a lookbehind assertion can now refer only to a group defined to the left. This is likely to change in the future, which brings us to the problem of incompatibility. There are several quick ways to resolve the problem: 1) Roll back the patch and return to the previous non-working behavior. Because that behavior is obviously broken, changing the implementation of lookbehind assertions in the future will be less of a problem. 2) Roll back the patch and emit a warning or error when any group reference is used in a lookbehind assertion. Something like the patch proposed by Greg Chapman in issue814253 (but slightly more advanced). 3) Keep the patch and emit a warning or an error when a group reference refers to a group defined in the same lookbehind assertion. Group references will work in lookbehind assertions in most cases, except the rare cases where current re behavior differs from regex behavior. What is your decision, Benjamin? Here is a patch against 2.7 which implements variant 3. |
|||
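The direction of evaluation discussed in msg231889 can be pictured with a small sketch. This is not the implementation of either `re` or `regex`; it is an illustrative emulation, assuming that a fixed-width lookbehind can be treated as "step back `width` characters, then match left to right up to the current position". The function name `fixed_width_lookbehind` and its parameters are hypothetical.

```python
import re

def fixed_width_lookbehind(subpattern, width, text, pos):
    """Emulate a fixed-width lookbehind assertion at position pos.

    Hypothetical helper: steps back `width` characters and requires
    `subpattern` to match exactly text[pos - width:pos], scanning left to right.
    """
    start = pos - width
    if start < 0:
        # Not enough text before pos for the assertion to succeed.
        return False
    return re.compile(subpattern).fullmatch(text, start, pos) is not None

# At position 2 in 'abc' the preceding character is 'b'.
print(fixed_width_lookbehind('b', 1, 'abc', 2))
print(fixed_width_lookbehind('c', 1, 'abc', 2))
```

Under this picture the width must be known before matching starts, which is why groups already matched to the left are the natural ones to reference. An engine that scans the assertion from right to left meets the groups inside the assertion in the opposite order, which is the source of the incompatibility raised above.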
| msg231894 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2014-11-30 16:52 | |
New changeset d1f7c3f45ffe by Benjamin Peterson in branch '3.4': backout 9fcf4008b626 (#9179) for further consideration https://hg.python.org/cpython/rev/d1f7c3f45ffe New changeset f385bc6e6e09 by Benjamin Peterson in branch 'default': merge 3.4 (#9179) https://hg.python.org/cpython/rev/f385bc6e6e09 New changeset 8a3807e15a1f by Benjamin Peterson in branch '2.7': backout fac649bf2d10 (#9179) for further consideration https://hg.python.org/cpython/rev/8a3807e15a1f |
|||
| msg231895 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2014-11-30 16:52 | |
I just backed out the change. Thanks for bringing up the issue. |
|||
| msg231897 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-11-30 17:55 | |
What would be the best solution for 2.7? Here is a patch which forbids any group references in lookbehind assertions (they do not work currently and users shouldn't use them). |
|||
| msg231900 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2014-11-30 18:52 | |
On Sun, Nov 30, 2014, at 12:55, Serhiy Storchaka wrote: > What would be the best solution for 2.7? You can pick. I just always favor not changing things for release candidates. > Here is a patch which forbids any group references in lookbehind assertions (they do not work currently and users shouldn't use them). |
|||
| msg231901 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-11-30 19:36 | |
Updated documentation. If there are no objections I'll commit re_forbid_groupref_in_lookbehind-2.7_2.patch to 2.7 and 3.4. For 3.5 I prefer to add support of group references. |
|||
| msg236358 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2015-02-21 10:12 | |
New changeset b78195af96f5 by Serhiy Storchaka in branch 'default': Issues #814253, #9179: Group references and conditional group references now https://hg.python.org/cpython/rev/b78195af96f5 New changeset 5387095b8675 by Serhiy Storchaka in branch '2.7': Issues #814253, #9179: Warnings now are raised when group references and https://hg.python.org/cpython/rev/5387095b8675 New changeset e295ad9be16d by Serhiy Storchaka in branch '3.4': Issues #814253, #9179: Warnings now are raised when group references and https://hg.python.org/cpython/rev/e295ad9be16d |
|||
| msg236359 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-02-21 10:19 | |
Only warnings are raised in 2.7 and 3.4, so it will not break third party code that "works" by accident. In 3.5 only references to groups defined outside of lookbehind assertion work, so the behavior is compatible with regex. |
|||
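As a rough check of the final behavior described in msg236359, the following sketch assumes Python 3.5 or later. The patterns are adapted from the examples earlier in this issue, and the helper `compiles` is hypothetical, introduced only for this illustration.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error.

    Hypothetical helper for illustration only.
    """
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# Groups defined before the lookbehind can be referenced inside it.
backref_ok = re.match(r'(a)a(?<=\1)c', 'aac') is not None

# Conditional on group 1: it is set, so the lookbehind must see 'b'.
cond_yes = re.match(r'(a)b(?<=(?(1)b|x))c', 'abc') is not None
cond_no = re.match(r'(a)b(?<=(?(1)c|x))c', 'abc') is not None

# A group defined inside the lookbehind cannot be referenced from within it.
same_lookbehind = compiles(r'(a)b(?<=(.)\2)(c)')

# A group that is only defined after the lookbehind cannot be referenced either.
forward_ref = compiles(r'(a)b(?<=(?(2)b|x))(c)')

print(backref_ok, cond_yes, cond_no, same_lookbehind, forward_ref)
```

The two rejected patterns correspond to the cases where a left-to-right and a right-to-left lookbehind could disagree, so refusing them at compile time keeps the accepted patterns compatible with `regex`.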
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:03 | admin | set | github: 53425 |
| 2015-02-21 10:19:37 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
| 2015-02-21 10:19:07 | serhiy.storchaka | set | messages: + msg236359 |
| 2015-02-21 10:12:03 | python-dev | set | messages: + msg236358 |
| 2014-12-06 01:47:33 | benjamin.peterson | set | priority: release blocker -> normal |
| 2014-11-30 19:36:45 | serhiy.storchaka | set | files: + re_forbid_groupref_in_lookbehind-2.7_2.patch; messages: + msg231901 |
| 2014-11-30 18:52:57 | benjamin.peterson | set | messages: + msg231900 |
| 2014-11-30 17:55:22 | serhiy.storchaka | set | files: + re_forbid_groupref_in_lookbehind-2.7.patch; messages: + msg231897; stage: patch review |
| 2014-11-30 16:52:43 | benjamin.peterson | set | messages: + msg231895 |
| 2014-11-30 16:52:08 | python-dev | set | messages: + msg231894 |
| 2014-11-30 15:30:57 | serhiy.storchaka | set | status: closed -> open; files: + re_forbid_some_groupref_in_lookbehind-2.7.patch; nosy: + larry, benjamin.peterson; stage: resolved -> (no value); messages: + msg231889; resolution: fixed -> (no value); priority: normal -> release blocker |
| 2014-11-07 21:27:30 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
| 2014-11-07 19:49:37 | python-dev | set | nosy: + python-dev; messages: + msg230825 |
| 2014-10-31 16:26:21 | serhiy.storchaka | set | messages: + msg230351 |
| 2014-10-24 12:05:32 | serhiy.storchaka | link | issue814253 superseder |
| 2014-10-24 12:03:21 | serhiy.storchaka | set | messages: + msg229917 |
| 2014-10-11 18:36:13 | serhiy.storchaka | set | files: + re_getwidth.patch; assignee: serhiy.storchaka; components: + Regular Expressions; versions: + Python 3.4, Python 3.5, - Python 2.6, Python 3.1, Python 3.2; keywords: + patch; nosy: + ezio.melotti; messages: + msg229102; stage: patch review |
| 2014-09-28 00:12:14 | BreamoreBoy | set | nosy: + BreamoreBoy, serhiy.storchaka; messages: + msg227743 |
| 2010-07-06 16:02:41 | acooke | set | messages: + msg109400 |
| 2010-07-06 15:52:05 | mrabarnett | set | messages: + msg109399 |
| 2010-07-06 13:56:16 | mark.dickinson | set | messages: + msg109390 |
| 2010-07-06 13:47:53 | acooke | set | messages: + msg109389 |
| 2010-07-06 13:31:43 | mark.dickinson | set | versions: + Python 3.1, Python 2.7, Python 3.2 |
| 2010-07-06 13:30:20 | mark.dickinson | set | nosy: + mark.dickinson, mrabarnett; messages: + msg109388 |
| 2010-07-06 13:08:29 | acooke | set | messages: + msg109387 |
| 2010-07-06 10:30:28 | acooke | set | messages: + msg109383 |
| 2010-07-06 10:23:32 | acooke | create | |