This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2014-03-20 18:40 by Lucretiel, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files

| File name | Uploaded |
|---|---|
| sre_fullmatch_repeated_ignorecase.patch | serhiy.storchaka, 2014-03-20 20:26 |
| issue20998.patch | mrabarnett, 2014-03-20 21:37 |
| issue20998_2.patch | serhiy.storchaka, 2014-04-13 15:28 |
Messages (10)

msg214257 - Author: Nathan West (Lucretiel) - Date: 2014-03-20 18:40
I have the following regular expression:
In [2]: regex = re.compile(r"ME IS \w+", re.I)
For some reason, `fullmatch` fails whenever `\w+` has to match more than one character:
In [3]: regex.fullmatch("ME IS L")
Out[3]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>
In [4]: regex.fullmatch("me is l")
Out[4]: <_sre.SRE_Match object; span=(0, 7), match='me is l'>
In [5]: regex.fullmatch("ME IS Lucretiel")   # no output: returns None
In [6]: regex.fullmatch("me is lucretiel")   # no output: returns None
I have no idea why this is happening. Using `match` works fine:
In [7]: regex.match("ME IS L")
Out[7]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>
In [8]: regex.match("ME IS Lucretiel")
Out[8]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>
In [9]: regex.match("me is lucretiel")
Out[9]: <_sre.SRE_Match object; span=(0, 15), match='me is lucretiel'>
Additionally, `fullmatch` works correctly WITHOUT the `re.I` flag:
In [10]: regex = re.compile(r"ME IS \w+")
In [11]: regex.fullmatch("ME IS L")
Out[11]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>
In [12]: regex.fullmatch("ME IS Lucretiel")
Out[12]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>
My platform is Ubuntu 12.04, using Python 3.4 installed from Felix Krull's deadsnakes PPA (https://launchpad.net/~fkrull/+archive/deadsnakes).
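A possible workaround on an affected interpreter is to anchor the end of the pattern with `\Z` and call `match()`, which the report shows behaving correctly; `fullmatch(p, s)` and `match(r"(?:p)\Z", s)` accept the same strings. This is a sketch that assumes the bug is confined to `fullmatch()` itself and has not been run on an affected build; on a fixed interpreter both columns should agree.

```python
import re

# Pattern from the report, as a raw string so "\w" is not treated as a string escape.
regex = re.compile(r"ME IS \w+", re.I)

# Workaround sketch (assumption: untested on an affected build): anchor the end
# with \Z and use match(), which the report shows working with re.I.
anchored = re.compile(r"(?:ME IS \w+)\Z", re.I)

samples = ["ME IS L", "me is l", "ME IS Lucretiel", "me is lucretiel", "ME IS Lucretiel!"]
results = [(s, regex.fullmatch(s) is not None, anchored.match(s) is not None) for s in samples]

for s, full, alt in results:
    # str.format rather than an f-string, so this also runs on Python 3.4.
    print("{!r}: fullmatch={}, anchored match={}".format(s, full, alt))
```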
msg214272 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2014-03-20 20:26
Here is a patch.
msg214287 - Author: Matthew Barnett (mrabarnett) (Python triager) - Date: 2014-03-20 21:37
FWIW, here's my own attempt at a patch.
msg215546 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2014-04-04 18:22
Both patches are almost equivalent (my patch is much simpler, but perhaps
Matthew's approach is more correct in the long term).
Unfortunately, Rietveld doesn't work with Matthew's patch, so I have added my
comments here.
> - (!ctx->match_all || ctx->ptr == state->end)) {
> + ctx->ptr == state->end) {
Why this check is not needed anymore?
> - status = SRE(match)(state, pattern + 2*prefix_skip);
> + status = SRE(match)(state, pattern + 2*prefix_skip, state->match_all);
> - status = SRE(match)(state, pattern + 2);
> + status = SRE(match)(state, pattern + 2, state->match_all);
state->match_all is used but it is never initialized.
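For context on the `(!ctx->match_all || ctx->ptr == state->end)` condition under review: conceptually, `fullmatch()` has to enforce the end-of-string requirement where the engine would otherwise report success, so that failing it makes the engine backtrack. The snippet below uses only the public `re` API to show why the same check cannot simply be applied after `match()` returns.

```python
import re

# An alternation whose first branch succeeds before the end of the text.
pattern = re.compile(r"a|ab")
text = "ab"

m = pattern.match(text)
print(m.group())  # match() accepts the first successful branch: 'a'

# Checking the end only after match() has returned gives the wrong answer...
naive = m is not None and m.end() == len(text)
print(naive)  # False

# ...whereas fullmatch() treats "not at the end" as a failure inside the
# engine, so it backtracks into the second branch and matches 'ab'.
print(pattern.fullmatch(text).group())
```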
msg215549 - Author: Matthew Barnett (mrabarnett) (Python triager) - Date: 2014-04-04 18:49
> > - (!ctx->match_all || ctx->ptr == state->end)) {
> > + ctx->ptr == state->end) {
>
> Why this check is not needed anymore?
>
After stepping through the code for that regex that fails, I concluded
that the condition shouldn't depend on ctx->match_all at that point
after all.
> > - status = SRE(match)(state, pattern + 2*prefix_skip);
> > + status = SRE(match)(state, pattern + 2*prefix_skip, state->match_all);
>
> > - status = SRE(match)(state, pattern + 2);
> > + status = SRE(match)(state, pattern + 2, state->match_all);
>
> state->match_all is used but it is never initialized.
I thought I'd initialised it in all the places it's used.
I admit that I find the code a little hard to follow at times... :-(
msg215667 - Author: Gareth Gouldstone (Gareth.Gouldstone) - Date: 2014-04-06 20:32
fullmatch() is not yet implemented on the regex scanner object SRE_Scanner (issue 21002). Is it possible to adapt this patch to fix this omission?
msg216019 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2014-04-13 15:28
> After stepping through the code for that regex that fails, I concluded
> that the condition shouldn't depend on ctx->match_all at that point
> after all.

Tests pass without this check, but I'm not sure it isn't needed. At the very least, without this check the code is not equivalent to the code before fullmatch() support was added, so I prefer to leave it as is.

> I thought I'd initialised it in all the places it's used.
>
> I admit that I find the code a little hard to follow at times... :-(

Indeed, it is initialized in Modules/_sre.c, and it is always 0. Perhaps it would be more consistent to get rid of the match_all field in the SRE_STATE structure and pass it as an argument.
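The difference between keeping `match_all` in shared state and passing it as an argument can be illustrated with a toy model. The sketch below is hypothetical and is not CPython's `_sre` code: it assumes, for illustration only, that a repeat of a single character is counted by running a nested one-character sub-match, and that this sub-match reads `match_all` from the shared state. Under that assumption the nested sub-match inherits the fullmatch() end condition and only succeeds on the last character of the text, which reproduces the symptom in the report.

```python
from dataclasses import dataclass


@dataclass
class ToyState:
    """Hypothetical stand-in for an engine state struct (not CPython's SRE_STATE)."""
    text: str
    match_all: bool  # True for a fullmatch()-style call


def is_word(ch: str) -> bool:
    # Rough approximation of \w, good enough for this toy.
    return ch.isalnum() or ch == "_"


def match_one(state: ToyState, pos: int, match_all: bool) -> bool:
    """Sub-match of a single word character at pos.

    With match_all set, the sub-match also requires ending at the end of the
    text, which is the fullmatch() condition.
    """
    if pos >= len(state.text) or not is_word(state.text[pos]):
        return False
    return not match_all or pos + 1 == len(state.text)


def count_word_chars(state: ToyState, pos: int, flag_from_state: bool) -> int:
    """Count consecutive word characters from pos, one sub-match at a time.

    flag_from_state=True reads match_all from the shared state, so the flag
    leaks into the nested sub-match; False passes match_all=False explicitly.
    """
    n = 0
    while match_one(state, pos + n, state.match_all if flag_from_state else False):
        n += 1
    return n


def toy_fullmatch(prefix: str, text: str, flag_from_state: bool) -> bool:
    """Toy fullmatch of `prefix` plus one or more word characters (case-insensitive)."""
    state = ToyState(text, match_all=True)
    if not text.lower().startswith(prefix.lower()):
        return False
    start = len(prefix)
    n = count_word_chars(state, start, flag_from_state)
    # Greedy repeat with backtracking: try the longest run first and accept
    # only a run that ends exactly at the end of the text.
    return any(start + k == len(text) for k in range(n, 0, -1))


for text in ["ME IS L", "ME IS Lucretiel", "me is lucretiel"]:
    print(text, toy_fullmatch("ME IS ", text, True), toy_fullmatch("ME IS ", text, False))
```

With `flag_from_state=True` only "ME IS L" matches, mirroring the report; passing the flag explicitly makes all three inputs match.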
msg216022 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2014-04-13 15:50
Gareth, this is an unrelated issue.
msg218566 - Author: Roundup Robot (python-dev) (Python triager) - Date: 2014-05-14 18:52
New changeset 6267428afbdb by Serhiy Storchaka in branch '3.4':
Issue #20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/6267428afbdb

New changeset bcf64c1c92f6 by Serhiy Storchaka in branch 'default':
Issue #20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/bcf64c1c92f6
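A minimal regression check in the spirit of the commit message ("repeated single character pattern"), using only the public `re` API. The patterns are our own choices, not the tests added by these changesets; each combines a one-character repeat (greedy or lazy) with `re.I` and must match the whole input.

```python
import re

# Repeated single-character items (greedy and lazy) combined with IGNORECASE.
# Each case must fullmatch the entire string, whatever its length.
cases = [
    (r"ME IS \w+", "me is lucretiel"),
    (r"a+", "AAAA"),
    (r"[a-z]+", "MiXeD"),
    (r"\w+?", "lazyrepeat"),
    (r"x*", "XXX"),
]

for pattern, text in cases:
    m = re.fullmatch(pattern, text, re.I)
    assert m is not None and m.span() == (0, len(text)), (pattern, text)

# A trailing non-word character must still make fullmatch() fail.
assert re.fullmatch(r"\w+", "abc!", re.I) is None
print("all fullmatch checks passed")
```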
msg218567 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2014-05-14 18:57
Thank you, Matthew, for your contribution.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:00 | admin | set | github: 65197 |
| 2014-05-14 18:57:45 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg218567; stage: patch review -> resolved |
| 2014-05-14 18:52:07 | python-dev | set | nosy: + python-dev; messages: + msg218566 |
| 2014-04-13 17:57:17 | serhiy.storchaka | set | assignee: serhiy.storchaka |
| 2014-04-13 15:50:27 | serhiy.storchaka | set | messages: + msg216022 |
| 2014-04-13 15:28:32 | serhiy.storchaka | set | files: + issue20998_2.patch; messages: + msg216019 |
| 2014-04-06 20:32:44 | Gareth.Gouldstone | set | nosy: + Gareth.Gouldstone; messages: + msg215667 |
| 2014-04-04 18:49:34 | mrabarnett | set | messages: + msg215549 |
| 2014-04-04 18:22:59 | serhiy.storchaka | set | messages: + msg215546 |
| 2014-03-20 21:37:52 | mrabarnett | set | files: + issue20998.patch; messages: + msg214287 |
| 2014-03-20 20:26:25 | serhiy.storchaka | set | files: + sre_fullmatch_repeated_ignorecase.patch; keywords: + patch; messages: + msg214272; stage: needs patch -> patch review |
| 2014-03-20 18:57:45 | serhiy.storchaka | set | nosy: + serhiy.storchaka; stage: needs patch; versions: + Python 3.5 |
| 2014-03-20 18:43:09 | Lucretiel | set | type: behavior |
| 2014-03-20 18:40:40 | Lucretiel | create | |