This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: fullmatch isn't matching correctly under re.IGNORECASE
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 3.4, Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Gareth.Gouldstone, Lucretiel, ezio.melotti, mrabarnett, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2014-03-20 18:40 by Lucretiel, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                                  Uploaded
sre_fullmatch_repeated_ignorecase.patch    serhiy.storchaka, 2014-03-20 20:26
issue20998.patch                           mrabarnett, 2014-03-20 21:37
issue20998_2.patch                         serhiy.storchaka, 2014-04-13 15:28
Messages (10)
msg214257 - Author: Nathan West (Lucretiel) Date: 2014-03-20 18:40
I have the following regular expression:
In [2]: regex = re.compile("ME IS \w+", re.I)
For some reason, when using `fullmatch`, it doesn't match substrings longer than 1 for the '\w+':
In [3]: regex.fullmatch("ME IS L")
Out[3]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>
In [4]: regex.fullmatch("me is l")
Out[4]: <_sre.SRE_Match object; span=(0, 7), match='me is l'>
In [5]: regex.fullmatch("ME IS Lucretiel")
In [6]: regex.fullmatch("me is lucretiel")
I have no idea why this is happening. Using `match` works fine:
In [7]: regex.match("ME IS L")
Out[7]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>
In [8]: regex.match("ME IS Lucretiel")
Out[8]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>
In [9]: regex.match("me is lucretiel")
Out[9]: <_sre.SRE_Match object; span=(0, 15), match='me is lucretiel'>
Additionally, using `fullmatch` WITHOUT using the `re.I` flag causes it to work:
In [10]: regex = re.compile("ME IS \w+")
In [11]: regex.fullmatch("ME IS L")
Out[11]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>
In [12]: regex.fullmatch("ME IS Lucretiel")
Out[12]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>
My platform is Ubuntu 12.04, using Python 3.4 installed from Felix Krull's deadsnakes PPA (https://launchpad.net/~fkrull/+archive/deadsnakes).
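The session above can be condensed into a small standalone script. This is only a sketch for re-checking the report on a given interpreter: the `results` dictionary is a name introduced here for illustration, and the raw-string pattern is the same regex as in the report. On an interpreter that includes the fix, both columns should be true for every input; on the affected 3.4 build described above, `fullmatch` would be expected to fail for the longer names.

```python
import re

# Same pattern as in the report, written as a raw string.
pattern = re.compile(r"ME IS \w+", re.IGNORECASE)

inputs = ("ME IS L", "me is l", "ME IS Lucretiel", "me is lucretiel")

# Map each input to (fullmatch succeeded, match succeeded).
results = {
    text: (pattern.fullmatch(text) is not None, pattern.match(text) is not None)
    for text in inputs
}

for text, (full_ok, match_ok) in results.items():
    print(repr(text), "fullmatch:", full_ok, "match:", match_ok)
```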
msg214272 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-03-20 20:26
Here is a patch.
msg214287 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2014-03-20 21:37
FWIW, here's my own attempt at a patch.
msg215546 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-04-04 18:22
Both patches are almost equivalent (my patch is much simpler, but perhaps
Matthew's approach is more correct in the long run).
Unfortunately Rietveld doesn't work with Matthew's patch, so I have added my
comments here.
> - (!ctx->match_all || ctx->ptr == state->end)) {
> + ctx->ptr == state->end) {
Why is this check not needed anymore?
> - status = SRE(match)(state, pattern + 2*prefix_skip);
> + status = SRE(match)(state, pattern + 2*prefix_skip, state->match_all);
> - status = SRE(match)(state, pattern + 2);
> + status = SRE(match)(state, pattern + 2, state->match_all);
state->match_all is used but it is never initialized.
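To make the condition in the first hunk concrete, here is a toy Python model of what `!ctx->match_all || ctx->ptr == state->end` expresses after a greedy single-character repeat. It is not a transcription of Modules/_sre.c: the function `greedy_repeat_accepts` and the helper `is_word_char` are hypothetical names, and the model ignores backtracking and any pattern tail.

```python
def is_word_char(ch):
    # Rough stand-in for \w: alphanumerics plus underscore.
    return ch.isalnum() or ch == "_"


def greedy_repeat_accepts(text, start, match_all):
    """Toy model: greedily consume word characters from `start`, then
    apply the acceptance condition quoted in the diff above."""
    ptr = start
    end = len(text)
    while ptr < end and is_word_char(text[ptr]):
        ptr += 1
    if ptr == start:
        return False  # the repeat needs at least one character
    # Mirrors `!match_all || ptr == end`: in match_all mode, success
    # only counts when the whole string has been consumed.
    return (not match_all) or ptr == end


print(greedy_repeat_accepts("Lucretiel", 0, match_all=True))
print(greedy_repeat_accepts("Lucretiel!", 0, match_all=True))
print(greedy_repeat_accepts("Lucretiel!", 0, match_all=False))
```

In this model the flag only matters when trailing input remains, which is the distinction between `match` and `fullmatch` that the thread is debating.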
msg215549 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2014-04-04 18:49
> > - (!ctx->match_all || ctx->ptr == state->end)) {
> > + ctx->ptr == state->end) {
> 
> Why is this check not needed anymore?
> 
After stepping through the code for that regex that fails, I concluded 
that the condition shouldn't depend on ctx->match_all at that point 
after all.
> > - status = SRE(match)(state, pattern + 2*prefix_skip);
> > + status = SRE(match)(state, pattern + 2*prefix_skip, state->match_all);
> 
> > - status = SRE(match)(state, pattern + 2);
> > + status = SRE(match)(state, pattern + 2, state->match_all);
> 
> state->match_all is used but it is never initialized.
I thought I'd initialised it in all the places it's used.
I admit that I find the code a little hard to follow at times... :-(
msg215667 - Author: Gareth Gouldstone (Gareth.Gouldstone) Date: 2014-04-06 20:32
fullmatch() is not yet implemented on the regex scanner object SRE_Scanner (issue 21002). Is it possible to adapt this patch to fix this omission?
msg216019 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-04-13 15:28
> After stepping through the code for that regex that fails, I concluded
> that the condition shouldn't depend on ctx->match_all at that point
> after all.
The tests pass without this check, but I'm not sure that it is unneeded. At
least, without this check the code is not equivalent to the code before support
for fullmatch() was added. So I prefer to leave it as is.
> I thought I'd initialised it in all the places it's used.
> 
> I admit that I find the code a little hard to follow at times... :-(
Indeed, it is initialized in Modules/_sre.c, and it is always 0. Perhaps it
would be more consistent to get rid of the match_all field in the SRE_STATE
structure and pass it as an argument.
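For reference, the behaviour that `fullmatch()` is meant to provide can be approximated in pure Python by anchoring the pattern and calling `match()`. This is a sketch only: `fullmatch_via_match` is a hypothetical helper, not part of the `re` module, and whether it sidesteps the bug on the affected 3.4 build is an assumption that has not been checked in this thread.

```python
import re


def fullmatch_via_match(pattern, text, flags=0):
    """Hypothetical helper: approximate fullmatch() using match() with
    the pattern wrapped in a non-capturing group and anchored by \\Z."""
    anchored = re.compile(r"(?:%s)\Z" % pattern, flags)
    return anchored.match(text)


m = fullmatch_via_match(r"ME IS \w+", "me is lucretiel", re.IGNORECASE)
print(m.span() if m else None)
print(fullmatch_via_match(r"ME IS \w+", "me is lucretiel!", re.IGNORECASE))
```

The non-capturing group keeps any top-level alternation in the pattern from escaping the anchor.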
msg216022 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-04-13 15:50
Gareth, this is an unrelated issue.
msg218566 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-05-14 18:52
New changeset 6267428afbdb by Serhiy Storchaka in branch '3.4':
Issue #20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/6267428afbdb
New changeset bcf64c1c92f6 by Serhiy Storchaka in branch 'default':
Issue #20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/bcf64c1c92f6 
msg218567 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-05-14 18:57
Thank you Matthew for your contribution.
History
Date User Action Args
2022-04-11 14:58:00 admin set github: 65197
2014-05-14 18:57:45 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg218567; stage: patch review -> resolved
2014-05-14 18:52:07 python-dev set nosy: + python-dev; messages: + msg218566
2014-04-13 17:57:17 serhiy.storchaka set assignee: serhiy.storchaka
2014-04-13 15:50:27 serhiy.storchaka set messages: + msg216022
2014-04-13 15:28:32 serhiy.storchaka set files: + issue20998_2.patch; messages: + msg216019
2014-04-06 20:32:44 Gareth.Gouldstone set nosy: + Gareth.Gouldstone; messages: + msg215667
2014-04-04 18:49:34 mrabarnett set messages: + msg215549
2014-04-04 18:22:59 serhiy.storchaka set messages: + msg215546
2014-03-20 21:37:52 mrabarnett set files: + issue20998.patch; messages: + msg214287
2014-03-20 20:26:25 serhiy.storchaka set files: + sre_fullmatch_repeated_ignorecase.patch; keywords: + patch; messages: + msg214272; stage: needs patch -> patch review
2014-03-20 18:57:45 serhiy.storchaka set nosy: + serhiy.storchaka; stage: needs patch; versions: + Python 3.5
2014-03-20 18:43:09 Lucretiel set type: behavior
2014-03-20 18:40:40 Lucretiel create