This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2017-12-13 18:28 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 4846 | merged | serhiy.storchaka, 2017-12-13 18:34 | |
| Messages (14) | |||
|---|---|---|---|
| msg308229 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-12-13 18:28 | |
Currently re.sub() replaces empty matches only when they are not adjacent to a previous match. This makes it inconsistent with re.findall() and re.finditer(), which find empty matches adjacent to a previous non-empty match, and with other RE engines.
The proposed PR makes all functions that search repeatedly (re.split(), re.sub(), re.findall(), re.finditer()) mutually consistent.
The PR changes the behavior of re.split() too, but this doesn't matter, since it already differs from the 3.6 behavior.
The BDFL has approved this change.
This change doesn't break any stdlib code. It is not expected to break much third-party code, and any code it does break can be easily rewritten. For example, replacing re.sub('(.*)', ...) (which now also matches an empty string at the end of the string) with re.sub('(.+)', ...) is an obvious fix.
|
|||
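The change described above can be illustrated with a minimal sketch. The input string and variable names are illustrative only, and the `count=1` variant is an alternative not mentioned in the thread.

```python
import re

text = "asd"

# '(.*)' matches the whole string first, then the empty string left at the end.
all_matches = re.findall(r"(.*)", text)

# Python 3.7+ replaces both matches; 3.6 and earlier replaced only the first.
star_result = re.sub(r"(.*)", "foo", text)

# '(.+)' cannot match the empty string, so exactly one replacement happens.
plus_result = re.sub(r"(.+)", "foo", text)

# Alternative that keeps '(.*)': cap the number of replacements.
capped_result = re.sub(r"(.*)", "foo", text, count=1)

print(all_matches)    # ['asd', '']
print(star_result)    # foofoo on Python 3.7+
print(plus_result)    # foo
print(capped_result)  # foo
```

Note that the two patterns are not equivalent for an empty input: '(.+)' leaves '' unchanged, while '(.*)' replaces it once.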
| msg309055 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-12-26 10:14 | |
Could anybody please review at least the documentation part? |
|||
| msg309458 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2018-01-04 09:06 | |
New changeset fbb490fd2f38bd817d99c20c05121ad0168a38ee by Serhiy Storchaka in branch 'master': bpo-32308: Replace empty matches adjacent to a previous non-empty match in re.sub(). (#4846) https://github.com/python/cpython/commit/fbb490fd2f38bd817d99c20c05121ad0168a38ee |
|||
| msg339949 | Author: Anders Hovmöller (Anders.Hovmöller) * | Date: 2019-04-11 09:50 | |
This was a really bad idea in my opinion. We just found this and we have no way to know how this will impact production. It's really absurd that
re.sub('(.*)', r'foo', 'asd')
is 'foo' in Python 1 to 3.6 but 'foofoo' in Python 3.7.
|
|||
| msg339950 | Author: Anders Hovmöller (Anders.Hovmöller) * | Date: 2019-04-11 09:57 | |
Just as a comparison, sed does the 3.6 thing:
> echo foo | sed 's/\(.*\)/x\1y/g'
xfooy |
|||
| msg339989 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2019-04-11 17:46 | |
It's now consistent with Perl, PCRE and .NET (C#), and re.split(), re.sub(), re.findall() and re.finditer() are now consistent with each other. |
|||
| msg340040 | Author: Anders Hovmöller (Anders.Hovmöller) * | Date: 2019-04-12 13:33 | |
That might be true, but that seems like a weak argument. If anything, it means those others are broken. What is the logic behind "(.*)" returning the entire string (which is what you asked for) and exactly one empty string? Why not two empty strings? 3? 4? 5? Why not an empty string at the beginning? It makes no practical sense. We will have to spend considerable effort to work around this change and adapt our code to 3.7. The lack of a discussion about backwards compatibility in this, and the other, thread before making this change is also a problem I think. |
|||
| msg340102 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2019-04-12 19:53 | |
Consider re.findall(r'.{0,2}', 'abcde').
It finds 'ab', then continues where it left off to find 'cd', then 'e'.
It can also find ''; re.match(r'.*', '') does match, after all.
It could, in fact, find an infinite number of ''.
And what about re.match(r'()*', '')?
What should it do? Run forever? Raise an exception?
At some point you have to make a decision as to what should happen, and the general consensus has been to match once.
|
|||
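The walk-through above can be checked directly. This is a small sketch using only the patterns quoted in the message; the variable names are illustrative.

```python
import re

# Each search resumes where the previous match ended.
spans = [m.span() for m in re.finditer(r".{0,2}", "abcde")]
found = re.findall(r".{0,2}", "abcde")

# An empty match is a real match, even on an empty string.
empty_match = re.match(r".*", "")
# A repeated empty group terminates instead of looping forever.
group_match = re.match(r"()*", "")

print(spans)               # [(0, 2), (2, 4), (4, 5), (5, 5)]
print(found)               # ['ab', 'cd', 'e', '']
print(empty_match.span())  # (0, 0)
print(group_match.span())  # (0, 0)
```

The final `(5, 5)` span is the single empty match at the end of the string: the engine reports it once and then stops.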
| msg360352 | Author: David Barnett (mu_mind) | Date: 2020-01-21 04:56 | |
We were also bitten by this behavior change in https://github.com/google/vroom/issues/110. I'm kinda baffled by the new behavior and assumed it had to be an accidental regression, but I guess not. If you have any other context on the BDFL conversation and reasoning for calling this behavior correct, I'd love to see additional info. |
|||
| msg360355 | Author: Anders Hovmöller (Anders.Hovmöller) * | Date: 2020-01-21 06:07 | |
We were also bitten by this. In fact we still run a compatibility shim in production where we log if the new and old behavior are different. We also didn't think this "bug fix" made sense or was treated with the appropriate gravity in the release notes. I understand the logic in the bug tracker, and that it matches other languages is good. But the behavior also makes no sense for the .* case, unfortunately. |
|||
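A shim of the kind mentioned above might look roughly like the following. This is a hypothetical sketch, not the code the commenter runs: the names `sub_skip_adjacent_empty`, `checked_sub` and the logger name are invented here, only string replacements are handled (not callables), and it only approximates the pre-3.7 result by ignoring empty matches that start where the previous match ended. It is not guaranteed to reproduce 3.6 output for every pattern.

```python
import logging
import re

log = logging.getLogger("re_compat")  # hypothetical logger name


def sub_skip_adjacent_empty(pattern, repl, string):
    """Approximate pre-3.7 re.sub(): ignore empty matches that begin
    exactly where the previous match ended. `repl` must be a string."""
    out = []
    pos = 0          # end of the text already copied to the output
    prev_end = None  # end position of the previous replaced match
    for m in re.finditer(pattern, string):
        start, end = m.span()
        if start == end and start == prev_end:
            continue  # empty match adjacent to the previous match
        out.append(string[pos:start])
        out.append(m.expand(repl))  # expands \1, \g<name>, etc.
        pos = prev_end = end
    out.append(string[pos:])
    return "".join(out)


def checked_sub(pattern, repl, string):
    """Return the current re.sub() result, logging when the approximation
    of the old behavior would have produced something different."""
    new = re.sub(pattern, repl, string)
    old_like = sub_skip_adjacent_empty(pattern, repl, string)
    if new != old_like:
        log.warning("re.sub(%r) differs: new=%r old-like=%r",
                    pattern, new, old_like)
    return new


print(sub_skip_adjacent_empty(r"(.*)", "foo", "asd"))  # foo
print(checked_sub(r"(.*)", "foo", "asd"))              # foofoo on Python 3.7+
```

Logging rather than raising lets the check run in production without changing results, which matches the usage the comment describes.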
| msg366595 | Author: Mark Borgerding (Mark Borgerding) | Date: 2020-04-16 12:57 | |
So third-party code was knowingly broken to satisfy an aesthetic notion that substitution should be more like iteration. Would not a FutureWarning have been a kinder way to stage this implementation? A foolish consistency, indeed. |
|||
| msg366602 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2020-04-16 14:46 | |
The former implementation was wrong. See issue25054, which contains more obvious examples of that bug:
>>> re.sub(r"\b|:+", "-", "a::bc")
'-a-:-bc-'
Not all colons were replaced despite the fact that the pattern matches all colons. |
|||
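For comparison, the same call can be run on a current interpreter. This is a small sketch; the expected values in the comments assume Python 3.7 or later.

```python
import re

source = "a::bc"

# On Python 3.7+ the '::' run is matched by ':+' and replaced,
# so no colon survives in the result.
result = re.sub(r"\b|:+", "-", source)

# The match positions show the empty \b matches on either side of '::'.
spans = [m.span() for m in re.finditer(r"\b|:+", source)]

print(result)  # -a---bc- on Python 3.7+
print(spans)   # [(0, 0), (1, 1), (1, 3), (3, 3), (5, 5)] on Python 3.7+
```

Here the empty match at position 3 is adjacent to the non-empty match `(1, 3)`, which is exactly the case this issue changed re.sub() to replace.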
| msg366604 | Author: Mark Borgerding (Mark Borgerding) | Date: 2020-04-16 14:59 | |
@serhiy.storchaka Thanks for the link to issue25054 to clarify this change was not done solely for aesthetics. Hopefully that will mollify others like me who find their way to this discussion as they try to figure out why their code broke with a new version of python. I wish it had been done in a more staged and overt way, but that is just spitting in the wind at this point. Thanks for all your work, my gripe du jour notwithstanding. |
|||
| msg366606 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2020-04-16 16:28 | |
If the behavior is obviously wrong (like in issue25054), we can fix it without warnings, and even backport the fix to older versions, because we do not expect that anybody depends on such weird behavior. If we are going to change the behavior, but expect that users may depend on the current behavior, we emit a FutureWarning first (and we did it for other changes in re). But this issue is the hard one. Before 3.7 we did not know that it was related to issue25054. We were not going to change this behavior (at least not in the near future). But when a fix for issue25054 was written, we saw that it is the same issue. We did not want to keep the bug from issue25054 for a few more versions, so we changed the behavior in this issue without warnings. It was an exceptional case. This change was documented in the module documentation and in "What's New in Python 3.7" (section "Porting to Python 3.7"). If this is not enough, we will be happy to get help to make it better. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:55 | admin | set | github: 76489 |
| 2020-06-22 17:54:14 | ezio.melotti | link | issue41080 superseder |
| 2020-04-16 16:28:26 | serhiy.storchaka | set | messages: + msg366606 |
| 2020-04-16 14:59:31 | Mark Borgerding | set | messages: + msg366604 |
| 2020-04-16 14:46:18 | serhiy.storchaka | set | messages: + msg366602 |
| 2020-04-16 12:57:37 | Mark Borgerding | set | nosy: + Mark Borgerding; messages: + msg366595 |
| 2020-01-21 06:07:35 | Anders.Hovmöller | set | messages: + msg360355 |
| 2020-01-21 04:56:04 | mu_mind | set | nosy: + mu_mind; messages: + msg360352 |
| 2019-04-12 19:53:42 | mrabarnett | set | messages: + msg340102 |
| 2019-04-12 13:33:22 | Anders.Hovmöller | set | messages: + msg340040 |
| 2019-04-11 17:46:30 | mrabarnett | set | messages: + msg339989 |
| 2019-04-11 09:57:04 | Anders.Hovmöller | set | messages: + msg339950 |
| 2019-04-11 09:50:19 | Anders.Hovmöller | set | nosy: + Anders.Hovmöller; messages: + msg339949 |
| 2018-01-04 09:06:40 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
| 2018-01-04 09:06:15 | serhiy.storchaka | set | messages: + msg309458 |
| 2017-12-26 10:14:59 | serhiy.storchaka | set | messages: + msg309055 |
| 2017-12-13 18:34:24 | serhiy.storchaka | set | keywords: + patch; stage: patch review; pull_requests: + pull_request4734 |
| 2017-12-13 18:28:38 | serhiy.storchaka | create | |