This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2008-07-02 22:07 by mrabarnett, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| split_zero_width.diff | mrabarnett, 2008-07-03 00:59 | |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 4471 | merged | serhiy.storchaka, 2017-11-19 23:36 |
| PR 4678 | closed | serhiy.storchaka, 2017-12-02 17:32 |
| Messages (15) | |||
|---|---|---|---|
| msg69134 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2008-07-02 22:07 |
re.split doesn't split a string when the regex matches zero characters. For example: re.split(r'\b', 'a b') returns ['a b'] instead of ['', 'a', ' ', 'b', '']. re.split(r'(?<!\w)(?=\w)', 'a b') returns ['a b'] instead of ['', 'a ', 'b']. |
| msg69139 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2008-07-02 22:51 |
The attached patch appears to work. |
| msg69146 | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2008-07-02 23:28 |
Probably by design. There's probably even a unittest for this behavior. |
| msg69150 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2008-07-02 23:57 |
I've found that this issue has been discussed before: #988761. |
| msg69157 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2008-07-03 00:59 |
New patch version after studying #988761 and doing more testing. |
| msg69408 | Author: Mike Coleman (mkc) | Date: 2008-07-08 02:36 |
I don't want to discourage you, but #852532, which is essentially the same bug report, was closed--without explanation--as 'wont fix' in April, after four-plus years. I wish you good luck--this is an important and irritating bug, in my opinion... |
| msg69438 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2008-07-08 16:39 |
There appear to be 2 opinions on this issue: 1. It's a bug, a corner case that got missed. 2. It's always been like this, so it's probably a design decision, although no one can point to where or when the decision was made... Looking at the code, I think it's a bug. Expected behaviour: if 'pattern' is a non-capturing regex, then re.split(pattern, text) == re.sub(pattern, MARKER, text).split(MARKER). |
| msg69852 | Author: Mike Coleman (mkc) | Date: 2008-07-16 22:40 |
I think it's probably both. The original design was incorrect, though this probably wasn't apparent to the designer. But as a significant user of 're', it really stands out as a problem. |
| msg70749 | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2008-08-05 16:08 |
I think it's better to leave this alone. Such a subtle change is likely to trip over more people in worse ways than the alleged "bug". |
| msg70752 | Author: Mike Coleman (mkc) | Date: 2008-08-05 16:18 |
Okay. For what it's worth, note that my original 2004 patch for this (#988761) is completely backward compatible (a flag must be set in the call to get the new behavior). |
| msg73523 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2008-09-21 19:41 |
I wonder whether it could be put into Python 3 where certain breaks in backwards compatibility are to be expected. |
| msg73567 | Author: Jeffrey C. Jacobs (timehorse) | Date: 2008-09-22 11:54 |
I think Mike Coleman's proposal of enabling this behaviour via a flag is probably best, and IMHO we should consider it under these circumstances. Intuitively, I think your interpretation of what re.split should do under zero-width conditions is logical, and I almost think this should be a 2-minor-version transition à la from __future__ import zeroWidthRegexpSplit if we are to consider it the long-term 'right thing to do'. 3000 (3.0) seems a good place to also consider it for a true overhaul / reexamination, especially as we are writing 'upgrade' scripts for many of the other Python features. However, I would say this: Guido has spoken, and it may be too late for the pebbles to vote. I would like to add this patch as a new item to the general Regexp Enhancements thread of issue 2636, though, as I think it is an idea worth considering when overhauling Regexp. |
| msg73592 | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2008-09-22 20:39 |
The problem with doing this per 3.0 is that it's impossible to write a conversion script. I'm okay with adding a flag to enable this behavior though. Please open a new bug with a new patch, preferably one that applies cleanly to the trunk, and a separate patch for the py3k branch unless the trunk patch merges cleanly. There should also be unittests and documentation. The patches should be marked for Python 2.7 and 3.1 -- it's way too late to get this into 2.6 and 3.0. |
| msg104226 | Author: Tim Pietzcker (pietzcker) | Date: 2010-04-26 12:29 |
Sorry to revive this dormant (?) topic - has anybody brought this any further? This "feature" has tripped me up a few times, and I would be all for adding a flag to enable the "split on zero-size matches" behavior, but I myself am not competent enough to code a patch. |
| msg104257 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2010-04-26 17:31 |
You could try the regex module mentioned in issue 2636. |
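A minimal sketch, assuming Python 3.7 or later, where re.split accepts patterns that can match an empty string. It checks the expected results from msg69134, the re.sub/split equivalence proposed in msg69438, and a pure-Python helper that splits on zero-width matches without relying on re.split. The names EXPECTED, MARKER, split_via_sub and split_on_matches are hypothetical, and MARKER is assumed never to occur in the input text.

```python
import re
from typing import List, Optional

# Examples from msg69134: (pattern, text, result the reporter expected).
EXPECTED = [
    (r"\b", "a b", ["", "a", " ", "b", ""]),
    (r"(?<!\w)(?=\w)", "a b", ["", "a ", "b"]),
]

# Hypothetical marker for the msg69438 check; assumed absent from the input.
MARKER = "\x00"


def split_via_sub(pattern: str, text: str) -> List[str]:
    """Split by replacing every match with MARKER and splitting on MARKER.

    Mirrors re.split only for patterns without capturing groups, because
    re.split also returns the text of captured groups.
    """
    if MARKER in text:
        raise ValueError("MARKER occurs in the text")
    return re.sub(pattern, MARKER, text).split(MARKER)


def split_on_matches(pattern: str, text: str) -> List[Optional[str]]:
    """Hypothetical helper: split at every match, zero-width ones included.

    Uses only re.finditer; like re.split, it inserts each match's groups
    (None for a group that did not participate) between the pieces.
    """
    pieces: List[Optional[str]] = []
    last = 0
    for m in re.finditer(pattern, text):
        pieces.append(text[last:m.start()])
        pieces.extend(m.groups())
        last = m.end()
    pieces.append(text[last:])
    return pieces


for pattern, text, expected in EXPECTED:
    # On Python 3.7+ re.split splits on empty matches, so all three agree here.
    assert re.split(pattern, text) == expected
    assert split_via_sub(pattern, text) == expected
    assert split_on_matches(pattern, text) == expected
```

The marker approach needs a character that cannot appear in the text, and split_on_matches ignores re.split's maxsplit and flags parameters; both are illustrations of the proposed semantics rather than replacements for re.split.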
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:36 | admin | set | github: 47512 |
| 2021-11-04 14:19:04 | eryksun | set | nosy: - ahmedsayeed1982 |
| 2021-11-04 14:18:56 | eryksun | set | messages: - msg405692 |
| 2021-11-04 12:09:24 | ahmedsayeed1982 | set | versions: - Python 2.6, Python 2.5, Python 3.1; nosy: + ahmedsayeed1982, - gvanrossum, mkc, timehorse, filip, pietzcker, mrabarnett; messages: + msg405692; components: + Tests, - Regular Expressions |
| 2017-12-02 17:32:37 | serhiy.storchaka | set | pull_requests: + pull_request4589 |
| 2017-11-19 23:36:58 | serhiy.storchaka | set | pull_requests: + pull_request4406 |
| 2010-08-04 05:05:56 | terry.reedy | set | status: open -> closed |
| 2010-04-26 17:31:46 | mrabarnett | set | messages: + msg104257 |
| 2010-04-26 12:29:45 | pietzcker | set | nosy: + pietzcker; messages: + msg104226; versions: + Python 2.6, Python 3.1, Python 2.7 |
| 2008-09-22 20:40:00 | gvanrossum | set | messages: + msg73592 |
| 2008-09-22 11:54:30 | timehorse | set | messages: + msg73567 |
| 2008-09-21 19:41:19 | mrabarnett | set | messages: + msg73523 |
| 2008-09-21 11:58:49 | timehorse | set | nosy: + timehorse |
| 2008-08-05 16:18:46 | mkc | set | messages: + msg70752 |
| 2008-08-05 16:08:32 | gvanrossum | set | resolution: rejected; messages: + msg70749 |
| 2008-07-16 22:40:59 | mkc | set | messages: + msg69852 |
| 2008-07-08 16:39:18 | mrabarnett | set | messages: + msg69438 |
| 2008-07-08 02:36:23 | mkc | set | messages: + msg69408 |
| 2008-07-08 02:20:49 | mkc | set | nosy: + mkc |
| 2008-07-07 11:40:01 | filip | set | nosy: + filip |
| 2008-07-03 00:59:38 | mrabarnett | set | files: - split_zero_width.diff |
| 2008-07-03 00:59:01 | mrabarnett | set | files: + split_zero_width.diff; messages: + msg69157 |
| 2008-07-02 23:57:16 | mrabarnett | set | messages: + msg69150 |
| 2008-07-02 23:28:53 | gvanrossum | set | nosy: + gvanrossum; messages: + msg69146 |
| 2008-07-02 22:51:51 | mrabarnett | set | files: + split_zero_width.diff; keywords: + patch; messages: + msg69139 |
| 2008-07-02 22:07:48 | mrabarnett | create | |