This issue tracker has been migrated to GitHub
and is currently read-only.
For more information,
see the GitHub FAQs in the Python Developer Guide.
Created on 2012-08-09 17:50 by stevencollins, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | ||
|---|---|---|
| File name | Uploaded | Description |
| re_whitespace.patch | stevencollins, 2012-08-11 19:27 | Proposed patch for re.VERBOSE docs (whitespace behavior) |
| Pull Requests | ||
|---|---|---|
| URL | Status | Linked |
| PR 4366 | merged | serhiy.storchaka, 2017-11-10 21:53 |
| PR 4394 | merged | python-dev, 2017-11-14 15:22 |
| PR 4395 | merged | python-dev, 2017-11-14 15:23 |
| Messages (11) | |||
|---|---|---|---|
| msg167803 | Author: Steven Collins (stevencollins) | Date: 2012-08-09 17:50 | |
Given the way the documentation is written for re.VERBOSE - "Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash" - I would expect all three of the findall() commands below to return successfully with the same result:
Python 3.2.3 (default, Jun 8 2012, 05:37:15)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.findall('(?x) (?: a | b ) + ', 'abaabc')
['abaab']
>>> re.findall('(?x) (? : a | b ) + ', 'abaabc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/re.py", line 193, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/lib/python3.2/re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/usr/lib/python3.2/functools.py", line 184, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 627, in _parse
    raise error("unexpected end of pattern")
sre_constants.error: unexpected end of pattern
>>> re.findall('(?x) ( ?: a | b ) + ', 'abaabc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/re.py", line 193, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/lib/python3.2/re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/usr/lib/python3.2/functools.py", line 184, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 640, in _parse
    p = _parse_sub(source, state)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 520, in _parse
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat
>>>
The behavior is the same in Python 2.7. Apparently the scan for the special '(?' character sequences happens before the whitespace is stripped out. In my opinion, the behavior should be changed, the documentation should be more clear about the current behavior, or at least the errors given should be more informative (I spent an hour or two debugging the "nothing to repeat" error in my work yesterday.) Thank you.
|
|||
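The three calls from the report can be compared side by side with a small script. This is only a sketch: `try_pattern` is a hypothetical helper introduced here, not part of `re`, and the exact error messages differ between Python versions, so the script reports them rather than relying on their wording.

```python
import re

def try_pattern(pattern, text):
    """Return findall() results, or a short error string if compilation fails."""
    try:
        return re.findall(pattern, text)
    except re.error as exc:
        return f"re.error: {exc}"

patterns = [
    '(?x) (?: a | b ) + ',   # whitespace only between tokens
    '(?x) (? : a | b ) + ',  # whitespace inside the "(?:" token
    '(?x) ( ?: a | b ) + ',  # whitespace inside the "(?:" token
]

# Run each pattern from the report against the same input string.
results = [try_pattern(p, 'abaabc') for p in patterns]
for p, r in zip(patterns, results):
    print(repr(p), '->', r)
```

Only the first pattern is expected to match; the other two should be reported as compile errors because the whitespace splits the `(?:` token.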
| msg167890 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2012-08-10 16:37 | |
Ideally, yes, that whitespace should be ignored. The question is whether it's worth fixing the code for the small case of when there's whitespace within "tokens", such as within "(?:". Usually those who use verbose mode use whitespace as in the first example rather than the second or third examples. |
|||
| msg167999 | Author: Steven Collins (stevencollins) | Date: 2012-08-11 19:27 | |
Fair enough, but in that case I still think the current behavior should be documented. Attached is a possible patch. (This is my first interaction with the Python issue tracker, by the way; apologies if I ought to have set some field differently or left some other field alone.) |
|||
| msg181928 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-02-11 19:53 | |
See also related issue11204. |
|||
| msg182174 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2013-02-15 21:08 | |
See also #17184. |
|||
| msg305158 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-10-28 11:49 | |
Steven, would you mind updating your patch according to the review comments and creating a pull request on GitHub? |
|||
| msg306039 | Author: Kevin Shweh (Kevin Shweh) | Date: 2017-11-10 17:19 | |
It looks to me like there are more situations than the patch lists where whitespace still separates tokens. For example, *? is a reluctant quantifier and * ? is a syntax error, even in verbose mode. |
|||
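The quantifier case from the message above can be checked with a short sketch. The patterns and the sample string are illustrative choices made here, not taken from the thread.

```python
import re

sample = '<a><b>'

# "*?" written as a single token is the reluctant (lazy) quantifier.
lazy = re.findall(r'(?x) < .*? >', sample)

# A plain "*" is greedy and consumes as much as possible.
greedy = re.findall(r'(?x) < .* >', sample)

# Splitting the token with whitespace is rejected, even in verbose mode.
try:
    re.compile(r'(?x) < .* ? >')
    split_compiles = True
except re.error:
    split_compiles = False

print(lazy, greedy, split_compiles)
```

Here `lazy` finds the two short tags separately, `greedy` finds one long match, and the split form fails to compile.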
| msg306050 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-11-10 21:57 | |
Steven's patch is outdated since 71a0b43854164b6ada0026d90f241c987b54d019. But that commit missed that spaces are not ignored within tokens. PR 4366 fixes this by using the wording from Ezio's comments. |
|||
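For contrast with the token case, the two exceptions already named in the wording quoted in the first message (whitespace in a character class, and whitespace preceded by an unescaped backslash) can be illustrated as follows. This is a minimal sketch; the sample text and variable names are assumptions made for the illustration.

```python
import re

text = 'a b ab'

# Plain whitespace between tokens is ignored, so this matches "ab".
ignored = re.findall(r'(?x) a b', text)

# A space inside a character class is significant.
in_class = re.findall(r'(?x) a [ ] b', text)

# A space preceded by a backslash is significant.
escaped = re.findall(r'(?x) a \  b', text)

print(ignored, in_class, escaped)
```

The first pattern matches only the `ab` at the end of the sample, while the other two match only `a b` with the literal space.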
| msg306216 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-11-14 15:21 | |
New changeset b0b44b4b3337297007f5ef87220a75df204399f8 by Serhiy Storchaka in branch 'master': bpo-15606: Improve the re.VERBOSE documentation. (#4366) https://github.com/python/cpython/commit/b0b44b4b3337297007f5ef87220a75df204399f8 |
|||
| msg306217 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-11-14 15:38 | |
New changeset 14c1fe682f0086ec28f24fee9bf1c85d80507ee5 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6': bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) (#4394) https://github.com/python/cpython/commit/14c1fe682f0086ec28f24fee9bf1c85d80507ee5 |
|||
| msg306218 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-11-14 15:39 | |
New changeset a2f1be0b5ba2bed49b7f94c026b541ff07e52518 by Serhiy Storchaka (Miss Islington (bot)) in branch '2.7': bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) (#4395) https://github.com/python/cpython/commit/a2f1be0b5ba2bed49b7f94c026b541ff07e52518 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:34 | admin | set | github: 59811 |
| 2017-11-14 15:39:50 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2017-11-14 15:39:06 | serhiy.storchaka | set | messages: + msg306218 |
| 2017-11-14 15:38:52 | serhiy.storchaka | set | messages: + msg306217 |
| 2017-11-14 15:23:36 | python-dev | set | pull_requests: + pull_request4343 |
| 2017-11-14 15:22:43 | python-dev | set | pull_requests: + pull_request4342 |
| 2017-11-14 15:21:28 | serhiy.storchaka | set | messages: + msg306216 |
| 2017-11-10 21:57:34 | serhiy.storchaka | set | nosy: + zach.ware messages: + msg306050 |
| 2017-11-10 21:53:19 | serhiy.storchaka | set | stage: needs patch -> patch review pull_requests: + pull_request4319 |
| 2017-11-10 17:19:46 | Kevin Shweh | set | nosy: + Kevin Shweh messages: + msg306039 |
| 2017-10-28 11:49:45 | serhiy.storchaka | set | stage: patch review -> needs patch messages: + msg305158 versions: + Python 2.7, Python 3.6, Python 3.7, - Python 3.3 |
| 2013-02-15 21:08:52 | ezio.melotti | set | messages: + msg182174 |
| 2013-02-15 21:08:18 | ezio.melotti | link | issue17184 superseder |
| 2013-02-11 19:53:17 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg181928 |
| 2013-02-11 19:42:49 | roysmith | set | nosy: + roysmith |
| 2012-09-15 23:32:10 | ezio.melotti | set | stage: patch review |
| 2012-08-11 19:27:39 | stevencollins | set | files: + re_whitespace.patch assignee: docs@python keywords: + patch versions: + Python 3.3, - Python 2.7, Python 3.2 nosy: + docs@python title: re.VERBOSE doesn't ignore certain whitespace -> re.VERBOSE whitespace behavior not completely documented messages: + msg167999 components: + Documentation type: behavior -> enhancement |
| 2012-08-10 16:37:05 | mrabarnett | set | messages: + msg167890 |
| 2012-08-09 17:50:17 | stevencollins | create | |