
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: re.VERBOSE whitespace behavior not completely documented
Type: enhancement Stage: resolved
Components: Documentation, Regular Expressions Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Kevin Shweh, docs@python, ezio.melotti, mrabarnett, roysmith, serhiy.storchaka, stevencollins, zach.ware
Priority: normal Keywords: patch

Created on 2012-08-09 17:50 by stevencollins, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
re_whitespace.patch stevencollins, 2012-08-11 19:27 Proposed patch for re.VERBOSE docs (whitespace behavior)
Pull Requests
URL Status Linked
PR 4366 merged serhiy.storchaka, 2017-11-10 21:53
PR 4394 merged python-dev, 2017-11-14 15:22
PR 4395 merged python-dev, 2017-11-14 15:23
Messages (11)
msg167803 - Author: Steven Collins (stevencollins) Date: 2012-08-09 17:50
Given the way the documentation is written for re.VERBOSE - "Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash" - I would expect all three of the findall() commands below to return successfully with the same result:
Python 3.2.3 (default, Jun 8 2012, 05:37:15) 
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.findall('(?x) (?: a | b ) + ', 'abaabc')
['abaab']
>>> re.findall('(?x) (? : a | b ) + ', 'abaabc')
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/lib/python3.2/re.py", line 193, in findall
 return _compile(pattern, flags).findall(string)
 File "/usr/lib/python3.2/re.py", line 255, in _compile
 return _compile_typed(type(pattern), pattern, flags)
 File "/usr/lib/python3.2/functools.py", line 184, in wrapper
 result = user_function(*args, **kwds)
 File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
 return sre_compile.compile(pattern, flags)
 File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
 p = sre_parse.parse(p, flags)
 File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
 p = _parse_sub(source, pattern, 0)
 File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
 itemsappend(_parse(source, state))
 File "/usr/lib/python3.2/sre_parse.py", line 627, in _parse
 raise error("unexpected end of pattern")
sre_constants.error: unexpected end of pattern
>>> re.findall('(?x) ( ?: a | b ) + ', 'abaabc')
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/lib/python3.2/re.py", line 193, in findall
 return _compile(pattern, flags).findall(string)
 File "/usr/lib/python3.2/re.py", line 255, in _compile
 return _compile_typed(type(pattern), pattern, flags)
 File "/usr/lib/python3.2/functools.py", line 184, in wrapper
 result = user_function(*args, **kwds)
 File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
 return sre_compile.compile(pattern, flags)
 File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
 p = sre_parse.parse(p, flags)
 File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
 p = _parse_sub(source, pattern, 0)
 File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
 itemsappend(_parse(source, state))
 File "/usr/lib/python3.2/sre_parse.py", line 640, in _parse
 p = _parse_sub(source, state)
 File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
 itemsappend(_parse(source, state))
 File "/usr/lib/python3.2/sre_parse.py", line 520, in _parse
 raise error("nothing to repeat")
sre_constants.error: nothing to repeat
>>> 
The behavior is the same in Python 2.7. Apparently the scan for the special '(?' character sequences happens before the whitespace is stripped out. In my opinion, the behavior should be changed, the documentation should be more clear about the current behavior, or at least the errors given should be more informative (I spent an hour or two debugging the "nothing to repeat" error in my work yesterday.) Thank you.
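The report above can be checked without depending on a particular traceback. The following is a minimal sketch that tries to compile the three patterns and records which ones the `re` module accepts. The helper name `compiles` is an illustrative assumption, not part of `re`, and because the error messages differ between Python versions, only success or failure is recorded.

```python
import re


def compiles(pattern):
    """Return True if `pattern` compiles, False if re raises an error.

    `compiles` is a hypothetical helper written for this illustration.
    """
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


patterns = [
    '(?x) (?: a | b ) + ',   # whitespace only between tokens
    '(?x) (? : a | b ) + ',  # space inside the "(?:" token
    '(?x) ( ?: a | b ) + ',  # space between "(" and "?:"
]

# Map each pattern to whether it was accepted by the regex compiler.
results = {p: compiles(p) for p in patterns}

for p, ok in results.items():
    print(repr(p), 'compiles' if ok else 'fails to compile')
```

Only the first pattern is expected to compile; the other two split the `(?:` token and are rejected, which matches the behavior described in the report.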
msg167890 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-08-10 16:37
Ideally, yes, that whitespace should be ignored.
The question is whether it's worth fixing the code for the small case of when there's whitespace within "tokens", such as within "(?:". Usually those who use verbose mode use whitespace as in the first example rather than the second or third examples.
msg167999 - Author: Steven Collins (stevencollins) Date: 2012-08-11 19:27
Fair enough, but in that case I still think the current behavior should be documented. Attached is a possible patch. (This is my first interaction with the Python issue tracker, by the way; apologies if I ought to have set some field differently or left some other field alone.)
msg181928 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-11 19:53
See also related issue11204.
msg182174 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-02-15 21:08
See also #17184.
msg305158 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-28 11:49
Steven, would you mind updating your patch according to the review comments and creating a pull request on GitHub?
msg306039 - Author: Kevin Shweh (Kevin Shweh) Date: 2017-11-10 17:19
It looks to me like there are more situations than the patch lists where whitespace still separates tokens. For example, *? is a reluctant quantifier and * ? is a syntax error, even in verbose mode.
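The quantifier case can be sketched the same way. This is a minimal illustration, assuming a sample string `'<a><b>'` chosen only for the example: it compares the greedy `.*` with the reluctant `.*?` in verbose mode, then checks that splitting the token as `.* ?` is rejected.

```python
import re

text = '<a><b>'  # sample input chosen for this illustration

# Whitespace between tokens is ignored in verbose mode.
greedy = re.match(r'(?x) < .* >', text).group()
lazy = re.match(r'(?x) < .*? >', text).group()

# Whitespace inside the "*?" token is not ignored: the "?" is then
# parsed as a second quantifier, which the compiler rejects.
try:
    re.compile(r'(?x) < .* ? >')
    split_token_ok = True
except re.error:
    split_token_ok = False

print(greedy, lazy, split_token_ok)
```

The greedy pattern consumes the whole string, the reluctant one stops at the first `>`, and the split form fails to compile.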
msg306050 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-10 21:57
Steven's patch is outdated since 71a0b43854164b6ada0026d90f241c987b54d019. But that commit missed that spaces are not ignored within tokens. PR 4366 fixes this by using the wording from Ezio's comments.
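For contrast with the token cases, the two exceptions already named in the documentation quoted at the top of this issue (whitespace in a character class, and whitespace preceded by an unescaped backslash) can be illustrated as follows. This is a minimal sketch; the sample string `'a b ab'` is an assumption chosen only for the example.

```python
import re

text = 'a b ab'  # sample input chosen for this illustration

# Plain whitespace in a verbose pattern is ignored, so this is just "ab".
ignored = re.findall(r'(?x) a b', text)

# A space inside a character class is significant.
in_class = re.findall(r'(?x) a [ ] b', text)

# A space preceded by a backslash is significant; the second,
# unescaped space after it is ignored.
escaped = re.findall(r'(?x) a \  b', text)

print(ignored, in_class, escaped)
```

The first pattern matches only `'ab'`, while the other two match only `'a b'`.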
msg306216 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-14 15:21
New changeset b0b44b4b3337297007f5ef87220a75df204399f8 by Serhiy Storchaka in branch 'master':
bpo-15606: Improve the re.VERBOSE documentation. (#4366)
https://github.com/python/cpython/commit/b0b44b4b3337297007f5ef87220a75df204399f8
msg306217 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-14 15:38
New changeset 14c1fe682f0086ec28f24fee9bf1c85d80507ee5 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) (#4394)
https://github.com/python/cpython/commit/14c1fe682f0086ec28f24fee9bf1c85d80507ee5
msg306218 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-14 15:39
New changeset a2f1be0b5ba2bed49b7f94c026b541ff07e52518 by Serhiy Storchaka (Miss Islington (bot)) in branch '2.7':
bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) (#4395)
https://github.com/python/cpython/commit/a2f1be0b5ba2bed49b7f94c026b541ff07e52518
History
Date User Action Args
2022-04-11 14:57:34 admin set github: 59811
2017-11-14 15:39:50 serhiy.storchaka set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2017-11-14 15:39:06 serhiy.storchaka set messages: + msg306218
2017-11-14 15:38:52 serhiy.storchaka set messages: + msg306217
2017-11-14 15:23:36 python-dev set pull_requests: + pull_request4343
2017-11-14 15:22:43 python-dev set pull_requests: + pull_request4342
2017-11-14 15:21:28 serhiy.storchaka set messages: + msg306216
2017-11-10 21:57:34 serhiy.storchaka set nosy: + zach.ware
    messages: + msg306050
2017-11-10 21:53:19 serhiy.storchaka set stage: needs patch -> patch review
    pull_requests: + pull_request4319
2017-11-10 17:19:46 Kevin Shweh set nosy: + Kevin Shweh
    messages: + msg306039
2017-10-28 11:49:45 serhiy.storchaka set stage: patch review -> needs patch
    messages: + msg305158
    versions: + Python 2.7, Python 3.6, Python 3.7, - Python 3.3
2013-02-15 21:08:52 ezio.melotti set messages: + msg182174
2013-02-15 21:08:18 ezio.melotti link issue17184 superseder
2013-02-11 19:53:17 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg181928
2013-02-11 19:42:49 roysmith set nosy: + roysmith
2012-09-15 23:32:10 ezio.melotti set stage: patch review
2012-08-11 19:27:39 stevencollins set files: + re_whitespace.patch
    assignee: docs@python
    keywords: + patch
    versions: + Python 3.3, - Python 2.7, Python 3.2
    nosy: + docs@python
    title: re.VERBOSE doesn't ignore certain whitespace -> re.VERBOSE whitespace behavior not completely documented
    messages: + msg167999
    components: + Documentation
    type: behavior -> enhancement
2012-08-10 16:37:05 mrabarnett set messages: + msg167890
2012-08-09 17:50:17 stevencollins create
