homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: RecursionError in re with '(' * 500
Type: behavior Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: The Compiler, effbot, ezio.melotti, mrabarnett, pitrou, serhiy.storchaka, terry.reedy
Priority: normal Keywords:

Created on 2015年11月04日 09:02 by The Compiler, last changed 2022年04月11日 14:58 by admin. This issue is now closed.

Messages (7)
msg254040 - (view) Author: Florian Bruhin (The Compiler) * Date: 2015年11月04日 09:02
I just found this thanks to Hypothesis[1]:
 >>> import re
 >>> re.compile('(' * 500)
 Traceback (most recent call last): 
 File "<stdin>", line 1, in <module>
 File "/usr/lib/python3.5/re.py", line 224, in compile
 return _compile(pattern, flags)
 File "/usr/lib/python3.5/re.py", line 293, in _compile
 p = sre_compile.compile(pattern, flags)
 File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
 p = sre_parse.parse(p, flags)
 File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
 p = _parse_sub(source, pattern, 0)
 File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
 itemsappend(_parse(source, state))
 File "/usr/lib/python3.5/sre_parse.py", line 778, in _parse
 p = _parse_sub(source, state)
 File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
 itemsappend(_parse(source, state))
 File "/usr/lib/python3.5/sre_parse.py", line 778, in _parse
 p = _parse_sub(source, state)
 File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
 itemsappend(_parse(source, state))
 [...]
 File "/usr/lib/python3.5/sre_parse.py", line 493, in _parse
 subpattern = SubPattern(state)
 RecursionError: maximum recursion depth exceeded
It seems a maximum recursion error has been treated as a bug before (like in issue401612), so I'm guessing this isn't intended here either.
[1] https://hypothesis.readthedocs.org/ 
msg254042 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年11月04日 09:27
This isn't a bug, this is a limitation of the implementation. Regular expression parser is recursive and Python has a limit for recursion depth. You can increase this limit if needed.
>>> import sys, re
>>> sys.setrecursionlimit(2000)
>>> re.compile('(' * 500)
Traceback (most recent call last):
 File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 437, in _parse_sub
 itemsappend(_parse(source, state))
 File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 778, in _parse
 p = _parse_sub(source, state)
 File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 437, in _parse_sub
 itemsappend(_parse(source, state))
...
 File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 778, in _parse
 p = _parse_sub(source, state)
 File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 437, in _parse_sub
 itemsappend(_parse(source, state))
 File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 781, in _parse
 source.tell() - start)
sre_constants.error: missing ), unterminated subpattern at position 499
msg254046 - (view) Author: Florian Bruhin (The Compiler) * Date: 2015年11月04日 10:26
I see how it's not possible to compile that pattern as it's using recursion - I don't mind that.
However, I think this should be handled and re-raised as a re.error ("Exception raised [...] when some other error occurs during compilation or matching.").
I think no matter what the pattern is, it's quite unexpected to get anything other than a re.error (or a TypeError) from re.compile.
msg254060 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年11月04日 15:54
re.compile can also raise ValueError and OverflowError. Agree that may be these exceptions can be converted to re.error.
But it can also raise MemoryError, and KeyboardInterrupt. And these exceptions can't be converted to re.error.
RecursionError is closer to the latter case. It depends not only on the pattern itself, but on the place where it is called. A pattern that is compiled successfully at top level can produce a RecursionError if it is compiled deeply in the recursion.
>>> import re
>>> def r(n):
... if not n:
... return re.compile('x')
... return r(n-1)
... 
>>> r(990)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "<stdin>", line 4, in r
 File "<stdin>", line 4, in r
...
 File "<stdin>", line 4, in r
 File "<stdin>", line 3, in r
 File "/home/serhiy/py/cpython/Lib/re.py", line 224, in compile
 return _compile(pattern, flags)
 File "/home/serhiy/py/cpython/Lib/re.py", line 293, in _compile
 p = sre_compile.compile(pattern, flags)
 File "/home/serhiy/py/cpython/Lib/sre_compile.py", line 555, in compile
 p = sre_parse.parse(p, flags)
 File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 822, in parse
 source = Tokenizer(str)
 File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 225, in __init__
 self.__next()
RecursionError: maximum recursion depth exceeded
>>> re.compile('x')
re.compile('x')
msg254236 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015年11月06日 22:22
Is the recursion limit hit enough in real-world problems that we should consider replacing recursive calls and the implicit C call stack with loops and an explicit stack?
msg254261 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年11月07日 07:52
I think it is possible to make the parser non-recursive, but without reports about such real-world problems it looks premature and just will complicate the code. For years before 3.5 there was a limitation on only 100 capturing groups.
msg278752 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016年10月16日 08:25
Closed as not a bug. Unlike to issue401612 this is not an infinite recursion. Raising RecursionError depends on the context (it is more likely in deeply nested call) and there is a common workaround -- just increase maximum recursion depth using sys.setrecursionlimit(). There is nothing specific to regular expressions.
History
Date User Action Args
2022年04月11日 14:58:23adminsetgithub: 69736
2016年10月16日 08:25:41serhiy.storchakasetstatus: open -> closed
resolution: not a bug
messages: + msg278752

stage: resolved
2015年11月07日 07:52:41serhiy.storchakasetmessages: + msg254261
2015年11月06日 22:22:05terry.reedysetnosy: + terry.reedy
messages: + msg254236
2015年11月04日 15:54:55serhiy.storchakasetmessages: + msg254060
2015年11月04日 10:26:25The Compilersetmessages: + msg254046
2015年11月04日 09:27:11serhiy.storchakasetmessages: + msg254042
2015年11月04日 09:07:10SilentGhostsetnosy: + mrabarnett

components: + Regular Expressions
versions: + Python 3.6
2015年11月04日 09:02:56The Compilercreate

AltStyle によって変換されたページ (->オリジナル) /