This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2007年02月17日 23:39 by gregory.p.smith, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Messages (17) | |||
|---|---|---|---|
| msg55016 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2007年02月17日 23:39 | |
in short, the re module can degenerate to really really horrid performance. See this for how and why: http://swtch.com/~rsc/regexp/regexp1.html exponential decline instead of squared. I don't have a patch so i'm filing this bug as a starting point for future work. The Modules/_sre.c files implementation could be updated to use the parallel stepping Thompson approach instead of recursive backtracking. filing this as a bug until me or someone else comes up with a patch. |
|||
| msg55017 - (view) | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2007年02月22日 08:51 | |
I would file this under "feature request"; the current situation isn't so much buggy, as slow. While you can produce a segfault with the current regular expression engine (due to stack overflow), you can do the same thing with regular Python on Linux (with sys.setrecursionlimit), ctypes, etc., and none of those are considered as buggy. My only concern with such a change is that it may or may not change the semantics of the repeat operators '*' and '+', which are currently defined as "greedy". If I skimmed the article correctly late at night, switching to a Thompson family regular expression engine may result in those operators no longer being greedy. Please correct me if I am wrong. |
|||
| msg55018 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2007年02月22日 22:30 | |
yeah this is better as a feature request. certianly low priority either way. -nothing- I propose doing would change the syntax or behaviour of existing regular expressions at all. Doing so would be a disaster. thompson nfa does not imply changing the behaviour. anyways its a lot more than a simple "patch" to change the re module to not use backtracking so i expect this to languish unless someone has a of free time and motivation all at once. :) |
|||
| msg56256 - (view) | Author: Aaron Swartz (aaronsw) | Date: 2007年10月07日 13:55 | |
Just a note for those who think this is a purely theoretical issue: We've been using the python-markdown module on our web app for a while, only to notice the app has been repeatedly going down. After tracking down the culprit, we found that a speech from Hamlet passed to one of the Markdown regular expressions caused this exponential behavior, freezing up the app. |
|||
| msg59627 - (view) | Author: Ralf Schmitt (schmir) | Date: 2008年01月09日 21:26 | |
here are two other bug reports about the same issue: http://bugs.python.org/issue1448325 http://bugs.python.org/issue1566086 |
|||
| msg69491 - (view) | Author: Yarko Tymciurak (yarkot) | Date: 2008年07月09日 23:05 | |
Not sure if this is a real-world case of this in particular, but possibly: http://groups.google.com/group/web2py/browse_thread/thread/59ff2e31698bced6/9bbae2d482d11b88 |
|||
| msg81479 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2009年02月09日 19:45 | |
This has been addressed in issue #2636. |
|||
| msg81484 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2009年02月09日 20:23 | |
> This has been addressed in issue #2636. Are you sure about this? Does the proposed new regex engine use Thompson NFAs, or some variant thereof? |
|||
| msg81516 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2009年02月09日 23:40 | |
The new code includes some extra checks which, although not foolproof, certainly reduce the amount of backtracking in a lot of cases. |
|||
| msg94319 - (view) | Author: Dan Helfman (witten) | Date: 2009年10月21日 19:28 | |
Here's an easy way to trigger the poor performance. Tested with 2.5, 2.6, and 2.7: re.compile( '(\s+.*)*x' ).search( 'a ' * 30 ) |
|||
| msg94322 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2009年10月21日 20:24 | |
I'm still tinkering with my regex engine (issue #2636). Some timings: re.compile(r'(\s+.*)*x').search('a ' * 25) 20.23secs regex.compile(r'(\s+.*)*x').search('a ' * 25) 0.10secs |
|||
| msg129423 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2011年02月25日 20:42 | |
Another example from #11307 import re r = re.compile(r'(\w+)*=.*') r.match("abcdefghijklmnopqrstuvwxyz") |
|||
| msg176467 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2012年11月27日 08:47 | |
Given the number of times this comes up, I think it's a least worth an upgrade from 'low' priority to 'normal' priority. |
|||
| msg189352 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年05月16日 10:49 | |
Note this can be used for denials of service: see http://bugs.python.org/issue17980 |
|||
| msg189405 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2013年05月16日 20:27 | |
The recommendation for anyone using regular expressions on hostile input is to (a) don't do that. (b) use a better regexp without this possible behavior and (c) use something like re2 (there's a Python binding at https://github.com/axiak/pyre2) which is a regular expression engine that this cannot happen to. fixing this within python requires a complete rewrite and replacement of the re module with one that uses a different approach. see the other work on the MRAB regex module and discussion surrounding that. that is a non trivial task and it is fixing other more important things (unicode correctness!) than this... Given that, I don't actually expect this issue to ever be fixed. IMNSHO: People shouldn't abuse regexes and get themselves into this situation in the first place. ;) discussion should really happen on python-ideas. |
|||
| msg292245 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2017年04月24日 23:40 | |
Note that https://pypi.python.org/pypi/re2 exists today as well and offers a re module compatible interface. I haven't tried it. |
|||
| msg390808 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2021年04月12日 01:41 | |
https://pypi.org/project/pyre2/ seems to be a maintained version of that for use on modern Python interpreters. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:22 | admin | set | github: 44592 |
| 2021年04月12日 01:45:43 | gregory.p.smith | link | issue43686 superseder |
| 2021年04月12日 01:41:22 | gregory.p.smith | set | messages: + msg390808 |
| 2017年04月24日 23:40:02 | gregory.p.smith | set | messages: + msg292245 |
| 2017年04月24日 23:39:26 | gregory.p.smith | set | versions: + Python 3.7, - Python 3.4 |
| 2013年05月16日 20:27:53 | gregory.p.smith | set | status: open -> closed resolution: wont fix messages: + msg189405 |
| 2013年05月16日 10:49:32 | pitrou | set | nosy:
+ pitrou messages: + msg189352 |
| 2012年11月27日 08:47:23 | mark.dickinson | set | priority: low -> normal messages: + msg176467 versions: + Python 3.4, - Python 3.3 |
| 2012年11月27日 08:45:44 | mark.dickinson | link | issue16563 superseder |
| 2012年11月07日 17:43:13 | mark.dickinson | link | issue13723 superseder |
| 2012年11月07日 17:35:54 | mark.dickinson | link | issue16430 superseder |
| 2011年02月25日 20:42:44 | terry.reedy | link | issue11307 superseder |
| 2011年02月25日 20:42:32 | terry.reedy | set | nosy:
+ terry.reedy messages: + msg129423 versions: + Python 3.3, - Python 2.7 |
| 2010年07月29日 13:28:00 | georg.brandl | link | issue9414 superseder |
| 2009年10月21日 20:24:25 | mrabarnett | set | messages: + msg94322 |
| 2009年10月21日 19:28:42 | witten | set | nosy:
+ witten messages: + msg94319 |
| 2009年02月09日 23:40:58 | mrabarnett | set | messages: + msg81516 |
| 2009年02月09日 20:23:05 | mark.dickinson | set | nosy:
+ mark.dickinson messages: + msg81484 |
| 2009年02月09日 19:45:07 | mrabarnett | set | nosy:
+ mrabarnett messages: + msg81479 |
| 2008年09月27日 14:42:27 | timehorse | set | nosy:
+ timehorse versions: + Python 2.7 |
| 2008年07月09日 23:05:29 | yarkot | set | nosy:
+ yarkot messages: + msg69491 |
| 2008年04月24日 21:05:17 | rsc | set | nosy: + rsc |
| 2008年01月09日 22:37:50 | georg.brandl | link | issue1566086 superseder |
| 2008年01月09日 21:26:35 | schmir | set | nosy:
+ schmir messages: + msg59627 |
| 2007年10月07日 13:55:51 | aaronsw | set | nosy:
+ aaronsw messages: + msg56256 |
| 2007年02月17日 23:39:24 | gregory.p.smith | create | |