Created on 2012-07-31 18:08 by crouleau, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| RegexBug.py | crouleau, 2012-07-31 18:08 | |
| Messages (7) | |||
|---|---|---|---|
| msg167024 | Author: Caleb Rouleau (crouleau) | Date: 2012-07-31 18:08 | |
Version info: 2.7.1 (r271:86832, Feb 7 2011, 11:33:02) [MSC v.1500 64 bit (AMD64)] The program included never prints "done" because it never returns from re.match(). -- Caleb Rouleau |
|||
| msg167028 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2012-07-31 18:59 | |
That's because it uses a pathological regular expression (catastrophic backtracking). The problem lies here: (\\?[\w\.\-]+)+ |
|||
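A minimal sketch of the blow-up, not the reporter's RegexBug.py: it reuses the fragment quoted above and appends a ":" that the test strings never contain, so every attempt fails. The trailing ":", the helper name `time_failed_match`, and the inputs are assumptions made for illustration.

```python
import re
import time

# Fragment quoted in the message above, followed by a ":" that the test
# strings never contain (the ":" and the inputs are assumptions for this demo).
PATHOLOGICAL = re.compile(r"(\\?[\w\.\-]+)+:")


def time_failed_match(n):
    """Match against n word characters plus a space; return (match, seconds)."""
    text = "a" * n + " "
    start = time.perf_counter()
    match = PATHOLOGICAL.match(text)
    return match, time.perf_counter() - start


# Keep n small: each extra character roughly doubles the work.
for n in (10, 14, 18):
    match, seconds = time_failed_match(n)
    print(f"n={n:2d}  match={match}  {seconds:.4f}s")
```

The match result is always None here; only the elapsed time changes, and it grows quickly enough that strings a few dozen characters long can appear to hang.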
| msg167031 | Author: Tim Peters (tim.peters) * (Python committer) | Date: 2012-07-31 19:14 | |
Matthew is right: the nested quantifiers can cause this to take a very long time when the regexp doesn't match. Note that the example cannot match, because nothing in the regexp can match the space before "warning" in the example string. But the nested quantifiers cause it to _try_ an enormous number of futile attempts. Under Python 2.7.1, it eventually does return, but it took over 15 minutes when I tried it on my laptop. Friedl's book "Mastering Regular Expressions" is a book-length treatment of how to write regexps that don't "take forever" when they fail to match, and that's highly recommended. Or start a discussion on comp.lang.python, and I'm sure someone will help you flesh out exactly what it is you do and don't want to match, and how to write a regexp that performs well on both matching and non-matching text (the bug tracker isn't an appropriate place for this). |
|||
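One way to see where the "enormous number of futile attempts" comes from is to count how many ways a run of n characters can be divided among iterations of the outer `+`. The sketch below is a simplified model, not a trace of the re engine: it ignores the optional backslash and assumes every character in the run is accepted by `[\w\.\-]`. The function name `count_splits` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n):
    """Count ordered ways to cut a run of n characters into non-empty pieces.

    Each piece stands for one iteration of the outer "+" in (...)+, and the
    piece length for what the inner [\\w\\.\\-]+ consumed in that iteration.
    """
    if n == 0:
        return 1  # nothing left to split: one complete way
    # The first piece takes k characters; split the remaining n - k recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))


for n in (5, 10, 20, 40):
    print(n, count_splits(n))
```

For n >= 1 the count is 2 ** (n - 1), so when the rest of the pattern can never match, a backtracking matcher may have to reject every one of those splits before giving up.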
| msg167035 | Author: Caleb Rouleau (crouleau) | Date: 2012-07-31 19:44 | |
Thanks for the help. Apologies for the poor understanding of regular expressions. Closing this issue. |
|||
| msg167038 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012-07-31 19:48 | |
Make a distinction between a large number and infinity. You have a bad regexp: the matching time depends exponentially on the length of the string. Try with short strings. Use the regexp r"(\w:)(\\?[\w\.\-]+)((\\[\w\.\-]+)*)(\.[\w ]+): ". It's not a bug. |
|||
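A short sketch of how the suggested pattern behaves. The pattern is copied from the message above; both input strings are hypothetical and are not taken from RegexBug.py. In this pattern every repeated segment must start with a backslash, so a run of word characters can only be split in one way and a failing match gives up quickly.

```python
import re

# Pattern copied from the message above.
FIXED = re.compile(r"(\w:)(\\?[\w\.\-]+)((\\[\w\.\-]+)*)(\.[\w ]+): ")

# Hypothetical inputs, chosen only for illustration.
matching = r"C:\dir\sub\file.py: warning"
non_matching = "C:\\" + "a" * 40 + " warning"  # no ": " anywhere after the path

m = FIXED.match(matching)
print(m.group(1), m.group(5))     # C: .py
print(FIXED.match(non_matching))  # None
```

The second call returns promptly even though the run of "a" characters is 40 long, a length at which the nested-quantifier form would be impractical.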
| msg167042 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2012-07-31 19:58 | |
It's probably inappropriate for me to mention that the alternative 'regex' module on PyPI completes promptly, so I won't. :-) |
|||
| msg167054 | Author: Tim Peters (tim.peters) * (Python committer) | Date: 2012-07-31 21:16 | |
Matthew, yes, PyPI's regex module implements regular expressions in the "computer science" (as opposed to POSIX) sense. See Friedl's book for a full explanation. Short course is that regex's flavor of regexp matching is linear-time, but cannot support "advanced" features like backreferences. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:33 | admin | set | github: 59720 |
| 2012-07-31 21:16:17 | tim.peters | set | messages: + msg167054 |
| 2012-07-31 19:58:12 | mrabarnett | set | messages: + msg167042 |
| 2012-07-31 19:48:36 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg167038 |
| 2012-07-31 19:44:56 | crouleau | set | status: open -> closed messages: + msg167035 |
| 2012-07-31 19:14:55 | tim.peters | set | resolution: not a bug messages: + msg167031 nosy: + tim.peters |
| 2012-07-31 18:59:38 | mrabarnett | set | messages: + msg167028 |
| 2012-07-31 18:08:57 | crouleau | create | |