Message176459
| Author | tim.peters |
| Recipients | ezio.melotti, lpd, mrabarnett, tim.peters |
| Date | 2012-11-27 00:42:17 |
| Message-id | <1353976938.17.0.842131609007.issue16563@psf.upfronthosting.co.za> |
There's actually enormous backtracking here. Try this much shorter regexp and you'll see much the same behavior:
re_utf8 = r'^([\x00-\x7f]+)*$'
That's the original re_utf8 with all but the first alternative removed.
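A minimal timing sketch of that behavior, assuming CPython's `re` module (a backtracking matcher that does not memoize). The inputs are hypothetical: runs of ASCII `b'a'` with and without a trailing `b'\x8d'`, standing in for the real data. Using bytes is also an assumption, chosen because `\x8d` suggests raw UTF-8 input. Absolute timings depend on the machine; what to look for is that the failing case should grow roughly exponentially with the run length while the matching case stays fast.

```python
import re
import time

# The shortened regexp from the message: a group with + nested inside *.
# Bytes are an assumption here (\x8d suggests raw UTF-8 data).
re_utf8 = rb'^([\x00-\x7f]+)*$'
pattern = re.compile(re_utf8)

def seconds_to_match(data):
    """Return (matched, elapsed_seconds) for a single pattern.match attempt."""
    start = time.perf_counter()
    matched = pattern.match(data) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    for n in range(12, 19):
        ascii_only = b'a' * n             # hypothetical all-ASCII input
        with_tail = ascii_only + b'\x8d'  # 0x8d lies outside \x00-\x7f
        ok, t_ok = seconds_to_match(ascii_only)
        bad, t_bad = seconds_to_match(with_tail)
        print(f"n={n:2d}  ascii-only: {ok} in {t_ok:.6f}s  "
              f"with \\x8d: {bad} in {t_bad:.6f}s")
```

Each extra byte in the failing case should roughly double its time, which is why a modest input can appear to hang.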
It looks like passing s[0:34] "works" because that slice drops the trailing \x8d, which is what prevents the regexp from matching the whole string. When the regexp cannot match the whole string, it takes a very long time to try all the futile combinations implied by the nested quantifiers. As the much simpler re_utf8 above shows, it's not the alternatives in the regexp that matter here; it's the nested quantifiers.
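To make the "futile combinations" concrete: with `+` nested inside `*`, a run of n ASCII bytes can be carved into consecutive non-empty pieces in 2^(n-1) ways (cut or don't cut at each of the n-1 gaps), and on a failing input the backtracker may try every one of them. The sketch below checks that count by brute force and shows one possible fix, assuming the only requirement is the same accept/reject verdict as the shortened regexp: drop the inner `+` so each byte can be consumed in only one way. The names `splittings`, `re_fixed`, and `same_verdict` are hypothetical, introduced for illustration; this is not the fix adopted in the issue.

```python
import itertools
import re

def splittings(n):
    """Ways the nested pattern can carve n matching bytes into non-empty pieces."""
    if n == 0:
        return 1  # nothing left: the outer * stops iterating
    # The inner + takes a first piece of length k; the outer * handles the rest.
    return sum(splittings(n - k) for k in range(1, n + 1))

# The shortened, nested-quantifier regexp from the message.
re_nested = re.compile(rb'^([\x00-\x7f]+)*$')
# Hypothetical fix: no inner +, so each byte can be consumed in only one way.
re_fixed = re.compile(rb'^[\x00-\x7f]*$')

def same_verdict(data):
    """True if both patterns agree on whether data matches."""
    return (re_nested.match(data) is None) == (re_fixed.match(data) is None)

if __name__ == "__main__":
    for n in range(1, 11):
        print(n, splittings(n), 2 ** (n - 1))  # the two columns agree
    # Exhaustive check over a small, hypothetical byte alphabet.
    alphabet = [b'\x00', b'a', b'\x7f', b'\x80', b'\x8d']
    for length in range(5):
        for parts in itertools.product(alphabet, repeat=length):
            assert same_verdict(b''.join(parts))
    # The fixed pattern fails quickly even on a long input with a bad tail.
    print(re_fixed.match(b'a' * 100_000 + b'\x8d'))  # None
```

One behavioral difference this sketch ignores: the rewrite has no capture group, so any caller that reads group 1 from the nested pattern would need adjusting.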