This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2011-10-13 16:30 by techmaurice, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded by | Uploaded on |
| re_maxrepeat.patch | serhiy.storchaka | 2013-01-23 20:19 |
| re_maxrepeat2.patch | serhiy.storchaka | 2013-01-24 13:45 |
| re_maxrepeat3.patch | serhiy.storchaka | 2013-01-24 19:20 |
| re_maxrepeat4-2.7.patch | serhiy.storchaka | 2013-01-31 15:23 |
| re_maxrepeat4-3.2.patch | serhiy.storchaka | 2013-01-31 15:23 |
| re_maxrepeat4.patch | serhiy.storchaka | 2013-01-31 15:23 |
| Messages (28) | |||
|---|---|---|---|
| msg145469 | Author: Maurice de Rooij (techmaurice) | Date: 2011-10-13 16:30 | |
Regular expressions with 0 to 65536 repetitions or more make Python crash with an "OverflowError: regular expression code size limit exceeded" exception.
A pattern with 65535 repetitions does not raise this error.
Tested and confirmed with versions 2.7.1 and 3.2.2.
C:\Python27>python.exe
Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.search('(?s)\A.{0,65535}test', 'test')
<_sre.SRE_Match object at 0x00B4E4B8>
>>> re.search('(?s)\A.{0,65536}test', 'test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "C:\Python27\lib\re.py", line 243, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python27\lib\sre_compile.py", line 523, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
>>>
C:\Python32>python.exe
Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.search('(?s)\A.{0,65535}test', 'test')
<_sre.SRE_Match object at 0x00A6F250>
>>> re.search('(?s)\A.{0,65536}test', 'test')
Traceback (most recent call last):
  File "C:\Python32\lib\functools.py", line 176, in wrapper
    result = cache[key]
KeyError: (<class 'str'>, '(?s)\\A.{0,65536}test', 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python32\lib\re.py", line 158, in search
    return _compile(pattern, flags).search(string)
  File "C:\Python32\lib\re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "C:\Python32\lib\functools.py", line 180, in wrapper
    result = user_function(*args, **kwds)
  File "C:\Python32\lib\re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "C:\Python32\lib\sre_compile.py", line 514, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
>>>
|
|||
| msg145471 | Author: Brian Curtin (brian.curtin) * (Python committer) | Date: 2011-10-13 16:38 | |
I might be missing something, but what's the issue? 65535 is the limit, and doing 65536 gives a clear overflow exception (no crash). |
|||
| msg145475 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2011-10-13 17:46 | |
The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".
For example:
>>> re.match(".*", "x" * 100000).span()
(0, 100000)
>>> re.match(".{0,65535}", "x" * 100000).span()
(0, 100000)
but:
>>> re.match(".{0,65534}", "x" * 100000).span()
(0, 65534)
|
|||
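The sentinel behaviour described in msg145475 can be probed on whatever interpreter is at hand. The following is only a sketch: `max_span_for_bound` is a hypothetical helper name, and the result depends on the Python version, since interpreters that include the later fix from this issue are expected to treat 65535 as an ordinary bound.

```python
import re

def max_span_for_bound(bound, text_len=100_000):
    """Return how many characters '.{0,bound}' consumes from a long run of 'x'."""
    match = re.match(".{0,%d}" % bound, "x" * text_len)
    return match.end()

consumed = max_span_for_bound(65535)
if consumed == 65535:
    # The quantifier was honoured literally.
    print("65535 is treated as a real upper bound on this interpreter")
else:
    # The quantifier was read as "no upper limit", as in the session above.
    print("65535 behaves like 'no upper limit' here; consumed", consumed)
```

A bound of 65534 is below the sentinel in every version discussed here, so it makes a useful control value.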
| msg145506 | Author: Maurice de Rooij (techmaurice) | Date: 2011-10-14 11:07 | |
So if I understand correctly, the maximum of 65535 repetitions is by design?
I tried a workaround that repeats the repetition by placing it inside a capturing group, which is perfectly legal in Perl regular expressions:
$mystring = "test";
if($mystring =~ m/^(.{0,32766}){0,3}test/s) { print "Yes\n"; }
(32766 being the max repetitions in Perl)
Unfortunately, this does not work in Python and raises a "nothing to repeat" sre_constants error:
re.search('(?s)\A(.{0,65535}){0,3}test', 'test')
This, however, works, and yields 65536 repetitions of DOTALL:
re.search('(?s)\A.{0,65535}.{0,1}test', 'test')
In the end this more or less solves my problem, but it requires extra logic in my script and complicates things unnecessarily.
A suggestion might be to make repetitions of repeats possible?
|
|||
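The "extra logic" mentioned in msg145506 could look roughly like the sketch below, which chains several bounded quantifiers whose upper bounds add up to the wanted total. `chunked_repeat` and `CHUNK` are hypothetical names introduced for illustration; the chunk size of 65534 is an assumption chosen to stay below 65535, which msg145475 says the engine of that time read as "no upper limit". The atom passed in must be a single regex item (a character, a class, or a group).

```python
import re

# Assumption: stay below 65535, which older engines treated as "unlimited".
CHUNK = 65534

def chunked_repeat(atom, max_count, chunk=CHUNK):
    """Build 'atom{0,n1}atom{0,n2}...' whose upper bounds sum to max_count."""
    parts = []
    remaining = max_count
    while remaining > 0:
        step = min(remaining, chunk)
        parts.append("%s{0,%d}" % (atom, step))
        remaining -= step
    return "".join(parts)

# Allow up to 70000 arbitrary characters before the literal 'test'.
pattern = r"(?s)\A" + chunked_repeat(".", 70000) + "test"
print(pattern)
print(re.search(pattern, "abc test") is not None)
```

Chained greedy quantifiers can backtrack heavily on long inputs that do not match, so this is a workaround for old interpreters rather than a recommended pattern.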
| msg145547 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2011-10-14 16:28 | |
The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535. There's an alternative regex implementation here: http://pypi.python.org/pypi/regex |
|||
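The arithmetic behind msg145547 is that an unsigned code unit of n bytes can hold values up to 2^(8n) - 1, so 2 bytes give 65535. The sketch below shows that calculation and, as an assumption about the running build, reads the private `_sre.CODESIZE` and `_sre.MAXREPEAT` attributes of CPython's regex engine; these are implementation details and may be absent elsewhere, hence the `getattr` fallbacks. `largest_code_value` is a hypothetical helper name.

```python
import _sre  # CPython's private regex engine module (implementation detail)

# Either attribute may be missing on other implementations or old versions.
code_bytes = getattr(_sre, "CODESIZE", None)
max_repeat = getattr(_sre, "MAXREPEAT", None)

def largest_code_value(n_bytes):
    """Largest unsigned integer that fits in a code unit of n_bytes bytes."""
    return (1 << (8 * n_bytes)) - 1

print("2-byte code units top out at", largest_code_value(2))
if code_bytes is not None:
    print("this build uses", code_bytes, "bytes per code unit, max value",
          largest_code_value(code_bytes))
print("MAXREPEAT on this build:", max_repeat)
```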
| msg152412 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-01-31 22:36 | |
Issue #13914 has been marked as a duplicate of this issue. |
|||
| msg154625 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2012-02-29 12:18 | |
Matthew, do you think this should be documented somewhere or that the behavior should be changed (e.g. raising a warning when 65535 is used)? If not I'll just close the issue. |
|||
| msg154653 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2012-02-29 17:51 | |
Ideally, it should raise an exception (or a warning) because the behaviour is unexpected. |
|||
| msg180499 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-01-23 20:19 | |
Now RuntimeError is raised in this case. Here is a patch, which:
1) Increases the limit of repeat numbers to 4G (SRE_CODE is now at least 32 bits wide).
2) Raises a re.error exception if this limit is exceeded.
3) Fixes some minor related things. |
|||
| msg180505 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2013-01-24 03:53 | |
IMHO, I don't think that MAXREPEAT should be defined in sre_constants.py _and_ SRE_MAXREPEAT defined in sre_constants.h. (In the latter case, why is it in decimal?)
I think that it should be defined in one place, namely sre_constants.h, perhaps as:
#define SRE_MAXREPEAT ~(SRE_CODE)0
and then imported into sre_constants.py. That'll reduce the chance of an inadvertent mismatch, and it's the C code that's imposing the limit on the number of repeats, not the Python code. |
|||
| msg180516 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-01-24 08:30 | |
> (In the latter case, why is it in decimal?)
Because SRE_MAXREPEAT is generated (like all of sre_constants.h) from sre_constants.py (note the changes at the end of sre_constants.py). I agree that SRE_MAXREPEAT is imposed by a limitation of the C code and that it would be better to define it in C. But we can't just import a C define into Python. This requires more code. |
|||
| msg180521 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-01-24 13:44 | |
Patch updated to address Ezio's and Matthew's comments. MAXREPEAT is now defined in the C code. It is lowered to 2G on 32-bit platforms so that repetition numbers fit into Py_ssize_t. The condition for raising an exception is now more complex: if the repetition number overflows Py_ssize_t, it means the same as an infinite bound, and in this case an exception is not raised (i.e. it is never raised on 32-bit platforms). Tests added. |
|||
| msg180543 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-01-24 19:20 | |
Patch updated to address Ezio's comments. Tests simplified and optimized a little, as Ezio suggested. Added a test for implementation-dependent behavior (I hope it will go away some day). |
|||
| msg181026 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-01-31 15:23 | |
Here are patches for 2.7 and 3.2, and an updated patch for 3.3+ (test_repeat_minmax_overflow_maxrepeat is changed). |
|||
| msg182224 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-02-16 14:59 | |
New changeset c1b3d25882ca by Serhiy Storchaka in branch '2.7':
Issue #13169: The maximal repetition number in a regular expression has been
http://hg.python.org/cpython/rev/c1b3d25882ca
New changeset 472a7c652cbd by Serhiy Storchaka in branch '3.2':
Issue #13169: The maximal repetition number in a regular expression has been
http://hg.python.org/cpython/rev/472a7c652cbd
New changeset b78c321ee9a5 by Serhiy Storchaka in branch '3.3':
Issue #13169: The maximal repetition number in a regular expression has been
http://hg.python.org/cpython/rev/b78c321ee9a5
New changeset ca0307905cd7 by Serhiy Storchaka in branch 'default':
Issue #13169: The maximal repetition number in a regular expression has been
http://hg.python.org/cpython/rev/ca0307905cd7 |
|||
| msg182226 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-02-16 15:07 | |
I have committed simplified patches. They don't change the exception type from OverflowError to re.error (but the error message is now more helpful), and they don't make the code clever enough to avoid raising an exception when a repetition number exceeds sys.maxsize. |
|||
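The behaviour described in msg182226, an overly large repetition number being rejected at compile time, can be checked with a small sketch like the one below. `compile_error_name` is a hypothetical helper; it catches both OverflowError and re.error because the thread discusses both exception types, and it uses 2**64 as a count that is assumed to exceed the repeat limit on any build.

```python
import re

def compile_error_name(pattern):
    """Return the exception class name raised by re.compile, or None on success."""
    try:
        re.compile(pattern)
    except (OverflowError, re.error) as exc:
        return type(exc).__name__
    return None

# 65536 was the failing bound in the original report; the outcome depends on
# the interpreter version.
print(compile_error_name(r".{0,65536}"))
# A repetition number far beyond any 32-bit limit.
print(compile_error_name(r".{0,%d}" % 2**64))
```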
| msg182290 | Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) | Date: 2013-02-17 23:44 | |
Some third-party modules (e.g. epydoc) refer to sre_constants.MAXREPEAT. Please add 'from _sre import MAXREPEAT' to Lib/sre_constants.py for compatibility. |
|||
| msg182307 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-02-18 09:00 | |
Thank you for the report, Arfrever. I'll see how epydoc uses MAXREPEAT. Maybe it requires larger changes. |
|||
| msg182308 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-02-18 09:30 | |
New changeset a80ea934da9a by Serhiy Storchaka in branch '2.7':
Fix issue #13169: Reimport MAXREPEAT into sre_constants.py.
http://hg.python.org/cpython/rev/a80ea934da9a
New changeset a6231ed7bff4 by Serhiy Storchaka in branch '3.2':
Fix issue #13169: Reimport MAXREPEAT into sre_constants.py.
http://hg.python.org/cpython/rev/a6231ed7bff4
New changeset 88c04657c9f1 by Serhiy Storchaka in branch '3.3':
Fix issue #13169: Reimport MAXREPEAT into sre_constants.py.
http://hg.python.org/cpython/rev/88c04657c9f1
New changeset 3dd5be5c4794 by Serhiy Storchaka in branch 'default':
Fix issue #13169: Reimport MAXREPEAT into sre_constants.py.
http://hg.python.org/cpython/rev/3dd5be5c4794 |
|||
| msg186013 | Author: Martin Gfeller (Martin.Gfeller) | Date: 2013-04-04 08:54 | |
I see (under Windows) the same symptoms as reported for Debian under http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=704084. Python refuses to start. 2.7.4.rc1 Windows 32-bit. |
|||
| msg186018 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2013-04-04 09:22 | |
"Python refuses to start. 2.7.4.rc1 Windows 32-bit." Oh oh. I reopen the issue and set its priority to release blocker. |
|||
| msg186020 | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2013-04-04 09:49 | |
"Python refuses to start." is not a very good description.
* What script are you running/module are you importing?
* What is the traceback/error message? |
|||
| msg186021 | Author: Martin Gfeller (Martin.Gfeller) | Date: 2013-04-04 09:51 | |
@Georg, the referenced Debian issue (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=704084) already contains the stack. |
|||
| msg186022 | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2013-04-04 09:58 | |
And this happens when you simply start Python, not executing any code? Can you start with "python -S", then do "import _sre", and see if it has a _sre.MAXREPEAT attribute? |
|||
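The check suggested in msg186022 can also be scripted. The sketch below gathers the interpreter version, the executable path, where `_sre` was loaded from, and whether it exposes `MAXREPEAT`, which is the attribute `sre_constants.py` imports after the fix. `sre_report` is a hypothetical helper, and `_sre` is a private CPython module, so this is a diagnostic aid under those assumptions rather than a supported API.

```python
import sys
import _sre  # private CPython module; assumed present in the builds discussed here

def sre_report():
    """Collect facts that help spot a mismatched interpreter/engine pairing."""
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        # Built-in modules have no __file__ attribute.
        "sre_location": getattr(_sre, "__file__", "(built-in)"),
        "has_MAXREPEAT": hasattr(_sre, "MAXREPEAT"),
    }

for key, value in sre_report().items():
    print(key, "=", value)
```

If `has_MAXREPEAT` is false while the standard library on disk is newer, the library and the binary that provides `_sre` most likely come from different versions.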
| msg186023 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2013-04-04 10:01 | |
IIRC, a few days ago I saw a similar issue, and the cause was that they did something wrong while porting the rc to Debian, but I don't remember the details. If I'm not mistaken, they also fixed it shortly after. |
|||
| msg186024 | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2013-04-04 10:04 | |
Just tested with 2.7.4rc1 32bit on Windows 7; no problem here. I suspect your 2.7.4rc1 install picks up a python27.dll from an earlier version. |
|||
| msg186027 | Author: Martin Gfeller (Martin.Gfeller) | Date: 2013-04-04 11:14 | |
Sorry for passing on my confusion, and thanks for your help! There was indeed an old python.dll lying in one of the places Windows likes to put DLLs. Deleting it resolved the problem. Thanks again, and sorry for taking up your valuable time. Best regards, Martin |
|||
| msg186028 | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2013-04-04 11:15 | |
Thanks for the confirmation! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:22 | admin | set | github: 57378 |
| 2013-04-04 11:15:43 | georg.brandl | set | status: open -> closed; messages: + msg186028 |
| 2013-04-04 11:14:19 | Martin.Gfeller | set | status: pending -> open; messages: + msg186027 |
| 2013-04-04 10:04:15 | georg.brandl | set | status: open -> pending; resolution: fixed; messages: + msg186024 |
| 2013-04-04 10:01:24 | ezio.melotti | set | messages: + msg186023 |
| 2013-04-04 09:58:32 | georg.brandl | set | messages: + msg186022 |
| 2013-04-04 09:51:24 | Martin.Gfeller | set | messages: + msg186021 |
| 2013-04-04 09:49:18 | georg.brandl | set | messages: + msg186020 |
| 2013-04-04 09:22:55 | vstinner | set | status: closed -> open; priority: normal -> release blocker; nosy: + larry, benjamin.peterson, georg.brandl; messages: + msg186018; resolution: fixed -> (no value) |
| 2013-04-04 08:54:59 | Martin.Gfeller | set | nosy: + Martin.Gfeller; messages: + msg186013 |
| 2013-02-18 11:48:10 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: resolved |
| 2013-02-18 09:30:43 | python-dev | set | messages: + msg182308 |
| 2013-02-18 09:00:04 | serhiy.storchaka | set | messages: + msg182307 |
| 2013-02-17 23:44:39 | Arfrever | set | status: closed -> open; nosy: + Arfrever; messages: + msg182290; resolution: fixed -> (no value); stage: resolved -> (no value) |
| 2013-02-16 15:07:41 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg182226; stage: patch review -> resolved |
| 2013-02-16 14:59:41 | python-dev | set | nosy: + python-dev; messages: + msg182224 |
| 2013-01-31 15:23:27 | serhiy.storchaka | set | files: + re_maxrepeat4-2.7.patch, re_maxrepeat4-3.2.patch, re_maxrepeat4.patch; messages: + msg181026 |
| 2013-01-31 14:42:52 | brian.curtin | set | nosy: - brian.curtin |
| 2013-01-31 14:37:54 | serhiy.storchaka | set | assignee: serhiy.storchaka |
| 2013-01-24 19:20:33 | serhiy.storchaka | set | files: + re_maxrepeat3.patch; messages: + msg180543 |
| 2013-01-24 13:45:10 | serhiy.storchaka | set | files: + re_maxrepeat2.patch |
| 2013-01-24 13:44:23 | serhiy.storchaka | set | messages: + msg180521 |
| 2013-01-24 08:30:24 | serhiy.storchaka | set | messages: + msg180516 |
| 2013-01-24 03:53:01 | mrabarnett | set | messages: + msg180505 |
| 2013-01-23 20:19:15 | serhiy.storchaka | set | files: + re_maxrepeat.patch; components: + Extension Modules, Regular Expressions; versions: + Python 3.3, Python 3.4; keywords: + patch; nosy: + serhiy.storchaka; messages: + msg180499; stage: patch review |
| 2012-02-29 17:51:50 | mrabarnett | set | messages: + msg154653 |
| 2012-02-29 12:18:19 | ezio.melotti | set | messages: + msg154625 |
| 2012-01-31 22:36:21 | vstinner | set | messages: + msg152412 |
| 2011-10-14 16:28:24 | mrabarnett | set | messages: + msg145547 |
| 2011-10-14 11:07:27 | techmaurice | set | messages: + msg145506 |
| 2011-10-13 17:46:28 | mrabarnett | set | nosy: + mrabarnett; messages: + msg145475 |
| 2011-10-13 17:26:34 | ezio.melotti | set | nosy: + ezio.melotti |
| 2011-10-13 16:39:13 | brian.curtin | set | title: Regular expressions with 0 to 65536 repetitions and above makes Python crash -> Regular expressions with 0 to 65536 repetitions raises OverflowError |
| 2011-10-13 16:38:15 | brian.curtin | set | type: crash -> behavior; messages: + msg145471; nosy: + brian.curtin |
| 2011-10-13 16:32:29 | vstinner | set | nosy: + vstinner |
| 2011-10-13 16:30:27 | techmaurice | create | |