This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Integer overflow in unicode-escape decoder
Type: behavior
Stage: resolved
Components: Interpreter Core, Unicode
Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: benjamin.peterson, ezio.melotti, lemburg, mark.dickinson, pitrou, python-dev, serhiy.storchaka, skrah, terry.reedy, vstinner
Priority: normal
Keywords: patch

Created on 2012-10-26 22:55 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
decode_unicode_escape_overflow-3.3.patch serhiy.storchaka, 2012-11-09 14:41
decode_unicode_escape_overflow-3.2.patch serhiy.storchaka, 2012-11-09 14:41
decode_unicode_escape_overflow-2.7.patch serhiy.storchaka, 2012-11-09 14:41
Messages (27)
msg173902 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-26 22:55
The size of the parsed Unicode character name is cast to int in the unicode-escape decoder. This can cause an integer overflow.
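A minimal sketch of the truncation being described, assuming a 32-bit C int (the usual case on the platforms discussed here). The helper name as_c_int32 is made up for illustration; ctypes.c_int32 is used only because it wraps silently, much as a C cast from a wider size type would.

```python
import ctypes


def as_c_int32(n):
    """Return n as it would look after being stored in a 32-bit signed int.

    ctypes integer types do no overflow checking, so out-of-range
    values wrap around instead of raising.
    """
    return ctypes.c_int32(n).value


# A length that fits is unchanged.
small = as_c_int32(18)
# A length of 2**32 + 18 loses its high bits and looks like 18.
wrapped = as_c_int32(2**32 + 18)
# A length of 2**31 becomes negative.
negative = as_c_int32(2**31)
```

So a character name that is far too long can be seen by the lookup code as a short (or negative) length.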
msg174169 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-30 00:48
If I understood correctly, (b'\\N{' + b'x' * (INT_MAX+1) + b'}').decode('unicode-escape') may crash? Did you try such a string?
msg174190 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-30 09:39
(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}').decode('unicode-escape') may pass on a platform with a 32-bit int and a size_t wider than 32 bits, if there is enough memory.
I don't have that much memory.
msg174381 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-31 22:08
I have 12 GB of RAM. Let's test.
$ ./python 
Python 3.4.0a0 (default:8573a86c11b5+, Oct 31 2012, 22:17:00) 
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
MemoryError
There is no crash, but it would be better to get a SyntaxError("(unicode error) 'unicodeescape' codec can't decode bytes in position 0-6: unknown Unicode character name") instead.
I propose to only fix this issue in Python 3.4.
msg174383 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-31 22:35
> MemoryError
It's because you need >4GB for source bytes + at least >8GB (>12GB on Windows) for temporary UCS2 string.
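The estimate above can be worked through as a rough sketch. It assumes, as the comment implies, that the decoder preallocates a temporary output buffer with one 2-byte UCS2 unit per input byte; the function name estimated_bytes is hypothetical and the figures ignore interpreter overhead.

```python
GIB = 2**30


def estimated_bytes(n_input_bytes, bytes_per_output_unit=2):
    """Source buffer plus a worst-case temporary output buffer.

    Assumption: one output unit is reserved per input byte.
    """
    return n_input_bytes + n_input_bytes * bytes_per_output_unit


# Length of the test input from the session above, computed without
# actually allocating it.
n = len(b"\\N{WHITE SMILING FACE") + 2**32 + len(b"}")

total = estimated_bytes(n)
total_gib = total / GIB  # slightly more than 12 GiB
```

Under these assumptions the total is just over 12 GiB, which would be consistent with a MemoryError on a machine with 12 GB of RAM.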
msg174579 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-02 21:14
I don't know what to make of this, but...
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Win7 pro, 24 gb mem
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
>>> len(y)
1
>>> y
'☺'
msg174585 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-02 21:30
Wow!
Do we need a test for this case?
msg174597 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-03 00:56
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**16 + b'}')
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
 File "<pyshell#1>", line 1, in <module>
 y=x.decode('unicode-escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-65557: unknown Unicode character name
>>> x=(b'\\N{WHITE SMILING FACE}')
>>> y=x.decode('unicode-escape')
>>> y
'☺'
A manageable number of extra characters raises (I presume correctly), while an unmanageable number is ignored (as it seems), which is bizarre. Creating the long version took about 15 seconds on a fast machine, so the test should be limited to run (slowly) only on machines with a lot of memory.
msg175226 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-09 12:04
Tests would be good. You could use test.support.bigmemtest.
msg175241 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-09 14:41
Here are patches for different Python versions. Test added. Victor, now you can try it on 12GB. Unfortunately, I can't run the tests.
msg175832 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-17 23:42
Terry, can you measure how much memory the tests really need (3.2 and 3.3 should need different amounts)? It looks like I was wrong in my assumptions.
msg175843 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-18 04:43
Serhiy, please be more specific as to 'measure' and 'how much' for what effect. I ran two examples, one ran (with error), the other raised.
msg175845 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 08:31
I added tests. Victor ran the test and got a MemoryError. This means that I incorrectly calculated the minimal memory size for bigmem. This is unacceptable: the test should either skip or pass. Only someone with enough memory for the test can measure the minimal memory requirement (I don't know how to do this).
Maybe apply the fix without a test? You tested this manually, and this test is too cumbersome for regular automatic testing.
msg180332 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-21 09:46
New changeset 7625866f8127 by Serhiy Storchaka in branch '3.2':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/7625866f8127
New changeset 494d341e9143 by Serhiy Storchaka in branch '3.3':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/494d341e9143
New changeset 8488febf7d79 by Serhiy Storchaka in branch 'default':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/8488febf7d79 
msg180333 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-21 09:49
New changeset f4d30d1a529e by Serhiy Storchaka in branch '2.7':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/f4d30d1a529e 
msg180334 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-21 09:51
I rewrote the test in EAFP style.
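For readers unfamiliar with the term, an EAFP-style test tries the expensive operation and handles failure, rather than checking available memory up front. The following is only an illustrative sketch, not the committed test: the class name, the PADDING attribute and the skip message are all assumptions, and PADDING is kept small so the sketch runs anywhere (the real check needs a padding of UINT_MAX + 1 to exercise the overflow).

```python
import unittest


class UnicodeEscapeNameSketch(unittest.TestCase):
    # Assumption: small padding so the sketch is cheap to run. A test
    # for the actual overflow would need _testcapi.UINT_MAX + 1 here.
    PADDING = 2**16

    def test_padded_name_is_rejected(self):
        try:
            # Just attempt the allocation; skip if the machine cannot do it.
            data = b"\\N{SPACE" + b"x" * self.PADDING + b"}"
        except (MemoryError, OverflowError):
            self.skipTest("not enough memory to build the input")
        # A name followed by junk must not decode as the bare name.
        with self.assertRaises(UnicodeDecodeError):
            data.decode("unicode-escape")
```

Saved in a module, this can be run with python -m unittest; it either passes or skips, and never fails only because memory ran out while building the input.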
msg180336 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-21 11:06
New changeset f84a6c89ccbc by Serhiy Storchaka in branch '3.2':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/f84a6c89ccbc
New changeset 7c2aae472b27 by Serhiy Storchaka in branch '3.3':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/7c2aae472b27
New changeset f90d6ce49772 by Serhiy Storchaka in branch 'default':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/f90d6ce49772
New changeset 38a10d0778d2 by Serhiy Storchaka in branch '2.7':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/38a10d0778d2 
msg180348 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-21 18:30
New changeset ec3a35ab3659 by Serhiy Storchaka in branch '2.7':
Add bigmemtest decorator to test of issue #16335.
http://hg.python.org/cpython/rev/ec3a35ab3659
New changeset 6e0c3e4136b1 by Serhiy Storchaka in branch '3.2':
Add bigmemtest decorator to test of issue #16335.
http://hg.python.org/cpython/rev/6e0c3e4136b1
New changeset 0e622d2cbcf8 by Serhiy Storchaka in branch '3.3':
Use bigmemtest decorator for test of issue #16335.
http://hg.python.org/cpython/rev/0e622d2cbcf8
New changeset cdd1e60d31e5 by Serhiy Storchaka in branch 'default':
Use bigmemtest decorator for test of issue #16335.
http://hg.python.org/cpython/rev/cdd1e60d31e5 
msg180552 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 00:14
I just ran the 2.7 tests while dealing with another issue, and
I'm getting a memory error or excessive swapping in test_ucn:
The statement
 x = b'\\N{SPACE' + b'x' * int(_testcapi.UINT_MAX + 1) + b'}'
uses over 8GB on my system, so I think that
 minsize=_testcapi.UINT_MAX + 1
is too low.
msg180556 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-25 08:18
New changeset fc21f8e83062 by Serhiy Storchaka in branch '2.7':
Don't run the test for issue #16335 when -M is not specified.
http://hg.python.org/cpython/rev/fc21f8e83062
New changeset e3d1b68d34e3 by Serhiy Storchaka in branch '3.2':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/e3d1b68d34e3
New changeset 43907b88ce93 by Serhiy Storchaka in branch '3.3':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/43907b88ce93
New changeset fcdb35b114ab by Serhiy Storchaka in branch 'default':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/fcdb35b114ab 
msg180558 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 08:31
The bigmem test in 2.7 ran even if the -M option was not specified, and this caused the memory error. But the memuse parameter should also be increased (I tested with smaller sizes and found that 1 + 4 // len(u'\U00010000') is not enough, but 2 + 4 // len(u'\U00010000') is enough). Let's see if it helps.
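The memuse expression can be unpacked with a small sketch. On Python 2.7, len(u'\U00010000') is 2 on a narrow (UCS2) build and 1 on a wide (UCS4) build, so 4 // len(...) works out to the size of one unicode code unit in bytes. The function name memuse_for_build is hypothetical, and what the leading constant accounts for is not stated in the thread.

```python
def memuse_for_build(astral_len):
    """Evaluate 2 + 4 // len(u'\\U00010000') for a given build type.

    astral_len is 2 on a narrow (UCS2) build, where a non-BMP character
    is stored as a surrogate pair, and 1 on a wide (UCS4) build.
    """
    if astral_len not in (1, 2):
        raise ValueError("astral_len must be 1 or 2")
    return 2 + 4 // astral_len


narrow_memuse = memuse_for_build(2)  # 2 + 2 bytes per code unit
wide_memuse = memuse_for_build(1)    # 2 + 4 bytes per code unit

# On Python 3.3 and later there is no narrow/wide distinction.
current_len = len("\U00010000")
```

The rejected variant, 1 + 4 // len(u'\U00010000'), is one byte per unit lower in each case.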
msg180559 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-25 09:00
> Bigmem test in 2.7 ran even if -M option is not specified and this
> causes the memory error.
Ah, yes, that's because you should have used `size` instead
of `_testcapi.UINT_MAX` inside the test.
msg180560 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 09:43
> Ah, yes, that's because you should have used `size` instead
> of `_testcapi.UINT_MAX` inside the test.
This test makes sense only if size % (_testcapi.UINT_MAX + 1) == 0.
msg180561 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 09:55
The test is fixed here, thanks.
The limits appear to be different in 2.7 and 3.4: In 2.7 the bigmem tests
are executed with -M x > 16G, in 3.4 with -M x >= 12G.
I don't know if that's deliberate, just mentioning it.
msg180563 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 10:20
Due to PEP 393, Python 3.3+ requires less memory for the temporary output buffer. As for the difference between ">" and ">=", the meaning of the -M parameter differs a little between 2.7 and 3.x: in 2.7 some overhead (5MiB) is also counted.
msg180569 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-25 11:37
> Serhiy Storchaka added the comment:
> 
> > Ah, yes, that's because you should have used `size` instead
> > of `_testcapi.UINT_MAX` inside the test.
> 
> This test makes sense only if size % (_testcapi.UINT_MAX + 1) == 0.
Why so? Does it fail otherwise?
msg180570 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 11:57
The test passes in any case, but for other sizes it doesn't check that the bug is fixed. Due to the bug, the bytes b'\\N{SPACExxxxxxxxxxxx...xxx}' are decoded as b'\\N{SPACE}' if the number of x's is divisible by (UINT_MAX + 1). In this case unicode-escape decoding doesn't fail when the bug is not fixed, and fails (as expected) when the bug is fixed. For all other numbers (>0) the decoding fails whether or not the bug is fixed, so for those numbers the test is not relevant.
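The divisibility argument can be simulated without allocating gigabytes. This is a sketch of the pre-fix behaviour as described in this thread, not the decoder's actual code: it assumes a 32-bit int, the function name lookup_with_wrapped_length is made up, and the 255-character cut-off is a shortcut based on the assumption that no Unicode character name is nearly that long.

```python
import unicodedata

UINT_MAX = 2**32 - 1  # assumption: 32-bit unsigned int


def lookup_with_wrapped_length(prefix, padding):
    """Look up prefix + 'x' * padding the way a length-wrapping decoder would.

    Returns the character found, or None if the lookup would fail.
    """
    true_len = len(prefix) + padding
    wrapped = true_len % (UINT_MAX + 1)
    if wrapped >= 2**31:
        wrapped -= 2**32  # reinterpret as a signed 32-bit value
    if wrapped < 0 or wrapped > 255:
        return None  # negative or far too long to be any real name
    # Only the first `wrapped` characters of the name are seen.
    seen = prefix[:wrapped] + "x" * (wrapped - len(prefix))
    try:
        return unicodedata.lookup(seen)
    except KeyError:
        return None


exact = lookup_with_wrapped_length("SPACE", 0)                # plain name
multiple = lookup_with_wrapped_length("SPACE", UINT_MAX + 1)  # junk is invisible
other = lookup_with_wrapped_length("SPACE", 2**16)            # junk is visible
```

Only a padding that is a multiple of UINT_MAX + 1 makes the wrapped length equal to the length of the bare name, so only such sizes distinguish a fixed decoder from an unfixed one.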
History
Date  User  Action  Args
2022-04-11 14:57:37  admin  set  github: 60539
2013-01-28 13:29:44  serhiy.storchaka  set  status: open -> closed
2013-01-25 11:57:20  serhiy.storchaka  set  messages: + msg180570
2013-01-25 11:37:44  pitrou  set  messages: + msg180569
2013-01-25 10:20:52  serhiy.storchaka  set  messages: + msg180563
2013-01-25 09:55:17  skrah  set  messages: + msg180561
2013-01-25 09:43:49  serhiy.storchaka  set  messages: + msg180560
2013-01-25 09:00:14  pitrou  set  messages: + msg180559
2013-01-25 08:31:42  serhiy.storchaka  set  status: closed -> open; messages: + msg180558
2013-01-25 08:18:01  python-dev  set  messages: + msg180556
2013-01-25 00:14:46  skrah  set  nosy: + skrah; messages: + msg180552
2013-01-21 18:30:04  python-dev  set  messages: + msg180348
2013-01-21 11:06:21  python-dev  set  messages: + msg180336
2013-01-21 09:51:30  serhiy.storchaka  set  status: open -> closed; resolution: fixed; messages: + msg180334; stage: patch review -> resolved
2013-01-21 09:49:14  python-dev  set  messages: + msg180333
2013-01-21 09:46:20  python-dev  set  nosy: + python-dev; messages: + msg180332
2013-01-07 18:37:35  serhiy.storchaka  set  assignee: serhiy.storchaka
2012-11-18 08:31:55  serhiy.storchaka  set  messages: + msg175845
2012-11-18 04:43:55  terry.reedy  set  messages: + msg175843
2012-11-17 23:42:55  serhiy.storchaka  set  messages: + msg175832
2012-11-09 14:41:48  serhiy.storchaka  set  files: + decode_unicode_escape_overflow-3.3.patch, decode_unicode_escape_overflow-3.2.patch, decode_unicode_escape_overflow-2.7.patch; messages: + msg175241
2012-11-09 14:41:43  serhiy.storchaka  set  files: - decode_unicode_escape_overflow.patch
2012-11-09 12:04:45  ezio.melotti  set  messages: + msg175226
2012-11-03 00:56:56  terry.reedy  set  messages: + msg174597
2012-11-02 21:30:02  serhiy.storchaka  set  messages: + msg174585
2012-11-02 21:14:48  terry.reedy  set  nosy: + terry.reedy; messages: + msg174579
2012-10-31 22:35:49  serhiy.storchaka  set  messages: + msg174383
2012-10-31 22:08:32  vstinner  set  messages: + msg174381
2012-10-30 09:39:05  serhiy.storchaka  set  messages: + msg174190
2012-10-30 00:48:09  vstinner  set  messages: + msg174169
2012-10-26 22:55:21  serhiy.storchaka  create