Issue 16335: Integer overflow in unicode-escape decoder


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/60539

classification

Title:	Integer overflow in unicode-escape decoder
Type:	behavior	Stage:	resolved
Components:	Interpreter Core, Unicode	Versions:	Python 3.2, Python 3.3, Python 3.4, Python 2.7

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	serhiy.storchaka	Nosy List:	benjamin.peterson, ezio.melotti, lemburg, mark.dickinson, pitrou, python-dev, serhiy.storchaka, skrah, terry.reedy, vstinner
Priority:	normal	Keywords:	patch

Created on 2012年10月26日 22:55 by serhiy.storchaka, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
decode_unicode_escape_overflow-3.3.patch	serhiy.storchaka, 2012年11月09日 14:41	review
decode_unicode_escape_overflow-3.2.patch	serhiy.storchaka, 2012年11月09日 14:41	review
decode_unicode_escape_overflow-2.7.patch	serhiy.storchaka, 2012年11月09日 14:41	review

Messages (27)
msg173902 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年10月26日 22:55
The size of the parsed Unicode character name is cast to int in the unicode-escape decoder. This can cause an integer overflow.
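As a rough illustration of the failure mode reported here (this is not the decoder's C code), the sketch below uses `ctypes.c_int` to show what a large length looks like once it is stored in a C int. The helper name `as_c_int` is hypothetical, and a 32-bit int is assumed.

```python
import ctypes


def as_c_int(length):
    """Return `length` as it would read after being stored in a C int."""
    # ctypes integer types do no overflow checking; out-of-range values wrap.
    return ctypes.c_int(length).value


# Assumption: int is 32 bits wide, as on the platforms discussed in this issue.
assert ctypes.sizeof(ctypes.c_int) == 4

name_len = len("WHITE SMILING FACE")

print(as_c_int(name_len))          # a small length is unchanged
print(as_c_int(name_len + 2**32))  # wraps back to the short length
print(as_c_int(2**31))             # wraps to a negative value
```

If the decoder only sees the wrapped length, a name padded with exactly 2**32 extra bytes looks the same length as the unpadded name.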
msg174169 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012年10月30日 00:48
If I understood correctly, (b'\\N{' + b'x' * (INT_MAX+1) + b'}').decode('unicode-escape') may crash? Did you try such a string?
msg174190 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年10月30日 09:39
(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}').decode('unicode-escape') may pass on a platform with a 32-bit int and a size_t wider than 32 bits, if there is enough memory. I don't have that much memory.
msg174381 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012年10月31日 22:08
I have 12 GB of RAM. Let's test.

$ ./python
Python 3.4.0a0 (default:8573a86c11b5+, Oct 31 2012, 22:17:00)
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

There is no crash, but it would be better to get a SyntaxError("(unicode error) 'unicodeescape' codec can't decode bytes in position 0-6: unknown Unicode character name") instead. I propose to only fix this issue in Python 3.4.
msg174383 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年10月31日 22:35
> MemoryError

That's because you need more than 4 GB for the source bytes plus more than 8 GB (more than 12 GB on Windows) for the temporary UCS2 string.
msg174579 - (view)	Author: Terry J. Reedy (terry.reedy) * (Python committer)	Date: 2012年11月02日 21:14
I don't know what to make of this, but...

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Win7 pro, 24 gb mem
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
>>> len(y)
1
>>> y
'☺'
msg174585 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年11月02日 21:30
Wow! Do we need a test for this case?
msg174597 - (view)	Author: Terry J. Reedy (terry.reedy) * (Python committer)	Date: 2012年11月03日 00:56
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**16 + b'}')
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    y=x.decode('unicode-escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-65557: unknown Unicode character name
>>> x=(b'\\N{WHITE SMILING FACE}')
>>> y=x.decode('unicode-escape')
>>> y
'☺'

A manageable number of extra characters raises (I presume correctly), while an unmanageable number is ignored (as it seems), which is bizarre. Creating the long version took about 15 seconds on a fast machine, so the test should be limited to runs that test everything (slowly) on a machine with a lot of memory.
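The expected behaviour can be reproduced at a small scale without any large allocation. The sketch below is only a reduced version of the sessions above, with 16 padding bytes instead of 2**16 or 2**32; the helper name `try_decode` is hypothetical.

```python
GOOD = b'\\N{WHITE SMILING FACE}'
BAD = b'\\N{WHITE SMILING FACE' + b'x' * 16 + b'}'


def try_decode(data):
    """Decode with unicode-escape; return the error reason on failure."""
    try:
        return data.decode('unicode-escape')
    except UnicodeDecodeError as exc:
        # exc.reason holds the short message, without the byte positions.
        return exc.reason


print(repr(try_decode(GOOD)))  # the single named character
print(repr(try_decode(BAD)))   # the decoder's complaint about the name
```

A padded name should always be rejected like `BAD` is here; the bug is that padding of exactly the wrong size is accepted instead.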
msg175226 - (view)	Author: Ezio Melotti (ezio.melotti) * (Python committer)	Date: 2012年11月09日 12:04
Tests would be good. You could use test.support.bigmemtest.
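A minimal sketch of what such a test might look like follows. It is not the test that was committed: the class and method names are hypothetical, `UINT_MAX` is assumed to be 2**32 - 1, and the `memuse` value is a placeholder, since the thread shows the right value had to be found by experiment.

```python
import unittest
from test import support

UINT_MAX = 2**32 - 1  # assumption: 32-bit unsigned int


class UnicodeEscapeOverflowTest(unittest.TestCase):
    # dry_run=False: skip entirely unless a memory limit (-M) allows the
    # real size, rather than running with a small stand-in size.
    # memuse=6 (bytes needed per unit of size) is an illustrative guess.
    @support.bigmemtest(size=UINT_MAX + 1, memuse=6, dry_run=False)
    def test_padded_name_is_rejected(self, size):
        data = b'\\N{SPACE' + b'x' * size + b'}'
        with self.assertRaises(UnicodeDecodeError):
            data.decode('unicode-escape')
```

Run without a memory limit, the decorated method is reported as skipped, which is the behaviour the thread asks for: the test should skip or pass, never fail with MemoryError.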
msg175241 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年11月09日 14:41
Here are patches for the different Python versions, with a test added. Victor, now you can try it with 12 GB. Unfortunately, I can't run the tests myself.
msg175832 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年11月17日 23:42
Terry, can you measure how much memory the tests really need (3.2 and 3.3 should need different amounts)? It looks like I was wrong in my assumptions.
msg175843 - (view)	Author: Terry J. Reedy (terry.reedy) * (Python committer)	Date: 2012年11月18日 04:43
Serhiy, please be more specific as to 'measure' and 'how much' for what effect. I ran two examples, one ran (with error), the other raised.
msg175845 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年11月18日 08:31
I added tests. Victor ran the test and got a MemoryError. This means that I calculated the minimal memory size for bigmem incorrectly. This is unacceptable; the test should either skip or pass. Only someone with enough memory for the test can measure the minimal memory requirement (I don't know how to do this). Maybe apply the fix without a test? You tested this manually, and this test is too cumbersome for regular automatic testing.
msg180332 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2013年01月21日 09:46
New changeset 7625866f8127 by Serhiy Storchaka in branch '3.2':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/7625866f8127

New changeset 494d341e9143 by Serhiy Storchaka in branch '3.3':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/494d341e9143

New changeset 8488febf7d79 by Serhiy Storchaka in branch 'default':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/8488febf7d79
msg180333 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2013年01月21日 09:49
New changeset f4d30d1a529e by Serhiy Storchaka in branch '2.7':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/f4d30d1a529e
msg180334 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013年01月21日 09:51
I rewrote the test in EAFP style.
msg180336 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2013年01月21日 11:06
New changeset f84a6c89ccbc by Serhiy Storchaka in branch '3.2':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/f84a6c89ccbc

New changeset 7c2aae472b27 by Serhiy Storchaka in branch '3.3':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/7c2aae472b27

New changeset f90d6ce49772 by Serhiy Storchaka in branch 'default':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/f90d6ce49772

New changeset 38a10d0778d2 by Serhiy Storchaka in branch '2.7':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/38a10d0778d2
msg180348 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2013年01月21日 18:30
New changeset ec3a35ab3659 by Serhiy Storchaka in branch '2.7':
Add bigmemtest decorator to test of issue #16335.
http://hg.python.org/cpython/rev/ec3a35ab3659

New changeset 6e0c3e4136b1 by Serhiy Storchaka in branch '3.2':
Add bigmemtest decorator to test of issue #16335.
http://hg.python.org/cpython/rev/6e0c3e4136b1

New changeset 0e622d2cbcf8 by Serhiy Storchaka in branch '3.3':
Use bigmemtest decorator for test of issue #16335.
http://hg.python.org/cpython/rev/0e622d2cbcf8

New changeset cdd1e60d31e5 by Serhiy Storchaka in branch 'default':
Use bigmemtest decorator for test of issue #16335.
http://hg.python.org/cpython/rev/cdd1e60d31e5
msg180552 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013年01月25日 00:14
I just ran the 2.7 tests while dealing with another issue, and I'm getting a memory error or excessive swapping in test_ucn. The statement

    x = b'\\N{SPACE' + b'x' * int(_testcapi.UINT_MAX + 1) + b'}'

uses over 8GB on my system, so I think that minsize=_testcapi.UINT_MAX + 1 is too low.
msg180556 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2013年01月25日 08:18
New changeset fc21f8e83062 by Serhiy Storchaka in branch '2.7':
Don't run the test for issue #16335 when -M is not specified.
http://hg.python.org/cpython/rev/fc21f8e83062

New changeset e3d1b68d34e3 by Serhiy Storchaka in branch '3.2':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/e3d1b68d34e3

New changeset 43907b88ce93 by Serhiy Storchaka in branch '3.3':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/43907b88ce93

New changeset fcdb35b114ab by Serhiy Storchaka in branch 'default':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/fcdb35b114ab
msg180558 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013年01月25日 08:31
The bigmem test in 2.7 ran even if the -M option was not specified, and this caused the memory error. But the memuse parameter should be increased as well (I tested with smaller sizes and found that 1 + 4 // len(u'\U00010000') is not enough, but 2 + 4 // len(u'\U00010000') is enough). Let's see if it helps.
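The arithmetic in the memuse expression depends on the build. One possible reading, stated here as an assumption rather than something the thread confirms, is that `4 // len(u'\U00010000')` is the number of bytes per code unit of the output buffer: the character has length 2 on a narrow (UCS2) build and length 1 on a wide (UCS4) build. The function name below is hypothetical.

```python
def memuse(astral_len, overhead):
    """Evaluate `overhead + 4 // len(u'\\U00010000')` for a given build.

    astral_len stands in for len(u'\\U00010000'):
    2 on a narrow build (surrogate pair), 1 on a wide build.
    """
    return overhead + 4 // astral_len


for build, astral_len in (("narrow", 2), ("wide", 1)):
    too_low = memuse(astral_len, overhead=1)  # reported as not enough
    enough = memuse(astral_len, overhead=2)   # reported as enough
    print(build, too_low, enough)
```

The result is a multiplier per unit of `size`, so with `size` around 2**32 each extra unit of memuse corresponds to roughly 4 GiB more required memory.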
msg180559 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2013年01月25日 09:00
> Bigmem test in 2.7 ran even if -M option is not specified and this
> causes the memory error.

Ah, yes, that's because you should have used `size` instead of `_testcapi.UINT_MAX` inside the test.
msg180560 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013年01月25日 09:43
> Ah, yes, that's because you should have used `size` instead
> of `_testcapi.UINT_MAX` inside the test.

This test makes sense only if size % (_testcapi.UINT_MAX + 1) == 0.
msg180561 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013年01月25日 09:55
The test is fixed here, thanks. The limits appear to be different in 2.7 and 3.4: In 2.7 the bigmem tests are executed with -M x > 16G, in 3.4 with -M x >= 12G. I don't know if that's deliberate, just mentioning it.
msg180563 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013年01月25日 10:20
Due to PEP 393, Python 3.3+ requires less memory for the temporary output buffer. As for the difference between ">" and ">=", the meaning of the -M parameter differs a little between 2.7 and 3.x: in 2.7 some overhead (5 MiB) is counted as well.
msg180569 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2013年01月25日 11:37
> Serhiy Storchaka added the comment:
>
> > Ah, yes, that's because you should have used `size` instead
> > of `_testcapi.UINT_MAX` inside the test.
>
> This test has sense only if size % (_testcapi.UINT_MAX + 1) == 0.

Why so? Does it fail otherwise?
msg180570 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013年01月25日 11:57
The test passes in any case, but for a different size it doesn't check that the bug is fixed. Due to the bug, the bytes b'\\N{SPACExxxxxxxxxxxx...xxx}' are decoded as b'\\N{SPACE}' if the number of x-es is divisible by (UINT_MAX + 1). In this case unicode-escape decoding doesn't fail when the bug is not fixed, and fails (as expected) when the bug is fixed. For all other numbers (> 0) the decoding fails whether or not the bug is fixed, so for those numbers the test is not relevant.
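The argument about which sizes matter can be modelled without allocating gigabytes. The sketch below is a simplified model, not the decoder itself: it assumes the buggy code sees the name length modulo (UINT_MAX + 1) with a 32-bit unsigned int, and it treats any prefix that still contains padding as an unknown name. All names in it are hypothetical.

```python
import unicodedata

UINT_MAX = 2**32 - 1  # assumption: 32-bit unsigned int


def lookup_or_none(name):
    """Return the named character, or None if the name is unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


def model_decode(base_name, extra, fixed):
    """Model decoding b'\\N{' + base_name + b'x' * extra + b'}'.

    Returns the character, or None where the decoder would raise.
    """
    real_len = len(base_name) + extra
    seen_len = real_len if fixed else real_len % (UINT_MAX + 1)
    if seen_len <= len(base_name):
        # Only (a prefix of) the real name is looked at.
        return lookup_or_none(base_name[:seen_len])
    # Model shortcut: a longer prefix includes 'x' padding, so it is unknown.
    return None


for extra in (0, 2**16, UINT_MAX + 1):
    buggy = model_decode("SPACE", extra, fixed=False)
    good = model_decode("SPACE", extra, fixed=True)
    print(extra, repr(buggy), repr(good))
```

In this model the buggy and fixed results differ only when the padding is a non-zero multiple of UINT_MAX + 1, which is why a test with any other size passes with or without the fix.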

History
Date	User	Action	Args
2022年04月11日 14:57:37	admin	set	github: 60539
2013年01月28日 13:29:44	serhiy.storchaka	set	status: open -> closed
2013年01月25日 11:57:20	serhiy.storchaka	set	messages: + msg180570
2013年01月25日 11:37:44	pitrou	set	messages: + msg180569
2013年01月25日 10:20:52	serhiy.storchaka	set	messages: + msg180563
2013年01月25日 09:55:17	skrah	set	messages: + msg180561
2013年01月25日 09:43:49	serhiy.storchaka	set	messages: + msg180560
2013年01月25日 09:00:14	pitrou	set	messages: + msg180559
2013年01月25日 08:31:42	serhiy.storchaka	set	status: closed -> open messages: + msg180558
2013年01月25日 08:18:01	python-dev	set	messages: + msg180556
2013年01月25日 00:14:46	skrah	set	nosy: + skrah messages: + msg180552
2013年01月21日 18:30:04	python-dev	set	messages: + msg180348
2013年01月21日 11:06:21	python-dev	set	messages: + msg180336
2013年01月21日 09:51:30	serhiy.storchaka	set	status: open -> closed resolution: fixed messages: + msg180334 stage: patch review -> resolved
2013年01月21日 09:49:14	python-dev	set	messages: + msg180333
2013年01月21日 09:46:20	python-dev	set	nosy: + python-dev messages: + msg180332
2013年01月07日 18:37:35	serhiy.storchaka	set	assignee: serhiy.storchaka
2012年11月18日 08:31:55	serhiy.storchaka	set	messages: + msg175845
2012年11月18日 04:43:55	terry.reedy	set	messages: + msg175843
2012年11月17日 23:42:55	serhiy.storchaka	set	messages: + msg175832
2012年11月09日 14:41:48	serhiy.storchaka	set	files: + decode_unicode_escape_overflow-3.3.patch, decode_unicode_escape_overflow-3.2.patch, decode_unicode_escape_overflow-2.7.patch messages: + msg175241
2012年11月09日 14:41:43	serhiy.storchaka	set	files: - decode_unicode_escape_overflow.patch
2012年11月09日 12:04:45	ezio.melotti	set	messages: + msg175226
2012年11月03日 00:56:56	terry.reedy	set	messages: + msg174597
2012年11月02日 21:30:02	serhiy.storchaka	set	messages: + msg174585
2012年11月02日 21:14:48	terry.reedy	set	nosy: + terry.reedy messages: + msg174579
2012年10月31日 22:35:49	serhiy.storchaka	set	messages: + msg174383
2012年10月31日 22:08:32	vstinner	set	messages: + msg174381
2012年10月30日 09:39:05	serhiy.storchaka	set	messages: + msg174190
2012年10月30日 00:48:09	vstinner	set	messages: + msg174169
2012年10月26日 22:55:21	serhiy.storchaka	create
