This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012年10月26日 22:55 by serhiy.storchaka, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| decode_unicode_escape_overflow-3.3.patch | serhiy.storchaka, 2012年11月09日 14:41 | review | ||
| decode_unicode_escape_overflow-3.2.patch | serhiy.storchaka, 2012年11月09日 14:41 | review | ||
| decode_unicode_escape_overflow-2.7.patch | serhiy.storchaka, 2012年11月09日 14:41 | review | ||
| Messages (27) | |||
|---|---|---|---|
| msg173902 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年10月26日 22:55 | |
Size of parsed Unicode character name casted to int in unicode-escape decoder. This can cause integer overflow. |
|||
| msg174169 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012年10月30日 00:48 | |
If I understood correctly, (b'\\N{' + b'x' * (INT_MAX+1)) + '}').decode('unicode-decode') may crash? Did you try such string?
|
|||
| msg174190 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年10月30日 09:39 | |
(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + '}').decode('unicode-escape') may pass on platform with 32-bit int and more than 32-bit size_t if there is enough memory.
I don't have so much memory.
|
|||
| msg174381 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012年10月31日 22:08 | |
I have 12 GB of RAM. Let's test.
$ ./python
Python 3.4.0a0 (default:8573a86c11b5+, Oct 31 2012, 22:17:00)
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
There is no crash, but it would be better to get a SyntaxError("(unicode error) 'unicodeescape' codec can't decode bytes in position 0-6: unknown Unicode character name") instead.
I propose to only fix this issue in Python 3.4.
|
|||
| msg174383 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年10月31日 22:35 | |
> MemoryError It's because you need >4GB for source bytes + at least >8GB (>12GB on Windows) for temporary UCS2 string. |
|||
| msg174579 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2012年11月02日 21:14 | |
I don't know what to make of this, but...
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Win7 pro, 24 gb mem
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
>>> len(y)
1
>>> y
'☺'
|
|||
| msg174585 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年11月02日 21:30 | |
Wow! Do we need a test for this case? |
|||
| msg174597 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2012年11月03日 00:56 | |
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**16 + b'}')
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
y=x.decode('unicode-escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-65557: unknown Unicode character name
>>> x=(b'\\N{WHITE SMILING FACE}')
>>> y=x.decode('unicode-escape')
>>> y
'☺'
A manageable number of extra spaces raises (I presume correctly), an unmagageable number are ignored (as it seems), is bizarre. Creating the long version took about 15 seconds on a fast machine, so test should be limited to test all (slowly) on machine with high memory.
|
|||
| msg175226 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2012年11月09日 12:04 | |
Tests would be good. You could use test.support.bigmemtest. |
|||
| msg175241 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年11月09日 14:41 | |
Here are patches for different Python versions. Test added. Victor, now you can try it on 12GB. Unfortunately, I can't run the tests. |
|||
| msg175832 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年11月17日 23:42 | |
Terry, can you measure how much memory tests really needed (3.2 and 3.3 should want different quantity)? Looks as I wrong in my assumptions. |
|||
| msg175843 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2012年11月18日 04:43 | |
Serhiy, please be more specific as to 'measure' and 'how much' for what effect. I ran two examples, one ran (with error), the other raised. |
|||
| msg175845 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年11月18日 08:31 | |
I add tests. Victor ran the test and got MemoryError. This means that I incorrectly calculated the minimal memory size for bigmem. This is unacceptable, the test should skip or pass. Only someone with enough memory for test can measure a minimal memory requirement (I don't know how to do this). May be apply the fix without a test? You tested this manually and this test too cumbersome for regular automatic testing. |
|||
| msg180332 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年01月21日 09:46 | |
New changeset 7625866f8127 by Serhiy Storchaka in branch '3.2': Issue #16335: Fix integer overflow in unicode-escape decoder. http://hg.python.org/cpython/rev/7625866f8127 New changeset 494d341e9143 by Serhiy Storchaka in branch '3.3': Issue #16335: Fix integer overflow in unicode-escape decoder. http://hg.python.org/cpython/rev/494d341e9143 New changeset 8488febf7d79 by Serhiy Storchaka in branch 'default': Issue #16335: Fix integer overflow in unicode-escape decoder. http://hg.python.org/cpython/rev/8488febf7d79 |
|||
| msg180333 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年01月21日 09:49 | |
New changeset f4d30d1a529e by Serhiy Storchaka in branch '2.7': Issue #16335: Fix integer overflow in unicode-escape decoder. http://hg.python.org/cpython/rev/f4d30d1a529e |
|||
| msg180334 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年01月21日 09:51 | |
I rewrote the test in EAFP style. |
|||
| msg180336 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年01月21日 11:06 | |
New changeset f84a6c89ccbc by Serhiy Storchaka in branch '3.2': Fix memory error in test_ucn. http://hg.python.org/cpython/rev/f84a6c89ccbc New changeset 7c2aae472b27 by Serhiy Storchaka in branch '3.3': Fix memory error in test_ucn. http://hg.python.org/cpython/rev/7c2aae472b27 New changeset f90d6ce49772 by Serhiy Storchaka in branch 'default': Fix memory error in test_ucn. http://hg.python.org/cpython/rev/f90d6ce49772 New changeset 38a10d0778d2 by Serhiy Storchaka in branch '2.7': Fix memory error in test_ucn. http://hg.python.org/cpython/rev/38a10d0778d2 |
|||
| msg180348 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年01月21日 18:30 | |
New changeset ec3a35ab3659 by Serhiy Storchaka in branch '2.7': Add bigmemtest decorator to test of issue #16335. http://hg.python.org/cpython/rev/ec3a35ab3659 New changeset 6e0c3e4136b1 by Serhiy Storchaka in branch '3.2': Add bigmemtest decorator to test of issue #16335. http://hg.python.org/cpython/rev/6e0c3e4136b1 New changeset 0e622d2cbcf8 by Serhiy Storchaka in branch '3.3': Use bigmemtest decorator for test of issue #16335. http://hg.python.org/cpython/rev/0e622d2cbcf8 New changeset cdd1e60d31e5 by Serhiy Storchaka in branch 'default': Use bigmemtest decorator for test of issue #16335. http://hg.python.org/cpython/rev/cdd1e60d31e5 |
|||
| msg180552 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2013年01月25日 00:14 | |
I just ran the 2.7 tests while dealing with another issue, and
I'm getting a memory error or excessive swapping in test_ucn:
The statement
x = b'\\N{SPACE' + b'x' * int(_testcapi.UINT_MAX + 1) + b'}'
uses over 8GB on my system, so I think that
minsize=_testcapi.UINT_MAX + 1
is too low.
|
|||
| msg180556 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年01月25日 08:18 | |
New changeset fc21f8e83062 by Serhiy Storchaka in branch '2.7': Don't run the test for issue #16335 when -M is not specified. http://hg.python.org/cpython/rev/fc21f8e83062 New changeset e3d1b68d34e3 by Serhiy Storchaka in branch '3.2': Increase the memory limit in the test for issue #16335. http://hg.python.org/cpython/rev/e3d1b68d34e3 New changeset 43907b88ce93 by Serhiy Storchaka in branch '3.3': Increase the memory limit in the test for issue #16335. http://hg.python.org/cpython/rev/43907b88ce93 New changeset fcdb35b114ab by Serhiy Storchaka in branch 'default': Increase the memory limit in the test for issue #16335. http://hg.python.org/cpython/rev/fcdb35b114ab |
|||
| msg180558 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年01月25日 08:31 | |
Bigmem test in 2.7 ran even if -M option is not specified and this causes the memory error. But memuse parameter should be increased (I tested with smaller sizes and found that 1 + 4 // len(u'\U00010000') is not enough, but 2 + 4 // len(u'\U00010000') is enough). Let's see if it helps. |
|||
| msg180559 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年01月25日 09:00 | |
> Bigmem test in 2.7 ran even if -M option is not specified and this > causes the memory error. Ah, yes, that's because you should have used `size` instead of `_testcapi.UINT_MAX` inside the test. |
|||
| msg180560 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年01月25日 09:43 | |
> Ah, yes, that's because you should have used `size` instead > of `_testcapi.UINT_MAX` inside the test. This test has sense only if size % (_testcapi.UINT_MAX + 1) == 0. |
|||
| msg180561 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2013年01月25日 09:55 | |
The test is fixed here, thanks. The limits appear to be different in 2.7 and 3.4: In 2.7 the bigmem tests are executed with -M x > 16G, in 3.4 with -M x >= 12G. I don't know if that's deliberate, just mentioning it. |
|||
| msg180563 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年01月25日 10:20 | |
Due to PEP 393 Python 3.3+ requires less memory for temporary output buffer. As for difference between ">" and ">=", the meaning of -M parameter a little differs in 2.7 and 3.x -- in 2.7 some overhead (5MiB) counted up. |
|||
| msg180569 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年01月25日 11:37 | |
> Serhiy Storchaka added the comment: > > > Ah, yes, that's because you should have used `size` instead > > of `_testcapi.UINT_MAX` inside the test. > > This test has sense only if size % (_testcapi.UINT_MAX + 1) == 0. Why so? Does it fail otherwise? |
|||
| msg180570 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年01月25日 11:57 | |
The test passed in any case, but for different size it doesn't check that the bug is fixed. Due to the bug bytes b'\\N{SPACExxxxxxxxxxxx...xxx'}' decoded as b'\\N{SPACE'}' if the number of x-es divisible by (UINT_MAX + 1). In this case unicode-escape decoding doesn't failed when the bug not fixed and failed (as expected) when the bug fixed. For all other numbers (>0) the decoding fails as when the bug fixed so when the bug is not fixed. And for other numbers the test is not relevant.
|
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:37 | admin | set | github: 60539 |
| 2013年01月28日 13:29:44 | serhiy.storchaka | set | status: open -> closed |
| 2013年01月25日 11:57:20 | serhiy.storchaka | set | messages: + msg180570 |
| 2013年01月25日 11:37:44 | pitrou | set | messages: + msg180569 |
| 2013年01月25日 10:20:52 | serhiy.storchaka | set | messages: + msg180563 |
| 2013年01月25日 09:55:17 | skrah | set | messages: + msg180561 |
| 2013年01月25日 09:43:49 | serhiy.storchaka | set | messages: + msg180560 |
| 2013年01月25日 09:00:14 | pitrou | set | messages: + msg180559 |
| 2013年01月25日 08:31:42 | serhiy.storchaka | set | status: closed -> open messages: + msg180558 |
| 2013年01月25日 08:18:01 | python-dev | set | messages: + msg180556 |
| 2013年01月25日 00:14:46 | skrah | set | nosy:
+ skrah messages: + msg180552 |
| 2013年01月21日 18:30:04 | python-dev | set | messages: + msg180348 |
| 2013年01月21日 11:06:21 | python-dev | set | messages: + msg180336 |
| 2013年01月21日 09:51:30 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: + msg180334 stage: patch review -> resolved |
| 2013年01月21日 09:49:14 | python-dev | set | messages: + msg180333 |
| 2013年01月21日 09:46:20 | python-dev | set | nosy:
+ python-dev messages: + msg180332 |
| 2013年01月07日 18:37:35 | serhiy.storchaka | set | assignee: serhiy.storchaka |
| 2012年11月18日 08:31:55 | serhiy.storchaka | set | messages: + msg175845 |
| 2012年11月18日 04:43:55 | terry.reedy | set | messages: + msg175843 |
| 2012年11月17日 23:42:55 | serhiy.storchaka | set | messages: + msg175832 |
| 2012年11月09日 14:41:48 | serhiy.storchaka | set | files:
+ decode_unicode_escape_overflow-3.3.patch, decode_unicode_escape_overflow-3.2.patch, decode_unicode_escape_overflow-2.7.patch messages: + msg175241 |
| 2012年11月09日 14:41:43 | serhiy.storchaka | set | files: - decode_unicode_escape_overflow.patch |
| 2012年11月09日 12:04:45 | ezio.melotti | set | messages: + msg175226 |
| 2012年11月03日 00:56:56 | terry.reedy | set | messages: + msg174597 |
| 2012年11月02日 21:30:02 | serhiy.storchaka | set | messages: + msg174585 |
| 2012年11月02日 21:14:48 | terry.reedy | set | nosy:
+ terry.reedy messages: + msg174579 |
| 2012年10月31日 22:35:49 | serhiy.storchaka | set | messages: + msg174383 |
| 2012年10月31日 22:08:32 | vstinner | set | messages: + msg174381 |
| 2012年10月30日 09:39:05 | serhiy.storchaka | set | messages: + msg174190 |
| 2012年10月30日 00:48:09 | vstinner | set | messages: + msg174169 |
| 2012年10月26日 22:55:21 | serhiy.storchaka | create | |