
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: MemoryError with custom error handlers and multibyte codecs
Type: resource usage Stage: resolved
Components: Interpreter Core Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: alexer, lemburg, loewis, python-dev, r.david.murray, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-01-10 03:33 by alexer, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
python_codec_crasher.py alexer, 2015-01-10 03:33
python_codec_crash_fix.patch alexer, 2015-01-10 03:39
python_codec_crash_fix_2.patch serhiy.storchaka, 2015-02-15 17:45
Messages (5)
msg233800 - Author: Aleksi Torhamo (alexer) * Date: 2015-01-10 03:33
Encoding a string that contains characters not representable in the target encoding, using a multibyte codec together with a custom error handler that ignores errors, causes exponential growth of the output buffer and eventually raises MemoryError.
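A minimal Python 3 reproduction of the scenario, offered as a sketch: the handler name "example.ignore" is hypothetical, and gb2312 with the lone surrogate "\udc80" is just one convenient choice of codec and unencodable character. The handler reuses the stdlib's `codecs.ignore_errors`, but because it is registered under a custom name, the codec has to call it as a custom callback, which is the case this report is about.

```python
import codecs

# Register the stdlib "ignore" behaviour under a custom (hypothetical) name.
# codecs.ignore_errors returns ("", exc.end): drop the unencodable
# character and resume encoding right after it.
codecs.register_error("example.ignore", codecs.ignore_errors)

# A few dozen unencodable characters (lone surrogates) between ASCII text.
text = "abc" + "\udc80" * 40 + "def"

# On a fixed interpreter the surrogates are silently dropped.
# On the affected versions this same call could raise MemoryError.
result = text.encode("gb2312", "example.ignore")
print(result)  # b'abcdef'
```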
The problem is in multibytecodec_encerror() and REQUIRE_ENCODEBUFFER() in Modules/cjkcodecs/multibytecodec.c. multibytecodec_encerror() always uses REQUIRE_ENCODEBUFFER() to ensure there's enough space for the replacement string, and if more space is needed, REQUIRE_ENCODEBUFFER() calls expand_encodebuffer(), which in turn always grows the buffer by at least 50%. However, if size < 1, REQUIRE_ENCODEBUFFER() doesn't check whether more space is actually needed. (It's used with negative values in other places.)
I have no idea why the condition was originally size < 1 instead of size < 0, but changing it seems to fix this. The replacement string case is also the only use of the macro that may pass 0 as the argument.
In the patch, I've instead wrapped the REQUIRE_ENCODEBUFFER() (and memcpy) in an if (size > 0), since that's what the corresponding part in multibytecodec_decerror() did in the past:
https://hg.python.org/cpython/file/1c3f8d044589/Modules/cjkcodecs/multibytecodec.c#l438
Not sure which one makes more sense.
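The mechanism above, and the two candidate fixes, can be modelled in a few lines of Python. This is a simplified sketch, not the C source: `grow()`, `require_encodebuffer()` and `final_buffer_size()` are hypothetical names, the "grow by at least 50%" rule is taken from the description of expand_encodebuffer(), and the 16-byte starting size is an arbitrary assumption.

```python
# Simplified model of the buffer logic described in the report (NOT the C code).

def grow(size: int, needed: int) -> int:
    """Hypothetical stand-in for expand_encodebuffer(): grow by at least 50%."""
    return size + max(size // 2, needed, 1)


def require_encodebuffer(size: int, pos: int, n: int, cutoff: int) -> int:
    """Model of the REQUIRE_ENCODEBUFFER() test.

    cutoff=1 models the original "size < 1" check, cutoff=0 models "size < 0".
    The buffer is expanded when n is below the cutoff OR when n bytes at
    position pos would not fit.
    """
    if n < cutoff or pos + n > size:
        return grow(size, n)
    return size


def final_buffer_size(num_errors: int, replacement_len: int = 0,
                      initial_size: int = 16, variant: str = "original") -> int:
    """Buffer size after writing num_errors replacements of replacement_len bytes."""
    size, pos = initial_size, 0
    for _ in range(num_errors):
        if variant == "original":       # "size < 1" fires even for n == 0
            size = require_encodebuffer(size, pos, replacement_len, cutoff=1)
        elif variant == "macro_fix":    # condition changed to "size < 0"
            size = require_encodebuffer(size, pos, replacement_len, cutoff=0)
        elif variant == "guard_fix":    # "if (size > 0)" around the macro call
            if replacement_len > 0:
                size = require_encodebuffer(size, pos, replacement_len, cutoff=1)
        else:
            raise ValueError(f"unknown variant: {variant}")
        pos += replacement_len
    return size


for variant in ("original", "macro_fix", "guard_fix"):
    print(variant, final_buffer_size(40, variant=variant))
```

In this model, an empty replacement makes only the original check expand the buffer, once per ignored character, so 40 errors push it from 16 bytes to well over a million, while both fixes leave it at 16. For a positive replacement length all three variants behave identically, so within the model the two proposed fixes are interchangeable.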
As for the tests, I'm not sure 1) whether all of the affected encodings should be tested or only one (or even all encodings, affected or not?) and 2) whether it should be a new test or if I should just add it to test_longstrings in Lib/test/test_codeccallbacks.py. (Structurally it's a perfect fit, but it really isn't a "long string" test, as the problem can occur with fewer than 50 characters.) At the moment, the patch tests the affected encodings in a separate test.
Is the test philosophy "as thorough as possible" or "as fast as possible"?
msg234687 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-01-25 20:49
New changeset 3a9b1e5fe179 by R David Murray in branch '3.4':
#23215: reflow paragraph.
https://hg.python.org/cpython/rev/3a9b1e5fe179
New changeset 52a06812d5da by R David Murray in branch 'default':
Merge: #23215: note that time.sleep affects the current thread only.
https://hg.python.org/cpython/rev/52a06812d5da 
msg234689 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-01-25 20:51
Oops, typoed the issue number. That should have been 23251.
msg236054 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-15 17:45
Thank you for your patch, Aleksi. It LGTM in general. The updated patch just moves the test to Lib/test/test_multibytecodec.py, where it can reuse ALL_CJKENCODINGS, and fixes a few other minor bugs in multibyte codecs.
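Running the same check across several CJK codecs can be sketched as below. This is an illustration, not the test that was committed: the handler name "example.ignore", the helper `check_encoding()` and the `ENCODINGS` list are hypothetical, and the list is a hand-picked subset of CPython's multibyte codecs rather than ALL_CJKENCODINGS. Decoding the result back, instead of comparing raw bytes, keeps the check independent of any codec-specific escape sequences.

```python
import codecs

# Hand-picked subset of CPython's CJK multibyte codecs (illustrative only).
ENCODINGS = [
    "gb2312", "gbk", "hz",
    "big5", "cp950",
    "cp932", "shift_jis", "euc_jp", "iso2022_jp",
    "euc_kr", "cp949",
]

# Custom (hypothetical) name bound to the stdlib ignore handler, so it is
# not recognized as the built-in "ignore" and is called as a callback.
codecs.register_error("example.ignore", codecs.ignore_errors)


def check_encoding(encoding: str, n_errors: int = 100) -> str:
    """Encode text containing n_errors unencodable characters, then decode it."""
    text = "abc" + "\udc80" * n_errors + "def"
    encoded = text.encode(encoding, "example.ignore")
    return encoded.decode(encoding)


# On an affected interpreter check_encoding() would raise MemoryError
# rather than return; here we only collect codecs with wrong output.
failures = [enc for enc in ENCODINGS if check_encoding(enc) != "abcdef"]
print("failures:", failures)
```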
msg236343 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-02-20 23:23
New changeset af8089217cc6 by Serhiy Storchaka in branch '2.7':
Issue #23215: Multibyte codecs with custom error handlers that ignores errors
https://hg.python.org/cpython/rev/af8089217cc6
New changeset 4dc8b7ed8973 by Serhiy Storchaka in branch '3.4':
Issue #23215: Multibyte codecs with custom error handlers that ignores errors
https://hg.python.org/cpython/rev/4dc8b7ed8973
New changeset 5620691ce26b by Serhiy Storchaka in branch 'default':
Issue #23215: Multibyte codecs with custom error handlers that ignores errors
https://hg.python.org/cpython/rev/5620691ce26b 
History
Date                 User              Action  Args
2022-04-11 14:58:11  admin             set     github: 67404
2015-02-20 23:28:25  serhiy.storchaka  set     status: open -> closed
                                               resolution: fixed
                                               stage: patch review -> resolved
2015-02-20 23:23:33  python-dev        set     messages: + msg236343
2015-02-15 17:45:41  serhiy.storchaka  set     files: + python_codec_crash_fix_2.patch
                                               messages: + msg236054
2015-02-15 17:05:38  serhiy.storchaka  set     assignee: serhiy.storchaka
2015-01-25 20:51:59  r.david.murray    set     nosy: + r.david.murray
                                               messages: + msg234689
2015-01-25 20:49:58  python-dev        set     nosy: + python-dev
                                               messages: + msg234687
2015-01-10 07:15:21  serhiy.storchaka  set     nosy: + lemburg, loewis, vstinner, serhiy.storchaka
                                               stage: patch review
                                               versions: - Python 3.2, Python 3.3
2015-01-10 03:39:12  alexer            set     files: + python_codec_crash_fix.patch
                                               keywords: + patch
2015-01-10 03:33:03  alexer            create
