
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: UTF-7 decoder can produce inconsistent Unicode string
Type: security
Stage: resolved
Components: Library (Lib), Unicode
Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: BreamoreBoy, barry, benjamin.peterson, ezio.melotti, georg.brandl, glebourgeois, larry, mcepl, mrabarnett, ncoghlan, piotr.dobrogost, python-dev, serhiy.storchaka, vstinner
Priority: release blocker
Keywords: patch

Created on 2013-10-17 09:55 by glebourgeois, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name, Uploaded
utf7_errors.patch, serhiy.storchaka, 2013-10-17 16:29
utf7_errors-2.7.patch, serhiy.storchaka, 2013-10-18 13:47
Messages (19)
msg200117 - Author: Guillaume Lebourgeois (glebourgeois) Date: 2013-10-17 09:55
After fetching a webpage with a wrongly declared encoding, using the codecs module for the conversion crashes.
The issue is reproducible this way:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml"
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New
Original issue here : https://github.com/kennethreitz/requests/issues/1682 
msg200132 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-10-17 14:54
The bytestring literal isn't valid. It starts with b" and later on has an unescaped " followed by more characters.
Also, the usual way to decode is by using the .decode method.
I get this:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel=\"alternate\" type=\"application/rss+xml\""
>>> content.decode("utf-7", "strict")
Traceback (most recent call last):
 File "<pyshell#10>", line 1, in <module>
 content.decode("utf-7", "strict")
 File "C:\Python33\lib\encodings\utf_7.py", line 12, in decode
 return codecs.utf_7_decode(input, errors, True)
UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-5: partial character in shift sequence
msg200133 - Author: Guillaume Lebourgeois (glebourgeois) Date: 2013-10-17 15:07
My fault, bad paste. I should have written:
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New
msg200134 - Author: Guillaume Lebourgeois (glebourgeois) Date: 2013-10-17 15:13
"Also, the usual way to decode is by using the .decode method."
The original bug happened using the requests library, so I have no control over the method used for decoding.
But if you had used the "replace" mode with your approach, you would have gotten the same exception:
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> content.decode("utf-7", "replace")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/lib/python3.3/encodings/utf_7.py", line 12, in decode
 return codecs.utf_7_decode(input, errors, True)
SystemError: invalid maximum character passed to PyUnicode_New
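The difference between the two error handlers reported so far can be checked with a small harness. This is only a sketch: `try_decode` is a hypothetical helper that is not part of the report, and it assumes Python 3. On an interpreter that includes the fix, "replace" is expected to return a string containing U+FFFD; on an affected 3.3 build the same call is the one that raised SystemError above.

```python
# Corrected reproducer bytes from the report.
content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'


def try_decode(data, errors):
    """Hypothetical helper: return ("ok", text) or ("error", exception name)."""
    try:
        return "ok", data.decode("utf-7", errors)
    except (UnicodeDecodeError, SystemError) as exc:
        # SystemError is what the affected 3.3 builds raised in this report.
        return "error", type(exc).__name__


strict_status, strict_detail = try_decode(content, "strict")
replace_status, replace_detail = try_decode(content, "replace")

print(strict_status, strict_detail)
print(replace_status, repr(replace_detail))
```

Catching SystemError here is for diagnosis only; it is not a substitute for running a fixed interpreter.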
msg200135 - Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2013-10-17 15:41
Indeed, 'utf-7' and the 'replace' error handler don't get along in this case.
msg200136 - Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2013-10-17 15:41
That is, I can locally reproduce the behaviour Guillaume describes on the latest tip build.
msg200144 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-17 16:29
Here is a patch for 3.3+.
Other versions are affected too. They don't raise SystemError, but produce an illegal Unicode string on a wide build.
E.g. in Python 2.7:
>>> 'a+/,+IKw-b'.decode('utf-7', 'replace')
u'a\ufffd\U003f20acb'
\U003f20ac is an illegal code point.
As the encoding and the encoded data can come from an external source, this can be used in security attacks.
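Why \U003f20ac counts as illegal can be checked directly: Unicode code points end at U+10FFFF, and 0x3F20AC lies above that limit. The sketch below assumes Python 3.3 or later, where `sys.maxunicode` is always 0x10FFFF; `is_valid_code_point` is a hypothetical helper name used only for illustration.

```python
import sys

# The value shown as \U003f20ac in the Python 2.7 output above.
suspect = 0x3F20AC

# Compare against the largest code point the interpreter supports.
print(hex(sys.maxunicode), suspect > sys.maxunicode)


def is_valid_code_point(value):
    """Hypothetical helper: True if chr() accepts the value."""
    try:
        chr(value)
    except ValueError:
        # chr() rejects anything outside range(0x110000).
        return False
    return True


print(is_valid_code_point(suspect))   # the out-of-range value from the report
print(is_valid_code_point(0x20AC))    # U+20AC, an ordinary code point
```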
msg200253 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-18 13:47
And here is a patch for 2.7.
msg200263 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2013-10-18 14:33
2.6.9 doesn't produce a SystemError afaict:
Python 2.6.9rc1+ (unknown, Oct 18 2013, 10:29:22) 
[GCC 4.4.3] on linux3
Type "help", "copyright", "credits" or "license" for more information.
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> content.decode("utf-7", "replace")
u'\ud7dd\ufffd rel=\'stylesheet\' type=\'text\ufffdcss\' \ufffd>\n<link rel="alternate" type="application\ufffdrss\uc669\ufffd'
msg200264 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2013-10-18 14:36
On Oct 18, 2013, at 02:33 PM, Barry A. Warsaw wrote:
>2.6.9 doesn't produce a SystemError afaict:
Please note that 2.6.9 is security only, so the threshold for worrying about
things is a remotely exploitable security vulnerability that cannot be
reasonably worked around in Python code.
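One possible shape for a workaround in Python code, for callers that receive both the encoding name and the data from an untrusted source, is to decode only with encodings from an allow-list. This is a sketch under assumptions, not something proposed in this issue: `decode_untrusted`, `ALLOWED_ENCODINGS`, and the choice of fallback are all hypothetical.

```python
import codecs

# Assumption: the only encodings this caller accepts from untrusted input.
# The names are the normalized forms reported by codecs.lookup(...).name.
ALLOWED_ENCODINGS = {"utf-8", "iso8859-1", "cp1252"}


def decode_untrusted(data, declared_encoding, fallback="utf-8"):
    """Hypothetical helper: decode with the declared encoding only if allowed."""
    try:
        # Normalize aliases such as "UTF7" or "latin-1" to one canonical name.
        name = codecs.lookup(declared_encoding).name
    except LookupError:
        name = fallback
    if name not in ALLOWED_ENCODINGS:
        # Anything else, UTF-7 included, is decoded with the fallback instead.
        name = fallback
    return data.decode(name, "replace")


print(decode_untrusted(b"+AGE-", "UTF-7"))       # not interpreted as UTF-7
print(decode_untrusted(b"caf\xe9", "latin-1"))   # allowed alias of iso8859-1
```

The trade-off is that genuinely UTF-7 encoded input is no longer decoded correctly, which may or may not be acceptable for a given application.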
msg200353 - Author: Larry Hastings (larry) * (Python committer) Date: 2013-10-19 01:24
Ping. Please fix before "beta 1".
msg200450 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-10-19 17:39
New changeset 214c0aac7540 by Serhiy Storchaka in branch '2.7':
Issue #19279: UTF-7 decoder no more produces illegal unicode strings.
http://hg.python.org/cpython/rev/214c0aac7540
New changeset f471f2f05621 by Serhiy Storchaka in branch '3.3':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/f471f2f05621
New changeset 7dde9c553f16 by Serhiy Storchaka in branch 'default':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/7dde9c553f16 
msg200465 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-10-19 18:17
New changeset 73ab6aba24e5 by Serhiy Storchaka in branch '3.3':
Fixed tests for issue #19279.
http://hg.python.org/cpython/rev/73ab6aba24e5 
msg201508 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-27 23:26
@Serhiy: What is the status of the issue?
msg201515 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-28 06:27
The bug is fixed in the maintenance releases. The maintainer of 3.2 can backport the fix to 3.2 if it is worth it.
msg207788 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-09 19:39
Georg, is this issue worth fixing in 3.2? If yes, use the patch against 2.7.
msg215458 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-03 17:00
> Georg, is this issue worth fixing in 3.2? If yes, use the patch against 2.7.
Ping?
msg222203 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-03 17:51
To repeat the question: do we or don't we fix this in 3.2?
msg222223 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-03 21:41
I suggest closing the issue. It's "just" another way to crash Python 3.2, like any other bug fix. Python 3.2 does not accept bug fixes anymore.
History
Date User Action Args
2022-04-11 14:57:52 admin set github: 63478
2014-07-04 18:39:11 serhiy.storchaka set title: UTF-7 can produce inconsistent Unicode string -> UTF-7 decoder can produce inconsistent Unicode string
2014-07-04 18:38:35 serhiy.storchaka set status: open -> closed
    title: UTF-7 to UTF-8 decoding crash -> UTF-7 can produce inconsistent Unicode string
    stage: patch review -> resolved
    resolution: fixed
    versions: + Python 2.7, Python 3.3, Python 3.4, - Python 3.2
2014-07-03 21:41:45 vstinner set messages: + msg222223
2014-07-03 17:51:26 BreamoreBoy set nosy: + BreamoreBoy
    messages: + msg222203
2014-04-03 17:00:34 vstinner set messages: + msg215458
2014-01-09 19:39:08 serhiy.storchaka set messages: + msg207788
2013-11-22 07:09:49 mcepl set nosy: + mcepl
2013-10-28 06:27:12 serhiy.storchaka set messages: + msg201515
2013-10-27 23:26:38 vstinner set messages: + msg201508
2013-10-22 17:31:20 serhiy.storchaka set assignee: serhiy.storchaka ->
    versions: - Python 2.7, Python 3.3, Python 3.4
2013-10-19 18:17:20 python-dev set messages: + msg200465
2013-10-19 17:39:55 python-dev set nosy: + python-dev
    messages: + msg200450
2013-10-19 01:24:47 larry set messages: + msg200353
2013-10-18 14:40:57 barry set versions: - Python 2.6
2013-10-18 14:36:25 barry set messages: + msg200264
2013-10-18 14:33:18 barry set messages: + msg200263
2013-10-18 13:47:06 serhiy.storchaka set files: + utf7_errors-2.7.patch
    messages: + msg200253
2013-10-18 10:32:53 piotr.dobrogost set nosy: + piotr.dobrogost
2013-10-17 16:29:57 serhiy.storchaka set files: + utf7_errors.patch
    priority: normal -> release blocker
    type: crash -> security
    versions: + Python 2.6, Python 2.7, Python 3.2
    keywords: + patch
    nosy: + larry, benjamin.peterson, barry, georg.brandl
    messages: + msg200144
    stage: needs patch -> patch review
2013-10-17 15:41:54 ncoghlan set messages: + msg200136
2013-10-17 15:41:05 ncoghlan set nosy: + ncoghlan
    messages: + msg200135
2013-10-17 15:13:00 glebourgeois set messages: + msg200134
2013-10-17 15:07:30 glebourgeois set messages: + msg200133
2013-10-17 14:54:02 mrabarnett set nosy: + mrabarnett
    messages: + msg200132
2013-10-17 10:02:11 vstinner set nosy: + vstinner
2013-10-17 09:57:27 serhiy.storchaka set versions: + Python 3.4
    nosy: + ezio.melotti, serhiy.storchaka
    assignee: serhiy.storchaka
    components: + Unicode
    stage: needs patch
2013-10-17 09:55:36 glebourgeois create
