This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2013-10-17 09:55 by glebourgeois, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files
| File name | Uploaded |
|---|---|
| utf7_errors.patch | serhiy.storchaka, 2013-10-17 16:29 |
| utf7_errors-2.7.patch | serhiy.storchaka, 2013-10-18 13:47 |
Messages (19)
| msg200117 | Author: Guillaume Lebourgeois (glebourgeois) | Date: 2013-10-17 09:55 |
After fetching a webpage with a wrongly declared encoding, converting its content with the codecs module crashes. The issue is reproducible this way:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml"
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New
Original issue here: https://github.com/kennethreitz/requests/issues/1682
| msg200132 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2013-10-17 14:54 |
The bytestring literal isn't valid. It starts with b" and later on has an unescaped " followed by more characters.
Also, the usual way to decode is by using the .decode method.
I get this:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel=\"alternate\" type=\"application/rss+xml\""
>>> content.decode("utf-7", "strict")
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    content.decode("utf-7", "strict")
  File "C:\Python33\lib\encodings\utf_7.py", line 12, in decode
    return codecs.utf_7_decode(input, errors, True)
UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-5: partial character in shift sequence
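The "partial character in shift sequence" error can be traced by hand. In UTF-7 (RFC 2152), "+" opens a shift sequence in which each modified-base64 character carries 6 bits, and every complete 16 bits form one UTF-16 code unit. The sketch below decodes only the base64 run "1911" that follows the leading "+"; decode_shift_run is a hypothetical helper written for this illustration, not part of the codecs module.

```python
import string

# Modified base64 alphabet used inside UTF-7 shift sequences (RFC 2152):
# the standard base64 alphabet, without '=' padding.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
B64_VALUE = {ch: i for i, ch in enumerate(B64_ALPHABET)}


def decode_shift_run(run: str) -> tuple[list[int], int, int]:
    """Decode the base64 characters of one UTF-7 shift sequence (hypothetical helper).

    Returns (code_units, leftover_bit_count, leftover_bits).
    """
    code_units = []
    acc = 0    # bit accumulator
    nbits = 0  # bits in acc not yet turned into a code unit
    for ch in run:
        acc = (acc << 6) | B64_VALUE[ch]  # each base64 character adds 6 bits
        nbits += 6
        if nbits >= 16:
            nbits -= 16
            code_units.append((acc >> nbits) & 0xFFFF)  # top 16 bits -> one UTF-16 unit
            acc &= (1 << nbits) - 1                      # keep only the unconsumed bits
    return code_units, nbits, acc


units, leftover_count, leftover_bits = decode_shift_run("1911")
print([hex(u) for u in units], leftover_count, hex(leftover_bits))
# ['0xd7dd'] 8 0x75
```

For "1911" this yields a single code unit, 0xD7DD (the same \ud7dd that starts the Python 2.6.9 output quoted in this thread), plus 8 unconsumed bits. CPython treats a shift sequence that ends with 6 or more leftover bits as a partial character, and the ' that ends the sequence is included in the failing range, which matches "position 0-5" ("+1911'") in the message above.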
| msg200133 | Author: Guillaume Lebourgeois (glebourgeois) | Date: 2013-10-17 15:07 |
My fault, bad paste. It should have been:
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New
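On an interpreter that includes the fix (any current Python 3), the corrected reproducer above can be rerun as a plain script. This sketch relies only on codecs.utf_7_decode(data, errors, final) returning a (str, number_of_bytes_consumed) pair; it checks that the call no longer raises and that the malformed shift sequences come back as U+FFFD.

```python
import codecs

# Corrected reproducer from the report: page content decoded with a wrongly declared UTF-7 encoding.
content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'

# errors="replace" substitutes U+FFFD for undecodable input; final=True means no more data follows.
text, consumed = codecs.utf_7_decode(content, "replace", True)

print(consumed == len(content))  # True: the whole input is consumed
print("\ufffd" in text)          # True: malformed shift sequences were replaced
```

Before the fix, the same call raised SystemError on 3.3, as shown in the report.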
| msg200134 | Author: Guillaume Lebourgeois (glebourgeois) | Date: 2013-10-17 15:13 |
"Also, the usual way to decode is by using the .decode method."
The original bug happened while using the requests library, so I have no control over the decoding method it uses.
But if you used the "replace" mode with your method, you would get the same exception:
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> content.decode("utf-7", "replace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lib/python3.3/encodings/utf_7.py", line 12, in decode
    return codecs.utf_7_decode(input, errors, True)
SystemError: invalid maximum character passed to PyUnicode_New
| msg200135 | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2013-10-17 15:41 |
Indeed, 'utf-7' and the 'replace' error handler don't get along in this case.
| msg200136 | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2013-10-17 15:41 |
That is, I can locally reproduce the behaviour Guillaume describes on the latest tip build.
| msg200144 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-10-17 16:29 |
Here is a patch for 3.3+.
Other versions are affected too. They don't raise SystemError, but produce an illegal Unicode string on wide builds.
For example, in Python 2.7:
>>> 'a+/,+IKw-b'.decode('utf-7', 'replace')
u'a\ufffd\U003f20acb'
\U003f20ac is an illegal code point: it lies above U+10FFFF, the largest valid one.
As both the encoding name and the encoded data can come from an external source, this could be exploited in security attacks.
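The illegal value is not random: 0x3F20AC is (0x3F << 16) | 0x20AC, that is, the six bits of "/" (base64 value 63 = 0x3F) from the failed "+/" sequence sitting on top of U+20AC, which "+IKw-" encodes. One mechanism consistent with this output is a decoder that resets its bit counter after an error but not its bit buffer. The toy decoder below is a hypothetical illustration of that mechanism only (it is not CPython's code and not the committed patch); the reset_on_error flag is invented for the comparison, and the toy ignores "+-", surrogate pairs and the non-zero padding check.

```python
import string

# Modified base64 alphabet used inside UTF-7 shift sequences (no '=' padding).
B64_VALUE = {ch: i for i, ch in enumerate(
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/")}


def toy_utf7_replace(data: str, reset_on_error: bool) -> list[int]:
    """Toy UTF-7 decoder with a 'replace'-style strategy; returns code points.

    Hypothetical illustration. A shift sequence that ends with 6 or more
    unconsumed bits becomes U+FFFD, and the character that ended it is
    swallowed along with it.
    """
    out: list[int] = []
    acc = 0          # bit buffer
    nbits = 0        # number of pending bits in acc
    in_shift = False
    for ch in data:
        if in_shift:
            if ch in B64_VALUE:
                acc = (acc << 6) | B64_VALUE[ch]
                nbits += 6
                if nbits >= 16:
                    nbits -= 16
                    # No 16-bit mask here, so stale high bits in acc leak into the code point.
                    out.append(acc >> nbits)
                    acc &= (1 << nbits) - 1
                continue
            in_shift = False
            if nbits >= 6:             # partial character in shift sequence
                out.append(0xFFFD)     # 'replace': one U+FFFD for the failed sequence
                nbits = 0
                if reset_on_error:
                    acc = 0            # drop the stale bits as well
                continue               # the terminating character is swallowed
            acc = nbits = 0            # fewer than 6 bits left: padding, discarded
            if ch == "-":
                continue               # '-' explicitly closes the sequence
        if ch == "+":
            in_shift = True
        else:
            out.append(ord(ch))
    if in_shift and nbits >= 6:        # unterminated sequence at end of input
        out.append(0xFFFD)
    return out


print([hex(cp) for cp in toy_utf7_replace("a+/,+IKw-b", reset_on_error=False)])
# ['0x61', '0xfffd', '0x3f20ac', '0x62']
print([hex(cp) for cp in toy_utf7_replace("a+/,+IKw-b", reset_on_error=True)])
# ['0x61', '0xfffd', '0x20ac', '0x62']
```

Without the reset, the toy reproduces the shape of the 2.7 output above, including the missing ","; with it, every code point stays within range.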
| msg200253 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-10-18 13:47 |
And here is a patch for 2.7.
| msg200263 | Author: Barry A. Warsaw (barry) * (Python committer) | Date: 2013-10-18 14:33 |
2.6.9 doesn't produce a SystemError afaict:
Python 2.6.9rc1+ (unknown, Oct 18 2013, 10:29:22)
[GCC 4.4.3] on linux3
Type "help", "copyright", "credits" or "license" for more information.
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> content.decode("utf-7", "replace")
u'\ud7dd\ufffd rel=\'stylesheet\' type=\'text\ufffdcss\' \ufffd>\n<link rel="alternate" type="application\ufffdrss\uc669\ufffd'
| msg200264 | Author: Barry A. Warsaw (barry) * (Python committer) | Date: 2013-10-18 14:36 |
On Oct 18, 2013, at 02:33 PM, Barry A. Warsaw wrote:
>2.6.9 doesn't produce a SystemError afaict:
Please note that 2.6.9 is security-only, so the threshold for worrying about things is a remotely exploitable security vulnerability that cannot reasonably be worked around in Python code.
| msg200353 | Author: Larry Hastings (larry) * (Python committer) | Date: 2013-10-19 01:24 |
Ping. Please fix before "beta 1".
| msg200450 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-10-19 17:39 |
New changeset 214c0aac7540 by Serhiy Storchaka in branch '2.7':
Issue #19279: UTF-7 decoder no more produces illegal unicode strings.
http://hg.python.org/cpython/rev/214c0aac7540
New changeset f471f2f05621 by Serhiy Storchaka in branch '3.3':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/f471f2f05621
New changeset 7dde9c553f16 by Serhiy Storchaka in branch 'default':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/7dde9c553f16
| msg200465 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-10-19 18:17 |
New changeset 73ab6aba24e5 by Serhiy Storchaka in branch '3.3':
Fixed tests for issue #19279.
http://hg.python.org/cpython/rev/73ab6aba24e5
| msg201508 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2013-10-27 23:26 |
@Serhiy: What is the status of the issue?
| msg201515 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-10-28 06:27 |
The bug is fixed in the maintenance releases. The maintainer of 3.2 can backport the fix to 3.2 if it is worth it.
| msg207788 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-01-09 19:39 |
Georg, is this issue worth fixing in 3.2? If yes, use the patch against 2.7.
| msg215458 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2014-04-03 17:00 |
> Georg, is this issue worth fixing in 3.2? If yes, use the patch against 2.7.
Ping?
| msg222203 | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014-07-03 17:51 |
To repeat the question: do we or don't we fix this in 3.2?
| msg222223 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2014-07-03 21:41 |
I suggest closing the issue. It's "just" another way to crash Python 3.2, like any other bug fix. Python 3.2 does not accept bug fixes anymore.
History
| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:52 | admin | set | github: 63478 |
| 2014-07-04 18:39:11 | serhiy.storchaka | set | title: UTF-7 can produce inconsistent Unicode string -> UTF-7 decoder can produce inconsistent Unicode string |
| 2014-07-04 18:38:35 | serhiy.storchaka | set | status: open -> closed title: UTF-7 to UTF-8 decoding crash -> UTF-7 can produce inconsistent Unicode string stage: patch review -> resolved resolution: fixed versions: + Python 2.7, Python 3.3, Python 3.4, - Python 3.2 |
| 2014-07-03 21:41:45 | vstinner | set | messages: + msg222223 |
| 2014-07-03 17:51:26 | BreamoreBoy | set | nosy: + BreamoreBoy messages: + msg222203 |
| 2014-04-03 17:00:34 | vstinner | set | messages: + msg215458 |
| 2014-01-09 19:39:08 | serhiy.storchaka | set | messages: + msg207788 |
| 2013-11-22 07:09:49 | mcepl | set | nosy: + mcepl |
| 2013-10-28 06:27:12 | serhiy.storchaka | set | messages: + msg201515 |
| 2013-10-27 23:26:38 | vstinner | set | messages: + msg201508 |
| 2013-10-22 17:31:20 | serhiy.storchaka | set | assignee: serhiy.storchaka -> versions: - Python 2.7, Python 3.3, Python 3.4 |
| 2013-10-19 18:17:20 | python-dev | set | messages: + msg200465 |
| 2013-10-19 17:39:55 | python-dev | set | nosy: + python-dev messages: + msg200450 |
| 2013-10-19 01:24:47 | larry | set | messages: + msg200353 |
| 2013-10-18 14:40:57 | barry | set | versions: - Python 2.6 |
| 2013-10-18 14:36:25 | barry | set | messages: + msg200264 |
| 2013-10-18 14:33:18 | barry | set | messages: + msg200263 |
| 2013-10-18 13:47:06 | serhiy.storchaka | set | files: + utf7_errors-2.7.patch messages: + msg200253 |
| 2013-10-18 10:32:53 | piotr.dobrogost | set | nosy: + piotr.dobrogost |
| 2013-10-17 16:29:57 | serhiy.storchaka | set | files: + utf7_errors.patch priority: normal -> release blocker type: crash -> security versions: + Python 2.6, Python 2.7, Python 3.2 keywords: + patch nosy: + larry, benjamin.peterson, barry, georg.brandl messages: + msg200144 stage: needs patch -> patch review |
| 2013-10-17 15:41:54 | ncoghlan | set | messages: + msg200136 |
| 2013-10-17 15:41:05 | ncoghlan | set | nosy: + ncoghlan messages: + msg200135 |
| 2013-10-17 15:13:00 | glebourgeois | set | messages: + msg200134 |
| 2013-10-17 15:07:30 | glebourgeois | set | messages: + msg200133 |
| 2013-10-17 14:54:02 | mrabarnett | set | nosy: + mrabarnett messages: + msg200132 |
| 2013-10-17 10:02:11 | vstinner | set | nosy: + vstinner |
| 2013-10-17 09:57:27 | serhiy.storchaka | set | versions: + Python 3.4 nosy: + ezio.melotti, serhiy.storchaka assignee: serhiy.storchaka components: + Unicode stage: needs patch |
| 2013-10-17 09:55:36 | glebourgeois | create | |