This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008年01月27日 22:44 by alexandre.vassalotti, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| unicode_string_overflow.patch | alexandre.vassalotti, 2008年01月27日 22:44 | |||
| unicode_string_overflow-2.patch | alexandre.vassalotti, 2008年04月12日 21:56 | |||
| Messages (15) | |||
|---|---|---|---|
| msg61753 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2008年01月27日 22:44 | |
I have found a few instances of the following pattern in Py3k: char buf[MAX]; len = PyUnicode_GET_SIZE(str); if (len >= MAX) /* return error */ strcpy(buf, PyUnicode_AsString(str)); which could overflow if str contains non-ASCII characters. These were probably introduced during the PyString -> PyUnicode transition. Anyway, I got a patch that fixes (hopefully) most of these bugs. |
|||
| msg63360 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2008年03月07日 18:43 | |
Your description of the patch is a bit misleading. As far as I can tell only the first chunk (Python/import.c changes) addresses a potential buffer overflow. For example the last chunk (Modules/posixmodule.c changes) simply eliminates an unused variable. While a worthwhile change, it should not be bundled with what is potentially a security patch. I have a few suggestions: 1. It will really help if you produce a test case that crashes the interpretor. I am sure that will get noticed. 2. If any of buffer overflows apply to the current production versions (2.4 or 2.5) or even the alpha release (2.6a1), it would make sense to backport it to the trunk. Once again, security issues in the trunk will get noticed much faster than in py3k branch. |
|||
| msg63363 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2008年03月07日 20:19 | |
I tried to produce a buffer overflow in get_parent (import.c), but an
attempt to import a module with non-ascii characters is aborted in
getargs.c before get_parent is reached:
>>> __import__("\u0080xyz")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __import__() argument 1 must be string without null bytes,
not str
This looks like a bug. At the very least the error message is
misleading because there are no null bytes in "\u0080xyz" string.
The offending code is
if ((Py_ssize_t)strlen(*p) !=
PyUnicode_GetSize(arg))
return converterr("string without null
bytes",
arg, msgbuf, bufsize);
at getargs.c:826
However, given the preceding "XXX WAAAAH!" comment, this is probably a
sign of not yet implemented feature rather than a bug.
|
|||
| msg63369 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2008年03月07日 22:00 | |
Here are my comments on the other parts of the patch: Python/structmember.c The existing code is safe, but would silently produce wrong result if T_CHAR attribute is assigned a non-ascii character. With the patch this situation will be detected and an exception raised. I am not sure that would be a desired behavior of py3k. I could not find any examples of using T_CHAR member in the stdlib, but an alternative solution would be to change T_CHAR code to mean PY_UNICODE_TYPE instead of char member. Objects/typeobject.c "%s" -> ".400s" is an obviously good change. The existing __doc__ processing code is correct. Proposed code may be marginally faster, but will allow docstrings with embedded null characters, which may or may not be desirable (and may break other code that uses tp_doc). Finally PyUnicode_AsStringAndSize always returns null-terminated strings, so memcpy logic does not need to be altered. Objects/structseq.c Change from macros to enums is purely stylistic and python C style seem to favor macros. I don't think a repr of a python object can contain embedded null characters, but even if that were the case, the patched code would not support it because the resulting buffer is returned with PyUnicode_FromString(buf). Modules/datetimemodule.c Existing code compensates for an error in initial estimate of totalnew when it checks for overflow, but the proposed change will make code more efficient. Modules/zipimport.c Since 's' format unit in PyArg_ParseTuple does not properly support unicode yet, it is hard to tell if the current code is wrong, but unicode paths cannot have embedded null characters, so use of 's#' is not necessary. Modules/timemodule.c Supporting format strings with null characters is probably a good idea, but that would be an RFE rather than a bug fix. Modules/parsermodule.c Looks like there is a bug there. |
|||
| msg63433 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2008年03月10日 02:18 | |
Thanks for the review! > Your description of the patch is a bit misleading. As far as I can > tell only the first chunk (Python/import.c changes) addresses a > potential buffer overflow. Yes, you are right. It seems only the bug in import.c could easily be exploited. > 1. It will really help if you produce a test case that crashes the > interpretor. I am sure that will get noticed. % cat pkg/__init__.py __package__ = "\U000c9c9c9" * 900 from . import f % ./python Python 3.0a3+ (py3k:61164, Mar 1 2008, 19:55:42) >>> import pkg *** stack smashing detected ***: ./python terminated [1] 9503 abort (core dumped) ./python > 2. If any of buffer overflows apply to the current production > versions (2.4 or 2.5) or even the alpha release (2.6a1), it would > make sense to backport it to the trunk. I don't think the trunk is affected in any way by the issues mentioned here. > The existing __doc__ processing code is correct. Proposed code may be > marginally faster, but will allow docstrings with embedded null > characters, which may or may not be desirable (and may break other code > that uses tp_doc) Good call! I will check out if null-characters may pose a problem for tp_doc and update the patch consequently. > I don't think a repr of a python object can contain embedded null > characters, but even if that were the case, the patched code would not > support it because the resulting buffer is returned with > PyUnicode_FromString(buf). Oh, that is true. I will remove that part from the patch, then. > Modules/datetimemodule.c > > Existing code compensates for an error in initial estimate of totalnew > when it checks for overflow, but the proposed change will make code more > efficient. Right again. > Modules/zipimport.c [...] > Modules/timemodule.c [...] > Modules/parsermodule.c [...] I need to check again the code for these three modules, before commenting. I will clean up the patch with your recommendation and post it again. Thanks for taking the time to review my patch. It's greatly appreciated. |
|||
| msg63638 - (view) | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2008年03月17日 13:35 | |
Any progress? |
|||
| msg65423 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2008年04月12日 21:56 | |
I revised the patch with respect to Alexander's comments. In summary, here is what I changed from the previous patch: - Removed the unnecessary "fixes" to Objects/structseq.c and Modules/timemodule.c - Updated Objects/typeobject.c to forbid null-bytes in __doc__, since they cannot be handled in the `tp_doc` member. - Removed an erroneous pointer dereference in Modules/zipimport.c - Changed `len+1` to `len` in the memcpy() call in the Modules/parsermodule.c fixes |
|||
| msg65861 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2008年04月27日 01:18 | |
So, any comment on the latest patch? If everything is all right, I would like to commit the patch to py3k. |
|||
| msg65872 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2008年04月27日 04:26 | |
The patch looks good. Just a question: I thought the strings returned by PyUnicode_AsStringAndSize are 0-terminated, while your patch at several places attempts to explicitly 0-terminate a copy of such string. Are you sure this is necessary? |
|||
| msg65960 - (view) | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2008年04月29日 11:07 | |
@@ -2195,7 +2200,7 @@
}
return Py_None;
}
- len = lastdot - start;
+ len = (size_t)(lastdot - start);
if (len >= MAXPATHLEN) {
PyErr_SetString(PyExc_ValueError,
"Module name too long");
The above cast needs to be (Py_ssize_t). size_t is an unsigned length type.
|
|||
| msg65962 - (view) | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2008年04月29日 11:31 | |
BTW: The API PyUnicode_AsString() is pretty useless by itself - there's no way to access the size information of the returned string without again going to the Unicode object. I'd suggest to remove the API altogether and not only deprecating it. Furthermore, the API PyUnicode_AsStringAndSize() does not follow the API signature of PyString_AsStringAndSize() in that it passes back the pointer to the string as output parameter. That should be changed as well. Note that PyString_AsStringAndSize() already does this for both 8-bit strings and Unicode, so the special Unicode API is not really needed at all or you may want to rename PyString_AsStringAndSize() to PyUnicode_AsStringAndSize(). Finally, since there are many cases where the string buffer contents are copied to a new buffer, it's probably worthwhile to add a new API which does the copying straight away and also deals with the overflow cases in a central place. I'd suggest PyUnicode_AsChar() (with an API like PyUnicode_AsWideChar()). |
|||
| msg66160 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2008年05月03日 17:49 | |
Alexander Belopolsky wrote: > The patch looks good. Just a question: I thought the strings returned > by PyUnicode_AsStringAndSize are 0-terminated, while your patch at > several places attempts to explicitly 0-terminate a copy of such string. > Are you sure this is necessary? I wasn't sure if the strings returned by PyUnicode_AsStringAndSize were 0-terminated, so I didn't take any chance and explicitly terminated them. But I just verified myself and they are indeed 0-terminated. So, I modified the patch in consequence. |
|||
| msg66162 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2008年05月03日 17:58 | |
Marc-Andre Lemburg wrote:
[SNIP]
> The above cast needs to be (Py_ssize_t). size_t is an unsigned length
type.
Actually, the cast is right (even though it is not strictly necessary).
It just the patch that is confusing. Here is the relevant code:
/* Normal module, so work out the package name if any */
char *start = PyUnicode_AsString(modname);
char *lastdot = strrchr(start, '.');
size_t len;
int error;
/* snip */
len = (size_t)(lastdot - start);
if (len >= MAXPATHLEN) {
PyErr_SetString(PyExc_ValueError,
"Module name too long");
return NULL;
}
I removed the cast from the patch (I don't know why I added it, anyway)
to avoid further confusion.
|
|||
| msg66163 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2008年05月03日 18:25 | |
Committed to r62667. Thank you all for your comments! |
|||
| msg66167 - (view) | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2008年05月03日 19:53 | |
On 2008年05月03日 20:25, Alexandre Vassalotti wrote: > Alexandre Vassalotti <alexandre@peadrop.com> added the comment: > > Committed to r62667. > > Thank you all for your comments! > > ---------- > resolution: -> fixed > status: open -> closed What about my comments regarding the PyUnicode_AsString() API in http://bugs.python.org/msg65962 Should I open a separate tracker item for this ? I don't know who added those APIs, but they are neither in line with the rest of the Unicode API, nor are they really all that helpful. I guess they were just added out of a misunderstanding of the already existing code. I'd suggest to remove PyUnicode_AsString() altogether (which your patch has already done in a couple of places). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:30 | admin | set | github: 46242 |
| 2013年11月11日 11:56:09 | ncoghlan | unlink | issue19347 dependencies |
| 2013年11月11日 11:55:38 | ncoghlan | link | issue19347 dependencies |
| 2008年05月03日 19:53:27 | lemburg | set | messages:
+ msg66167 title: Potential overflows due to incorrect usage of PyUnicode_AsString. -> Potential overflows due to incorrect usage of PyUnicode_AsString. |
| 2008年05月03日 18:25:30 | alexandre.vassalotti | set | status: open -> closed resolution: fixed messages: + msg66163 |
| 2008年05月03日 17:58:32 | alexandre.vassalotti | set | messages: + msg66162 |
| 2008年05月03日 17:49:58 | alexandre.vassalotti | set | messages: + msg66160 |
| 2008年04月29日 11:31:49 | lemburg | set | messages: + msg65962 |
| 2008年04月29日 11:07:49 | lemburg | set | nosy:
+ lemburg messages: + msg65960 |
| 2008年04月27日 04:27:55 | belopolsky | set | messages: + msg65872 |
| 2008年04月27日 01:18:19 | alexandre.vassalotti | set | messages: + msg65861 |
| 2008年04月12日 21:56:25 | alexandre.vassalotti | set | files:
+ unicode_string_overflow-2.patch messages: + msg65423 |
| 2008年03月17日 13:35:12 | gvanrossum | set | nosy:
+ gvanrossum messages: + msg63638 |
| 2008年03月10日 02:18:58 | alexandre.vassalotti | set | messages: + msg63433 |
| 2008年03月07日 22:00:29 | belopolsky | set | messages: + msg63369 |
| 2008年03月07日 20:19:23 | belopolsky | set | messages: + msg63363 |
| 2008年03月07日 18:43:14 | belopolsky | set | nosy:
+ belopolsky messages: + msg63360 |
| 2008年01月27日 22:45:12 | alexandre.vassalotti | set | title: Potential Overflow due to incorrect usage of PyUnicode_AsString. -> Potential overflows due to incorrect usage of PyUnicode_AsString. |
| 2008年01月27日 22:44:54 | alexandre.vassalotti | create | |