homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Replace %.100s by %s in PyErr_Format(): the arbitrary limit of 500 bytes is outdated
Type: Stage:
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, belopolsky, eric.smith, python-dev, vstinner, ysj.ray
Priority: normal Keywords: patch

Created on 2011年01月05日 03:02 by vstinner, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
use_format.patch vstinner, 2011年01月05日 03:10
Messages (17)
msg125399 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年01月05日 03:02
Guido created the "convenience function" PyErr_Format() 13 years ago (r7580) with a naive implementation using char buffer[500] and vsprintf(). He added a comment:
/* Caller is responsible for limiting the format */
Ok, that was true 13 years ago. But today Python uses dynamically allocated buffers, and so we can simply replace %.100s ("%\.[0-9]+s" regex) by %s.
Anyway, PyUnicode_FromFormatV() doesn't support precision for %s and %U formats :-)
msg125400 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年01月05日 03:10
use_format.patch is a patch to avoid PyOS_snprintf() with a fixed buffer: use directly PyUnicode_FromFormat(), PyErr_Format() or PySys_FormatStderr() instead. This patch is just a draft, it should be tested :-) There are other calls to PyOS_snprintf() that can be replaced.
msg125404 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年01月05日 03:40
Woops, I didn't want to do it, but I already commited the simple part of this issue (U format, eg. %.100U): r87753.
msg125422 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2011年01月05日 13:21
I always thought that one of the reasons for specifying the length was in case a pointer pointed to garbage: at least you'd be limiting how much trash was printed. But maybe that's just my imagination and there is no such reason.
msg125573 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011年01月06日 17:57
I agree with Eric. Limiting field width when formatting error messages is a good safety measure. It is not only the amount of garbage that can be spitted in error logs, if garbage is not null-terminated, you may get a SEGV instead of a somewhat meaningful error message.
msg125574 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011年01月06日 18:04
> /* Caller is responsible for limiting the format */
Note that it is a good idea to avoid memory allocation during exception processing. Think of out of memory errors: how would you report those if you need more memory to format the error message than the failed operation?
msg125586 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年01月06日 20:49
eric> I always thought that one of the reasons for specifying the length
eric> was in case a pointer pointed to garbage: at least you'd be limiting 
eric> how much trash was printed.
No, it's because of the old (removed) 500 bytes limit.
There are different types of %s arguments:
 - name of a type, eg. PyType(op)->tp_name
 - constant message (a switch/case or if chose between different messages)
 - function/method name
 - ...
 - and sometimes an argument from the user, eg. codec.lookup(name) repeats the name in the result
belopolsky> Limiting field width when formatting error messages
belopolsky> is a good safety measure. It is not only the amount
belopolsky> of garbage that can be spitted in error logs, if garbage
belopolsky> is not null-terminated, ...
Can you give me at least one example? I think that it is very unlikely, or just impossible. But if I do replace "%.100s" by "%s" in all the code base, I accept to check each time that the argument is null-terminated and/or not controlable by the user. And maybe keep "%.100s" if I am not sure.
belopolsky> Think of out of memory errors: ...
PyErr_NoMemory() doesn't use PyErr_Format(), it's message is fixed, and the exception object is preallocated.
msg125592 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011年01月06日 21:30
On Thu, Jan 6, 2011 at 3:49 PM, STINNER Victor <report@bugs.python.org> wrote:
..
> eric> I always thought that one of the reasons for specifying the length
> eric> was in case a pointer pointed to garbage: at least you'd be limiting
> eric> how much trash was printed.
>
> No, it's because of the old (removed) 500 bytes limit.
>
How do you know? I, for one, did use %.100s when formatting strings
long before I knew that there was a 500 bytes limit.
..
> belopolsky> Limiting field width when formatting error messages
> belopolsky> is a good safety measure. It is not only the amount
> belopolsky> of garbage that can be spitted in error logs, if garbage
> belopolsky> is not null-terminated, ...
>
> Can you give me at least one example? I think that it is very unlikely, or just impossible.
I am concerned about conditions that are impossible in a valid
program. However, if you have a buffer overflow that trashes your
tp_name pointer so that it suddenly resolves to a binary code section,
you will need all help you can get to find the bug. Having the
program spit 100-character garbage in this case is often preferable to
a SEGV.
> But if I do replace "%.100s" by "%s" in all the code base, I accept to check each
> time that the argument is null-terminated and/or not controlable by the user.
As long as we have ctypes, everything is controllable by user. Again,
limiting field width when formatting strings is defensive programming.
 And defensive programming pays off when you have to chase bugs.
> And maybe keep "%.100s" if I am not sure.
Just keep "%.100s" always. It does not hurt and it is less work for you. :-)
>
> belopolsky> Think of out of memory errors: ...
>
> PyErr_NoMemory() doesn't use PyErr_Format(), it's message is fixed, and the exception object is preallocated.
Indeed, there are no PyErr_Format(PyExc_MemoryError, ..) in the code
base, but there are a few PyErr_SetString(PyExc_MemoryError, ..) such
as PyErr_SetString(PyExc_MemoryError, "math.fsum partials") in
Modules/mathmodule.c. Maybe these should be reviewed. Still, there
may be code paths where OOM triggers an error other than MemoryError.
In any case, I don't see anything wrong with using fixed size buffers
in error handling and truncating too long output.
Code cleanup projects like this are rightfully frowned upon because
they are hard to review and usually error-prone. However, if for each
change that you make, you would make sure that the error path is
covered by unit tests, it will definitely be an improvement.
msg125841 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年01月09日 12:52
> I am concerned about conditions that are impossible in a valid
> program. However, if you have a buffer overflow that trashes your
> tp_name pointer so that it suddenly resolves to a binary code section,
Yes, %.100s may avoid a crash after the buffer overflow on the string formatting, but it may quickly crash on another instruction. I don't think that you should limit the error message length to protect Python against buffer overflow. A buffer overflow can corrupt everything, and we should fix the buffer overflow instead :-)
> you will need all help you can get to find the bug.
You have debuggers like gdb for that.
msg126302 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年01月14日 22:05
belopolsky> Limiting field width when formatting error messages is 
belopolsky> a good safety measure.
The problem is that PyUnicode_FromFormatV() doesn't support %.100s format (it ignores .100). If we truncate at 100 bytes, it may truncate an UTF-8 string in the middle of a character :-/ And I don't want to implement it because I don't like arbitrary limitations in Python.
msg128962 - (view) Author: ysj.ray (ysj.ray) Date: 2011年02月21日 13:34
see also #7330, I'm implementing "%.100s" in that issue.
msg129940 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年03月03日 09:02
I am working with Python since 5 years (and on Python since 3 years): I never seen any garbage string in error messages. My concern is to not truncate strings in error messages anymore. I consider that it is more important than using a hack to workaround some crashs.
I call %.100s a hack because on a buffer overflow, other attributes are overwritten, and I think that the other attributes are more important than tp_name and the program will crash quickly. Try a buffer overflow on PyLong_Type and check how much time does it take to crash Python: I bet that the program will crash before formatting any exception.
> Indeed, there are no PyErr_Format(PyExc_MemoryError, ..) in the code
> base, but there are a few PyErr_SetString(PyExc_MemoryError, ..) such
> as PyErr_SetString(PyExc_MemoryError, "math.fsum partials") in
> Modules/mathmodule.c. Maybe these should be reviewed.
This instruction comes from math.fsum() which allocate a new list with the size of the input size (not exactly, but it is based on the size of the input list). Python may fail to allocate an huge list, but it doesn't mean that it will be unable to allocate memory for the short string "math.fsum partials". Anyway, if the allocation of "math.fsum partials" string fail, another MemoryError (without any message) will be used. So it doesn't really matter.
> Still, there may be code paths where OOM triggers an error other 
> than MemoryError.
Same than before: Python should be able to allocate the new error object, and if it fails: it will use MemoryError anyway.
> In any case, I don't see anything wrong with
> using fixed size buffers in error handling and truncating too long
> output.
I consider that there is no more good reason to truncate strings, I want to see the full error message!
msg129945 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年03月03日 09:31
History of PyErr_Format():
 - r7580 (13 years ago): creation of PyErr_Format() using a buffer of 500 bytes (fixed size buffer, allocated on the stack)
 - r17159 (10 years ago): PyErr_Format() allocates a dynamic buffer on the heap
 - r22722 (9 years ago): PyErr_Format() reuses PyString_FromFormatV() (dynamic buffer, allocated on the heap)
belopolsky>> Limiting field width when formatting error messages
belopolsky>> is a good safety measure.
me> Can you give me at least one example? I think that it is very
me> unlikely, or just impossible.
Python allocates a dynamic buffer since r17159 (10 years ago), and the strings were *never* truncated just because %.100s format was never supported (it is interpreted as %s).
If you still consider that %.100s protects is a good solution against crashes: you have to realize that Python doesn't truncate strings since 10 years, and nobody complained.
--
The situation is changing because Ray Allen wrote a patch implementing %.100s: #7330. I would like to decide what to do with %.100s in error messages before commiting #7330.
--
Eric and Alexander: do you still consider that %.100s is important in error messages? Or do you know agree that we can replace them with %s?
msg129985 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2011年03月03日 18:00
I don't feel strongly one way or the other about it. I was just relaying the reason I heard when I asked the question about why it's done the way it is.
I suggest mentioning that you're going to commit this change on python-dev, since this seems like an issue that people might care about without knowing about its existence in the bug tracker.
msg131638 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011年03月21日 12:26
New changeset 74d3dc78f0db by Victor Stinner in branch 'default':
Issue #10833: Use PyUnicode_FromFormat() and PyErr_Format() instead of
http://hg.python.org/cpython/rev/74d3dc78f0db 
msg131675 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011年03月21日 17:15
New changeset 513bab5cfb18 by Victor Stinner in branch 'default':
Issue #10833: Remove the buffer allocated on the stack, it isn't used anymore
http://hg.python.org/cpython/rev/513bab5cfb18
New changeset 4c2135930882 by Victor Stinner in branch 'default':
Issue #10833: Use PyErr_Format() and PyUnicode_FromFormat() instead of
http://hg.python.org/cpython/rev/4c2135930882 
msg132053 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年03月24日 23:23
My stronger argument was that %.100s was never supported. I am wrong: Python 2 is only to truncate strings in PyErr_Format() (PyString_FromFormatV() supports precision for %s format). As I am the only one wanting to changing that, and I don't have any argument anymore, I accept to leave the code unchanged, but #7330 must be fixed.
History
Date User Action Args
2022年04月11日 14:57:10adminsetgithub: 55042
2011年03月24日 23:23:50vstinnersetstatus: open -> closed
resolution: not a bug
2011年03月24日 23:23:39vstinnersetmessages: + msg132053
2011年03月21日 17:15:58python-devsetnosy: amaury.forgeotdarc, belopolsky, vstinner, eric.smith, ysj.ray, python-dev
messages: + msg131675
2011年03月21日 12:26:59python-devsetnosy: + python-dev
messages: + msg131638
2011年03月03日 18:00:04eric.smithsetnosy: amaury.forgeotdarc, belopolsky, vstinner, eric.smith, ysj.ray
messages: + msg129985
2011年03月03日 09:31:09vstinnersetnosy: amaury.forgeotdarc, belopolsky, vstinner, eric.smith, ysj.ray
messages: + msg129945
2011年03月03日 09:02:21vstinnersetnosy: amaury.forgeotdarc, belopolsky, vstinner, eric.smith, ysj.ray
messages: + msg129940
2011年02月21日 13:34:03ysj.raysetnosy: + ysj.ray
messages: + msg128962
2011年01月14日 22:05:02vstinnersetnosy: amaury.forgeotdarc, belopolsky, vstinner, eric.smith
messages: + msg126302
2011年01月09日 12:52:07vstinnersetnosy: amaury.forgeotdarc, belopolsky, vstinner, eric.smith
messages: + msg125841
2011年01月06日 21:30:57belopolskysetnosy: amaury.forgeotdarc, belopolsky, vstinner, eric.smith
messages: + msg125592
2011年01月06日 20:49:10vstinnersetnosy: amaury.forgeotdarc, belopolsky, vstinner, eric.smith
messages: + msg125586
2011年01月06日 18:04:03belopolskysetnosy: amaury.forgeotdarc, belopolsky, vstinner, eric.smith
messages: + msg125574
2011年01月06日 17:57:36belopolskysetnosy: + belopolsky
messages: + msg125573
2011年01月05日 13:21:07eric.smithsetnosy: + eric.smith
messages: + msg125422
2011年01月05日 03:40:02vstinnersetmessages: + msg125404
2011年01月05日 03:10:11vstinnersetfiles: + use_format.patch

messages: + msg125400
keywords: + patch
2011年01月05日 03:02:30vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /