-cc: python-steering-council
On Fri, Mar 5, 2021 at 4:26 PM Guido van Rossum <[email protected]> wrote:
On Fri, Mar 5, 2021 at 11:11 AM Brett Cannon <[email protected]> wrote:
Speaking for myself ...
Ditto ...
On Fri, Mar 5, 2021 at 7:04 AM Mark Shannon <[email protected]> wrote:
[...]
In some cases, the PEP would have improved the situation.
For example:
```
import sys

sys.setrecursionlimit(5000)

def f():
    f()

f()
```
Currently, it raises a RecursionError on Linux, but crashes the
interpreter on Windows.
With PEP 651 it would have raised a RecursionError on both
platforms.
Am I missing something here?
So your example shows a user already comfortable raising their recursion limit
to get the stack space they need to reach completion. What is stopping that
user from continuing to raise the limit until they hit their memory limit
anyway, even with PEP 651? If you're worried about runaway recursion, you will
very likely hit it with the default stack depth already, so I personally don't
see how a stack counter decoupled from the C stack specifically makes it any
easier or better to detect runaway recursion. And if you need more recursion
than the default, you're going to bump the recursion limit anyway, which
weakens the protection in either the C or the decoupled-counter scenario.
Sure, the current behaviour is platform-specific, but plenty of people want to
push the limit based on their machine anyway and don't need consistency on
platforms they will never run on. I.e. I don't see a huge benefit in being
able to say that an algorithm consistently won't go past 5000 calls on all
platforms, compared to what the C stack protection already gives us (not to
say there's zero benefit, but it isn't massive or widespread either, IMO). I
personally just don't see many people saying, "I really want to limit my
program to an exact call stack depth of 5000 on all platforms, which is beyond
the default, but anything under that is fine and anything over it --
regardless of what the system can actually handle -- is unacceptable".
Tack on the number of changes required to implement a cross-platform stack
count and limit check, compared to the benefit being proposed, and to me that
pushes what the PEP is proposing into net-negative payoff territory.
To me, the point of that example is a reminder that, currently, fiddling
with the recursion limit can cause segfaults.
Mark's PEP proposes two, somewhat independent, changes: (1) don't consume C
stack on pure Python-to-Python (pp2p) function calls; (2) implement foolproof
C stack overflow checks.
Change (2) makes it safe for users to mess with the stack overflow limit
however they see fit. Despite (1), the limit for pp2p calls remains at 1000 so
that users who unintentionally write some naively recursive code don't have to
wait until they fill up all of memory before they get a traceback. (Of course
they could also write a while-True loop that keeps adding an item to a list and
they'd end up in the same situation. But in my experience that situation is
less painful to deal with than accidental stack overflow, and I'd shudder at
the thought of a traceback of a million lines.)
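A minimal illustration of the current default (a sketch; the exact call count
at which the error fires varies slightly with how deep the stack already is):
```
import sys

print(sys.getrecursionlimit())  # 1000 by default in CPython

def naive(n):
    return naive(n + 1)

try:
    naive(0)
except RecursionError as exc:
    print("caught quickly:", exc)  # no need to exhaust all memory first
```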
Given that we have (1), why is (2) still needed? Because there are ways to
recursively call Python code that aren't pp2p calls. By a pp2p (pure
Python-to-Python) call, I mean any direct call, e.g. a method call or a
function call. But what about other operations that can invoke Python code?
E.g. if we have a dict d and a class C, we could create an instance of C and
use it to index d, e.g. d[C()]. This operation is not a pp2p call -- the
BINARY_SUBSCR opcode calls the dict's `__getitem__` method, and that calls the
key's `__hash__` method. Here's a silly demonstration:
```
def f(c):
    d = {}
    return d[c]

class C:
    def __hash__(self):
        return f(self)

f(C())
```
Note that the "traceback compression" implemented for simple recursive
calls fails here -- I just ran this code and got 2000 lines of output.
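For contrast, here's roughly what the compression does in the simple
self-recursive case (a sketch; the repeat count depends on the limit):
```
import traceback

def g():
    g()

try:
    g()
except RecursionError:
    traceback.print_exc()
    # Consecutive identical frames collapse into a line like:
    #   [Previous line repeated 996 more times]
    # The d[C()] example defeats this because the repeating unit
    # spans several distinct frames, so nothing gets collapsed.
```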
The way I imagine Mark wants to implement pp2p calls means that in this
case each recursion step *does* add several other C stack frames, and this
would be caught by the limit implemented in (2). I see no easy way around this
-- after all the C code involved in the recursion could be a piece of 3rd party
C code that itself is not at fault.
So we could choose to implement only (2), robust C stack overflow checks.
This would require a bunch of platform-specific code, and there might be
platforms where we don't know how to implement this (I vaguely recall a UNIX
version where main() received a growable stack but each thread only had a fixed
64 KiB stack), but those would be no worse off than today.
Or we could choose to implement only (1), eliminating C stack usage for pp2p calls. But in that
case we'd still need a recursion limit for non-pp2p calls. We could eliminate the recursion limit
entirely for pp2p calls, but then we'd probably end up with user complaints -- I could imagine a
framework that executes arbitrary user code and attempts recovery after receiving an exception (for
example a REPL). Such a framework can never be 100% foolproof (e.g. `while True: l.append(1)`, or
`os._exit(0)`), but accidental recursion would be a reasonable thing to attempt to curtail, so we'd
still need to implement a limit on pp2p calls. Since the new pp2p implementation allows for a much
larger recursion limit, Mark's design with two separate limits and exceptions makes sense --
"recursion" to limit pp2p call recursion, and "stack overflow" to guard against C
stack overflow. To make things more reliable, it makes sense to throw in (2), better checks for C
stack overflow, but that
could be done independently -- we could just implement the two separate limits and
exceptions and C-stack-frame-free pp2p calls without (2), and we'd be no worse off than
today with regard to the safety of "wild" recursion like my example above.
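To make the framework scenario concrete, here's a toy sketch of the kind of
recovery loop I mean (the names are illustrative, not a real API):
```
def drive(snippets):
    # Run untrusted snippets, attempting recovery after runaway recursion.
    for src in snippets:
        try:
            exec(src, {})
        except RecursionError:
            print("snippet recursed too deeply; recovering")

drive(["def f(): f()\nf()", "print('still alive')"])
```
With a decoupled pp2p counter, catching RecursionError like this stays safe
even at much larger limits, because the C stack was never close to exhaustion.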
(Resurrecting a 9-month-old thread a bit, as C stack overflows came up in the
context of https://github.com/python/cpython/pull/30855)
Regarding (2)... robust C stack overflow checks are hard, if even possible. At
most you can get an improvement; you can never guarantee it. I believe what
has been proposed for this in the thread so far won't deliver in some real
scenarios. Any code can spawn threads from extension modules or embedding code
with whatever C stack size it wants, and those threads can (read: do) call
back into the CPython interpreter. Any assumption of a constant minimum C
stack size is incorrect unless you go with the absolute minimum possible value
for the platform, which is too small to be useful; a 512k thread stack is
considered huge.
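You don't even need an extension module to see the problem; pure Python can
request small thread stacks (a sketch -- the minimum accepted size and the
outcome are both platform-dependent):
```
import threading

def deep():
    def f(n):
        return f(n + 1)
    try:
        f(0)
    except RecursionError:
        print("the interpreter's counter fired before the C stack ran out")

# Ask for a small C stack for threads created after this call.
# CPython enforces a 32 KiB floor; some platforms require more.
threading.stack_size(64 * 1024)

t = threading.Thread(target=deep)
t.start()
t.join()
# Whether this prints the message or hard-crashes depends on how much C
# stack each interpreter frame happens to consume on this platform --
# which is exactly why a constant minimum stack size can't be assumed.
```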
One way other language runtimes deal with stack overflows is via a
sigaltstack'd SIGSEGV handler that introspects what caused the signal (!) to
see if it was maybe a stack overflow and, if so, fixes things up to recover
before resuming execution (hello, Golang's dynamic stack implementation). This
is difficult, and potentially not even possible to guarantee correct, when you
don't control the entire runtime the way Golang does.
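For reference, the closest thing CPython ships is faulthandler, which also
uses sigaltstack -- but only to report a C stack overflow, not to recover from
it (a sketch; expect the process to die right after the traceback prints):
```
import faulthandler
import sys

faulthandler.enable()  # SIGSEGV handler registered on an alternate signal stack
sys.setrecursionlimit(1_000_000)  # sidestep the interpreter's own counter

def f():
    f()

# On most platforms this overflows the real C stack; faulthandler prints
# a (truncated) Python traceback from the signal handler before the
# process is killed by the re-raised signal.
f()
```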
There has been some interesting talk about techniques in the Orange Site's
https://news.ycombinator.com/item?id=28682945 thread, which cropped up after
this discussion. The idea of allocating our own C stack and swapping over to
that is "neat" -- but it'd be a challenge, as we're a re-entrant runtime: we'd
effectively need to grow a list of newly allocated C stacks, switching to one
on every entry where we find that the stack in use is not one of our own
thread's. I suspect such an implementation would be more complex than the
feature is worth -- but I'll reserve judgment on that until someone actually
manages to implement a cross-platform version of the idea.
-gps