homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Crash in _PyThreadState_DeleteExcept() at fork in the process child
Type: Stage:
Components: Interpreter Core Versions: Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: David.Edelsohn, Michael.Felt, erlendaasland, shihai1991, vstinner
Priority: normal Keywords:

Created on 2020年03月27日 17:16 by vstinner, last changed 2022年04月11日 14:59 by admin.

Messages (13)
msg365173 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月27日 17:16
At fork, Python calls PyOS_AfterFork_Child() in the child process which indirectly calls _PyThreadState_DeleteExcept() whichs calls release_sentinel() of the thread which releases the thread state lock (threading.Thread._tstate_lock).
Problem: using a lock after fork is unsafe and can crash.
That's exactly what happens randomly on AIX when stressing ThreadJoinOnShutdown.test_reinit_tls_after_fork() of test_threading:
https://bugs.python.org/issue40068#msg365031
There are different options to solve this issue:
* Reset _tstate_lock before using it... not sure that it's worth it, since we are going to delete the threading.Thread object with its _tstate_lock object anymore. After calling fork, the child process has exactly 1 thread: all other threads have been removed.
* Modify release_sentinel() to not use the lock: avoid PyThread_release_lock() call.
msg365175 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月27日 17:18
See also bpo-40089: "Add _at_fork_reinit() method to locks".
msg367277 - (view) Author: Hai Shi (shihai1991) * (Python triager) Date: 2020年04月25日 17:54
>since we are going to delete the threading.Thread object with its _tstate_lock object anymore.
Do we have any pep or discuss record about this plan?
> * Modify release_sentinel() to not use the lock: avoid PyThread_release_lock() call.
Hm, I am not sure about it. Looks like we can not remove it directly.
msg367422 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年04月27日 13:42
> Do we have any pep or discuss record about this plan?
"since we are going to delete the threading.Thread object with its _tstate_lock object anyway" sentence is a description of the current _PyThreadState_DeleteExcept() implementation.
msg367496 - (view) Author: Hai Shi (shihai1991) * (Python triager) Date: 2020年04月28日 05:12
> "since we are going to delete the threading.Thread object with its _tstate_lock object anyway" sentence is a description of the current _PyThreadState_DeleteExcept() implementation.
Oh, got it. thanks for your explanation :)
msg389937 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021年03月31日 22:21
I marked bpo-43665 "AIX: test_importlib regression (ENV change)" as a duplicate of this issue.
test_multiprocessing_pool_circular_import() of test_importlib now triggers this crash as well.
msg389966 - (view) Author: Michael Felt (Michael.Felt) * Date: 2021年04月01日 09:52
Adding 3.10.
While a (sort of) duplicate I also would like to add that before revision "7cb033c423b65def1632d6c3c747111543b342a2" this was not showing up as an issue with test_importlib.
my issue was with test_importlib suddenly going into error.
As this seem to be a long-standing issue is it perhaps a possibility to change the test so that the bot can go green again?
msg389968 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021年04月01日 09:59
> As this seem to be a long-standing issue is it perhaps a possibility to change the test so that the bot can go green again?
This issue is a real crash and it seems quite easy to get it. I prefer to not hide the bug in the test suite. The bug must be fixed.
msg389969 - (view) Author: Michael Felt (Michael.Felt) * Date: 2021年04月01日 09:59
OK: further.
Two options are suggested:
There are different options to solve this issue:
* Reset _tstate_lock before using it... not sure that it's worth it, since we are going to delete the threading.Thread object with its _tstate_lock object anymore. After calling fork, the child process has exactly 1 thread: all other threads have been removed.
* Modify release_sentinel() to not use the lock: avoid PyThread_release_lock() call.
** as to option 1 - it is 'worth it' if it stops the crashes
** This is deeper than I usually go in Python code - but I'll make an effort - help is appreciated.
msg389972 - (view) Author: Michael Felt (Michael.Felt) * Date: 2021年04月01日 11:20
OK. Please explain. Looking at tstate assignment
In posixmodule.c:PyOSAfterFork_Child()
 PyStatus status;
 _PyRuntimeState *runtime = &_PyRuntime;
...
 PyThreadState *tstate = _PyThreadState_GET();
and later calls
 status = _PyRuntimeState_ReInitThreads(runtime);
Yet in Posix/ceval.c
PyStatus
_PyEval_ReInitThreads(PyThreadState *tstate)
{
 _PyRuntimeState *runtime = tstate->interp->runtime;
** this looks like runtime->interp->runtime
And then we get down to:
 /* Destroy all threads except the current one */
 _PyThreadState_DeleteExcept(runtime, tstate);
Is this correct - as it looks like:
_PyThreadState_DeleteExcept(runtime->interp->runtime, runtime) -- where runtime == &_PyRuntime;
msg390407 - (view) Author: Michael Felt (Michael.Felt) * Date: 2021年04月07日 09:08
Willing to spend more time on this - but the variable names chosen blind me - looks like a circle.
And, thinking about the address in the core dump starting with 0xd (segment 13) - confuses me somewhat - as from memory - I thought the shared library code/data converged on segment 14 (ie, addresses would begin with 0xe).
So, perhaps it is the 0xd* address that is killing everything.
See: https://bugs.python.org/issue43665#msg389923 -- as I am also still confused re: this line - looks suspect: with all arguments at 0xdddddddd
PyVectorcall_Call(callable = 0xdddddddd, tuple = 0xdddddddd, kwargs = 
0xdddddddd), line 255 in "call.c"
And also this line, and lines like it - notice the value for what looks like a variable for argument_count:
_PyEval_Vector(tstate = 0x22023cf0, con = 0x2000120c, locals = 
0x22023960, args = 0x220239f0, argcount = 3508213100, kwnames = 
0x22023cf0), line 46 in "pycore_ceval.h"
_PyFunction_Vectorcall(func = 0x220239f0, stack = (nil), nargsf = 
570572016, kwnames = 0x220239f0), line 361 in "call.c"
method_vectorcall(method = 0x10103bdc, args = (nil), nargsf = 0, kwnames 
= 0x20045994), line 119 in "abstract.h"
msg394225 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2021年05月23日 20:31
How could release_sentinel() be structured to not call PyThread_release_lock()? This seems to be a situation where _PyThreadState_DeleteExcept() is deleting all thread states. thread__set_sentinel() sets release_sentinel() as its on_delete hook. The thread state is being deleted, release_sentinel() is called and discovers that the lock is still held.
 if (lock->locked) {
 PyThread_release_lock(lock->lock_lock);
Do you suggest something like _PyThread_at_fork_reinit() and leak the memory? To not release the lock in the thread of the child process?
msg394266 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2021年05月24日 20:16
It seems that PyOS_AfterFork_Child() needs to do something like
PyThreadState *tstate = PyThreadState_Get();
PyObject *wr = _PyObject_CAST(tstate->on_delete_data);
PyObject *obj = PyWeakref_GET_OBJECT(wr);
lockobject *lock;
if (obj != Py_None) {
 lock = (lockobject *) obj;
 if (lock->locked) {
 /* Leak memory on purpose. */
 lock->locked = 0;
 }
}
before the call to _PyEval_ReInitThreads.
Maybe encapsulate that as PyThread_ReInitThreads().
The locks in the threads in the child need to be cleared before _PyThreadState_DeleteExcept() so that Python does not try to release the locks in the child.
History
Date User Action Args
2022年04月11日 14:59:28adminsetgithub: 84273
2021年05月24日 20:58:45erlendaaslandsetnosy: + erlendaasland
2021年05月24日 20:16:41David.Edelsohnsetmessages: + msg394266
2021年05月23日 20:31:08David.Edelsohnsetmessages: + msg394225
2021年05月20日 19:27:31David.Edelsohnsetnosy: + David.Edelsohn
2021年04月07日 09:08:26Michael.Feltsetmessages: + msg390407
2021年04月01日 11:20:55Michael.Feltsetmessages: + msg389972
2021年04月01日 09:59:19Michael.Feltsetmessages: + msg389969
2021年04月01日 09:59:09vstinnersetmessages: + msg389968
2021年04月01日 09:52:42Michael.Feltsetnosy: + Michael.Felt

messages: + msg389966
versions: + Python 3.10
2021年03月31日 22:21:37vstinnersetmessages: + msg389937
2021年03月31日 22:20:40vstinnerlinkissue43665 superseder
2020年04月28日 05:12:33shihai1991setmessages: + msg367496
2020年04月27日 13:42:26vstinnersetmessages: + msg367422
2020年04月25日 17:54:07shihai1991setmessages: + msg367277
2020年04月17日 17:13:22shihai1991setnosy: + shihai1991
2020年03月27日 17:18:10vstinnersetmessages: + msg365175
2020年03月27日 17:16:55vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /