homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Throw away more radioactive locks that could be held across a fork in threading.py
Type: Stage:
Components: Interpreter Core, Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: Rhamphoryncus, automatthias, barry, collinwinter, gregory.p.smith, jyasskin, nadeem.vawda, rnk
Priority: release blocker Keywords: needs review, patch

Created on 2009年08月04日 18:56 by rnk, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
forkjoindeadlock.py rnk, 2009年08月04日 18:56 Failing fork/threads test case.
forkdeadlock.diff rnk, 2009年08月04日 21:18 Patch to fix the deadlock
thread-fork-join.diff rnk, 2010年07月10日 19:35 Updated patch
thread-fork-join.diff rnk, 2010年07月10日 20:39 clear condition waiters also
issue6643-release26_maint_gps01.diff gregory.p.smith, 2011年01月04日 01:25
test_thread.diff nadeem.vawda, 2011年01月04日 16:30 Patch to fix AttributeError in test_thread
Messages (14)
msg91265 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2009年08月04日 18:56
This bug is similar to the importlock deadlock, and it's really part of
a larger problem that you should release all locks before you fork. 
However, we can fix this in the threading module directly by freeing and
resetting the locks on the main thread after a fork.
I've attached a test case that inserts calls to sleep at the right
places to make the following occur:
- Main thread spawns a worker thread.
- Main thread joins worker thread.
- To join, the main thread acquires the lock on the condition variable
(worker.__block.acquire()).
== switch to worker ==
- Worker thread forks.
== switch to child process ==
- Worker thread, which is now the only thread in the process, returns.
- __bootstrap_inner calls self.__stop() to notify any other threads
waiting for it that it returned.
- __stop() tries to acquire self.__block, which has been left in an
acquired state, so the child process hangs here.
== switch to worker in parent process ==
- Worker thread calls os.waitpid(), which hangs, since the child never
returns.
So there's the deadlock.
I think I should be able to fix it just by resetting the condition
variable lock and any other locks hanging off the only thread left
standing after the fork.
msg91273 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2009年08月04日 21:18
Here's a patch for 3.2 which adds the fix and a test case. I also
verified that the problem exists in 3.1, 2.7, and 2.6 and backported the
patch to those versions, but someone should review this one before I
upload those.
msg109914 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2010年07月10日 19:35
Here's an updated patch for py3k (3.2). The test still fails without the fix, and passes with the fix.
Thinking more about this, I'll try summarizing the bug more coherently:
When the main thread joins the child threads, it acquires some locks. If a fork in a child thread occurs while those locks are held, they remain locked in the child process. My solution is to do here what we do elsewhere in CPython: abandon radioactive locks and allocate fresh ones.
msg109933 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2010年07月10日 20:39
I realized that in a later fix for unladen-swallow, we also cleared the condition variable waiters list, since it has radioactive synchronization primitives in it as well.
Here's an updated patch that simplifies the fix by just using __init__() to completely reinitialize the condition variables and adds a test.
This corresponds to unladen-swallow revisions r799 and r834.
msg110071 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2010年07月12日 06:34
I don't have any direct opinions on this, as it is just a bandaid. fork, as defined by POSIX, doesn't allow what we do with it, so we're reliant on great deal of OS and library implementation details. The only portable and robust solution would be to replace it with a unified fork-and-exec API that's implemented directly in C.
msg110092 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2010年07月12日 15:11
I completely agree, but the cat is out of the bag on this one. I don't see how we could get rid of fork until Py4K, and even then I'm sure there will be people who don't want to see it go, and I'd rather not spend my time arguing this point.
The only application of fork that doesn't use exec that I've heard of is pre-forked Python servers. But those don't seem like they would be very useful, since with refcounting the copy-on-write behavior doesn't get you very many wins.
The problem that this bandaid solves for me is that test_threading.py already tests thread+fork behaviors, and can fail non-deterministically.
This problem was exacerbated while I was working on making the compilation thread.
I don't think we can un-support fork and threads in the near future either, because subprocess.py uses fork, and libraries can use fork behind the user's back.
msg125236 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011年01月03日 20:48
fwiw a unified fork-and-exec API implemented in C is what I added in Modules/_posixsubprocess.c to at least avoid this issue as much as possible when using subprocess.
msg125240 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011年01月03日 21:07
patch looks good. committed in r87710 for 3.2. needs back porting to 3.1 and 2.7 and optionally 2.6.
msg125270 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011年01月04日 01:10
r87726 for release31-maint
r87727 for release27-maint - this required a bit more fiddling as _block and _started and _cond were __ private.
msg125273 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011年01月04日 01:25
Attached is a patch for Python 2.6 release26_maint for reference incase someone wants it. That branch is closed - security fixes only.
msg125338 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011年01月04日 16:30
r87710 introduces an AttributeError in test_thread's TestForkInThread test case. If os.fork() is called from a thread created by the _thread module, threading._after_fork() will get a _DummyThread (with no _block attribute) as the current thread.
I've attached a patch that checks whether the thread has a _block attribute before trying to reinitialize it.
msg125346 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011年01月04日 18:34
eek, thanks for noticing that!
r87740 fixes this in py3k. backporting to 3.1 and 2.7 now.
msg125350 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011年01月04日 18:44
r87741 3.1
r87742 2.7
msg193923 - (view) Author: Maciej Bliziński (automatthias) Date: 2013年07月30日 10:54
Python version: 2.7.5
OS: Solaris 9
I'm still observing this issue (or Issue5114) on Solaris 9. The symptom is that test_threading hangs indefinitely (tested: overnight) and running pstack on the process, I'm seeing:
----------------- lwp# 1 / thread# 1 --------------------
 ff3dc734 lwp_park (0, 0, 0)
 ff3d3c74 s9_lwp_park (0, 0, 0, 1, feed4f48, 18f5a4) + 28
 ff3dc698 s9_handler (0, 0, 0, 1, feed4f48, 18f5a4) + 90
 ff1dea70 _sema_wait (0, feee66a0, fed6b054, feee6000, 2a298478, d1f20) + 1d4
 ff1dec30 sema_wait (81aa8, ff1dec24, 722a5b4b, 1101c, feed4f48, 134d60) + c
 feed4f48 sem_wait (81aa8, 0, fed6b1ac, 0, 0, 1) + 20
 ff050890 PyThread_acquire_lock (81aa8, 1, fed6b214, 2, 0, 1ae778) + 5c
 ff05524c lock_PyThread_acquire_lock (0, 22030, 0, 13ee40, 16a298, 55150) + 50
 fefa779c PyCFunction_Call (1ae788, 22030, 0, ff0d8eb8, 55150, ff0551fc) + e4
 ff016b14 PyEval_EvalFrameEx (18f5a0, 0, 0, d4f66, 16a298, 22030) + 5ee8
 ff0185d0 PyEval_EvalCodeEx (12c968, 0, 18f5a0, 4, 1, 18f5a4) + 924
 ff0168f8 PyEval_EvalFrameEx (1902b8, 0, 1, 1765c0, 16a298, 1b12d0) + 5ccc
 ff0185d0 PyEval_EvalCodeEx (13f608, 0, 1902b8, 4, 1, 1902bc) + 924
 ff0168f8 PyEval_EvalFrameEx (154748, 0, 1, 31f7f, 16a298, 1b1250) + 5ccc
 ff0185d0 PyEval_EvalCodeEx (10d650, 54a50, 154748, 2203c, 0, 2203c) + 924
 fef8e11c function_call (22038, 22030, 1386f0, 2203c, 130730, 22030) + 168
 fef604e8 PyObject_Call (130730, 22030, 1386f0, ff0e0340, fef8dfb4, 0) + 60
 ff0137dc PyEval_EvalFrameEx (169110, 0, 22030, 10e62d, 16a298, 22030) + 2bb0
 ff017478 PyEval_EvalFrameEx (168f80, 0, 169114, 1769fa, 16a298, 16a298) + 684c
 ff017478 PyEval_EvalFrameEx (176cb0, 0, 168f84, 12a2c0, 16a298, 16a298) + 684c
 ff0185d0 PyEval_EvalCodeEx (13f410, 176cb4, 176cb0, 13433c, 1, 0) + 924
 fef8e040 function_call (1b26f0, 134330, 0, ff1bc000, 1b26f0, 0) + 8c
 fef604e8 PyObject_Call (1b26f0, 134330, 0, ff0e0340, fef8dfb4, 134320) + 60
 fef6e530 instancemethod_call (0, 134330, 0, 0, 1b26f0, 134bd0) + a4
 fef604e8 PyObject_Call (c3b48, 22030, 0, ff0e0340, fef6e48c, 0) + 60
 ff01051c PyEval_CallObjectWithKeywords (c3b48, 22030, 0, 0, 0, 0) + 68
 ff05568c t_bootstrap (63bd0, 0, 0, 0, 16a298, ff0e2804) + 4c
 ff1e53a4 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 2 / thread# 2 --------------------
 ff3dc734 lwp_park (0, 0, 0)
 ff3d3c74 s9_lwp_park (0, 0, 0, 1, b64a0d58, 136818) + 28
 ff3dc698 s9_handler (0, 0, 0, 1, b64a0d58, 136818) + 90
 ff1dea70 _sema_wait (0, feee66a0, fec6b054, feee6000, 2a298478, d1f20) + 1d4
 ff1dec30 sema_wait (8ab00, ff1dec24, 722a5b4b, 1101c, feed4f48, 134d60) + c
 feed4f48 sem_wait (8ab00, 0, fec6b1ac, 0, 0, 1) + 20
 ff050890 PyThread_acquire_lock (8ab00, 1, fec6b214, 2, 0, 1ae610) + 5c
 ff05524c lock_PyThread_acquire_lock (0, 22030, 0, 13ee40, 156168, 55160) + 50
 fefa779c PyCFunction_Call (1ae620, 22030, 0, ff0d8eb8, 55160, ff0551fc) + e4
 ff016b14 PyEval_EvalFrameEx (18fe60, 0, 0, d4f66, 156168, 22030) + 5ee8
 ff0185d0 PyEval_EvalCodeEx (12c968, 0, 18fe60, 4, 1, 18fe64) + 924
 ff0168f8 PyEval_EvalFrameEx (18fce8, 0, 1, 1765c0, 156168, 1b11b0) + 5ccc
 ff0185d0 PyEval_EvalCodeEx (13f608, 0, 18fce8, 4, 1, 18fcec) + 924
 ff0168f8 PyEval_EvalFrameEx (18fb88, 0, 1, 136155, 156168, 1a2930) + 5ccc
 ff0185d0 PyEval_EvalCodeEx (48b60, 18fb8c, 18fb88, 19d41c, 1, 2203c) + 924
 fef8e11c function_call (22038, 19d410, 1b3c00, 2203c, 130370, 22030) + 168
 fef604e8 PyObject_Call (130370, 19d410, 1b3c00, ff0e0340, fef8dfb4, 19d400) + 60
 ff0137dc PyEval_EvalFrameEx (18fa20, 0, 19d410, 10e62d, 156168, 134950) + 2bb0
 ff017478 PyEval_EvalFrameEx (18f890, 0, 18fa24, 1769fa, 156168, 156168) + 684c
 ff017478 PyEval_EvalFrameEx (18f728, 0, 18f894, 12a2c0, 156168, 156168) + 684c
 ff0185d0 PyEval_EvalCodeEx (13f410, 18f72c, 18f728, 19d3fc, 1, 0) + 924
 fef8e040 function_call (1b26f0, 19d3f0, 0, ff1bc000, 1b26f0, 0) + 8c
 fef604e8 PyObject_Call (1b26f0, 19d3f0, 0, ff0e0340, fef8dfb4, 19d3e0) + 60
 fef6e530 instancemethod_call (0, 19d3f0, 0, 0, 1b26f0, 1b1250) + a4
 fef604e8 PyObject_Call (1aeaf8, 22030, 0, ff0e0340, fef6e48c, 0) + 60
 ff01051c PyEval_CallObjectWithKeywords (1aeaf8, 22030, 0, 0, 0, 0) + 68
 ff05568c t_bootstrap (63c30, 0, 0, 0, 156168, ff0e2804) + 4c
 ff1e53a4 _lwp_start (0, 0, 0, 0, 0, 0)
The problem does not occur on Solaris 10.
History
Date User Action Args
2022年04月11日 14:56:51adminsetgithub: 50892
2013年07月30日 10:54:40automatthiassetnosy: + automatthias
messages: + msg193923
2011年06月25日 10:43:10neologixlinkissue5114 superseder
2011年01月04日 18:44:17gregory.p.smithsetstatus: open -> closed

messages: + msg125350
resolution: accepted -> fixed
nosy: barry, collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, nadeem.vawda, rnk
2011年01月04日 18:34:43gregory.p.smithsetpriority: normal -> release blocker
nosy: + barry
messages: + msg125346

2011年01月04日 16:40:56pitrousetstatus: closed -> open
nosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, nadeem.vawda, rnk
2011年01月04日 16:30:45nadeem.vawdasetfiles: + test_thread.diff
nosy: + nadeem.vawda
messages: + msg125338

2011年01月04日 01:25:20gregory.p.smithsetstatus: open -> closed
files: + issue6643-release26_maint_gps01.diff
versions: - Python 2.7
nosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk
messages: + msg125273

keywords: + patch
2011年01月04日 01:10:40gregory.p.smithsetnosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk
messages: + msg125270
versions: - Python 3.1, Python 3.2
2011年01月03日 21:07:41gregory.p.smithsetassignee: rnk -> gregory.p.smith
messages: + msg125240
resolution: accepted
nosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk
2011年01月03日 20:48:25gregory.p.smithsetnosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk
messages: + msg125236
2010年07月18日 14:50:49rnklinkissue6642 dependencies
2010年07月18日 14:49:23rnksetkeywords: + needs review, - patch
assignee: rnk
2010年07月12日 15:11:29rnksetmessages: + msg110092
2010年07月12日 06:34:26Rhamphoryncussetmessages: + msg110071
2010年07月11日 14:49:42pitrousetnosy: + gregory.p.smith, Rhamphoryncus
2010年07月11日 13:22:50rnksettitle: joining a child that forks can deadlock in the forked child process -> Throw away more radioactive locks that could be held across a fork in threading.py
2010年07月10日 20:39:02rnksetfiles: + thread-fork-join.diff

messages: + msg109933
2010年07月10日 19:35:50rnksetfiles: + thread-fork-join.diff

messages: + msg109914
2009年08月11日 18:19:33collinwintersetnosy: + jyasskin, collinwinter
components: + Interpreter Core
2009年08月04日 21:18:43rnksetfiles: + forkdeadlock.diff
keywords: + patch
messages: + msg91273

versions: + Python 3.1, Python 2.7, Python 3.2
2009年08月04日 18:56:48rnkcreate

AltStyle によって変換されたページ (->オリジナル) /