Created on 2009-08-04 18:56 by rnk, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| forkjoindeadlock.py | rnk, 2009-08-04 18:56 | Failing fork/threads test case. |
| forkdeadlock.diff | rnk, 2009-08-04 21:18 | Patch to fix the deadlock |
| thread-fork-join.diff | rnk, 2010-07-10 19:35 | Updated patch |
| thread-fork-join.diff | rnk, 2010-07-10 20:39 | clear condition waiters also |
| issue6643-release26_maint_gps01.diff | gregory.p.smith, 2011-01-04 01:25 | |
| test_thread.diff | nadeem.vawda, 2011-01-04 16:30 | Patch to fix AttributeError in test_thread |
| Messages (14) | |||
|---|---|---|---|
| msg91265 - (view) | Author: Reid Kleckner (rnk) (Python committer) | Date: 2009-08-04 18:56 | |
This bug is similar to the importlock deadlock, and it's really part of a larger problem that you should release all locks before you fork. However, we can fix this in the threading module directly by freeing and resetting the locks on the main thread after a fork.

I've attached a test case that inserts calls to sleep at the right places to make the following occur:

- Main thread spawns a worker thread.
- Main thread joins worker thread.
- To join, the main thread acquires the lock on the condition variable (worker.__block.acquire()).

== switch to worker ==

- Worker thread forks.

== switch to child process ==

- Worker thread, which is now the only thread in the process, returns.
- __bootstrap_inner calls self.__stop() to notify any other threads waiting for it that it returned.
- __stop() tries to acquire self.__block, which has been left in an acquired state, so the child process hangs here.

== switch to worker in parent process ==

- Worker thread calls os.waitpid(), which hangs, since the child never returns.

So there's the deadlock. I think I should be able to fix it just by resetting the condition variable lock and any other locks hanging off the only thread left standing after the fork. |
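The sequence above comes down to one fact: a lock that is held at fork time is copied into the child in its locked state, and the thread that would release it does not exist there. Here is a minimal sketch of that, assuming a POSIX system with os.fork() and using a plain threading.Lock as a stand-in for the thread's internal condition lock. The helper name lock_state_in_child is made up for illustration, and this is not the attached forkjoindeadlock.py. The child uses a non-blocking acquire so that the sketch reports the problem instead of hanging. Recent Python versions may also emit a DeprecationWarning when forking a multi-threaded process.

```python
import os
import threading


def lock_state_in_child():
    """Fork while another thread holds a lock; report what the child sees.

    Returns the child's exit status: 0 if the child could take the lock,
    1 if the lock was still held in the child.
    """
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        # Stand-in for a thread that owns a lock at the moment of the fork.
        with lock:
            held.set()
            release.wait()

    worker = threading.Thread(target=holder)
    worker.start()
    held.wait()  # make sure the lock is held before forking

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, yet the lock was
        # copied in its acquired state. Try it without blocking.
        got_it = lock.acquire(False)
        os._exit(0 if got_it else 1)

    # Parent: let the holder finish, then collect the child's verdict.
    release.set()
    worker.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

A blocking acquire() in the child, which is what __stop() effectively does in the scenario described, would wait forever.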
|||
| msg91273 - (view) | Author: Reid Kleckner (rnk) (Python committer) | Date: 2009-08-04 21:18 | |
Here's a patch for 3.2 which adds the fix and a test case. I also verified that the problem exists in 3.1, 2.7, and 2.6 and backported the patch to those versions, but someone should review this one before I upload those. |
|||
| msg109914 - (view) | Author: Reid Kleckner (rnk) (Python committer) | Date: 2010-07-10 19:35 | |
Here's an updated patch for py3k (3.2). The test still fails without the fix, and passes with the fix. Thinking more about this, I'll try summarizing the bug more coherently: When the main thread joins the child threads, it acquires some locks. If a fork in a child thread occurs while those locks are held, they remain locked in the child process. My solution is to do here what we do elsewhere in CPython: abandon radioactive locks and allocate fresh ones. |
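The "abandon the old lock and allocate a fresh one" idea can be sketched outside the threading module too. This is only an illustration, not the attached patch: it assumes Python 3.7+ (for os.register_at_fork) on a POSIX system, and the ForkSafeLock class and child_can_acquire helper are hypothetical names invented here.

```python
import os
import threading


class ForkSafeLock:
    """Hypothetical wrapper that replaces its lock in every forked child."""

    def __init__(self):
        self._lock = threading.Lock()
        # Note: registering a bound method keeps this object alive for the
        # rest of the process; acceptable for a sketch.
        os.register_at_fork(after_in_child=self._reinit)

    def _reinit(self):
        # Runs in the child only. The old lock may be held by a thread that
        # no longer exists, so drop it and start with a fresh one.
        self._lock = threading.Lock()

    def acquire(self, blocking=True):
        return self._lock.acquire(blocking)

    def release(self):
        self._lock.release()


def child_can_acquire():
    """Fork while a thread holds the lock; True if the child can take it."""
    guarded = ForkSafeLock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        guarded.acquire()
        held.set()
        release.wait()
        guarded.release()

    worker = threading.Thread(target=holder)
    worker.start()
    held.wait()

    pid = os.fork()
    if pid == 0:
        # Child: the after-fork hook has already swapped in a new lock.
        os._exit(0 if guarded.acquire(False) else 1)

    release.set()
    worker.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

The parent's lock is untouched, so the holder thread there releases the same lock it acquired.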
|||
| msg109933 - (view) | Author: Reid Kleckner (rnk) (Python committer) | Date: 2010-07-10 20:39 | |
I realized that in a later fix for unladen-swallow, we also cleared the condition variable waiters list, since it has radioactive synchronization primitives in it as well. Here's an updated patch that simplifies the fix by just using __init__() to completely reinitialize the condition variables and adds a test. This corresponds to unladen-swallow revisions r799 and r834. |
|||
| msg110071 - (view) | Author: Adam Olsen (Rhamphoryncus) | Date: 2010-07-12 06:34 | |
I don't have any direct opinions on this, as it is just a bandaid. fork, as defined by POSIX, doesn't allow what we do with it, so we're reliant on a great deal of OS and library implementation details. The only portable and robust solution would be to replace it with a unified fork-and-exec API that's implemented directly in C. |
|||
| msg110092 - (view) | Author: Reid Kleckner (rnk) (Python committer) | Date: 2010-07-12 15:11 | |
I completely agree, but the cat is out of the bag on this one. I don't see how we could get rid of fork until Py4K, and even then I'm sure there will be people who don't want to see it go, and I'd rather not spend my time arguing this point. The only application of fork that doesn't use exec that I've heard of is pre-forked Python servers. But those don't seem like they would be very useful, since with refcounting the copy-on-write behavior doesn't get you very many wins. The problem that this bandaid solves for me is that test_threading.py already tests thread+fork behaviors, and can fail non-deterministically. This problem was exacerbated while I was working on making the compilation thread. I don't think we can un-support fork and threads in the near future either, because subprocess.py uses fork, and libraries can use fork behind the user's back. |
|||
| msg125236 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2011-01-03 20:48 | |
fwiw a unified fork-and-exec API implemented in C is what I added in Modules/_posixsubprocess.c to at least avoid this issue as much as possible when using subprocess. |
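From user code, the practical consequence is that launching a child through subprocess from a worker thread goes through that combined fork-and-exec path on POSIX, rather than returning into Python code in a forked copy of the process. A small sketch follows. The helper name run_child_from_thread is made up, and it assumes Python 3.7+ for the capture_output and text arguments.

```python
import subprocess
import sys
import threading


def run_child_from_thread():
    """Start a child process from a worker thread and return its output."""
    result = {}

    def worker():
        # No os.fork() in our own code: subprocess handles fork+exec itself.
        completed = subprocess.run(
            [sys.executable, "-c", "print('ok')"],
            capture_output=True,
            text=True,
            check=True,
        )
        result["stdout"] = completed.stdout.strip()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result["stdout"]
```

Passing a preexec_fn would bring Python code back into the window between fork and exec, so this sketch leaves it out.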
|||
| msg125240 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2011-01-03 21:07 | |
patch looks good. committed in r87710 for 3.2. needs back porting to 3.1 and 2.7 and optionally 2.6. |
|||
| msg125270 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2011-01-04 01:10 | |
r87726 for release31-maint
r87727 for release27-maint - this required a bit more fiddling as _block and _started and _cond were __ private. |
|||
| msg125273 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2011-01-04 01:25 | |
Attached is a patch for Python 2.6 release26_maint for reference in case someone wants it. That branch is closed - security fixes only. |
|||
| msg125338 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-01-04 16:30 | |
r87710 introduces an AttributeError in test_thread's TestForkInThread test case. If os.fork() is called from a thread created by the _thread module, threading._after_fork() will get a _DummyThread (with no _block attribute) as the current thread. I've attached a patch that checks whether the thread has a _block attribute before trying to reinitialize it. |
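The shape of that guard can be sketched as follows. This is not the attached test_thread.diff: both classes are hypothetical stand-ins (one with a _block attribute, one without, like the _DummyThread case described), and the sketch allocates a new Condition where the real code may reinitialize the existing one.

```python
import threading


class RegularThreadStandIn:
    """Hypothetical stand-in for a thread object that has a _block."""

    def __init__(self):
        self._block = threading.Condition(threading.Lock())


class DummyThreadStandIn:
    """Hypothetical stand-in for a thread object with no _block attribute."""


def reset_block_after_fork(thread):
    """Reset the thread's condition only if it has one.

    Returns True if a reset happened, False if there was nothing to reset.
    """
    if hasattr(thread, "_block"):
        # The old condition may be locked by a thread that no longer exists.
        thread._block = threading.Condition(threading.Lock())
        return True
    return False
```

Without the hasattr check, the second kind of object would raise AttributeError as soon as _block is touched.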
|||
| msg125346 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2011-01-04 18:34 | |
eek, thanks for noticing that! r87740 fixes this in py3k. backporting to 3.1 and 2.7 now. |
|||
| msg125350 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2011-01-04 18:44 | |
r87741 for 3.1
r87742 for 2.7 |
|||
| msg193923 - (view) | Author: Maciej Bliziński (automatthias) | Date: 2013-07-30 10:54 | |
Python version: 2.7.5
OS: Solaris 9

I'm still observing this issue (or Issue5114) on Solaris 9. The symptom is that test_threading hangs indefinitely (tested: overnight) and running pstack on the process, I'm seeing:

----------------- lwp# 1 / thread# 1 --------------------
ff3dc734 lwp_park (0, 0, 0)
ff3d3c74 s9_lwp_park (0, 0, 0, 1, feed4f48, 18f5a4) + 28
ff3dc698 s9_handler (0, 0, 0, 1, feed4f48, 18f5a4) + 90
ff1dea70 _sema_wait (0, feee66a0, fed6b054, feee6000, 2a298478, d1f20) + 1d4
ff1dec30 sema_wait (81aa8, ff1dec24, 722a5b4b, 1101c, feed4f48, 134d60) + c
feed4f48 sem_wait (81aa8, 0, fed6b1ac, 0, 0, 1) + 20
ff050890 PyThread_acquire_lock (81aa8, 1, fed6b214, 2, 0, 1ae778) + 5c
ff05524c lock_PyThread_acquire_lock (0, 22030, 0, 13ee40, 16a298, 55150) + 50
fefa779c PyCFunction_Call (1ae788, 22030, 0, ff0d8eb8, 55150, ff0551fc) + e4
ff016b14 PyEval_EvalFrameEx (18f5a0, 0, 0, d4f66, 16a298, 22030) + 5ee8
ff0185d0 PyEval_EvalCodeEx (12c968, 0, 18f5a0, 4, 1, 18f5a4) + 924
ff0168f8 PyEval_EvalFrameEx (1902b8, 0, 1, 1765c0, 16a298, 1b12d0) + 5ccc
ff0185d0 PyEval_EvalCodeEx (13f608, 0, 1902b8, 4, 1, 1902bc) + 924
ff0168f8 PyEval_EvalFrameEx (154748, 0, 1, 31f7f, 16a298, 1b1250) + 5ccc
ff0185d0 PyEval_EvalCodeEx (10d650, 54a50, 154748, 2203c, 0, 2203c) + 924
fef8e11c function_call (22038, 22030, 1386f0, 2203c, 130730, 22030) + 168
fef604e8 PyObject_Call (130730, 22030, 1386f0, ff0e0340, fef8dfb4, 0) + 60
ff0137dc PyEval_EvalFrameEx (169110, 0, 22030, 10e62d, 16a298, 22030) + 2bb0
ff017478 PyEval_EvalFrameEx (168f80, 0, 169114, 1769fa, 16a298, 16a298) + 684c
ff017478 PyEval_EvalFrameEx (176cb0, 0, 168f84, 12a2c0, 16a298, 16a298) + 684c
ff0185d0 PyEval_EvalCodeEx (13f410, 176cb4, 176cb0, 13433c, 1, 0) + 924
fef8e040 function_call (1b26f0, 134330, 0, ff1bc000, 1b26f0, 0) + 8c
fef604e8 PyObject_Call (1b26f0, 134330, 0, ff0e0340, fef8dfb4, 134320) + 60
fef6e530 instancemethod_call (0, 134330, 0, 0, 1b26f0, 134bd0) + a4
fef604e8 PyObject_Call (c3b48, 22030, 0, ff0e0340, fef6e48c, 0) + 60
ff01051c PyEval_CallObjectWithKeywords (c3b48, 22030, 0, 0, 0, 0) + 68
ff05568c t_bootstrap (63bd0, 0, 0, 0, 16a298, ff0e2804) + 4c
ff1e53a4 _lwp_start (0, 0, 0, 0, 0, 0)

----------------- lwp# 2 / thread# 2 --------------------
ff3dc734 lwp_park (0, 0, 0)
ff3d3c74 s9_lwp_park (0, 0, 0, 1, b64a0d58, 136818) + 28
ff3dc698 s9_handler (0, 0, 0, 1, b64a0d58, 136818) + 90
ff1dea70 _sema_wait (0, feee66a0, fec6b054, feee6000, 2a298478, d1f20) + 1d4
ff1dec30 sema_wait (8ab00, ff1dec24, 722a5b4b, 1101c, feed4f48, 134d60) + c
feed4f48 sem_wait (8ab00, 0, fec6b1ac, 0, 0, 1) + 20
ff050890 PyThread_acquire_lock (8ab00, 1, fec6b214, 2, 0, 1ae610) + 5c
ff05524c lock_PyThread_acquire_lock (0, 22030, 0, 13ee40, 156168, 55160) + 50
fefa779c PyCFunction_Call (1ae620, 22030, 0, ff0d8eb8, 55160, ff0551fc) + e4
ff016b14 PyEval_EvalFrameEx (18fe60, 0, 0, d4f66, 156168, 22030) + 5ee8
ff0185d0 PyEval_EvalCodeEx (12c968, 0, 18fe60, 4, 1, 18fe64) + 924
ff0168f8 PyEval_EvalFrameEx (18fce8, 0, 1, 1765c0, 156168, 1b11b0) + 5ccc
ff0185d0 PyEval_EvalCodeEx (13f608, 0, 18fce8, 4, 1, 18fcec) + 924
ff0168f8 PyEval_EvalFrameEx (18fb88, 0, 1, 136155, 156168, 1a2930) + 5ccc
ff0185d0 PyEval_EvalCodeEx (48b60, 18fb8c, 18fb88, 19d41c, 1, 2203c) + 924
fef8e11c function_call (22038, 19d410, 1b3c00, 2203c, 130370, 22030) + 168
fef604e8 PyObject_Call (130370, 19d410, 1b3c00, ff0e0340, fef8dfb4, 19d400) + 60
ff0137dc PyEval_EvalFrameEx (18fa20, 0, 19d410, 10e62d, 156168, 134950) + 2bb0
ff017478 PyEval_EvalFrameEx (18f890, 0, 18fa24, 1769fa, 156168, 156168) + 684c
ff017478 PyEval_EvalFrameEx (18f728, 0, 18f894, 12a2c0, 156168, 156168) + 684c
ff0185d0 PyEval_EvalCodeEx (13f410, 18f72c, 18f728, 19d3fc, 1, 0) + 924
fef8e040 function_call (1b26f0, 19d3f0, 0, ff1bc000, 1b26f0, 0) + 8c
fef604e8 PyObject_Call (1b26f0, 19d3f0, 0, ff0e0340, fef8dfb4, 19d3e0) + 60
fef6e530 instancemethod_call (0, 19d3f0, 0, 0, 1b26f0, 1b1250) + a4
fef604e8 PyObject_Call (1aeaf8, 22030, 0, ff0e0340, fef6e48c, 0) + 60
ff01051c PyEval_CallObjectWithKeywords (1aeaf8, 22030, 0, 0, 0, 0) + 68
ff05568c t_bootstrap (63c30, 0, 0, 0, 156168, ff0e2804) + 4c
ff1e53a4 _lwp_start (0, 0, 0, 0, 0, 0)

The problem does not occur on Solaris 10. |
|||
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:51 | admin | set | github: 50892 |
| 2013-07-30 10:54:40 | automatthias | set | nosy: + automatthias; messages: + msg193923 |
| 2011-06-25 10:43:10 | neologix | link | issue5114 superseder |
| 2011-01-04 18:44:17 | gregory.p.smith | set | status: open -> closed; messages: + msg125350; resolution: accepted -> fixed; nosy: barry, collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, nadeem.vawda, rnk |
| 2011-01-04 18:34:43 | gregory.p.smith | set | priority: normal -> release blocker; nosy: + barry; messages: + msg125346 |
| 2011-01-04 16:40:56 | pitrou | set | status: closed -> open; nosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, nadeem.vawda, rnk |
| 2011-01-04 16:30:45 | nadeem.vawda | set | files: + test_thread.diff; nosy: + nadeem.vawda; messages: + msg125338 |
| 2011-01-04 01:25:20 | gregory.p.smith | set | status: open -> closed; files: + issue6643-release26_maint_gps01.diff; versions: - Python 2.7; nosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk; messages: + msg125273; keywords: + patch |
| 2011-01-04 01:10:40 | gregory.p.smith | set | nosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk; messages: + msg125270; versions: - Python 3.1, Python 3.2 |
| 2011-01-03 21:07:41 | gregory.p.smith | set | assignee: rnk -> gregory.p.smith; messages: + msg125240; resolution: accepted; nosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk |
| 2011-01-03 20:48:25 | gregory.p.smith | set | nosy: collinwinter, gregory.p.smith, Rhamphoryncus, jyasskin, rnk; messages: + msg125236 |
| 2010-07-18 14:50:49 | rnk | link | issue6642 dependencies |
| 2010-07-18 14:49:23 | rnk | set | keywords: + needs review, - patch; assignee: rnk |
| 2010-07-12 15:11:29 | rnk | set | messages: + msg110092 |
| 2010-07-12 06:34:26 | Rhamphoryncus | set | messages: + msg110071 |
| 2010-07-11 14:49:42 | pitrou | set | nosy: + gregory.p.smith, Rhamphoryncus |
| 2010-07-11 13:22:50 | rnk | set | title: joining a child that forks can deadlock in the forked child process -> Throw away more radioactive locks that could be held across a fork in threading.py |
| 2010-07-10 20:39:02 | rnk | set | files: + thread-fork-join.diff; messages: + msg109933 |
| 2010-07-10 19:35:50 | rnk | set | files: + thread-fork-join.diff; messages: + msg109914 |
| 2009-08-11 18:19:33 | collinwinter | set | nosy: + jyasskin, collinwinter; components: + Interpreter Core |
| 2009-08-04 21:18:43 | rnk | set | files: + forkdeadlock.diff; keywords: + patch; messages: + msg91273; versions: + Python 3.1, Python 2.7, Python 3.2 |
| 2009-08-04 18:56:48 | rnk | create | |