homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Segfault in test_multiprocessing
Type: Stage:
Components: Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: neologix, pitrou, python-dev, vstinner
Priority: normal Keywords:

Created on 2011年06月10日 09:11 by vstinner, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Messages (13)
msg138060 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月10日 09:11
test_multiprocessing segfaults in a loop. The crash occurs in _Condition.release() on waiter.release(), called from Queue._finalize_close(). Possible related changes:
 - a5c8b6ebe895: new sigwait() test using thread (issue #8407)
 - 6d6099f7fe89: add sentinels to multiprocessing (issue #9205)
Example of a crash:
[333/356/1] test_multiprocessing
Fatal Python error: Segmentation fault
Thread 0x01cde800:
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 237 in wait
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/queues.py", line 252 in _feed
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 690 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 737 in _bootstrap_inner
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 710 in _bootstrap
Current thread 0xa000d000:
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/threading.py", line 298 in notify
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/queues.py", line 227 in _finalize_close
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/util.py", line 202 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/process.py", line 270 in _bootstrap
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/forking.py", line 133 in __init__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/process.py", line 134 in start
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/pool.py", line 220 in _repopulate_pool
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/pool.py", line 157 in __init__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/__init__.py", line 231 in Pool
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_multiprocessing.py", line 2186 in test_main
 File "./Lib/test/regrtest.py", line 1043 in runtest_inner
 File "./Lib/test/regrtest.py", line 841 in runtest
 File "./Lib/test/regrtest.py", line 668 in main
 File "./Lib/test/regrtest.py", line 1618 in <module>
There are approximatively 698 crashes in last tests on "x86 Tiger 3.x"!
http://www.python.org/dev/buildbot/all/builders/x86%20Tiger%203.x/builds/2722/steps/test/logs/stdio
Most occured on Queue._finalize_close() -> _Condition.release() -> waiter.release().
msg138064 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月10日 09:26
> a5c8b6ebe895: new sigwait() test using thread (issue #8407)
> 6d6099f7fe89: add sentinels to multiprocessing (issue #9205)
The first segfaults occured in build #2719 (cedceeb45030):
http://www.python.org/dev/buildbot/all/builders/x86%20Tiger%203.x/builds/2719
In the timeline, 6d6099f7fe89 < cedceeb45030 < a5c8b6ebe895, so the change is more likely coming from 6d6099f7fe89.
msg138067 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年06月10日 09:40
Victor, how can there be hundreds of crashes? Isn't the process supposed to terminate when a crash occurs?
There are several crashes in test_signal, so it's not only test_multiprocessing:
Thread 0xa000d000:
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 172 in test_main
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
 File "./Lib/test/regrtest.py", line 1043 in runtest_inner
 File "./Lib/test/regrtest.py", line 841 in runtest
 File "./Lib/test/regrtest.py", line 668 in main
 File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault
Thread 0xa000d000:
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 248 in test_wakeup_fd_early
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
 File "./Lib/test/regrtest.py", line 1043 in runtest_inner
 File "./Lib/test/regrtest.py", line 841 in runtest
 File "./Lib/test/regrtest.py", line 668 in main
 File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault
Thread 0xa000d000:
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 372 in readpipe_interrupted
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 394 in test_siginterrupt_on
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
 File "./Lib/test/regrtest.py", line 1043 in runtest_inner
 File "./Lib/test/regrtest.py", line 841 in runtest
 File "./Lib/test/regrtest.py", line 668 in main
 File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault
Thread 0xa000d000:
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 495 in test_itimer_prof
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
 File "./Lib/test/regrtest.py", line 1043 in runtest_inner
 File "./Lib/test/regrtest.py", line 841 in runtest
 File "./Lib/test/regrtest.py", line 668 in main
 File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault
Thread 0xa000d000:
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 472 in test_itimer_virtual
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 407 in _executeTestPart
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 462 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 514 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in run
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1254 in _run_suite
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1280 in run_unittest
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_signal.py", line 687 in test_main
 File "./Lib/test/regrtest.py", line 1043 in runtest_inner
 File "./Lib/test/regrtest.py", line 841 in runtest
 File "./Lib/test/regrtest.py", line 668 in main
 File "./Lib/test/regrtest.py", line 1618 in <module>
Fatal Python error: Segmentation fault
msg138069 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月10日 09:42
> Victor, how can there be hundreds of crashes?
> Isn't the process supposed to terminate when a crash occurs?
Yes, a process does terminate on SIGSEGV, but multiprocessing creates subprocesses: I suppose that crashes occur in child processes.
For test_signal, I have to investigate this.
msg138074 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月10日 11:02
Le vendredi 10 juin 2011 à 09:40 +0000, Antoine Pitrou a écrit :
> There are several crashes in test_signal
Commit a17710e27ea2 should fix some (all?) test_signal crashes.
msg138132 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月10日 21:10
It looks like the sentinel doesn't handle fatal death of the child process:
test test_multiprocessing crashed -- Traceback (most recent call last):
 File "./Lib/test/regrtest.py", line 1043, in runtest_inner
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_multiprocessing.py", line 2189, in test_main
 ManagerMixin.manager.start()
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/managers.py", line 531, in start
 self._address = reader.recv()
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 273, in recv
 buf = self._recv_bytes(sentinels=sentinels)
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 430, in _recv_bytes
 buf = self._recv(4, sentinels)
 File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/multiprocessing/connection.py", line 413, in _recv
 raise EOFError
EOFError
msg138154 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年06月11日 09:59
I think it might be related to Issue #6721.
Using a mutex/condition variable after fork (from the child process) is unsafe: it can lead to deadlocks, and on OS-X, it seems like it can lead to segfaults.
Normally, Queue's synchronization primitives are reinitialized after fork, see Queue._after_fork() method.
But here, what happens is the following (well, that's an hypothesis):
Lib/multiprocessing/process.py", line 270 in _bootstrap
:
 _current_process = self
 util._finalizer_registry.clear()
 util._run_after_forkers()
- the old _current_process' reference count drops to zero
- it's deallocated, and since it holds the last reference to a Queue, the queue is finalized
- as a consequence, the Queue._finalize_close() callback (registered through a Finalize object) is called
- Queue._finalize_close() tries to acquire, signal and release the _notempty Condition, but the Condition hasn't been reinitialized yet, because util._run_after_forkers() is called 2 lines below
- segfault
It's probably been triggered by Antoine's patches, but I'm pretty sure this bug has always been there.
I think that moving util._run_after_forkers() up 2 lines should fix the segfaults, but with that change test_number_of_objects fails (I didn't investigate why).
msg138163 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年06月11日 16:12
Less disruptive approach:
 old_process = _current_process
 _current_process = self
 try:
 util._finalizer_registry.clear()
 util._run_after_forkers()
 finally:
 del old_process
This will delay finalization of the old process object until after _run_after_forkers() is executed, without (hopefully) messing with semantics.
msg138168 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年06月11日 16:44
Yes, I also tried this.
This fixed the test_multiprocessing failure, but I think it triggered a failure in test_concurrent_futures (didin't look in more detail).
Would it be possible to try this on the buildbot to see if it fixes the segfaults?
msg138190 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月11日 23:34
> Would it be possible to try this on the buildbot to see if it fixes
> the segfaults?
You can fork cpython, modify the code, and run a custom buildbot on your
clone.
msg138494 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011年06月17日 10:38
New changeset e6e7e42efdc2 by Victor Stinner in branch '3.2':
Issue #12310: finalize the old process after _run_after_forkers()
http://hg.python.org/cpython/rev/e6e7e42efdc2
New changeset a73e5c1f57d7 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12310: finalize the old process after _run_after_forkers()
http://hg.python.org/cpython/rev/a73e5c1f57d7 
msg138495 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月17日 10:39
Let's try on "real" buildbots. If the commit fixes the issue on 3.x, I will port the fix to Python 2.7.
msg138511 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月17日 14:10
test_multiprocessing pass with success on PPC Tiger 3.x (and x86 Tiger 3.x, but the segfaults only occurred on PPC), but this issue is a sporadic issue. I close the issue because I hope that it is closed, but reopen it if you still see segfaults on PPC Tiger 3.x.
History
Date User Action Args
2022年04月11日 14:57:18adminsetgithub: 56519
2011年06月17日 14:10:27vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg138511
2011年06月17日 10:39:25vstinnersetmessages: + msg138495
2011年06月17日 10:38:34python-devsetnosy: + python-dev
messages: + msg138494
2011年06月11日 23:34:21vstinnersetmessages: + msg138190
2011年06月11日 16:44:45neologixsetmessages: + msg138168
2011年06月11日 16:12:38pitrousetmessages: + msg138163
2011年06月11日 09:59:03neologixsetnosy: + neologix
messages: + msg138154
2011年06月10日 21:10:43vstinnersetmessages: + msg138132
2011年06月10日 11:02:09vstinnersetmessages: + msg138074
2011年06月10日 09:42:49vstinnersetmessages: + msg138069
2011年06月10日 09:40:29pitrousetmessages: + msg138067
2011年06月10日 09:26:48vstinnersetmessages: + msg138064
2011年06月10日 09:11:53vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /