homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Deadlock in test_concurrent_futures
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rosslagerwall Nosy List: neologix, pitrou, python-dev, rosslagerwall, vstinner
Priority: normal Keywords: patch

Created on 2011年06月19日 17:25 by vstinner, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue.patch rosslagerwall, 2012年01月03日 06:12 patch review
itest.py rosslagerwall, 2012年01月03日 06:12 test program
Messages (9)
msg138648 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月19日 17:25
[271/356/1] test_concurrent_futures
Traceback (most recent call last):
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 268, in _feed
 send(obj)
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 229, in send
 self._send_bytes(memoryview(buf))
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 423, in _send_bytes
 self._send(struct.pack("=i", len(buf)))
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 392, in _send
 n = write(self._handle, buf)
OSError: [Errno 32] Broken pipe
Timeout (1:00:00)!
Thread 0x00000954:
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 252 in _feed
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap
Thread 0x00000953:
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 146 in poll
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 166 in wait
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/process.py", line 150 in join
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 208 in shutdown_worker
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 264 in _queue_management_worker
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap
Thread 0x00000001:
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 851 in join
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 395 in shutdown
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 67 in tearDown
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 407 in _executeTestPart
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 463 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 514 in __call__
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1166 in run
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1254 in _run_suite
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1280 in run_unittest
 File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 628 in test_main
 File "./Lib/test/regrtest.py", line 1043 in runtest_inner
 File "./Lib/test/regrtest.py", line 841 in runtest
 File "./Lib/test/regrtest.py", line 668 in main
 File "./Lib/test/regrtest.py", line 1618 in <module>
*** Error code 1
make: Fatal error: Command failed for target `buildbottest'
program finished with exit code 1
See commit e6e7e42efdc2 of the issue #12310.
msg138765 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月20日 23:27
Message on a stackoverflow thread:
"I have suffered from the same problem, even if connecting on localhost in python 2.7.1. After a day of debugging i found the cause and a workaround:
Cause: BaseProxy class has thread local storage which caches the connection, which is reused for future connections causing "broken pipe" errors even on creating a new Manager
Workaround: Delete the cached connection before reconnecting
if address in BaseProxy._address_to_local:
 del BaseProxy._address_to_local[self.address][0].connection"
http://stackoverflow.com/questions/3649458/broken-pipe-when-using-python-multiprocessing-managers-basemanager-syncmanager/5884967#5884967
---
See also maybe the (closed) issue #11663: multiprocessing doesn't detect killed processes
msg138766 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月20日 23:37
Connection._send_bytes() has a comment about broken pipes:
 def _send_bytes(self, buf):
 # For wire compatibility with 3.2 and lower
 n = len(buf)
 self._send(struct.pack("=i", len(buf)))
 # The condition is necessary to avoid "broken pipe" errors
 # when sending a 0-length buffer if the other end closed the pipe.
 if n > 0:
 self._send(buf)
But the OSError(32, "Broken pipe") occurs on sending the buffer size (a chunk of 4 bytes: self._send(struct.pack("=i", len(buf)))), not on sending the buffer content.
See also maybe the (closed) issue #9205: Parent process hanging in multiprocessing if children terminate unexpectedly
msg138767 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月21日 00:42
Ah, submit a new task after the manager shutdown fails with OSError(32, 'Broken pipe'). Example:
---------------
from multiprocessing.managers import BaseManager
class MathsClass(object):
 def foo(self):
 return 42
class MyManager(BaseManager):
 pass
MyManager.register('Maths', MathsClass)
if __name__ == '__main__':
 manager = MyManager()
 manager.start()
 maths = manager.Maths()
 maths.foo()
 manager.shutdown()
 try:
 maths.foo()
 finally:
 manager.shutdown()
---------------
This example doesn't hang, but this issue is about concurrent.futures, not multiprocessing.
msg138768 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月21日 01:05
Oh, I think that I found a deadlock (or something like that):
----------------------------
import concurrent.futures
import faulthandler
import os
import signal
import time
def work(n):
 time.sleep(0.1)
def main():
 faulthandler.register(signal.SIGUSR1)
 print("pid: %s" % os.getpid())
 with concurrent.futures.ProcessPoolExecutor() as executor:
 for number, prime in executor.map(work, range(100)):
 print("shutdown")
 executor.shutdown()
 print("shutdown--")
if __name__ == '__main__':
 main()
----------------------------
Trace:
----------------------------
Thread 0x00007fbfc83bd700:
 File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait
 File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 252 in _feed
 File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run
 File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner
 File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap
Thread 0x00007fbfc8bbe700:
 File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 101 in put
 File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 268 in _queue_management_worker
 File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run
 File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner
 File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap
Current thread 0x00007fbfcc2e3700:
 File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait
 File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 851 in join
 File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 395 in shutdown
 File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/_base.py", line 570 in __exit__
 File "y.py", line 17 in main
 File "y.py", line 20 in <module>
----------------------------
There are two child processes, but both are zombies (displayed as "<defunct>" by ps). Send SIGUSR1 signal to the frozen process to display the traceback (thanks to faulthandler).
msg150497 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2012年01月03日 06:12
Retrieving the result of a future after the executor has been shut down can cause a hang.
It seems like this regression was introduced in a76257a99636. This regression exists only for ProcessPoolExecutor.
The problem is that even if there are pending work items, the processes are still signaled to shut down leaving the pending work items permanently unfinished. The patch simply removes the call to shut down the processes when there are pending work items.
Attached is a patch.
msg150498 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年01月03日 14:33
Well I was sure I had added this code for a reason, but the tests seem to run without...
Just a comment: the test isn't ProcessPoolExecutor-specific, so it should really be in the generic tests.
msg150851 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年01月08日 06:43
New changeset 26389e9efa9c by Ross Lagerwall in branch '3.2':
Issue #12364: Fix a hang in concurrent.futures.ProcessPoolExecutor.
http://hg.python.org/cpython/rev/26389e9efa9c
New changeset 25f879011102 by Ross Lagerwall in branch 'default':
Merge with 3.2 for #12364.
http://hg.python.org/cpython/rev/25f879011102 
msg150853 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2012年01月08日 09:35
Thanks!
History
Date User Action Args
2022年04月11日 14:57:18adminsetgithub: 56573
2012年01月08日 09:35:49rosslagerwallsetstatus: open -> closed
type: behavior
messages: + msg150853

assignee: rosslagerwall
resolution: fixed
stage: resolved
2012年01月08日 06:43:54python-devsetnosy: + python-dev
messages: + msg150851
2012年01月03日 14:33:51pitrousetmessages: + msg150498
2012年01月03日 06:12:54rosslagerwallsetfiles: + itest.py
2012年01月03日 06:12:19rosslagerwallsetfiles: + issue.patch

nosy: + rosslagerwall
messages: + msg150497

keywords: + patch
2011年07月05日 12:22:26vstinnersettitle: Timeout (1 hour) in test_concurrent_futures.tearDown() on sparc solaris10 gcc 3.x -> Deadlock in test_concurrent_futures
2011年06月21日 01:05:34vstinnersetmessages: + msg138768
2011年06月21日 00:42:34vstinnersetmessages: + msg138767
2011年06月20日 23:37:03vstinnersetmessages: + msg138766
2011年06月20日 23:27:39vstinnersetmessages: + msg138765
2011年06月19日 17:25:15vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /