This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2017年06月19日 15:20 by vstinner, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 2408 | closed | vstinner, 2017年06月26日 13:46 | |
| PR 2415 | merged | pitrou, 2017年06月26日 21:54 | |
| PR 2527 | merged | pitrou, 2017年07月01日 15:26 | |
| PR 2581 | merged | masamoto, 2017年07月05日 08:01 | |
| Messages (25) | |||
|---|---|---|---|
| msg296348 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月19日 15:20 | |
regrtest hangs while running test_multiprocessing_forkserver in a worker process on the "x86-64 Sierra 3.x" buildbot. Bug seen twice: build 343 and 337. http://buildbot.python.org/all/builders/x86-64%20Sierra%203.x/builds/337/steps/test/logs/stdio Run tests in parallel using 2 child processes ... 0:29:36 load avg: 5.24 [402/406] test_deque passed -- running: test_multiprocessing_forkserver (1346 sec) 0:29:37 load avg: 5.24 [403/406] test_descrtut passed -- running: test_multiprocessing_forkserver (1346 sec) 0:29:37 load avg: 4.90 [404/406] test_weakset passed -- running: test_multiprocessing_forkserver (1347 sec) 0:29:40 load avg: 4.90 [405/406] test_distutils passed -- running: test_multiprocessing_forkserver (1350 sec) command timed out: 1200 seconds without output running ['make', 'buildbottest', 'TESTOPTS=-j2', 'TESTPYTHONOPTS=', 'TESTTIMEOUT=900'], attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=2983.908679 |
|||
| msg296376 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月19日 21:17 | |
Different issue, but same behaviour: test hangs and then killed by buildbot, whereas a single test was still running. See also bpo-30351 which tracks similar bugs, but on Python 2.7. I really hate such bugs :-( http://buildbot.python.org/all/builders/AMD64%20Debian%20root%203.x/builds/928/steps/test/logs/stdio ... 0:15:10 load avg: 0.52 [404/406] test_unicodedata passed -- running: test_pydoc (401 sec) 0:15:10 load avg: 0.52 [405/406] test_exception_hierarchy passed -- running: test_pydoc (401 sec) command timed out: 1200 seconds without output running ['make', 'buildbottest', 'TESTOPTS=-j2', 'TESTPYTHONOPTS=', 'TESTTIMEOUT=900'], attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=2111.415970 |
|||
| msg296419 - (view) | Author: Matt Billenstein (mattbillenstein) | Date: 2017年06月20日 12:49 | |
I've been debugging this and I can repro on El Capitan on a different machine as well -- it's an infrequent hang, I've been running in a loop: mattb@mattb-mbp:~/src/misc/cpython master$ for i in $(seq 1000); do echo "Run: $i -- $(date)"; time./python.exe -m test --timeout 180 test_multiprocessing_forkserver || break; sleep 10; done And in this case after ~30 successful runs: Run: 31 -- Tue Jun 20 05:12:19 PDT 2017 Run tests sequentially 0:00:00 load avg: 4.28 [1/1] test_multiprocessing_forkserver Timeout (0:03:00)! Thread 0x00007fffa1b3a3c0 (most recent call first): File "/Users/mattb/src/misc/cpython/Lib/selectors.py", line 415 in select File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/connection.py", line 916 in wait File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/popen_fork.py", line 45 in wait File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/process.py", line 121 in join File "/Users/mattb/src/misc/cpython/Lib/test/_test_multiprocessing.py", line 431 in test_many_processes File "/Users/mattb/src/misc/cpython/Lib/unittest/case.py", line 605 in run File "/Users/mattb/src/misc/cpython/Lib/unittest/case.py", line 653 in __call__ File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__ File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__ File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__ File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1772 in run File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1896 in _run_suite File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1936 in run_unittest File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 168 in test_runner File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 169 in runtest_inner File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 137 in runtest File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 374 in run_tests_sequential File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 454 in run_tests File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 530 in _main File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 504 in main File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 573 in main File "/Users/mattb/src/misc/cpython/Lib/test/__main__.py", line 2 in <module> File "/Users/mattb/src/misc/cpython/Lib/runpy.py", line 85 in _run_code File "/Users/mattb/src/misc/cpython/Lib/runpy.py", line 193 in _run_module_as_main real 3m0.115s user 0m5.804s sys 0m0.683s I also ran ~600 passes overnight on this test on an Ubuntu 16.04 machine without a hang, so it seems to be OSX specific. |
|||
| msg296420 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 12:53 | |
> I've been debugging this and I can repro on El Capitan on a different machine as well -- it's an infrequent hang, I've been running in a loop: Hum, a timeout of 3 minutes seems short for test_multiprocessing_forkserver. How long does it take *usually* to run this test? At least, we have a first clue: test_many_processes() of Lib/test/_test_multiprocessing.py. |
|||
| msg296460 - (view) | Author: Matt Billenstein (mattbillenstein) | Date: 2017年06月20日 15:39 | |
It consistently takes between ~61 and ~73 seconds with this setup. |
|||
| msg296641 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月22日 16:11 | |
Matt, if you try the following command, it will run the specific test in a loop in verbose mode: $ ./python -m test --timeout=30 -F -m test_many_processes -v test_multiprocessing_forkserver |
|||
| msg296774 - (view) | Author: Matt Billenstein (mattbillenstein) | Date: 2017年06月24日 15:56 | |
Cool -- do you need me to do something more to help debug this? |
|||
| msg296775 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月24日 15:57 | |
It would be nice to know if there is additional output (for example exceptions happening in other threads or processes) when you run that command and manage to trigger a hang. |
|||
| msg296777 - (view) | Author: Matt Billenstein (mattbillenstein) | Date: 2017年06月24日 16:56 | |
I don't see anything odd -- it runs for awhile and then times out once it's deadlocked: 0:03:18 load avg: 3.20 [224] test_multiprocessing_forkserver test_many_processes (test.test_multiprocessing_forkserver.WithProcessesTestProcess) ... ok test_many_processes (test.test_multiprocessing_forkserver.WithThreadsTestProcess) ... skipped 'test not appropriate for threads' ---------------------------------------------------------------------- Ran 2 tests in 0.849s OK (skipped=1) 0:03:18 load avg: 3.11 [225] test_multiprocessing_forkserver test_many_processes (test.test_multiprocessing_forkserver.WithProcessesTestProcess) ... ok test_many_processes (test.test_multiprocessing_forkserver.WithThreadsTestProcess) ... skipped 'test not appropriate for threads' ---------------------------------------------------------------------- Ran 2 tests in 0.865s OK (skipped=1) 0:03:19 load avg: 3.11 [226] test_multiprocessing_forkserver test_many_processes (test.test_multiprocessing_forkserver.WithProcessesTestProcess) ... Timeout (0:00:30)! Thread 0x00007fff790c1000 (most recent call first): File "/Users/mattb/src/misc/cpython/Lib/selectors.py", line 415 in select File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/connection.py", line 916 in wait File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/popen_fork.py", line 45 in wait File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/process.py", line 121 in join File "/Users/mattb/src/misc/cpython/Lib/test/_test_multiprocessing.py", line 431 in test_many_processes File "/Users/mattb/src/misc/cpython/Lib/unittest/case.py", line 605 in run File "/Users/mattb/src/misc/cpython/Lib/unittest/case.py", line 653 in __call__ File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__ File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__ File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__ File "/Users/mattb/src/misc/cpython/Lib/unittest/runner.py", line 176 in run File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1896 in _run_suite File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1936 in run_unittest File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 168 in test_runner File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 169 in runtest_inner File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 137 in runtest File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 374 in run_tests_sequential File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 454 in run_tests File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 530 in _main File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 504 in main File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 573 in main File "/Users/mattb/src/misc/cpython/Lib/test/__main__.py", line 2 in <module> File "/Users/mattb/src/misc/cpython/Lib/runpy.py", line 85 in _run_code File "/Users/mattb/src/misc/cpython/Lib/runpy.py", line 193 in _run_module_as_main |
|||
| msg296778 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月24日 16:59 | |
Thanks for trying :-) Hmm, that's annoying. I don't know if you'd like to give me shell access to the machine (and a CPython checkout I can play with, perhaps)? |
|||
| msg296824 - (view) | Author: Matt Billenstein (mattbillenstein) | Date: 2017年06月25日 14:20 | |
Yes, I'll email you the details. |
|||
| msg296876 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月26日 11:46 | |
Ok, I think I've managed to dig to the core issue. It is actually the same issue as https://bugs.python.org/issue11768 (which was wrongly closed as fixed, apparently :-)). Py_AddPendingCall() calls PyThread_acquire_lock() to try and take the pending calls lock. Unfortunately, PyThread_acquire_lock() is not reentrant in the case where semaphores are not used (e.g. on OS X). We can probably fix that first issue by calling pthread_mutex_trylock() instead of pthread_mutex_lock(). There is a second more fundamental issue, though, which is that PyThread_acquire_lock() calls into functions that are not async-signal-safe (see http://man7.org/linux/man-pages/man7/signal-safety.7.html for a list). So, depending on the particular OS and libc implementation, PyThread_acquire_lock() can fail in mysterious ways (including hang the process) when called from a signal handler. So perhaps the ultimate fix would be to remove the OS-based locking in Py_AddPendingCall and use a busy spinwait... The performance implications may be bad, though. |
|||
| msg296879 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月26日 11:51 | |
Using pthread_mutex_trylock() *and* disabling the CHECK_STATUS_PTHREAD() calls (which use the non-async-signal-safe fprintf()) at least seems to suppress the hangs on Matt's OS X machine (after more than 1000 runs). |
|||
| msg296881 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月26日 11:55 | |
> It is actually the same issue as https://bugs.python.org/issue11768 (which was wrongly closed as fixed, apparently :-)) Well, don't be confused by the issue title. The only made change is to only call Py_AddPendingCall() only once, instead of calling it for each received signal. I don't know if the signal handler is really reentrant or not. Signal handling is hard :-( |
|||
| msg296895 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月26日 13:44 | |
If I understood corretly, the problem is that the Python C signal handler is not reentrant because it calls Py_AddPendingCall() which uses a lock and a list. Before, the signal handler queued a new call to checksignals_witharg() (which just calls PyErr_CheckSignals()) for each received signal. Now, we only queue a single call to checksignals_witharg(). To prevent reentrency issues, can't we hardcoded a call to PyErr_CheckSignals() in ceval.c when SIGNAL_PENDING_CALLS() is called? Py_AddPendingCall() feature is rarely used, it's mostly used for processing signals, no? Calling PyErr_CheckSignals() when no signal was received is cheap, so it shouldn't hurt. |
|||
| msg296897 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月26日 13:50 | |
I think Kristjan uses Py_AddPendingCall, so ideally we would make it more reliable. However, we're right that for the purpose of signal delivery, we can hardcall a call inside Py_MakePendingCalls(). |
|||
| msg296903 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月26日 14:25 | |
It seems https://github.com/python/cpython/pull/2408 alone solves the issue on Matt's machine. It's also a pleasantly simple patch :-) |
|||
| msg296905 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2017年06月26日 14:55 | |
Thanks for the mention, @pitrou. CCP was using Py_AddPendingCall but not from signal handlers, but external threads. Also on windows only. You'll also be happy to know that I have left CCP and the Eve codebase is being kept stable while regularly adding security patches from the 2.7 codebase, as far as I know :) |
|||
| msg297208 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月28日 19:03 | |
For the record, https://github.com/python/cpython/pull/2415 is ready. |
|||
| msg297234 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月28日 21:29 | |
New changeset c08177a1ccad2ed0d50898c2731b518c631aed14 by Antoine Pitrou in branch 'master': bpo-30703: Improve signal delivery (#2415) https://github.com/python/cpython/commit/c08177a1ccad2ed0d50898c2731b518c631aed14 |
|||
| msg297361 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月30日 08:37 | |
I will wait a bit and then backport this to 3.6. |
|||
| msg297405 - (view) | Author: Matt Billenstein (mattbillenstein) | Date: 2017年06月30日 14:14 | |
Verified this by letting the test_many_processes loop overnight (42bc8beadd49">master@42bc8beadd49 )-- 34k passes later and no hangs. Nice work! |
|||
| msg297413 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月30日 14:47 | |
I concur with Matt: nice job Antoine, thanks for making Python more reliable ;-) |
|||
| msg297418 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月30日 15:28 | |
Thanks for running such a lengthy test, Matt :-) |
|||
| msg297484 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年07月01日 17:12 | |
New changeset 3024c0529077f5cff0b32dc84b5923c8fba99a87 by Antoine Pitrou in branch '3.6': [3.6] bpo-30703: Improve signal delivery (GH-2415) (#2527) https://github.com/python/cpython/commit/3024c0529077f5cff0b32dc84b5923c8fba99a87 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:47 | admin | set | github: 74888 |
| 2019年03月29日 16:08:52 | eric.snow | set | nosy:
+ eric.snow |
| 2017年07月05日 08:01:51 | masamoto | set | pull_requests: + pull_request2652 |
| 2017年07月01日 17:14:23 | pitrou | set | status: open -> closed stage: backport needed -> resolved |
| 2017年07月01日 17:12:08 | pitrou | set | messages: + msg297484 |
| 2017年07月01日 15:26:01 | pitrou | set | pull_requests: + pull_request2592 |
| 2017年06月30日 15:28:06 | pitrou | set | messages: + msg297418 |
| 2017年06月30日 14:47:33 | vstinner | set | messages: + msg297413 |
| 2017年06月30日 14:14:49 | mattbillenstein | set | messages: + msg297405 |
| 2017年06月30日 08:37:48 | pitrou | set | resolution: fixed stage: patch review -> backport needed messages: + msg297361 versions: + Python 3.6 |
| 2017年06月28日 21:29:31 | pitrou | set | messages: + msg297234 |
| 2017年06月28日 19:03:57 | pitrou | set | messages: + msg297208 |
| 2017年06月26日 21:54:45 | pitrou | set | pull_requests: + pull_request2473 |
| 2017年06月26日 14:55:13 | kristjan.jonsson | set | messages: + msg296905 |
| 2017年06月26日 14:26:03 | pitrou | set | stage: patch review |
| 2017年06月26日 14:25:53 | pitrou | set | dependencies: - PyThread_acquire_lock can block even when asked not ot, Make CHECK_STATUS_PTHREAD signal-safe |
| 2017年06月26日 14:25:49 | pitrou | set | messages: + msg296903 |
| 2017年06月26日 13:50:50 | pitrou | set | nosy:
+ kristjan.jonsson messages: + msg296897 |
| 2017年06月26日 13:46:02 | vstinner | set | pull_requests: + pull_request2456 |
| 2017年06月26日 13:45:01 | vstinner | set | title: test_multiprocessing_forkserver hangs on the master branch -> Non-reentrant signal handler (test_multiprocessing_forkserver hangs) |
| 2017年06月26日 13:44:40 | vstinner | set | messages: + msg296895 |
| 2017年06月26日 12:52:55 | pitrou | set | type: behavior versions: - Python 2.7, Python 3.5, Python 3.6 |
| 2017年06月26日 12:44:15 | pitrou | set | dependencies: + PyThread_acquire_lock can block even when asked not ot, Make CHECK_STATUS_PTHREAD signal-safe |
| 2017年06月26日 11:55:08 | vstinner | set | messages: + msg296881 |
| 2017年06月26日 11:51:13 | pitrou | set | messages: + msg296879 |
| 2017年06月26日 11:46:23 | pitrou | set | nosy:
+ njs, neologix messages: + msg296876 versions: + Python 2.7, Python 3.5, Python 3.6 |
| 2017年06月25日 14:20:41 | mattbillenstein | set | messages: + msg296824 |
| 2017年06月24日 16:59:17 | pitrou | set | messages: + msg296778 |
| 2017年06月24日 16:56:21 | mattbillenstein | set | messages: + msg296777 |
| 2017年06月24日 15:57:56 | pitrou | set | messages: + msg296775 |
| 2017年06月24日 15:56:55 | mattbillenstein | set | messages: + msg296774 |
| 2017年06月22日 16:11:26 | pitrou | set | nosy:
+ pitrou messages: + msg296641 |
| 2017年06月22日 16:05:14 | pitrou | set | title: regrtest hangs on the master branch -> test_multiprocessing_forkserver hangs on the master branch |
| 2017年06月20日 15:39:59 | mattbillenstein | set | messages: + msg296460 |
| 2017年06月20日 12:53:02 | vstinner | set | messages: + msg296420 |
| 2017年06月20日 12:49:28 | mattbillenstein | set | nosy:
+ mattbillenstein messages: + msg296419 |
| 2017年06月19日 21:17:37 | vstinner | set | title: test_multiprocessing_forkserver hangs on x86-64 Sierra 3.x. buildbot -> regrtest hangs on the master branch |
| 2017年06月19日 21:17:11 | vstinner | set | messages: + msg296376 |
| 2017年06月19日 15:21:04 | vstinner | set | nosy:
+ ronaldoussoren, ned.deily components: + macOS |
| 2017年06月19日 15:20:57 | vstinner | create | |