homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Azure Pipeline 3.8 CI: multiple tests hung and timed out on macOS 10.13
Type: Stage: resolved
Components: macOS, Tests Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: aeros, davin, ned.deily, pitrou, ronaldoussoren, steve.dower, vstinner
Priority: normal Keywords:

Created on 2019年06月12日 11:57 by vstinner, last changed 2022年04月11日 14:59 by admin. This issue is now closed.

Messages (12)
msg345325 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月12日 11:57
I backported a change to 3.8:
https://github.com/python/cpython/pull/14000
The macOS job of Azure Pipelines failed badly:
https://dev.azure.com/Python/cpython/_build/results?buildId=45071&view=results
* test_importlib/test_locks.py: test_deadlock() => TIMEOUT
* test_multiprocessing_spawn: test_thread_safety() => TIMEOUT
* test_concurrent_futures: test_pending_calls_race() => TIMEOUT
* test_functools: test_threaded() => TIMEOUT
* test_multiprocessing_forkserver: test_thread_safety() => TIMEOUT
* test_threading: test_is_alive_after_fork() => TIMEOUT
0:20:21 load avg: 4.55 [155/423/1] test_importlib crashed (Exit code 1) -- running: test_concurrent_futures (16 min 12 sec), test_functools (13 min 30 sec), test_multiprocessing_spawn (18 min 51 sec)
Timeout (0:20:00)!
Thread 0x0000700004627000 (most recent call first):
Thread 0x00007fff96f1a380 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/lock_tests.py", line 49 in __init__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/test_importlib/test_locks.py", line 84 in run_deadlock_avoidance_test
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/test_importlib/test_locks.py", line 89 in test_deadlock
 ...
0:21:30 load avg: 4.76 [160/423/2] test_multiprocessing_spawn crashed (Exit code 1) -- running: test_concurrent_futures (17 min 21 sec), test_functools (14 min 39 sec), test_threading (35 sec 923 ms)
Timeout (0:20:00)!
Thread 0x0000700000ff1000 (most recent call first):
Thread 0x00007fff96f1a380 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 847 in start
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/support/__init__.py", line 2299 in start_threads
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/contextlib.py", line 113 in __enter__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/_test_multiprocessing.py", line 4138 in test_thread_safety
 ...
0:24:09 load avg: 4.11 [207/423/3] test_concurrent_futures crashed (Exit code 1) -- running: test_functools (17 min 18 sec), test_timeout (34 sec 14 ms), test_threading (3 min 14 sec)
Timeout (0:20:00)!
Thread 0x0000700006644000 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/concurrent/futures/thread.py", line 78 in _worker
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 865 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 923 in _bootstrap_inner
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 885 in _bootstrap
Thread 0x0000700006141000 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/concurrent/futures/thread.py", line 78 in _worker
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 865 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 923 in _bootstrap_inner
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 885 in _bootstrap
Thread 0x0000700005c3e000 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/concurrent/futures/thread.py", line 78 in _worker
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 865 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 923 in _bootstrap_inner
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 885 in _bootstrap
Thread 0x000070000573b000 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/concurrent/futures/thread.py", line 78 in _worker
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 865 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 923 in _bootstrap_inner
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 885 in _bootstrap
Thread 0x0000700005238000 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/concurrent/futures/thread.py", line 78 in _worker
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 865 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 923 in _bootstrap_inner
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 885 in _bootstrap
Thread 0x00007fff96f1a380 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/libregrtest/setup.py", line 92 in _test_audit_hook
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/concurrent/futures/_base.py", line 146 in __init__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/concurrent/futures/_base.py", line 288 in wait
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/test_concurrent_futures.py", line 565 in test_pending_calls_race
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/case.py", line 628 in _callTestMethod
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/case.py", line 671 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/case.py", line 731 in __call__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/suite.py", line 122 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/suite.py", line 84 in __call__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/suite.py", line 122 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/suite.py", line 84 in __call__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/suite.py", line 122 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/suite.py", line 84 in __call__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/unittest/runner.py", line 176 in run
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/support/__init__.py", line 1984 in _run_suite
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/support/__init__.py", line 2080 in run_unittest
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/test_concurrent_futures.py", line 1300 in test_main
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/support/__init__.py", line 2212 in decorator
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/libregrtest/runtest.py", line 228 in _runtest_inner2
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/libregrtest/runtest.py", line 264 in _runtest_inner
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/libregrtest/runtest.py", line 135 in _runtest
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/libregrtest/runtest.py", line 187 in runtest
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/libregrtest/runtest_mp.py", line 66 in run_tests_worker
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/libregrtest/main.py", line 611 in _main
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/libregrtest/main.py", line 588 in main
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/libregrtest/main.py", line 663 in main
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/regrtest.py", line 46 in _main
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/regrtest.py", line 50 in <module>
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/runpy.py", line 85 in _run_code
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/runpy.py", line 192 in _run_module_as_main
0:26:51 load avg: 5.14 [259/423/4] test_functools crashed (Exit code 1) -- running: test_io (1 min 11 sec), test_threading (5 min 56 sec)
Timeout (0:20:00)!
Thread 0x000070000666f000 (most recent call first):
Thread 0x00007fff96f1a380 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 847 in start
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/support/__init__.py", line 2299 in start_threads
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/contextlib.py", line 113 in __enter__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/test_functools.py", line 2464 in test_threaded
...
0:47:26 load avg: 1.71 [423/423/6] test_multiprocessing_forkserver crashed (Exit code 1)
Timeout (0:20:00)!
Thread 0x0000700009471000 (most recent call first):
Thread 0x00007fff96f1a380 (most recent call first):
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 246 in __enter__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 555 in wait
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/threading.py", line 852 in start
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/support/__init__.py", line 2299 in start_threads
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/contextlib.py", line 113 in __enter__
 File "/Users/vsts/agent/2.152.1/work/1/s/Lib/test/_test_multiprocessing.py", line 4138 in test_thread_safety
...
6 tests failed:
 test_concurrent_futures test_functools test_importlib
 test_multiprocessing_forkserver test_multiprocessing_spawn
 test_threading
--
pythoninfo:
2019年06月12日T02:45:41.9759180Z Py_DEBUG: Yes (sys.gettotalrefcount() present)
2019年06月12日T02:45:41.9784500Z datetime.datetime.now: 2019年06月12日 02:45:40.875495
2019年06月12日T02:45:41.9806170Z platform.architecture: 64bit
2019年06月12日T02:45:41.9797810Z os.environ[MACOSX_DEPLOYMENT_TARGET]: 10.13
2019年06月12日T02:45:41.9806990Z platform.platform: macOS-10.13.6-x86_64-i386-64bit
2019年06月12日T02:45:41.9801150Z os.login: _spotlight
2019年06月12日T02:45:41.9817220Z socket.hostname: Mac-483.local
2019年06月12日T02:45:41.9835490Z sys.version: 3.8.0b1+ (remotes/pull/14000/merge:ce71235d6, Jun 12 2019, 02:45:09) [Clang 10.0.0 (clang-10001145.5)]
msg345326 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月12日 11:57
os.uname: posix.uname_result(sysname='Darwin', nodename='Mac-483.local', release='17.7.0', version='Darwin Kernel Version 17.7.0: Wed Apr 24 21:17:24 PDT 2019; root:xnu-45707145~1/RELEASE_X86_64', machine='x86_64')
msg345458 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019年06月13日 06:21
FWIW, I tried reproducing with 3.8 at 996e52623af3854552d41751e0c2522bc0a7e84f (the PR 14000 checkin) on both a current 10.14.5 Mojave system and on a 10.13.6 High Sierra system (the version used in the failed Azure test) and did not see any unusual failures. I don't recall seeing a timeout like in test_concurrent_futures, at least recently!, but, if it is due to some race condition, there might be a more significant difference, like number of CPUs available, that might precipitate the failure. I'll leave it up to you, Victor, on whether or how long to leave this issue open but I don't see that there is anything practical to do until it can be reproduced.
msg345936 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019年06月17日 23:00
I'm still seeing this, maybe 1 in 20 builds, so it's semi-random. A new deadlock, maybe?
msg351143 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年09月04日 16:35
> I'm still seeing this, maybe 1 in 20 builds, so it's semi-random. A new deadlock, maybe?
It seems like only the jobs on Azure are killed by timeout. The jobs on macOS buildbots look fine. Maybe macOS on Azure is running slower and we should just increase the timeout?
The bug still occurs:
https://github.com/python/cpython/pull/15651
https://dev.azure.com/Python/cpython/_build/results?buildId=49786&view=results
0:49:27 load avg: 1.41 [419/419/6] test_threading crashed (Exit code 1)
6 tests failed:
 test_concurrent_futures test_functools test_importlib
 test_multiprocessing_forkserver test_multiprocessing_spawn
 test_threading
The whole job was killed after 57 minutes.
msg351146 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019年09月04日 16:59
> Maybe macOS on Azure is running slower and we should just increase the timeout?
(Aside, why don't the macOS buildbots have a tag saying that? Took me ages to find them...)
I doubt it's running 6-7x slower. More likely something is causing one of the workers to crash at a point where the lock remains held instead of being released (I saw this at work the other week in a slightly different context, but same symptoms).
Could os._exit() at the wrong time cause it?
It also looks like Azure is running tests with 4 processes, but the buildbot (at least the one I'm looking at) is only using 2. So perhaps there are more conflicts from that?
msg351147 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019年09月04日 17:20
Yeah, I agree that increasing the timeout shouldn't be the answer here. I still have never seen failure modes like this when running my own tests. The idea about CPUs is one worth pursuing although I usually run with -j3. Also I wonder how much memory the VM is configured with. Any way we can find out number of cpus and memory easily?
msg351194 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2019年09月05日 10:56
It looks like the Azure macOS tests timed out again in the recently opened PR-15688. Specifically, for test_multiprocessing_spawn and test_functools (both of which also timed out in PR-15651, which Victor mentioned earlier):
0:26:41 load avg: 2.89 [418/419/1] test_multiprocessing_spawn crashed (Exit code 1) -- running: test_functools (14 min 38 sec)
Timeout (0:20:00)!
0:32:03 load avg: 3.17 [419/419/2] test_functools crashed (Exit code 1)
Timeout (0:20:00)!
Build logs: https://dev.azure.com/Python/cpython/_build/results?buildId=49868&view=logs&j=18d1a34d-6940-5fc1-f55b-405e2fba32b1
As far as I can tell, PR-15688 should have had no direct influence on test_multiprocessing_spawn or test_functools.
> Maybe macOS on Azure is running slower and we should just increase the timeout?
> Yeah, I agree that increasing the timeout shouldn't be the answer here. 
Since this seems to be affecting multiple PRs, would it be appropriate to attempt to increase the timeout duration as a temporary fix and open an issue for further investigation on the cause of the intermittent slowdown on those tests?
msg351528 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019年09月09日 16:43
I suspect this code is a repro - it certainly locks up the host process reliably enough.
Perhaps if we unblock multiprocessing in the context of a crashed worker then it'll show what the underlying errors are?
import os
from multiprocessing import Pool
def f(x):
 os._exit(0)
 return "success"
if __name__ == '__main__':
 with Pool(1) as p:
 print(p.map(f, [1]))
msg351588 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年09月10日 07:52
Steve: Would you mind to open a separated issue for the multiprocessing bug? multiprocessing is supposed to handle this case.
msg351595 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019年09月10日 08:58
Filed as issue38084
I recommend not investigating this issue any further until that one is resolved.
msg353665 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年10月01日 10:48
It seems like macOS job pass again on Azure Pipelines. I close the issue.
History
Date User Action Args
2022年04月11日 14:59:16adminsetgithub: 81426
2019年10月01日 10:48:44vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg353665

stage: resolved
2019年09月10日 08:58:52steve.dowersetmessages: + msg351595
2019年09月10日 07:52:00vstinnersetmessages: + msg351588
2019年09月09日 16:43:51steve.dowersetnosy: + pitrou, davin
messages: + msg351528
2019年09月05日 10:56:35aerossetnosy: + aeros
messages: + msg351194
2019年09月04日 17:20:30ned.deilysetmessages: + msg351147
2019年09月04日 16:59:18steve.dowersetmessages: + msg351146
2019年09月04日 16:35:42vstinnersetmessages: + msg351143
2019年06月17日 23:00:25steve.dowersetnosy: + steve.dower
messages: + msg345936
2019年06月13日 06:21:29ned.deilysetmessages: + msg345458
title: Azure Pipeline: sick macOS job on Python 3.8? -> Azure Pipeline 3.8 CI: multiple tests hung and timed out on macOS 10.13
2019年06月12日 11:57:58vstinnersetmessages: + msg345326
2019年06月12日 11:57:02vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /