homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: threading._shutdown() race condition: test_threading test_threads_join_2() fails randomly
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.snow, koobs, kristjan.jonsson, pablogsal, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2019年03月22日 23:18 by vstinner, last changed 2022年04月11日 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
py_finalize.patch vstinner, 2019年06月10日 23:10
Pull Requests
URL Status Linked Edit
PR 13889 closed vstinner, 2019年06月07日 10:53
PR 13946 closed vstinner, 2019年06月10日 20:45
PR 13948 merged vstinner, 2019年06月10日 22:39
PR 13949 closed vstinner, 2019年06月10日 23:04
PR 14033 closed miss-islington, 2019年06月12日 23:30
PR 14034 closed miss-islington, 2019年06月12日 23:30
PR 14047 merged vstinner, 2019年06月13日 08:58
PR 14050 merged vstinner, 2019年06月13日 10:11
PR 14054 merged miss-islington, 2019年06月13日 11:45
Messages (16)
msg338622 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年03月22日 23:18
https://buildbot.python.org/all/#/builders/168/builds/801
0:23:17 load avg: 5.00 [334/420/1] test_threading crashed (Exit code -6) -- running: test_tools (8 min 42 sec), test_multiprocessing_spawn (5 min 41 sec), test_zipfile (30 sec 787 ms)
Fatal Python error: Py_EndInterpreter: not the last thread
Current thread 0x0000000800acd000 (most recent call first):
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/support/__init__.py", line 2778 in run_in_subinterp
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/test_threading.py", line 917 in test_threads_join_2
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/case.py", line 642 in run
 ...
The test crashed once, but then passed when run again in verbose mode ("Re-running test 'test_threading' in verbose mode").
msg338623 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年03月22日 23:20
As far as I remember, test_threads_join_2() was already unstable.
I created this issue to try to track if it's a regression or not.
If it's a regression, I would suggest to have a look at Eric Snow's recent commits.
At this point, I simply have no idea if the test fails exactly one in the lifetime of the buildbot worker, or if it started to fail frequently on this FreeBSD buildbot.
msg344611 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月04日 17:36
New failure:
AMD64 FreeBSD CURRENT Shared 3.x:
https://buildbot.python.org/all/#/builders/168/builds/1154
...
test_thread_leak (test.test_threading.PyRLockTests) ... ok
test_timeout (test.test_threading.PyRLockTests) ... ok
test_try_acquire (test.test_threading.PyRLockTests) ... ok
test_try_acquire_contended (test.test_threading.PyRLockTests) ... ok
test_weakref_deleted (test.test_threading.PyRLockTests) ... ok
test_weakref_exists (test.test_threading.PyRLockTests) ... ok
test_with (test.test_threading.PyRLockTests) ... ok
test_acquire (test.test_threading.SemaphoreTests) ... ok
test_acquire_contended (test.test_threading.SemaphoreTests) ... ok
test_acquire_destroy (test.test_threading.SemaphoreTests) ... ok
test_acquire_timeout (test.test_threading.SemaphoreTests) ... ok
test_constructor (test.test_threading.SemaphoreTests) ... ok
test_default_value (test.test_threading.SemaphoreTests) ... ok
test_release_unacquired (test.test_threading.SemaphoreTests) ... ok
test_try_acquire (test.test_threading.SemaphoreTests) ... ok
test_try_acquire_contended (test.test_threading.SemaphoreTests) ... ok
test_with (test.test_threading.SemaphoreTests) ... ok
test_daemon_threads_fatal_error (test.test_threading.SubinterpThreadingTests) ... ok
test_threads_join (test.test_threading.SubinterpThreadingTests) ... ok
Fatal Python error: Py_EndInterpreter: not the last thread
Current thread 0x0000000800ac3000 (most recent call first):
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/support/__init__.py", line 2794 in run_in_subinterp
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/test_threading.py", line 923 in test_threads_join_2
test_threads_join_2 (test.test_threading.SubinterpThreadingTests) ... *** Signal 6
Stop.
make: stopped in /usr/home/buildbot/python/3.x.koobs-freebsd-current/build
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/case.py", line 628 in _callTestMethod
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/case.py", line 671 in run
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/case.py", line 731 in __call__
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 122 in run
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 84 in __call__
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 122 in run
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 84 in __call__
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 122 in run
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/suite.py", line 84 in __call__
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/unittest/runner.py", line 176 in run
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/support/__init__.py", line 1984 in _run_suite
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/support/__init__.py", line 2080 in run_unittest
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest.py", line 203 in _test_module
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest.py", line 228 in _runtest_inner2
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest.py", line 264 in _runtest_inner
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest.py", line 148 in _runtest
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/runtest.py", line 187 in runtest
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/main.py", line 291 in rerun_failed_tests
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/main.py", line 646 in _main
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/main.py", line 588 in main
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/libregrtest/main.py", line 663 in main
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/__main__.py", line 2 in <module>
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/runpy.py", line 85 in _run_code
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/runpy.py", line 192 in _run_module_as_main
program finished with exit code 1
elapsedTime=3523.834729
msg344613 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月04日 17:39
In the same build:
https://buildbot.python.org/all/#builders/168/builds/1154
0:28:57 load avg: 12.93 [208/423/1] test_threading crashed (Exit code 1) -- running: test_lib2to3 (7 min 9 sec), test_multiprocessing_spawn (1 min 36 sec)
Timeout (0:25:00)!
Thread 0x0000000800ac3000 (most recent call first):
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 1015 in _wait_for_tstate_lock
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/threading.py", line 999 in join
 File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/test_threading.py", line 1287 in test_interrupt_main_subthread
 (...)
msg344922 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月07日 11:00
I wrote PR 13889: with this change, I can easily reproduce the crash on Linux:
$ ./python -m test test_threading -m test_threads_join_2 -F
Run tests sequentially
0:00:00 load avg: 0.51 [ 1] test_threading
Fatal Python error: Py_EndInterpreter: not the last thread
Current thread 0x00007f84ad74d740 (most recent call first):
 File "/home/vstinner/prog/python/master/Lib/test/support/__init__.py", line 2794 in run_in_subinterp
 File "/home/vstinner/prog/python/master/Lib/test/test_threading.py", line 923 in test_threads_join_2
...
Py_EndInterpreter() calls wait_for_thread_shutdown() to wait until threading._shutdown() completes.
When the assertion fails, threading.enumerate() only contains the main thread: the spawned thread is already gone. But the assertion fails, which means that the Python thread state of the thread (which looks to be completed) is still around.
This unit test comes from bpo-18808:
commit 7b4769937fb612d576b6829c3b834f3dd31752f1
Author: Antoine Pitrou <solipsis@pitrou.net>
Date: Sat Sep 7 23:38:37 2013 +0200
 Issue #18808: Thread.join() now waits for the underlying thread state to be destroyed before returning.
 This prevents unpredictable aborts in Py_EndInterpreter() when some non-daemon threads are still running.
msg345143 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月10日 20:59
Oh. Using PR 13889, I'm able to reproduce the bug up to Python 3.4. Example at commit commit e76cbc781044ee01b059f3702c580e66266b84c5 (tag: v3.4.10):
$ wget 'https://github.com/python/cpython/pull/13889.patch'
$ git apply 13889.patch
$ ./python -m test -F -m test_threads_join_2 test_threading
[ 1] test_threading
[ 2] test_threading
(...)
[ 10] test_threading
[ 11] test_threading
Fatal Python error: Py_EndInterpreter: not the last thread
Current thread 0x00007f418b3280c0 (most recent call first):
 File "/home/vstinner/prog/python/3.4/Lib/test/support/__init__.py", line 2304 in run_in_subinterp
 File "/home/vstinner/prog/python/3.4/Lib/test/test_threading.py", line 877 in test_threads_join_2
 (...)
Aborted (core dumped)
msg345146 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月10日 21:57
threading._shutdown() uses threading.enumerate() which iterations on threading._active.
threading.Thread registers itself into threading._active using its _bootstrap_inner() method. It unregisters itself when _bootstrap_inner() completes, whereas its is_alive() method still returns true: since the underlying native thread still runs and the Python thread state still exists.
_thread._set_sentinel() creates a lock and registers a tstate->on_delete callback to release this lock. It's called by threading.Thread._set_tstate_lock() to set threading.Thread._tstate_lock. This lock is used by threading.Thread.join() to wait until the thread completes.
_thread.start_new_thread() calls the C function t_bootstrap() which ends with:
 tstate->interp->num_threads--;
 PyThreadState_Clear(tstate);
 PyThreadState_DeleteCurrent();
 PyThread_exit_thread();
_PyThreadState_DeleteCurrent() calls tstate->on_delete() which releases threading.Thread._tstate_lock lock.
In test_threads_join_2() test, PyThreadState_Clear() blocks on clearing thread variables: the Sleeper destructor of the Sleeper instance sleeps.
The race condition is that:
* threading._shutdown() rely on threading._alive
* Py_EndInterpreter() rely on the interpreter linked list of Python thread states: interp->tstate_head.
msg345149 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月10日 22:52
Other references to test_threads_join_2() failures:
* bpo-36989: fails once AIX
* bpo-27791: 2 failures in 2016 in "AMD64 FreeBSD 10 Shared 3.x" buildbots (I closed the issue with: "Sadly, such bug is hard to reproduce. I didn't look at buildbots recently, and I don't have access to FreeBSD, so I just close the issue :-/")
* bpo-28084: once failure on "AMD64 FreeBSD 9.x 3.x" buildbot (closed as a duplicate of bpo-27791)
See also bpo-18808: "Thread.join returns before PyThreadState is destroyed" (issue which added the test).
msg345150 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月10日 23:10
test_threading: test_threads_join_2() was added by commit 7b47699 in 2013, but the test failed randomly since it was added. It's just that failures were ignored until I created https://bugs.python.org/issue36402 last March.
In fact, when the test failed randomly on buildbot (with tests run in parallel), it was fine since test_threading was re-run alone and then the test passed. The buildbot build was seen overall as a success. Previous issues were closed (see my previous comment).
The test shows the bug using subinterpreters (Py_EndInterpreter), but the bug also exists in Py_Finalize() which hash the same race condition (it also calls threading._shutdown()). It's just that Py_EndInterpreter() is stricter, it contains this assertion:
 if (tstate != interp->tstate_head || tstate->next != NULL)
 Py_FatalError("Py_EndInterpreter: not the last thread");
Attached py_finalize.patch adds the same assertion to Py_Finalize.
I added test_threading.test_finalization_shutdown() to PR 13948. If you run test_finalization_shutdown() with py_finalize.patch, Py_Finalize() fails with a similar assertion error.
But py_finalize.patch is incompatible with the principle of daemon threads and so cannot be commited.
msg345429 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月12日 23:30
New changeset 468e5fec8a2f534f1685d59da3ca4fad425c38dd by Victor Stinner in branch 'master':
bpo-36402: Fix threading._shutdown() race condition (GH-13948)
https://github.com/python/cpython/commit/468e5fec8a2f534f1685d59da3ca4fad425c38dd
msg345492 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月13日 10:06
New changeset 6f75c873752a16a7ad8f35855b1e29f59d048e84 by Victor Stinner in branch 'master':
tbpo-36402: Fix threading.Thread._stop() (GH-14047)
https://github.com/python/cpython/commit/6f75c873752a16a7ad8f35855b1e29f59d048e84
msg345502 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月13日 11:44
New changeset e40a97a721d46307dfdc2b0322028ccded6eb571 by Victor Stinner in branch '3.8':
[3.8] bpo-36402: Fix threading._shutdown() race condition (GH-13948) (GH-14050)
https://github.com/python/cpython/commit/e40a97a721d46307dfdc2b0322028ccded6eb571
msg345508 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月13日 12:22
New changeset 6eb2878e42152e9c45d7ee5e6f889532d753e67c by Victor Stinner (Miss Islington (bot)) in branch '3.7':
bpo-36402: Fix threading._shutdown() race condition (GH-13948) (GH-14050) (GH-14054)
https://github.com/python/cpython/commit/6eb2878e42152e9c45d7ee5e6f889532d753e67c
msg345509 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月13日 12:27
The bpo-18808 "Thread.join returns before PyThreadState is destroyed" was not fixed in Python 2.7: threading.Thread has no _tstate_lock attribute. I'm not comfortable to backport bpo-18808 "feature" or "bugfix" to Python 2.7, not to backport this change.
Python 2.7 works as it is, and it's going to reach its end of life at the end of the year. I guess that people learnt how to work around Python 2.7 limitation like bpo-18808.
msg345510 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年06月13日 12:28
Ok, the root issue (threading._shutdown() race condition) has been fixed in Python 3.7, 3.8 and master branches. I close the issue. Thanks for the reviews!
msg349281 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2019年08月09日 11:08
Please note that this fix appears to be the cause of #37788 
History
Date User Action Args
2022年04月11日 14:59:12adminsetgithub: 80583
2020年03月27日 00:26:31vstinnerlinkissue36989 superseder
2019年08月09日 11:08:52kristjan.jonssonsetnosy: + kristjan.jonsson
messages: + msg349281
2019年06月13日 12:28:09vstinnersetstatus: open -> closed
messages: + msg345510

components: + Library (Lib), - Tests
resolution: fixed
stage: patch review -> resolved
2019年06月13日 12:27:13vstinnersetmessages: + msg345509
2019年06月13日 12:22:25vstinnersetmessages: + msg345508
2019年06月13日 11:45:00miss-islingtonsetpull_requests: + pull_request13916
2019年06月13日 11:44:29vstinnersetmessages: + msg345502
2019年06月13日 10:11:05vstinnersetpull_requests: + pull_request13912
2019年06月13日 10:06:30vstinnersetmessages: + msg345492
2019年06月13日 08:58:12vstinnersetpull_requests: + pull_request13909
2019年06月12日 23:31:59vstinnersetversions: + Python 3.7, Python 3.9
2019年06月12日 23:30:37miss-islingtonsetpull_requests: + pull_request13898
2019年06月12日 23:30:28miss-islingtonsetpull_requests: + pull_request13897
2019年06月12日 23:30:20vstinnersetmessages: + msg345429
2019年06月11日 01:06:28koobssetnosy: + koobs
2019年06月10日 23:12:29vstinnersettitle: test_threading: test_threads_join_2() failed with "Fatal Python error: Py_EndInterpreter: not the last thread" on AMD64 FreeBSD CURRENT Shared 3.x -> threading._shutdown() race condition: test_threading test_threads_join_2() fails randomly
2019年06月10日 23:10:50vstinnersetfiles: + py_finalize.patch

messages: + msg345150
2019年06月10日 23:04:33vstinnersetpull_requests: + pull_request13817
2019年06月10日 22:52:15vstinnersetmessages: + msg345149
2019年06月10日 22:39:36vstinnersetpull_requests: + pull_request13816
2019年06月10日 21:57:27vstinnersetmessages: + msg345146
2019年06月10日 20:59:51vstinnersetmessages: + msg345143
2019年06月10日 20:45:58vstinnersetpull_requests: + pull_request13813
2019年06月07日 11:00:36vstinnersetnosy: + pitrou, eric.snow
2019年06月07日 11:00:19vstinnersetmessages: + msg344922
2019年06月07日 10:53:05vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request13763
2019年06月04日 17:39:53vstinnersetmessages: + msg344613
2019年06月04日 17:36:29vstinnersetmessages: + msg344611
2019年03月22日 23:20:28vstinnersetnosy: + pablogsal
messages: + msg338623
2019年03月22日 23:18:24vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /