This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2017年06月21日 22:20 by vstinner, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| notify_bug.py | vstinner, 2017年06月21日 23:17 | |||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 2320 | merged | vstinner, 2017年06月22日 00:01 | |
| PR 2334 | merged | serhiy.storchaka, 2017年06月22日 19:57 | |
| PR 2351 | merged | serhiy.storchaka, 2017年06月23日 10:40 | |
| PR 2352 | merged | serhiy.storchaka, 2017年06月23日 10:41 | |
| PR 2353 | merged | serhiy.storchaka, 2017年06月23日 10:45 | |
| Messages (18) | |||
|---|---|---|---|
| msg296591 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月21日 22:20 | |
Example where the test hangs after at the 48th run: ---- [haypo@freebsd ~/prog/python/2.7]$ ./python -m test -F -m test.test_threading.ConditionTests.test_notify -v test_threading (...) 0:00:10 [ 47] test_threading test_notify (test.test_threading.ConditionTests) ... ok (...) 0:00:10 [ 48] test_threading test_notify (test.test_threading.ConditionTests) ... ---- It looks like a timing issue / race condition since the likehood of the bug seems to depend on the system load. Traceback using my watchdog: https://github.com/python/cpython/pull/2317 --- 0:00:57 [285] test_threading test_notify (test.test_threading.ConditionTests) ... *** STACKTRACE - START *** # ThreadID: 34391913984 File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 37, in task f() File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 360, in f cond.wait() File: "/usr/home/haypo/prog/python/2.7/Lib/threading.py", line 340, in wait waiter.acquire() # ThreadID: 34392086528 File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 37, in task f() File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 360, in f cond.wait() File: "/usr/home/haypo/prog/python/2.7/Lib/threading.py", line 340, in wait waiter.acquire() # ThreadID: 34384994304 File: "/usr/home/haypo/prog/python/2.7/Lib/runpy.py", line 174, in _run_module_as_main "__main__", fname, loader, pkg_name) File: "/usr/home/haypo/prog/python/2.7/Lib/runpy.py", line 72, in _run_code exec code in run_globals File: "/usr/home/haypo/prog/python/2.7/Lib/test/__main__.py", line 3, in <module> regrtest.main_in_temp_cwd() File: "/usr/home/haypo/prog/python/2.7/Lib/test/regrtest.py", line 2030, in main_in_temp_cwd main() File: "/usr/home/haypo/prog/python/2.7/Lib/test/regrtest.py", line 812, in main result = local_runtest() File: "/usr/home/haypo/prog/python/2.7/Lib/test/regrtest.py", line 798, in local_runtest testdir=testdir) File: "/usr/home/haypo/prog/python/2.7/Lib/test/regrtest.py", line 1007, in runtest return runtest_inner(test, verbose, quiet, huntrleaks, pgo, testdir) File: "/usr/home/haypo/prog/python/2.7/Lib/test/regrtest.py", line 1196, in runtest_inner indirect_test() File: "/usr/home/haypo/prog/python/2.7/Lib/test/test_threading.py", line 929, in test_main ThreadingExceptionTests, File: "/usr/home/haypo/prog/python/2.7/Lib/test/support/__init__.py", line 1571, in run_unittest _run_suite(suite) File: "/usr/home/haypo/prog/python/2.7/Lib/test/support/__init__.py", line 1530, in _run_suite result = runner.run(suite) File: "/usr/home/haypo/prog/python/2.7/Lib/unittest/runner.py", line 151, in run test(result) File: "/usr/home/haypo/prog/python/2.7/Lib/unittest/suite.py", line 70, in __call__ return self.run(*args, **kwds) File: "/usr/home/haypo/prog/python/2.7/Lib/unittest/suite.py", line 108, in run test(result) File: "/usr/home/haypo/prog/python/2.7/Lib/unittest/suite.py", line 70, in __call__ return self.run(*args, **kwds) File: "/usr/home/haypo/prog/python/2.7/Lib/unittest/suite.py", line 108, in run test(result) File: "/usr/home/haypo/prog/python/2.7/Lib/unittest/case.py", line 393, in __call__ return self.run(*args, **kwds) File: "/usr/home/haypo/prog/python/2.7/Lib/unittest/case.py", line 329, in run testMethod() File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 401, in test_notify self._check_notify(cond) File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 384, in _check_notify _wait() File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 16, in _wait time.sleep(0.01) # ThreadID: 34384996864 File: "/usr/home/haypo/prog/python/2.7/Lib/threading.py", line 774, in __bootstrap self.__bootstrap_inner() File: "/usr/home/haypo/prog/python/2.7/Lib/threading.py", line 801, in __bootstrap_inner self.run() File: "/usr/home/haypo/prog/python/2.7/Lib/test/regrtest.py", line 1971, in run self.dump_threads() File: "/usr/home/haypo/prog/python/2.7/Lib/test/regrtest.py", line 1997, in dump_threads self.dump_thread(stack) # ThreadID: 34392083968 File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 37, in task f() File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 360, in f cond.wait() File: "/usr/home/haypo/prog/python/2.7/Lib/threading.py", line 340, in wait waiter.acquire() # ThreadID: 34392080128 File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 37, in task f() File: "/usr/home/haypo/prog/python/2.7/Lib/test/lock_tests.py", line 360, in f cond.wait() File: "/usr/home/haypo/prog/python/2.7/Lib/threading.py", line 340, in wait waiter.acquire() *** STACKTRACE - END *** --- |
|||
| msg296593 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月21日 22:38 | |
test.test_threading.ConditionTests.test_notify() tests the threading._Condition. threading._Condition uses internally a threading._RLock. threading._RLock uses an internal lock created with thread.allocate_lock(). thread.allocate_lock() calls internally PyThread_allocate_lock(). On my Linux (Fedora 25) and FreeBSD (FreeBSD 11 VM) machines, PyThread_allocate_lock() calls sem_init(): POSIX pthread semaphores. In Python 3, RLock was implemented in C for speed (the new Python 3.0 io module was slow because of RLock performance): see bpo-3001. I know at least one race condition in Python 2.7 RLock: bpo-13697, RLock bug with signals. |
|||
| msg296594 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月21日 22:40 | |
Oh, I reproduced the bug on Linux as well by running "./python -m test -F -m test.test_threading.ConditionTests.test_notify -v test_threading" 6 times in different terminals: the test hangs in two terminal after 2508 and 2262 runs. |
|||
| msg296598 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月21日 23:17 | |
notify_bug.py: script close totest.test_threading.ConditionTests.test_notify() but different: * _wait() sleeps 1 nanosecond rather than 10 ms * log many messages into stdout (file descriptor 1) using os.write() It's quite easy to reproduce the bug with this script. |
|||
| msg296601 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月21日 23:29 | |
See also bpo-8799 "Hang in lib/test/test_threading.py" and the commit 020af2a2bc4708215360a3793b5a1790e15d05dd which added two _wait() calls to ConditionTests._check_notify(). |
|||
| msg296604 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月21日 23:53 | |
The bug occurs when notify(n) is called while less than n threads are waiting in cond.wait(): when some threads didn't reach cond.wait() yet. The synchronization code is very fragile and depends on time. The issue was worked again in the master banch by adding more _wait() calls... To really fix the race condition, reliable synchronization primitives should be used, like threading.Event. |
|||
| msg296606 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月22日 00:01 | |
It's getting late here, so I chose to first backport the commit 020af2a2bc4708215360a3793b5a1790e15d05dd to Python 2.7, to reduce the likehood of this bug on 2.7 buildbots. |
|||
| msg296621 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2017年06月22日 11:23 | |
My favorite topic :) You could use threading.Barrier() which is designed to synchronize N threads doing this kind of lock-step processing. The problem is that the Barrier() is implemented using Condition variables, so for unit-testing, Condition Variables, this presents a conundrum... |
|||
| msg296622 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2017年06月22日 11:39 | |
Yes, the whole reason I wrote the tests like this is to avoid depending on Python synchronization primitives inside the tests for Python synchronization primitives. |
|||
| msg296623 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月22日 11:48 | |
I still didn't see a code, but perhaps we could use a spin-lock. Every child thread increases the global counter, and the main thread waits until the counter equal to the number of child threads (sleeping for short time in a loop) before calling notify(). |
|||
| msg296653 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月22日 20:05 | |
Actually this approach already is used in this test. PR 2334 uses it to ensure all workers into cond.wait(). |
|||
| msg296695 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月23日 10:36 | |
New changeset 32cb968a2edd482891c33b6f2ebae10f1d305424 by Serhiy Storchaka in branch 'master': bpo-30727: Fix a race condition in test_threading. (#2334) https://github.com/python/cpython/commit/32cb968a2edd482891c33b6f2ebae10f1d305424 |
|||
| msg296698 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月23日 10:52 | |
New changeset c1d5345679eaa5fccd719b1c130140eecc8ba4c8 by Serhiy Storchaka in branch '3.6': [3.6] bpo-30727: Fix a race condition in test_threading. (GH-2334) (#2351) https://github.com/python/cpython/commit/c1d5345679eaa5fccd719b1c130140eecc8ba4c8 |
|||
| msg296699 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月23日 10:52 | |
New changeset e2aec8e691d8acb08373889d9af48a5b1d03b689 by Serhiy Storchaka in branch '3.5': [3.5] bpo-30727: Fix a race condition in test_threading. (GH-2334) (#2352) https://github.com/python/cpython/commit/e2aec8e691d8acb08373889d9af48a5b1d03b689 |
|||
| msg296700 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月23日 11:13 | |
New changeset 7709b4d57b433ef027a2e7e63b4cab3fc9ad910d by Serhiy Storchaka in branch '2.7': [2.7] bpo-30727: Fix a race condition in test_threading. (GH-2334). (#2353) https://github.com/python/cpython/commit/7709b4d57b433ef027a2e7e63b4cab3fc9ad910d |
|||
| msg296701 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月23日 11:17 | |
In Python 3 the test was failed if decrease the sleep time in _wait() to 0.001. Patched test no longer fail for any small sleep intervals. |
|||
| msg296706 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月23日 12:53 | |
> In Python 3 the test was failed if decrease the sleep time in _wait() to 0.001. Patched test no longer fail for any small sleep intervals. I used sleep(1e-9), 1 nanosecond, for my tests :-) I confirm that I'm unable to reproduce the bug on 2.7 and master branches anymore. Well done Serhiy! |
|||
| msg296968 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月26日 23:04 | |
Hum, my commit was not reported in the issue: commit da6d305b6fcd49ba1224b1fd2131d7648a5be6b9 Author: Victor Stinner <victor.stinner@gmail.com> Date: Thu Jun 22 02:09:40 2017 +0200 bpo-8799: Reduce timing sensitivity of condition test by explicitly (#2320) delaying the main thread so that it doesn't race ahead of the workers. (cherry picked from commit 020af2a2bc4708215360a3793b5a1790e15d05dd) |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:48 | admin | set | github: 74912 |
| 2017年06月26日 23:04:03 | vstinner | set | messages: + msg296968 |
| 2017年06月23日 12:53:29 | vstinner | set | messages: + msg296706 |
| 2017年06月23日 11:17:09 | serhiy.storchaka | set | status: open -> closed versions: + Python 3.5, Python 3.6, Python 3.7 messages: + msg296701 resolution: fixed stage: resolved |
| 2017年06月23日 11:13:41 | serhiy.storchaka | set | messages: + msg296700 |
| 2017年06月23日 10:52:24 | serhiy.storchaka | set | messages: + msg296699 |
| 2017年06月23日 10:52:08 | serhiy.storchaka | set | messages: + msg296698 |
| 2017年06月23日 10:45:53 | serhiy.storchaka | set | pull_requests: + pull_request2399 |
| 2017年06月23日 10:41:39 | serhiy.storchaka | set | pull_requests: + pull_request2398 |
| 2017年06月23日 10:40:05 | serhiy.storchaka | set | pull_requests: + pull_request2397 |
| 2017年06月23日 10:36:39 | serhiy.storchaka | set | messages: + msg296695 |
| 2017年06月22日 20:05:48 | serhiy.storchaka | set | messages: + msg296653 |
| 2017年06月22日 19:57:13 | serhiy.storchaka | set | pull_requests: + pull_request2378 |
| 2017年06月22日 11:48:07 | serhiy.storchaka | set | messages: + msg296623 |
| 2017年06月22日 11:39:11 | pitrou | set | messages: + msg296622 |
| 2017年06月22日 11:23:27 | kristjan.jonsson | set | nosy:
+ kristjan.jonsson messages: + msg296621 |
| 2017年06月22日 00:01:22 | vstinner | set | messages:
+ msg296606 pull_requests: + pull_request2372 |
| 2017年06月21日 23:53:38 | vstinner | set | messages: + msg296604 |
| 2017年06月21日 23:29:17 | vstinner | set | messages: + msg296601 |
| 2017年06月21日 23:17:10 | vstinner | set | files:
+ notify_bug.py messages: + msg296598 |
| 2017年06月21日 22:46:56 | vstinner | set | title: [2.7] test_threading.ConditionTests.test_notify() hangs randomly on FreeBSD on Python 2.7 -> [2.7] test_threading.ConditionTests.test_notify() hangs randomly on Python 2.7 |
| 2017年06月21日 22:40:26 | vstinner | set | messages: + msg296594 |
| 2017年06月21日 22:38:56 | vstinner | set | nosy:
+ pitrou, serhiy.storchaka |
| 2017年06月21日 22:38:43 | vstinner | set | messages: + msg296593 |
| 2017年06月21日 22:20:05 | vstinner | create | |