homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_3_join_in_forked_from_thread() of test_threading hangs 1 hour on "x86 Ubuntu Shared 3.x"
Type: Stage: resolved
Components: Tests Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gps, neologix, python-dev, vstinner
Priority: normal Keywords: patch

Created on 2011年04月18日 20:17 by vstinner, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test_threading_fork.diff neologix, 2011年06月26日 20:55 review
debug_stuck.diff neologix, 2011年12月10日 12:03 review
threading_reinit_lock.diff neologix, 2011年12月18日 16:05 review
Messages (19)
msg133990 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年04月18日 20:17
test_3_join_in_forked_from_thread() of test_threading failed on "x86 Ubuntu Shared 3.x" buildbot:
-----------------------------------
[201/354] test_threading
[41179 refs]
[40407 refs]
[40407 refs]
[40407 refs]
Timeout (1:00:00)!
Thread 0x404218c0:
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 466 in _eintr_retry_call
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1486 in _try_wait
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1528 in wait
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 455 in _run_and_join
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 518 in test_3_join_in_forked_from_thread
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 387 in _executeTestPart
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 442 in run
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 494 in __call__
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1078 in run
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1166 in _run_suite
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1192 in run_unittest
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 728 in test_main
 File "./Lib/test/regrtest.py", line 1041 in runtest_inner
 File "./Lib/test/regrtest.py", line 835 in runtest
 File "./Lib/test/regrtest.py", line 659 in main
 File "./Lib/test/regrtest.py", line 1619 in <module>
make: *** [buildbottest] Error 1
program finished with exit code 2
elapsedTime=4426.776675
http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/3577/steps/test/logs/stdio
-----------------------------------
Code of the test:
----------------------------------
 def _run_and_join(self, script):
 script = """if 1:
 import sys, os, time, threading
 # a thread, which waits for the main program to terminate
 def joiningfunc(mainthread):
 mainthread.join()
 print('end of thread')
 # stdout is fully buffered because not a tty, we have to flush
 # before exit.
 sys.stdout.flush()
 \n""" + script
 p = subprocess.Popen([sys.executable, "-c", script], stdout=subprocess.PIPE)
 rc = p.wait() <~~~ HANG HERE ~~~~
 data = p.stdout.read().decode().replace('\r', '')
 p.stdout.close()
 self.assertEqual(data, "end of main\nend of thread\n")
 self.assertFalse(rc == 2, "interpreter was blocked")
 self.assertTrue(rc == 0, "Unexpected error")
 @unittest.skipUnless(hasattr(os, 'fork'), "needs os.fork()")
 def test_3_join_in_forked_from_thread(self):
 # Like the test above, but fork() was called from a worker thread
 # In the forked process, the main Thread object must be marked as stopped.
 # Skip platforms with known problems forking from a worker thread.
 # See http://bugs.python.org/issue3863.
 if sys.platform in ('freebsd4', 'freebsd5', 'freebsd6', 'netbsd5',
 'os2emx'):
 raise unittest.SkipTest('due to known OS bugs on ' + sys.platform)
 script = """if 1:
 main_thread = threading.current_thread()
 def worker():
 childpid = os.fork()
 if childpid != 0:
 os.waitpid(childpid, 0)
 sys.exit(0)
 t = threading.Thread(target=joiningfunc,
 args=(main_thread,))
 print('end of main')
 t.start()
 t.join() # Should not block: main_thread is already stopped
 w = threading.Thread(target=worker)
 w.start()
 """
 self._run_and_join(script)
----------------------------------
msg139076 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年06月25日 14:51
test_2_join_in_forked_process fails on FreeBSD 6.4 buildbot.
http://www.python.org/dev/buildbot/all/builders/x86 FreeBSD 6.4 3.x/builds/1606/steps/test/logs/stdio
"""
======================================================================
FAIL: test_2_join_in_forked_process (test.test_threading.ThreadJoinOnShutdown)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_threading.py", line 464, in test_2_join_in_forked_process
 self._run_and_join(script)
 File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_threading.py", line 436, in _run_and_join
 self.assertEqual(data, "end of main\nend of thread\n")
AssertionError: '' != 'end of main\nend of thread\n'
+ end of main
+ end of thread
"""
I think it's the same problem as issue #12316: in the child process, even calling pthread_create can segfault/abort on FreeBSD6 (async-safe blahblah...).
Tests creating a thread from a fork()ed process should be skipped on FreeBSD6.
Patch attached, along with some refactoring to use the skipIf idiom.
As for test_3_join_in_forked_from_thread, well, it could be more or less the same problem. We're really doing something prohibited by POSIX, so things might break in unexpected ways. For example, calling pthread_condition_destroy from the child process can deadlock (see http://bugs.python.org/issue6721#msg136047).
Victor: to debug this kind of problem, it would be great if faulthandler could also dump tracebacks of children processes. Do you mind if I create a new issue?
msg139121 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月25日 21:50
> Victor: to debug this kind of problem, it would be great
> if faulthandler could also dump tracebacks of children processes.
> Do you mind if I create a new issue?
Please open a new issue.
msg139129 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月25日 23:23
+ @unittest.skipIf(sys.platform in ('freebsd4', 'freebsd5', 'freebsd6',
+ 'netbsd5', 'os2emx'), "due to known OS bug")
This skip gives very few information, and it is duplicated for each function. I would prefer a constant of the "broken OSes" with your following comment attached to the constant:
+ # Between fork() and exec(), only async-safe functions are allowed (issues
+ # #12316 and #11870), and fork() from a worker thread is known to trigger
+ # problems with some operating systems: skip problematic tests on platforms
+ # known to behave badly.
Or split the test case into two testcases: one using fork and skipped on broken platforms, one not using fork?
---
> As for test_3_join_in_forked_from_thread, well, it could be more
> or less the same problem. We're really doing something prohibited
> by POSIX, so things might break in unexpected ways.
If the creation of a thread after a fork is reliable on some systems, we should not deny the creation of new threads after a fork. You can replace "creation of new threads" by any other non async-safe function in my previous sentence. Therefore I agree that the good answer to this issue is to skip the test on "broken systems" (or should we call them "POSIX compliant systems?" :-)).
msg139131 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年06月25日 23:40
> This skip gives very few information, and it is duplicated for each 
> function. I would prefer a constant of the "broken OSes" with your 
> following comment attached to the constant:
Ok, I'll try to write something along those lines.
> If the creation of a thread after a fork is reliable on some systems,
> we should not deny the creation of new threads after a fork.
Well, the problem is that it is not reliable on any platform, but happens to work "most of the time" on some platforms ;-)
msg139219 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年06月26日 20:55
Patch attached.
msg139337 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年06月27日 23:14
Your patch is linux3 compliant, go ahead!
msg139574 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011年07月01日 12:53
New changeset 0ed5e6ff10f8 by Victor Stinner in branch '3.2':
Issue #11870: Skip test_threading.test_2_join_in_forked_process() on platforms
http://hg.python.org/cpython/rev/0ed5e6ff10f8
New changeset f43dee86fffd by Victor Stinner in branch 'default':
(merge 3.2) Issue #11870: Skip test_threading.test_2_join_in_forked_process()
http://hg.python.org/cpython/rev/f43dee86fffd 
msg139576 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011年07月01日 13:05
New changeset ff36b8cadfd6 by Victor Stinner in branch '2.7':
Issue #11870: Skip test_threading.test_2_join_in_forked_process() on platforms
http://hg.python.org/cpython/rev/ff36b8cadfd6 
msg139583 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年07月01日 14:06
The initial problem was test_3_join_in_forked_from_thread() and the hangs does still happen:
[324/356] test_threading
Timeout (1:00:00)!
Thread 0x404248c0:
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1498 in _communicate_with_poll
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1423 in _communicate
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 836 in communicate
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 32 in _assert_python
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 50 in assert_python_ok
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 434 in _run_and_join
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 493 in test_3_join_in_forked_from_thread
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 407 in _executeTestPart
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 462 in run
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 514 in __call__
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/runner.py", line 168 in run
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1259 in _run_suite
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1285 in run_unittest
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 774 in test_main
 File "./Lib/test/regrtest.py", line 1070 in runtest_inner
 File "./Lib/test/regrtest.py", line 861 in runtest
 File "./Lib/test/regrtest.py", line 669 in main
 File "./Lib/test/regrtest.py", line 1648 in <module>
http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/4081/steps/test/logs/stdio
(neologix's patch doesn't change anything for x86 Ubuntu Shared 3.x buildbot, which is a Linux, not a FreeBSD)
I don't know why it only hangs on this Linux buildbot. It's maybe an old Linux kernel, an old GNU libc version, or something like that?
msg139602 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年07月01日 20:48
> The initial problem was test_3_join_in_forked_from_thread() and the hangs does still happen:
>
Yes, the patch was there to fix test_2_join_in_forked_from_thread.
> [324/356] test_threading
> Timeout (1:00:00)!
> Thread 0x404248c0:
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1498 in _communicate_with_poll
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1423 in _communicate
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 836 in communicate
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 32 in _assert_python
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 50 in assert_python_ok
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 434 in _run_and_join
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 493 in test_3_join_in_forked_from_thread
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 407 in _executeTestPart
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 462 in run
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 514 in __call__
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/runner.py", line 168 in run
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1259 in _run_suite
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1285 in run_unittest
> File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 774 in test_main
> File "./Lib/test/regrtest.py", line 1070 in runtest_inner
> File "./Lib/test/regrtest.py", line 861 in runtest
> File "./Lib/test/regrtest.py", line 669 in main
> File "./Lib/test/regrtest.py", line 1648 in <module>
>
> http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/4081/steps/test/logs/stdio
>
This means that the subprocess hangs, but without a backtrace of the
child process (issue #12413), it's hard to analyse it further.
I've had a look at the code, and couldn't find anything obviously
wrong; Gregory's patches to sanitize threading's lock should have
fixed this. I also tried running this test in a loop for 48 hours but
couldn't reproduce it.
One possible explanation (but it's just a wild guess) is that with a
certain kernel/libc configuration, the lock deallocation code can
deadlock (I've seen pthread_cond_destroy() block, this could maybe
happen with sem_destroy()).
So I suggest to try to come up with a solution to #12413, which should
help analyzing this - and similar - issues.
msg148698 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年12月01日 00:43
> Gregory's patches to sanitize threading's lock should have fixed this
The subprocess hang still occurs something, it just happened:
http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/4898/steps/test/logs/stdio
[110/363] test_threading
Timeout (1:00:00)!
Thread 0x404888c0:
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1513 in _communicate_with_poll
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1438 in _communicate
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 850 in communicate
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 32 in _assert_python
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 50 in assert_python_ok
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 441 in _run_and_join
 File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 497 in test_3_join_in_forked_from_thread
...
msg148993 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年12月07日 21:35
I removed my two previous message (msg148991 and msg148992), there were unrelated to this issue: the test hangs in debug mode in the IO module because of a deadleak in IO related to the fork...
msg148996 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年12月07日 22:04
To debug this, we should probably make use of faulthandler (but not
dump_tracebacks_later, since it creates a new thread). The way to go
could be to make the parent process send a fatal signal to the child
process if the latter takes too long to complete.
msg149148 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年12月10日 12:02
Here's a patch to help nail this down.
msg149773 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年12月18日 16:05
Victor, could you try the patch attached?
msg149784 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011年12月18日 17:47
New changeset 775319cebaa3 by Charles-François Natali in branch '2.7':
Issue #11870: threading: Properly reinitialize threads internal locks and
http://hg.python.org/cpython/rev/775319cebaa3
New changeset de962ec0faaa by Charles-François Natali in branch '3.2':
Issue #11870: threading: Properly reinitialize threads internal locks and
http://hg.python.org/cpython/rev/de962ec0faaa
New changeset cec0d77d01c4 by Charles-François Natali in branch 'default':
Issue #11870: threading: Properly reinitialize threads internal locks and
http://hg.python.org/cpython/rev/cec0d77d01c4 
msg149788 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011年12月18日 18:12
Should be fixed now.
msg180158 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013年01月17日 22:37
New changeset cd54b48946ca by Stefan Krah in branch '3.3':
Issue #11870: Skip test_3_join_in_forked_from_thread() on HP-UX.
http://hg.python.org/cpython/rev/cd54b48946ca 
History
Date User Action Args
2022年04月11日 14:57:16adminsetgithub: 56079
2013年01月17日 22:37:26python-devsetmessages: + msg180158
2011年12月18日 18:12:47neologixsetstatus: open -> closed
resolution: fixed
messages: + msg149788

stage: resolved
2011年12月18日 17:47:32python-devsetmessages: + msg149784
2011年12月18日 16:05:27neologixsetfiles: + threading_reinit_lock.diff

messages: + msg149773
2011年12月10日 12:03:00neologixsetfiles: + debug_stuck.diff

messages: + msg149148
2011年12月07日 22:04:19neologixsetmessages: + msg148996
2011年12月07日 21:35:34vstinnersetmessages: + msg148993
2011年12月07日 21:34:25vstinnersetmessages: - msg148992
2011年12月07日 21:34:23vstinnersetmessages: - msg148991
2011年12月07日 21:26:20vstinnersetmessages: + msg148992
2011年12月07日 21:19:07vstinnersetmessages: + msg148991
2011年12月01日 00:43:31vstinnersetmessages: + msg148698
2011年07月01日 20:48:47neologixsetmessages: + msg139602
2011年07月01日 14:06:49vstinnersetmessages: + msg139583
2011年07月01日 13:05:01python-devsetmessages: + msg139576
2011年07月01日 12:53:42python-devsetnosy: + python-dev
messages: + msg139574
2011年06月27日 23:14:10vstinnersetmessages: + msg139337
2011年06月26日 20:55:31neologixsetfiles: - test_threading_fork.diff
2011年06月26日 20:55:08neologixsetfiles: + test_threading_fork.diff

messages: + msg139219
2011年06月25日 23:40:27neologixsetmessages: + msg139131
2011年06月25日 23:23:05vstinnersetmessages: + msg139129
2011年06月25日 21:50:24vstinnersetmessages: + msg139121
2011年06月25日 14:51:09neologixsetfiles: + test_threading_fork.diff

nosy: + neologix
messages: + msg139076

keywords: + patch
2011年04月19日 00:43:03pitrousetnosy: + gps
2011年04月18日 20:17:59vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /