libjava test suite keeps getting stuck

Zack Weinberg zackw@Stanford.EDU
Thu Apr 19 18:14:00 GMT 2001


On Thu, Apr 19, 2001 at 10:42:17AM -0700, H . J . Lu wrote:
> On Thu, Apr 19, 2001 at 10:28:40AM -0700, Zack Weinberg wrote:
> > On Thu, Apr 19, 2001 at 10:12:08AM -0700, Zack Weinberg wrote:
> > This is likely to be a long-standing bug in the Linux thread library.
> > exit(3) from the main thread does not reliably terminate all other
> > threads. libjava may be assuming that it does. The bug is more or
> > less unfixable, and needs to be worked around.
> 
> Can you provide a small testcase to show the Linux thread library
> bug?

Upon further investigation, there may not be a thread library bug. I
thought I knew what was going on, but I am no longer sure.

Initially, it appeared that the thread manager could get all the way
through pthread_handle_exit without actually killing all the other
threads. I am still not certain that it can't.

There's a libtool bug and at least one libjava bug confusing the
issue. The libtool bug: it links libgcc_s and libc ahead of
libpthread, which obviously will cause problems, because libpthread
won't get to override libc symbols. I think this is the transitive
dependencies issue that was being discussed the other week. My
suggested fix for this is to stop using libtool. :-)

The immediate libjava bug: if I take the test case I'm currently
looking at (Divide_1, from the libjava test suite) and get rid of the
explicit -lgcc_s -lc -lgcc_s, then the program prints

-2147483648
-2147483648
0
0
0
Exception in thread "main" java.lang.ArithmeticException: / by zero
 at 0x401529bc: _Jv_ThrowSignal (../.libs/libgcj.so.2)
 at 0x40152a52: _Jv_ThrowSignal (../.libs/libgcj.so.2)
 at 0x08049191: Divide_1::probe() (/home/zack/src/gcc_vanilla/libjava/testsuite/libjava.lang/Divide_1.java:52)
 at 0x08049813: Divide_1::main(JArray<java::lang::String*>*) (/home/zack/src/gcc_vanilla/libjava/testsuite/libjava.lang/Divide_1.java:103)
 at 0x4016902b: gnu.gcj.runtime.FirstThread.run() (../.libs/libgcj.so.2)
 at 0x40173fb1: java.lang.Thread.run_(java.lang.Object) (../.libs/libgcj.so.2)
 at 0x4028e125: _Jv_ThreadSetPriority(_Jv_Thread_t, int) (../.libs/libgcj.so.2)
 at 0x4046262c: GC_start_routine (../../boehm-gc/.libs/libgcjgc.so.1)
 at 0x4047c065: pthread_detach (/lib/libpthread.so.0)
 at 0x40591a4a: __clone (/lib/libc.so.6)

That exception should have been caught inside Divide_1::probe. All
the operations which divide by zero (which this program does,
deliberately) are wrapped in try/catch clauses.

After it prints its traceback, the main thread goes into an infinite
loop. [Note that libjava's "main" thread is NOT the POSIX-threads
main thread. libjava's main thread is dead at this point.] The
manager is fine and would notice if the main thread terminated, but
it never does. In the infinite loop, the main thread is receiving
hundreds of SIGSEGVs and allocating memory. Here's where we are,
according to gdb:

#0 0x404c0b57 in extract_cie_info (cie=0x4035a15c, context=0xbfee92ac, 
 fs=0xbfee91ec) at ../../../gcc_vanilla/gcc/unwind-dw2.c:265
#1 0x404c1b8c in uw_frame_state_for (context=0xbfee92ac, fs=0xbfee91ec)
 at ../../../gcc_vanilla/gcc/unwind-dw2.c:966
#2 0x404c21b6 in _Unwind_RaiseException (exc=0x810dca0)
 at ../../../gcc_vanilla/gcc/unwind-dw2.c:1076
#3 0x4015bc99 in _Jv_Throw (value=0x8062fe0)
 at ../../../../gcc_vanilla/libjava/exception.cc:104
#4 0x401529c4 in _Jv_ThrowSignal ()
 at ../../../../gcc_vanilla/libjava/prims.cc:112
#5 0x401529fd in _Z10catch_segvi ()
 at ../../../../gcc_vanilla/libjava/prims.cc:121
#6 0x404c1c4c in uw_frame_state_for (context=0xbfee95f4, fs=0xbfee9534)
 at ../../../gcc_vanilla/gcc/unwind-dw2.c:966
#7 0x404c21b6 in _Unwind_RaiseException (exc=0x810dce0)
 at ../../../gcc_vanilla/gcc/unwind-dw2.c:1076
#8 0x4015bc99 in _Jv_Throw (value=0x8062fe0)
 at ../../../../gcc_vanilla/libjava/exception.cc:104
#9 0x401529c4 in _Jv_ThrowSignal ()
 at ../../../../gcc_vanilla/libjava/prims.cc:112
#10 0x401529fd in _Z10catch_segvi ()

Yep, that's infinite recursion. I can't single step ("ptrace: no
such process", probably more gdb thread bugs), and gdb's confused
about where we actually are inside the library, too; it keeps thinking
the PC's pointing at an abort() when it manifestly isn't. Anyway, the
trouble seems to be that the exception handler tables have gotten
clobbered. We try to convert a SIGSEGV into a NullPointerException,
we fault again, we catch the fault, we try to throw
NullPointerException again, ... I cut the stack limit down to 64k,
and after a short time the runaway thread got a genuinely fatal
SIGSEGV (huh, I thought this is what SIGSTKFLT is for).

Immediate band-aid: catch_segv() should somehow flag itself as
currently executing, and re-raise the signal if it is called
recursively. That will at least prevent the infinite recursion. I
think I know how to do this.
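
A minimal sketch of such a recursion guard, in C. Every name in it
(handler_active, faults_converted, guard_enter, guard_leave,
guarded_handler, install_guarded_handler) is hypothetical and not
taken from libjava. The flag is process-wide for brevity; a threaded
runtime would presumably need one per thread. The real catch_segv()
throws instead of returning, so the flag would have to be cleared
wherever the exception is finally caught, which this sketch does not
model.

```c
#include <signal.h>

/* Hypothetical flag: nonzero while the handler is running. */
static volatile sig_atomic_t handler_active = 0;

/* Hypothetical counter standing in for "throw the Java exception". */
static volatile sig_atomic_t faults_converted = 0;

/* Returns 1 on first entry, 0 if the handler is already active. */
int guard_enter(void)
{
    if (handler_active)
        return 0;
    handler_active = 1;
    return 1;
}

/* Marks the handler as no longer running. */
void guard_leave(void)
{
    handler_active = 0;
}

void guarded_handler(int signo)
{
    if (!guard_enter()) {
        /* Recursive entry: restore the default action and re-raise,
           so the process dies instead of recursing without bound. */
        signal(signo, SIG_DFL);
        raise(signo);
        return;
    }
    faults_converted++;   /* placeholder for the real conversion */
    guard_leave();
}

/* Installs the guarded handler; returns 0 on success, -1 on failure. */
int install_guarded_handler(int signo)
{
    return signal(signo, guarded_handler) == SIG_ERR ? -1 : 0;
}
```

Calling install_guarded_handler(SIGSEGV) followed by raise(SIGSEGV)
exercises the normal path. A fault that arrives while handler_active
is still set takes the default action, which terminates the process.
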
zw

