GC failure w/ THREAD_LOCAL_ALLOC ?
Michael Smith
msmith@spinnakernet.com
Wed Mar 20 10:19:00 GMT 2002
Bryce McKinlay wrote:
> While testing thread local allocation on PowerPC, I ran into a problem
> which is also reproducible on x86. The attached stress-test-case
> GCTest.java will lock up with ~100% reproducibility with
> THREAD_LOCAL_ALLOC enabled. It runs fine without THREAD_LOCAL_ALLOC.
>
> What I am seeing in the debugger is most threads waiting in
> GC_suspend_handler, but one thread segfaulting in GC_mark_read.
> libjava's segv handler gets called and the collector is re-entered
> during the stack trace, causing the freeze.
I actually ran into this problem in my application 2 months ago (using
gcc version 3.1 20010911 (experimental)), and reported it to Hans. I
couldn't pare my application down to such a simple test case, so
tracking it down was somewhat difficult.
From the stack trace I provided back in January, Hans initially
responded with:
Hans Boehm wrote:
> I'm not terribly worried about the SIGSEGV getting turned into a
> deadlock. Such things seem to be largely unavoidable.
>
> I would like to understand where the SIGSEGV is coming from. Typically
> a failure here is caused by a bogus object descriptor. This may
> happen because something was overwritten by client code, or because
> there's an undiscovered bug in the GC, or in the gcj generated
> descriptor.
With some further pointers, it turns out there _was_ a bogus object
descriptor. At my last contact with Hans, he suspected the problem was
related to THREAD_LOCAL_ALLOC, but was unable to find any likely
problems when reviewing the code. Here's an excerpt:
Hans Boehm wrote:
> I spent a bit of time:
>
> - Staring at the thread-specific-storage implementation, and
>
> - adding some tests for thread-local allocation to gctest.
>
> The new tests failed to make the problem reproducible here.
>
> I cleaned up a few things. The only thing substantive I found was
> that specific.c could fail if one of the thread stacks ended up at the
> extreme high end of the address space, i.e. if 0xfffff000 is the
> address of a valid stack page. Are you configuring your kernel in
> some nonstandard way, e.g. to maximize virtual address space?
> Otherwise this seems unlikely to account for the problem, since that's
> normally kernel address space on Linux/X86, as I recall. (I vaguely
> recall that Mandrake Linux might do something strange in this area.)
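To make the address in Hans's note concrete, here is a minimal sketch of why 0xfffff000 is an awkward value. It assumes a 32-bit address space and 4 KiB pages; the names page_number and page_end are hypothetical and are not taken from specific.c. It only shows two arithmetic facts about that page: its page number is all ones in 20 bits, and the address one past its end wraps around to 0. Either property could trip code that reserves an all-ones or zero sentinel, or that computes a one-past-the-end address. The thread does not say which, if either, applied to specific.c.

```c
#include <stdint.h>

/* Assumptions for illustration only: 32-bit addresses, 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint32_t)1 << PAGE_SHIFT)

/* Hypothetical helper: number of the page containing addr. */
static uint32_t page_number(uint32_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* Hypothetical helper: address one past the end of the page containing
 * addr. Unsigned arithmetic wraps modulo 2^32, so for the last page of
 * the address space the result is 0. */
static uint32_t page_end(uint32_t addr)
{
    return (addr & ~(PAGE_SIZE - 1)) + PAGE_SIZE;
}
```

For a stack page just below the usual 3 GB user/kernel split, such as 0xbffff000, page_end gives 0xc0000000 as expected. For 0xfffff000 it gives 0, and page_number gives 0xfffff.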
Hans sent me new versions of specific.c and specific.h to fix the
above-mentioned problem (thread stacks at the high end of the address space),
but I never had the chance to try them out. I had a workaround that
made the problem go away for me, and other work priorities are
preventing me from continuing to dig into the issue.
My workarounds were to increase the initial heap size of my application
(reducing the number of garbage collections required) and to turn on
GC_IGNORE_GCJ_INFO (which I had to add to gcj's version of the collector,
since it was introduced after the version I am using). Neither really
"fixes" the problem, though. They just make it much less likely that
I'll hit it (and I haven't since then).
regards,
michael