GC failure w/ THREAD_LOCAL_ALLOC ?
Boehm, Hans
hans_boehm@hp.com
Wed Mar 20 17:03:00 GMT 2002
I can't reproduce the problem on X86 either. Questions:
0) Do other people see similar problems?
For Bryce and Jeff:
1) How was gcj configured?
2) What compiler was it built with?
3) Does it appear that this problem was recently introduced (and thus
different from Michael Smith's)? (My 3.1 tree is a few days old, and I
built with a stable compiler.)
4) What was the machine configuration, e.g. which X86 processor(s) was/were
used? Is it reproducible on older processors? (I tried a plain Pentium, a
Pentium II, and a 4xPPro machine, all of which are rather old.)
5) Which Linux distribution? Was this a standard RedHat kernel? Any danger
the main stack starts at 0xffffffff? Anything weird about the kernel?
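For question 5, a rough check along the following lines might help. It is only a sketch, not part of the collector: the helper names (in_top_page, approx_stack_address, report_stack_position) are made up for illustration, it assumes a POSIX system for sysconf(), and it approximates the stack position by taking the address of a local variable in the calling thread.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if addr lies in the last page of the
 * address space.  page_size must be a nonzero power of two. */
static int in_top_page(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr >= UINTPTR_MAX - (page_size - 1);
}

/* Hypothetical helper: approximate the current stack position by
 * taking the address of a local variable. */
static uintptr_t approx_stack_address(void)
{
    volatile char marker = 0;
    return (uintptr_t)&marker;
}

/* Print the approximate stack address of the calling thread and say
 * whether it falls in the topmost page of the address space. */
static void report_stack_position(void)
{
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t sp = approx_stack_address();

    if (page <= 0) {
        printf("could not determine the page size\n");
        return;
    }
    printf("approx. stack address: 0x%" PRIxPTR "\n", sp);
    printf("in topmost page: %s\n",
           in_top_page(sp, (uintptr_t)page) ? "yes" : "no");
}
```

Calling report_stack_position() from main, and from one of the test's threads, should be enough to tell whether any stack sits at the extreme high end of the address space on the failing machine.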
Hans
> -----Original Message-----
> From: Boehm, Hans
> Sent: Wednesday, March 20, 2002 2:03 PM
> To: 'Michael Smith'; Bryce McKinlay
> Cc: java@gcc.gnu.org; Boehm, Hans
> Subject: RE: GC failure w/ THREAD_LOCAL_ALLOC ?
> I just tried Bryce's test on an Itanium here, since I had a
> prebuilt gcj 3.1. It uses the stock CVS garbage collector.
> I couldn't get it to fail. I will try on X86, though that
> will take a bit longer.
> Hans
>
> > -----Original Message-----
> > From: Michael Smith [mailto:msmith@spinnakernet.com]
> > Sent: Wednesday, March 20, 2002 10:15 AM
> > To: Bryce McKinlay
> > Cc: java@gcc.gnu.org; Boehm, Hans
> > Subject: Re: GC failure w/ THREAD_LOCAL_ALLOC ?
> >
> >
> > Bryce McKinlay wrote:
> > > While testing thread local allocation on PowerPC, I ran into a problem
> > > which is also reproducible on x86. The attached stress-test-case
> > > GCTest.java will lock up with ~100% reproducibility with
> > > THREAD_LOCAL_ALLOC enabled. It runs fine without THREAD_LOCAL_ALLOC.
> > >
> > > What I am seeing in the debugger is most threads waiting in
> > > GC_suspend_handler, but one thread segfaulting in GC_mark_read.
> > > libjava's segv handler gets called and the collector is re-entered
> > > during the stack trace, causing the freeze.
> >
> > I actually ran into this problem in my application 2 months ago (using
> > gcc version 3.1 20010911 (experimental)), and reported it to Hans. I
> > couldn't water down my application to create such a simple test case, so
> > tracking it down was somewhat difficult.
> >
> > From the stack trace I provided back in January, Hans initially
> > responded with:
> >
> > Hans Boehm wrote:
> > > I'm not terribly worried about the SIGSEGV getting turned into a
> > > deadlock. Such things seem to be largely unavoidable.
> > >
> > > I would like to understand where the SIGSEGV is coming from. Typically
> > > a failure here is caused by a bogus object descriptor. This may
> > > happen because something was overwritten by client code, or because
> > > there's an undiscovered bug in the GC, or in the gcj generated
> > > descriptor.
> >
> > With some further pointers, it turns out there _was_ a bogus object
> > descriptor. At my last contact with Hans, he suspected the problem was
> > related to THREAD_LOCAL_ALLOC, but was unable to find any likely
> > problems when reviewing the code. Here's an excerpt:
> >
> > Hans Boehm wrote:
> > > I spent a bit of time:
> > >
> > > - Staring at the thread-specific-storage implementation, and
> > >
> > > - adding some tests for thread-local allocation to gctest.
> > >
> > > The new tests failed to make the problem reproducible here.
> > >
> > > I cleaned up a few things. The only thing substantive I found was
> > > that specific.c could fail if one of the thread stacks ended up at the
> > > extreme high end of the address space, i.e. if 0xfffff000 is the
> > > address of a valid stack page. Are you configuring your kernel in
> > > some nonstandard way, e.g. to maximize virtual address space?
> > > Otherwise this seems unlikely to account for the problem, since that's
> > > normally kernel address space on Linux/X86, as I recall. (I vaguely
> > > recall that Mandrake Linux might do something strange in this area.)
> >
> > Hans sent me new versions of specific.c and specific.h to fix the
> > above mentioned problem (thread stacks at the high end of the address
> > space), but I never had the chance to try them out. I had a workaround
> > that made the problem go away for me, and other work priorities are
> > preventing me from continuing to dig into the issue.
> >
> > My workarounds were to increase the initial heap size of my application
> > (reducing the required garbage collections), and to turn on
> > GC_IGNORE_GCJ_INFO (which I had to add to gcj's version of the collector
> > since it was added after the version I am using). Neither of these
> > really "fixes" the problem, though. They just make it much more unlikely
> > that I'll hit the problem (I haven't since then).
> >
> > regards,
> > michael
> >
>