gcj: mem leaks & speed.

Mon Feb 2 20:01:00 GMT 2004

Per -
A type-accurate stack scan introduces some problems, especially if we
don't have a conservative fallback. It introduces a new class of potential subtle,
hard-to-debug gcj bugs. It changes CNI semantics. Passing opaque pointers to
gcj objects to a third-party library becomes much more problematic. You can no longer
make unrestricted C++ use of pointers to Java objects, since you can't put them
in unions. Pointers to fields of Java objects get much more exciting.
(I am in favor of avoiding conservative scanning of static roots. But that's much easier,
has more benefit, and can easily be made optional to support applications that mix Java
with garbage-collected C/C++. I have mixed feelings about the smart-pointer-based null
pointer checks, since they have some of the same issues above, but not nearly to the
same extent.)
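To make the union problem concrete: a type-accurate scanner typically works from a per-type descriptor listing the offsets that hold object references. In the C sketch below (all names hypothetical; this is not gcj's actual object or descriptor layout), the union slot `u` fits in neither category. Listing it would make the collector treat integer bits as a reference, which is fatal if it moves objects and rewrites that word; omitting it, as done here, means an object reachable only through `u.obj` is never found.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Java object. */
typedef struct JObj { int id; } JObj;

/* A CNI-style native structure mixing a plain reference and a union. */
typedef struct Native {
    JObj *ref;            /* always a reference: a descriptor can list it */
    union {
        JObj *obj;        /* sometimes a reference ...                    */
        intptr_t bits;    /* ... sometimes an integer; only the program
                             knows which member is live at run time      */
    } u;
} Native;

/* Hypothetical descriptor: byte offsets a precise scanner treats as
   references. u is deliberately absent, since neither choice is safe. */
static const size_t native_ptr_offsets[] = { offsetof(Native, ref) };
#define NATIVE_NPTRS (sizeof native_ptr_offsets / sizeof native_ptr_offsets[0])

/* Collect the non-null references a descriptor-driven scan would see. */
size_t precise_refs(const Native *n, JObj *out[])
{
    size_t k = 0;
    for (size_t i = 0; i < NATIVE_NPTRS; i++) {
        JObj *p = *(JObj *const *)((const char *)n + native_ptr_offsets[i]);
        if (p != NULL)
            out[k++] = p;
    }
    return k;
}
```

With `ref` pointing at one object and `u.obj` at another, only the first is reported; a conservative scan would have found both.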
Precise stack scanning does have some advantages, but I'm not sure it
completely solves any real problems:
1) It allows fully copying collection. But it's not clear (to me anyway) that this is
a big advantage over mostly copying. And as far as I can tell, the trend in JVMs is
toward allowing object pinning anyway. Otherwise it's too hard to pass a Java array
to some native numerical library or the like. And I suspect the hardest problem with
full-performance copying collection in gcj is that we're constrained to a standard ABI,
which is both a handicap and one of gcj's biggest strengths.
2) It gives you some more confidence that there won't be unexpected heap growth. That's
probably a valid argument for Scheme or ML. But for C++ or Java, it's more marginal, since it's
really up to the compiler to decide what's reachable and what isn't. No language spec
defines it, and defining it would restrict optimization. Hence it's hard
for the programmer to really reason about heap size either way, though the approximate
arguments do become slightly less dubious with precise root identification. (Getting the
compiler to provide info about definitely dead or nonpointer locations would also get
most of this benefit without most of the costs. As long as there is a way to ignore
the information, it's much easier to debug. And if you can avoid dealing with the few hard cases,
you don't need to restrict CNI or precompiled libraries.)
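The kind of compiler help mentioned in 2) could look roughly like the C sketch below: a conservative frame scan that skips slots the compiler has marked as definitely dead or non-pointer, plus a switch that ignores those hints so a suspected hint bug can be ruled out while debugging. The bitmask encoding, the `Heap` bounds check, and every name here are hypothetical. Frames without hints simply pass a zero mask and get scanned fully conservatively, which is what keeps CNI code and precompiled libraries unaffected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap address range. */
typedef struct Heap { uintptr_t lo, hi; } Heap;

/* Scan up to 64 frame words. Bit i of not_ptr_mask set means the
   compiler claims slot i is dead or holds no pointer. With use_hints
   false the mask is ignored and the scan is purely conservative.
   Returns the number of words treated as possible heap pointers. */
size_t scan_frame(const uintptr_t *slots, size_t nslots, uint64_t not_ptr_mask,
                  bool use_hints, const Heap *heap, uintptr_t *marked)
{
    size_t n = 0;
    for (size_t i = 0; i < nslots && i < 64; i++) {
        if (use_hints && ((not_ptr_mask >> i) & 1u))
            continue;                          /* trust the compiler's hint */
        uintptr_t w = slots[i];
        if (w >= heap->lo && w < heap->hi)     /* looks like a heap address */
            marked[n++] = w;
    }
    return n;
}
```

A slot holding a stale heap address that the compiler knows is dead is skipped with hints on and retained with hints off; comparing the two runs is the debugging aid.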
My first priority along these lines would be to ensure that the gcc back end maintains a
recognizable pointer to every reachable object, e.g. that it doesn't compile
for (i = 0; i < 100; ++i)
  {
    ... a[i - 1] ...
  }
/* a is dead here */
as
--a;
for (i = 0; i < 100; ++i)
  {
    ... a[i] ...
  }
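(The transformation is OK, but the original value of a needs to be kept somewhere: after --a,
a points before the start of the array, and a collector may not recognize it as a reference to it.)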
That's already important now, and would be a first step towards precise pointer identification.
My impression is that it's still not guaranteed, mostly because it never actually happens in real
life, so there's no bug report, so it doesn't get fixed.
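Why the transformed loop is dangerous can be shown with a small C sketch of interior-pointer recognition, the test a conservative collector applies to each candidate word. It assumes a hypothetical `Obj` record of base address and size, and works on `uintptr_t` values because in C source, forming `a - 1` before the start of an array is itself undefined behavior, even though the compiler may do the equivalent internally. Even a collector that accepts any pointer into the interior of an object will not treat the decremented `a` as keeping the array alive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap object: start address and size in bytes. */
typedef struct Obj { uintptr_t base; size_t size; } Obj;

/* Interior-pointer rule: a word keeps o alive if it points anywhere
   in [base, base + size). Addresses before base do not qualify. */
bool keeps_alive(uintptr_t word, const Obj *o)
{
    return word >= o->base && word - o->base < o->size;
}
```

For a 100-element `int` array, the original `a` and every element address pass the check, while `a - sizeof(int)`, the only value left live after `--a`, fails it.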
Hans
> -----Original Message-----
> From: java-owner@gcc.gnu.org [mailto:java-owner@gcc.gnu.org] On Behalf Of Per Bothner
> Sent: Monday, February 02, 2004 10:07 AM
> To: Boehm, Hans
> Cc: java@gcc.gnu.org
> Subject: Re: gcj: mem leaks & speed.
>
> Boehm, Hans wrote:
> > Adding a conventional copied young generation to gcj is nontrivial. It
> > would have to really be a "mostly copied" young generation, since we have to
> > deal with ambiguous roots.
>
> About a decade ago Eliot Moss's group (at UMass Amherst, I believe)
> worked on precise GC for GNU Modula-3, modifying Gcc to generate
> precise root information. The project kind of fizzled
> out, probably through lack of funding, and because it was difficult to
> do efficiently (and it was easy to do conservative collection).
>
> Since then we've implemented Dwarf-2 unwind information,
> where the compiler generates a compact encoding of which registers
> need to be restored during exception handling. I'm wondering whether
> that model could be extended to tell the GC what it needs to know
> for precise (and possibly copying) collection.
>
> Native methods are a complication. For CNI we can modify g++ in
> the same way we modify gcj. For JNI we probably don't want to
> use compiler assistance, but rather use the strategies that other VMs use.
> -- 
> 	--Per Bothner
> per@bothner.com http://per.bothner.com/
>
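Per's suggestion of extending the unwind-table model might be pictured as a table keyed by call-site return address, recording which frame slots hold live references at that point. The C sketch below is hypothetical throughout: it is not the DWARF format, and the `StackMap` layout and 32-slot bitmap are assumptions for illustration only. Returning -1 when no map exists marks the place where a mixed system could fall back to conservative scanning, e.g. for native frames.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-call-site record a compiler might emit. */
typedef struct StackMap {
    uintptr_t ret_pc;     /* return address identifying the call site    */
    uint32_t  live_ptrs;  /* bit i set: frame slot i holds a live reference */
} StackMap;

/* Binary search a table sorted by ret_pc; NULL if pc has no entry. */
const StackMap *find_map(const StackMap *tab, size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].ret_pc < pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && tab[lo].ret_pc == pc) ? &tab[lo] : NULL;
}

/* Record the addresses of the live reference slots in one frame, so a
   precise (possibly copying) collector could read or update them.
   Returns the slot count, or -1 if no map exists for this pc. */
int scan_frame_precise(const StackMap *tab, size_t n, uintptr_t pc,
                       void **frame, size_t nslots, void **out[])
{
    const StackMap *m = find_map(tab, n, pc);
    if (m == NULL)
        return -1;
    int count = 0;
    for (size_t i = 0; i < nslots && i < 32; i++)
        if (m->live_ptrs & (UINT32_C(1) << i))
            out[count++] = &frame[i];
    return count;
}
```

Because the collector gets slot addresses rather than just values, it could rewrite them after moving an object, which is what a fully copying collector needs and what conservative scanning cannot safely do.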