performance problem with process fork in gcj compiled CNI

Fri Jan 27 23:26:00 GMT 2006

Hi,
 I realized that the performance problem is not in the software; it is
actually in the hardware.
 In the first scenario I tested on an Intel Hyper-Threading machine with 2
logical CPUs running Linux.
 Now I have tested on a Sun SPARC machine with 6 physical CPUs running Solaris.
 Intel:
 1 instance of the process took T CPU cycles
 2 instances of the process took about 1.85 * T CPU cycles each
 Sun:
 1 instance of the process took T CPU cycles
 2 instances of the process took T CPU cycles each
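Reading these numbers as throughput, and assuming the per-instance figures also track elapsed time when the two instances run at the same time (the measurements above don't state this), the gain from running n instances is n·T divided by the per-instance time t_n:

```latex
S = \frac{n\,T}{t_n}, \qquad
S_{\text{Intel HT}} = \frac{2\,T}{1.85\,T} \approx 1.08, \qquad
S_{\text{SPARC}} = \frac{2\,T}{T} = 2
```

On that reading, the second logical CPU on the Hyper-Threading machine adds only about 8% throughput, while two instances on the 6-CPU SPARC machine get the full 2x.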
 I still have the fork problem to solve.
 I'm thinking about trying late linking: loading libgcj.so with dlopen()
only after the fork.
 I don't know whether statically linking libgcj would help at all.
Temporal.
>From: "Boehm, Hans" <hans.boehm@hp.com>
>To: "Ricardo Temporal" <ricardotemporal@hotmail.com>,<java@gcc.gnu.org>
>Subject: RE: performance problem with process fork in gcj compiled CNI
>Date: Fri, 27 Jan 2006 11:46:26 -0800
>
> > -----Original Message-----
> > From: Ricardo Temporal [mailto:ricardotemporal@hotmail.com]
> > Hi,
> >
> > I looked at SUSv3 on fork(), and indeed the pthread_atfork()
> > documentation says:
> >
> > "There are at least two serious problems with the semantics of fork()
> > in a multi-threaded program. One problem has to do with state (for
> > example, memory) covered by mutexes. Consider the case where one thread
> > has a mutex locked and the state covered by that mutex is inconsistent
> > while another thread calls fork(). In the child, the mutex is in the
> > locked state (locked by a nonexistent thread and thus can never be
> > unlocked). Having the child simply reinitialize the mutex is
> > unsatisfactory since this approach does not resolve the question about
> > how to correct or otherwise deal with the inconsistent state in the
> > child."
> >
> > The documentation suggests a workaround using fork handlers, but that
> > would have to be done in libgcj, not in my application.
>Things are worse than that. When you fork a multithreaded process, only
>one thread exists in the child. Thus I strongly suspect that some
>system threads needed by libgcj will just no longer exist. I don't see
>any a priori reason that the resulting child process should be at all
>healthy. But it appears you were somehow getting lucky, and it's at
>least close.
> >
> > So I set the fork aside and launched 2 instances of the program from
> > the shell, and I got the same results.
> >
> > It seems that the library libgcj.so is shared and synchronized.
> >
> > Here is the new version of the program, without any fork.
> >
> > Please comment.
>I have no good explanation for that. Only the read-only parts of libgcj
>should be shared. There shouldn't really be any synchronization between
>the two processes. Depending on your platform, there may be memory
>bandwidth issues or the like, especially since this application does
>nothing but allocate and garbage collect. The usual next step is to use
>a profiler and/or performance counter tools to figure out where the time
>is going, and why the time spent in each process is so different in the
>two cases. You might also try running with the GC_PRINT_STATS
>environment variable defined to see if the garbage collector is behaving
>similarly in both cases.
>
>You are presumably talking about two physical processors, one hardware
>thread per processor, not two hardware threads (e.g. Intel's
>hyperthreading)? If this is an Opteron-based or other NUMA system,
>there may be memory placement issues, though I'd be surprised if this
>had that much of an impact.
>
>Hans