benchmark result

Mon Dec 6 12:16:00 GMT 2004

Mathieu Lacage writes:
 > Well, I am a bit stubborn and I must say I was very disappointed by the
 > performance of the code generated by gcj so, I started looking into what
 > are the bottlenecks:
 > 
 > 1) I did run my simulation code built with gcc 3.4.3. The same setup as
 > previously reported in this email was used. Interestingly, I also used a
 > new patch which decreased the number of short-lived objects created by
 > the code, at the cost of creating a long-lived pool holding the maximum
 > possible number of these short-lived objects. This patch had little to
 > no influence on the simulation time (except on really large
 > simulations) when it ran on Sun's JDK 1.4.2 (at best, a 5%
 > improvement), but it did have an immediate effect on the simulation run
 > time when compiled with gcc 3.4.3 and 4.0.0 (a ~10% improvement even on
 > really short simulations). This experiment led me to believe there was probably
 > a major bottleneck in the GC used by gcj. So, I decided to do a bit of
 > system-level profiling with oprofile.
Great. Thanks for doing this!
 > 2) opreport first reported:
 > Profiling through timer interrupt
 > TIMER:0|
 > samples| %|
 > ------------------
 > 75829 84.9767 vmlinux
 > 6506 7.2909 libgcj.so.5.0.0
 > 2767 3.1008 MainSimulation
 > ...
 > 
 > Which seemed to imply that the overall simulation time was spent in the
 > libgcj and MainSimulation binaries. Obviously, I did not expect that
 > much time to be spent in libgcj so, I decided to look into a profile
 > specific to that binary.
 > 
 > 3) Thus, I ran: 
 > [mlacage@chronos treegrowth]$ opgprof /opt/gcc-3.4.3/lib/libgcj.so
 > [mlacage@chronos treegrowth]$ gprof /opt/gcc-3.4.3/lib/libgcj.so -p gmon.out
 > Flat profile:
 > 
 > Each sample counts as 1 samples.
 > % cumulative self self total
 > time samples samples calls T1/call T1/call name
 > 22.35 1454.00 1454.00 _ieee754_log
Ouch! Compile with -ffast-math and get some speedup.
We're using fairly slow floating-point code by default in gcj in order
to meet the requirements of the original Java language spec. Now that
Java has strictfp we could change the default.
 > 12.87 2291.00 837.00 GC_mark_from
Okay, garbage collection time. Hard to get away from that, but see
below.
 > 10.19 2954.00 663.00 Jv_LookupInterfaceMethodIdx
It's interesting, and perhaps a little surprising, that interface
dispatch occupies such a large proportion of your runtime.
 > 7.79 3461.00 507.00 ZN4java4util9ArrayList3getEi
 > 4.98 3785.00 324.00 Z20_Jv_IsAssignableFromPN4java4lang5ClassES2_
 > 4.92 4105.00 320.00 Jv_CheckCast
You're doing a lot of access to generic containers, so there's a lot
of cast checks.
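
To illustrate where those cast checks come from, here is a minimal sketch. The Particle class and the method names are hypothetical (the simulation's own classes are not shown in this thread). Reading an element out of a general-purpose container yields an Object that must be cast back to its real type, and each cast is checked at run time; reading from an array declared with the element type needs no such cast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element type, standing in for whatever the simulation stores.
final class Particle {
    final double energy;

    Particle(double energy) {
        this.energy = energy;
    }
}

final class CastCheckSketch {
    // Container access: get() returns Object, so every element read
    // needs a run-time checked cast back to Particle.
    static double sumFromList(List<?> particles) {
        double total = 0.0;
        for (int i = 0; i < particles.size(); i++) {
            Particle p = (Particle) particles.get(i); // checked cast on each read
            total += p.energy;
        }
        return total;
    }

    // Typed array access: the element type is known statically,
    // so reading an element needs no cast.
    static double sumFromArray(Particle[] particles) {
        double total = 0.0;
        for (int i = 0; i < particles.length; i++) {
            total += particles[i].energy;
        }
        return total;
    }

    public static void main(String[] args) {
        Particle[] array = { new Particle(1.0), new Particle(2.0), new Particle(3.5) };
        List<Object> list = new ArrayList<Object>();
        for (Particle p : array) {
            list.add(p);
        }
        System.out.println(sumFromList(list));
        System.out.println(sumFromArray(array));
    }
}
```

Whether swapping a container for a typed array pays off depends on the access pattern, so it is worth measuring before and after.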
 > 4.33 4387.00 282.00 frame_dummy
 > 3.37 4606.00 219.00 ZN4java4util14AbstractList$17hasNextEv
 > 3.30 4821.00 215.00 ZN4java4util14AbstractList$18checkModEv
 > 3.01 5017.00 196.00 GC_local_gcj_malloc
 > 2.47 5178.00 161.00 ZN4java4util14AbstractList$14nextEv
 > 2.34 5330.00 152.00 ZN4java4util9ArrayList3addEPNS_4lang6ObjectE
 > 2.08 5465.00 135.00 ZN4java4util9ArrayList19checkBoundExclusiveEi
 > 1.48 5561.00 96.00 Jv_AllocObjectNoFinalizer
 > 1.48 5657.00 96.00 Jv_CheckArrayStore
 > 1.38 5747.00 90.00 init
 > 1.14 5821.00 74.00 ZN4java4lang4Math3logEd
 > 1.08 5891.00 70.00 ZN4java4util12AbstractList8iteratorEv
 > 
 > Ok, so, my application does a lot of calls to the log function, no
 > surprise here. Now, I expected the GC to be pretty high here and, well,
 > it seems to be with GC_mark_from. However, I must say I am pretty
 > surprised to see Jv_LookupInterfaceMethodIdx which I don't know anything
 > about.
 > 
 > Could someone tell me what this function really does? Is it expected to
 > be so high in an application profile? If not, what could I do to
 > reduce its usage?
You're doing a great many interface calls. I don't think gcj's
interface dispatch is particularly slow, so interface dispatch might
be just as significant a drain on runtime on other systems. You might
get better performance by using ArrayList in your code instead of
AbstractList.
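
As a rough sketch of the difference, assuming the interface calls come from walking lists through List and Iterator references (the thread does not show the simulation source, and the class and method names below are made up for illustration): calls made through an interface-typed reference are dispatched as interface calls, while calls made through a reference declared as the concrete ArrayList class are ordinary virtual calls.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class DispatchSketch {
    // Interface-typed references: iterator(), hasNext() and next()
    // are all invoked through the List and Iterator interfaces.
    static int sumViaInterface(List<Integer> values) {
        int total = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext();) {
            total += it.next();
        }
        return total;
    }

    // Concrete-class reference: size() and get() are called on
    // ArrayList directly, so no interface lookup is involved.
    static int sumViaConcrete(ArrayList<Integer> values) {
        int total = 0;
        for (int i = 0, n = values.size(); i < n; i++) {
            total += values.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        ArrayList<Integer> values = new ArrayList<Integer>();
        for (int i = 1; i <= 10; i++) {
            values.add(i);
        }
        System.out.println(sumViaInterface(values));
        System.out.println(sumViaConcrete(values));
    }
}
```

Both methods compute the same sum; only the way the calls are dispatched differs, and how much that matters depends on the runtime.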
As far as I'm aware the Boehm GC is fine, but we aren't taking
advantage of the opportunity to recycle very short-lived objects
quickly. This is something that we can improve. But to have gc use
only 10% in an application really isn't tragically bad.
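
Until the collector handles such objects better, pooling along the lines of the patch described in the quoted message is one workaround. The following is only a sketch of the general idea, not the actual patch: the Event and EventPool names, the fields, and the fixed capacity are all assumptions made for illustration.

```java
import java.util.ArrayDeque;

// Hypothetical short-lived object that the simulation creates in bulk.
final class Event {
    double time;
    int kind;

    // Clear the state so a recycled instance starts out clean.
    void reset() {
        time = 0.0;
        kind = 0;
    }
}

// Long-lived pool that hands out preallocated Event instances.
final class EventPool {
    private final ArrayDeque<Event> free;

    EventPool(int capacity) {
        free = new ArrayDeque<Event>(capacity);
        for (int i = 0; i < capacity; i++) {
            free.push(new Event());
        }
    }

    // Take an instance from the pool; allocate only if the pool is empty.
    Event acquire() {
        Event e = free.poll();
        return e != null ? e : new Event();
    }

    // Return an instance to the pool instead of leaving it to the collector.
    void release(Event e) {
        e.reset();
        free.push(e);
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        EventPool pool = new EventPool(2);
        Event first = pool.acquire();
        first.time = 1.5;
        pool.release(first);
        Event second = pool.acquire();
        System.out.println(first == second); // the same instance is reused
    }
}
```

The trade-off is the one the quoted message mentions: the pool itself stays alive for the whole run, and callers must remember to release objects and must not keep using them afterwards.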
Thank you *very* much for doing this. It's important for us to get
this kind of input.
Andrew.