benchmark result

Tue Dec 7 10:53:00 GMT 2004

Mathieu Lacage writes:
 > On Tue, 2004-12-07 at 10:06 +0000, Andrew Haley wrote:
 > > Mathieu Lacage writes:
 > > > On Mon, 2004-12-06 at 16:07 +0100, Mathieu Lacage wrote:
 > > > > On Mon, 2004-12-06 at 13:56 +0000, Andrew Haley wrote:
 > > > > > We could improve the performance of these collection classes with
 > > > > > little work. What is required is someone to study the code, do some
 > > > > > profiling, and fix things.
 > > > > 
 > > > > I will look into this when I find some time.
 > > > 
 > > > I tried to replace my iterators by getters on the critical paths of the
 > > > code. The patch itself is rather simple and I found the results
 > > > interesting...
 > > 
 > > Could you let me know what you changed? A diff or somesuch?
 > 
 > It took me a while to find the exact iterator access which triggered the
 > performance gain/loss but this is the 3-liner responsible for 95% of the
 > performance gain you see in my previous email. I believe that this patch
 > has a major influence on the GC too because it avoids the creation of
 > these really short-lived Iterator objects and this code runs in a really
 > tight loop (unexpectedly for me because I had never seen it in any jdk
 > profile). 
 > 
 > In the specific scenario I use here, the m_trees ArrayList always
 > contains only a single element. It seems clear to me that the jdk JIT
 > unrolls this single-iteration loop.
gcj won't unroll at -O2.
 > I already tried to use -fprofile-arcs to avoid this but I get an ICE
 > on gcc 3.4.3 and gcc 4.0.0. I assume this is a known problem.
 > 
 > - for (Iterator i = m_trees.iterator(); i.hasNext (); ) {
 > - Tree tree = (Tree) i.next ();
 > + int size = m_trees.size ();
 > + for (int i = 0; i < size; i++) {
 > + Tree tree = (Tree) m_trees.get (i);
OK, I see. I can certainly see why this would be faster for gcj.
This is something we can improve.
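
For readers who want to try the two loop shapes from the diff side by
side, here is a minimal, self-contained sketch. The class and method
names (LoopForms, sumWithIterator, sumWithIndex) are made up for
illustration and are not from the simulation code; Integer elements
stand in for the Tree objects, and generics are used for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;

// Hypothetical illustration only: not the code from the patch.
public class LoopForms {
    // Iterator form: each traversal allocates a short-lived Iterator
    // and goes through hasNext()/next() on the Iterator interface.
    public static int sumWithIterator(ArrayList<Integer> values) {
        int total = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            total += it.next();
        }
        return total;
    }

    // Indexed form, as in the patch: no Iterator object is created;
    // the size is read once and elements are fetched with get(i).
    public static int sumWithIndex(ArrayList<Integer> values) {
        int total = 0;
        int size = values.size();
        for (int i = 0; i < size; i++) {
            total += values.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Single-element list, matching the scenario described above.
        ArrayList<Integer> values = new ArrayList<>();
        values.add(42);
        System.out.println(sumWithIterator(values));
        System.out.println(sumWithIndex(values));
    }
}
```

Note that the indexed form skips the iterator's concurrent-modification
check, so it is only equivalent when the list is not structurally
modified inside the loop.
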
 > Here is a new oprofile result:
 > 
 > opreport:
 > 13418 10.2113 libgcj.so.5.0.0
 > 9785 7.4466 MainSimulation
 > 
 > opgprof on libgcj.so:
 >    %   cumulative   self              self     total
 >   time    samples  samples    calls  T1/call  T1/call  name
 >  21.95    2945.00  2945.00                             _ieee754_log
What CPU are you using? I would expect these calls to log to be
inlined on an x86.
The rest of this looks pretty reasonable. It's still a bit worrying
that we're spending 10% in interface dispatch.
 >  13.09    4701.00  1756.00                             GC_mark_from
 >   9.58    5986.00  1285.00                             Jv_LookupInterfaceMethodIdx
 >   6.60    6871.00   885.00                             ZN4java4util9ArrayList3getEi
 >   5.69    7635.00   764.00                             Z20_Jv_IsAssignableFromPN4java4lang5ClassES2_
 >   5.25    8340.00   705.00                             Jv_CheckCast
 >   4.58    8954.00   614.00                             frame_dummy
 >   3.39    9409.00   455.00                             ZN4java4util14AbstractList$18checkModEv
 >   3.27    9848.00   439.00                             GC_local_gcj_malloc
 >   3.20   10278.00   430.00                             ZN4java4util14AbstractList$17hasNextEv
 >   2.62   10629.00   351.00                             ZN4java4util14AbstractList$14nextEv
 >   2.31   10939.00   310.00                             ZN4java4util9ArrayList19checkBoundExclusiveEi
 >   1.86   11188.00   249.00                             ZN4java4util9ArrayList3addEPNS_4lang6ObjectE
 >   1.42   11379.00   191.00                             Jv_AllocObjectNoFinalizer
 >   1.38   11564.00   185.00                             init
 >   1.33   11742.00   178.00                             Jv_CheckArrayStore
 >   1.26   11911.00   169.00                             ZN4java4util9ArrayList4sizeEv
 >   1.19   12071.00   160.00                             ZN4java4util14AbstractList$16finit$Ev
 >   1.13   12223.00   152.00                             ZN4java4util12AbstractList8iteratorEv
 >   1.04   12362.00   139.00                             ZN4java4lang4Math3logEd
 >   0.95   12490.00   128.00                             log
Andrew.