benchmark result
Andrew Haley
aph@redhat.com
Tue Dec 7 10:53:00 GMT 2004
Mathieu Lacage writes:
> On Tue, 2004-12-07 at 10:06 +0000, Andrew Haley wrote:
> > Mathieu Lacage writes:
> > > On Mon, 2004-12-06 at 16:07 +0100, Mathieu Lacage wrote:
> > > > On Mon, 2004-12-06 at 13:56 +0000, Andrew Haley wrote:
> > > > > We could improve the performance of these collection classes with
> > > > > little work. What is required is someone to study the code, do some
> > > > > profiling, and fix things.
> > > >
> > > > I will look into this when I find some time.
> > >
> > > I tried to replace my iterators by getters on the critical paths of the
> > > code. The patch itself is rather simple and I found the results
> > > interesting...
> >
> > Could you let me know what you changed? A diff or somesuch?
>
> It took me a while to find the exact iterator access which triggered the
> performance gain/loss but this is the 3-liner responsible for 95% of the
> performance gain you see in my previous email. I believe that this patch
> has a major influence on the GC too because it avoids the creation of
> these really short-lived Iterator objects and this code runs in a really
> tight loop (unexpectedly for me because I had never seen it in any jdk
> profile).
>
> In the specific scenario I use here, the m_trees ArrayList always
> contains only a single element. It seems clear to me that the jdk JIT
> unrolls this single-iteration loop.
gcj won't unroll at -O2.
> I already tried to use -fprofile-arcs to avoid this but I get an ICE on
> gcc 3.4.3 and gcc 4.0.0. I assume this is a known problem.
>
> - for (Iterator i = m_trees.iterator(); i.hasNext (); ) {
> - Tree tree = (Tree) i.next ();
> + int size = m_trees.size ();
> + for (int i = 0; i < size; i++) {
> + Tree tree = (Tree) m_trees.get (i);
OK, I see. I can certainly see why this would be faster for gcj.
This is something we can improve.
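
For readers who want to try the two loop shapes from the diff in isolation, here is a minimal sketch. The class name LoopStyles and the method names are hypothetical, and it uses generics for the benefit of current compilers, whereas the code in the patch uses raw types and casts. It only shows that both forms visit the same elements; it does not measure anything.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; not taken from the simulation code in this thread.
public class LoopStyles {
    // Iterator form: every traversal asks the list for a fresh Iterator
    // object and then calls hasNext()/next() through the Iterator interface.
    static int sumWithIterator(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            sum += it.next();
        }
        return sum;
    }

    // Indexed form, as in the patch: size() is read once and elements are
    // fetched with get(i), so no Iterator object is created.
    static int sumWithIndex(List<Integer> values) {
        int sum = 0;
        int size = values.size();
        for (int i = 0; i < size; i++) {
            sum += values.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        values.add(1);
        values.add(2);
        values.add(3);
        System.out.println(sumWithIterator(values));
        System.out.println(sumWithIndex(values));
    }
}
```

The indexed form is only a good trade for random-access lists such as ArrayList; on a linked list, get(i) is linear in i and the iterator form would be the better choice.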
> Here is a new oprofile result:
>
> opreport:
> 13418 10.2113 libgcj.so.5.0.0
> 9785 7.4466 MainSimulation
>
> opgprof on libgcj.so:
> % cumulative self self total
> time samples samples calls T1/call T1/call name
> 21.95 2945.00 2945.00 _ieee754_log
What CPU are you using? I would expect these calls to log to be
inlined on an x86.
The rest of this looks pretty reasonable. It's still a bit worrying
that we're spending 10% in interface dispatch.
> 13.09 4701.00 1756.00 GC_mark_from
> 9.58 5986.00 1285.00 Jv_LookupInterfaceMethodIdx
> 6.60 6871.00 885.00 ZN4java4util9ArrayList3getEi
> 5.69 7635.00 764.00 Z20_Jv_IsAssignableFromPN4java4lang5ClassES2_
> 5.25 8340.00 705.00 Jv_CheckCast
> 4.58 8954.00 614.00 frame_dummy
> 3.39 9409.00 455.00 ZN4java4util14AbstractList$18checkModEv
> 3.27 9848.00 439.00 GC_local_gcj_malloc
> 3.20 10278.00 430.00 ZN4java4util14AbstractList$17hasNextEv
> 2.62 10629.00 351.00 ZN4java4util14AbstractList$14nextEv
> 2.31 10939.00 310.00 ZN4java4util9ArrayList19checkBoundExclusiveEi
> 1.86 11188.00 249.00 ZN4java4util9ArrayList3addEPNS_4lang6ObjectE
> 1.42 11379.00 191.00 Jv_AllocObjectNoFinalizer
> 1.38 11564.00 185.00 init
> 1.33 11742.00 178.00 Jv_CheckArrayStore
> 1.26 11911.00 169.00 ZN4java4util9ArrayList4sizeEv
> 1.19 12071.00 160.00 ZN4java4util14AbstractList$16finit$Ev
> 1.13 12223.00 152.00 ZN4java4util12AbstractList8iteratorEv
> 1.04 12362.00 139.00 ZN4java4lang4Math3logEd
> 0.95 12490.00 128.00 log
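
On the interface dispatch entry in the profile above (Jv_LookupInterfaceMethodIdx): as far as I understand gcj, a call made through an interface-typed reference is resolved by a run-time lookup, while a call made through a class-typed reference uses ordinary virtual dispatch. The sketch below only illustrates the difference in the static type of the receiver. The class and method names are hypothetical, and whether the simulation code declares m_trees as List or as ArrayList is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the names are not from the original program.
public class DispatchStyles {
    // Receiver typed as the List interface: size() and get() are
    // interface calls.
    static int countViaInterface(List<String> items) {
        int n = 0;
        int size = items.size();
        for (int i = 0; i < size; i++) {
            if (items.get(i) != null) {
                n++;
            }
        }
        return n;
    }

    // Receiver typed as the concrete ArrayList class: the same calls are
    // plain virtual calls on a class.
    static int countViaClass(ArrayList<String> items) {
        int n = 0;
        int size = items.size();
        for (int i = 0; i < size; i++) {
            if (items.get(i) != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        ArrayList<String> items = new ArrayList<>();
        items.add("a");
        items.add(null);
        items.add("b");
        System.out.println(countViaInterface(items));
        System.out.println(countViaClass(items));
    }
}
```

Both methods return the same count; any difference would show up only in a profile, and only on an implementation where interface calls cost more than class calls.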
Andrew.