4.0.0 JNI/reflection invocation 2/4/8 times as slow as in 3.3.3

Mon Jan 31 07:21:00 GMT 2005

On 2005年1月29日 10:58:56 -0800, Per Bothner <per@bothner.com> wrote:
> Ranjit Mathew wrote:
> > Instead of changing over to chaining, the current open
> > addressing implementation can perhaps be redeemed by
> > using a compacting remove(), eliminating the need for
> > tombstones.
> > 
> > This is also described in Knuth's TAOCP Vol. 3 (Sorting
> > and Searching) and he shows that it doesn't impact
> > performance that much. See section 6.4, Algorithm R (IIRC).
>> I don't have TAOCP (I bought it in grad school, but
> it disappeared ...), but I assume you're talking about

Sacrilege! What self-respecting programmer
doesn't have all TAOCP volumes on his
bookshelf? :-P
Just kidding.
> some variant of the following:
>> To remove an entry e at index i, after clearing tab[i]:
> j = next_empty_slot(i);
> for (int k = i;;)
> {
> k=(k+1)%tab.length;
> if (k == j) break;
> e = tab[k];
> tab[k] = null;
> if (e != null)
> quick_insert(e);
> }
[...]
> The idea is we check if any following entries would get
> hashed differently due to the now-evailable empty slot.

His algorithm works backwards in the table.
He bothers with an element only if its
hashcode is the same as that for the element
just removed. Otherwise, yes, the algorithms
are quite similar.
> Assuming the table isn't too full that shouldn't take
> long.

Note that IdentityHashMap right now expands
the table if the size is at the threshold (default
75%). So there always are sufficient empty
slots.
> > BTW, why is our default capacity 21 and not a prime
> > like (say) 19 or 23? Primes are much better for hash
> > table sizes (again, refer to TAOCP Vol. 3).
>> I seem to remember that powers-of-2 is not unreasonable.

Powers of 2 on a binary computer, and powers of
10 on a decimal computer (e.g. MIX), are not
kosher according to Knuth (and many others) - the
modulo operation just gives you the lower bits/digits
of the number.
> (You have to make sure using a probe increment is relatively
> prime - which 1 is.) At least on some embedded chips
> a modulo operations could be fairly expensive. On modern
> desktop-and-above processors what we care about is cache
> misses, so division doesn't matter. (Note that linear
> probing is likely to work better for caches.)

Agreed.
Thanks,
Ranjit.
-- 
Ranjit Mathew Email: rmathew AT gmail DOT com
Bangalore, INDIA. Web: http://ranjitmathew.hostingzero.com/