compacting _Jv_Utf8Const

Thu May 6 03:05:00 GMT 2004

Tom Tromey wrote:
> Per> $ nm libgcj.so|grep ' _Utf'|awk '{print $1}'>/tmp/bar
> Per> $ uniq </tmp/bar|wc
> Per> 54395 54395 489555
>
> This doesn't measure duplication of contents, only symbol names.

Yes and no. It was meant to measure "post-duplicate-elimination".
The 'print $1' selects out the *addresses*, not names. However,
I forgot to sort by address, which can be done by passing
-n to nm, or by adding an explicit 'sort'. If we do that, we get:
 24533 24533 220797
So duplicate elimination is more effective than 10% - it more than
halves the number of symbols, from 60208 to 24533. It would be
interesting to see how much space that saves - i.e. are shorter
symbols or longer symbols more likely to be duplicated?
If we compiled larger units (packages or libraries rather than
individual source files, which usually contain a single class),
then compile-time duplicate elimination would be more effective,
and link-time duplicate elimination might no longer be worth it.
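
For reference, a minimal sketch of why the sort matters: uniq only
collapses *adjacent* equal lines, so unsorted addresses are overcounted.
The function names and the sample nm-style lines are made up for
illustration; against the real library the input would come from
`nm libgcj.so` (or `nm -n libgcj.so`, which makes the explicit sort
unnecessary).

```shell
# Count addresses of _Utf symbols the way the first attempt did:
# no sort, so uniq only merges duplicates that happen to be adjacent.
count_adjacent_unique() {
    grep ' _Utf' | awk '{print $1}' | uniq | wc -l | tr -d ' '
}

# Same count with an explicit sort, so every duplicate address is merged.
count_unique_addrs() {
    grep ' _Utf' | awk '{print $1}' | sort | uniq | wc -l | tr -d ' '
}

# Hypothetical nm-style input ("address type name"), not real libgcj data.
# The last line contains "_Utf" inside a mangled name, but not " _Utf",
# so the grep pattern with the leading space skips it.
sample='00000010 r _Utf1
00000020 r _Utf2
00000010 r _Utf3
00000030 T _Z13_Jv_FindClassP13_Jv_Utf8ConstPN4java4lang11ClassLoaderE'

printf '%s\n' "$sample" | count_adjacent_unique
printf '%s\n' "$sample" | count_unique_addrs
```

The two counts differ on this sample because the two symbols that share
an address are not on adjacent lines.
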
> BTW I looked at the `nm | grep' output a bit closer and there are a
> few symbols with the string `_Utf' in their names that aren't
> Utf8Consts, eg:
>
> _Z13_Jv_FindClassP13_Jv_Utf8ConstPN4java4lang11ClassLoaderE
>
> grepping for ` _Utf' may be more robust.

Yes, I already did that.
-- 
	--Per Bothner
per@bothner.com http://per.bothner.com/