Interesting paper on Supporting Binary Compatibility with Static Compilation

Sun Aug 18 22:24:00 GMT 2002

Jeff Sturm wrote:
>On 12 Aug 2002, Andrew Haley wrote:
>> > As a data point, when I build my CMS app with gcj I have 7 DSOs totalling
>> > about 190,000 load-time relocations (not counting libgcj.so). Some of
>> > these are resolved lazily, most are in .data and cannot be. Startup
>> > time is about 2 seconds on sparc-solaris and initial memory footprint
>> > around 28MB. Not too impressive, compared with 1 second and 15MB for the
>> > JRE.
>>
>>That's weird, because IME interpreted Java takes forever to start
>>because of lazy class loading.
>
>Yes, but... there are some 1900 classes in my app, plus another 1326 in
>libgcj.jar.
>
>With the JRE, I see just 296 classes loaded initially, and 395 when it
>reaches steady-state.
>
>With gcj, I have to wait for ld.so to link ~3200 classes before anything
>happens.
>
>Suppose the compiled class metadata were free of pointers instead. No
>relocations, except lazy function calls. The metadata could then be
>constant and loaded into .rodata. Some advantages:
>
When thinking about the layout of the class and binary compatibility 
structures I've worked on the assumption that non-symbolic, private 
relocations within the same binary object are much cheaper than symbolic 
relocations (like the vtable ones) which need to be looked up globally. 
If that isn't really the case then I guess we'd need to re-think the design.

However, on Linux, libgcj's startup time is still insignificant compared 
to the Hotspot VM, especially with the most recent glibc versions. Other 
OSs like Mac OS X support prelinking which basically eliminates the need 
for runtime relocations (at the cost of waiting around while prebindings 
get updated every time you install an update), so it isn't really an 
issue there either.

With binary compatibility we can make the class metadata pretty small. 
All that really needs to be there is:
- class name
- super-class name
- access flags
- methods metadata (note: cannot avoid function pointers here unless we
  used dlsym() calls?)
- fields metadata
- possibly, a lock field for use during initialization (could avoid this
  with a hash table or something)

Everything else (i.e. the actual java.lang.Class object) can be 
constructed at runtime. In this case references to classes would change 
to go through something like a _Jv_GetClass call with a table of 
locally-referenced classes. This wouldn't really add any overhead because 
classes need to be checked for initialization in these situations 
anyway, and the class pointer's existence in the table would guarantee 
that it has been initialized. As we discussed a while back, this would 
require a read barrier on alpha etc. in order to be MP-safe, however.
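
To make the above a bit more concrete, here is a rough C sketch of what such 
a structure and lookup might look like. All of the names (class_meta, 
class_ref, get_class and so on) are made up for illustration and are not 
libgcj's actual layout; get_class only stands in for "something like a 
_Jv_GetClass call". The optional lock field is left out, and the sketch is 
single-threaded, so it ignores the MP-safety question entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, for illustration only; not libgcj's real structures. */

typedef struct {
    const char *name;             /* method name */
    const char *signature;        /* method signature string */
    unsigned short access_flags;
    void (*code)(void);           /* function pointer: still needs a relocation */
} method_meta;

typedef struct {
    const char *name;             /* field name */
    const char *type;             /* field type string */
    unsigned short access_flags;
} field_meta;

/* The small per-class record emitted by the compiler. */
typedef struct {
    const char *name;
    const char *super_name;       /* symbolic: resolved by name at runtime */
    unsigned short access_flags;
    const method_meta *methods;
    unsigned short method_count;
    const field_meta *fields;
    unsigned short field_count;
} class_meta;

/* Stand-in for the java.lang.Class object built at runtime. */
typedef struct {
    const class_meta *meta;
    int initialized;
} runtime_class;

/* One slot in a binary's table of locally-referenced classes. */
typedef struct {
    const class_meta *meta;
    runtime_class *resolved;      /* NULL until the class is first used */
} class_ref;

#define POOL_SIZE 16
static runtime_class pool[POOL_SIZE]; /* toy allocator instead of the GC heap */
static size_t pool_used;

/* A non-NULL table entry doubles as the "already initialized" check. */
runtime_class *get_class(class_ref *ref)
{
    if (ref->resolved != NULL)
        return ref->resolved;     /* fast path: built and initialized */

    assert(pool_used < POOL_SIZE);
    runtime_class *c = &pool[pool_used++];
    c->meta = ref->meta;
    c->initialized = 1;           /* stand-in for running static initializers */
    ref->resolved = c;
    return c;
}
```

The point of the sketch is only that the fast path is a single load and 
compare on the table slot, which is the check compiled code has to make for 
class initialization anyway.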

While it would possibly allow us to make the metadata completely static 
(not just the strings), saving some startup time, I'm not convinced, 
given the fairly small size of the structure above, that it would be 
worthwhile due to:
a) extra metadata size (in the binary) due to loss of merged utf8consts
b) extra code complexity in compiler and libgcj to deal with the 
metadata format not being arranged in nice simple pointers
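
For comparison, a pointer-free layout might look roughly like the C sketch 
below, where every reference is an offset into a per-binary string blob. 
The names (flat_class_meta, meta_string, class_is) and the example class are 
hypothetical. It illustrates both points: each binary carries its own copy 
of the strings, and every access goes through an accessor rather than a 
plain pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer-free layout, for illustration only. */

/* All strings for this binary, packed into one constant blob. */
static const char strings[] =
    "Foo\0"                 /* offset 0  */
    "java.lang.Object\0"    /* offset 4  */
    "count";                /* offset 21 */

/* No pointers in the record, so it needs no data relocations. */
typedef struct {
    uint32_t name_off;      /* offset of class name in strings[] */
    uint32_t super_off;     /* offset of super-class name in strings[] */
    uint16_t access_flags;
} flat_class_meta;

const flat_class_meta foo_meta = { 0, 4, 0x0001 };

/* Every string access has to go through an offset-to-address step. */
const char *meta_string(uint32_t off)
{
    assert(off < sizeof strings);
    return strings + off;
}

const char *class_name(const flat_class_meta *m)
{
    return meta_string(m->name_off);
}

const char *super_name(const flat_class_meta *m)
{
    return meta_string(m->super_off);
}

/* Name comparison replaces what would otherwise be a pointer comparison. */
int class_is(const flat_class_meta *m, const char *name)
{
    return strcmp(class_name(m), name) == 0;
}
```

Nothing in foo_meta is an address, so the record can be constant data, but 
the compiler and runtime both have to know about the offset encoding.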
>c) GC would have a far smaller root set.
>
>This is a major contributor to collection times. The GC must scan ~6MB of
>static data per collection, almost none of which contains any heap
>pointers.
>
With this scheme, and with class fields being part of the class objects, 
static data wouldn't need to be scanned at all. Class objects would be 
on the heap so everything would be reachable from the stacks.

regards
Bryce.