Thoughts about static linking and reducing size of binaries
Bryce McKinlay
bryce@albatross.co.nz
Wed Mar 14 16:25:00 GMT 2001
Over the last few nights I've been messing around with importing the
Java 1.2 java.security code from Classpath and updating libgcj to use
it. Although the security framework doesn't do much in the context of
natively compiled code, it will definitely be good to have for
dynamically loaded bytecode and is presumably required by some of the
J2EE-type APIs.
Although it seems quite nicely designed, the full java.security
package is quite large (maybe 100+ classes!), and I noted with horror
that it adds some 70K to a stripped, static binary, even though most
applications probably do not use any java.security classes directly.
This is because several security classes are imported by
java.lang.Class and java.lang.ClassLoader. The linker of course brings
in _everything_ that is ever referenced from any class used by the
application, so if a class is imported by java.lang.Class then
everything that class references must be brought in as well.
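To make that closure effect concrete, here is a toy sketch in Java. It
is only a model of "bring in everything reachable", not how a real
linker works, and the reference graph in sampleGraph() is made up for
illustration (the class names are real, but the edges are assumptions,
not measured from libgcj).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of static-link reachability; LinkClosure is a hypothetical name.
class LinkClosure {
    // Returns every class reachable from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> seen = new LinkedHashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String cls = work.pop();
            for (String dep : refs.getOrDefault(cls, List.of())) {
                // Set.add returns true only the first time a class is seen.
                if (seen.add(dep)) {
                    work.push(dep);
                }
            }
        }
        return seen;
    }

    // Illustrative graph only: the edges are assumptions, not libgcj data.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("app.Hello", List.of("java.lang.Class"));
        refs.put("java.lang.Class",
                 List.of("java.lang.ClassLoader", "java.security.ProtectionDomain"));
        refs.put("java.lang.ClassLoader", List.of("java.security.ProtectionDomain"));
        refs.put("java.security.ProtectionDomain",
                 List.of("java.security.CodeSource", "java.security.PermissionCollection"));
        // Never referenced from app.Hello, so it stays out of the closure.
        refs.put("java.util.zip.ZipFile", List.of("java.util.zip.ZipEntry"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> linked = reachable(sampleGraph(), List.of("app.Hello"));
        for (String cls : linked) {
            System.out.println(cls);
        }
    }
}
```

Even though app.Hello never touches java.security itself, the security
classes end up in the result because java.lang.Class refers to them.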
It is not an issue in shared-library land (or even for static binaries
on PCs, really), but it worries me that libgcj will become less useful
for embedded development if we can't keep the size of static binaries
down as the library bloats up with more and more of the modern Java
APIs. My guess is that for many applications maybe 50%+ of the classes
in a static binary are never even initialized on any used code path.
One solution is to create a "libgcj lite" which removes dependencies
on things such as java.security that would not typically be used by
small and embedded applications, but a) it would suck to have to
maintain separate trees and b) it's difficult to come up with a
one-size-fits-all profile of what to include and what not to include
that would work for all applications. This could also be done using
#ifdefs and such in libgcj to permit different configuration profiles,
but again it makes things less maintainable and we don't want that.
A much better and far more general solution would be to have the
ability to link in only the classes which are actually used
(loaded/initialized) during execution. It would be simple to give
libgcj the ability to track this at run time and dump out a list of
used classes, which could then be used to make a "size-optimal"
build by feeding that list back to the linker. Is it possible to have
the linker bring in only a given set of classes at link time, and
treat other references as weak symbols? I'm sure this could be done at
the compiler level, but we wouldn't want to have to recompile parts of
libgcj just to re-link an application.
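The tracking half of that idea might look something like the sketch
below. Everything in it is hypothetical: libgcj has no UsedClassLog
class, and record() stands in for whatever hook the runtime would call
at the point where it initializes a class. The output format (one
class name per line) is also just an assumption about what a later
link step would want to read.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical tracker; not part of libgcj.
class UsedClassLog {
    // Sorted, thread-safe set, so duplicates collapse and output is stable.
    private static final Set<String> used = new ConcurrentSkipListSet<>();

    // Hypothetical hook: the runtime would call this on class initialization.
    static void record(String className) {
        used.add(className);
    }

    // Returns the recorded names, one per line, for a later link step.
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (String name : used) {
            out.append(name).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        record("java.lang.Object");
        record("java.lang.String");
        record("java.lang.Object"); // recorded once despite the repeat
        System.out.print(dump());
    }
}
```

A profiling run of the application would produce the list; how the
linker would then consume it is exactly the open question above.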
The cool thing about this solution is that it also neatly fixes the
problem of the linker not bringing in classes that are only loaded
dynamically via Class.forName()!
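For anyone who hasn't hit the forName() problem: in a sketch like the
one below (DynamicLoad and create() are made-up names), the class to
load is known only as a string at run time, so there is no symbol
reference for a static linker to follow. A list of classes actually
loaded during a test run would capture it.

```java
// Hypothetical example class; the name DynamicLoad is an assumption.
class DynamicLoad {
    // Loads a class by name and instantiates it via its no-arg constructor.
    static Object create(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name comes from the command line, not from the code.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println(create(name).getClass().getName());
    }
}
```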
Does this strategy sound feasible? How hard would it be to get the
linker to play along? Anyone have any other ideas for reducing the
footprint of static binaries against an ever-growing library?
regards
[ bryce ]