Experiences using GCJ as an embedded compiler
Jon Olson
olson@mmsi.com
Mon Mar 29 08:59:00 GMT 1999
For about the past 4 months I've been working with, on, and underneath
the hot new GCJ frontend for compiling Java class files to native code.
Before I enumerate its current weaknesses, let me thank the people at
Cygnus for this wonderful tool. I've used this compiler to build an
entire Java run-time environment and, even though I'm still using the
original 980906 alpha version, have found it to produce quite reliable
and efficient code.
For an embedded environment, GCJ currently has numerous deficiencies.
I've overcome most of these with quite nice implementations and hope
that sometime in the future these changes can be integrated into the
released version of GCJ.
1) Lightweight exceptions
I wrote a lengthy email previously on the current implementation
of exceptions in `gcc' and why they are not usable in an embedded
environment. I developed a very lightweight exception implementation
which drastically decreases the overhead required for exception
support.
2) Write barriers
Embedded environments typically have real-time constraints to which
they must respond. Without write barriers, all threads which modify
the collected heap must be stopped until collection completes. The
only threads which can be higher priority than garbage collection are
those which modify only simple, pointerless data structures.
Fortunately, write barriers were relatively simple to add to GCC.
The following code, added to `gcc/expr.c', invokes a new `write_barrier'
RTL pattern whenever it writes a pointer to memory.
#if defined (HAVE_write_barrier)
  /*
   * Generate write barrier code if we're setting memory with
   * something that's not a constant, not a vtable, and with
   * something that is a pointer.
   */
  else if (flag_write_barriers
           && GET_CODE (target) == MEM
           && GET_MODE (target) == Pmode
#ifdef TARGET_NEEDS_WRITE_BARRIER
           && TARGET_NEEDS_WRITE_BARRIER (target)
#endif
           && !(TREE_CODE (exp) == ADDR_EXPR
                && DECL_VIRTUAL_P (TREE_OPERAND (exp, 0)))
           && !really_constant_p (exp)
           && contains_pointers_p (TREE_TYPE (exp))) {
    temp = force_reg (Pmode, temp);
    emit_insn (gen_write_barrier (target, temp));
  }
#endif
Here's a sample `write_barrier' pattern for the AMD A29K. This
processor implements write barriers in a particularly elegant
way. The `write_barrier' register, gr95, normally contains
0xffffffff. The assert instruction, asleu, asserts that a
particular pointer being written to memory is less than or equal to
the write barrier register. When GC is inactive, this is always
true and the write barrier requires only 1 instruction which
acts as a NOP. When GC begins collecting, the write barrier
register is set to the start of the heap. During collection, only
writes of pointers higher than the start of the heap generate traps
to the write barrier code, which then records the object as
a live object.
;; Garbage collector write barrier
(define_insn "write_barrier"
  [(parallel [
     (set (match_operand:SI 0 "memory_operand" "=m")
          (match_operand:SI 1 "register_operand" "r"))
     (unspec_volatile [(match_dup 1)] 16)])]
  ""
  "store 0,0,%1,%0\;asleu V_write_barrier,%1,gr95")
I have similar implementations for StrongARM and Hitachi Super-H,
although they require more instructions.
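
For readers without an A29K handy, the check that `asleu' performs can
be modelled in portable C. This is only a sketch of the idea described
above, not the trap handler itself: `write_barrier_limit' stands in for
gr95, and `store_pointer', `write_barrier_trap', `gc_begin', `gc_end'
and the fixed-size log are all hypothetical names invented for the
illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BARRIER_IDLE UINTPTR_MAX   /* plays the role of 0xffffffff in gr95 */
#define LOG_CAPACITY 16            /* arbitrary size for this sketch */

/* Hypothetical stand-in for the write barrier register (gr95). */
static uintptr_t write_barrier_limit = BARRIER_IDLE;

/* Hypothetical log of pointers caught by the barrier during collection. */
static uintptr_t barrier_log[LOG_CAPACITY];
static size_t barrier_log_len = 0;

/* Stand-in for the trap handler: remember the pointer that was stored. */
static void write_barrier_trap(uintptr_t value)
{
    if (barrier_log_len < LOG_CAPACITY)
        barrier_log[barrier_log_len++] = value;
}

/* Store a pointer-sized value, then assert value <= limit as asleu does.
 * With the limit at BARRIER_IDLE the comparison can never fail. */
static void store_pointer(uintptr_t *slot, uintptr_t value)
{
    *slot = value;
    if (!(value <= write_barrier_limit))
        write_barrier_trap(value);
}

/* Collector start: move the limit down to the start of the heap. */
static void gc_begin(uintptr_t heap_start)
{
    write_barrier_limit = heap_start;
}

/* Collector end: restore the idle value so the check is a no-op again. */
static void gc_end(void)
{
    write_barrier_limit = BARRIER_IDLE;
}
```

In this model a store costs one compare when the collector is idle,
which is the software analogue of the single assert instruction; the
hardware version avoids even the branch.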
3) JVM `jsr' instructions
The current GCJ compiler generates pretty bad code for Java `jsr'
instructions. This is because a `jsr' isn't really the same as
an RTL `call', since the `jsr' shares and corrupts the stack frame
of the caller. Thus, the compiler must trace the `jsr' as if it
were a simple jump which saves its return address and returns
using an `indirect_jump'.
Instead of using a native `call' instruction, GCJ currently computes
the return label and uses a `jump' instruction. This, however,
was relatively easy to fix by modifying `build_java_jsr' to use
a new RTL pattern named `subroutine_jump'. The only trick is that you
must tell `flow.c' that this instruction computes the address of its
return label, or else you get invalid code. Here's the modified code
from `build_java_jsr'.
  /*
   * Java jsr's aren't the same as a normal `call_insn', since
   * they call a local label which doesn't save any registers.
   * The compiler must know this instruction as a `jump_insn'
   * which includes REG_NOTES that declare the return label
   * as being a label which can be jumped to by an indirect_jump.
   */
  rtx return_label = gen_label_rtx ();
  rtx insn = emit_jump_insn (
      gen_subroutine_jump (DECL_RTL (where), return_label));
  REG_NOTES (insn) = gen_rtx_EXPR_LIST (REG_LABEL, return_label,
                                        REG_NOTES (insn));
  emit_label (return_label);
I also added code in `decl.c' for RISC machines that recognizes
cases where a return address is popped from the top of the stack and
optimizes this to access the machine's return address register.
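
To make the `jsr' semantics concrete, here is a small portable C model
of what the compiler has to reason about: a `jsr' is a jump that records
where to come back to, and the matching return is an indirect jump
through that saved value, all inside one stack frame. This is a sketch
for illustration only; the label names and the `run_with_jsr' function
are hypothetical and have nothing to do with the GCJ sources.

```c
/* Labels inside one method body; L_SUB is the shared subroutine
 * (for example, the body of a `finally' clause). */
enum label { L_ENTRY, L_AFTER_FIRST, L_AFTER_SECOND, L_SUB, L_EXIT };

int run_with_jsr(void)
{
    enum label pc = L_ENTRY;        /* current position in the method */
    enum label ret_addr = L_EXIT;   /* return address saved by jsr */
    int counter = 0;                /* a local shared with the subroutine */

    for (;;) {
        switch (pc) {
        case L_ENTRY:
            ret_addr = L_AFTER_FIRST;   /* jsr: save the return label... */
            pc = L_SUB;                 /* ...and jump, no new frame */
            break;
        case L_AFTER_FIRST:
            counter += 10;
            ret_addr = L_AFTER_SECOND;  /* second jsr to the same code */
            pc = L_SUB;
            break;
        case L_AFTER_SECOND:
            counter += 100;
            pc = L_EXIT;
            break;
        case L_SUB:
            counter += 1;               /* touches the caller's locals */
            pc = ret_addr;              /* ret: an indirect jump */
            break;
        case L_EXIT:
            return counter;
        }
    }
}
```

The subroutine runs twice and returns to a different label each time,
which is why flow analysis needs to know every label the indirect jump
can reach.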
4) 64-bit code
GCJ generates pretty bad code for many 64-bit operations.
Unfortunately, this isn't easy to fix in one place (at least I couldn't
find a way) and requires a lot of hacking on each .md file to add
appropriate RTL patterns that optimize the cases you care about. As an
example, try the following code that Java uses a lot to merge two
32-bit integers.
long merge(int hi, int lo) {
    return ((long)hi << 32) | ((long)lo & 0xffffffffL);
}
On 32-bit architectures, GCJ will generate all the sign extension,
shifting, and OR-ing operations instead of the desired move of two
registers ;-( Oh well. I fixed it on the RISC architectures by
adding a lot of RTL patterns and some peepholes, but they're very
specific to this particular operation. It would be nice if GCC handled
64-bit values a bit more gracefully in general.
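
For comparison, here is the same merge written as a C sketch. It is a
stand-in for the Java code, not something taken from GCJ: the cast
through `uint32_t' does the job of the mask, and the helper names
`high_word' and `low_word' are made up for the example. It shows that
the result is nothing more than the two inputs placed side by side,
which is why a pair of register moves is all a 32-bit target should
need.

```c
#include <stdint.h>

/* Merge two 32-bit halves into one 64-bit value. Going through
 * uint32_t zero-extends each half, so a negative `lo' cannot smear
 * sign bits into the upper word. The unsigned-to-signed conversions
 * assume the usual two's complement behaviour of GCC and Clang. */
int64_t merge(int32_t hi, int32_t lo)
{
    uint64_t high = (uint64_t)(uint32_t)hi << 32;
    uint64_t low = (uint64_t)(uint32_t)lo;
    return (int64_t)(high | low);
}

/* Upper 32 bits of a 64-bit value (hypothetical helper). */
int32_t high_word(int64_t v)
{
    return (int32_t)(uint32_t)((uint64_t)v >> 32);
}

/* Lower 32 bits of a 64-bit value (hypothetical helper). */
int32_t low_word(int64_t v)
{
    return (int32_t)(uint32_t)(uint64_t)v;
}
```

Splitting the merged value with `high_word' and `low_word' gives back
the original `hi' and `lo' unchanged.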
5) Size of compiled metadata
GCJ generates a very convenient metadata structure that requires
very little run-time loading. Here's a sample declaration of the
JvClass object that I gleaned from the GCJ sources.
class JvClass : public JvObject // Declaration of a single class, (__CL_name)
{
  friend class java::lang::Class;
  jref unused_0;
  jutf8 name;              // UTF class name
  jshort accflags;         // Access rights
  jclass superclass;
  jclass arrayclass;       // Class for arrays of these objects
  jref unused_1;
  JvConstants constants;   // Strings and other constants
  JvMethods methods;       // Methods
  JvFields fields;         // Fields
  jdtable dtable;          // Dispatch table for objects of this class
  jclass *interfaces;      // Vector of interfaces implemented by this object
  java::lang::ClassLoader *loader;
  jshort interface_len;    // Number of interfaces in `interfaces' vector
  jbyte state;
  jbyte final;
};
Every JvClass then references a vector of constants, a vector of methods,
a vector of fields, etc. This JvClass structure can be directly used
by the run-time environment with minimal loading. In a conventional PC
or workstation environment in which the compiled data segment lives on
inexpensive disk storage and gets loaded directly into RAM, this structure
makes a lot of sense.
Unfortunately, embedded environments typically use much more precious
flash memory for non-volatile store. Since the JvClass structure must
be writable and the internal pointers relocated to live in RAM, all the
metadata must be moved to RAM and fixup records generated. After building
an entire Java run-time environment, I've discovered that the Java metadata
and all its associated fixups consume about 40% of the total binary size.
In an embedded environment, this metadata exists twice: once in flash,
once again in RAM.
My proposal for embedded environments is to serialize the metadata
into a much more compact format which gets loaded at run-time into
much the same format that the compiler currently generates. This
format actually has two advantages:
a) Significantly decreasing the flash requirements for embedded systems
b) Decoupling the compiler's generated metadata format from the
run-time format. Currently, changes to the run-time format
require changes to the compiler; similarly, changes to the
compile time format require modifications to the run-time.
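
One way such a serialized format could look is sketched below in C. The
byte layout, the `loaded_class' structure and the `load_class' function
are all assumptions invented for this illustration; nothing here is an
existing GCJ format. The point is only that the flash image contains
lengths and bytes but no pointers, so it needs no fixup records, while
the pointer-bearing structure is built in RAM at load time.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_METHODS 8   /* arbitrary limit for this sketch */

/* RAM-side view of one method; `name' points into the flash image. */
struct loaded_method {
    const uint8_t *name;
    uint8_t name_len;
};

/* RAM-side view of one class, built by load_class(). */
struct loaded_class {
    const uint8_t *name;
    uint8_t name_len;
    uint16_t accflags;
    uint8_t method_count;
    struct loaded_method methods[MAX_METHODS];
};

/*
 * Hypothetical image layout (no pointers, so no relocations):
 *   u8 name_len, name bytes, u16 accflags (little endian),
 *   u8 method_count, then per method: u8 name_len, name bytes.
 * Returns the number of bytes consumed, or 0 if the image is malformed.
 */
size_t load_class(const uint8_t *image, size_t len, struct loaded_class *out)
{
    size_t pos = 0;

    if (len < 1)
        return 0;
    out->name_len = image[pos++];
    if (len - pos < out->name_len)
        return 0;
    out->name = image + pos;            /* keep the name in flash */
    pos += out->name_len;

    if (len - pos < 3)
        return 0;
    out->accflags = (uint16_t)(image[pos] | (image[pos + 1] << 8));
    out->method_count = image[pos + 2];
    pos += 3;
    if (out->method_count > MAX_METHODS)
        return 0;

    for (uint8_t i = 0; i < out->method_count; i++) {
        if (len - pos < 1)
            return 0;
        uint8_t n = image[pos++];
        if (len - pos < n)
            return 0;
        out->methods[i].name = image + pos;
        out->methods[i].name_len = n;
        pos += n;
    }
    return pos;
}
```

Because the loader is the only code that knows the image layout, the
compiler's output and the run-time structure can change independently,
which is advantage (b) above.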
6) Interface calling mechanism
Currently, the GCJ compiler generates the following run-time call to
look up the native code for a given Java interface method.
jnative
_Jv_LookupInterfaceMethod(jclass cl, jutf8 methodName, jutf8 methodSignature)
{
}
This calling method is a relatively straightforward translation from the
JVM `invokeinterface' instruction. Unfortunately, this calling convention
for interfaces is relatively inefficient, since a method name and signature
must be found at run-time via class introspection. Since Java uses interfaces
as its preferred method calling mechanism, it is vital that interface
invocation be FAST.
Instead, I propose that GCJ lay out vtables for each interface and, at
compile time, determine the interface and vtable index being invoked.
Thus, the above run-time interface would change to:
jnative
_Jv_LookupInterfaceMethod(jclass cl, jclass xface, int vtableIndex)
{
}
Finding an interface would then be merely matching the `xface' against
the list of interfaces for the given class, and returning the appropriate
method from its vtable.
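
A minimal C sketch of that lookup follows. It only illustrates the
proposal and is not GCJ code: `struct jv_class', `struct
jv_interface_entry', `lookup_interface_method' and the Circle/Shape
sample data are hypothetical, and pairing each interface with its own
per-class vtable is one assumed way to store the tables.

```c
#include <stddef.h>

typedef void (*jnative)(void);

struct jv_class;

/* One implemented interface plus this class's method table for it. */
struct jv_interface_entry {
    const struct jv_class *xface;
    const jnative *vtable;
    int vtable_len;
};

/* Just enough of a class record for the lookup (hypothetical layout). */
struct jv_class {
    const char *name;
    const struct jv_interface_entry *interfaces;
    int interface_len;
};

/* Match `xface' against the class's interface list and index its
 * vtable. Returns NULL when the interface is not implemented or the
 * index is out of range. */
jnative lookup_interface_method(const struct jv_class *cl,
                                const struct jv_class *xface,
                                int vtable_index)
{
    for (int i = 0; i < cl->interface_len; i++) {
        const struct jv_interface_entry *e = &cl->interfaces[i];
        if (e->xface == xface) {
            if (vtable_index < 0 || vtable_index >= e->vtable_len)
                return NULL;
            return e->vtable[vtable_index];
        }
    }
    return NULL;
}

/* Sample data: class Circle implements interface Shape. */
static void circle_draw(void) {}
static void circle_area(void) {}

static const struct jv_class shape_iface = { "Shape", NULL, 0 };
static const struct jv_class comparable_iface = { "Comparable", NULL, 0 };

static const jnative circle_shape_vtable[] = { circle_draw, circle_area };
static const struct jv_interface_entry circle_interfaces[] = {
    { &shape_iface, circle_shape_vtable, 2 }
};
static const struct jv_class circle_class = {
    "Circle", circle_interfaces, 1
};
```

The cost per call is a pointer comparison for each interface the class
implements plus one indexed load, with no string matching on method
names or signatures.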
Is anybody currently considering or working on such an implementation???
If not, this is my next project for GCJ.
--
Jon Olson, Modular Mining Systems
3289 E. Hemisphere Loop
Tucson, AZ 85706
INTERNET: olson@mmsi.com
PHONE: (520)746-9127
FAX: (520)889-5790