Experiences using GCJ as an embedded compiler

Mon Mar 29 08:59:00 GMT 1999

For about the past 4 months I've been working with, on, and underneath
the hot new GCJ frontend for compiling Java class files to native code.
Before I enumerate its current weaknesses, let me thank the people at
Cygnus for this wonderful tool. I've used this compiler to build an
entire Java run-time environment and, even though I'm still using the
original 980906 alpha version, have found it to produce quite reliable
and efficient code.
For an embedded environment, GCJ currently has numerous deficiencies.
I've overcome most of these with quite nice implementations and hope
that sometime in the future these changes can be integrated into the
released version of GCJ.
 1) Lightweight exceptions
 I previously wrote a lengthy email on the current implementation
 of exceptions in `gcc' and why it is not usable in an embedded
 environment. I developed a very lightweight exception implementation
 that drastically reduces the overhead of exception support.
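 The implementation itself isn't described here. As one hedged illustration of what "lightweight" can mean, the sketch below uses a setjmp/longjmp handler chain. This is not necessarily the approach described above. A try block pushes a small record, and a throw jumps straight to the innermost record, with no unwind tables. All names (jv_handler, jv_throw, try_div) are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical handler record pushed by each try block. A real run-time
   would keep one chain per thread; a single global chain suffices here. */
struct jv_handler {
    jmp_buf env;
    struct jv_handler *prev;
};

static struct jv_handler *jv_handler_top = NULL;
static void *jv_current_exception = NULL;   /* object being thrown */

static void jv_throw(void *exc)
{
    struct jv_handler *h = jv_handler_top;
    if (h == NULL)
        abort();                    /* uncaught exception */
    jv_handler_top = h->prev;       /* pop the handler before jumping */
    jv_current_exception = exc;
    longjmp(h->env, 1);
}

static int checked_div(int a, int b)
{
    if (b == 0)
        jv_throw("ArithmeticException");
    return a / b;
}

/* Equivalent of: try { return checked_div(a, b); } catch (...) { return -1; } */
static int try_div(int a, int b)
{
    struct jv_handler h;

    h.prev = jv_handler_top;
    jv_handler_top = &h;
    if (setjmp(h.env) == 0) {
        int q = checked_div(a, b);
        assert(jv_handler_top == &h);
        jv_handler_top = h.prev;    /* normal exit: pop our handler */
        return q;
    }
    return -1;                      /* reached via longjmp from jv_throw */
}
```

 The non-throwing cost is one setjmp per try block. The tradeoff is that, per the C standard, non-volatile locals modified inside the try block are indeterminate after a throw.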
 2) Write barriers
 Embedded environments typically have real-time constraints to which
 they must respond. Without write barriers, all threads that modify
 the collected heap must be stopped until collection completes. The
 only threads that can run at a higher priority than garbage collection
 are those that modify only simple, pointer-free data structures.
 Fortunately, write barriers were relatively simple to add to GCC.
 The following code, added to `gcc/expr.c', invokes a new `write_barrier'
 RTL pattern whenever it writes a pointer to memory.
 #if defined (HAVE_write_barrier)
   /*
    * Generate write barrier code if we're storing into memory a value
    * that is not a constant and not a vtable address, and whose type
    * contains pointers.
    */
   else if (flag_write_barriers
            && GET_CODE (target) == MEM
            && GET_MODE (target) == Pmode
 #ifdef TARGET_NEEDS_WRITE_BARRIER
            && TARGET_NEEDS_WRITE_BARRIER (target)
 #endif
            && !(TREE_CODE (exp) == ADDR_EXPR
                 && DECL_VIRTUAL_P (TREE_OPERAND (exp, 0)))
            && !really_constant_p (exp)
            && contains_pointers_p (TREE_TYPE (exp)))
     {
       temp = force_reg (Pmode, temp);
       emit_insn (gen_write_barrier (target, temp));
     }
 #endif
 Here's a sample `write_barrier' pattern for the AMD A29K, a
 processor that supports write barriers in a particularly elegant
 way. The write barrier register, gr95, normally contains
 0xffffffff. The assert instruction, asleu, asserts that the
 pointer being written to memory is less than or equal to the
 write barrier register. When GC is inactive, this is always
 true, so the write barrier costs only one instruction, which
 acts as a NOP. When GC begins collecting, the write barrier
 register is set to the start of the heap. During collection,
 only stores of pointers above the start of the heap trap to
 the write barrier code, which then records the object as
 a live object.
 ;; Garbage collector write barrier
 (define_insn "write_barrier"
   [(parallel [
      (set (match_operand:SI 0 "memory_operand" "=m")
           (match_operand:SI 1 "register_operand" "r"))
      (unspec_volatile [(match_dup 1)] 16)])]
   ""
   "store 0,0,%1,%0\;asleu V_write_barrier,%1,gr95")
 I have similar implementations for StrongARM and Hitachi Super-H,
 although they require more instructions.
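 Here is a hedged C simulation of the A29K scheme, not the actual trap handler. It uses a barrier value that is normally all ones and is lowered to the heap start during collection, so each pointer store costs one unsigned comparison. The names (barrier, gc_begin, store_ptr, grey) are hypothetical. On the A29K the check is done in hardware by asleu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated barrier register (gr95 on the A29K). All ones while the
   collector is idle, so no pointer value can exceed it. */
static uintptr_t barrier = UINTPTR_MAX;

/* Objects recorded as live by the barrier during a collection. */
#define MAX_GREY 64
static void *grey[MAX_GREY];
static size_t grey_count = 0;

static void gc_begin(uintptr_t heap_start) { barrier = heap_start; }

static void gc_end(void)
{
    barrier = UINTPTR_MAX;
    grey_count = 0;
}

/* Stand-in for the trap handler: remember the stored object as live. */
static void barrier_trap(void *obj)
{
    assert(grey_count < MAX_GREY);
    grey[grey_count++] = obj;
}

/* The store plus the asleu-style check: "trap" only if the pointer
   being stored lies above the barrier value. */
static void store_ptr(void **slot, void *value)
{
    *slot = value;
    if ((uintptr_t)value > barrier)
        barrier_trap(value);
}
```

 Outside a collection the comparison never succeeds, which mirrors the one-instruction NOP case described above.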
 3) JVM `jsr' instructions
 The current GCJ compiler generates pretty bad code for Java `jsr'
 instructions. This is because a `jsr' isn't really the same as
 an RTL `call': the `jsr' shares, and clobbers, the stack frame
 of its caller. Thus, the compiler must treat the `jsr' as a simple
 jump that saves its return address, with the subroutine returning
 via an `indirect_jump'.
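 GNU C's labels-as-values extension (`&&label' and `goto *ptr') can mimic these jsr/ret semantics. This is an illustration only, not GCJ output. The `finally' subroutine shares its caller's frame, the `jsr' saves a return label and jumps, and the `ret' is an indirect jump. The function and variable names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A method whose `finally' clause is compiled as a JVM-style
   subroutine and invoked twice. Requires GNU C labels-as-values. */
static int run_with_finally(int x, int *finally_count)
{
    void *ret_addr;             /* plays the role of the returnAddress local */
    int result;

    assert(finally_count != NULL);
    result = x * 2;

    ret_addr = &&after_first;   /* jsr: save the return label ... */
    goto finally_sub;           /* ... and jump to the subroutine */
after_first:
    result += 1;

    ret_addr = &&after_second;
    goto finally_sub;
after_second:
    return result;

finally_sub:                    /* runs in the caller's frame */
    (*finally_count)++;
    goto *ret_addr;             /* ret: an indirect jump */
}
```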
 Instead of using a native `call' instruction, GCJ currently computes
 the return label and uses a `jump' instruction. This, however,
 was relatively easy to fix by modifying `build_java_jsr' to use
 a new RTL pattern named `subroutine_jump'. The only trick is that you
 must tell `flow.c' that this instruction computes the address of its
 return label, or else you get invalid code. Here's the modified code
 from `build_java_jsr'.
 /*
 * Java jsr's aren't the same as a normal `call_insn', since
 * they call a local label which doesn't save any registers.
 * The compiler must know this instruction as a `jump_insn'
 * which includes REG_NOTES that declare the return label
 * as being a label which can be jumped to by an indirect_jump.
 */
 rtx return_label = gen_label_rtx ();
 rtx insn = emit_jump_insn (gen_subroutine_jump (DECL_RTL (where),
                                                 return_label));
 REG_NOTES (insn) = gen_rtx_EXPR_LIST (REG_LABEL, return_label,
                                       REG_NOTES (insn));
 emit_label (return_label);
 I also added code to `decl.c' for RISC machines that recognizes
 cases where a return address is popped from the top of the stack
 and optimizes them to use the machine's return address register.
 4) 64-bit code
 GCJ generates pretty bad code for many 64-bit operations.
 Unfortunately, this isn't easy to fix in one place (at least I couldn't
 find a way); it takes a lot of hacking on each .md file to add RTL
 patterns that optimize the operations you care about. As an example, try
 the following code, which Java uses a lot to merge two 32-bit integers:
 
 long merge(int hi, int lo) {
     // 0xffffffffL, not 0xffffffff: the int literal is -1 and would not mask
     return ((long)hi << 32) | ((long)lo & 0xffffffffL);
 }
 On 32-bit architectures, GCJ will generate all the sign extension,
 shifting, and OR operations instead of the desired move of two
 registers ;-( Oh well. I fixed it on the RISC architectures by
 adding a lot of RTL patterns and some peepholes, but they're very
 specific to this particular operation. It would be nice if GCC handled
 64-bit values a bit more gracefully in general.
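 For reference, here is a hedged C sketch of what the Java merge() computes and why a 32-bit target needs no real arithmetic for it. The names (merge_bits, reg_pair, split) are hypothetical. The high word is exactly `hi' and the low word is exactly `lo', so the ideal code is just two register moves.

```c
#include <assert.h>
#include <stdint.h>

/* Bit pattern of the Java merge(hi, lo): `hi' in the upper 32 bits,
   `lo' zero-extended into the lower 32 bits. Unsigned arithmetic
   avoids C's undefined behavior when left-shifting negative values. */
static uint64_t merge_bits(int32_t hi, int32_t lo)
{
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

/* On a 32-bit target a 64-bit value is simply a register pair. */
struct reg_pair {
    uint32_t high;
    uint32_t low;
};

static struct reg_pair split(uint64_t v)
{
    struct reg_pair p = { (uint32_t)(v >> 32), (uint32_t)v };
    return p;
}
```

 Splitting the merged value always yields the original `hi' and `lo' bit patterns unchanged, which is why the sign extension and masking can be dropped entirely.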
 5) Size of compiled metadata
 GCJ generates a very convenient metadata structure that requires
 very little run-time loading. Here's a sample declaration of the
 JvClass object that I gleaned from the GCJ sources.
 class JvClass : public JvObject   // Declaration of a single class (__CL_name)
 {
   friend class java::lang::Class;
   jref unused_0;
   jutf8 name;                     // UTF class name
   jshort accflags;                // Access rights
   jclass superclass;
   jclass arrayclass;              // Class for arrays of these objects
   jref unused_1;
   JvConstants constants;          // Strings and other constants
   JvMethods methods;              // Methods
   JvFields fields;                // Fields
   jdtable dtable;                 // Dispatch table for objects of this class
   jclass *interfaces;             // Vector of interfaces implemented by this object
   java::lang::ClassLoader *loader;
   jshort interface_len;           // Number of interfaces in `interfaces' vector
   jbyte state;
   jbyte final;
 };
 Every JvClass then references a vector of constants, a vector of methods,
 a vector of fields, etc. This JvClass structure can be used directly
 by the run-time environment with minimal loading. In a conventional PC
 or workstation environment, where the compiled data segment lives on
 inexpensive disk storage and gets loaded directly into RAM, this structure
 makes a lot of sense.
 Unfortunately, embedded environments typically use much more precious
 flash memory for non-volatile storage. Since the JvClass structure must
 be writable and its internal pointers relocated to point into RAM, all the
 metadata must be copied to RAM and fixup records generated. After building
 an entire Java run-time environment, I've discovered that the Java metadata
 and its associated fixups consume about 40% of the total binary size.
 In an embedded environment, this metadata exists twice: once in flash,
 and once again in RAM.
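 The sketch below shows the RAM copy and fixup step described above. It assumes a hypothetical fixup record that gives the offset of each pointer slot, and of that slot's target, within the metadata image. It shows why the image exists twice (the flash original plus the RAM copy) and why the fixup table adds further overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixup record: byte offset of a pointer-sized slot in the
   metadata image, and the offset of the object that slot refers to. */
struct fixup {
    size_t slot;
    size_t target;
};

/* Copy the read-only metadata image from flash into RAM, then patch
   every internal pointer so that it refers into the RAM copy. */
static void relocate(unsigned char *ram, const unsigned char *flash,
                     size_t size, const struct fixup *fx, size_t nfix)
{
    memcpy(ram, flash, size);
    for (size_t i = 0; i < nfix; i++) {
        void *p = ram + fx[i].target;
        assert(fx[i].slot + sizeof p <= size && fx[i].target < size);
        memcpy(ram + fx[i].slot, &p, sizeof p);  /* avoids unaligned stores */
    }
}
```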
 My proposal for embedded environments is to serialize the metadata
 into a much more compact format, which gets loaded at run-time into
 much the same format that the compiler currently generates. This
 approach has two advantages:
 a) It significantly decreases the flash requirements for embedded systems.
 b) It decouples the compiler's generated metadata format from the
    run-time format. Currently, changes to the run-time format
    require changes to the compiler; similarly, changes to the
    compile-time format require modifications to the run-time.
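 Here is one possible shape for the compact format, sketched in C. The record layout, the LEB128-style varint encoding and the ram_class structure are all hypothetical, and far simpler than JvClass. Small integers take a single byte in flash, and the loader expands them into the writable RAM structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical RAM-side class record, much smaller than JvClass. */
struct ram_class {
    char     *name;
    uint16_t  accflags;
    uint32_t  method_count;
};

/* Decode an unsigned LEB128 varint: 7 data bits per byte, and the
   high bit set means another byte follows. */
static uint32_t read_varint(const uint8_t **pp)
{
    uint32_t v = 0;
    int shift = 0;
    uint8_t b;
    do {
        b = *(*pp)++;
        v |= (uint32_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    return v;
}

/* Hypothetical flash layout: varint name length, name bytes (no NUL),
   varint access flags, varint method count. Returns the next record. */
static const uint8_t *load_class(const uint8_t *p, struct ram_class *out)
{
    uint32_t len = read_varint(&p);
    out->name = malloc(len + 1);
    assert(out->name != NULL);
    memcpy(out->name, p, len);
    out->name[len] = '\0';
    p += len;
    out->accflags = (uint16_t)read_varint(&p);
    out->method_count = read_varint(&p);
    return p;
}
```

 The decoder does no bounds checking. A real loader would validate the stream length.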
 6) Interface calling mechanism
 Currently, the GCJ compiler generates the following run-time call to
 look up the native code for a given Java interface method:
 jnative
 _Jv_LookupInterfaceMethod(jclass cl, jutf8 methodName, jutf8 methodSignature);
 This calling method is a relatively straightforward translation of the
 JVM `invokeinterface' instruction. Unfortunately, this calling convention
 for interfaces is relatively inefficient, since the method must be found
 by name and signature at run-time via class introspection. Since Java uses
 interfaces as its preferred method calling mechanism, it is vital that
 interface invocation be FAST.
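 The sketch below illustrates the cost with a name-and-signature lookup in the spirit of the current call. The structures are hypothetical simplifications, not GCJ's run-time types. A single call may perform many string comparisons across the class hierarchy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified run-time method and class records. */
struct method {
    const char *name;
    const char *signature;
    void *code;                 /* native entry point */
};

struct klass {
    const struct klass *super;
    const struct method *methods;
    size_t method_count;
};

/* Walk the class and its superclasses comparing name and signature
   strings until a match is found; NULL if no such method exists. */
static void *lookup_by_name(const struct klass *cl,
                            const char *name, const char *sig)
{
    assert(name != NULL && sig != NULL);
    for (; cl != NULL; cl = cl->super)
        for (size_t i = 0; i < cl->method_count; i++)
            if (strcmp(cl->methods[i].name, name) == 0
                && strcmp(cl->methods[i].signature, sig) == 0)
                return cl->methods[i].code;
    return NULL;
}
```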
 Instead, I propose that GCJ lay out vtables for each interface and, at
 compile time, determine the interface and vtable index being invoked.
 Thus, the above run-time interface would change to:
 jnative
 _Jv_LookupInterfaceMethod(jclass cl, jclass xface, int vtableIndex);
 Finding the method would then merely require matching `xface' against
 the list of interfaces implemented by the given class and returning the
 appropriate method from that interface's vtable.
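 Here is a hedged sketch of the proposed lookup, using hypothetical structures in place of GCJ's. Each class carries one entry per implemented interface, pairing the interface with a vtable laid out in the interface's method order. The search compares pointers only, and fetching the method is then a single array index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface descriptor. */
struct iface {
    const char *name;
    int method_count;
};

/* One entry per implemented interface: the interface plus this class's
   vtable for it, laid out in the interface's method order. */
struct iface_entry {
    const struct iface *xface;
    void *const *vtable;        /* native entry points */
};

struct klass {
    const struct iface_entry *interfaces;
    size_t interface_len;
};

/* Proposed lookup: match xface by pointer, then index its vtable.
   Returns NULL if the class does not implement xface. */
static void *lookup_by_index(const struct klass *cl,
                             const struct iface *xface, int vtable_index)
{
    for (size_t i = 0; i < cl->interface_len; i++)
        if (cl->interfaces[i].xface == xface) {
            assert(vtable_index >= 0 && vtable_index < xface->method_count);
            return cl->interfaces[i].vtable[vtable_index];
        }
    return NULL;
}
```

 For a class that implements only a few interfaces, the scan is short and involves no string comparisons at all.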
 Is anybody currently considering or working on such an implementation???
 If not, this is my next project for GCJ.
-- 
Jon Olson, Modular Mining Systems
	 3289 E. Hemisphere Loop
	 Tucson, AZ 85706
INTERNET: olson@mmsi.com
PHONE: (520)746-9127
FAX: (520)889-5790