Exception handling and embedded `gcj'

Wed Feb 3 11:33:00 GMT 1999

I'm including the `netwinder' group on this email because
exception support on the StrongARM is currently very broken.
Hopefully, this email will stimulate some conversation regarding
this topic.

Now that I've got `gcj' happily generating code for both our A29K and
StrongARM embedded targets, I thought that I'd share one of my biggest
problems.

In C++, exception handling is useful but not necessary. Most
applications don't use it, and in an embedded environment you almost
always turn it off. Even when compiling C++ apps on a workstation, I
normally turn off exceptions because of the code bloat they cause.
Note that C++ compilers, including g++, generate a lot of exception
code even if you never throw or catch an exception! This is because
C++ is required to call destructors even if some called method or
function throws an exception. Thus, each and every destructor must
get wrapped into the equivalent of a `finally' clause just in case one
of your methods throws an exception.
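
To make the destructor point concrete, here is a minimal sketch. The names
(`Resource', `mayThrow', `caller', `run', `events') are invented for the
illustration. `caller' contains no try or catch of its own, yet the compiler
still has to arrange for ~Resource() to run when `mayThrow' throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical event log so the order of operations can be inspected.
static std::vector<std::string> events;

struct Resource {
    ~Resource() { events.push_back("~Resource"); }
};

// May throw; stands in for any called method or function.
void mayThrow(bool fail) {
    if (fail) throw std::runtime_error("boom");
}

// No try/catch here, but the compiler must still emit cleanup code so
// that ~Resource() runs if mayThrow() throws: an implicit `finally'.
void caller(bool fail) {
    Resource r;
    mayThrow(fail);
    events.push_back("caller finished");
}

// Runs caller() and returns the recorded events.
std::vector<std::string> run(bool fail) {
    events.clear();
    try {
        caller(fail);
    } catch (const std::runtime_error &) {
        events.push_back("caught");
    }
    return events;
}
```

Calling run(true) records the destructor before the catch clause runs, which
is exactly the hidden cleanup region described above.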

In Java, on the other hand, exception handling is required. All Java
code uses exceptions, most methods throw one or more exceptions, and
the API is designed to throw exceptions rather than return error
codes.

In g++, exceptions are implemented in one of two ways. In my opinion,
neither of these mechanisms is appropriate for embedded targets.

1) Range tables
 This method has the advantage that it causes zero run-time overhead
 if exceptions are not thrown. Throwing an exception causes
 considerable run-time cost, but the idea is that you don't often
 throw exceptions.

 In this method, the compiler emits range tables, each entry
 containing the starting address, ending address, entry code label,
 and run-time type. Associated with each of these range tables is an
 exception matching function which returns TRUE if the currently
 thrown exception matches the run-time type. Range tables
 closely mirror the structure of Java class files, which also
 use an exception range table to represent their exceptions.

 The problem with this strategy is that it requires the run-time
 environment be able to unwind the hardware call stack to determine
 the PC within each frame and ultimately reload the registers
 in the context of the catching method. On many, if not most,
 architectures this requires debugging information be contained
 within the binary that specifies which registers were saved
 at what point in the stack frame. The currently supported
 unwind methods for X86 and SPARC use DWARF-2 unwind information,
 which is essentially a set of opcodes for a stack unwinding state
 machine. Most embedded GCC targets have no DWARF-2 unwinding
 support; from what I can tell, unwinding information is currently
 only available for X86 and SPARC.

 In a typical C++ program containing lots of methods, this unwind
 information can add 50% or more to the size of the resulting
 binary. All code must be compiled with exceptions enabled, or else
 the entire call stack cannot be unwound and exceptions will break.
 If a library that was not compiled with the proper unwind
 information makes callbacks to methods that throw exceptions,
 you lose! In a system where disk space is cheap and the unwind
 information seldom gets paged in, the range table is a good
 approach since it is merely a data table on disk; in an embedded
 system where ROM or FLASH is expensive, the range table approach
 is too costly.
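
As a rough sketch of the lookup side of the range table approach (not the
actual g++ table format), the search for a handler in one frame might look
like the following. The `RangeEntry' layout, the integer stand-in for the
run-time type, the exclusive end address, and the names `matches' and
`findHandler' are all assumptions made for the illustration. The hard part,
recovering `pc' for each frame by unwinding, is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout of one range table entry, following the four
// fields listed above. Real g++ tables differ in detail.
struct RangeEntry {
    std::uintptr_t start;    // first PC covered by the try block
    std::uintptr_t end;      // one past the last PC covered (assumed)
    std::uintptr_t handler;  // entry code label of the handler
    int type;                // stand-in for the run-time type
};

// Stand-in for the exception matching function; type 0 is assumed
// to mean "catch anything".
bool matches(int thrown, int caught) {
    return caught == 0 || caught == thrown;
}

// Returns the handler address for `pc', or 0 if this frame has no
// matching handler and the unwinder must move on to the caller.
std::uintptr_t findHandler(const RangeEntry *table, std::size_t n,
                           std::uintptr_t pc, int thrown) {
    for (std::size_t i = 0; i < n; ++i) {
        if (pc >= table[i].start && pc < table[i].end &&
            matches(thrown, table[i].type))
            return table[i].handler;
    }
    return 0;
}
```

Nothing in this table is touched until an exception is thrown, which is
where the zero run-time overhead comes from; the cost is the table itself
plus the unwind information needed to obtain `pc' in each frame.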

2) setjmp/longjmp
 This method has the advantage that it can be supported with only
 the trusty setjmp/longjmp functions, which are already widely
 available. Code which does not catch exceptions incurs no
 performance or code size penalty and need not be compiled with
 exceptions enabled.

 At run-time, each try block pushes an exception frame onto a singly
 linked list of exception frames. Each exception frame contains
 basically just a jmp_buf and a link to the next exception frame.
 Thus, for a simple try block, the compiler emits the equivalent of
 the following:
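
     ExceptionFrame frame;
     int res;

     /* Push this frame onto the exception stack. */
     frame.next = exceptionStack;
     exceptionStack = &frame;
     if( (res = setjmp(frame.jmp_buf)) == 0 )
     {
         doTryStuff();
         exceptionStack = frame.next;    /* normal exit: pop the frame */
     }
     else
     {
         /* Reached via longjmp(); res carries the thrown exception. */
         Exception *e = (Exception *)res;
         exceptionStack = frame.next;    /* pop before handling or rethrowing */
         if( matches(e, type) )
             doExceptionStuff();
         else
             rethrow(e);
     }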

 This strategy has many disadvantages:
 a) Code which catches exceptions incurs incredible code bloat.
 All the exception stack manipulation, type matching, setjmp
 calling, and result checking gets performed inline. The
 resulting code is usually 200% larger than the same code
 compiled without exceptions.
 b) The setjmp/longjmp support in GCC is buggy on many targets.
 On StrongARMs, C++ code can currently only be built with
 exceptions disabled. Enabling exceptions causes core dumps
 during inline stack manipulation. It is hard to debug when
 `GDB' tells you that your core dump is on a line containing just
 a `}' bracket.
 c) setjmp() is unnecessarily slow and inefficient for supporting
 an exception handling mechanism. Many implementations of
 setjmp/longjmp save and restore the Unix signal mask. Even
 on architectures which have a fast version, setjmp() and
 longjmp() typically save the entire register set, causing
 large ExceptionFrames on the stack and slower execution.
 An exception need not save the entire register set, since
 GCC can easily be configured to know what registers are
 clobbered on entry to an exception handler label. The stack
 size bloat gets particularly severe with exceptions, since
 GCC allocates a separate ExceptionFrame for each try block
 encountered, even when they're not nested. A Java
 try/catch/finally block requires two ExceptionFrame objects
 on the stack.
 d) The `exceptionStack' list pointer must be stored in thread
 local storage for this implementation to be thread safe.
 Typically, GCC makes an additional function call just to
 get the address of where `exceptionStack' is stored.

 In summary, for Java code and C++ code that catches a lot of
 exceptions, the setjmp/longjmp mechanism creates far too much
 code bloat, and is too inefficient and buggy for an embedded
 implementation.
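
For reference, the setjmp/longjmp scheme can be written out as a small
program. This is only a sketch under assumptions: `tryCatch', `throwIt',
`inner', `outer' and `pending' are invented names, exception types are plain
integers, and the thrown exception is parked in a thread-local `pending'
pointer instead of being passed through setjmp()'s int return value, since a
pointer does not fit in an int on 64-bit hosts. No objects with destructors
are live across the jumps.

```cpp
#include <cassert>
#include <csetjmp>

struct Exception { int type; };

struct ExceptionFrame {
    ExceptionFrame *next;
    std::jmp_buf buf;
};

// Thread local, as required by disadvantage (d) above.
static thread_local ExceptionFrame *exceptionStack = nullptr;
static thread_local Exception *pending = nullptr;

// Jumps to the innermost try block; that block decides whether it matches.
[[noreturn]] void throwException(Exception *e) {
    assert(exceptionStack != nullptr);  // no try block active: give up
    pending = e;
    std::longjmp(exceptionStack->buf, 1);
}

// Runs body(arg) inside a try block that catches only `caughtType'.
// Returns body's result on normal completion, or caughtType if caught.
int tryCatch(int caughtType, int (*body)(Exception *), Exception *arg) {
    ExceptionFrame frame;
    frame.next = exceptionStack;        // push
    exceptionStack = &frame;
    if (setjmp(frame.buf) == 0) {
        int r = body(arg);
        exceptionStack = frame.next;    // pop on normal exit
        return r;
    }
    Exception *e = pending;
    exceptionStack = frame.next;        // pop before handling or rethrowing
    if (e->type == caughtType)
        return caughtType;
    throwException(e);                  // rethrow to the enclosing try block
}

int throwIt(Exception *e) { if (e) throwException(e); return 0; }
int inner(Exception *e) { return tryCatch(1, throwIt, e); }  // catches type 1
int outer(Exception *e) { return tryCatch(2, inner, e); }    // catches type 2
```

Every statement in tryCatch() corresponds to code that the compiler would
expand inline at each try block, which is where the bloat comes from.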

What I've implemented in `gcj' is a mechanism which is sort of a
hybrid between the range table approach and the setjmp approach.
Like the range table approach, it sets up exception tables which
specify code labels and types; like the setjmp approach, it explicitly
marks the beginning and end of an exception region by pushing
an ExceptionFrame onto a thread-local exception stack. Here are
the run-time declarations for this mechanism:

    class ExceptionHandler
    {
        friend class ExceptionFrame;
        ExceptionType *type;     // Class metadata identifying the caught type
        void (*handler)();       // Handler invoked for matched types
    };

    class ExceptionFrame
    {
        ExceptionFrame *next;
        ExceptionHandler *handlers;
        ExceptionJmpBuf jmpbuf;
    public:
        void push(ExceptionHandler *h);
        static ExceptionFrame *pop();
        static void throwException(ExceptionObject *);
        u_int &pc() { return jmpbuf[0].jmpbuf[JMP_BUF_PC]; }
        u_int &sp() { return jmpbuf[0].jmpbuf[JMP_BUF_SP]; }
    };

The ExceptionHandler objects contain the types and entries emitted by
the compiler for each try/catch block. The compiler emits these
entries in order, with the last entry containing a NULL `type' field.
If this last entry contains a `handler', then the handler points to
the start of the finally block for this try/catch sequence.
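
The table layout just described can be sketched as follows. The
`ExceptionType' contents (a single superclass link), the `isInstance' test,
the `Handler' typedef, and the sample classes and handlers are assumptions
for the illustration; only the NULL-terminated layout and the meaning of the
last entry come from the description above.

```cpp
#include <cassert>

// Hypothetical class metadata: just a link to the superclass.
struct ExceptionType { const ExceptionType *super; };

typedef void (*Handler)();

struct ExceptionHandler {
    const ExceptionType *type;  // NULL marks the last entry
    Handler handler;            // catch handler, or finally block in the last entry
};

// Assumed matching rule: the thrown type or any of its superclasses.
bool isInstance(const ExceptionType *thrown, const ExceptionType *caught) {
    for (; thrown; thrown = thrown->super)
        if (thrown == caught) return true;
    return false;
}

// Returns the first catch handler matching `thrown', or NULL.
Handler findCatch(const ExceptionHandler *h, const ExceptionType *thrown) {
    for (; h->type; ++h)
        if (isInstance(thrown, h->type)) return h->handler;
    return nullptr;
}

// Returns the finally block stored in the terminating entry, or NULL.
Handler findFinally(const ExceptionHandler *h) {
    while (h->type) ++h;
    return h->handler;
}

// Sample table for: try { ... } catch (IOException) { ... } finally { ... }
static const ExceptionType Throwable = { nullptr };
static const ExceptionType IOException = { &Throwable };
static const ExceptionType EOFException = { &IOException };
static void catchIO() {}
static void finallyBlock() {}
static const ExceptionHandler table[] = {
    { &IOException, catchIO },
    { nullptr, finallyBlock },
};
```

With this table, an EOFException finds `catchIO' through its superclass,
while a plain Throwable finds no catch clause and falls through to the
finally block.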

At the start of a try block, the compiler emits a call to
`ExceptionFrame::push()', passing it a stack allocated exception block;
at the end of a try block, the compiler emits a call to
`ExceptionFrame::pop()'. This requires no arguments, since it
implicitly pops the top of the exception stack. On a RISC
implementation, this typically requires 3 instructions for try entry
and 1 instruction for try exit. Obviously, there are additional
instructions executed by push() and pop(), but these are very short
methods and aren't expanded inline.

In both RISC implementations, I've only needed to save 3 registers
in ExceptionJmpBuf: PC, FP, and SP. Thus, the size of an ExceptionFrame
is 20 bytes: two pointers plus three saved registers, at 4 bytes each.
This is significantly smaller than would be required for a complete
`jmp_buf', since the compiler knows, via an `exception_receiver'
RTL pattern, that all registers get clobbered at entry to an exception
handler.
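
The 20-byte figure can be checked with a small model. This is a simplified
sketch: pointers and registers are modelled as 32-bit integers so that the
arithmetic holds on any host, and the names `ExceptionJmpBuf32',
`ExceptionFrame32' and `frameBytes' are invented for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit model of the saved registers: PC, FP, and SP only.
struct ExceptionJmpBuf32 {
    std::uint32_t pc, fp, sp;
};

// 32-bit model of the frame; the two pointers are shown as uint32_t.
struct ExceptionFrame32 {
    std::uint32_t next;        // ExceptionFrame *
    std::uint32_t handlers;    // ExceptionHandler *
    ExceptionJmpBuf32 jmpbuf;
};

static_assert(sizeof(ExceptionFrame32) == 20,
              "2 pointers + 3 registers, 4 bytes each");

// Frame size for a given number of pointers and saved registers.
constexpr unsigned frameBytes(unsigned pointers, unsigned savedRegs,
                              unsigned wordSize) {
    return (pointers + savedRegs) * wordSize;
}
```

Here frameBytes(2, 3, 4) gives the 20 bytes quoted above. For comparison, a
hypothetical buffer that saved ten registers would come to
frameBytes(2, 10, 4), or 48 bytes per try block.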

When throwing an exception, the compiler emits a call to
ExceptionFrame::throwException, passing it the exception being thrown.
Throwing an exception does not need to unwind the hardware call stack,
but only the exception stack. For each ExceptionFrame, it matches the
ExceptionObject's type against the handlers for that frame. If it
finds a matching type, it either advances `handlers' to point to
the finally block or, if no finally block exists, pops the frame, and
then executes the exception handler. Thus, no explicit `pop()' is
required in exception handlers. If it finds no matching type, but finds
a finally block, it pops the frame and executes the finally handler.
The finally handler need only rethrow the exception.
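
The throw logic can be approximated in portable C++. This is a simulation
under loud assumptions, not the gcj run-time: a jmp_buf stands in for the
three-register ExceptionJmpBuf, handler "labels" are small integers
dispatched by a switch, push() and pop() are free functions, and the rule
that a catch body pops a frame kept for its finally block is a guess. The
names `method', `throwIt', `inner', `outer', `both', `run' and `trace' are
invented.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstdlib>
#include <string>

struct ExceptionType { const ExceptionType *super; };  // stand-in metadata
struct ExceptionObject { const ExceptionType *type; };
struct ExceptionHandler {
    const ExceptionType *type;  // NULL marks the terminating entry
    int handler;                // stand-in for a code label; 0 means none
};
struct ExceptionFrame {
    ExceptionFrame *next;
    // volatile: throwException() may change it between setjmp and longjmp
    const ExceptionHandler *volatile handlers;
    std::jmp_buf jmpbuf;        // stand-in for the 3-register ExceptionJmpBuf
};

static thread_local ExceptionFrame *top = nullptr;       // exception stack
static thread_local ExceptionObject *current = nullptr;  // exception in flight
static std::string trace;                                // records what ran

void push(ExceptionFrame *f, const ExceptionHandler *h) {
    f->handlers = h; f->next = top; top = f;
}
ExceptionFrame *pop() { ExceptionFrame *f = top; top = f->next; return f; }

static bool isInstance(const ExceptionType *t, const ExceptionType *caught) {
    for (; t; t = t->super) if (t == caught) return true;
    return false;
}
static const ExceptionHandler *lastEntry(const ExceptionHandler *h) {
    while (h->type) ++h;
    return h;
}

// Unwinds the exception stack only; the call stack is never walked.
[[noreturn]] void throwException(ExceptionObject *e) {
    current = e;
    while (top) {
        ExceptionFrame *f = top;
        const ExceptionHandler *h = f->handlers;
        const ExceptionHandler *last = lastEntry(h);
        while (h->type && !isInstance(e->type, h->type)) ++h;
        if (h->type) {                              // matching catch clause
            if (last->handler) f->handlers = last;  // keep frame for finally
            else pop();
            std::longjmp(f->jmpbuf, h->handler);
        }
        pop();                                      // no match in this frame
        if (last->handler) std::longjmp(f->jmpbuf, last->handler);
    }
    std::abort();                                   // uncaught exception
}

enum { CATCH = 1, FINALLY = 2 };

// Simulates a method whose body is one try block described by `table'.
void method(const ExceptionHandler *table, void (*body)(ExceptionObject *),
            ExceptionObject *e) {
    ExceptionFrame frame;
    push(&frame, table);
    switch (setjmp(frame.jmpbuf)) {
    case 0:                                       // the try block
        body(e);
        pop();
        if (lastEntry(table)->handler) trace += 'f';  // finally, normal exit
        break;
    case CATCH:
        trace += 'c';
        if (top == &frame) { pop(); trace += 'f'; }   // assumed: pop, then finally
        break;
    case FINALLY:
        trace += 'f';
        throwException(current);                  // finally only rethrows
    }
}

static const ExceptionType Throwable = { nullptr };
static const ExceptionType IOException = { &Throwable };
static const ExceptionType EOFException = { &IOException };
static const ExceptionHandler catchOnly[] = { {&IOException, CATCH}, {nullptr, 0} };
static const ExceptionHandler finallyOnly[] = { {nullptr, FINALLY} };
static const ExceptionHandler catchFinally[] = { {&IOException, CATCH}, {nullptr, FINALLY} };

void throwIt(ExceptionObject *e) { if (e) throwException(e); }
void inner(ExceptionObject *e) { method(finallyOnly, throwIt, e); }
void outer(ExceptionObject *e) { method(catchOnly, inner, e); }
void both(ExceptionObject *e) { method(catchFinally, throwIt, e); }

std::string run(void (*m)(ExceptionObject *), ExceptionObject *e) {
    trace.clear();
    m(e);
    return trace;
}
```

Throwing an EOFException through `outer' runs the inner finally block and
then the outer catch clause; through `both', the catch clause runs first and
the frame is kept so that the finally block still runs afterwards.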

This exception handling mechanism has many advantages over the
existing ones used both in g++ and gcj.
 1) It has zero overhead both in code size and run-time performance
 if exceptions are not used.
 2) It has small code size overhead (approximately 4 instructions)
 when exceptions are caught.
 3) It has modest stack space requirements compared with using
 setjmp/longjmp.
 4) It provides a single location where exception stacks are
 manipulated. This allows us to perform sanity checks on the
 exception stack without inlining it everywhere.
 5) It does not require the hardware dependent stack unwinding
 which is costly or impossible with optimized embedded code.
 6) Although slightly slower than range tables (we can't beat zero
 run-time overhead), it's significantly faster than using
 setjmp/longjmp because it saves fewer registers.

I have it currently working quite nicely for Java exceptions. C++
exceptions are a bit uglier, since C++ allows throwing objects instead
of just references to objects.

Comments???

-- 
Jon Olson, Modular Mining Systems
	 3289 E. Hemisphere Loop
	 Tucson, AZ 85706
INTERNET: olson@mmsi.com
PHONE: (520)746-9127
FAX: (520)889-5790