thin locks (was Re: libgcj/117:)

Fri Dec 10 11:57:00 GMT 1999

>> 3) Ideally negative overhead relative to pthreads. Or enough hooks to add
> some flavor of platform-specific thin locks later. This stuff seems to be a
> lot more performance critical for Java than it is for most other clients,
> and is probably more performance critical for gcj than Mozilla. That may
> eliminate them both (again?).
>
Apropos thin locks.
Has anybody developed or picked a design for those yet?

I was really disappointed to learn that Kaffe's thin locks won't work with
gcj's current _Jv_MonitorEnter/_Jv_MonitorExit. They don't work because Kaffe's
symmetry assumption demands that a lock be unlocked with a stack pointer value
that is equal to or greater than (on a downward-growing stack) the stack pointer
value at the time the lock was acquired. If the stack pointer is lower on the
unlock than on the lock, Kaffe assumes that this was a non-final recursive unlock.
That's why Kaffe's AWT currently deadlocks when compiled with gcj :-(
The reason this doesn't work with gcj is that gcj doesn't guarantee that the sp
value at a _Jv_MonitorExit is the same as (or higher than) the sp value at the
matching _Jv_MonitorEnter. Because of deferred-pop optimizations, it's also not
something one should ask for.
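
To make the symmetry problem concrete, here is a minimal, single-threaded toy
model of a lock that records the stack pointer of the outermost enter and
classifies each exit by comparing stack pointers. It is a hypothetical sketch of
the assumption described above, not Kaffe's actual code; the names sp_lock,
sp_lock_enter and sp_lock_exit are made up, and contention and lock inflation
are left out entirely.

```c
#include <stdint.h>

/* Hypothetical toy model of a stack-pointer-based thin lock that relies on
   the "symmetry assumption". NOT Kaffe's code. Single-threaded model:
   contention and inflation are omitted. The stack grows downward. */
typedef struct {
    uintptr_t holder_sp;  /* sp recorded at the outermost enter; 0 = free */
} sp_lock;

/* Returns 1 for an outermost acquisition, 0 for a nested one. */
static int sp_lock_enter(sp_lock *l, uintptr_t sp)
{
    if (l->holder_sp == 0) {
        l->holder_sp = sp;   /* remember where the outermost enter happened */
        return 1;
    }
    return 0;                /* already held (by us, in this model): nested */
}

/* Returns 1 if the lock was really released, 0 if the exit was
   classified as a non-final (recursive) unlock. */
static int sp_lock_exit(sp_lock *l, uintptr_t sp)
{
    if (sp < l->holder_sp)   /* deeper than the outermost enter ...        */
        return 0;            /* ... so it is assumed to be a nested unlock */
    l->holder_sp = 0;        /* same or shallower sp: final unlock         */
    return 1;
}
```

For example, with the outermost enter at sp 0x1000, a nested enter/exit pair at
0x0f00 is handled correctly, and a final exit at 0x1000 releases the lock. If the
final exit instead happens at 0x0ff8, because the caller has not yet popped some
outgoing arguments, sp_lock_exit classifies it as a nested unlock and the lock is
never released.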

Anyway, you probably don't care much why it doesn't work in Kaffe. My point is
that it would be nice to develop, together, a design that does work, drawing on
the experience of previous work; or, at the least, that whoever does the design
post it to the list for feedback.

I believe that EF's thin locks may suffer from a problem similar to Kaffe's;
I've contacted one of the authors to confirm that. EF stores information in
the current frame, so it should cope with deferred pops, but I don't believe it
can deal with locked objects being unlocked in a caller. Keep in mind that
JNI, and CNI for that matter, demand that (_Jv_)MonitorEnter/Exit be callable
from anywhere in native C code, and that such a call have the same effect as an
inlined thin lock somewhere in compiled Java code.

IBM's thin locks (Bacon et al., PLDI '98) may be one alternative. They keep an
explicit 8-bit count in the object header for shallow recursive acquisitions,
which means they don't rely on the stack pointer position. However, they require
that a 16-bit thread id be stored with the object to identify recursive
acquisitions. Clearly, getting the thread id by calling pthread_self() or the
like would eliminate any advantage you hope to see from thin locks. The authors
say that they get the thread index with a single load from the so-called
"execution environment" (whatever that is in their JVM). That work was done on
IBM AIX on PowerPC, btw.
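
The key difference is that ownership is recorded as a thread id rather than a
stack address, so it doesn't matter where on the stack the unlock happens. A
rough sketch of such a lock word follows, written with C11 atomics (which postdate
this discussion). It is only loosely modeled on the scheme above: the bit layout,
the names thin_lock/thin_unlock, storing the count as "acquisitions minus one",
and the omission of inflation to a fat monitor are all assumptions of this
illustration, not details taken from the paper. The caller still has to supply
tid cheaply, which is exactly the open question about the thread id.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thin-lock word with a 16-bit thread id and an 8-bit count.
   Layout is an assumption of this sketch:
     bits  0..7   recursion count (acquisitions minus one)
     bits  8..23  owner thread id (must be nonzero; word == 0 means unlocked)
   Inflation is not shown: functions return false where a real
   implementation would take the slow path. */
#define TL_COUNT_MASK 0xffu
#define TL_TID_SHIFT  8

typedef _Atomic uint32_t thin_lock_word;

static bool thin_lock(thin_lock_word *w, uint16_t tid)
{
    uint32_t mine = (uint32_t)tid << TL_TID_SHIFT;  /* owner = tid, count = 0 */
    uint32_t expected = 0;
    /* Fast path: a single compare-and-swap on an unlocked object. */
    if (atomic_compare_exchange_strong(w, &expected, mine))
        return true;
    /* Recursive acquisition by the owner; in this sketch contenders never
       write the word, so the owner can use a plain atomic store. */
    if ((expected & ~TL_COUNT_MASK) == mine &&
        (expected & TL_COUNT_MASK) < TL_COUNT_MASK) {
        atomic_store(w, expected + 1);
        return true;
    }
    return false;  /* contended or count overflow: slow path (not shown) */
}

static bool thin_unlock(thin_lock_word *w, uint16_t tid)
{
    uint32_t cur = atomic_load(w);
    uint32_t mine = (uint32_t)tid << TL_TID_SHIFT;
    if ((cur & ~TL_COUNT_MASK) != mine)
        return false;               /* caller is not the thin-lock owner */
    if (cur & TL_COUNT_MASK)
        atomic_store(w, cur - 1);   /* nested unlock: decrement the count */
    else
        atomic_store(w, 0);         /* final unlock: word becomes free */
    return true;
}
```

Note that thin_unlock never looks at the stack pointer, so an unlock performed in
a caller, or from native code, behaves exactly like one in the locking frame.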

There are two possibilities, IMO, as to what that "execution environment" is.
Either it's a global variable that is updated on every context switch, in which
case it is right out for any native-threads implementation (the kernel switches
threads without the runtime's involvement, and several threads may run at once);
or it's an additional argument that is passed to every function, such as the
JNIEnv* in JNI. If the latter, it should be possible to combine it with stack
limit checking. In fact, one may be able to use a thread's stack limit, or some
bits of it, as its id.
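
A minimal sketch of the second possibility, assuming a hypothetical per-thread
exec_env structure passed to every compiled function (as JNIEnv* is in JNI), and
assuming that all thread stacks are 64 KiB-aligned and lie within a single 4 GiB
range, so that bits 16..31 of the stack limit are unique per thread. One load
from env then serves both as a prologue stack check and as the source of a
16-bit thread id. None of these names or alignment choices come from IBM's JVM
or from gcj.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread execution environment, passed as an extra
   argument to every compiled function. */
typedef struct exec_env {
    uintptr_t stack_limit;  /* lowest usable stack address (stack grows down) */
} exec_env;

/* Assumption: stacks are 64 KiB-aligned and all lie within one 4 GiB
   range, so bits 16..31 of stack_limit identify the thread. */
#define STACK_ALIGN_SHIFT 16

/* Thread id derived from the stack limit: one load and a shift,
   no pthread_self() call. */
static inline uint16_t env_thread_id(const exec_env *env)
{
    return (uint16_t)(env->stack_limit >> STACK_ALIGN_SHIFT);
}

/* The same field doubles as a stack-overflow check in a function prologue:
   sp must stay strictly above the limit. */
static inline bool env_stack_ok(const exec_env *env, uintptr_t sp)
{
    return sp > env->stack_limit;
}
```

A thin-lock fast path could then pass env_thread_id(env) as the owner id,
paying for the thread id with the same load the stack check already needs.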

Btw, I noticed that somebody added some stack-limit-checking options to gcc, but
it seems they only apply to architectures such as the ARM, where a register is or
can be set aside to hold the limit. No discussion of this ever seems to have
reached this list, though, so I assume it was done for some other language.

On an unrelated note, and not to bug you too much: what is the current thinking
on getting backtraces in gcj? Do you expect to implement that in libgcj, in
libgcc, or in some combination thereof? Methinks providing stack traces for C++
code might be a nice g++ extension(?)
	- Godmar