gcj 3.0/unicode

Mon Jun 25 20:23:00 GMT 2001

>>>>> "David" == David Brownell <david-b@pacbell.net> writes:

David> I was comparing the output of an api test suite running on
David> jdk 1.3 versus gcj 3.0, and came across a few errors.
David> These are demonstrable with the appended program.
David> 0x0e2f, // jdk 1.3: false
David> 0x0eaf // jdk 1.3: false

I'm surprised by this. The JDK 1.3 online docs say that
isUnicodeIdentifierStart() returns true if and only if the character
is a letter. Character.isLetter() is defined in terms of the
Unicode standard.

If you look at the Unicode data, these two characters clearly
represent letters. They are both in category `Lo'.

Maybe the problem is that we are using the 3.x Unicode, but the JDK is
using the 2.x Unicode. Hmm. Is this lame of us? It seemed
reasonable when I did it. And I seem to remember that when I used the
2.x tables I found errors in them. OTOH the online docs pretty clearly
say "Unicode 2.0" (something they didn't say before JDK 1.3, at least
as far as I remember). Maybe we ought to downgrade. Comments?

I must say I'm a bit surprised that anybody relies on this. It would
be better not to rely on the precise details if possible. Otherwise
if Sun ever does move to a newer Unicode standard, your program will
break. This is true even if all you want is interoperability, because
not everybody is going to upgrade their Java implementations at the
same time.

David> 0x3006, // jdk 1.3: false

This is a letter ("Lo") in Unicode 3.1.

David> 0x309b, // jdk 1.3: true
David> 0x309c // jdk 1.3: true

These are modifier symbols ("Sk") in Unicode 3.1.
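
For anyone who wants to see what their own runtime reports, here is a
minimal sketch. It is not David's appended program, and the class and
method names are made up for illustration. It prints the general
category and the isUnicodeIdentifierStart() result for the five
characters discussed above. The answers depend on which Unicode
version the runtime's tables were built from, so no expected output
is shown.

```java
public class IdentifierStartCheck {
    // Map the java.lang.Character type constants relevant to this thread
    // to the two-letter Unicode general category abbreviations.
    // (Hypothetical helper; only a handful of categories are covered.)
    static String categoryName(char c) {
        switch (Character.getType(c)) {
            case Character.UPPERCASE_LETTER:  return "Lu";
            case Character.LOWERCASE_LETTER:  return "Ll";
            case Character.TITLECASE_LETTER:  return "Lt";
            case Character.MODIFIER_LETTER:   return "Lm";
            case Character.OTHER_LETTER:      return "Lo";
            case Character.MODIFIER_SYMBOL:   return "Sk";
            case Character.OTHER_PUNCTUATION: return "Po";
            default: return "other(" + Character.getType(c) + ")";
        }
    }

    // One report line per character: code point, category, and the
    // results of the two predicates mentioned in the discussion.
    static String describe(char c) {
        return "0x" + Integer.toHexString(c)
            + " category=" + categoryName(c)
            + " isLetter=" + Character.isLetter(c)
            + " isUnicodeIdentifierStart="
            + Character.isUnicodeIdentifierStart(c);
    }

    public static void main(String[] args) {
        // The five characters from David's report.
        char[] samples = {
            (char) 0x0e2f, (char) 0x0eaf,
            (char) 0x3006,
            (char) 0x309b, (char) 0x309c
        };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(describe(samples[i]));
        }
    }
}
```

Running this on two implementations and diffing the output shows
directly whether a disagreement comes from the category tables or
from the predicate built on top of them.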
Tom