more gcj 3.0 fun -- catch { try {...}...}, encodings

Mon Jun 25 19:34:00 GMT 2001

There's another problem that keeps coming up when I run that
API test suite with GCJ 3.0, and this one seems a bit strange.
(The suite tests the SAX parser API for XML, and the problems
come up with its sanity test mode, using a reference parser. Earlier
versions of GCJ couldn't get as far with it as GCJ 3.0 does, so these
reports represent clear progress!)

Basically the failure is that certain logic in the XML parser(s)
under test seems to be misbehaving, and it's related to exception
processing. (Works under JDK, not GCJ.) The logic goes like this:
    try {
        handle "<?xml ... encoding='...' ... ?>"
    } catch (X1: not a built-in encoding) {
        try {
            ...
            reader = new InputStreamReader (stream, encoding)
            ... success ...
        } catch (X2: JVM can't handle it either) {
            report X3: "can't deal with it"
        }
    }
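
Written out as plain Java, the nested fallback above looks roughly like
the following. This is only a sketch: EncodingFallback, builtIn,
openReader and NotBuiltInException are names made up for illustration,
and the "UTF-8 only" built-in decoder is an assumption, not the parser's
actual code. Only InputStreamReader and UnsupportedEncodingException are
real library names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;

// Hypothetical sketch of the nested try/catch; none of these names
// come from the parser under test.
class EncodingFallback {
    // X1: the parser has no built-in decoder for this encoding.
    static class NotBuiltInException extends Exception {
        NotBuiltInException(String name) { super(name); }
    }

    // Stand-in for the parser's own decoders (assumed: UTF-8 only).
    static Reader builtIn(InputStream in, String encoding)
            throws NotBuiltInException, UnsupportedEncodingException {
        if (!"UTF-8".equalsIgnoreCase(encoding))
            throw new NotBuiltInException(encoding);
        return new InputStreamReader(in, "UTF-8");
    }

    static Reader openReader(InputStream in, String encoding)
            throws IOException {
        try {
            return builtIn(in, encoding);
        } catch (NotBuiltInException x1) {
            try {
                // Success path: the runtime knows the encoding.
                return new InputStreamReader(in, encoding);
            } catch (UnsupportedEncodingException x2) {
                // X3: neither the parser nor the runtime can decode it.
                throw new IOException("can't deal with encoding " + encoding);
            }
        }
    }
}
```

With this shape, a name the runtime supports should come back as a
Reader from the inner "try", and only a name nobody supports should
surface as X3; X1 should never escape openReader at all.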
The strangenesses I saw were in the nested "try":
 - Exception X1 gets thrown in cases where X3
   should get thrown.

 - The success path doesn't seem to trigger in cases where
   libgcj should handle the encoding in question (EUC-JP,
   ISO-2022-JP, and so on) ... X1 gets thrown.

 - When I put System.err.println() calls in there to figure
   out what gives, their output lands in the bitbucket.
The first two look like they should be related. Neither one
reproduced with obvious small testcases. GNATS didn't seem to
mention particular issues for code in catch clauses; has anyone
else come across similar problems there?

Then, more fun -- when I got rid of the nested "try", as a
workaround for those problems:
 
 - Parsing a largish document encoded in EUC-JP worked,
   but its 'iso-2022-jp' or 'shift_jis' forms seemed to
   go into an infinite loop. Shorter documents in those
   encodings worked fine.

 - Bogus encoding names ("XYZ+999" and the like) were
   accepted by the InputStreamReader constructor, but
   later caused null pointer exceptions. Haven't quite
   tracked it down; natIconv.cc is the suspect.
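
A small standalone check for the bogus-name case might look like the
sketch below. BogusEncodingCheck and rejected are made-up names; the
only behavior relied on is the documented one, that the two-argument
InputStreamReader constructor throws UnsupportedEncodingException for
an encoding the runtime doesn't support.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

// Hypothetical testcase, not part of the API test suite.
class BogusEncodingCheck {
    // Returns true if the constructor rejects the name up front.
    static boolean rejected(String encoding) {
        try {
            new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), encoding);
            return false;
        } catch (UnsupportedEncodingException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("XYZ+999 rejected: " + rejected("XYZ+999"));
        System.out.println("ISO-8859-1 rejected: " + rejected("ISO-8859-1"));
    }
}
```

A runtime that follows the documented contract should reject the first
name and accept the second; accepting "XYZ+999" at construction time is
the misbehavior described above.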

I'll see if I can get more of a handle on those two. One
thing that looks strange is that the hashtable for names
in gnu.gcj.convert.IOConvert is case-sensitive, while the
encoding names are by definition (IANA) case-insensitive,
which might have affected the "shift_jis" mode.
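
For illustration, one common way to get case-insensitive lookups out of
an ordinary hashtable is to fold the key to a single case on both insert
and lookup. The sketch below is not libgcj code: EncodingNameTable,
register and lookup are invented names, and the "SJIS" value is just a
placeholder.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical name table; keys are folded to upper case so that
// "Shift_JIS", "shift_jis" and "SHIFT_JIS" all hit the same entry.
class EncodingNameTable {
    private final Map<String, String> converters =
        new HashMap<String, String>();

    // Locale.ROOT keeps the folding independent of the user's locale.
    private static String key(String name) {
        return name.toUpperCase(Locale.ROOT);
    }

    void register(String ianaName, String converter) {
        converters.put(key(ianaName), converter);
    }

    // Returns null when no converter is registered under that name.
    String lookup(String name) {
        return converters.get(key(name));
    }
}
```

Registering "Shift_JIS" once and then looking up "shift_jis" finds the
same entry, which is the behavior the IANA rules call for.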
- Dave