more gcj 3.0 fun -- catch { try {...}...}, encodings

Tue Jun 26 08:47:00 GMT 2001

> David> - parsing a largish document encoded in EUC-JP worked, but
> David> its 'iso-2022-jp' or 'shift_jis' forms seemed to go into an
> David> infinite loop. Shorter documents in those encodings worked
> David> fine.
>> gdb ought to work well enough for Java to track down an infinite loop.

Yep, I filed PR 3426 on this. Basically java.io.InputStreamReader is
making no progress because it expects the converter to always make progress
on each pass through that for(;;) loop. But in the case I looked at (with
iso-2022-jp) it doesn't, because there are only two bytes left at the end
of the buffer, and it appears that the character is encoded using more
bytes than that. (It's clear that this is the "no-progress" loop; I'm just
assuming that with two bytes left, the root cause is a three-byte character.)
Presumably I could have filed a simpler "how to reproduce", namely
just try to read the xml/suite/japanese/pr-xml-iso-2022-jp.xml file from
CVS for http://xmlconf.sourceforge.net ... but I didn't verify that it'd
reproduce in that simpler context.
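
To make the failure mode concrete, here is a minimal sketch of the pattern.
It is not the libgcj InputStreamReader code: the class NoProgressSketch, the
toy converter (which pretends every character is exactly three bytes wide),
and readAll are all made up for illustration. The point is only that when the
converter consumes nothing, the leftover bytes have to be kept and the buffer
refilled behind them, rather than calling the converter again on the same
two bytes.

```java
import java.io.IOException;
import java.io.InputStream;

class NoProgressSketch {
    // Assumption for illustration: every character is exactly WIDTH bytes.
    static final int WIDTH = 3;

    // Toy converter: decodes whole characters from buf[pos..limit) and
    // returns the number of bytes consumed. It returns 0 when fewer than
    // WIDTH bytes are available, which is the "no progress" case.
    static int convert(byte[] buf, int pos, int limit, StringBuilder out) {
        int start = pos;
        while (limit - pos >= WIDTH) {
            out.append((char) (buf[pos] & 0xff)); // toy decoding: first byte only
            pos += WIDTH;
        }
        return pos - start;
    }

    // Reads the whole stream. A partial character left at the end of the
    // buffer is moved to the front and more input is read in behind it, so
    // each pass either consumes bytes, reads bytes, or terminates.
    static String readAll(InputStream in, int bufSize) throws IOException {
        if (bufSize < WIDTH) {
            throw new IllegalArgumentException("buffer smaller than one character");
        }
        byte[] buf = new byte[bufSize];
        int limit = 0;          // number of valid bytes currently in buf
        boolean eof = false;
        StringBuilder out = new StringBuilder();
        for (;;) {
            int used = convert(buf, 0, limit, out);
            // Keep the unconsumed tail (fewer than WIDTH bytes).
            System.arraycopy(buf, used, buf, 0, limit - used);
            limit -= used;
            if (eof) {
                if (limit > 0) {
                    throw new IOException("truncated character at end of input");
                }
                return out.toString();
            }
            // limit < WIDTH <= buf.length here, so there is always room to read.
            int n = in.read(buf, limit, buf.length - limit);
            if (n < 0) {
                eof = true;
            } else {
                limit += n;
            }
        }
    }
}
```

With a 4-byte buffer and the 9-byte input "a..b..c..", readAll ends up with
one stray byte after each conversion pass and still terminates, since it
refills instead of retrying. Truncated input is reported as an IOException
rather than spinning forever.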
- Dave