Universal Character Names, v2

Neil Booth neil@daikokuya.co.uk
Sat Nov 30 05:00:00 GMT 2002


Martin v. Löwis wrote:-
> > I suggest this should only be a warning (it could be -S with the
> > output used on a different assembler, or for some other purpose),
> > only be emitted once per translation unit, and be moved to c_lex().
>
> It was an explicit request to have that kind of determination, for
> Java compatibility.
>
> The Java compiler requires UCN support on all platforms (and has
> mangling to do so), but also requires C++ compatibility on systems
> where C++ supports UCNs.
>
> So if I assume that C++ can use UTF-8 everywhere, the Java compiler
> will break on systems where no suitable assembler is available.

I don't see how my request is affected by this.

> The code isn't actually duplicated. In one case, the results are
> written to a FILE*, in the other case, they are written to a char*
> buffer. How can I unify those two?

Write the buffer to the FILE *? It may not be an improvement.

> > Can I suggest that, instead of doing this, you have a routine that
> > reads a UCS's digits (4 or 8) into a uchar[8] buffer, and that you
> > re-use maybe_read_ucs() on this buffer? maybe_read_ucs() might
> > need a few small tweaks. Again, this would avoid duplication.
>
> I can try, but I doubt it saves much duplication. Instead of

Why not

  /* ndigits is 4 for \u, 8 for \U.  */
  for (len = 0; len < ndigits; len++)
    {
      c = get_effective_char (pfile);
      if (c == EOF || !ISXDIGIT (c))
        {
          BACKUP ();
          break;
        }
      buf[len] = c;
    }
  /* maybe_read_ucs handles diagnostics.  */
  temp = buf;
  maybe_read_ucs (pfile, &temp, buf + len, &val);

where VAL contains what the routine expects. You might be able to
find a better way by modifying maybe_read_ucs somehow; or by breaking
out most of it into a common subroutine that both use.
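
For concreteness, here is a rough standalone sketch of the same shape:
collect the digits into a small buffer first, then convert and validate
in one place. It is only an illustration and does not use cpplib:
struct reader, collect_ucn_digits and digits_to_code_point are
hypothetical names, and the converter is merely a stand-in for whatever
maybe_read_ucs would really do.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the lexer input: a cursor over a
   NUL-terminated string.  cpplib would use get_effective_char (pfile).  */
struct reader
{
  const char *cur;
};

/* Collect up to NDIGITS (4 for \u, 8 for \U) hex digits into BUF.
   Stops at the first non-hex character and leaves it unread, which
   plays the role of BACKUP ().  Returns the number of digits stored.  */
static size_t
collect_ucn_digits (struct reader *r, unsigned char *buf, size_t ndigits)
{
  size_t len;

  assert (ndigits == 4 || ndigits == 8);
  for (len = 0; len < ndigits; len++)
    {
      unsigned char c = (unsigned char) *r->cur;

      if (!isxdigit (c))	/* Also stops at the terminating NUL.  */
	break;
      buf[len] = c;
      r->cur++;
    }
  return len;
}

/* Assumed stand-in for the maybe_read_ucs step: turn the collected
   digits into a code point.  Returns 0 on success, or -1 if fewer
   than NDIGITS digits were collected, so that the caller can stop
   lexing the identifier.  */
static int
digits_to_code_point (const unsigned char *buf, size_t len,
		      size_t ndigits, unsigned long *val)
{
  unsigned long v = 0;
  size_t i;

  if (len != ndigits)
    return -1;
  for (i = 0; i < len; i++)
    {
      unsigned char c = buf[i];
      unsigned int d = (isdigit (c)
			? (unsigned int) (c - '0')
			: (unsigned int) (tolower (c) - 'a') + 10);

      v = (v << 4) | d;
    }
  *val = v;
  return 0;
}
```

With the cursor on "00e9;" and ndigits of 4, the collector stores four
digits and leaves the cursor on the ';'. With "12g4" it stores two and
stops at the 'g', and the converter then reports failure instead of
producing a value.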

If we find \U in a file, we should assume it is a UCN. There is little
use for \ as a separate token if followed by U. If there is a syntax
error in a UCN with an invalid char, there is no obviously right thing to
do to recover; certainly I don't think backing up to the \U and making
two tokens out of it is a good idea. It might not even be worth the
XDIGIT check above; let maybe_read_ucs handle it, and if there is an
error, don't add any UCS to the identifier and stop lexing the identifier.

Neil.

