Universal Character Names, v2

Sat Nov 30 05:00:00 GMT 2002

Martin v. Löwis wrote:-
> > I suggest this should only be a warning (it could be -S with the
> > output used on a different assembler, or for some other purpose),
> > only be emitted once per translation unit, and be moved to c_lex().
>
> It was an explicit request to have that kind of determination, for
> Java compatibility.
>
> The Java compiler requires UCN support on all platforms (and has
> mangling to do so), but also requires C++ compatibility on systems
> where C++ supports UCNs.
>
> So if I assume that C++ can use UTF-8 everywhere, the Java compiler
> will break on systems where no suitable assembler is available.

I don't see how my request is affected by this.

> The code isn't actually duplicated. In one case, the results are
> written to a FILE*, in the other case, they are written to a char*
> buffer. How can I unify those two?

Write the buffer to the FILE *? It may not be an improvement.

> > Can I suggest that, instead of doing this, you have a routine that
> > reads a UCS's digits (4 or 8) into a uchar[8] buffer, and that you
> > re-use maybe_read_ucs() on this buffer? maybe_read_ucs() might
> > need a few small tweaks. Again, this would avoid duplication.
>
> I can try, but I doubt it saves much duplication. Instead of

Why not

  for (len = 0; len < 4 or 8; len++)
    {
      c = get_effective_char (pfile);
      if (c == EOF || !ISXDIGIT (c))
        {
          BACKUP ();
          break;
        }
      buf[len] = c;
    }
  // maybe_read_ucs handles diagnostics
  temp = buf;
  maybe_read_ucs (pfile, &temp, buf + len, &val);

where VAL contains what the routine expects. You might be able to
find a better way by modifying maybe_read_ucs somehow; or by breaking
out most of it into a common subroutine that both use.
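
As a rough, self-contained sketch of that "common subroutine" idea (this is not cpplib code: parse_ucn_digits and read_ucn are hypothetical names standing in for the shared routine and its caller, and the input is a plain string rather than get_effective_char), the split might look something like:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared routine: convert LEN hex digits
   at DIGITS to a code point.  Returns 0 on success, -1 if the digit
   count is not NEEDED or a non-hex character is present.  Assumes the
   letters a-f are contiguous in the execution character set.  */
static int
parse_ucn_digits (const char *digits, size_t len, size_t needed,
                  unsigned long *val)
{
  unsigned long v = 0;
  size_t i;

  if (len != needed)
    return -1;
  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) digits[i];
      if (!isxdigit (c))
        return -1;
      v = v * 16 + (isdigit (c) ? c - '0' : tolower (c) - 'a' + 10);
    }
  *val = v;
  return 0;
}

/* Collect up to NEEDED (4 or 8) hex digits from *SRC into a small
   buffer, then let the shared routine validate and convert them.
   On return *SRC points past the digits that were consumed.  */
static int
read_ucn (const char **src, size_t needed, unsigned long *val)
{
  char buf[8];
  size_t len;

  if (needed > sizeof buf)
    return -1;
  for (len = 0; len < needed; len++)
    {
      unsigned char c = (unsigned char) (*src)[len];
      if (!isxdigit (c))        /* Also stops at the terminating NUL.  */
        break;
      buf[len] = (char) c;
    }
  *src += len;
  return parse_ucn_digits (buf, len, needed, val);
}
```

The collecting loop only decides where the digits end; all validation lives in one place, so a second caller (say, one that already has the digits in memory) could call parse_ucn_digits directly.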

If we find \U in a file, we should assume it is a UCN. There is little
use for \ as a separate token if followed by U. If there is a syntax
error in a UCN with an invalid char, there is no obviously right thing to
do to recover; certainly I don't think backing up to the \U and making
two tokens out of it is a good idea. It might not even be worth the
XDIGIT check above; let maybe_read_ucs handle it, and if there is an
error, don't add any UCS to the identifier and stop lexing the identifier.
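
A minimal sketch of that recovery policy, under stated assumptions: scan_identifier and valid_hex_run are hypothetical names, the input is a NUL-terminated string, the check that an identifier may not start with a digit is omitted, and what the caller does with the malformed text afterwards is left open.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if S starts with N hex digits.  */
static int
valid_hex_run (const char *s, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (!isxdigit ((unsigned char) s[i]))   /* Also rejects NUL.  */
      return 0;
  return 1;
}

/* Return the number of bytes of S that form an identifier.  A \u or \U
   is always taken to start a UCN.  If the UCN is malformed, set *BAD,
   add nothing for it, and stop the identifier there; the backslash is
   never handed back to be lexed as a token of its own.  */
static size_t
scan_identifier (const char *s, int *bad)
{
  size_t i = 0;

  *bad = 0;
  for (;;)
    {
      unsigned char c = (unsigned char) s[i];

      if (isalnum (c) || c == '_')
        i++;
      else if (c == '\\' && (s[i + 1] == 'u' || s[i + 1] == 'U'))
        {
          size_t n = s[i + 1] == 'u' ? 4 : 8;

          if (!valid_hex_run (s + i + 2, n))
            {
              *bad = 1;     /* A real lexer would diagnose here.  */
              return i;
            }
          i += 2 + n;
        }
      else
        return i;
    }
}
```

For example, on the input `ab\U12zz` this reports an identifier of length 2 with the error flag set, rather than producing `ab`, `\` and `U12zz` as separate tokens.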

Neil.