Solaris -vs- iconv

Fri Mar 30 17:30:00 GMT 2001

Tom Tromey <tromey@redhat.com> writes:
> * We want to use UCS-2 in the lexer.
> Well, ok, we probably don't really *need* to. We currently do
> because I didn't feel like rewriting the whole lexer. However using
> UCS-2 here is reasonable since it makes parts of the code cleaner.

My intuition would be that the lexer should use UTF-8. This is what
we should be using for identifiers and assembler labels. It is also
(or at some point should be) the preferred encoding for input files.
So I think we should optimize for UTF-8 input files.

> Using something like UTF-8 would mean returning strings and such.

I don't understand this. Do you mean what the lexer returns as the
value of a character token? That seems like an issue completely
unrelated to what kind of buffers the lexer and parser use.
I don't see any difference in terms of programming complexity or
performance between a buffer of UCS-2 characters and a buffer of
bytes in UTF-8 encoding.
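
To make that concrete, here is a rough sketch of scanning an
identifier directly over a buffer of UTF-8 bytes. The function names
are made up for illustration and are not from the existing lexer.
It also assumes, as a simplification, that any byte with the high
bit set may appear in an identifier; a real lexer would still have
to decode and check those characters against the language's rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: may byte C start an identifier?  Every byte of
   a multi-byte UTF-8 sequence has its high bit set, so accepting all
   bytes >= 0x80 keeps non-ASCII characters together without decoding
   them.  That blanket acceptance is an assumption of this sketch.  */
static int
ident_start_p (unsigned char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
         || c == '_' || c == '$' || c >= 0x80;
}

/* Hypothetical helper: may byte C continue an identifier?  */
static int
ident_part_p (unsigned char c)
{
  return ident_start_p (c) || (c >= '0' && c <= '9');
}

/* Return the length in bytes of the identifier starting at BUF,
   looking at no more than LEN bytes, or 0 if none starts there.  */
static size_t
scan_identifier (const unsigned char *buf, size_t len)
{
  size_t i;

  assert (buf != NULL);
  if (len == 0 || !ident_start_p (buf[0]))
    return 0;
  for (i = 1; i < len && ident_part_p (buf[i]); i++)
    continue;
  return i;
}
```

The loop has the same shape it would have over a UCS-2 buffer. It
works on bytes because ASCII delimiters such as space or '+' never
occur inside a multi-byte UTF-8 sequence.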
-- 
	--Per Bothner
per@bothner.com http://www.bothner.com/~per/