Re: Plea for the support of unicode escape sequences
- Subject: Re: Plea for the support of unicode escape sequences
- From: Lorenzo Donati <lorenzodonatibz@...>
- Date: 29 June 2011 07:18:38 +0200
On 29/06/2011 5.29, Tom N Harris wrote:
> On 06/28/2011 04:24 PM, Lorenzo Donati wrote:
>> Unicode escape sequences are platform independent. They are useful for
>> the same reasons why ASCII codes are useful, at least for people working
>> with Unicode.
>
> Technically, Lua doesn't even require ASCII,
I admit I cut the sentence short, but I didn't mean that Lua supports
ASCII (the manual explicitly states that the codes returned by
string.byte are not necessarily portable). I meant that, in general, if
a language supports a specific character set (ASCII was just an
example), it is useful to be able to specify character codes in a
program instead of literal characters. And if that is useful for a given
pre-Unicode character set, it is useful for Unicode too, for the same
reasons.
> as the recent adventures
> with lctype.c have shown. Unicode is platform-specific because not all
> platforms use the same encoding (UTF-8 vs UTF-16). And when Unicode
> isn't being used at all, this will just be dead weight in the parser.
Well, I'm not an expert, but apart from the different encodings (UTF-8,
UTF-16, UTF-32 and their endianness variants), Unicode is standardized.
So if you are going to write a file in UTF-8, the byte sequence for,
say, a smiley will be the same on any computer on Earth that claims to
support UTF-8. There is no risk of "codepage hell". Of course there are
lots of non-conforming or partially conforming applications and systems,
but that's another matter.
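One way to check this is to encode a single code point with Python's built-in codecs. The sketch below uses U+263A WHITE SMILING FACE as the "smiley"; that choice and the helper name `encodings` are illustrative assumptions, not anything from the thread. It also shows Tom's point: the UTF-16 bytes depend on the byte order chosen, while UTF-8 has only one form.

```python
# Sketch: show the bytes of one code point in UTF-8 and in both UTF-16
# byte orders, using only Python's built-in codecs.
SMILEY = 0x263A  # U+263A WHITE SMILING FACE, picked as the example "smiley"


def encodings(code_point):
    """Return a mapping from encoding name to the encoded bytes."""
    ch = chr(code_point)
    return {
        "UTF-8": ch.encode("utf-8"),         # byte-oriented: no byte-order variants
        "UTF-16BE": ch.encode("utf-16-be"),  # 16-bit code units, big-endian
        "UTF-16LE": ch.encode("utf-16-le"),  # 16-bit code units, little-endian
    }


for name, data in encodings(SMILEY).items():
    print(f"{name:<9}{data.hex(' ')}")
```

This prints `e2 98 ba` for UTF-8, `26 3a` for UTF-16BE and `3a 26` for UTF-16LE.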
> How about supporting escape sequences greater than 255 when
> sizeof(char)>1?
I don't understand exactly what you mean. Do you mean writing, for
example (assuming a new \GXXXX... multibyte escape sequence), \G10fa1b
instead of \x10\xfa\x1b (here I assume the new \xXX escape sequences of
Lua 5.2)?
The advantage of dedicated Unicode escape sequences is that Lua would do
the conversion for you: it would translate a code point into the
corresponding byte sequence for, say, the UTF-8 encoding.
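A minimal sketch of that conversion, assuming a hypothetical `\G` escape followed by exactly six hex digits (Lua has no such escape; the fixed width and the names `utf8_encode` and `expand_escapes` are made up for illustration). It is written in Python rather than the C of Lua's lexer. It shows why `\G10fa1b` and `\x10\xfa\x1b` differ: the first is a code point that gets encoded as UTF-8, the second is three raw bytes.

```python
import re


def utf8_encode(cp):
    """Encode a Unicode scalar value as UTF-8 using the RFC 3629 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:                      # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 3 bytes: 1110xxxx + 2 continuation bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 4 bytes: 11110xxx + 3 continuation bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Hypothetical \GXXXXXX (code point) and Lua 5.2-style \xXX (raw byte) escapes.
ESCAPE = re.compile(rb"\\G([0-9A-Fa-f]{6})|\\x([0-9A-Fa-f]{2})")


def expand_escapes(src):
    """Replace escapes in a bytes literal body with the bytes they denote."""
    def repl(m):
        if m.group(1) is not None:
            return utf8_encode(int(m.group(1), 16))  # code point -> UTF-8 bytes
        return bytes([int(m.group(2), 16)])          # one raw byte
    return ESCAPE.sub(repl, src)


print(expand_escapes(rb"\G10fa1b").hex(" "))      # f4 8f a8 9b
print(expand_escapes(rb"\x10\xfa\x1b").hex(" "))  # 10 fa 1b
```

The encoding is pure arithmetic on the code point, so no per-platform table is needed as long as the target encoding is fixed (UTF-8 here).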
-- Lorenzo