Re: Lua interpreter and Lua files encoding
- Subject: Re: Lua interpreter and Lua files encoding
- From: Peter Odding <peter@...>
- Date: 6 Jan 2011 11:45:43 +0100
> Anyway, is this only an implementation artifact? Or is it something that
> will last? In the latter case a mention in the reference manual could
> be useful, since UTF-8 is very common nowadays and generating UTF-8 files
> using Lua, _without specialized libraries_ and without the hassle of
> encoding literals with escape sequences, is really useful!

The Lua 5.1 reference manual states that "strings in Lua can contain
any 8-bit value" but it doesn't guarantee the same for literal strings
embedded in Lua source code. So if you really want to guarantee
compatibility with different Lua implementations (e.g. LuaJIT, Kahlua,
luaj, Jill, LuaCLR, LuaToCee, the list* goes on for quite a while)
then it might be wise to encode UTF-8 string literals using escape
sequences. On the other hand, the end-of-line normalization mentioned
earlier should never corrupt valid UTF-8 sequences, because UTF-8 was
specifically designed to be compatible with ASCII: every byte of a
multi-byte sequence has its high bit set, so it can never be mistaken
for a carriage return or a line feed.
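
To illustrate the point about end-of-line normalization, here is a small
sketch. It assumes normalization only rewrites CR and CRLF to LF;
normalize_eol and high_bytes are hypothetical helpers written for this
example, not part of Lua or of any implementation's lexer.

```lua
-- Hypothetical helper: collapse CRLF and lone CR into LF, byte by byte.
local function normalize_eol(s)
  return (s:gsub('\r\n', '\n'):gsub('\r', '\n'))
end

-- Hypothetical helper: list every byte >= 0x80, i.e. all bytes that
-- belong to multi-byte UTF-8 sequences.
local function high_bytes(s)
  local out = {}
  for i = 1, #s do
    local b = s:byte(i)
    if b >= 128 then out[#out + 1] = b end
  end
  return table.concat(out, ',')
end

-- UTF-8 text written with escapes so the example does not depend on
-- the encoding of this file.
local text = 'line one: \195\133ngstr\195\182m\r\nline two: \226\130\172\r'
local normalized = normalize_eol(text)

-- The line endings changed, the multi-byte sequences did not.
assert(not normalized:find('\r', 1, true))
assert(high_bytes(text) == high_bytes(normalized))
```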

I don't think the Lua reference manual should mention UTF-8 unless it
will guarantee that string literals with UTF-8 contents are passed
through unharmed. However, the writing in the Lua reference manual is
generally quite conservative. I think one of the reasons for this is to
ease the implementation of Lua on a range of platforms with different
characteristics.

- Peter Odding
* http://lua-users.org/wiki/LuaImplementations

PS. Given the above, I don't see why you would need "specialized
libraries" to generate Lua source code containing literal strings with
UTF-8 using escape sequences, i.e. the following should suffice to
output such string literals:

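function encode_literal(s)
  -- Escape every byte outside [A-Za-z0-9 ] as a decimal escape. The
  -- escape is padded to three digits so that a following digit is not
  -- read as part of it (e.g. "\10" followed by "5" would become "\105").
  return '"' .. s:gsub('[^A-Za-z0-9 ]', function(c)
    return ('\\%03d'):format(c:byte())
  end) .. '"'
end

print(encode_literal 'Ångström')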
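
One caveat with decimal escapes: Lua reads up to three digits after the
backslash, so an escape shorter than three digits changes meaning when
the next character is a digit. The sketch below checks this with a
round trip through the Lua parser. The names encode, decode and
load_chunk are made up for this example, and it assumes either
loadstring (Lua 5.1, LuaJIT) or a string-accepting load (later
versions) is available.

```lua
-- loadstring exists in Lua 5.1 and LuaJIT; later versions use load.
local load_chunk = loadstring or load

-- Same approach as encode_literal, with the escape format as a parameter.
local function encode(s, fmt)
  return '"' .. s:gsub('[^A-Za-z0-9 ]', function(c)
    return fmt:format(c:byte())
  end) .. '"'
end

-- Parse a string literal back into a string value.
local function decode(literal)
  return load_chunk('return ' .. literal)()
end

local s = 'caf\195\169\n5'            -- "café", a newline, then the digit 5
local padded = encode(s, '\\%03d')
local unpadded = encode(s, '\\%d')

print(padded)                 --> "caf\195\169\0105"
print(unpadded)               --> "caf\195\169\105"
print(decode(padded) == s)    --> true
print(decode(unpadded) == s)  --> false ("\105" is parsed as the letter i)
```

Padding every escape to three digits costs a few extra characters but
keeps the output unambiguous regardless of what follows the escape.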