
Re: Unicode escape sequences?



As Petite mentioned, you can include any character > 0x7F unescaped in a Lua string, since Lua strings are 8-bit clean (this covers every byte of a non-ASCII character in UTF-8). If you want to escape Unicode characters, how you do it depends on which byte encoding you want the character to be in. If you're using UTF-8 (generally the most sensible choice), you can write U+2500 as \226\148\128 (or \xe2\x94\x80 in 5.2). If you're using UTF-16 (in which case you'll likely have to make other changes to the system, at which point you may as well add a \u escape yourself), you can just split the character into two hex escapes (in 5.2, anyway): \x25\x00 in big-endian order (without hex escapes, it would be \37\0). If you want this to be handled more easily, you can use something like HTML numeric character entities (for example &u2500;) and then write a function that converts them into your encoding of choice:
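    -- Convert a hex code point (the digits captured from "&uXXXX;")
    -- into its UTF-8 byte sequence.
    local function utf8(hex)
      local num = tonumber(hex, 16)
      local char = string.char
      local floor = math.floor
      local highbits = 7    -- payload bits available in the lead byte
      local sparebytes = 0  -- number of continuation bytes
      while num >= 2^(highbits + sparebytes * 6) do
        sparebytes = sparebytes + 1
        -- an (n+1)-byte sequence leaves 6 - n payload bits in its lead byte
        highbits = 6 - sparebytes
        if highbits < 1 then error "utf-8 sequence out of range" end
      end
      if sparebytes == 0 then
        return char(num)  -- plain ASCII
      end
      local bytes = {}
      for i = 1, sparebytes do
        -- continuation bytes are 10xxxxxx, filled from the low bits upward
        local byte = floor(num / 2^((i - 1) * 6)) % 2^6
        bytes[sparebytes + 2 - i] = char(byte + 2^7)
      end
      -- lead byte: 110xxxxx, 1110xxxx, 11110xxx, ... depending on length
      local byte = floor(num / 2^(sparebytes * 6))
      bytes[1] = char(byte + 2^8 - 2^(highbits + 1))
      return table.concat(bytes)
    end

    -- Wrapper (name is illustrative): replace every "&uXXXX;" entity
    -- (4 to 6 hex digits) in input with its UTF-8 bytes.
    local function decode_entities(input)
      return (string.gsub(input, "&u(%x%x%x%x%x?%x?);", utf8))
    end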
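The escapes described above can be checked directly. This is a minimal sketch, assuming Lua 5.2 or later (for the \x escapes) and big-endian UTF-16; `utf16be` is a hypothetical helper, not part of Lua's standard library. It confirms that the decimal and hex escapes spell the same UTF-8 bytes for U+2500, and shows the UTF-16 split into two bytes, including the surrogate pair needed for code points above U+FFFF.

```lua
-- Hypothetical helper: encode one code point as UTF-16, big-endian
-- (the byte order of \x25\x00 above). Assumes a valid code point.
local function utf16be(cp)
  local floor = math.floor
  -- one 16-bit code unit -> two bytes, high byte first
  local function unit(u)
    return string.char(floor(u / 256), u % 256)
  end
  if cp < 0x10000 then
    return unit(cp)
  end
  -- code points above U+FFFF are split into a surrogate pair
  cp = cp - 0x10000
  return unit(0xD800 + floor(cp / 0x400)) .. unit(0xDC00 + cp % 0x400)
end

-- The UTF-8 bytes of U+2500, as decimal and as hex escapes (Lua 5.2+)
assert("\226\148\128" == "\xe2\x94\x80")

-- U+2500 in UTF-16BE is the two bytes 0x25 0x00, i.e. "\37\0"
assert(utf16be(0x2500) == "\37\0")
```

Whether to use big- or little-endian order (or to prepend a byte-order mark) depends on what the rest of your system expects; the sketch simply picks big-endian.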
For more information on the complexities of multi-byte character encoding (which Lua chooses not to address), see http://lua-users.org/wiki/LuaUnicode.

On 2011-12-02 14:08:22 -0800, Bernd Eggink <monoped@sudrala.de> wrote:
Hi all,
it seems that Lua 5.2.0 (rc4) doesn't support unicode escape sequences, such as \u2500. Is there any chance that this could be implemented in the final version? It would make handling of exotic characters much easier.
Greetings,
Bernd
