Re: Feature request: "u" option to file:read


2018-02-23 16:46 GMT+02:00 Luiz Henrique de Figueiredo <lhf@tecgraf.puc-rio.br>:
>> "u": reads one or more bytes forming one UTF-8 character, and returns
>> that character as a string. Returns nil if the file at the current
>> position does not start with a valid UTF-8 sequence.
>
> Can this be done without having to unget more than one byte from the stream?
My rough workaround in Lua is this:
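function readutf8(f)
  local pos = f:seek()                           -- remember the current position
  local len = math.min(4, f:seek("end") - pos)   -- at most 4 bytes, fewer near the end
  f:seek("set", pos)
  local buf = f:read(len)
  if not buf then return nil end                 -- f:read(0) returns nil at end of file
  -- drop trailing bytes until exactly one character remains
  while utf8.len(buf) ~= 1 and #buf > 1 do buf = buf:sub(1, -2) end
  if utf8.len(buf) == 1 then
    f:seek("set", pos + #buf)                    -- the one unget: skip back over the surplus bytes
    return buf
  end
  f:seek("set", pos)                             -- no valid character here: restore the position
end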
This has only one unget, at the expense of always reading four bytes
except when fewer than four bytes remain in the file.
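
A variant that avoids the fixed four-byte read could take the sequence
length from the lead byte instead. This is only a sketch and not part of the
proposal: the name readutf8_lead is hypothetical, it assumes Lua 5.3 or later
(for utf8.len and "\u{...}" escapes) and a seekable file, and a malformed
sequence still costs one seek back to the starting position.

```lua
-- Hypothetical variant: read the lead byte, then only as many
-- continuation bytes as that lead byte announces.
function readutf8_lead(f)
  local pos = f:seek()
  local lead = f:read(1)
  if not lead then return nil end              -- end of file
  local b = lead:byte()
  -- Sequence length implied by the lead byte; false for an invalid lead.
  local n = b < 0x80 and 1
         or (b >= 0xC2 and b <= 0xDF) and 2
         or (b >= 0xE0 and b <= 0xEF) and 3
         or (b >= 0xF0 and b <= 0xF4) and 4
  -- Read the continuation bytes, if any; a short read near EOF is caught below.
  local rest = n and n > 1 and f:read(n - 1) or ""
  local buf = lead .. rest
  if n and #buf == n and utf8.len(buf) == 1 then
    return buf                                 -- valid character, nothing to unget
  end
  f:seek("set", pos)                           -- invalid or truncated: restore position
  return nil
end

-- Usage: print every character of a small temporary file.
local demo = assert(io.tmpfile())
demo:write("a\u{E9}\u{20AC}")
demo:seek("set", 0)
for ch in function() return readutf8_lead(demo) end do
  print(ch, #ch)
end
demo:close()
```

On valid input this never seeks backwards; the price is two reads per
multi-byte character instead of one.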