
Re: Of Unicode in the next Lua version



On Wed, Jun 12, 2013 at 5:02 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> If `pos` comes before `char`, one can write an iterator on the model
> of `ipairs`:
>
> for pos,char in utf8(str) do ...
Almost... but you end up with the position of the next character... So
you need some trickery. Assuming a valid UTF-8 string:
Usage:
 for finish, start, char in utf8_next_char, "˙†ƒ˙©√" do
 print(char)
 end
`start` and `finish` are the byte bounds of the character, and `char`
is the character itself (its UTF-8 byte sequence, as a substring).
It produces:
 ˙
 †
 ƒ
 ˙
 ©
 √
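A minimal sketch showing what `start` and `finish` hold for a string that mixes ASCII and multibyte characters. It repeats the iterator so it runs on its own. `utf8_offset` is an assumed definition (the message does not show one) that counts continuation bytes from the lead byte; the sketch also assumes valid UTF-8 input and a source file saved as UTF-8.

```lua
local s_byte, s_sub = string.byte, string.sub

-- Assumed helper: number of continuation bytes after a UTF-8 lead byte.
-- Only meaningful for valid UTF-8 (i.e. b is never a continuation byte).
local function utf8_offset (b)
  if b < 0x80 then return 0      -- 0xxxxxxx: ASCII, 1 byte
  elseif b < 0xE0 then return 1  -- 110xxxxx: 2-byte sequence
  elseif b < 0xF0 then return 2  -- 1110xxxx: 3-byte sequence
  else return 3 end              -- 11110xxx: 4-byte sequence
end

-- Iterator from the message: returns finish, start, char.
local function utf8_next_char (subject, i)
  i = i and i+1 or 1
  if i > #subject then return end
  local offset = utf8_offset(s_byte(subject, i))
  return i + offset, i, s_sub(subject, i, i + offset)
end

-- Record and print the byte bounds of each character.
local bounds = {}
for finish, start, char in utf8_next_char, "aé€" do
  print(start, finish, char)
  bounds[#bounds + 1] = {start, finish, char}
end
```

It should print `1 1 a`, `2 3 é` and `4 6 €` (tab-separated): "é" takes two bytes and "€" three, so `finish` runs ahead of `start` for them.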
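local s_byte, s_sub = string.byte, string.sub
-- utf8_offset (assumed definition): continuation bytes after a lead byte
local function utf8_offset (b)
 return b < 0x80 and 0 or b < 0xE0 and 1 or b < 0xF0 and 2 or 3
end
local
function utf8_next_char (subject, i)
 i = i and i+1 or 1 -- i: last byte of the previous char (nil at first)
 if i > #subject then return end
 local offset = utf8_offset(s_byte(subject,i))
 return i + offset, i, s_sub(subject, i, i + offset)
end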
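It has the annoying property of passing the end position before the
start position, but it is stateless: the generic `for` feeds the first
returned value back in as the control variable, so the end of the
current character is all the iterator needs to find the next one.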
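Because the iterator keeps no state of its own, the generic `for` can be expanded by hand, and iteration can resume from any byte index that ends a character. A sketch under the same assumptions as above (valid UTF-8 input, an assumed `utf8_offset`, a UTF-8 source file):

```lua
local s_byte, s_sub = string.byte, string.sub

-- Assumed helper: continuation-byte count for a UTF-8 lead byte (valid input only).
local function utf8_offset (b)
  if b < 0x80 then return 0
  elseif b < 0xE0 then return 1
  elseif b < 0xF0 then return 2
  else return 3 end
end

local function utf8_next_char (subject, i)
  i = i and i+1 or 1
  if i > #subject then return end
  local offset = utf8_offset(s_byte(subject, i))
  return i + offset, i, s_sub(subject, i, i + offset)
end

local s = "aé€"

-- Roughly what `for finish, start, char in utf8_next_char, s do ... end`
-- does: the first returned value (finish) becomes the next control value.
local chars = {}
local ctrl = nil
while true do
  local finish, start, char = utf8_next_char(s, ctrl)
  if finish == nil then break end
  chars[#chars + 1] = char
  ctrl = finish
end

-- No hidden state: resuming after byte 3 (the last byte of "é") yields "€".
local finish, start, char = utf8_next_char(s, 3)
print(start, finish, char)
```

The final call should print `4 6 €`. Returning the end position first is what makes this work: it is the value the next call builds on (`i + 1`).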
-- Pierre-Yves
