steve donovan wrote:
[...]
> BTW, anybody have experience with the Lua string library working with
> widechar strings? At least within the confines of the BMP they have a
> regular concept of 'character'.

Unfortunately not: sequences of combining characters mustn't be split. I
don't know if there's a maximum length for a grapheme cluster, but, e.g.,
most Korean Hangul syllables are three code points long when written in
decomposed form (as conjoining jamo).

Doesn't somebody have a Lua library that will decompose a UTF-8 string
into an array of grapheme clusters? The rules to do so are well-defined,
and would probably be the simplest approach if you want to deal with
'characters'.

-- 
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│ "I have always wished for my computer to be as easy to use as my
│ telephone; my wish has come true because I can no longer figure out
│ how to use my telephone." --- Bjarne Stroustrup
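
To make the idea of splitting a UTF-8 string into an array of grapheme clusters concrete, here is a rough sketch. It is not an existing library and not a full implementation of the Unicode segmentation rules: it assumes Lua 5.3 or later (for the standard `utf8` library), and it only knows about the Combining Diacritical Marks block (U+0300 to U+036F) and the main Hangul jamo and syllable blocks. The names `hangul_class`, `joins` and `grapheme_clusters` are made up for this illustration.

```lua
-- Simplified grapheme cluster splitting. NOT a complete implementation of
-- the Unicode rules; only a few character ranges are handled (assumption).
-- Requires Lua 5.3+ for the utf8 library and \u{...} string escapes.

-- Classify a code point as a Hangul L, V, T jamo or an LV/LVT syllable.
-- Only the main jamo block and the precomposed syllable block are covered.
local function hangul_class(cp)
  if cp >= 0x1100 and cp <= 0x115F then return "L" end
  if cp >= 0x1160 and cp <= 0x11A7 then return "V" end
  if cp >= 0x11A8 and cp <= 0x11FF then return "T" end
  if cp >= 0xAC00 and cp <= 0xD7A3 then
    return ((cp - 0xAC00) % 28 == 0) and "LV" or "LVT"
  end
  return nil
end

-- True if there must be no cluster break between prev and cp.
local function joins(prev, cp)
  -- Combining diacritical marks attach to whatever precedes them.
  if cp >= 0x0300 and cp <= 0x036F then return true end
  local a, b = hangul_class(prev), hangul_class(cp)
  if a == "L" and (b == "L" or b == "V" or b == "LV" or b == "LVT") then
    return true
  end
  if (a == "LV" or a == "V") and (b == "V" or b == "T") then return true end
  if (a == "LVT" or a == "T") and b == "T" then return true end
  return false
end

-- Split a valid UTF-8 string into an array of cluster strings.
-- utf8.codes raises an error if s is not valid UTF-8.
local function grapheme_clusters(s)
  local clusters = {}
  local current = {}  -- code points of the cluster being built
  local prev = nil
  for _, cp in utf8.codes(s) do
    if prev and not joins(prev, cp) then
      clusters[#clusters + 1] = utf8.char(table.unpack(current))
      current = {}
    end
    current[#current + 1] = cp
    prev = cp
  end
  if #current > 0 then
    clusters[#clusters + 1] = utf8.char(table.unpack(current))
  end
  return clusters
end

-- "e" + combining acute, "x", then a decomposed Hangul syllable (L, V, T).
local parts = grapheme_clusters("e\u{0301}x\u{1112}\u{1161}\u{11AB}")
print(#parts)  --> 3
```

A real library would drive `joins` from the full Unicode property tables rather than a handful of hard-coded ranges; the loop itself would likely stay much the same.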