Re: LPEG - next version


Miles Bader wrote:
> [...]
> It seems there needs to be a clear distinction between "raw char" (given
> that lpeg is quite usable for binary data) and "unicode char".
The problem is that Unicode doesn't really have any such concept as a 'character', which means that traditional string handling methods basically don't work with it (even if you ignore UTF-8 encoding). A single displayable thing can actually be made up of several Unicode code points, and may even have several different (but technically equivalent) representations. I'm afraid it's just a fundamentally hard problem, and I haven't seen any decent abstractions over it yet.
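To make the point about code points concrete, here is a small sketch in Python (chosen only for illustration; it has nothing to do with LPEG's API). It assumes the standard `unicodedata` module and shows one visible "é" written two canonically equivalent ways, which differ in code point count, in UTF-8 byte count, and in naive string equality.

```python
import unicodedata

# Two representations of the same visible character "é".
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# Code point counts differ for what a reader sees as one character.
print(len(precomposed), len(decomposed))

# A plain comparison treats them as different strings...
print(precomposed == decomposed)

# ...but they are canonically equivalent: normalizing both to the same
# form (NFC composes, NFD decomposes) makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)

# Their UTF-8 byte lengths differ as well.
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))
```

So "one character" could mean one byte, one code point, or one displayed unit, and even the code point count depends on which equivalent form the text happens to use.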
> Making P(x) count utf8 chars would certainly be convenient for people
> reading utf8 files, but... it doesn't seem the cleanest thing in
> general...
*Nothing* about Unicode is clean...
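As a rough illustration of the "raw char" vs "utf8 char" question, the sketch below contrasts taking n bytes with taking n UTF-8-encoded code points from a byte string. It is written in Python purely for illustration. The helper names `take_bytes`, `take_utf8_chars` and `utf8_seq_len` are hypothetical and are not part of LPEG. The lead-byte classification is simplified and does not reject every invalid sequence.

```python
def utf8_seq_len(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte.

    Simplified: continuation bytes are not checked, and overlong leads
    (0xC0, 0xC1) or leads above 0xF4 are not rejected.
    """
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02x}")


def take_bytes(data: bytes, n: int) -> bytes:
    # "Raw char" semantics: one unit is one byte.
    return data[:n]


def take_utf8_chars(data: bytes, n: int) -> bytes:
    # "Unicode char" semantics: one unit is one encoded code point.
    pos = 0
    for _ in range(n):
        if pos >= len(data):
            raise ValueError("input ends before n code points")
        pos += utf8_seq_len(data[pos])
        if pos > len(data):
            raise ValueError("truncated UTF-8 sequence")
    return data[:pos]


text = "h\u00e9llo".encode("utf-8")        # b"h\xc3\xa9llo"
print(take_bytes(text, 2))                  # cuts "é" in half
print(take_utf8_chars(text, 2))             # keeps "é" whole

# Even code point counting splits a displayed character when it is
# written in decomposed form: the combining accent is left behind.
print(take_utf8_chars("he\u0301llo".encode("utf-8"), 2))
```

Counting code points fixes the broken-byte case but still does not match what a reader would call "two characters" in the decomposed example.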
--
David Given
dg@cowlark.com

AltStyle によって変換されたページ (->オリジナル) /