Re: LPEG - next version
- Subject: Re: LPEG - next version
- From: David Given <dg@...>
- Date: 12 Jun 2009 10:02:35 +0100
Miles Bader wrote:
[...]
> It seems there needs to be a clear distinction between "raw char" (given
> that lpeg is quite usable for binary data) and "unicode char".
The problem is that Unicode doesn't really have any such concept as a
'character', which means that traditional string handling methods
basically don't work with it (even if you ignore UTF-8 encoding). A
single displayable thing can actually be made up of several Unicode code
points, and may even have several different (but technically equivalent)
representations.
I'm afraid it's just a fundamentally hard problem, and I haven't seen
any decent abstractions over it yet.
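To make the point about equivalent representations concrete, here is a minimal sketch. It uses Python's standard unicodedata module purely for illustration (it is not part of LPEG or of this thread), and the variable names are made up for the example. It shows one displayable "é" stored two ways, with different code point counts and different UTF-8 byte counts, which only compare equal after normalization:

```python
import unicodedata

# Two technically equivalent representations of the same displayed "é".
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT

# Code point counts differ.
print(len(composed), len(decomposed))                                  # 1 2

# UTF-8 byte counts differ as well.
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3

# A naive comparison treats them as different strings.
print(composed == decomposed)                                          # False

# After normalizing to NFC (canonical composition) they compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)            # True
```

So "one character" can mean one byte, one code point, or one displayable unit, and a matcher has to pick which of those it counts.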
> Making P(x) count utf8 chars would certainly be convenient for people
> reading utf8 files, but... it doesn't seem the cleanest thing in
> general....
*Nothing* about Unicode is clean...
--
David Given
dg@cowlark.com