Re: LPEG - next version


Subject: Re: LPEG - next version
From: David Given <dg@...>
Date: 12 Jun 2009 10:02:35 +0100

Miles Bader wrote:
[...]

It seems there needs to be a clear distinction between "raw char" (given
that lpeg is quite usable for binary data) and "unicode char".

The problem is that Unicode doesn't really have any such concept as a
'character', which means that traditional string handling methods
basically don't work with it (even if you ignore UTF-8 encoding). A
single displayable thing can actually be made up of several Unicode code
points, and may even have several different (but technically equivalent)
representations.

I'm afraid it's just a fundamentally hard problem, and I haven't seen
any decent abstractions over it yet.
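
As a rough illustration of the "several equivalent representations"
point, here is a minimal Python sketch (Python is used only for
convenience, it is not part of the original discussion). It assumes the
standard `unicodedata` module and uses "é" as an arbitrary example:

```python
import unicodedata

# Two spellings of the same displayed "é":
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# The code point sequences differ, so a naive comparison fails.
same_raw = precomposed == decomposed
lengths = (len(precomposed), len(decomposed))

# Normalizing both to the same form (NFC here) makes them compare equal.
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", decomposed))

print(same_raw, lengths, same_nfc)  # False (1, 2) True
```

Normalization only helps with canonically equivalent sequences. It does
not by itself say where one displayable unit ends and the next begins.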

Making P(x) count utf8 chars would certainly be convenient for people
reading utf8 files, but... it doesn't seem the cleanest thing in
general....

*Nothing* about Unicode is clean...
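
To make the P(x) question concrete: as far as I know, LPEG's P(n)
counts bytes, so "one char" can mean at least three different things.
The sketch below is in Python rather than Lua, and `count_code_points`
is a hypothetical helper that assumes its input is already valid UTF-8:

```python
def count_code_points(data: bytes) -> int:
    """Count UTF-8 code points by skipping continuation bytes (10xxxxxx).

    Hypothetical helper; assumes `data` is well-formed UTF-8.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


text = "e\u0301"             # one displayed "é": base letter + combining accent
raw = text.encode("utf-8")   # b'e\xcc\x81'

byte_count = len(raw)                      # what a byte-oriented matcher sees
code_point_count = count_code_points(raw)  # what "count utf8 chars" would see

print(byte_count, code_point_count)  # 3 2
```

Neither number is 1, which is what a reader looking at the screen would
call the number of characters.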
--
David Given
dg@cowlark.com

Follow-Ups:
- Re: LPEG - next version, Miles Bader

References:
- LPEG - next version, Thomas Harning Jr.
- Re: LPEG - next version, Roberto Ierusalimschy
- Re: LPEG - next version, Florian Weimer
- Re: LPEG - next version, Miles Bader
