
Re: Overloading and extending operators, (l)PEGs and grammars



I can appreciate how difficult it is to make a small language with flexible
operator extensions. It's too bad; things like lpeg could benefit from it.
The PEG operators (*, +, /, etc.) are easy and mnemonic:
	'' Literal string
	"" Literal string
	[] Character class
	. Any character
	(e) Grouping
	e? Optional
	e* Zero-or-more
	e+ One-or-more
	e1 e2 Sequence
	e1/e2 Prioritized Choice
	&e And-predicate
	!e Not-predicate
Anybody who has used a regex, or one of dozens of EBNF variants, can
remember this easily.
With lpeg, we have:
 Operator         Description
 lpeg.P(string)   Matches string literally
 lpeg.P(number)   Matches exactly number characters
 lpeg.S(string)   Matches any character in string (set)
 lpeg.R("xy")     Matches any character between x and y (range)
 patt^n           Matches at least n repetitions of patt
 patt^-n          Matches at most n repetitions of patt
 patt1 * patt2    Matches patt1 followed by patt2
 patt1 + patt2    Matches patt1 or patt2 (ordered choice)
 patt1 - patt2    Matches patt1 if patt2 does not match
 -patt            Equivalent to "" - patt
 patt1 / ...      Used to capture matches? Why not have the same meaning as PEG?
There isn't any commonality here; I find it quite anti-mnemonic (all the
operators are used for different purposes than in the original PEG grammars). I
can't read a PEG without the table above taped to my monitor.
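To make the mismatch concrete, here is a minimal sketch that writes a few small patterns with the lpeg operators from the table, with the equivalent PEG notation in the comments. The pattern names (ident, integer, not_end) are made up for illustration, and it assumes the lpeg module is installed.

```lua
local lpeg = require "lpeg"
local P, R = lpeg.P, lpeg.R

-- PEG: [a-zA-Z_] [a-zA-Z0-9_]*
local letter = R("az", "AZ") + P"_"          -- '+' is ordered choice, not one-or-more
local ident = letter * (letter + R"09")^0    -- '*' is sequence, '^0' is zero-or-more

-- PEG: "-"? [0-9]+
local integer = P"-"^-1 * R"09"^1            -- '^-1' is optional, '^1' is one-or-more

-- PEG: !"end" .
local not_end = -P"end" * P(1)               -- unary '-' is the not-predicate

-- match returns the position just past the match, or nil on failure
print(ident:match("foo_1 bar"))   --> 6
print(integer:match("-42"))       --> 4
print(not_end:match("end"))       --> nil
```

Every PEG operator in the comments has an lpeg counterpart, but only grouping with parentheses keeps the same spelling.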
With boost::spirit (which looks pretty similar to PEGs, though there
might be theoretical differences in capability), you can use C++'s much
more flexible operator overloading to get:
	Unary:
	!P Matches P or an empty string
	*P Matches P zero or more times
	+P Matches P one or more times
	~P Matches anything that does not match P
	Binary:
	P1 | P2 Matches P1 or P2
	P1 - P2 Matches P1 but not P2
	P1 >> P2 Matches P1 followed by P2
	P1 % P2 Matches one or more P1 separated by P2
	P1 & P2 Matches both P1 and P2
	P1 ^ P2 Matches P1 or P2, but not both
	P1 && P2 Synonym for P1 >> P2
	P1 || P2 Matches P1 | P2 | P1 >> P2
It starts off well: the unary operators are pretty familiar, as is |.
After that it gets successively worse as various boolean and mathematical
operators are stolen for things with no particular relation to their common
usage.
I have mixed feelings about (ab)using operator overloading to support inline
expression of grammars. I can see the appeal of somehow adding grammars as
elements of the language, rather than strings, but strings aren't so hard to
use, and are fairly flexible. I wonder if it wouldn't be better to use lpeg to
write something like:
equalcount = lpeg.grammar[[
	S = "0" B
	 / "1" A
	 / ""
	A = "0" S
	 / "1" A A
	B = "1" S
	 / "0" B B
]]
instead of:
	local lpeg = require "lpeg"
	local S, A, B = 1, 2, 3
	equalcount = lpeg.P{
	 [S] = "0" * lpeg.V(B) + "1" * lpeg.V(A) + "",
	 [A] = "0" * lpeg.V(S) + "1" * lpeg.V(A) * lpeg.V(A),
	 [B] = "1" * lpeg.V(S) + "0" * lpeg.V(B) * lpeg.V(B),
	} * -1
or 
pascalcomment = lpeg.grammar[[
	C = "(*" N* "*)"
	N = C
	 / !"(*" !"*)" .
]]
instead of
 ...
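For what it is worth, the string-based style can be tried with the re module that is bundled with LPeg releases, assuming such a release is installed. This is only a sketch, not the lpeg.grammar function proposed above, which is hypothetical. re writes rules with `<-` rather than `=`. The wrapper rule G (standing in for the `* -1` end-of-input check) and the extra `!"*)"` guard in N (without which N* would also swallow the closing delimiter) are assumptions added here.

```lua
local re = require "re"

-- Strings with equal numbers of 0s and 1s; G anchors the match at end of input.
local equalcount = re.compile[[
  G <- S !.
  S <- "0" B / "1" A / ""
  A <- "0" S / "1" A A
  B <- "1" S / "0" B B
]]

-- Nested Pascal-style comments; N must not consume a closing "*)".
local pascalcomment = re.compile[[
  C <- "(*" N* "*)"
  N <- C / !"(*" !"*)" .
]]

-- match returns the position just past the match, or nil on failure
print(equalcount:match("0110"))                   --> 5
print(equalcount:match("001"))                    --> nil
print(pascalcomment:match("(* a (* b *) c *)"))   --> 18
```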
Cheers,
Sam
