
Re: Elegant design for creating error messages in LPEG parser



It was thus said that the Great joy mondal once stated:
> Hi Spc,
 Hi.
> So essentially what you are saying is the '/' function syntax is just
> syntax sugar ? without having much value to creating a parser ?
 Not necessarily.
 First off, the documentation for LPEG [1] does document all of LPEG, but
like the Lua documentation, it can be terse.
 Second, '/' is documented in the Capture subsection, so the result of '/'
is to produce a capture. The expression:
	num = lpeg.R"09"^1 / tonumber
will match digits, then those digits are passed to the function tonumber(),
which converts a string to a number. It's this number that is returned. An
example:
	num = lpeg.R"09"^1
	SP = lpeg.P" "
	patt = lpeg.Ct((num * SP^-1)^0)
	dump('result',patt:match"1 2 3 4") -- just dumps a table
	result =
	{
	}
num doesn't return any captures, so nothing is captured into the table
returned by lpeg.Ct(). Now, let's capture the output of num (I'm only
changing the rule for num---the rest stays the same, except for the output
which I'm showing):
	num = lpeg.C(lpeg.R"09"^1)
	result =
	{
	 [1] = "1",
	 [2] = "2",
	 [3] = "3",
	 [4] = "4",
	}
This captures the digits as strings. If we wanted to convert these to
numbers, that's when '/' comes in:
	num = lpeg.R"09"^1 / tonumber
	result =
	{
	 [1] = 1.000000,
	 [2] = 2.000000,
	 [3] = 3.000000,
	 [4] = 4.000000,
	}
We now get actual numbers. You *can* do the same thing with lpeg.Cmt():
	num = lpeg.Cmt(lpeg.R"09"^1,function(_,position,capture)
	 return position,tonumber(capture)
	end)
	result =
	{
	 [1] = 1.000000,
	 [2] = 2.000000,
	 [3] = 3.000000,
	 [4] = 4.000000,
	}
but you aren't really buying anything in this example, other than being a
bit more verbose (or explicit).
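 To run the variants of num side by side, here is a minimal self-contained
sketch. It assumes the lpeg module is installed and can be loaded with
require; collect() is a hypothetical helper made up for this sketch, standing
in for the dump() call above.

	local lpeg = require "lpeg"

	local SP = lpeg.P" "

	-- collect() is a hypothetical helper: wrap a rule for num in the
	-- same table capture used above and return the resulting table
	local function collect(num,input)
	  return lpeg.Ct((num * SP^-1)^0):match(input)
	end

	local plain   = lpeg.R"09"^1             -- no capture at all
	local strings = lpeg.C(lpeg.R"09"^1)     -- captures the matched text
	local numbers = lpeg.R"09"^1 / tonumber  -- function capture
	local viacmt  = lpeg.Cmt(lpeg.R"09"^1,function(_,position,capture)
	  return position,tonumber(capture)
	end)

	assert(#collect(plain,"1 2 3 4") == 0)       -- empty table
	assert(collect(strings,"1 2 3 4")[2] == "2") -- strings
	assert(collect(numbers,"1 2 3 4")[2] == 2)   -- numbers
	assert(collect(viacmt,"1 2 3 4")[2] == 2)    -- numbers again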
 Here's another example of using '/':
	char = lpeg.P"\n" / "\\n"
	 + lpeg.P"\t" / "\\t"
	 + lpeg.P(1)
	safe = lpeg.Cs(char^0)
 Here I'm doing a substitution capture on the input string. For each
character in the string, if it's a newline character, replace it with the
escaped version '\n'; the same for the tab character. Here, the newline
character is replaced with a string using the '/' operator. Again, you
could do this with lpeg.Cmt() but it would lose some clarity:
	char = lpeg.Cmt(lpeg.P"\n",function(_,position) return position,"\\n" end)
	 + lpeg.Cmt(lpeg.P"\t",function(_,position) return position,"\\t" end)
	 + lpeg.P(1)
	safe = lpeg.Cs(char^0)
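 A quick way to check the '/' version of safe is to feed it a string with a
tab and a newline in it. This is only a sketch, assuming the lpeg module is
installed; the test strings are made up.

	local lpeg = require "lpeg"

	local char = lpeg.P"\n" / "\\n"
	           + lpeg.P"\t" / "\\t"
	           + lpeg.P(1)
	local safe = lpeg.Cs(char^0)

	-- the tab and the newline come back as the two-character escapes
	assert(safe:match("a\tb\n") == "a\\tb\\n")
	-- anything else passes through unchanged
	assert(safe:match("plain") == "plain")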
 So I suppose you could say that '/' is syntactic sugar for lpeg.Cmt(), in
that everything you can do with '/' you can do with lpeg.Cmt(). But I find
using '/' clearer than using lpeg.Cmt(). That's not to say I don't use
lpeg.Cmt(); I do, but only when I need to do some other processing at match
time.
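 One case where lpeg.Cmt() does buy something is when the function has to
decide whether the match succeeds at all, since a function used with '/' only
runs once the whole match has already succeeded. A minimal sketch, assuming
the lpeg module is installed; the octet rule and its 0..255 range are
invented for illustration.

	local lpeg = require "lpeg"

	-- octet is a hypothetical rule: one or more digits whose value must
	-- be in the range 0..255; returning false makes the match fail
	local octet = lpeg.Cmt(lpeg.R"09"^1,function(_,position,capture)
	  local n = tonumber(capture)
	  if n <= 255 then
	    return position,n
	  end
	  return false
	end)

	assert(octet:match("200") == 200)
	assert(octet:match("999") == nil)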
> I was stuck trying to use Cb ( back referencing ) and Cg - which are
> confusing.
> 
> Then I read that Cb is experimental.
 It was at one point, but that doesn't seem to be the case anymore. I
generally use Cg() in conjunction with Ct(); I think I've used Cb() once
when parsing text that had variable delimiters.
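 A rough sketch of both uses, assuming the lpeg module is installed. The
key=value rule and the quoted-string rule are made up for this example (the
second is modelled on the long-string example in the LPEG documentation [1])
and are not the text format mentioned above.

	local lpeg = require "lpeg"

	-- a named Cg() inside Ct() becomes a named field of the table
	local key   = lpeg.C(lpeg.R"az"^1)
	local value = lpeg.R"09"^1 / tonumber
	local pair  = lpeg.Ct(lpeg.Cg(key,"name") * lpeg.P"=" * lpeg.Cg(value,"value"))

	local t = pair:match("width=80")
	assert(t.name == "width" and t.value == 80)

	-- Cb() fetches the value of an earlier named group; here the closing
	-- quote must be the same character as the opening one
	local quote  = lpeg.S[['"]]
	local open   = lpeg.Cg(quote,"q")
	local close  = lpeg.Cmt(lpeg.C(quote) * lpeg.Cb"q",function(subject,position,a,b)
	  return a == b
	end)
	local quoted = open * lpeg.C((lpeg.P(1) - close)^0) * close

	assert(quoted:match([["it's"]]) == "it's")
	assert(quoted:match([['say "hi"']]) == 'say "hi"')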
 -spc
[1]	http://www.inf.puc-rio.br/~roberto/lpeg/
