Re: tables as parsers
[
Date Prev][
Date Next][
Thread Prev][
Thread Next]
[
Date Index]
[
Thread Index]
- Subject: Re: tables as parsers
- From: "Soni \"They/Them\" L." <fakedme@...>
- Date: 2019年3月27日 23:06:05 -0300
On 2019年03月27日 10:10 p.m., Coda Highland wrote:
On Wed, Mar 27, 2019 at 6:47 PM Soni "They/Them" L. <fakedme@gmail.com
<mailto:fakedme@gmail.com>> wrote:
I'm rolling with this weird idea. is it possible to make a simple
parser
out of tables? I tried some stuff but ran into a few issues. I don't
wanna use LPeg or anything more complicated than string.find(foo,
bar,
1, true).
e.g.:
local ltk = {} -- lua_tokens
ltk.string = {}
ltk['"'] = "string"
ltk["'"] = "string" -- how to handle end of string correctly?
ltk.string['\\'] = "escape"
ltk.string.escape = {}
ltk.string.escape['z'] = {}
ltk.string.escape['z'].next = ltk.string.escape['z']
for i, v in ipairs({" ", "\t", "\n", etc}) do
ltk.string.escape['z'][v] = "next"
end
ltk.string.escape['z'][''] = ltk.string -- not sure if there'd be a
better way to do this
-- etc
Not a bad start for someone who's never done this formally before.
There are a zillion different ways to define a parser. The Dragon Book
recommendation is well-given.
I think the insight you're missing is that the table really represents
the structure of a state machine. Your top-level table shouldn't
contain things like ["'"] = "string" directly; rather, you'd want
something like ltk.base["'"]. Then you wouldn't have
ltk.string.escape, you'd just have ltk.string and ltk.escape. The
nesting doesn't belong in the table itself; instead, your parser will
maintain a stack of states, and you push a state when you start into a
parsing rule and you pop it off when you've finished that rule.
Yeah but then I wouldn't have self-referential tables. Besides I wasn't
planning on having a stack. FSMs (same class as regex) don't have a stack.
The place where this is suddenly going to get complicated is if your
grammar contains any prefix ambiguities (e.g. "fork" vs "forward"),
because then you have to implement backtracking. For most simpler
grammars it's best to just break it into two passes -- one that makes
simple lexical tokens out of a string of characters, and one that
evaluates the syntax out of a list of lexical tokens.
"fork" vs "forward" is unambiguous and doesn't require backtracking.
I don't mind offering assistance with this. I've actually built a
complete standards-compliant parser for C++ from first principles
before. I think it's pretty fun.
/s/ Adam