I recently wrote code to parse lambda expressions written in a lisp like format, using a small hand-rolled parsing library.
The goal
This code was written as a part of this code-golf challenge not as code-golf, but as a demo for the concept. My goals for this were
- It should be simple enough that by looking over it one should be able to understand what is being parsed.
- It should be self-contained so that if someone wants to dig into the nitty-gritty and understand how it works they can.
I chose Haskell for this because:
- In my experience imperative parsers are more difficult to understand than combinator parsers.
- In my experience combinator parsing has a smaller footprint since it can assemble complex parsers from simple components.
- In my experience it is a joy to write and read parsers in Haskell.
The code then consists of a minimalist "library" of simple helper functions, and the definition of the actual parser. If it is important, you should be able to see exactly how it is presented via the link above.
I'd like to get an idea of how well I achieved these goals and what steps could be taken to better present the idea through code.
Expressions
The goal is to verify that a string is a properly formatted lambda expression, consisting of
abs
(abstraction) to introduce a new variable.var
(variable) to use a variable.app
(application) to apply one lambda to another.
formatted as a lisp list.
For example
(abs x (app (var x) (var x)))
Is a valid expression representing \$\lambda x. x x\$.
Additionally we want to verify the expression doesn't use undeclared variables and doesn't shadow any variables.
So
(abs y (var x))
is not valid because x
is not declared. And
(abs y (abs y (abs x (var x))))
is not valid because y
is shadowed.
A precise grammar for the valid expressions is given
\$ S\left(C\right) := \left\{ \begin{array}[rr] \ \color{green}{\texttt{(var }}x\color{green}{\texttt{)}} & \textrm{ where } x\in C \\ \color{green}{\texttt{(abs }} x\texttt{ }S(C\cup\{x\})\color{green}{\texttt{)}} & \textrm{ where } x\notin C\\ \color{green}{\texttt{(app }} S(C)\texttt{ }S(C)\color{green}{\texttt{)}} & \end{array} \right. \$
Where valid expressions are strings spanned by this grammar starting from \$S(\{\})\$.
There are two extra things to note:
- The whitespace is intentionally inflexible in this specification.
- Our Parse is not producing a parse tree, only confirming whether an expression meets the specifications.
Library
import Control.Applicative
( liftA2
, Alternative (..)
)
import Control.Monad
( guard
)
newtype Parser a =
Parser (String -> [(String, a)])
apply :: Parser a -> String -> [(String, a)]
apply (Parser p) s =
p s
instance Functor Parser where
fmap func (Parser p) =
Parser $ fmap (map (fmap func)) p
instance Applicative Parser where
pure x =
Parser ( \ string -> [(string, x)] )
liftA2 f (Parser p1) (Parser p2) =
Parser
( \ string -> do
(rem1, res1) <- p1 string
(rem2, res2) <- p2 rem1
return (rem2, f res1 res2)
)
instance Monad Parser where
Parser p >>= f =
Parser
( \ string -> do
(rem1, res1) <- p string
apply (f res1) rem1
)
instance Alternative Parser where
empty =
Parser ( \ _ -> [] )
Parser p1 <|> Parser p2 =
Parser $ ( \ string -> p1 string ++ p2 string )
charBy :: (Char -> Bool) -> Parser Char
charBy predicate =
Parser go
where
go (a : b) =
[ (b, a) | predicate a ]
go [] =
[]
char :: Char -> Parser Char
char x =
charBy (x==)
string :: String -> Parser String
string =
traverse char
choice :: [Parser a] -> Parser a
choice =
foldr (<|>) empty
Parser
lambda :: [String] -> Parser ()
lambda variables =
do
char '('
choice
[ do
string "abs "
name <- some $ charBy (`elem` ['a'..'z'])
guard $ notElem name variables
char ' '
lambda (name : variables)
, do
string "var "
choice (map string variables)
return ()
, do
string "app "
lambda variables
char ' '
lambda variables
]
char ')'
return ()
Description
This uses monadic parsing to get the job done. I build a parser type implement the necessary type classes and a couple of helper functions. The actual parser follows the same structure as the given grammar.
The code does a single pass without any tokenizer. This might be a little odd, but because the domain I think that such would make the code more complex, and we have very rigid white-space rules.
Review
I welcome all feedback on how to make this code more presentable. However the specific questions I would like to prompt are:
- Is
do
notation a good choice given my goals? - Was it a good choice to use a single pass method? Might it be more presentable to parse into a tree and then verify certain properties of the tree?
- Are there any subtle Haskell-isms that I have that could easily be removed to make the more approachable to a general audience?
of course keeping my initial goals in mind.
1 Answer 1
Sorry I'm so late to this!
- Is
do
notation a good choice given my goals?
Sure. Anyone who knows monads will understand do
. On the other hand, don't assume everyone knows monadic parsers. A simple link as a comment can go a long way. (Or you could actually comment your code, but that's more work!)
- Was it a good choice to use a single pass method? Might it be more presentable to parse into a tree and then verify certain properties of the tree?
It's ok. Building an explicit tree would be more normal, because it's prerequisite to doing anything else besides just checking validity.
- Are there any subtle Haskell-isms that I have that could easily be removed to make the more approachable to a general audience?
That's hard to say; I'm probably too familiar with Haskell to count as a general audience.
- Add a few comments.
- Maybe get rid of
instance Alternative Parser
. (Not that it isn't pulling its weight, it's just a thing you could maybe do without?) - Certainly figure out some other way of writing
[ (b, a) | predicate a ]
. Probably anyone can guess what it does, but a couple minutes of googling didn't find anything saying it's valid syntax; I had to try it in GHCI to make sure.
Other stuff:
- Are
rem
andres
supposed to be "remainder" and "resolution"? Longer variable names are sometimes easier for new readers. - On the flip side, in the
>>=
definition (and elsewhere),string
could probably just bes
; the advantage being that it doesn't share a name with the functionstring
in the same module. - Your definition of
fmap
is really hard to read. Usingmap
to specify which level applies to the[]
is ok; why not use.
for the function layer? The functor instance of(,)
never feels great. If you want to keep it concise, consider importing Bifunctor so you can write
fmap f (Parser p) = Parser $ (map (second f)) . p
.
If you want it to be easy for a stranger to read it'll probably need to be much more verbose. - The use of
do
inliftA2
is in the[]
monad; note that with a comment. empty = Parser (const [])
- Don't name your parser
lambda
.
Explore related questions
See similar questions with these tags.