I am writing a very simple Prolog intepreter in Haskell. It's a class assignment and I really want to do it right.
I was able to (quite quickly) write a parser for the language. Today I borrowed a copy of Programming in Haskell by Graham Hutton and I decided I would rewrite my code to a more function-oriented style, using fancy Haskell features.
I spent my whole day reading about Monads and how they can simplify the code. Then I tried coding the book's examples, but they seem a bit outdated and incomplete.
Now, I have a working Prolog parser I am not happy with (because I am sure I can do shorter and terser).
I would like if anyone could review my code and give advice as how to move towards a more "Haskellish" approach. I don't want to use Parsec or any other ready-made parser in my code. I am looking for a minimalist working solution.
import Data.Char
data Category = Atom | Variable | Number | Operator | Complex deriving (Show, Eq)
data Token = Token {
category :: Category,
token :: String
}
instance Show Token where
show (Token {token = token}) = show token
data Term = Term {
tokenType :: Category,
name :: String,
args :: [Term]
}
instance Show Term where
show (Term {tokenType = tokenType, name = name, args = args}) =
name ++ if tokenType == Complex then
"(" ++ showTermList args ++ ")"
else []
showTermList [] = []
showTermList (t : []) = show t
showTermList (t : ts) = show t ++ ", " ++ showTermList ts
data Rule = Rule {
lhs :: Maybe Term,
rhs :: [Term]
}
instance Show Rule where
show (Rule { lhs = lhs, rhs = rhs }) =
show lhs ++ " :- " ++ showTermList rhs
operators = "()[];"
smileyOperator = ":-"
openParen = "("
closeParen = ")"
comma = ","
dot = "."
parse :: [Token] -> [Rule]
parse [] = []
parse ts =
let (rule, ts') = parseRule ts
in (rule : parse ts')
parseRule :: [Token] -> (Rule, [Token])
parseRule [] = error "No tokens to parse"
parseRule ts =
if null ts' || token (head ts') /= dot then
(rule, ts')
else
(rule, tail ts')
where (rule, ts') = parseRule' ts
parseRule' :: [Token] -> (Rule, [Token])
parseRule' (t : ts)
| token t == smileyOperator =
let (terms, ts') = parseTermList ts
in (Rule {lhs = Nothing, rhs = terms}, ts')
| otherwise =
let (term, ts') = parseTerm (t : ts)
(terms, ts'') =
if null ts' then
error "Syntax error: missing dot (.)"
else
if token (head ts') == smileyOperator then
parseTermList $ tail ts'
else
([], ts')
in (Rule {lhs = Just term, rhs = terms}, ts'')
parseTerm :: [Token] -> (Term, [Token])
parseTerm [] = error "No tokens to parse"
parseTerm (t : ts) =
case category t of
Variable -> (Term {tokenType = Variable, name = token t, args = []}, ts)
Number -> (Term{tokenType = Number, name = token t, args = []}, ts)
Atom -> parseAtom (t : ts)
Operator -> error "Operator was unexpected here"
where
parseAtom (t : ts)
| null ts || token (head ts) /= openParen =
(Term {tokenType = Atom, name = token t, args = []}, ts)
| otherwise =
let (args, ts') = parseTermList (ts)
in (Term {tokenType = Complex, name = token t, args = args}, ts')
parseTermList :: [Token] -> ([Term], [Token])
parseTermList [] = ([], [])
parseTermList (t : ts)
| token t == openParen = parseTermList ts
| token t == comma = parseTermList ts
| token t `elem` [dot, closeParen] = ([], ts) -- TODO this allows . instead ), fix it
| otherwise =
let (term, ts') = parseTerm (t : ts)
(terms, ts'') = parseTermList ts'
in (term : terms, ts'')
tokens :: String -> [Token]
tokens [] = []
tokens (c : cs)
| isSpace c = tokens cs -- eat all whitespace
| otherwise =
let (token, cs') = nextToken (c : cs)
in token : tokens cs'
nextToken :: String -> (Token, String)
nextToken [] = error "There are no tokens in an empty string"
nextToken (c : cs)
| c `elem` operators = (Token {category = Atom, token = [c]}, cs)
| isUpper c || c == '_' = wrap Variable isValidIdentifierChar
| isLower c = wrap Atom isValidIdentifierChar
| isNumber c = wrap Number isNumber
| otherwise = wrap Operator (\c -> not (isLetter c || isSpace c))
where
wrap category charFilter =
let (acc, cs') = accumulate (c : cs) charFilter
in (Token {category = category, token = acc}, cs')
isValidIdentifierChar c = isAlphaNum c || c == '_'
accumulate :: String -> (Char -> Bool) -> (String, String)
accumulate [] _ = ([], [])
accumulate (c : cs) charFilter
| charFilter c =
let (acc, cs') = accumulate cs charFilter
in (c : acc, cs')
| otherwise = ([], c : cs)
PS: Why is a Monad called Monad?
Edit: My question doesn't fit this forum. See Question Author's Answer for a precise answer if you are interested in the use of Monad
for recursive parsing.
3 Answers 3
Built-ins
The function showTermList
applies show
to each item in the list (map show
) and puts ,
[COMMA] between any two items of the list (intercalate ","
). You can just use this predefined functions to write:
showTermList = intercalate "," . map show
Also in tokens
:
| isSpace c = tokens cs -- eat all whitespace
Eating all white-space is better expressed as filter (not . isSpace)
.
tokens :: String -> [Token]
tokens = tokens' . filter (not . isSpace)
where
tokens' [] = []
tokens' (c:cs) = let (token, cs') = nextToken (c : cs)
in token : tokens cs'
Repetition
You have the repetition of (rule,
in your function:
parseRule :: [Token] -> (Rule, [Token])
parseRule [] = error "No tokens to parse"
parseRule ts =
if null ts' || token (head ts') /= dot then
(rule, ts')
else
(rule, tail ts')
where (rule, ts') = parseRule' ts
You could avoid it by assigning a tail
variable:
parseRule ts = (rule, tail)
where
(rule, ts') = parseRule' ts
tail = if null ts' || token (head ts') /= dot then ts' else tail ts'
Guards are even more visually immediate then if
else
:
parseRule ts = (rule, decideTail ts')
where
(rule, ts') = parseRule' ts
decideTail ts'
| null ts' || token (head ts') /= dot = ts'
| otherwise = tail ts'
Now we note that:
(rule, decideTail ts')
where
(rule, ts') = parseRule' ts
Is built-in under the name of mapSnd
parseRule ts = mapSnd decideTail $ parseRule' ts
where
decideTail ts'
| null ts' || token (head ts') /= dot = ts'
| otherwise = tail ts'
Or even pointfree:
parseRule = mapSnd decideTail . parseRule'
where
decideTail ts'
| null ts' || token (head ts') /= dot = ts'
| otherwise = tail ts'
accumulate
accumulate
is a particularly fortunate case because it can be written as flip span
using Haskell built-ins (just search String -> (Char -> Bool) -> (String, String)
in Hoogle and you will find it (span
is more general hence the a
in the signature, in your case a = Char
))
Question Author's answer
I accepted Caridorc's answer as it shows the canonical CR answer style and my original question doesn't seem to fit this forum very much. His/her suggestions are spot-on when it comes to making the original code better.
Nonetheless, I figured it out and understood Monad and related concepts. I think it's really a clever idea to use them as a conceptual basis for Parser
.
The new code:
import Data.Char
import Control.Monad
import Control.Applicative
{- data structures -}
data Term = Atom String
| Number Int
| Variable String
| Complex String [Term]
deriving Show
data Rule = Rule (Maybe Term) [Term]
deriving Show
{- Parser algebra -}
data Parser a = Parser (String -> [(a, String)])
instance Functor Parser where
fmap = liftM
instance Applicative Parser where
pure v = Parser (\input -> [(v, input)])
(<*>) = ap
instance Monad Parser where
p >>= f = Parser (
\cs -> concat [parse (f val) cs' | (val, cs') <- parse p cs])
return = pure
instance MonadPlus Parser where
mzero = Parser (\input -> [])
mplus p q = Parser (\input -> parse p input ++ parse q input)
instance Alternative Parser where
empty = mzero
(<|>) = mplus
many p = some p `mplus` return []
some p = do
a <- p
as <- many p
return (a:as)
parse :: Parser a -> String -> [(a, String)]
parse (Parser p) input = p input
{- simple parsers: building blocks for other parsers -}
char :: Char -> Parser Char
char c = sat (c ==)
string :: String -> Parser String
string "" = return ""
string (c:cs) = do
char c
string cs
return (c:cs)
item :: Parser Char
item = Parser (\input -> case input of
[] -> []
(c:cs) -> [(c, cs)])
sat :: (Char -> Bool) -> Parser Char
sat pred = do
c <- item
if pred c then return c else mzero
sat2 :: (Char -> Bool) -> (Char -> Bool) -> Parser String
sat2 initChar insideChar = do
c <- sat initChar
cs <- many (sat insideChar)
return (c:cs)
sepBy :: Parser a -> Parser b -> Parser [a]
sepBy p sep = do
a <- p
as <- many (do {sep; p})
return (a:as)
list1 :: Parser a -> Parser [a]
list1 p = p `sepBy` comma
list :: Parser a -> Parser [a]
list p = list1 p <|> return []
{- Prolog character classes and operator parsers -}
dot = char '.'
comma = char ','
isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || isSymbol c
smiley = string ":-"
{- Prolog language construct parsers -}
variable :: Parser Term
variable = fmap Variable $ sat2 isUpper isIdentChar
atom :: Parser Term
atom = fmap Atom $ sat2 (\c -> isLower c || c == '_') isIdentChar
number :: Parser Term
number = fmap (Number . read) (some (sat isNumber))
complex :: Parser Term
complex = fmap (uncurry Complex) $ do
Atom f <- atom
char '('
args <- list1 term
char ')'
return (f, args)
term :: Parser Term
term = complex <|> atom <|> variable <|> number
rule :: Parser Rule
rule = fmap (uncurry Rule) $ do
ruleHead <- (fmap Just term <|> return Nothing)
body <- (do {smiley; list term} <|> return [])
dot
return (ruleHead, body) -- TODO `:-.` ~~~> Rule Nothing [], do I mind?
The line term = complex <|> atom <|> variable <|> number
is where it really shines (cf. definition of term
in the beginning).
The new code is much more sophisticated and easily extensible, but is not very straightforward, at least not in the part when it comes to Monadic behavior. Quite a lot of concepts and ideas are burried in the code. To write it and understand it completely, I needed to read:
- The Learn You a Haskell For Great Good Tutorial: Higher Order Functions, Functors, Applicative Functors and Monoids and A Fistful of Monads
- Graham Hutton's book Programming in Haskell, chapter 8
- Monadic parser in Haskell
The last resource is really good and much of the code is based on it. But it's slightly outdated (contains references to MonadZero
which does not seem to exist and does not account for the fact that Alternative
is superclass of MonadPlus
for some time already).
The project is hosted at GitHub in dcepelik/prolog. If you're interested, check it out in the future for updates (parsing the rest of Prolog syntactic structures, whitespace processing etc.).
Of course, I welcome any suggestions regarding the new code, too.
-
\$\begingroup\$ I suggest asking this version as a new question to get new suggestions. \$\endgroup\$Caridorc– Caridorc2016年03月26日 15:00:14 +00:00Commented Mar 26, 2016 at 15:00
-
2\$\begingroup\$ Maybe I could, but right now I am quite confident with the code the way it is, so I don't want to bother anyone without a point. The phrase is there just in case someone explicitly wanted to comment on some obvious deficiency I haven't noticed. \$\endgroup\$David– David2016年03月26日 15:03:30 +00:00Commented Mar 26, 2016 at 15:03
-
2\$\begingroup\$ You can almost always improve something but (not being a Haskell programmer, so caveats apply) I think this new version is indeed quite good, well done! Also, you noted that you "think it's really a clever idea to use [monads] as a conceptual basis for
Parser
." — Yes. In fact, it’s devastatingly clever. If you haven’t already, check out the concept of parser combinators, and the library that implements them in Haskell, Parsec. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2016年03月26日 17:28:51 +00:00Commented Mar 26, 2016 at 17:28 -
\$\begingroup\$ @KonradRudolph Thanks for the link, it will certainly be useful as I proceed. And thanks for the praise. I was trying to avoid using Parsec as I wanted to see how the stuff really works, but you're right that after this it may be valuable source (literally). I was kind of surprised how nicely it all works in practice. What really got me was the notion that lists represent results of non-deterministic operations and that the parsers actually operate on these. I guess one can write much faster code, but hardly as elegant. It's basically a description of grammar, nothing else. \$\endgroup\$David– David2016年03月26日 20:03:36 +00:00Commented Mar 26, 2016 at 20:03
Have some refactoring of your latest version, mostly to move the code closer to my tastes, YMMV. Take what you will.
import Data.Char
import Control.Monad
import Control.Applicative
import Control.Monad.Trans.State
{- data structures -}
data Term = Atom String
| Number Int
| Variable String
| Complex String [Term]
deriving Show
data Rule = Rule (Maybe Term) [Term]
deriving Show
{- Parser algebra -}
type Parser = StateT String []
parse :: Parser a -> String -> [(a, String)]
parse = runStateT
{- simple parsers: building blocks for other parsers -}
char :: Char -> Parser Char
char = sat . (==)
string :: String -> Parser String
string = traverse char
item :: Parser Char
item = mapStateT maybeToList $ StateT uncons
sat :: (Char -> Bool) -> Parser Char
sat pred = mfilter pred item
sat2 :: (Char -> Bool) -> (Char -> Bool) -> Parser String
sat2 initChar insideChar = (:) <$> sat initChar <*> many (sat insideChar)
sepBy :: Parser a -> Parser b -> Parser [a]
sepBy p sep = (:) <$> p <*> many (sep *> p)
list1 :: Parser a -> Parser [a]
list1 p = p `sepBy` comma
list :: Parser a -> Parser [a]
list p = list1 p <|> return []
{- Prolog character classes and operator parsers -}
dot = char '.'
comma = char ','
isIdentChar :: Char -> Bool
isIdentChar = liftA2 (||) isAlphaNum isSymbol
smiley = string ":-"
{- Prolog language construct parsers -}
variable :: Parser Term
variable = Variable <$> sat2 isUpper isIdentChar
atom :: Parser Term
atom = Atom <$> sat2 (liftA2 (||) isLower (== '_')) isIdentChar
number :: Parser Term
number = Number . read <$> some (sat isNumber)
complex :: Parser Term
complex = do
Atom f <- atom
char '('
args <- list1 term
char ')'
return $ Complex f args
term :: Parser Term
term = complex <|> atom <|> variable <|> number
rule :: Parser Rule
rule = do
ruleHead <- fmap Just term <|> return Nothing
body <- (smiley *> list term) <|> return []
dot
return $ Rule ruleHead body -- TODO `:-.` ~~~> Rule Nothing [], do I mind?
-
\$\begingroup\$ Well, I'll really have to take a closer look at this :). \$\endgroup\$David– David2016年03月28日 11:34:54 +00:00Commented Mar 28, 2016 at 11:34
-
\$\begingroup\$ Gurkenglas, would you mind elaborating on how did you manage to get rid of basically all the typeclasses I use, replacing them with StateT? \$\endgroup\$David– David2016年03月29日 15:31:23 +00:00Commented Mar 29, 2016 at 15:31
-
1\$\begingroup\$ Well, you were doing precisely what StateT is doing, so I got rid of basically all the typeclasses you used, replacing them with StateT, because they already implemented it and with more features to boot. How? I "saw a pattern" and "applied a cached thought" etc. I don't know what you're asking me to explain here :( By the way, you might want to take a look at hackage.haskell.org/package/base-4.8.2.0/docs/… , it's in base and looks close to what you're doing. \$\endgroup\$Gurkenglas– Gurkenglas2016年03月30日 20:22:18 +00:00Commented Mar 30, 2016 at 20:22
mgu ::
? C&P error? \$\endgroup\$