Prolog parser written in Haskell

Question 1

I am writing a very simple Prolog intepreter in Haskell. It's a class assignment and I really want to do it right.

I was able to (quite quickly) write a parser for the language. Today I borrowed a copy of Programming in Haskell by Graham Hutton and I decided I would rewrite my code to a more function-oriented style, using fancy Haskell features.

I spent my whole day reading about Monads and how they can simplify the code. Then I tried coding the book's examples, but they seem a bit outdated and incomplete.

Now, I have a working Prolog parser I am not happy with (because I am sure I can do shorter and terser).

I would like if anyone could review my code and give advice as how to move towards a more "Haskellish" approach. I don't want to use Parsec or any other ready-made parser in my code. I am looking for a minimalist working solution.

import Data.Char
data Category = Atom | Variable | Number | Operator | Complex deriving (Show, Eq)
data Token = Token {
 category :: Category,
 token :: String
}
instance Show Token where
 show (Token {token = token}) = show token
data Term = Term {
 tokenType :: Category,
 name :: String,
 args :: [Term]
}
instance Show Term where
 show (Term {tokenType = tokenType, name = name, args = args}) =
 name ++ if tokenType == Complex then
 "(" ++ showTermList args ++ ")"
 else []
showTermList [] = []
showTermList (t : []) = show t
showTermList (t : ts) = show t ++ ", " ++ showTermList ts
data Rule = Rule {
 lhs :: Maybe Term,
 rhs :: [Term]
}
instance Show Rule where
 show (Rule { lhs = lhs, rhs = rhs }) =
 show lhs ++ " :- " ++ showTermList rhs
operators = "()[];"
smileyOperator = ":-"
openParen = "("
closeParen = ")"
comma = ","
dot = "."
parse :: [Token] -> [Rule]
parse [] = []
parse ts =
 let (rule, ts') = parseRule ts
 in (rule : parse ts')
parseRule :: [Token] -> (Rule, [Token])
parseRule [] = error "No tokens to parse"
parseRule ts =
 if null ts' || token (head ts') /= dot then
 (rule, ts')
 else
 (rule, tail ts')
 where (rule, ts') = parseRule' ts
parseRule' :: [Token] -> (Rule, [Token])
parseRule' (t : ts)
 | token t == smileyOperator =
 let (terms, ts') = parseTermList ts
 in (Rule {lhs = Nothing, rhs = terms}, ts')
 | otherwise =
 let (term, ts') = parseTerm (t : ts)
 (terms, ts'') =
 if null ts' then
 error "Syntax error: missing dot (.)"
 else
 if token (head ts') == smileyOperator then
 parseTermList $ tail ts'
 else
 ([], ts')
 in (Rule {lhs = Just term, rhs = terms}, ts'')
parseTerm :: [Token] -> (Term, [Token])
parseTerm [] = error "No tokens to parse"
parseTerm (t : ts) =
 case category t of
 Variable -> (Term {tokenType = Variable, name = token t, args = []}, ts)
 Number -> (Term{tokenType = Number, name = token t, args = []}, ts)
 Atom -> parseAtom (t : ts)
 Operator -> error "Operator was unexpected here"
 where
 parseAtom (t : ts)
 | null ts || token (head ts) /= openParen =
 (Term {tokenType = Atom, name = token t, args = []}, ts)
 | otherwise = 
 let (args, ts') = parseTermList (ts)
 in (Term {tokenType = Complex, name = token t, args = args}, ts')
parseTermList :: [Token] -> ([Term], [Token])
parseTermList [] = ([], [])
parseTermList (t : ts)
 | token t == openParen = parseTermList ts
 | token t == comma = parseTermList ts
 | token t `elem` [dot, closeParen] = ([], ts) -- TODO this allows . instead ), fix it
 | otherwise =
 let (term, ts') = parseTerm (t : ts)
 (terms, ts'') = parseTermList ts'
 in (term : terms, ts'')
tokens :: String -> [Token]
tokens [] = []
tokens (c : cs)
 | isSpace c = tokens cs -- eat all whitespace
 | otherwise =
 let (token, cs') = nextToken (c : cs)
 in token : tokens cs'
nextToken :: String -> (Token, String)
nextToken [] = error "There are no tokens in an empty string"
nextToken (c : cs)
 | c `elem` operators = (Token {category = Atom, token = [c]}, cs)
 | isUpper c || c == '_' = wrap Variable isValidIdentifierChar
 | isLower c = wrap Atom isValidIdentifierChar
 | isNumber c = wrap Number isNumber
 | otherwise = wrap Operator (\c -> not (isLetter c || isSpace c))
 where
 wrap category charFilter =
 let (acc, cs') = accumulate (c : cs) charFilter
 in (Token {category = category, token = acc}, cs')
 isValidIdentifierChar c = isAlphaNum c || c == '_'
accumulate :: String -> (Char -> Bool) -> (String, String)
accumulate [] _ = ([], [])
accumulate (c : cs) charFilter
 | charFilter c =
 let (acc, cs') = accumulate cs charFilter
 in (c : acc, cs')
 | otherwise = ([], c : cs)

PS: Why is a Monad called Monad?

Edit: My question doesn't fit this forum. See Question Author's Answer for a precise answer if you are interested in the use of Monad for recursive parsing.

Question 2

The etymology of Monad is covered in a fantastic answer here: english.stackexchange.com/questions/30654/…

Question 3

Thanks for the editor's attention. Just out of curiosity, what was wrong about my edit?

Question 4

It's better to modify the question text than to separate out your requests. Note that your edit was just moved, not removed. It was the chatty "thanks" text that was removed. Also the extraneous text right before the code block.

Question 5

What's up with the stray mgu :: ? C&P error?

Question 6

@Zeta Whoops, sorry about that :). That's just an orphan and shouldn't really be there. And it's not a C&P error, it's what happens when you start typing a new func and give up in the middle :).

Question 7

Built-ins

The function showTermList applies show to each item in the list (map show) and puts , [COMMA] between any two items of the list (intercalate ","). You can just use this predefined functions to write:

showTermList = intercalate "," . map show

Also in tokens:

| isSpace c = tokens cs -- eat all whitespace

Eating all white-space is better expressed as filter (not . isSpace).

tokens :: String -> [Token]
tokens = tokens' . filter (not . isSpace)
 where
 tokens' [] = []
 tokens' (c:cs) = let (token, cs') = nextToken (c : cs)
 in token : tokens cs'

Repetition

You have the repetition of (rule, in your function:

parseRule :: [Token] -> (Rule, [Token])
parseRule [] = error "No tokens to parse"
parseRule ts =
 if null ts' || token (head ts') /= dot then
 (rule, ts')
 else
 (rule, tail ts')
 where (rule, ts') = parseRule' ts

You could avoid it by assigning a tail variable:

parseRule ts = (rule, tail)
 where
 (rule, ts') = parseRule' ts
 tail = if null ts' || token (head ts') /= dot then ts' else tail ts'

Guards are even more visually immediate then if else:

parseRule ts = (rule, decideTail ts')
 where
 (rule, ts') = parseRule' ts
 decideTail ts'
 | null ts' || token (head ts') /= dot = ts'
 | otherwise = tail ts'

Now we note that:

(rule, decideTail ts')
 where
 (rule, ts') = parseRule' ts

Is built-in under the name of mapSnd

parseRule ts = mapSnd decideTail $ parseRule' ts
 where
 decideTail ts'
 | null ts' || token (head ts') /= dot = ts'
 | otherwise = tail ts'

Or even pointfree:

parseRule = mapSnd decideTail . parseRule'
 where
 decideTail ts'
 | null ts' || token (head ts') /= dot = ts'
 | otherwise = tail ts'

`accumulate`

accumulate is a particularly fortunate case because it can be written as flip span using Haskell built-ins (just search String -> (Char -> Bool) -> (String, String) in Hoogle and you will find it (span is more general hence the a in the signature, in your case a = Char))

Question 8

Question Author's answer

I accepted Caridorc's answer as it shows the canonical CR answer style and my original question doesn't seem to fit this forum very much. His/her suggestions are spot-on when it comes to making the original code better.

Nonetheless, I figured it out and understood Monad and related concepts. I think it's really a clever idea to use them as a conceptual basis for Parser.

The new code:

import Data.Char
import Control.Monad
import Control.Applicative
{- data structures -}
data Term = Atom String
 | Number Int
 | Variable String
 | Complex String [Term]
 deriving Show
data Rule = Rule (Maybe Term) [Term]
 deriving Show
{- Parser algebra -}
data Parser a = Parser (String -> [(a, String)])
instance Functor Parser where
 fmap = liftM
instance Applicative Parser where
 pure v = Parser (\input -> [(v, input)])
 (<*>) = ap
instance Monad Parser where
 p >>= f = Parser (
 \cs -> concat [parse (f val) cs' | (val, cs') <- parse p cs])
 return = pure
instance MonadPlus Parser where
 mzero = Parser (\input -> [])
 mplus p q = Parser (\input -> parse p input ++ parse q input) 
instance Alternative Parser where
 empty = mzero
 (<|>) = mplus
 many p = some p `mplus` return []
 some p = do
 a <- p
 as <- many p
 return (a:as)
parse :: Parser a -> String -> [(a, String)]
parse (Parser p) input = p input
{- simple parsers: building blocks for other parsers -}
char :: Char -> Parser Char
char c = sat (c ==)
string :: String -> Parser String
string "" = return ""
string (c:cs) = do
 char c
 string cs
 return (c:cs)
item :: Parser Char
item = Parser (\input -> case input of
 [] -> []
 (c:cs) -> [(c, cs)])
sat :: (Char -> Bool) -> Parser Char
sat pred = do
 c <- item
 if pred c then return c else mzero
sat2 :: (Char -> Bool) -> (Char -> Bool) -> Parser String
sat2 initChar insideChar = do
 c <- sat initChar
 cs <- many (sat insideChar)
 return (c:cs)
sepBy :: Parser a -> Parser b -> Parser [a]
sepBy p sep = do
 a <- p
 as <- many (do {sep; p})
 return (a:as)
list1 :: Parser a -> Parser [a]
list1 p = p `sepBy` comma
list :: Parser a -> Parser [a]
list p = list1 p <|> return []
{- Prolog character classes and operator parsers -}
dot = char '.'
comma = char ','
isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || isSymbol c
smiley = string ":-"
{- Prolog language construct parsers -}
variable :: Parser Term
variable = fmap Variable $ sat2 isUpper isIdentChar
atom :: Parser Term
atom = fmap Atom $ sat2 (\c -> isLower c || c == '_') isIdentChar
number :: Parser Term
number = fmap (Number . read) (some (sat isNumber))
complex :: Parser Term
complex = fmap (uncurry Complex) $ do
 Atom f <- atom
 char '('
 args <- list1 term
 char ')'
 return (f, args)
term :: Parser Term
term = complex <|> atom <|> variable <|> number
rule :: Parser Rule
rule = fmap (uncurry Rule) $ do
 ruleHead <- (fmap Just term <|> return Nothing)
 body <- (do {smiley; list term} <|> return [])
 dot
 return (ruleHead, body) -- TODO `:-.` ~~~> Rule Nothing [], do I mind?

The line term = complex <|> atom <|> variable <|> number is where it really shines (cf. definition of term in the beginning).

The new code is much more sophisticated and easily extensible, but is not very straightforward, at least not in the part when it comes to Monadic behavior. Quite a lot of concepts and ideas are burried in the code. To write it and understand it completely, I needed to read:

The Learn You a Haskell For Great Good Tutorial: Higher Order Functions, Functors, Applicative Functors and Monoids and A Fistful of Monads
Graham Hutton's book Programming in Haskell, chapter 8
Monadic parser in Haskell

The last resource is really good and much of the code is based on it. But it's slightly outdated (contains references to MonadZero which does not seem to exist and does not account for the fact that Alternative is superclass of MonadPlus for some time already).

The project is hosted at GitHub in dcepelik/prolog. If you're interested, check it out in the future for updates (parsing the rest of Prolog syntactic structures, whitespace processing etc.).

Of course, I welcome any suggestions regarding the new code, too.

Question 9

I suggest asking this version as a new question to get new suggestions.

Question 10

Maybe I could, but right now I am quite confident with the code the way it is, so I don't want to bother anyone without a point. The phrase is there just in case someone explicitly wanted to comment on some obvious deficiency I haven't noticed.

Question 11

You can almost always improve something but (not being a Haskell programmer, so caveats apply) I think this new version is indeed quite good, well done! Also, you noted that you "think it's really a clever idea to use [monads] as a conceptual basis for Parser." — Yes. In fact, it’s devastatingly clever. If you haven’t already, check out the concept of parser combinators, and the library that implements them in Haskell, Parsec.

Question 12

@KonradRudolph Thanks for the link, it will certainly be useful as I proceed. And thanks for the praise. I was trying to avoid using Parsec as I wanted to see how the stuff really works, but you're right that after this it may be valuable source (literally). I was kind of surprised how nicely it all works in practice. What really got me was the notion that lists represent results of non-deterministic operations and that the parsers actually operate on these. I guess one can write much faster code, but hardly as elegant. It's basically a description of grammar, nothing else.

Question 13

Have some refactoring of your latest version, mostly to move the code closer to my tastes, YMMV. Take what you will.

import Data.Char
import Control.Monad
import Control.Applicative
import Control.Monad.Trans.State
{- data structures -}
data Term = Atom String
 | Number Int
 | Variable String
 | Complex String [Term]
 deriving Show
data Rule = Rule (Maybe Term) [Term]
 deriving Show
{- Parser algebra -}
type Parser = StateT String []
parse :: Parser a -> String -> [(a, String)]
parse = runStateT
{- simple parsers: building blocks for other parsers -}
char :: Char -> Parser Char
char = sat . (==)
string :: String -> Parser String
string = traverse char
item :: Parser Char
item = mapStateT maybeToList $ StateT uncons
sat :: (Char -> Bool) -> Parser Char
sat pred = mfilter pred item
sat2 :: (Char -> Bool) -> (Char -> Bool) -> Parser String
sat2 initChar insideChar = (:) <$> sat initChar <*> many (sat insideChar)
sepBy :: Parser a -> Parser b -> Parser [a]
sepBy p sep = (:) <$> p <*> many (sep *> p)
list1 :: Parser a -> Parser [a]
list1 p = p `sepBy` comma
list :: Parser a -> Parser [a]
list p = list1 p <|> return []
{- Prolog character classes and operator parsers -}
dot = char '.'
comma = char ','
isIdentChar :: Char -> Bool
isIdentChar = liftA2 (||) isAlphaNum isSymbol
smiley = string ":-"
{- Prolog language construct parsers -}
variable :: Parser Term
variable = Variable <$> sat2 isUpper isIdentChar
atom :: Parser Term
atom = Atom <$> sat2 (liftA2 (||) isLower (== '_')) isIdentChar
number :: Parser Term
number = Number . read <$> some (sat isNumber)
complex :: Parser Term
complex = do
 Atom f <- atom
 char '('
 args <- list1 term
 char ')'
 return $ Complex f args
term :: Parser Term
term = complex <|> atom <|> variable <|> number
rule :: Parser Rule
rule = do
 ruleHead <- fmap Just term <|> return Nothing
 body <- (smiley *> list term) <|> return []
 dot
 return $ Rule ruleHead body -- TODO `:-.` ~~~> Rule Nothing [], do I mind?

Question 14

Well, I'll really have to take a closer look at this :).

Question 15

Gurkenglas, would you mind elaborating on how did you manage to get rid of basically all the typeclasses I use, replacing them with StateT?

Question 16

Well, you were doing precisely what StateT is doing, so I got rid of basically all the typeclasses you used, replacing them with StateT, because they already implemented it and with more features to boot. How? I "saw a pattern" and "applied a cached thought" etc. I don't know what you're asking me to explain here :( By the way, you might want to take a look at hackage.haskell.org/package/base-4.8.2.0/docs/… , it's in base and looks close to what you're doing.

Caridorc CaridorcCaridorc 28k7 gold badges54 silver badges137 bronze badges · Accepted Answer · 2016-03-24 12:55:52Z

Built-ins

The function showTermList applies show to each item in the list (map show) and puts , [COMMA] between any two items of the list (intercalate ","). You can just use this predefined functions to write:

showTermList = intercalate "," . map show

Also in tokens:

| isSpace c = tokens cs -- eat all whitespace

Eating all white-space is better expressed as filter (not . isSpace).

tokens :: String -> [Token]
tokens = tokens' . filter (not . isSpace)
 where
 tokens' [] = []
 tokens' (c:cs) = let (token, cs') = nextToken (c : cs)
 in token : tokens cs'

Repetition

You have the repetition of (rule, in your function:

parseRule :: [Token] -> (Rule, [Token])
parseRule [] = error "No tokens to parse"
parseRule ts =
 if null ts' || token (head ts') /= dot then
 (rule, ts')
 else
 (rule, tail ts')
 where (rule, ts') = parseRule' ts

You could avoid it by assigning a tail variable:

parseRule ts = (rule, tail)
 where
 (rule, ts') = parseRule' ts
 tail = if null ts' || token (head ts') /= dot then ts' else tail ts'

Guards are even more visually immediate then if else:

parseRule ts = (rule, decideTail ts')
 where
 (rule, ts') = parseRule' ts
 decideTail ts'
 | null ts' || token (head ts') /= dot = ts'
 | otherwise = tail ts'

Now we note that:

(rule, decideTail ts')
 where
 (rule, ts') = parseRule' ts

Is built-in under the name of mapSnd

parseRule ts = mapSnd decideTail $ parseRule' ts
 where
 decideTail ts'
 | null ts' || token (head ts') /= dot = ts'
 | otherwise = tail ts'

Or even pointfree:

parseRule = mapSnd decideTail . parseRule'
 where
 decideTail ts'
 | null ts' || token (head ts') /= dot = ts'
 | otherwise = tail ts'

`accumulate`

accumulate is a particularly fortunate case because it can be written as flip span using Haskell built-ins (just search String -> (Char -> Bool) -> (String, String) in Hoogle and you will find it (span is more general hence the a in the signature, in your case a = Char))

Stack Exchange Network

Prolog parser written in Haskell

3 Answers 3

Built-ins

Repetition

`accumulate`

Question Author's answer

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Prolog parser written in Haskell

3 Answers 3

Built-ins

Repetition

accumulate

Question Author's answer

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`accumulate`