2
\$\begingroup\$

I found myself wanting to brush up on some notions about parsers and grammars and, at the same time, to exercise my Haskell - I am a Haskell newbie; moreover, I haven't touched the language at all in a couple months.

You can find the complete result here, and, separated by hrs and without the module boilerplate, pasted below.

I'm not exactly happy with it.

It seems somewhat dirty and convoluted for something that is stated so simply in a few lines of English in the reference textbook - (meaning, the (Purple) Dragon Book).

I am also noticing a complete lack of typically functional constructs - function composition, curried functions, etc - which might be a code smell.

On the other hand, after spending a couple nights on it, I can't seem to find any more opportunity for improvement: I'm stuck.

Any suggestions both in the small and in the large scale are appreciated.

---------------------------------
-- Datatypes and utilities
---------------------------------
data Terminal = T String | Dollar | Epsilon deriving (Show, Eq, Ord)
data Nonterminal = NT String | Start String deriving (Show, Eq, Ord)
type Symbol = (Either Nonterminal Terminal)
data Production = Prod {left_hand :: Nonterminal, right_hand :: [Symbol]} deriving (Show, Eq)
data Grammar = Grammar [Production] deriving (Show)
allTerminalsInGrammar :: Grammar -> Set Terminal
allTerminalsInGrammar (Grammar []) = Data.Set.empty
allTerminalsInGrammar (Grammar (prod:prods)) =
 (
 Data.Set.union
 (allTerminalsInGrammar (Grammar prods))
 (Data.Set.fromList $ onlyTerminals (right_hand prod))
 )
 where onlyTerminals a = rights a
allNonterminalsInGrammar :: Grammar -> Set Nonterminal
allNonterminalsInGrammar (Grammar []) = Data.Set.empty
allNonterminalsInGrammar (Grammar (prod:prods)) =
 (
 Data.Set.union
 (allNonterminalsInGrammar (Grammar prods))
 (Data.Set.singleton (left_hand prod))
 )
type FirstSet = (Set Terminal)
type FollowSet = (Set Terminal)
data FirstMap = FirstMap (Data.Map.Map Nonterminal (Set Terminal)) deriving (Show, Eq)
-- A FirstMap maps a symbol to its FirstSet
data FollowMap = FollowMap (Data.Map.Map Nonterminal (Set Terminal)) deriving (Show, Eq)
-- A FollowMap maps a symbol to its FollowSet
getFirstSetFor :: Nonterminal -> FirstMap -> FirstSet
getFirstSetFor nonterminal (FirstMap firstmap) =
 f $ Data.Map.lookup nonterminal firstmap
 where
 f Nothing = Data.Set.empty
 f (Just set) = set
getFollowSetFor :: Nonterminal -> FollowMap -> FollowSet
getFollowSetFor nonterminal (FollowMap firstmap) =
 f $ Data.Map.lookup nonterminal firstmap
 where
 f Nothing = Data.Set.empty
 f (Just set) = set
mergeSetIntoFirstMap :: FirstMap -> Nonterminal -> Set Terminal -> FirstMap
mergeSetIntoFirstMap (FirstMap map) nt t = FirstMap (insertWith (Data.Set.union) nt t map)
mergeSetIntoFollowMap :: FollowMap -> Nonterminal -> Set Terminal -> FollowMap
mergeSetIntoFollowMap (FollowMap map) nt t = FollowMap (insertWith (Data.Set.union) nt t map)
---------------------------------------------------------

fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint f x = let x' = f x in if x == x' then x else fixpoint f x'

----------------------------------------------------
-- FIRST0 gives ∅ to each nonterminal;
--
-- Then repeat:
--
-- If X is a terminal then FIRST(X) = {X}
-- If X is a nonterminal
-- If X -> ε is a production then add ε to FIRST(X)
-- If X->Y1 Y2 ... Yk for some k >= 1
-- If ε is in FIRST−1(Yj) for ALL j= 1...k add ε
-- If ε is in FIRST−1(Yi) for ALL i= 1...l < k
-- then add FIRST(Yl+1)
--
-- Stop when nothing can be added (i.e.: find a fixpoint)
first :: Grammar -> FirstMap
first grammar = (fixpoint (firsti grammar) (first0 grammar))
 where
 --------------------------------------------------------
 first0 :: Grammar -> FirstMap
 first0 grammar = FirstMap $
 (Data.Map.fromList $ (fmap defaults) $ Data.Set.toList (allNonterminalsInGrammar grammar))
 where
 defaults nt = (nt, (Data.Set.empty))
 --------------------------------------------------------
 firsti :: Grammar -> FirstMap -> FirstMap
 firsti (Grammar []) fMinus1 = fMinus1
 firsti (Grammar (prod:prods)) fMinus1 =
 (
 mergeSetIntoFirstMap -- Add to the FIRST_(i-1) set,
 (firsti (Grammar prods) fMinus1) -- (computed from the remaining productions)
 (left_hand prod) -- for the symbol on the LHS of the current production prod,
 (terminalsForRHS prod) -- the terminals obtained by applying the rules on the current production prod
 )
 where 
 --
 terminalsForRHS :: Production -> Set Terminal
 terminalsForRHS (Prod _ []) = (Data.Set.singleton Epsilon)
 -- If X -> ε is a production then add ε to FIRST(X) 
 terminalsForRHS (Prod _ [(Right Epsilon)]) = (Data.Set.singleton Epsilon)
 terminalsForRHS (Prod x (yj:ys)) =
 case yj of
 -- If X is a terminal then FIRST(X) = {X} 
 Right terminal -> Data.Set.singleton terminal
 -- Otherwise...
 Left nonterminal -> if (Data.Set.member Epsilon first_iminus1)
 then (Data.Set.union
 (terminalsForYj)
 (terminalsForRHS (Prod x ys)))
 else (terminalsForYj) 
 where
 first_iminus1 = getFirstSetFor nonterminal fMinus1
 terminalsForYj = (Data.Set.delete Epsilon first_iminus1)
 -- If ε is in FIRST−1(Yi) for ALL i= 1...l < k
 -- then add FIRST(Yl+1)
 -- --> guaranteed because we add FIRST(Yl+1) and we
 -- continue adding terminalsRHS for the
 -- remainder of the production only if ε is in FIRST−1
 -- If ε is in FIRST−1(Yj) for ALL j= 1...k add ε
 -- --> guaranteed because we continue adding terminalsRHS for the
 -- remainder of the production only if ε is in FIRST−1
 -- eventually adding terminalsForRHS (Prod _ []) = (Data.Set.singleton Epsilon) 

-------------------------------------------------------------------------
-- We can compute FIRST for any string X1X2...Xn as follows.
-- Add to FIRST(X1X2... Xn) all non-ε symbols of FIRST(X1).
-- Also add the non-ε symbols of FIRST(X2), if ε is in FIRST(X1);
-- and so on.
-- Finally, add ε to FIRST(X1X2 . . Xn) if, for all i, ε is in FIRST(Xi)
-------------------------------------------------------------------------
firstForWord :: [Symbol] -> FirstMap -> FirstSet
firstForWord [(Left nt)] first = getFirstSetFor nt first -- If Epsilon in first don't remove
firstForWord [(Right t)] first = (Data.Set.singleton t)
firstForWord ((Left nt):ss) first =
 if (Data.Set.member
 Epsilon
 (getFirstSetFor nt first)
 )
 then (Data.Set.union
 firstMinusEpsilon
 (firstForWord ss first)
 )
 else firstMinusEpsilon
 where firstMinusEpsilon = (Data.Set.delete
 Epsilon
 (getFirstSetFor nt first)
 )

----------------------------------------------------
-- FOLLOW0 gives $ to FOLLOW(S) and {} to everything else;
--
-- Then repeat:
--
-- If there is a production A -> w1Bw2 ,
-- then everything in FIRST(w2) except ε is in FOLLOW(B)
-- If where FIRST(w2) contains Epsilon,
-- then everything in FOLLOW (A) is also in FOLLOW (B).
-- If there is a production A -> w1B,
-- then everything in FOLLOW (A) is in FOLLOW (B).
--
-- Stop when nothing can be added (i.e. find a fixpoint)
-----------------------------------------------------
follow :: Grammar -> FollowMap
follow grammar = (fixpoint (followi grammar (first grammar)) (follow0 grammar))
 where
 --------------------------------------------------------
 follow0 :: Grammar -> FollowMap
 follow0 grammar = FollowMap $
 (Data.Map.fromList $ (fmap defaults) $ Data.Set.toList (allNonterminalsInGrammar grammar))
 where
 defaults nt@(NT _) = (nt, (Data.Set.empty))
 defaults nt@(Start _) = (nt, (Data.Set.singleton Dollar))
 --------------------------------------------------------
 followi :: Grammar -> FirstMap -> FollowMap -> FollowMap
 followi (Grammar []) _ fMinus1 = fMinus1
 followi (Grammar (prod:prods)) firstMapForG fMinus1 =
 (mergeTerminalsFromProd -- Add the terminals obtained by applying the rules on the current production
 prod
 (followi -- 
 (Grammar prods) --
 firstMapForG -- to the FOLLOW sets obtained from the remaining productions
 fMinus1 --
 )
 )
 where
 mergeTerminalsFromProd :: Production -> FollowMap -> FollowMap
 mergeTerminalsFromProd (Prod l (s:ss)) = (mergeTerminalsFromProd' l [] s ss)
 mergeTerminalsFromProd' :: Nonterminal -> [Symbol] -> Symbol -> [Symbol] -> FollowMap -> FollowMap
 -- If there is a production A -> w1Bw2 ,
 mergeTerminalsFromProd' a w1 nt@(Left b) w2@(w21:w2s) fMinus1 = (mergeSetIntoFollowMap
 (mergeTerminalsFromProd' a (w1 ++ [nt]) w21 w2s fMinus1)
 b
 newTerminals
 )
 where
 -- everything in FIRST(w2) except ε is in FOLLOW(B)
 firstW2MinusEpsilon = (Data.Set.delete Epsilon (firstForWord w2 firstMapForG))
 newTerminals = if (Data.Set.member Epsilon (firstForWord w2 firstMapForG))
 -- If where FIRST(w2) contains Epsilon,
 -- then everything in FOLLOW (A) is also in FOLLOW (B).
 then (Data.Set.union (getFollowSetFor a fMinus1) firstW2MinusEpsilon)
 else (firstForWord w2 firstMapForG)
 -- If there is a production A -> w1B, then everything in FOLLOW (A) is in FOLLOW (B).
 mergeTerminalsFromProd' a w1 (Left b) [] fMinus1 = (mergeSetIntoFollowMap
 fMinus1
 b
 (getFollowSetFor a fMinus1)
 )
 -- We don't have to insert anything for terminals; i.e. there are no rules
 -- for A -> w1Tw2 where T is terminal; if we are at the end of the production we're done
 mergeTerminalsFromProd' a _ (Right _) [] fMinus1 = fMinus1
 -- Otherwise just recurse
 mergeTerminalsFromProd' a w1 (Right t) (w2:w2s) fMinus1 = (mergeTerminalsFromProd' a (w1 ++ [(Right t)]) w2 w2s fMinus1) 

main = let
 -- Example from dragon book
 dragon_gram = Grammar [p1, p2, p3, p4, p5, p6, p7, p8] where
 p1 = Prod {left_hand = (Start "E"), right_hand = [Left (NT "T"), Left (NT "E'")]}
 p2 = Prod {left_hand = (NT "E'"), right_hand = [Right (T "+"), Left (NT "T"), Left (NT "E'")]}
 p3 = Prod {left_hand = (NT "E'"), right_hand = [Right Epsilon]}
 p4 = Prod {left_hand = (NT "T"), right_hand = [Left (NT "F"), Left (NT "T'")]}
 p5 = Prod {left_hand = (NT "T'"), right_hand = [Right (T "*"), Left (NT "F"), Left (NT "T'")]}
 p6 = Prod {left_hand = (NT "T'"), right_hand = [Right Epsilon]}
 p7 = Prod {left_hand = (NT "F"), right_hand = [Right (T "("), Left (Start "E"), Right (T ")")]}
 p8 = Prod {left_hand = (NT "F"), right_hand = [Right (T "id")]}
 in do
 putStrLn "EXAMPLE GRAMMAR: "
 print (dragon_gram)
 putStrLn "FIRST(G) = "
 print (first dragon_gram)
 putStrLn "FOLLOW(G) = "
 print (follow dragon_gram)
asked May 1, 2016 at 12:35
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

I refactored for a while, largely replacing recursion with combinators. I think all these type aliases are unnecessary, I probably would have removed them if I had finished this. You didnt actually use the LHS in terminalsForRHS, except to pass it into recursive calls, so I took it out.

module First where
import Data.Map
import Data.Maybe
import Data.Either
import Data.Set
import Control.Arrow
---------------------------------
-- Datatypes and utilities
---------------------------------
data Terminal = T String | Dollar | Epsilon deriving (Show, Eq, Ord)
data Nonterminal = NT String | Start String deriving (Show, Eq, Ord)
type Symbol = Either Nonterminal Terminal
type Production = (Nonterminal, Symbol)
type Grammar = [Production]
allTerminalsInGrammar :: Grammar -> Set Terminal
allTerminalsInGrammar = foldMap $ Data.Set.fromList . rights . snd
allNonterminalsInGrammar :: Grammar -> Set Nonterminal
allNonterminalsInGrammar = Data.Set.fromList . map fst
type FFSet = Set Terminal
type FFMap = Data.Map.Map Nonterminal FFSet
-- A First/FollowMap maps a symbol to its First/FollowSet
getSetFor :: Nonterminal -> FFMap -> FFSet
getSetFor nt = fold . Data.Map.lookup nt
mergeSetIntoFFMap :: Nonterminal -> Set Terminal -> FFMap -> FFMap
mergeSetIntoFFMap = insertWith Data.Set.union
---------------------------------------------------------
-----------------------
-- Main program
-----------------------
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint f x = let x' = f x in if x == x' then x else fixpoint f x'
----------------------------------------------------
-- FIRST0 gives ∅ to each nonterminal;
--
-- Then repeat:
--
-- If X is a terminal then FIRST(X) = {X}
-- If X is a nonterminal
-- If X -> ε is a production then add ε to FIRST(X)
-- If X->Y1 Y2 ... Yk for some k >= 1
-- If ε is in FIRST−1(Yj) for ALL j= 1...k add ε
-- If ε is in FIRST−1(Yi) for ALL i= 1...l < k
-- then add FIRST(Yl+1)
--
-- Stop when nothing can be added (i.e.: find a fixpoint)
---------------------------------------------------- 
first :: Grammar -> FirstMap
first grammar = fixpoint (`firsti` grammar) Data.Map.empty where
 firsti :: FirstMap -> Grammar -> FirstMap
 firsti fMinus1 = fromListWith Data.Set.union . (map . right)
 -- The terminals obtained by applying the rules on the current production prod
 (foldr terminalsForRHS $ Data.Set.singleton Epsilon) where
 terminalsForRHS :: Symbol -> Set Terminal -> Set Terminal
 -- If X is a terminal then FIRST(X) = {X}
 terminalsForRHS (Right terminal) _ = Data.Set.singleton terminal
 terminalsForRHS (Left nonterminal) foo = Data.Set.union
 ( Data.Set.delete Epsilon first_iminus1)
 if Data.Set.member Epsilon first_iminus1
 then foo
 else Data.Set.empty 
 where
 first_iminus1 = getSetFor nonterminal fMinus1
 -- If ε is in FIRST−1(Yi) for ALL i= 1...l < k
 -- then add FIRST(Yl+1)
 -- --> guaranteed because we add FIRST(Yl+1) and we
 -- continue adding terminalsRHS for the
 -- remainder of the production only if ε is in FIRST−1
 -- If ε is in FIRST−1(Yj) for ALL j= 1...k add ε
 -- --> guaranteed because we continue adding terminalsRHS for the
 -- remainder of the production only if ε is in FIRST−1
 -- eventually adding terminalsForRHS (Prod _ []) = Data.Set.singleton Epsilon) 
-------------------------------------------------------------------------
-- We can compute FIRST for any string X1X2...Xn as follows.
-- Add to FIRST(X1X2... Xn) all non-ε symbols of FIRST(X1).
-- Also add the non-ε symbols of FIRST(X2), if ε is in FIRST(X1);
-- and so on.
-- Finally, add ε to FIRST(X1X2 . . Xn) if, for all i, ε is in FIRST(Xi)
-------------------------------------------------------------------------
firstForWord :: [Symbol] -> FFMap -> FFSet
firstForWord [Left nt] first = getSetFor nt first -- If Epsilon in first don't remove
firstForWord [Right t] first = Data.Set.singleton t
firstForWord (Left nt:ss) first = Data.Set.union 
 ( Data.Set.delete Epsilon $ getSetFor nt first)
 if Data.Set.member Epsilon $ getSetFor nt first
 then firstForWord ss first
 else Data.Set.empty

Edit:

firstForWord :: FFMap -> [Symbol] -> FFSet
firstForWord first = unsnoc >>> \(Just (lefts, last)) ->
 foldr foo (either (`getSetFor` first) Data.Set.singleton) lefts where
 foo (Left nt) bar = Data.Set.union 
 ( Data.Set.delete Epsilon $ getSetFor nt first)
 if Data.Set.member Epsilon $ getSetFor nt first
 then bar
 else Data.Set.empty

And then, if you don't mind making it more defined, make this and the above similar snippet use the same first argument to foldr.

answered May 2, 2016 at 22:31
\$\endgroup\$
2
  • \$\begingroup\$ Thank you a lot, this is really a great help as it is. I had a few "d'oh" moments while reading it, I expect I'll have more when altering my code to reflect yours. Just one thing, though: why did you merge FirstSet and FollowSet into FFSet? I originally kept them separate on purpose, to make sure not to use a FirstSet as a FollowSet or viceversa inside some nested call. Is that a bad Java-ism? Does Haskell provide a better way to avoid that? Thank you. \$\endgroup\$ Commented May 3, 2016 at 17:02
  • \$\begingroup\$ I find that these wrappers, that everyone invents anew, make it harder to see patterns that can be replaced by a combinator from some library that already did the heavy lifting. Mixing up which goes where shouldn't be much of a problem once the code has been reduced sufficiently. Repost if you find many reductions, I might see more on another pass. \$\endgroup\$ Commented May 3, 2016 at 18:09

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.