CodeWars - Three-pass compiler for arithmetic expressions

Question 1

I'm looking for feedback on my solution to the Tiny Three-Pass Compiler kata on CodeWars. The kata is to implement a compiler for an arithmetic language in three passes - parsing to an AST, constant folding, then generating code in a small assembly language.

I'm looking for general stylistic review, as well as reducing some of the duplication in the pass2 and pass3 functions; I have a feeling that the Add/Sub/Mul/Div cases could probably be condensed somehow.

Without further ado, the code:

module TinyThreePassCompiler where
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Maybe (fromJust)
import Text.Parsec
import qualified Text.Parsec.Token as Tok
-- count number of arguments declared, how identifiers map to argument numbers
type CompilerState = (Int, Map String Int) 
type Parser a = Parsec String CompilerState a
langDef :: Tok.LanguageDef CompilerState
langDef = Tok.LanguageDef
 { Tok.commentStart = ""
 , Tok.commentEnd = ""
 , Tok.commentLine = ""
 , Tok.nestedComments = False
 , Tok.identStart = letter
 , Tok.identLetter = letter
 , Tok.opStart = oneOf "+-*/"
 , Tok.opLetter = oneOf "+-*/"
 , Tok.reservedNames = []
 , Tok.reservedOpNames = []
 , Tok.caseSensitive = True
 }
lexer :: Tok.TokenParser CompilerState
lexer = Tok.makeTokenParser langDef
parens :: Parser a -> Parser a
parens = Tok.parens lexer
brackets :: Parser a -> Parser a
brackets = Tok.brackets lexer
identifier :: Parser String
identifier = Tok.identifier lexer
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer
integer :: Parser Integer
integer = Tok.integer lexer
data AST = Imm Int
 | Arg Int
 | Add AST AST
 | Sub AST AST
 | Mul AST AST
 | Div AST AST
 deriving (Eq, Show)
function :: Parser AST
function = do
 brackets argList
 expression
argList :: Parser ()
argList = do
 many variableDec
 return ()
-- for each variable declaration, we add a mapping from the argument name to the argument number
-- as well as incrementing the Int in CompilerState, which tracks what argument number we're on
variableDec :: Parser ()
variableDec = do
 varName <- identifier
 (varNum, _) <- getState
 modifyState (\(num, args) -> (num + 1, Map.insert varName varNum args))
 return ()
expression :: Parser AST
expression = term `chainl1` addSubOp
addSubOp :: Parser (AST -> AST -> AST)
addSubOp = (reservedOp "+" >> return Add)
 <|> (reservedOp "-" >> return Sub)
term :: Parser AST
term = factor `chainl1` multDivOp
multDivOp :: Parser (AST -> AST -> AST)
multDivOp = (reservedOp "*" >> return Mul)
 <|> (reservedOp "/" >> return Div)
factor :: Parser AST
factor = number
 <|> variableUse
 <|> parens expression
number :: Parser AST
number = do
 num <- integer
 return $ Imm $ fromIntegral num
-- using fromJust because we don't care about error handling
-- per problem, all programs will be valid
variableUse :: Parser AST
variableUse = do
 varName <- identifier
 (_, varMap) <- getState
 return $ Arg $ fromJust $ Map.lookup varName varMap
compile :: String -> [String]
compile = pass3 . pass2 . pass1
-- parsing pass
pass1 :: String -> AST
pass1 str = case (runParser function (0, Map.empty) "" str) of
 (Left _) -> error "Parse error"
 (Right ast) -> ast
-- constant folding pass
-- do a postorder traversal of the AST, checking for optimization opportunities at each node
pass2 :: AST -> AST
pass2 ast = case ast of
 Add left right -> case (pass2 left, pass2 right) of
 ((Imm m), (Imm n)) -> Imm (m + n)
 (l, r) -> Add l r
 Sub left right -> case (pass2 left, pass2 right) of
 ((Imm m), (Imm n)) -> Imm (m - n)
 (l, r) -> Sub l r
 Mul left right -> case (pass2 left, pass2 right) of
 ((Imm m), (Imm n)) -> Imm (m * n)
 (l, r) -> Mul l r
 Div left right -> case (pass2 left, pass2 right) of
 ((Imm m), (Imm n)) -> Imm (m `div` n)
 (l, r) -> Div l r
 _ -> ast
-- code generation pass
-- postorder traversal of the AST, generating code at each node
pass3 :: AST -> [String]
pass3 ast = case ast of
 Imm n -> ["IM " ++ show n, "PU"]
 Arg a -> ["AR " ++ show a, "PU"]
 Add l r -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", "AD", "PU"]
 Sub l r -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", "SU", "PU"]
 Mul l r -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", "MU", "PU"]
 Div l r -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", "DI", "PU"]

A few notes:

the AST data type and types for compile, pass1, pass2, and pass3 are set by the kata. I'm stuck with using Int as the underlying type for the Imm and Arg constructors.
Everything is one module for running on CodeWars. I'd probably split this into multiple modules if it weren't for that restriction.
Parsec is available on CodeWars; Megaparsec isn't, else I'd consider using that.
Per the kata's description, all test programs will be valid; thus, I don't put any effort into error handling, and I'm ok with variableUse throwing an error if an undeclared argument is used.

Question 2

Huh. I never got to solve that kata, so congratulations first.

Working with a given AST

As you said yourself, the AST type isn't optimal. We could use

data AST = Imm Int
 | Arg Int
 | BinOp BinOp AST AST
 deriving (Eq, Show)
data BinOp = Add | Mul | Sub | Div deriving (Eq, Show)

instead and simplify both pass2 and pass3 a lot. The operator parsers would return a BinOp instead of an AST -> AST -> AST. But we're stuck with the current AST.

Reducing code duplication (DRY)

If we have a look at pass2 we see that all cases are very similar. They all follow the

 Constructor left right -> case (pass2 left, pass2 right) of
 (Imm m, Imm n) -> Imm (m <op> n)

pattern. We can just use that one, too:

pass2 :: AST -> AST
pass2 ast = case ast of
 Add left right -> simplify Add (+) left right
 Sub left right -> simplify Sub (-) left right
 Mul left right -> simplify Mul (*) left right
 Div left right -> simplify Div div left right
 _ -> ast
 where
 simplify c op l r = case (pass2 l, pass2 r) of 
 (Imm m, Imm n) -> Imm (m `op` n)
 _ -> c l r

This variant is easier to maintain, as we only have to change simplify or add the new constructor to the outer case, e.g.

pass2 :: AST -> AST
pass2 ast = case ast of
 Add left right -> simplify Add (+) left right
 Sub left right -> simplify Sub (-) left right
 Mul left right -> simplify Mul (*) left right
 Div left right -> simplify Div div left right
 Pow left right -> simplify Pow (^) left right
 _ -> ast
 where
 simplify c op l r = case (pass2 l, pass2 r) of 
 (Imm m, Imm n) -> Imm (m `op` n)
 _ -> c l r

It's similar for pass3:

pass3 :: AST -> [String]
pass3 ast = case ast of
 Imm n -> ["IM " ++ show n, "PU"]
 Arg a -> ["AR " ++ show a, "PU"]
 Add l r -> generate "AD" l r
 Sub l r -> generate "SU" l r
 Mul l r -> generate "MU" l r
 Div l r -> generate "DI" l r
 where
 generate oc l r = pass3 l ++ pass3 r ++ ["PO", "SW", "PO", oc, "PU"]

This keeps the repetition in our code a little bit down.

An intermediate representation

But we can do better if we introduce additional functions and types:

data BinOp = BAdd | BSub | BMul | BDiv deriving (Eq, Show)
data BinOpExpr = BinOpExpr BinOp AST AST
toBinOpExpr :: AST -> Maybe BinOpExpr
toBinOpExpr ast = case ast of
 Add l r -> gen BAdd l r
 Sub l r -> gen BSub l r
 Mul l r -> gen BMul l r
 Div l r -> gen BDiv l r
 _ -> Nothing
 where
 gen c l r = Just $ BinOpExpr c l r

If we now have toOpCode :: BinOp -> String, it's really easy to write pass3:

pass3 :: AST -> [String]
pass3 ast = case ast of
 Imm n -> ["IM " ++ show n, "PU"]
 Arg a -> ["AR " ++ show a, "PU"]
 o -> case toBinOpExpr o of 
 Just (BinOpExpr op l r) -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", toOpCode op, "PU"]
 Nothing -> error "Unreachable"

Similar for pass2 if we have toFunc :: BinOp -> (Int -> Int -> Int):

pass2 :: AST -> AST
pass2 ast = case toBinOpExpr ast of
 Just (BinOpExpr op l r) -> simplify op l r ast
 _ -> ast 
 where
 simplify op l r ast = case (pass2 l, pass2 r) of 
 (Imm m, Imm n) -> Imm (toFunc op m n)
 _ -> ast

Other remarks

Your Parser are well done, but we can get rid of some superfluous code. For example

argList :: Parser ()
argList = do
 many variableDec
 return ()

can be written as

argList = skipMany variableDec

skipMany still applies the parser (and therefore changes the state), whereas many applies the parser and collects a list of [()].

Similarly, we can get rid of return () in

variableDec :: Parser ()
variableDec = do
 varName <- identifier
 (varNum, _) <- getState
 modifyState (\(num, args) -> (num + 1, Map.insert varName varNum args))
 return ()

as modifyState f >> return () is the same as modifyState f due to the monad laws:

For number, we can use Parser's Functor instance to make it shorter:

number :: Parser AST
number = Imm . fromIntegral <$> integer

The other functions could get written in a point-free style, but that would be harder to read in my opinion.

Question 3

Thanks! These are some interesting suggestions; I'll definitely keep them in mind for future work in this vein.

Question 4

@DylanSp Finally got to comment the Parser section. Have fun on CW!

Question 5

Control.Lens.Plated specializes in passes over recursive data structures.

pass2 = rewriteOf uniplate $ fmap Imm . \case
 Add (Imm left) (Imm right) -> Just $ left + right
 Sub (Imm left) (Imm right) -> Just $ left - right
 Mul (Imm left) (Imm right) -> Just $ left * right
 Div (Imm left) (Imm right) -> Just $ left / right
 _ -> Nothing

Control.Lens.Prism specializes in abstracting over constructors.

pass2 = rewriteOf uniplate $ \ast -> listToMaybe
 [ Imm $ f left right
 | (p, f) <- [(_Add, (+)), (_Sub, (-)), (_Mul, (*)), (_Div, (/))]
 , Just (Imm left, Imm right) <- pure $ preview p ast
 ]

Zeta Zeta 19.6k2 gold badges57 silver badges90 bronze badges · Accepted Answer · 2018-02-28 17:05:47Z

Huh. I never got to solve that kata, so congratulations first.

Working with a given AST

As you said yourself, the AST type isn't optimal. We could use

data AST = Imm Int
 | Arg Int
 | BinOp BinOp AST AST
 deriving (Eq, Show)
data BinOp = Add | Mul | Sub | Div deriving (Eq, Show)

instead and simplify both pass2 and pass3 a lot. The operator parsers would return a BinOp instead of an AST -> AST -> AST. But we're stuck with the current AST.

Reducing code duplication (DRY)

If we have a look at pass2 we see that all cases are very similar. They all follow the

 Constructor left right -> case (pass2 left, pass2 right) of
 (Imm m, Imm n) -> Imm (m <op> n)

pattern. We can just use that one, too:

pass2 :: AST -> AST
pass2 ast = case ast of
 Add left right -> simplify Add (+) left right
 Sub left right -> simplify Sub (-) left right
 Mul left right -> simplify Mul (*) left right
 Div left right -> simplify Div div left right
 _ -> ast
 where
 simplify c op l r = case (pass2 l, pass2 r) of 
 (Imm m, Imm n) -> Imm (m `op` n)
 _ -> c l r

This variant is easier to maintain, as we only have to change simplify or add the new constructor to the outer case, e.g.

pass2 :: AST -> AST
pass2 ast = case ast of
 Add left right -> simplify Add (+) left right
 Sub left right -> simplify Sub (-) left right
 Mul left right -> simplify Mul (*) left right
 Div left right -> simplify Div div left right
 Pow left right -> simplify Pow (^) left right
 _ -> ast
 where
 simplify c op l r = case (pass2 l, pass2 r) of 
 (Imm m, Imm n) -> Imm (m `op` n)
 _ -> c l r

It's similar for pass3:

pass3 :: AST -> [String]
pass3 ast = case ast of
 Imm n -> ["IM " ++ show n, "PU"]
 Arg a -> ["AR " ++ show a, "PU"]
 Add l r -> generate "AD" l r
 Sub l r -> generate "SU" l r
 Mul l r -> generate "MU" l r
 Div l r -> generate "DI" l r
 where
 generate oc l r = pass3 l ++ pass3 r ++ ["PO", "SW", "PO", oc, "PU"]

This keeps the repetition in our code a little bit down.

An intermediate representation

But we can do better if we introduce additional functions and types:

data BinOp = BAdd | BSub | BMul | BDiv deriving (Eq, Show)
data BinOpExpr = BinOpExpr BinOp AST AST
toBinOpExpr :: AST -> Maybe BinOpExpr
toBinOpExpr ast = case ast of
 Add l r -> gen BAdd l r
 Sub l r -> gen BSub l r
 Mul l r -> gen BMul l r
 Div l r -> gen BDiv l r
 _ -> Nothing
 where
 gen c l r = Just $ BinOpExpr c l r

If we now have toOpCode :: BinOp -> String, it's really easy to write pass3:

pass3 :: AST -> [String]
pass3 ast = case ast of
 Imm n -> ["IM " ++ show n, "PU"]
 Arg a -> ["AR " ++ show a, "PU"]
 o -> case toBinOpExpr o of 
 Just (BinOpExpr op l r) -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", toOpCode op, "PU"]
 Nothing -> error "Unreachable"

Similar for pass2 if we have toFunc :: BinOp -> (Int -> Int -> Int):

pass2 :: AST -> AST
pass2 ast = case toBinOpExpr ast of
 Just (BinOpExpr op l r) -> simplify op l r ast
 _ -> ast 
 where
 simplify op l r ast = case (pass2 l, pass2 r) of 
 (Imm m, Imm n) -> Imm (toFunc op m n)
 _ -> ast

Other remarks

Your Parser are well done, but we can get rid of some superfluous code. For example

argList :: Parser ()
argList = do
 many variableDec
 return ()

can be written as

argList = skipMany variableDec

skipMany still applies the parser (and therefore changes the state), whereas many applies the parser and collects a list of [()].

Similarly, we can get rid of return () in

variableDec :: Parser ()
variableDec = do
 varName <- identifier
 (varNum, _) <- getState
 modifyState (\(num, args) -> (num + 1, Map.insert varName varNum args))
 return ()

as modifyState f >> return () is the same as modifyState f due to the monad laws:

For number, we can use Parser's Functor instance to make it shorter:

number :: Parser AST
number = Imm . fromIntegral <$> integer

The other functions could get written in a point-free style, but that would be harder to read in my opinion.

Thanks! These are some interesting suggestions; I'll definitely keep them in mind for future work in this vein.
@DylanSp Finally got to comment the Parser section. Have fun on CW!

Stack Exchange Network

CodeWars - Three-pass compiler for arithmetic expressions

2 Answers 2

Working with a given AST

Reducing code duplication (DRY)

An intermediate representation

Other remarks

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CodeWars - Three-pass compiler for arithmetic expressions

2 Answers 2

Working with a given AST

Reducing code duplication (DRY)

An intermediate representation

Other remarks

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions