I recently finished reading Learn you a Haskell for great good!. Even though some topics are a bit over my head (Monads
anyone?), I wanted to try my hand at a fairly difficult problem. I chose to write a eval function that correctly handles parenthesis and the order of operations. I think this is pretty good, but I can see some things that need improvement.
module Eval where
data Token = Times | Div | Add | Sub | LPren | RPren | Number Float
deriving (Show, Eq)
tokenise :: String -> Token
tokenise n = case n of
"*" -> Times
"/" -> Div
"+" -> Add
"-" -> Sub
"(" -> LPren
")" -> RPren
_ -> Number $ read n
parse :: String -> [Token]
parse s = map tokenise $ tokens s
reduce :: [Token] -> Float
reduce tokens = let groups = prenGroups tokens
results = map reduce groups
leftover = substitute tokens results
orderOfOps = [Times, Div, Add, Sub]
in unpackNum $ head $ foldl (flip reduceAllOfOp) leftover orderOfOps
-- The expression passed to this function must be ONLY numbers an ops in proper form,
-- num, (op, num)*
reduceAllOfOp :: Token -> [Token] -> [Token]
reduceAllOfOp tok (x:op:y:rest) = if op == tok
then let xn = unpackNum x
yn = unpackNum y
oper = getOp op
in reduceAllOfOp tok ((Number $ oper xn yn):rest)
else x:op:(reduceAllOfOp tok (y:rest))
reduceAllOfOp _ toks = toks
-- substitute numbers in for prenthisis groups, pram 1 is the problem, pram 2 is a list of numbers
substitute :: [Token] -> [Float] -> [Token]
substitute ts [] = ts
substitute ts ns = let head' = takeWhile (/=LPren) ts
tail' = tail $ dropWhile (/=LPren) ts
rprenIdx = findMatchingRPren tail'
(_, tail'') = splitAt rprenIdx tail'
in head' ++ [Number $ head ns] ++ substitute tail'' (tail ns)
unpackNum :: Token -> Float
unpackNum (Number num) = num
unpackNum _ = error "Not a number!!"
isNum :: Token -> Bool
isNum (Number _) = True
isNum _ = False
getOp :: Token -> Float -> Float -> Float
getOp tok = case tok of
Times -> (*)
Div -> (/)
Add -> (+)
Sub -> (-)
_ -> error "Not an op!!"
-- this is also a very big mess, but not quite as bad as `findMatchingRPren`
prenGroups :: [Token] -> [[Token]]
prenGroups [] = []
prenGroups tokens = if all (`elem` tokens) [LPren, RPren]
-- remove the head of the tokens, returns ... (`...) ...`
then let tail' = tail $ dropWhile (/=LPren) tokens
prenLoc = findMatchingRPren tail'
-- returns ... (`...``) ...`
(group, rest) = splitAt (prenLoc-1) tail'
-- the `tail` call is to remove the pren from the list
in group:prenGroups (tail rest)
else []
-- THIS IS A HORRIBLE FLAMING PIECE OF UGLY JUNK THAT SHOULD BE DESTROYED ON SIGHT!
-- unfortunatly it is the only thing that I can think of that works...
type Counter = (Int, Int, Bool)
findMatchingRPren :: [Token] -> Int
findMatchingRPren t =
let fun = (\(runningCount, splitIdx, done) tok ->
let count = runningCount + case tok of
RPren -> -1
LPren -> 1
_ -> 0
in if done
then (runningCount, splitIdx, done)
else (if count == 0
then (count, splitIdx+1, True)
else (count, splitIdx+1, False))) :: Counter -> Token -> Counter
res = foldl fun ((1, 0, False) :: Counter) t
in case res of
(_, idx, _) -> idx
-- ================ end of interesting code ===================== --
tokens :: String -> [String]
tokens [] = []
tokens s = let s' = trim s
isOp = (`elem` "+-*/()")
(head', tail') = if isOp $ head s'
then ([head s'], tail s')
else break isOp s'
in trim head':tokens tail'
-- remove leading and trailing space
trim :: String -> String
trim (' ':ta) = trim ta
trim s = if last s == ' '
then trim $ init s
else s
eval :: String -> Maybe Float
eval e = Just $ reduce $ parse e
show' :: [Token] -> String
show' = concatMap (\x -> case x of
Times -> "*"
Div -> "/"
Add -> "+"
Sub -> "-"
LPren -> "("
RPren -> ")"
Number x -> show x)
So my questions are:
- How can I write the
findMatchingRPren
function more elegantly? - How does my naming look (e.g. the use of
head'
andtail'
)? - What would be the best way to extract the shared code from
substitute
andprenGroups
? - What general improvements can be made to my algorithm?
- Just anything else you can see that could be improved...
3 Answers 3
If you aren't going to use actual compiler libraries—which would be the right move in production code, but maybe less interesting for learning—then you need an intermediate step between tokenizing and evaluation, which is to build tree structure representing your expression.
Whenever you have potentially nested structured in your data, such as parens around sub-expressions, you need to look to stacks and trees to manage them cleanly.
-
\$\begingroup\$ Yeah, this is mostly so I can try out my Haskell skills. Would you mind adding some explanation on how you would make the tree structure? \$\endgroup\$J Atkin– J Atkin2016年03月06日 14:34:19 +00:00Commented Mar 6, 2016 at 14:34
-
\$\begingroup\$ Turns out that comment boxes don't support full Markdown so this is harder than it should be. \$\endgroup\$keithb_b– keithb_b2016年03月12日 17:42:32 +00:00Commented Mar 12, 2016 at 17:42
-
\$\begingroup\$ For example, take the expression
(1+2)*(5-3)
. We can parse that out into a tree like this: the root,e
is a multiplication operation with a left and a right operand, call themer
andel
. The left operandel
is an expression with an addition operation and a left operand and a right operand, call themell
andelr
. The left operandell
is an expression with a literal value, 1. The right operandelr
is an expression with a literal value, 2. And similarly for `er. What would a Haskell value that represented such a tree of expressions look like? What shape should its type be? \$\endgroup\$keithb_b– keithb_b2016年03月12日 17:48:00 +00:00Commented Mar 12, 2016 at 17:48 -
\$\begingroup\$ Hello, I just found your answer again. After re-reading it, I finally realized what you meant. When I read this the first time, I was relatively new to coding (and I was 15). This actually makes a lot of sense and you deserve the accepted answer. \$\endgroup\$J Atkin– J Atkin2017年11月28日 03:04:44 +00:00Commented Nov 28, 2017 at 3:04
Input / Output repetition
tokenise :: String -> Token
tokenise n = case n of
"*" -> Times
"/" -> Div
"+" -> Add
"-" -> Sub
"(" -> LPren
")" -> RPren
_ -> Number $ read n
show' = concatMap (\x -> case x of
Times -> "*"
Div -> "/"
Add -> "+"
Sub -> "-"
LPren -> "("
RPren -> ")"
Number x -> show x)
tokenise
and show
contain exactly the same information, I suggest using a look-up table to avoid repetition:
tokenToSymbol = [ ("*", Times),
("/", Div),
... ]
symbolToToken = map swap tokenToSymbol
tokenise n = case lookup n tokenToSymbol of
Just x -> x
Nothing -> Number $ read n
show' = concatMap (\x -> case lookup x symbolTotoken of
Just x -> x
Nothing -> show x)
Now the mapping is defined only once, this simplifies future changes.
Using the State
monad twice and an external looping library, findMatchingRPren
can be somewhat tamed. The logic behind it is of course still the same, except that instead of calculating an Int
, we immediately split the string, which is what you were going to do anyway. This follows a principle of avoiding "Int blindness", compare to boolean blindness. (Also this aborts when we're done.)
import Control.Monad.Loops
splitAtMatchingRPren :: [Token] -> ([Token], [Token])
splitAtMatchingRPren = runState $ (`evalStateT` 1) $ whileM (gets (/=0)) $ do
tok <- lift $ state $ fromJustNote "out of tokens" . uncons
modify $ (+) case tok of
RPren -> -1
LPren -> 1
_ -> 0
return tok
Note that your solution just puts splits to the right of the whole string, without complaint.