Here is another reference solution, that I discovered after writing my own:
https://insight.io/github.com/beloglazov/haskell-course/blob/HEAD/hw2.hs
I am following this lecture course as a complete beginner to Haskell. As there is no one to check my solutions to the homework assignments, I am posting here.
Until now, the lecture notes include: boolean logic, functions, tuples, lists, algebraic types, pattern matching, Prelude, case expressions, recursive data types.
The task is to parse this (and similar) error log files:
I 6 Completed armadillo processing
I 1 Nothing to report
I 4 Everything normal
I 11 Initiating self-destruct sequence
E 70 3 Way too many pickles
E 65 8 Bad pickle-flange interaction detected
W 5 Flange is due for a check-up
I 7 Out for lunch, back in two time steps
E 20 2 Too many pickles
I 9 Back from lunch
E 99 10 Flange failed!
The meaning of those lines can be glimpsed in the provided together with the homework file Log.hs:
module Log where
import Control.Applicative
data MessageType = Info
| Warning
| Error Int
deriving (Show, Eq)
type TimeStamp = Int
data LogMessage = LogMessage MessageType TimeStamp String
| Unknown String
deriving (Show, Eq)
data MessageTree = Leaf
| Node MessageTree LogMessage MessageTree
deriving (Show, Eq)
-- | @testParse p n f@ tests the log file parser @p@ by running it
-- on the first @n@ lines of file @f@.
testParse :: (String -> [LogMessage])
-> Int
-> FilePath
-> IO [LogMessage]
testParse parse n file = take n . parse <$> readFile file
-- | @testWhatWentWrong p w f@ tests the log file parser @p@ and
-- warning message extractor @w@ by running them on the log file
-- @f@.
testWhatWentWrong :: (String -> [LogMessage])
-> ([LogMessage] -> [String])
-> FilePath
-> IO [String]
testWhatWentWrong parse whatWentWrong file
= whatWentWrong . parse <$> readFile file
My solution:
{-# OPTIONS_GHC -Wall #-}
module Main where
import Log as L
-- Skip 'n' words and interpret the next one as a number.
wordsToNum :: String -> Int -> Int
wordsToNum s n = read ((words s) !! n)
-- Return the tail of a string.
dropWords :: String -> Int -> String
dropWords s n = unwords ( drop n (words s))
-- Convert a line from the sample log into an object of type L.LogMessage.
parseMessage :: String -> L.LogMessage
parseMessage [] = L.Unknown ""
parseMessage p@(x:_) = case x of
'E' -> L.LogMessage (L.Error (wordsToNum p 1)) (wordsToNum p 2) (dropWords p 3)
'W' -> L.LogMessage L.Warning (wordsToNum p 1) (dropWords p 2)
'I' -> L.LogMessage L.Info (wordsToNum p 1) (dropWords p 2)
_ -> L.Unknown p
-- Parse a whole logfile.
parse :: String -> [LogMessage]
parse s = map parseMessage (lines s)
-- Insert a L.LogMessage into a sorted L.MessageTree.
insert :: L.LogMessage -> L.MessageTree -> L.MessageTree
insert (L.Unknown _) t = t
insert m (L.Leaf) = L.Node Leaf m Leaf
insert m@(L.LogMessage _ ts1 _)
(L.Node lhs (L.LogMessage _ ts2 _) rhs)
= case ts1 > ts2 of
True -> insert m rhs
_ -> insert m lhs
insert _ _ = L.Leaf -- error, what do we do now?
-- Construct tree sorted by timestamp.
build :: [L.LogMessage] -> L.MessageTree
build [] = L.Leaf
build (m:l) = insert m (build l)
-- Flatten the tree into a sorted list.
inOrder :: L.MessageTree -> [L.LogMessage]
inOrder (L.Leaf) = []
inOrder (L.Node lhs m rhs) = inOrder lhs ++ [m] ++ inOrder rhs
-- Sort by increasing timestamp.
sort :: [L.LogMessage] -> [L.LogMessage]
sort l = inOrder (build l)
-- Remove massages, that are not error with severity >= 50.
filterList :: [L.LogMessage] -> [L.LogMessage]
filterList [] = []
filterList (m@(L.LogMessage (Error e) _ _):l) = case e >= 50 of
True -> [m] ++ filterList l
_ -> filterList l
filterList (_:l) = filterList l
toString :: [L.LogMessage] -> [String]
toString [] = []
toString (m:l) = [show m] ++ toString l
-- Extracts Error messages with severity >= 50 from an _unsorted_ list.
whatWentWrong :: [L.LogMessage] -> [String]
whatWentWrong [] = []
whatWentWrong l = toString (filterList (sort l))
main :: IO()
main = do
print (parseMessage "E 2 562 help help")
print (parseMessage "I 29 la la la")
print (parseMessage "This is not in the right format")
_ <- L.testParse parse 10 "./sample.log"
a <- L.testWhatWentWrong parse whatWentWrong "./sample.log"
print (unlines a)
-
\$\begingroup\$ related question (codereview.stackexchange.com/q/117351/37660) \$\endgroup\$Vogel612– Vogel6122017年04月18日 08:51:43 +00:00Commented Apr 18, 2017 at 8:51
-
\$\begingroup\$ It's been more than a year. Do you want to review your own code? \$\endgroup\$Zeta– Zeta2018年10月27日 08:00:19 +00:00Commented Oct 27, 2018 at 8:00
1 Answer 1
Simplifications for the pattern usage you currently have:
The pattern in parseMessage
can be extended so you don't need to explicitly use dropWords
. That only works if you change the signature to something based on words instead of the whole line, though.
That is the first thing you may want to adjust. As it now stands you're doing a lot of work with "splitting and reassembling" strings.
In the spirit of avoiding unnecessary work, you should only do that once.
Now assuming your parseMessage
were to work off a "splitted" string, you could instead have:
parseMessage :: [String] -> L.LogMessage
parseMessage "E":e:t:m = L.LogMessage (L.Error (read e)) (read t) (unwords m)
parseMessage "W":t:m = L.LogMessage L.Warning (read t) (unwords m)
parseMessage "I":t:m = L.LogMessage L.Info (read t) (unwords m)
parseMessage p = L.Unknown (unwords p)
Note that unwords
already correctly handles the case of p = []
This allows you to completely drop the helpers dropWords
and wordsToNum
.
Additional minor simplifications, spoilers hide the full solution ;)
parse
could be written as a function composition to avoid specifying the parameters
.
parse = (map parseMessage) . lines
build
is commonly known in functional programming as afold
. It collapses a list into a single element.
In this case, the fold is "right-associative", so the whole thing can be reformulated withfoldr
:
build = foldr insert Leaf
Here's a bit of an explanation why this works:
Let's write this a bit more ... verbosely:
build ms = foldr insert Leaf ms
. What basically happens here is that you give a neutral or starter element (Leaf
) andinsert
is repeatedly called in the following way:
(m1 `insert` (m2 `insert` (... (mn `insert` Leaf))))
Note that there is a huge bug in how insert
works. As it stands, insert
will only ever return a tree with a single Node
in it. To understand why you need to reevaluate what insert
does and how mutability works in haskell (hint: it doesn't).
Sidenote here: you can simplify insert
a bit with an if .. then .. else ..
. You should call insert a few times and see what happens with the LogMessage that has the timestamp t2
.
If you weren't probably forced to use a tree for sorting sort
would be much easier to write in terms of Data.List.sortBy
.
Similarly, filterList
also already exists in the form of filter
:
filterList = filter (\(L.LogMessage (Error e) _ _) -> e >= 50)
Note that this particular piece is untested
An additional simplification is noticing that toString
is just map show