Here's my first significant Haskell program - it presents a count of each word seen on stdin. Any comments are welcome.
import Data.Map
emptyMap = empty :: Map String Integer
countWord m s = alter (\x -> case x of
Nothing -> Just 1
Just n -> Just (n+1)) s m
mapPairs m = concat [ k ++ ": " ++ (show v) ++ "\n" | (k,v) <- toList m ]
mapTotal = sum . elems
mapOutput m = (mapPairs m) ++ "Total words: " ++ (show (mapTotal m)) ++ "\n"
wc = interact $ mapOutput . (foldl countWord emptyMap) . words
Some of the questions I have are:
Am I using Map efficiently here? (I realize there are other solutions which do not use a Map - the main point of this exercise was to learn how to use Data.Map.)
How can I make sure that emptyMap, countWord and mapTotal all use the same integral type (
Int
vs.Integer
)? For instance, the type of mapTotal isMap a Integer -> Integer
, and so I'll get a type error if I change the definition of emptyMap toempty :: Map String Int
. Ideally I'd like to make the choice betweenInt
andInteger
in one place for all three definitions.
2 Answers 2
Your usage of Data.Map
is sound. I see nothing that could be improved as long as you stay with Data.Map
.
If you give all of the functions explicit type declarations, so that the Monomorphism Restriction doesn't apply, you can use generic integral types:
mapTotal :: Integral i => Map a i -> i
Some quick style-related things you could change:
First:
(\x -> case x of
Nothing -> Just 1
Just n -> Just (n+1))
... is the same as
Just . maybe 1 (+1)
Second:
You don't need parens around many of your function calls; if you have something like a +.- (foo bar baz) ¤:* b
, it means the same thing as a +.- foo bar baz ¤:* b
because every operator has lower precedence than function application.
Third:
You use very long indentations. A tip is to insert newlines after =
and before control structures like case
to make the code more compact. I find this style to be quite good.
-
\$\begingroup\$ I should probably have waited until this question was migrated to CodeReview; thought it already had been... \$\endgroup\$dflemstr– dflemstr2012年02月12日 04:05:34 +00:00Commented Feb 12, 2012 at 4:05
-
\$\begingroup\$ The first snippet is actually
maybe (Just 1) (Just . (+1))
, notmaybe 1 (+1)
. \$\endgroup\$ehird– ehird2012年02月12日 04:41:13 +00:00Commented Feb 12, 2012 at 4:41 -
1\$\begingroup\$
Data.Map.Strict
may be more efficient since he is only counting. \$\endgroup\$recursion.ninja– recursion.ninja2014年11月20日 03:11:53 +00:00Commented Nov 20, 2014 at 3:11
It's good code, except for the strictness problem : you're accumulating big thunks of ((1+1)+1)+1.. in your Map. If you're handling long texts with repeated words, you could even end up having a stack overflow when you're finally computing them (in your "mapPair").
In your solution, you'll have to do :
Just n -> Just $! (n+1)
so that the result is evaluated before Just is applied.
Other solutions :
countWord w m = alter ((Just $!) . maybe 1 (+1)) w m
or
fromListWith' :: (v->v->v) -> [(k,v)] -> Map k v
fromListWith' f = foldl' ins empty
where ins m (k,v) = insertWith' f k v m
countWords :: String -> Map String Integer
countWords = fromListWith' (+) . flip zip (repeat 1) . words
Any of these solutions will work like your original one but should be a bit faster and less memory hungry, though it is possible that GHC was already optimising that (if you were using -O or -O2).
fromListWith
; you can replacefoldl countWord emptyMap
with something likefromListWith (+) . flip zip (repeat 1)
and skip definingcountWord
at all. \$\endgroup\$