In order to practice some Haskell, I tried this kata, where the task is to add some numeric inputs that are delimited by commas or newlines.
I have the below solution in Haskell which passes my tests, but could anyone offer any improvements? It feels a bit messy to me so feedback on how to make it more functional (my experience is pretty much all OO until now) would be great!
Known failings:
This does not try to be smart about invalid syntax so the pattern matching is not exhaustive - would the best solution here be to use a try/either type? This seemed like it would add quite a bit of noise at this stage so for now I have left this. It also does not fail on negative numbers for the same reason.
Code:
module Calculator where
import Data.List.Split
defaultDelimiters = [",", "\n"]
readInt :: String -> Int
readInt = read
splitAtDelims :: ([String], String) -> [String]
splitAtDelims (delims, body) = foldr splitEachOnSubStr [body] delims
where
splitOnSubStr = split . dropBlanks . dropDelims . onSublist
splitEachOnSubStr = concatMap . splitOnSubStr
parseDelim :: String -> String -> (String, String)
parseDelim (']':xs) delim = (delim, xs)
parseDelim (x:xs) delim = parseDelim xs (delim++[x])
parseCustomDelimiters :: [String] -> String -> ([String], String)
parseCustomDelimiters [] (delim:'\n':body) = ([[delim]], body)
parseCustomDelimiters delims ('\n':rest) = (delims, rest)
parseCustomDelimiters delims ('[':rest) = parseCustomDelimiters newDelims remainingText
where
parsed = parseDelim rest ""
newDelims = delims ++ [fst parsed]
remainingText = snd parsed
parse :: String -> ([String], String)
parse ('/':'/':text) = parseCustomDelimiters [] text
parse body = (defaultDelimiters, body)
add :: String -> Int
add = sum . filter (< 1000) . map readInt . splitAtDelims . parse
-
3\$\begingroup\$ You seem to be ignoring the TDD aspect of the kata. \$\endgroup\$200_success– 200_success2018年08月29日 14:34:03 +00:00Commented Aug 29, 2018 at 14:34
-
\$\begingroup\$ I do have tests as well so know that it basically works (besides the above caveats), but in terms of code.. I'm wondering whether I have arrived at a reasonable solution - passing the tests is one thing, being readable and idiomatic is another.. \$\endgroup\$Mark– Mark2018年08月29日 20:19:34 +00:00Commented Aug 29, 2018 at 20:19
-
\$\begingroup\$ Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers . \$\endgroup\$Simon Forsberg– Simon Forsberg2018年09月03日 21:54:24 +00:00Commented Sep 3, 2018 at 21:54
-
\$\begingroup\$ Apologies, since you see extra detail added to SO questions all the time, I assumed that was fine so long as the original question was left in tact. Thanks for the link to clarify this. \$\endgroup\$Mark– Mark2018年09月04日 05:54:56 +00:00Commented Sep 4, 2018 at 5:54
1 Answer 1
I'd suggest to accompany the code with comments. Not just for the purpose of review, but also for your own. After a few weeks it's hard to remember all the details, and what seems to be clear now will be hard to read later. Especially in this case when all arguments are of String
or [String]
, it'd be very helpful to document them (see Function arguments).
Some more thoughts: The code isn't that easy to read. One reason for that I see is that your functions often entangle together several goals. Extracting common functionality into smaller helper functions would help.
Specific comments:
parse :: String -> ([String], String)
parse ('/':'/':text) = parseCustomDelimiters [] text
parse body = (defaultDelimiters, body)
The name is basically meaningless - what does it parse? Why do you call the
string text
in one place and body
in another?
parseDelim :: String -> String -> (String, String)
parseDelim (']':xs) delim = (delim, xs)
parseDelim (x:xs) delim = parseDelim xs (delim++[x])
This basically is "read until a specific character and return both parts". And
the second argument is redundant, it's only an implementation detail, and
obscures how the function should be used. I'd create a helper function like
readUntil :: Char -> String -> (String, String)
, and implement it using
break
.
And then just parseDelim = readUntil ']'
.
splitAtDelims :: ([String], String) -> [String]
splitAtDelims (delims, body) = foldr splitEachOnSubStr [body] delims
where
splitOnSubStr = split . dropBlanks . dropDelims . onSublist
splitEachOnSubStr = concatMap . splitOnSubStr
The type is quite uncommon. I'd rather use
splitAtDelims :: [String] -> String -> [String]
and then use uncurry splitAtDelims
when needed.
parseCustomDelimiters :: [String] -> String -> ([String], String)
parseCustomDelimiters [] (delim:'\n':body) = ([[delim]], body)
parseCustomDelimiters delims ('\n':rest) = (delims, rest)
parseCustomDelimiters delims ('[':rest) = parseCustomDelimiters newDelims remainingText
where
parsed = parseDelim rest ""
newDelims = delims ++ [fst parsed]
remainingText = snd parsed
Again, the first argument is just an implementation detail and makes the function both harder to read and harder to use. Rather you should define it for example as:
parseCustomDelimiters :: String -> ([String], String)
parseCustomDelimiters = loop []
where
loop = -- your original definition
Also you use both names body
and rest
for the same purpose, which is very
confusing. You could also use readUntil
from before to simplify this, as
apparently new-line terminates processing the input. So you could just split the
body on '\n'
and parse the delimiters from the first part.
Notice that when parsing delimiters, you match on the starting marker "//"
in
a different function than the ending marker "\n"
. It's much easier to read
when such related parts are together, in one function. So you could write
another helper function, like
readBetween :: String -> String -> (String, String)
where the first and the second argument are the starting and ending makers. Then
you could write readBetween "//" "\n" text
and parse the returned content.
I'd strongly encourage you to explore ReadP
, which is available in base, or some other parser (like Parsec AttoParsec). The code will be then much more readable!
You can also go without such a full-featured parser, and instead implement your own, tiny one. Already you can see in your types that the core structure of a parsing function (that doesn't handle errors or multiple possibilities) is
String -> (a, String)
That is, we parse some expected value of type a
and return the rest of the string. A parser is then just an encapsulation of such an idea, like newtype Parser a = Parser (String -> (a, String))
and common operations on them.
As mentioned in the comments, you didn't specify any tests, even though this was a TDD exercise. Including them would be definitely helpful. And a great way to learn about property-based testing! Testing is also extremely valuable for refactorings, to make sure you don't introduce a regression.
I hope this helps!
-
\$\begingroup\$ Thanks, this is great feedback - I will take another look and come back with some follow up questions :) \$\endgroup\$Mark– Mark2018年09月02日 08:18:19 +00:00Commented Sep 2, 2018 at 8:18
-
\$\begingroup\$ Yes, using break is much nicer, as is introducing an intermediate function for readUntil. (edit - I actually ended up inlining this as it brought the knowledge of ']' closer to the knowledge of '[') I hadn't seen uncurry - this seems very useful, thanks! Again, these tuples were something that felt wrong but I wasn't sure how to get around it. This feels a lot nicer but I wonder if I'm overusing it a bit now... :) Hiding the implementation detail of building a list of delimiters in parseCustomDelimiters is much nicer! \$\endgroup\$Mark– Mark2018年09月02日 18:35:30 +00:00Commented Sep 2, 2018 at 18:35
-
\$\begingroup\$ I had actually intended the distinction between "body" and "rest" or "text" to be a way to tell that we had the body as opposed to some text which would require further splitting to get to the body (+ delims).. but I can see this doesn't give the clarity I had hoped for Unfortunately, we can't just split the original text on '\n', since "//[\n]\n1\n2" could be valid input - so some more complex regex would be required and that sounds less transparent than code (to me..). For the same reason, readBetween is not viable. \$\endgroup\$Mark– Mark2018年09月02日 18:36:02 +00:00Commented Sep 2, 2018 at 18:36
-
\$\begingroup\$ I have updated the question with latest code and tests - I have not yet tried to use property based testing, but will give it a go when I get a chance. I will take a look at readp/parsers too, though may hold off on posting that for now. \$\endgroup\$Mark– Mark2018年09月02日 18:36:19 +00:00Commented Sep 2, 2018 at 18:36
-
\$\begingroup\$ ReadP is very cool, thanks for pointing it out! (I've also added a version with that just for completeness) \$\endgroup\$Mark– Mark2018年09月03日 20:54:44 +00:00Commented Sep 3, 2018 at 20:54