I'm an experienced programmer that also has a little experience with functional programming, although mostly theoretical (as in hobby-level reading and very minor projects).
I recently decided to work through the new book "Beginning Haskell" to get up to speed with Haskell and try to crank out something worthy of a github repo. Early in the book you are asked to implement a Client
ADT including some supportive data types for Person and Gender. I did it like this and I'm not liking it at all:
data Gender = Male | Female | Unknown | None
deriving (Show, Eq)
data Person = Person { fName :: String, lName :: String, gender :: Gender }
deriving (Show, Eq)
data Client = GovOrg { code :: String, name :: String }
| Individual { code :: String, person :: Person }
| Company { code :: String, name :: String, contact :: Person }
deriving (Show, Eq)
After this you are asked to write a function that takes [Client]
and returns the count for the different genders, e.g. "out of the 20 clients listed, we have 5 male, 5 female, 10 unknown". Let's call it countGenders
. Below is the code I've ended up with, and I feel that it's a complete mess and lacks any feeling of clarity or elegance. I think my OOP-brain is forcing me into bad designs. I'll walk you through the lines below. Once again I am not at all happy with this.
First we have a result data type. Could be just an (Int, Int, Int)
, but I feel like the location in the tuple isnt a natural fit with what the data should represent (as opposed to Point Int Int
):
data GenderCount = GenderCount { male :: Int, female :: Int, unknown :: Int }
Then we have the top level function that is supposed to be invoked. Maps genderCount
across a [Client]
, which results in [(Int, Int, Int)]
, then unzips those and sums each list.
countGenders :: [Client] -> GenderCount
countGenders cs = GenderCount (sum m) (sum f) (sum u)
where (m, f, u) = unzip3 $ map genderCount cs
Then we have genderCount, which maps a Client to a result tuple (Male, Female, Unknown):
genderCount :: Client -> (Int, Int, Int)
genderCount c = case cGender c of
Male -> (1, 0, 0)
Female -> (0, 1, 0)
Unknown -> (0, 0, 1)
None -> (0, 0, 0)
Last we have cGender, which gets the gender from a client (using RecordWildCards
extension):
cGender :: Client -> Gender
cGender GovOrg {} = None
cGender Individual { .. } = gender person
cGender Company { .. } = gender contact
As you can see this is all a mess. But I'm having trouble thinking of how to refine it. I feel like my data model is bloated, which bleeds over into too many functions to do something that should be really simple. I would appreciate any feedback on the data model, the functions or anything else.
Here is the entire code if you want to look it over in full context: http://pastebin.com/Vymcd6Bj
1 Answer 1
This isn't that much of a mess at all. Let's clean it up, then get a little fancy.
Right off the bat it seems a little funky to have a None
Gender
, as it appears you're not actually trying to account for agender individuals but instead providing for the failure to produce a Gender
for a particular Client
. Operations that might fail in Haskell usually signal so by returning a Maybe
value, so let's drop None
from the definition and see what needs changing.
data Gender = Male | Female | Unknown
deriving (Show, Eq)
genderCount
and cGender
depend on None
, let's start with the latter. c
doesn't mean much as a prefix to me, so I'm going to call this function clientGender
, but that's a stylistic preference. Changing this function to use Maybe
is straightforward.
clientGender :: Client -> Maybe Gender
clientGender GovOrg {} = Nothing
clientGender Individual {..} = Just (gender person)
clientGender Company {..} = Just (gender contact)
Now let's turn to genderCount
. The first thing to notice is that the function is a tiny wrapped up case
statement. This is usually a code smell to me that hints at a missed opportunity to break functions into smaller, more independent pieces. In this case genderCount
has nothing to do with Client
s, so let's rip that part out!
genderCount :: Gender -> (Int, Int, Int) -- Counting Genders, not Clients!
genderCount Male = (1, 0, 0)
genderCount Female = (0, 1, 0)
genderCount Unknown = (0, 0, 1)
This is still unsatisfactory, right? There's that convention-typed tuple, and we've got a perfectly good GenderCount
datatype lying around, so let's use it.
genderCount :: Gender -> GenderCount
genderCount Male = GenderCount 1 0 0
genderCount Female = GenderCount 0 1 0
genderCount Unknown = GenderCount 0 0 1
It's a small change, but users of GenderCount
can rely on the field names rather than a tuple ordering.
And now countGenders
, the piece that glues it all together. Our type is still correct, which is awesome! No changes there. The implementation though is doing a few different things we'll need to adjust. In an informational sense it's determining the genders of all of the clients, then accumulating a count of each Gender
. What it looks like though is some weird tuple math to produce a GenderCount
value from nowhere! We can rewrite it given our new implementations to be a little prettier, but first we're going to need a way to add two GenderCount
s together.
addGenderCounts :: GenderCount -> GenderCount -> GenderCount
addGenderCounts (GenderCount m f u) (GenderCount n g v) = GenderCount (m + n) (f + g) (u + v)
A lot of repeated instances of GenderCount
in there, but that's not so bad given we can use this as a combinator. Now we can put our countGenders
function together.
countGenders :: [Client] -> GenderCount
countGenders = foldr (addGenderCounts . genderCount) (GenderCount 0 0 0) . mapMaybe clientGender
This works! I've imported mapMaybe
from Data.Maybe
here to account for our clientGender
function sometimes returning Nothing
(mapMaybe
drops all the Nothing
s and returns a list of the Just
values). We use a right fold to accumulate our GenderCount
values, and a starting value of GenderCount 0 0 0
for our accumulator.
There are a few ways to go from here to clean things up further. You could get rid of the GenderCount 0 0 0
value by using foldr1
, at the cost of adding another composition with map
into the mix. If you have sharp eyes and a working knowledge of the Typeclassopedia you'll note a striking similarity between the way we use GenderCount
with a right fold, and a Monoid
. If you don't have a working knowledge of the Typeclassopedia, our motivation is that Monoid
s allow us to specify an identity element and an associative reduction operation, revealing some higher level abstractions (and functions) we can use to wire our code together. Let's make a Monoid
.
instance Monoid GenderCount where
mempty = GenderCount 0 0 0
mappend = addGenderCounts
-- Laws:
-- mempty <> x = x
-- x <> mempty = x
-- x <> (y <> z) = (x <> y) <> z
I won't prove the Monoid
laws, but you should be able to see that they are trivial given the properties of addition and our definition of GenderCount
.
Let's try one more pass at countGenders
now.
countGenders :: [Client] -> GenderCount
countGenders = foldMap genderCount . mapMaybe clientGender
Nice!
-
\$\begingroup\$ Thanks a lot! You really got what I felt was at the heart of the issue for me - a misused data model. I had an inkling that
Maybe
would be better than a None gender, but it just bloated my code more. ThemapMaybe
function helps a bit with that and makes it a lot cleaner though. I'm aware of the Monoid type/laws and could see its applicability to the problem but wasn't sure how to proceed. \$\endgroup\$Anders Arpi– Anders Arpi2014年05月05日 14:52:13 +00:00Commented May 5, 2014 at 14:52 -
\$\begingroup\$ Glad to help! I find the best way to learn about nifty little functions like
mapMaybe
is to read through the docs forbase
. Anywhere in theData
orControl
namespaces is probably a good place to start. \$\endgroup\$bisserlis– bisserlis2014年05月06日 00:43:37 +00:00Commented May 6, 2014 at 0:43
Explore related questions
See similar questions with these tags.