Counting items in categories

Question 1

I'm an experienced programmer that also has a little experience with functional programming, although mostly theoretical (as in hobby-level reading and very minor projects).

I recently decided to work through the new book "Beginning Haskell" to get up to speed with Haskell and try to crank out something worthy of a github repo. Early in the book you are asked to implement a Client ADT including some supportive data types for Person and Gender. I did it like this and I'm not liking it at all:

data Gender = Male | Female | Unknown | None
 deriving (Show, Eq)
data Person = Person { fName :: String, lName :: String, gender :: Gender }
 deriving (Show, Eq) 
data Client = GovOrg { code :: String, name :: String }
 | Individual { code :: String, person :: Person }
 | Company { code :: String, name :: String, contact :: Person }
 deriving (Show, Eq)

After this you are asked to write a function that takes [Client] and returns the count for the different genders, e.g. "out of the 20 clients listed, we have 5 male, 5 female, 10 unknown". Let's call it countGenders. Below is the code I've ended up with, and I feel that it's a complete mess and lacks any feeling of clarity or elegance. I think my OOP-brain is forcing me into bad designs. I'll walk you through the lines below. Once again I am not at all happy with this.

First we have a result data type. Could be just an (Int, Int, Int), but I feel like the location in the tuple isnt a natural fit with what the data should represent (as opposed to Point Int Int):

data GenderCount = GenderCount { male :: Int, female :: Int, unknown :: Int }

Then we have the top level function that is supposed to be invoked. Maps genderCount across a [Client], which results in [(Int, Int, Int)], then unzips those and sums each list.

countGenders :: [Client] -> GenderCount
countGenders cs = GenderCount (sum m) (sum f) (sum u)
 where (m, f, u) = unzip3 $ map genderCount cs

Then we have genderCount, which maps a Client to a result tuple (Male, Female, Unknown):

genderCount :: Client -> (Int, Int, Int)
genderCount c = case cGender c of
 Male -> (1, 0, 0)
 Female -> (0, 1, 0)
 Unknown -> (0, 0, 1)
 None -> (0, 0, 0)

Last we have cGender, which gets the gender from a client (using RecordWildCards extension):

cGender :: Client -> Gender
cGender GovOrg {} = None 
cGender Individual { .. } = gender person
cGender Company { .. } = gender contact

As you can see this is all a mess. But I'm having trouble thinking of how to refine it. I feel like my data model is bloated, which bleeds over into too many functions to do something that should be really simple. I would appreciate any feedback on the data model, the functions or anything else.

Here is the entire code if you want to look it over in full context: http://pastebin.com/Vymcd6Bj

Question 2

This isn't that much of a mess at all. Let's clean it up, then get a little fancy.

Right off the bat it seems a little funky to have a None Gender, as it appears you're not actually trying to account for agender individuals but instead providing for the failure to produce a Gender for a particular Client. Operations that might fail in Haskell usually signal so by returning a Maybe value, so let's drop None from the definition and see what needs changing.

data Gender = Male | Female | Unknown
 deriving (Show, Eq)

genderCount and cGender depend on None, let's start with the latter. c doesn't mean much as a prefix to me, so I'm going to call this function clientGender, but that's a stylistic preference. Changing this function to use Maybe is straightforward.

clientGender :: Client -> Maybe Gender
clientGender GovOrg {} = Nothing
clientGender Individual {..} = Just (gender person)
clientGender Company {..} = Just (gender contact)

Now let's turn to genderCount. The first thing to notice is that the function is a tiny wrapped up case statement. This is usually a code smell to me that hints at a missed opportunity to break functions into smaller, more independent pieces. In this case genderCount has nothing to do with Clients, so let's rip that part out!

genderCount :: Gender -> (Int, Int, Int) -- Counting Genders, not Clients!
genderCount Male = (1, 0, 0)
genderCount Female = (0, 1, 0)
genderCount Unknown = (0, 0, 1)

This is still unsatisfactory, right? There's that convention-typed tuple, and we've got a perfectly good GenderCount datatype lying around, so let's use it.

genderCount :: Gender -> GenderCount
genderCount Male = GenderCount 1 0 0
genderCount Female = GenderCount 0 1 0
genderCount Unknown = GenderCount 0 0 1

It's a small change, but users of GenderCount can rely on the field names rather than a tuple ordering.

And now countGenders, the piece that glues it all together. Our type is still correct, which is awesome! No changes there. The implementation though is doing a few different things we'll need to adjust. In an informational sense it's determining the genders of all of the clients, then accumulating a count of each Gender. What it looks like though is some weird tuple math to produce a GenderCount value from nowhere! We can rewrite it given our new implementations to be a little prettier, but first we're going to need a way to add two GenderCounts together.

addGenderCounts :: GenderCount -> GenderCount -> GenderCount
addGenderCounts (GenderCount m f u) (GenderCount n g v) = GenderCount (m + n) (f + g) (u + v)

A lot of repeated instances of GenderCount in there, but that's not so bad given we can use this as a combinator. Now we can put our countGenders function together.

countGenders :: [Client] -> GenderCount
countGenders = foldr (addGenderCounts . genderCount) (GenderCount 0 0 0) . mapMaybe clientGender

This works! I've imported mapMaybe from Data.Maybe here to account for our clientGender function sometimes returning Nothing (mapMaybe drops all the Nothings and returns a list of the Just values). We use a right fold to accumulate our GenderCount values, and a starting value of GenderCount 0 0 0 for our accumulator.

There are a few ways to go from here to clean things up further. You could get rid of the GenderCount 0 0 0 value by using foldr1, at the cost of adding another composition with map into the mix. If you have sharp eyes and a working knowledge of the Typeclassopedia you'll note a striking similarity between the way we use GenderCount with a right fold, and a Monoid. If you don't have a working knowledge of the Typeclassopedia, our motivation is that Monoids allow us to specify an identity element and an associative reduction operation, revealing some higher level abstractions (and functions) we can use to wire our code together. Let's make a Monoid.

instance Monoid GenderCount where
 mempty = GenderCount 0 0 0
 mappend = addGenderCounts
-- Laws:
-- mempty <> x = x
-- x <> mempty = x
-- x <> (y <> z) = (x <> y) <> z

I won't prove the Monoid laws, but you should be able to see that they are trivial given the properties of addition and our definition of GenderCount.

Let's try one more pass at countGenders now.

countGenders :: [Client] -> GenderCount
countGenders = foldMap genderCount . mapMaybe clientGender

Nice!

Question 3

Thanks a lot! You really got what I felt was at the heart of the issue for me - a misused data model. I had an inkling that Maybe would be better than a None gender, but it just bloated my code more. The mapMaybe function helps a bit with that and makes it a lot cleaner though. I'm aware of the Monoid type/laws and could see its applicability to the problem but wasn't sure how to proceed.

Question 4

Glad to help! I find the best way to learn about nifty little functions like mapMaybe is to read through the docs for base. Anywhere in the Data or Control namespaces is probably a good place to start.

bisserlis bisserlis 3,3311 gold badge13 silver badges17 bronze badges · Accepted Answer · 2014-05-05 06:27:17Z

This isn't that much of a mess at all. Let's clean it up, then get a little fancy.

Right off the bat it seems a little funky to have a None Gender, as it appears you're not actually trying to account for agender individuals but instead providing for the failure to produce a Gender for a particular Client. Operations that might fail in Haskell usually signal so by returning a Maybe value, so let's drop None from the definition and see what needs changing.

data Gender = Male | Female | Unknown
 deriving (Show, Eq)

genderCount and cGender depend on None, let's start with the latter. c doesn't mean much as a prefix to me, so I'm going to call this function clientGender, but that's a stylistic preference. Changing this function to use Maybe is straightforward.

clientGender :: Client -> Maybe Gender
clientGender GovOrg {} = Nothing
clientGender Individual {..} = Just (gender person)
clientGender Company {..} = Just (gender contact)

Now let's turn to genderCount. The first thing to notice is that the function is a tiny wrapped up case statement. This is usually a code smell to me that hints at a missed opportunity to break functions into smaller, more independent pieces. In this case genderCount has nothing to do with Clients, so let's rip that part out!

genderCount :: Gender -> (Int, Int, Int) -- Counting Genders, not Clients!
genderCount Male = (1, 0, 0)
genderCount Female = (0, 1, 0)
genderCount Unknown = (0, 0, 1)

This is still unsatisfactory, right? There's that convention-typed tuple, and we've got a perfectly good GenderCount datatype lying around, so let's use it.

genderCount :: Gender -> GenderCount
genderCount Male = GenderCount 1 0 0
genderCount Female = GenderCount 0 1 0
genderCount Unknown = GenderCount 0 0 1

It's a small change, but users of GenderCount can rely on the field names rather than a tuple ordering.

And now countGenders, the piece that glues it all together. Our type is still correct, which is awesome! No changes there. The implementation though is doing a few different things we'll need to adjust. In an informational sense it's determining the genders of all of the clients, then accumulating a count of each Gender. What it looks like though is some weird tuple math to produce a GenderCount value from nowhere! We can rewrite it given our new implementations to be a little prettier, but first we're going to need a way to add two GenderCounts together.

addGenderCounts :: GenderCount -> GenderCount -> GenderCount
addGenderCounts (GenderCount m f u) (GenderCount n g v) = GenderCount (m + n) (f + g) (u + v)

A lot of repeated instances of GenderCount in there, but that's not so bad given we can use this as a combinator. Now we can put our countGenders function together.

countGenders :: [Client] -> GenderCount
countGenders = foldr (addGenderCounts . genderCount) (GenderCount 0 0 0) . mapMaybe clientGender

This works! I've imported mapMaybe from Data.Maybe here to account for our clientGender function sometimes returning Nothing (mapMaybe drops all the Nothings and returns a list of the Just values). We use a right fold to accumulate our GenderCount values, and a starting value of GenderCount 0 0 0 for our accumulator.

There are a few ways to go from here to clean things up further. You could get rid of the GenderCount 0 0 0 value by using foldr1, at the cost of adding another composition with map into the mix. If you have sharp eyes and a working knowledge of the Typeclassopedia you'll note a striking similarity between the way we use GenderCount with a right fold, and a Monoid. If you don't have a working knowledge of the Typeclassopedia, our motivation is that Monoids allow us to specify an identity element and an associative reduction operation, revealing some higher level abstractions (and functions) we can use to wire our code together. Let's make a Monoid.

instance Monoid GenderCount where
 mempty = GenderCount 0 0 0
 mappend = addGenderCounts
-- Laws:
-- mempty <> x = x
-- x <> mempty = x
-- x <> (y <> z) = (x <> y) <> z

I won't prove the Monoid laws, but you should be able to see that they are trivial given the properties of addition and our definition of GenderCount.

Let's try one more pass at countGenders now.

countGenders :: [Client] -> GenderCount
countGenders = foldMap genderCount . mapMaybe clientGender

Nice!

Thanks a lot! You really got what I felt was at the heart of the issue for me - a misused data model. I had an inkling that Maybe would be better than a None gender, but it just bloated my code more. The mapMaybe function helps a bit with that and makes it a lot cleaner though. I'm aware of the Monoid type/laws and could see its applicability to the problem but wasn't sure how to proceed.
Glad to help! I find the best way to learn about nifty little functions like mapMaybe is to read through the docs for base. Anywhere in the Data or Control namespaces is probably a good place to start.

Stack Exchange Network

Counting items in categories

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Counting items in categories

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions