I decided to make a solution in Haskell to a problem that I found on another post here at CodeReview (link: The Beauty and the Strings) I would like suggestions on how to improve letterFreqs. I have tried searching for mutable arrays that could help with performance. But is it worth it? The array that is modified often is only 26 elements long. Haskell is new to me, so I find syntax of libraries such as Data.Vector, Data.Array.MArray, and Data.IO.Array difficult to grasp easily.
{-# OPTIONS_GHC -F -pgmF htfpp #-}
module Main where
import Test.Framework
import Data.Char (isLower, isUpper, ord)
import Data.List (zipWith, sort)
letterFreqs :: String -> [Int]
letterFreqs str =
freqs' str (replicate 26 0)
where freqs' [] fs = fs
freqs' (ch:tail) fs
| isLower ch = let index = (ord ch - ord 'a')
(front, back) = splitAt index fs
newval = 1 + fs !! index
in freqs' tail (front ++ newval : drop 1 back)
| isUpper ch = let index = (ord ch - ord 'A')
(front, back) = splitAt index fs
newval = 1 + fs !! index
in freqs' tail (front ++ newval : drop 1 back)
| otherwise = freqs' tail fs
computeMaxBeauty :: String -> Int
computeMaxBeauty = sum . zipWith (*) [1..26] . sort . letterFreqs
test_ABbCcc = assertEqual 152 $ computeMaxBeauty "ABbCcc"
test_This_is_from_Facebook_Hacker_Cup_2013_ =
assertEqual 551 $ computeMaxBeauty "This is from Facebook Hacker Cup 2013."
test_Ignore_punctuation__please___ =
assertEqual 491 $ computeMaxBeauty "Ignore punctuation, please :)"
test_Sometimes_test_cases_are_hard_to_make_up_ =
assertEqual 729 $ computeMaxBeauty "Sometimes, test cases are hard to make up."
test_CodeReview_is_love__CodeReview_is_life_ =
assertEqual 724 $ computeMaxBeauty "CodeReview is love. CodeReview is life."
main :: IO ()
main = htfMain htf_thisModulesTests
-
1\$\begingroup\$ There's a really handy function for this. The example is even calculating frequencies. \$\endgroup\$Carl– Carl2015年02月22日 21:59:56 +00:00Commented Feb 22, 2015 at 21:59
1 Answer 1
If you're going to use a list in Haskell, the last thing you want to do is continually modify random elements within it. Luckily, there's a better way.
Start by listing out all of the different things your function is doing.
- Drop characters that are not
'a'..'z'
or'A'..'Z'
. - Regardless of case, determine a letter's position in the alphabet.
- Tally a value that represents a specific letter in the alphabet.
- Return a list of alphabet letter frequencies.
By identifying the smallest bits of functionality that still make sense to talk about, we can write or reuse functions that do only one thing, and then compose them using higher order functions into something that does everything we need in one go.
Since this problem is dependent on characters and lists, your first stop as a new Haskell user should be to the Haddock documentation for Data.Char
and Data.List
to learn about what pre-existing functions you might be able to use for your solutions.
Tackling the first bullet point, dropping some elements of a list conditionally on their values should jump out at you as a filter
operation. If we look through the documentation for Data.Char
you may realize that Char
represents all Unicode characters—not just ASCII values—but we're only interested in the ASCII letters.
Knowing that Char
is an instance of Enum
(from the Data.Char
documentation), we can use list range notation to get just the Char
values we care about in a list. I.e., alphabet = ['a'..'z']
. Instead of expanding our range to include the uppercase versions of letters, we'll next write a function to lowercase our string using toLower :: Char -> Char
from Data.Char
.
lower :: String -> String
lower = map toLower
And assuming we have a lowercase string, it's easy to filter to just the characters we're interested in.
alphas :: String -> Char
alphas = filter (`elem` alphabet)
Notice that we're building functions that can be easily composed to produce the functionality our final program requires. Going from an arbitrary string to just one containing valid letters all in lower case is easy, alphas . lower
.
Going by the original statement of functionality, we next need to determine a letter's position in the alphabet so that we can then keep a tally of letter frequencies. In writing a functional version though we can be smarter. Instead of picking a letter and then finding the correct tally to increase, we rearrange the order of operations. First we group letters and then we count the number of elements in each group to produce a tally.
Perusing the documentation for Data.List
we'll pull out sort
and group
. sort
will rearrange our list of Char
, then group
will organize them into lists by equality. The last step is to run length encode the groups to produce a tally for each letter.
tally :: (Ord a) => [a] -> [(a, Int)]
tally = map runLength . group . sort
where
runLength :: [a] -> (a, Int)
runLength [] = error "runLength: Can't run length encode an empty list"
runLength xs@(x:_) = (x, length xs)
That error case should be unnecessary, group
shouldn't return any empty sub-lists, but we include it so that if our understanding is incorrect or future changes to the function ever break things we get a better error than a pattern-match failure (and the compiler stops complaining if we compile with -Wall
).
Now one thing to note is that if a letter isn't present at all in a given string, tally
won't return a pair for it. In your original computeMaxBeauty
that's a problem since it relies on a list of length 26 being returned. Since we have alphabet
which encodes letter value by position though we can write another version that is more flexible.
beauty :: (Char, Int) -> Int
beauty (a, n) = n * (fromMaybe 0 (elemIndex a alphabet) + 1)
Here we look up the index of a Char
in alphabet
, then use a utility function from Data.Maybe
in the event that Char
wasn't present. Because lists are 0-indexed we have to add 1.
And tying all of our simple single-purpose functions together,
computeMaxBeauty :: String -> Int
computeMaxBeauty = sum . map beauty . tally . alphas . lower
My working version, should be runnable.
import Data.Char (toLower)
import Data.List (group, sort)
import Data.Maybe (fromMaybe)
alphabet :: [Char]
alphabet = ['a'..'z']
lower :: String -> String
lower = map toLower
alphas :: String -> [Char]
alphas = filter (`elem` alphabet)
tally :: [Char] -> [(Char, Int)]
tally = map runLength . group . sort
where
runLength :: [a] -> (a, Int)
runLength [] = error "runLength: Can't run length encode an empty list"
runLength xs@(x:_) = (x, length xs)
-- Matches original functionality, unnecessary with changed computeMaxBeauty as below
letterFreqs :: String -> [Int]
letterFreqs = map ((subtract 1) . snd) . tally . (alphabet ++) . alphas . lower
computeMaxBeauty :: String -> Int
computeMaxBeauty = sum . map beauty . tally . alphas . lower
where
beauty :: (Char, Int) -> Int
beauty (a, n) = n * (fromMaybe 0 (elemIndex a alphabet) + 1)
-
\$\begingroup\$ Do the composed functions run efficiently? What I want to know is do the composed functions run through the list once or more than once? \$\endgroup\$Kevin Tindall– Kevin Tindall2015年02月23日 03:28:16 +00:00Commented Feb 23, 2015 at 3:28