Beauty and the Strings in Haskell

Question 1

I decided to make a solution in Haskell to a problem that I found on another post here at CodeReview (link: The Beauty and the Strings) I would like suggestions on how to improve letterFreqs. I have tried searching for mutable arrays that could help with performance. But is it worth it? The array that is modified often is only 26 elements long. Haskell is new to me, so I find syntax of libraries such as Data.Vector, Data.Array.MArray, and Data.IO.Array difficult to grasp easily.

{-# OPTIONS_GHC -F -pgmF htfpp #-}
module Main where
import Test.Framework
import Data.Char (isLower, isUpper, ord)
import Data.List (zipWith, sort)
letterFreqs :: String -> [Int]
letterFreqs str =
 freqs' str (replicate 26 0)
 where freqs' [] fs = fs
 freqs' (ch:tail) fs
 | isLower ch = let index = (ord ch - ord 'a')
 (front, back) = splitAt index fs
 newval = 1 + fs !! index
 in freqs' tail (front ++ newval : drop 1 back)
 | isUpper ch = let index = (ord ch - ord 'A')
 (front, back) = splitAt index fs
 newval = 1 + fs !! index
 in freqs' tail (front ++ newval : drop 1 back)
 | otherwise = freqs' tail fs
computeMaxBeauty :: String -> Int
computeMaxBeauty = sum . zipWith (*) [1..26] . sort . letterFreqs
test_ABbCcc = assertEqual 152 $ computeMaxBeauty "ABbCcc"
test_This_is_from_Facebook_Hacker_Cup_2013_ =
 assertEqual 551 $ computeMaxBeauty "This is from Facebook Hacker Cup 2013."
test_Ignore_punctuation__please___ =
 assertEqual 491 $ computeMaxBeauty "Ignore punctuation, please :)"
test_Sometimes_test_cases_are_hard_to_make_up_ =
 assertEqual 729 $ computeMaxBeauty "Sometimes, test cases are hard to make up."
test_CodeReview_is_love__CodeReview_is_life_ =
 assertEqual 724 $ computeMaxBeauty "CodeReview is love. CodeReview is life."
main :: IO ()
main = htfMain htf_thisModulesTests

Question 2

There's a really handy function for this. The example is even calculating frequencies.

Question 3

If you're going to use a list in Haskell, the last thing you want to do is continually modify random elements within it. Luckily, there's a better way.

Start by listing out all of the different things your function is doing.

Drop characters that are not 'a'..'z' or 'A'..'Z'.
Regardless of case, determine a letter's position in the alphabet.
Tally a value that represents a specific letter in the alphabet.
Return a list of alphabet letter frequencies.

By identifying the smallest bits of functionality that still make sense to talk about, we can write or reuse functions that do only one thing, and then compose them using higher order functions into something that does everything we need in one go.

Since this problem is dependent on characters and lists, your first stop as a new Haskell user should be to the Haddock documentation for Data.Char and Data.List to learn about what pre-existing functions you might be able to use for your solutions.

Tackling the first bullet point, dropping some elements of a list conditionally on their values should jump out at you as a filter operation. If we look through the documentation for Data.Char you may realize that Char represents all Unicode characters—not just ASCII values—but we're only interested in the ASCII letters.

Knowing that Char is an instance of Enum (from the Data.Char documentation), we can use list range notation to get just the Char values we care about in a list. I.e., alphabet = ['a'..'z']. Instead of expanding our range to include the uppercase versions of letters, we'll next write a function to lowercase our string using toLower :: Char -> Char from Data.Char.

lower :: String -> String
lower = map toLower

And assuming we have a lowercase string, it's easy to filter to just the characters we're interested in.

alphas :: String -> Char
alphas = filter (`elem` alphabet)

Notice that we're building functions that can be easily composed to produce the functionality our final program requires. Going from an arbitrary string to just one containing valid letters all in lower case is easy, alphas . lower.

Going by the original statement of functionality, we next need to determine a letter's position in the alphabet so that we can then keep a tally of letter frequencies. In writing a functional version though we can be smarter. Instead of picking a letter and then finding the correct tally to increase, we rearrange the order of operations. First we group letters and then we count the number of elements in each group to produce a tally.

Perusing the documentation for Data.List we'll pull out sort and group. sort will rearrange our list of Char, then group will organize them into lists by equality. The last step is to run length encode the groups to produce a tally for each letter.

tally :: (Ord a) => [a] -> [(a, Int)]
tally = map runLength . group . sort
 where
 runLength :: [a] -> (a, Int)
 runLength [] = error "runLength: Can't run length encode an empty list"
 runLength xs@(x:_) = (x, length xs)

That error case should be unnecessary, group shouldn't return any empty sub-lists, but we include it so that if our understanding is incorrect or future changes to the function ever break things we get a better error than a pattern-match failure (and the compiler stops complaining if we compile with -Wall).

Now one thing to note is that if a letter isn't present at all in a given string, tally won't return a pair for it. In your original computeMaxBeauty that's a problem since it relies on a list of length 26 being returned. Since we have alphabet which encodes letter value by position though we can write another version that is more flexible.

beauty :: (Char, Int) -> Int
beauty (a, n) = n * (fromMaybe 0 (elemIndex a alphabet) + 1)

Here we look up the index of a Char in alphabet, then use a utility function from Data.Maybe in the event that Char wasn't present. Because lists are 0-indexed we have to add 1.

And tying all of our simple single-purpose functions together,

computeMaxBeauty :: String -> Int
computeMaxBeauty = sum . map beauty . tally . alphas . lower

My working version, should be runnable.

import Data.Char (toLower)
import Data.List (group, sort)
import Data.Maybe (fromMaybe)
alphabet :: [Char]
alphabet = ['a'..'z']
lower :: String -> String
lower = map toLower
alphas :: String -> [Char]
alphas = filter (`elem` alphabet)
tally :: [Char] -> [(Char, Int)]
tally = map runLength . group . sort
 where
 runLength :: [a] -> (a, Int)
 runLength [] = error "runLength: Can't run length encode an empty list"
 runLength xs@(x:_) = (x, length xs)
-- Matches original functionality, unnecessary with changed computeMaxBeauty as below
letterFreqs :: String -> [Int]
letterFreqs = map ((subtract 1) . snd) . tally . (alphabet ++) . alphas . lower
computeMaxBeauty :: String -> Int
computeMaxBeauty = sum . map beauty . tally . alphas . lower
 where
 beauty :: (Char, Int) -> Int
 beauty (a, n) = n * (fromMaybe 0 (elemIndex a alphabet) + 1)

Question 4

Do the composed functions run efficiently? What I want to know is do the composed functions run through the list once or more than once?

bisserlis bisserlis 3,3311 gold badge13 silver badges17 bronze badges · Accepted Answer · 2015-02-22 22:46:59Z

If you're going to use a list in Haskell, the last thing you want to do is continually modify random elements within it. Luckily, there's a better way.

Start by listing out all of the different things your function is doing.

Drop characters that are not 'a'..'z' or 'A'..'Z'.
Regardless of case, determine a letter's position in the alphabet.
Tally a value that represents a specific letter in the alphabet.
Return a list of alphabet letter frequencies.

By identifying the smallest bits of functionality that still make sense to talk about, we can write or reuse functions that do only one thing, and then compose them using higher order functions into something that does everything we need in one go.

Since this problem is dependent on characters and lists, your first stop as a new Haskell user should be to the Haddock documentation for Data.Char and Data.List to learn about what pre-existing functions you might be able to use for your solutions.

Tackling the first bullet point, dropping some elements of a list conditionally on their values should jump out at you as a filter operation. If we look through the documentation for Data.Char you may realize that Char represents all Unicode characters—not just ASCII values—but we're only interested in the ASCII letters.

Knowing that Char is an instance of Enum (from the Data.Char documentation), we can use list range notation to get just the Char values we care about in a list. I.e., alphabet = ['a'..'z']. Instead of expanding our range to include the uppercase versions of letters, we'll next write a function to lowercase our string using toLower :: Char -> Char from Data.Char.

lower :: String -> String
lower = map toLower

And assuming we have a lowercase string, it's easy to filter to just the characters we're interested in.

alphas :: String -> Char
alphas = filter (`elem` alphabet)

Notice that we're building functions that can be easily composed to produce the functionality our final program requires. Going from an arbitrary string to just one containing valid letters all in lower case is easy, alphas . lower.

Going by the original statement of functionality, we next need to determine a letter's position in the alphabet so that we can then keep a tally of letter frequencies. In writing a functional version though we can be smarter. Instead of picking a letter and then finding the correct tally to increase, we rearrange the order of operations. First we group letters and then we count the number of elements in each group to produce a tally.

Perusing the documentation for Data.List we'll pull out sort and group. sort will rearrange our list of Char, then group will organize them into lists by equality. The last step is to run length encode the groups to produce a tally for each letter.

tally :: (Ord a) => [a] -> [(a, Int)]
tally = map runLength . group . sort
 where
 runLength :: [a] -> (a, Int)
 runLength [] = error "runLength: Can't run length encode an empty list"
 runLength xs@(x:_) = (x, length xs)

That error case should be unnecessary, group shouldn't return any empty sub-lists, but we include it so that if our understanding is incorrect or future changes to the function ever break things we get a better error than a pattern-match failure (and the compiler stops complaining if we compile with -Wall).

Now one thing to note is that if a letter isn't present at all in a given string, tally won't return a pair for it. In your original computeMaxBeauty that's a problem since it relies on a list of length 26 being returned. Since we have alphabet which encodes letter value by position though we can write another version that is more flexible.

beauty :: (Char, Int) -> Int
beauty (a, n) = n * (fromMaybe 0 (elemIndex a alphabet) + 1)

Here we look up the index of a Char in alphabet, then use a utility function from Data.Maybe in the event that Char wasn't present. Because lists are 0-indexed we have to add 1.

And tying all of our simple single-purpose functions together,

computeMaxBeauty :: String -> Int
computeMaxBeauty = sum . map beauty . tally . alphas . lower

My working version, should be runnable.

import Data.Char (toLower)
import Data.List (group, sort)
import Data.Maybe (fromMaybe)
alphabet :: [Char]
alphabet = ['a'..'z']
lower :: String -> String
lower = map toLower
alphas :: String -> [Char]
alphas = filter (`elem` alphabet)
tally :: [Char] -> [(Char, Int)]
tally = map runLength . group . sort
 where
 runLength :: [a] -> (a, Int)
 runLength [] = error "runLength: Can't run length encode an empty list"
 runLength xs@(x:_) = (x, length xs)
-- Matches original functionality, unnecessary with changed computeMaxBeauty as below
letterFreqs :: String -> [Int]
letterFreqs = map ((subtract 1) . snd) . tally . (alphabet ++) . alphas . lower
computeMaxBeauty :: String -> Int
computeMaxBeauty = sum . map beauty . tally . alphas . lower
 where
 beauty :: (Char, Int) -> Int
 beauty (a, n) = n * (fromMaybe 0 (elemIndex a alphabet) + 1)

Do the composed functions run efficiently? What I want to know is do the composed functions run through the list once or more than once?

Stack Exchange Network

Beauty and the Strings in Haskell

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Beauty and the Strings in Haskell

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions