Today I started to tackle the 99 Haskell problems (http://www.haskell.org/haskellwiki/99_questions). I had to struggle with problem 9 which requires to Pack consecutive duplicates of list elements into sublists. If a list contains repeated elements they should be placed in separate sublists..
For example the input:
pack ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e']
["aaaa","b","cc","aa","d","eeee"]
I managed to create a pack
function. However, my solution feels kind of
overcomplicated:
pack :: [[Char]] -> [[Char]]
pack xs = foldl (\acc x -> if not(null acc) && x `Data.List.isPrefixOf` (last acc)
then (init acc) ++ [(last acc) ++ x]
else acc ++ [x]) [] xs
A test of my pack
function:
pack ["a", "a", "b", "b", "b", "c", "a", "a", "c"]
["aa","bbb","c","aa","c"]
- Can you give me a code review?
- How can I simplify it? What could I improve to make this function more idiomatic?
1 Answer 1
Let's improve the function step by step, focusin on readability as well as on the algorithm itself.
I believe the type of the function is not what it should be. We get a list of characters, and we should produce a list of lists:
pack1 :: [Char] -> [[Char]] pack1 xs = foldl (\acc x -> if not(null acc) && [x] `isPrefixOf` (last acc) then (init acc) ++ [(last acc) ++ [x]] else acc ++ [[x]]) [] xs
The function doesn't require that we compare
Char
s. In such a case, it's better to make its type as generic as possible. Not only it makes it usage more generic, it also helps us ensure that we restrict to the specific problem of packing, not introducing anything particular aboutChar
s. So the generalized type would be(Eq a) => [a] -> [[a]]
.The major problem is that the function has O(n^2) time complexity. This is because calling
last
on a list, or appending someting to its end using++
is O(n), and doing it n-times gets us to O(n^2). This is definitely something we'd like to avoid. The solution is easy: Instead of appending to the list, let's just prepend, and reverse it at the end. Sincereverse
is O(n), the result will be O(n) as well. Let's apply this idea to both the "big" list as well as its sublists:pack2 :: (Eq a) => [a] -> [[a]] pack2 xs = reverse . map reverse $ foldl (\acc x -> if not (null acc) && [x] `isPrefixOf` (head acc) then (x : (head acc)) : (tail acc) else [x] : acc) [] xs
Most likely we could also omit
map reverse
, as the sublists always contain equal elements. This assumes that if two elements are equal, they're indistinguishable, which is usually a reasonable assumption. But I kept it there for completeness, and it doesn't affect the asymptotic complexity.In Haskell it's customary to avoid
if/then/else
in favor of pattern matching. Not only it's often shorter, but it provides stronger guarantees and/or less required reasoning about the code. In theif
condition, the reader needs to infer that usinglast
orhead
onacc
is allowed because we checked it's non-empty. If we instead factor out the inner function and use patternmatching,pack3 :: (Eq a) => [a] -> [[a]] pack3 = reverse . map reverse . foldl f [] where f acc@(as@(a:_):ass) x | a == x = (x : as) : ass f acc x = [x] : acc
we don't need to use any potentially dangerous functions such as
head/tail
and do any checking.Here we also introduced another small improvement called η-conversion (eta). The idea is that
\x -> g x
is equivalent tog
.We'd like to avoid calling
reverse
.The problem is that we're using
foldl
to produce the final list. By its nature, if we usefoldl
for converting lists, it reverses the order of the elements (or causes O(n^2) time complexity). Although we want to consume the list from left to right, it's more appropriate to usefoldr
here. (I'd say that the only reason to usefoldl
is if the result needs the whole list - for example when summing a list.)Let's start with implementing
map
usingfoldr
. (I'll hide the example, if you want to try it yourself. If you don't, just hover the mouse over it.)map' f = foldr (\x r -> f x : r) []
The idea is to express how to process a single element, knowing how the rest of the list is processed.Let's apply the same principle to
pack
:pack4 :: (Eq a) => [a] -> [[a]] pack4 = foldr f [] where f x acc@(as@(a:_):ass) | a == x = (x : as) : ass f x acc = [x] : acc
This isn't very different from the previous example. The main difference is that we (seemingly!) consume elements from the right, so we have no problem with reversing the order.
And still ... we can ask for more. The problem is that the function isn't lazy enough (even though we're using
foldr
). We'd like for example thathead $ pack4 ('a' : 'c' : repeat 'b')
produced "a", instead of looping forever - the required information is available. The reason for this is that we inspect the accumulator (by matching it with the pattern
acc@(as@(a:_):ass)
which is not irrefutable) at each iteration, which means that in order to produce a single element we first force the processing of the rest of the list.Instead, we can create the result structure first to hold the current element without forcing the results of processing the rest of input list, so that this structure will be filled with more values only when/if it is so demanded:
pack5 :: (Eq a) => [a] -> [[a]] pack5 = foldr f [] where f x r = (x:u):w where (u,w) = case r of [] -> ([],[]) (g:gs) | head g==x -> (g ,gs) gs -> ([],gs)
See also the source code for group, which is also implemented with the lazy behavior.
-
1\$\begingroup\$ lazy code:
... where { f x r = (x:u):w where (u,w) = case r of { [] -> ([],[]) ; (g:gs) | head g==x -> (g,gs) ; gs -> ([],gs) }}
. Separation of structure creation and filling. Saw it first in some answer by Daniel Fischer, but it's very similar to what is normally done, in Prolog. \$\endgroup\$Will Ness– Will Ness2014年01月30日 16:40:48 +00:00Commented Jan 30, 2014 at 16:40 -
\$\begingroup\$ @WillNess I don't think it'd be fair to add it to my answer, as it was your idea. Better add it as your own answer. (Or feel free to edit my answer yourself.) \$\endgroup\$Petr– Petr2014年02月01日 15:05:34 +00:00Commented Feb 1, 2014 at 15:05
-
\$\begingroup\$ Done editing. :) \$\endgroup\$Will Ness– Will Ness2014年02月01日 15:35:55 +00:00Commented Feb 1, 2014 at 15:35
group
function ofData.List
. You could also inspect its source code. \$\endgroup\$