99 Haskell problems, problem 9: Pack function

Question 1

Today I started to tackle the 99 Haskell problems (http://www.haskell.org/haskellwiki/99_questions). I had to struggle with problem 9 which requires to Pack consecutive duplicates of list elements into sublists. If a list contains repeated elements they should be placed in separate sublists..

For example the input:

pack ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e']
["aaaa","b","cc","aa","d","eeee"]

I managed to create a pack function. However, my solution feels kind of overcomplicated:

pack :: [[Char]] -> [[Char]]
pack xs = foldl (\acc x -> if not(null acc) && x `Data.List.isPrefixOf` (last acc) 
 then (init acc) ++ [(last acc) ++ x] 
 else acc ++ [x]) [] xs

A test of my pack function:

pack ["a", "a", "b", "b", "b", "c", "a", "a", "c"]
["aa","bbb","c","aa","c"]

Can you give me a code review?
How can I simplify it? What could I improve to make this function more idiomatic?

Question 2

For the record, that is the same as the group function of Data.List. You could also inspect its source code.

Question 3

Let's improve the function step by step, focusin on readability as well as on the algorithm itself.

I believe the type of the function is not what it should be. We get a list of characters, and we should produce a list of lists:

pack1 :: [Char] -> [[Char]]
pack1 xs = foldl (\acc x -> if not(null acc) && [x] `isPrefixOf` (last acc) 
 then (init acc) ++ [(last acc) ++ [x]] 
 else acc ++ [[x]]) [] xs

The function doesn't require that we compare Chars. In such a case, it's better to make its type as generic as possible. Not only it makes it usage more generic, it also helps us ensure that we restrict to the specific problem of packing, not introducing anything particular about Chars. So the generalized type would be (Eq a) => [a] -> [[a]].
The major problem is that the function has O(n^2) time complexity. This is because calling last on a list, or appending someting to its end using ++ is O(n), and doing it n-times gets us to O(n^2). This is definitely something we'd like to avoid. The solution is easy: Instead of appending to the list, let's just prepend, and reverse it at the end. Since reverse is O(n), the result will be O(n) as well. Let's apply this idea to both the "big" list as well as its sublists:
```
pack2 :: (Eq a) => [a] -> [[a]]
pack2 xs = reverse . map reverse $
 foldl (\acc x -> if not (null acc) && [x] `isPrefixOf` (head acc)
 then (x : (head acc)) : (tail acc)
 else [x] : acc) [] xs
```
Most likely we could also omit map reverse, as the sublists always contain equal elements. This assumes that if two elements are equal, they're indistinguishable, which is usually a reasonable assumption. But I kept it there for completeness, and it doesn't affect the asymptotic complexity.
In Haskell it's customary to avoid if/then/else in favor of pattern matching. Not only it's often shorter, but it provides stronger guarantees and/or less required reasoning about the code. In the if condition, the reader needs to infer that using last or head on acc is allowed because we checked it's non-empty. If we instead factor out the inner function and use patternmatching,
```
pack3 :: (Eq a) => [a] -> [[a]]
pack3 = reverse . map reverse . foldl f []
 where f acc@(as@(a:_):ass) x | a == x = (x : as) : ass
 f acc x = [x] : acc
```
we don't need to use any potentially dangerous functions such as head/tail and do any checking.

Here we also introduced another small improvement called η-conversion (eta). The idea is that \x -> g x is equivalent to g.
We'd like to avoid calling reverse.

The problem is that we're using foldl to produce the final list. By its nature, if we use foldl for converting lists, it reverses the order of the elements (or causes O(n^2) time complexity). Although we want to consume the list from left to right, it's more appropriate to use foldr here. (I'd say that the only reason to use foldl is if the result needs the whole list - for example when summing a list.)

Let's start with implementing map using foldr. (I'll hide the example, if you want to try it yourself. If you don't, just hover the mouse over it.)

map' f = foldr (\x r -> f x : r) [] The idea is to express how to process a single element, knowing how the rest of the list is processed.

Let's apply the same principle to pack:
```
pack4 :: (Eq a) => [a] -> [[a]]
pack4 = foldr f []
 where f x acc@(as@(a:_):ass) | a == x = (x : as) : ass
 f x acc = [x] : acc
```
This isn't very different from the previous example. The main difference is that we (seemingly!) consume elements from the right, so we have no problem with reversing the order.
And still ... we can ask for more. The problem is that the function isn't lazy enough (even though we're using foldr). We'd like for example that
```
head $ pack4 ('a' : 'c' : repeat 'b')
```
produced "a", instead of looping forever - the required information is available. The reason for this is that we inspect the accumulator (by matching it with the pattern acc@(as@(a:_):ass) which is not irrefutable) at each iteration, which means that in order to produce a single element we first force the processing of the rest of the list.

Instead, we can create the result structure first to hold the current element without forcing the results of processing the rest of input list, so that this structure will be filled with more values only when/if it is so demanded:
```
pack5 :: (Eq a) => [a] -> [[a]]
pack5 = foldr f []
 where 
 f x r = (x:u):w 
 where 
 (u,w) = case r of 
 [] -> ([],[]) 
 (g:gs) | head g==x -> (g ,gs) 
 gs -> ([],gs)
```

See also the source code for group, which is also implemented with the lazy behavior.

Question 4

lazy code: ... where { f x r = (x:u):w where (u,w) = case r of { [] -> ([],[]) ; (g:gs) | head g==x -> (g,gs) ; gs -> ([],gs) }}. Separation of structure creation and filling. Saw it first in some answer by Daniel Fischer, but it's very similar to what is normally done, in Prolog.

Question 5

@WillNess I don't think it'd be fair to add it to my answer, as it was your idea. Better add it as your own answer. (Or feel free to edit my answer yourself.)

Question 6

Done editing. :)

Petr Petr 3,06018 silver badges33 bronze badges · Accepted Answer · 2014-01-27 22:19:01Z

Let's improve the function step by step, focusin on readability as well as on the algorithm itself.

I believe the type of the function is not what it should be. We get a list of characters, and we should produce a list of lists:

pack1 :: [Char] -> [[Char]]
pack1 xs = foldl (\acc x -> if not(null acc) && [x] `isPrefixOf` (last acc) 
 then (init acc) ++ [(last acc) ++ [x]] 
 else acc ++ [[x]]) [] xs

The function doesn't require that we compare Chars. In such a case, it's better to make its type as generic as possible. Not only it makes it usage more generic, it also helps us ensure that we restrict to the specific problem of packing, not introducing anything particular about Chars. So the generalized type would be (Eq a) => [a] -> [[a]].
The major problem is that the function has O(n^2) time complexity. This is because calling last on a list, or appending someting to its end using ++ is O(n), and doing it n-times gets us to O(n^2). This is definitely something we'd like to avoid. The solution is easy: Instead of appending to the list, let's just prepend, and reverse it at the end. Since reverse is O(n), the result will be O(n) as well. Let's apply this idea to both the "big" list as well as its sublists:
```
pack2 :: (Eq a) => [a] -> [[a]]
pack2 xs = reverse . map reverse $
 foldl (\acc x -> if not (null acc) && [x] `isPrefixOf` (head acc)
 then (x : (head acc)) : (tail acc)
 else [x] : acc) [] xs
```
Most likely we could also omit map reverse, as the sublists always contain equal elements. This assumes that if two elements are equal, they're indistinguishable, which is usually a reasonable assumption. But I kept it there for completeness, and it doesn't affect the asymptotic complexity.
In Haskell it's customary to avoid if/then/else in favor of pattern matching. Not only it's often shorter, but it provides stronger guarantees and/or less required reasoning about the code. In the if condition, the reader needs to infer that using last or head on acc is allowed because we checked it's non-empty. If we instead factor out the inner function and use patternmatching,
```
pack3 :: (Eq a) => [a] -> [[a]]
pack3 = reverse . map reverse . foldl f []
 where f acc@(as@(a:_):ass) x | a == x = (x : as) : ass
 f acc x = [x] : acc
```
we don't need to use any potentially dangerous functions such as head/tail and do any checking.

Here we also introduced another small improvement called η-conversion (eta). The idea is that \x -> g x is equivalent to g.
We'd like to avoid calling reverse.

The problem is that we're using foldl to produce the final list. By its nature, if we use foldl for converting lists, it reverses the order of the elements (or causes O(n^2) time complexity). Although we want to consume the list from left to right, it's more appropriate to use foldr here. (I'd say that the only reason to use foldl is if the result needs the whole list - for example when summing a list.)

Let's start with implementing map using foldr. (I'll hide the example, if you want to try it yourself. If you don't, just hover the mouse over it.)

map' f = foldr (\x r -> f x : r) [] The idea is to express how to process a single element, knowing how the rest of the list is processed.

Let's apply the same principle to pack:
```
pack4 :: (Eq a) => [a] -> [[a]]
pack4 = foldr f []
 where f x acc@(as@(a:_):ass) | a == x = (x : as) : ass
 f x acc = [x] : acc
```
This isn't very different from the previous example. The main difference is that we (seemingly!) consume elements from the right, so we have no problem with reversing the order.
And still ... we can ask for more. The problem is that the function isn't lazy enough (even though we're using foldr). We'd like for example that
```
head $ pack4 ('a' : 'c' : repeat 'b')
```
produced "a", instead of looping forever - the required information is available. The reason for this is that we inspect the accumulator (by matching it with the pattern acc@(as@(a:_):ass) which is not irrefutable) at each iteration, which means that in order to produce a single element we first force the processing of the rest of the list.

Instead, we can create the result structure first to hold the current element without forcing the results of processing the rest of input list, so that this structure will be filled with more values only when/if it is so demanded:
```
pack5 :: (Eq a) => [a] -> [[a]]
pack5 = foldr f []
 where 
 f x r = (x:u):w 
 where 
 (u,w) = case r of 
 [] -> ([],[]) 
 (g:gs) | head g==x -> (g ,gs) 
 gs -> ([],gs)
```

See also the source code for group, which is also implemented with the lazy behavior.

lazy code: ... where { f x r = (x:u):w where (u,w) = case r of { [] -> ([],[]) ; (g:gs) | head g==x -> (g,gs) ; gs -> ([],gs) }}. Separation of structure creation and filling. Saw it first in some answer by Daniel Fischer, but it's very similar to what is normally done, in Prolog.
@WillNess I don't think it'd be fair to add it to my answer, as it was your idea. Better add it as your own answer. (Or feel free to edit my answer yourself.)

Stack Exchange Network

99 Haskell problems, problem 9: Pack function

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

99 Haskell problems, problem 9: Pack function

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions