Permutations of a list in Haskell

Question 1

I have this piece of code which given a list returns another list which combines every element with every other element (without repetitions) using a given function as a combiner. I came up with three different version (2 of them listed below, fastest to slowest), and now I'm asking for help to optimize it/them even further. Currently, my perms3 does not utilize multithreading at all, and since I'm quite new to Haskell I can't seem to figure out why, but I'm thinking that in its current state my algorithm isn't parallelizable, even though it can be.

I've also concluded that the length of the output list will always be \$ \binom{x}{2} + x = x(x-1)/2 + x\$ where \$x\$ is the length of the input list, but I dont know how to use that in my algorithm either to minimize the allocations.

The reason the why I'm asking is because I want to know more about the inner workings of Haskell, how lazy-evaluation affects performance and what is considered good/bad Haskell code.

My two permutation functions:

perms3 c l = perms3' c l l
perms3' _ [] _ = []
perms3' c (a:b) [x] = c a x:perms3' c b b -- the outer loop
perms3' c l@(a:_) (h:t) = c a h:perms3' c l t -- the inner loop
perms2 _ [] = [] --this can probably be 2 folds, the inner fold might be parallelizable?
perms2 c l@(h:t)= foldr (\x acc -> c h x:acc) [] l ++ perms2 c t

Input: (,) ['a'..'c']
Output: [('a','a'),('a','b'),('a','c'),('b','b'),('b','c'),('c','c')]
 
Input (+) [1..5]
Output: [2,3,4,5,6,4,5,6,7,6,7,8,8,9,10]

Unsurprisingly, perms3 is a fair bit faster than perms2.

Testing perms3:

import Control.Monad
import System.Environment
main :: IO ()
main = liftM (length . perms3 (,) . enumFromTo 1 . read . head) getArgs >>= print
perms3 :: (a -> a -> t) -> [a] -> [t] 
perms3 c l = perms3' c l l
perms3' _ [] _ = []
perms3' c (a:b) [x] = c a x:perms3' c b b
perms3' c l@(a:_) (h:t) = c a h:perms3' c l t

Running it:

$ ghc -O3 main && time ./main 10000
50005000
real 0m0.615s
$ ghc -O3 main && time ./main 30000
450015000
real 0m5.347s

Question(s):

Is this a good way to write Haskell code?
Can I squeeze any more performance out of my function?
Should I prefer the imperative do-notation instead of arrows in my main function?

Question 2

3. Monadic notation

Let's start with your last question, as it has a clichéd answer: Do whatever you prefer. I often find the bind operator >>= nicer to read, but switch to do-notation as soon as the code gets complicated. Another rule of thumb is to use do-notation if the code is easier to express imperatively. Both notations are equivalent, there is no performance impact.

1. Idiomatic Haskell

Haskell has a few conventions about variable naming. Lists are commonly referred to as xs or ys (think plural of x), while function parameters are named f or g. Type parameters are almost always taken from the first letters of the alphabet.

perms3 :: (a -> a -> b) -> [a] -> [b] 
perms3 f xs = perms3' f xs xs
perms3' _ [] _ = []
perms3' c (x:xs) [y] = f y z : perms3' f xs xs
perms3' c xs'@(x:_) (y:ys) = f x y : perms3' f xs' ys

Since perms3' is only used within perms3, it would make sense to define it as a local function. This has the additional benefit that we can drop f from its parameter list, since it's already in scope and won't change.

perms3 :: (a -> a -> b) -> [a] -> [b] 
perms3 f xs = perms3' xs xs
 where
 perms3' [] _ = []
 perms3' (x:xs) [y] = f y z:perms3' f xs xs
 perms3' xs'@(x:_) (y:ys) = f x y:perms3' f xs' ys

2. Performance improvement

Pattern matching on lists usually takes two cases: empty and non-empty lists. This can reduce code (DRY etc) and actually improve performance, as tests for empty lists are faster than tests for singleton lists.

perms4 :: (a -> a -> b) -> [a] -> [b] 
perms4 f xs = perms4' xs xs
 where
 perms4' [] _ = []
 perms4' (_:ys) [] = perms4' ys ys
 perms4' ys'@(y:_) (z:zs) = f y z : perms4' ys' zs

A quick test gave me a 20% improvement compared to perms3.

Notes on `perms2`

The fold in perms2 is just a map. Using partial application, it could be written as

perms5 _ [] = []
perms5 f xs'@(x:xs) = map (f x) xs' ++ perms5 f xs
-- faster:
-- perms5 f xs'@(x:xs) = foldr ((:) . f x) [] xs' ++ perms5 f xs

However, this is (somewhat surprisingly) slower than the fold. AFAICT, this is due to the fact that ghc won't inline map, but will do so with foldr.

Finally, here is how I (naïvely) would have written the code. Looks nice, performs terribly.

import Data.List (concatMap, tails)
perms6 f = concatMap pairs . tails
 where
 pairs [] = []
 pairs (ys'@(y:_)) = map (f y) ys'

or

perms7 f xs = concat $ zipWith (map . f) xs (tails xs)

Question 3

Thank you for your thorough answer, I especially like that you included another ways of writing it. Your version(s) clearly outdo mine in readability. I've yet to benchmark and compare them, but it is more about a good style of code than microoptimizations.

tarleb tarlebtarleb 3083 silver badges9 bronze badges · Accepted Answer · 2015-07-24 22:16:26Z

3. Monadic notation

Let's start with your last question, as it has a clichéd answer: Do whatever you prefer. I often find the bind operator >>= nicer to read, but switch to do-notation as soon as the code gets complicated. Another rule of thumb is to use do-notation if the code is easier to express imperatively. Both notations are equivalent, there is no performance impact.

1. Idiomatic Haskell

Haskell has a few conventions about variable naming. Lists are commonly referred to as xs or ys (think plural of x), while function parameters are named f or g. Type parameters are almost always taken from the first letters of the alphabet.

perms3 :: (a -> a -> b) -> [a] -> [b] 
perms3 f xs = perms3' f xs xs
perms3' _ [] _ = []
perms3' c (x:xs) [y] = f y z : perms3' f xs xs
perms3' c xs'@(x:_) (y:ys) = f x y : perms3' f xs' ys

Since perms3' is only used within perms3, it would make sense to define it as a local function. This has the additional benefit that we can drop f from its parameter list, since it's already in scope and won't change.

perms3 :: (a -> a -> b) -> [a] -> [b] 
perms3 f xs = perms3' xs xs
 where
 perms3' [] _ = []
 perms3' (x:xs) [y] = f y z:perms3' f xs xs
 perms3' xs'@(x:_) (y:ys) = f x y:perms3' f xs' ys

2. Performance improvement

Pattern matching on lists usually takes two cases: empty and non-empty lists. This can reduce code (DRY etc) and actually improve performance, as tests for empty lists are faster than tests for singleton lists.

perms4 :: (a -> a -> b) -> [a] -> [b] 
perms4 f xs = perms4' xs xs
 where
 perms4' [] _ = []
 perms4' (_:ys) [] = perms4' ys ys
 perms4' ys'@(y:_) (z:zs) = f y z : perms4' ys' zs

A quick test gave me a 20% improvement compared to perms3.

Notes on `perms2`

The fold in perms2 is just a map. Using partial application, it could be written as

perms5 _ [] = []
perms5 f xs'@(x:xs) = map (f x) xs' ++ perms5 f xs
-- faster:
-- perms5 f xs'@(x:xs) = foldr ((:) . f x) [] xs' ++ perms5 f xs

However, this is (somewhat surprisingly) slower than the fold. AFAICT, this is due to the fact that ghc won't inline map, but will do so with foldr.

Finally, here is how I (naïvely) would have written the code. Looks nice, performs terribly.

import Data.List (concatMap, tails)
perms6 f = concatMap pairs . tails
 where
 pairs [] = []
 pairs (ys'@(y:_)) = map (f y) ys'

or

perms7 f xs = concat $ zipWith (map . f) xs (tails xs)

Thank you for your thorough answer, I especially like that you included another ways of writing it. Your version(s) clearly outdo mine in readability. I've yet to benchmark and compare them, but it is more about a good style of code than microoptimizations.

Stack Exchange Network

Permutations of a list in Haskell

1 Answer 1

3. Monadic notation

1. Idiomatic Haskell

2. Performance improvement

Notes on `perms2`

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Permutations of a list in Haskell

1 Answer 1

3. Monadic notation

1. Idiomatic Haskell

2. Performance improvement

Notes on perms2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Notes on `perms2`