Trying to solve this combinatorics problem, I wrote some functions to count the number of unordered distinct factorizations of an integer n
into k
distinct parts.
The following function builds up a list of unordered distinct factorizations of n
with largest part at most m
.
duf' :: Integer -> Integer -> [[Integer]]
duf' m 1 = [[]]
duf' 1 n = []
duf' 0 n = []
duf' m n | isPrime n = if m < n then [] else [[n]]
| otherwise = concatMap (\x -> map (\y -> x : y) $
duf' (x-1) (n `div` x) ) $ tail $
takeWhile (<= m) $ toList $ divisors n
It is then used below to generate the list of all unordered distinct factorizations of n
.
duf :: Integer -> [[Integer]]
duf n = duf' n n
To count the number of factorizations of k
parts only I use a simple (and inefficient ?) filter.
pd n k = length . filter (\x -> k == length x) $ duf n
As a beginner, I have the feeling this is not really Haskell-ish. Any help to improve the syntax is welcome !
My concern is that I would like to avoid filtering and (if possible) to count without generating an enormous list. Is there a way to "merge" pd
and duf'
? Or at least optimizing the thing ?
I use this two external functions : Math.NumberTheory.Primes.Factorisation.divisors
and Data.Numbers.Primes.isPrime
. Are there better choices ?
1 Answer 1
I took a couple passes (gist) at this and managed to improve the style a bit, but I couldn't move the needle on performance at all.
Style
My thoughts and justifications on important changes are written as in-line comments below.
module Main where
import qualified Math.NumberTheory.Primes.Factorisation as Factorisation
import Data.Numbers.Primes (isPrime)
import Data.Set (toAscList)
duf' :: Integer -> Integer -> [[Integer]]
duf' _ 1 = [[]] -- Underscores in place of unused variable names
duf' 1 _ = [ ]
duf' 0 _ = [ ]
duf' m n | isPrime n = if m < n then [] else [[n]]
| otherwise = concatMap (\x -> map (x:) $ duf' (x-1) (n `div` x)) -- Sectioning cons operator
(tail . takeWhile (<= m) $ divisors n) -- Reflowed for legibility
where
divisors = toAscList . Factorisation.divisors -- Locally bound to decrease datatype mismatch line noise
duf :: Integer -> [[Integer]]
duf n = duf' n n
pd :: Integer -> Int -> Int
pd n k = length . filter ((== k) . length) $ duf n -- Function composition for ordering
Some reorganizing of layout in duf'
made it a lot more comprehensible.
Performance
After rewriting duf'
, I noticed that the inner lists it returns are only ever really used for their length, so I came up with this version that tosses out the actual elements in favor of keeping a tally whenever a new element would be added instead.
duf' :: Integer -> Integer -> [Int]
duf' _ 1 = [0]
duf' 1 _ = [ ]
duf' 0 _ = [ ]
duf' m n | isPrime n = if m < n then [] else [1]
| otherwise = concatMap (\x -> map (+1) $ duf' (x-1) (n `div` x))
(tail . takeWhile (<= m) $ divisors n)
where
divisors = toAscList . Factorisation.divisors
duf :: Integer -> [Int]
duf n = duf' n n
pd :: Integer -> Int -> Int
pd n k = length . filter (== k) $ duf n
Unfortunately, this doesn't net us any wins in run time as I only really smeared out the inner call to length
from filter
in pd
across duf'
(this is less idiomatic, in my opinion). One potential positive is that this version may be more space efficient due to not keeping lists of elements around when all we really care about is the list spine anyway. I wasn't testing for that though, so I don't have any cold hard facts at hand and I suspect you'd have to use an argument strict version of (+1)
to prevent building up chains of thunks before realizing any real decrease in memory.
-
\$\begingroup\$ Could you elaborate a bit about using "an argument strict version of
(+1)
" ? \$\endgroup\$lbeziaud– lbeziaud2015年07月12日 14:45:07 +00:00Commented Jul 12, 2015 at 14:45 -
1\$\begingroup\$ Sure, you'd turn on
{-# LANGUAGE BangPatterns #-}
(put this compiler directive at the top of your file) then write a functionplus !m !n = m + n
. The exclamation points are strictness annotations enabled by theBangPatterns
extension, when Haskell evaluatesplus
it will evaluate both arguments to WHNF (Weak Head Normal Form), which forInt
s means they will be fully evaluated. If you have(1 + 2) + 3
GHC will normally defer evaluating(1 + 2)
until you need the value of the whole expression, causing a space leak. \$\endgroup\$R B– R B2015年07月12日 18:32:54 +00:00Commented Jul 12, 2015 at 18:32 -
\$\begingroup\$
plus (plus 1 2) 3
will be more eager, the outerplus
will evaluate its arguments and so the only thunk hanging around will be forplus 3 3
. \$\endgroup\$R B– R B2015年07月12日 18:33:37 +00:00Commented Jul 12, 2015 at 18:33