I am still pretty new to Haskell and am working on a graph generator for undirected unlabeled graphs containing loops and I have a bottleneck in the following functions. So far, I have not given any thought to performance and just looked for correctness.
I know this kind of question is not really popular, but I'd be interested in general guidelines for improving performance given some naive but correct implementation like the following.
For instance:
- Where to enforce strictness?
- Should I use vector/array instead of lists? When, where and why?
- Should I improve single functions by replacing recursion using folds/maps/etc. or similar?
Additional info:
- I'm using -O2 with
ghc
- profiling shows (when normalizing runtime of
connections
to 100%)arqSeq
40% (with almost all time spend inboundsequences
)connectionCombinations
60% with a quarter of the time spend inoccurences
I don't think you need to understand all the details. I'm looking more for micro-improvements (the following determines arcs from a given node to equivalence classes of vertices grouped by their degree).
import Data.List
import Control.Monad
type Arcs = Int
type Count = Int
type Vertex = Int
type MSequence = [Int]
data EquivClass = EC { order :: Int, verts :: [Vertex] } deriving (Eq, Show)
type ECPartition = [EquivClass]
type NodeSelection = [(Arcs,Count)]
type ECNodeSelection = (EquivClass, NodeSelection)
-- number of occurences of unique elements in a list
occurences :: (Ord a) => [a] -> [(a, Int)]
occurences = map (\xs@(x:_) -> (x, length xs)) . group . reverse . sort
-- number of vertices in an equivalence class
ecLength :: EquivClass -> Int
ecLength = length . verts
-- for a given y = (y_1,...,y_n) and a bound m, find all vectors
-- x = (x_1,...,x_n) such that |x| = m and x_i <= y_i
boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]]
boundSequences m x | m <= sum x = (fByM . sequence . ranges) x
| otherwise = [[]]
where fByM = filter (\x -> sum x == m)
ranges = map (\x -> [0..x])
-- return m-sequences (combinations of number of arcs) for the given
-- order to an ECPartition
arcSeq :: Int -> ECPartition -> [MSequence]
arcSeq m x = boundSequences m (map ecProd x)
where ecProd e = ecLength e * order e
-- return all the possible arc combinations to an equivalence
-- class for a given number of arcs
connectionCombinations :: Int -> EquivClass -> [NodeSelection]
connectionCombinations arcs = map groupOcc . prune arcs . sequence . orderRep
where orderRep (EC o v) = replicate (length v) [0..o]
prune a = nub . map (reverse . sort) . filter ((== a) . sum)
groupOcc = filter ((/= 0) . fst) . occurences
-- return all the possible lists of equivalence class node selections
-- from a Partition and a givne number of arcs
connections :: ECPartition -> Int -> [[ECNodeSelection]]
connections p order = concatMap (con p) $ arcSeq order p
where con p arcs = map (zip p) $ zipWithM connectionCombinations arcs p
main = do
-- small example. In the complete app this is called 10,000's of
-- times in case of high degrees or vertex counts
let cons = connections [EC 4 [1], EC 1 [2,3,4]] 5
mapM_ (print) cons
-
2\$\begingroup\$ Possibly the most important thing is to figure out how to effectively measure performance, and determine based on that data where the bottlenecks are. Haskell's performance can be very hard to accurately predict without quantitative measurements. \$\endgroup\$Matt Fenwick– Matt Fenwick2011年12月01日 18:03:30 +00:00Commented Dec 1, 2011 at 18:03
-
3\$\begingroup\$ Performance can be very hard to predict accurately without measurements in any language. Laziness just makes it harder to analyze those results (also in any language, but pervasive in Haskell). \$\endgroup\$C. A. McCann– C. A. McCann2011年12月01日 18:09:56 +00:00Commented Dec 1, 2011 at 18:09
-
\$\begingroup\$ @MattFenwick Please see my update, I have chosen this bunch of functions after profiling. Sorry I forgot to mention this. \$\endgroup\$bbtrb– bbtrb2011年12月01日 18:14:55 +00:00Commented Dec 1, 2011 at 18:14
-
\$\begingroup\$ Could you put up an example that compiles? \$\endgroup\$Nathan Howell– Nathan Howell2011年12月01日 18:20:39 +00:00Commented Dec 1, 2011 at 18:20
-
\$\begingroup\$ @NathanHowell: Code updated, should compile and run \$\endgroup\$bbtrb– bbtrb2011年12月01日 18:26:16 +00:00Commented Dec 1, 2011 at 18:26
1 Answer 1
-- for a given y = (y_1,...,y_n) and a bound m, find all vectors
-- x = (x_1,...,x_n) such that |x| = m and x_i <= y_i
boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]]
boundSequences m x | m <= sum x = (fByM . sequence . ranges) x
| otherwise = [[]]
where fByM = filter (\x -> sum x == m)
ranges = map (\x -> [0..x])
Excepting some fortunate cases, you are wasting a lot of work here. sequence . ranges
produces (y_1+1) * ... * (y_n+1)
lists you have to check. If you write a function to produce only the successful lists, you can gain a lot of performance for larger input (it won't do too much for short lists with small elements, but it will help for those too).
Shouldn't the case sum x < m
return []
rather than [[]]
?
-- return all the possible arc combinations to an equivalence
-- class for a given number of arcs
connectionCombinations :: Int -> EquivClass -> [NodeSelection]
connectionCombinations arcs = map groupOcc . prune arcs . sequence . orderRep
where orderRep (EC o v) = replicate (length v) [0..o]
prune a = nub . map (reverse . sort) . filter ((== a) . sum)
groupOcc = filter ((/= 0) . fst) . occurences
Instead of reverse . sort
, use sortBy (flip compare)
. That won't make much difference for short lists, but it's cleaner, IMO. prune arcs . sequence
contains another occurrence of the boundSequences
problem, creating a lot of lists and filtering out most of them immediately. nub
is not good, it is quadratic (worst case, O(total*distinct) in general). The things you're nubbing have an Ord
instance, if the order doesn't matter, map head . group . sort
is a much faster nub than nub
(still better Data.Set.toList . Data.Set.fromList
), if order matters, keep a Set
of seen elements and for each new, output that and add it to the set of seen.
I've not yet profiled, but it may be that the profile is misleading because it attributes costs to the functions where results are forced rather than where they are calculated, that needs some investigation, check back later.
Okay, the given data doesn't run long enough to produce a meaningful profile, so I changed it to
let cons = connections [EC 4 [1 .. 6], EC 6 [2 .. 9]] 5
That runs long enough to collect a handful of samples. Unfortunately, the profile wasn't very informative:
total time = 7.90 secs (395 ticks @ 20 ms)
total alloc = 8,098,386,128 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
connectionCombinations Main 100.0 100.0
So I inserted a couple of {-# SCC #-}
pragmas. The biggo in that programme is the sequence . orderRep
, unsurprisingly. A rewrite along the lines mentioned above,
boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]]
boundSequences m xs
| sm < m = []
| sm == m = [xs]
| otherwise = go sm m xs
where
sm = sum xs
go _ r []
| r == 0 = [[]]
| otherwise = []
go _ r [y]
| y < r = []
| otherwise = [[r]]
go s r (y:ys) = do
let mny | s < r+y = r+y-s
| otherwise = 0
mxy = min y r
c <- [mny .. mxy]
map (c:) (go (s-y) (r-c) ys)
and using that in connectionCombinations
instead of prune arcs . sequence
(requires a small change in orderRep
),
connectionCombinations :: Int -> EquivClass -> [NodeSelection]
connectionCombinations arcs = map groupOcc . nub . map (sortBy (flip compare)) .
boundSequences arcs . orderRep
where orderRep (EC o v) = replicate (length v) o
groupOcc = filter ((/= 0) . fst) . occurences
brought the running time (ghc -O2, no profiling) down significantly:
dafis@schwartz:~/Cairo> time ./orArcs > /dev/null
real 0m5.836s
user 0m5.593s
sys 0m0.231s
dafis@schwartz:~/Cairo> time ./arcs > /dev/null
real 0m0.008s
user 0m0.005s
sys 0m0.003s
Whether any further optimisations (e.g. the abovementioned nub
) are necessary, and where most of the time in the actual programme is spent, so that one could identify the most important points, I cannot tell without more realistic data.
-
\$\begingroup\$ Wow, thank you for the elaborate answer. It wasn't obvious to me that the algorithm (using sequence) is so flawed, I've tried the most obvious approach first and hoped for haskell to figure the rest out for me. \$\endgroup\$bbtrb– bbtrb2011年12月01日 22:35:07 +00:00Commented Dec 1, 2011 at 22:35