Here is one of my programs that uses memoization and an array to improve performance and memory usage. The performance seems satisfactory, but the memory usage is ridiculous and I can't figure out what's wrong:
{-# LANGUAGE BangPatterns #-}

import Data.Functor
import Data.Array (Array)
import qualified Data.Array as Arr
import Control.DeepSeq

genColtzArr n = collatzArr
  where
    collatzArr = Arr.array (1, n) $ take n $ map (\v -> (v, collatz v 0)) [1..]
    collatz 1 !acc = 1 + acc
    collatz !m !acc
        | even m    = go (m `div` 2) acc
        | otherwise = go (3 * m + 1) acc
      where
        go !l !acc
            | l <= n    = let !v = collatzArr Arr.! l in 1 + acc + v
            | otherwise = collatz l $ 1 + acc
collatz here refers to the Collatz function. genColtzArr is supposed to take a number n and return an array indexed from 1 to n, in which each cell contains the length of the Collatz chain from that index down to 1.
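For example, (genColtzArr 10) Arr.! 6 should be 9, because the chain 6, 3, 10, 5, 16, 8, 4, 2, 1 contains nine numbers, counting both endpoints.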
But the memory usage of this method is very high. Here is the profiler result (GHC options -prof -fprof-auto -rtsopts, runtime options +RTS -p, n == 500000):
total alloc = 730,636,136 bytes  (excludes profiling overheads)

COST CENTRE             MODULE  %time %alloc
genColtzArr.collatz     Main     40.4   34.7
genColtzArr.collatz.go  Main     25.5   14.4

                                                     individual      inherited
COST CENTRE                   MODULE  no.  entries   %time %alloc  %time %alloc
genColtzArr                   Main    105        1     0.0    0.0   74.7   72.1
 genColtzArr.collatzArr       Main    106        1     8.0   20.8   74.7   72.1
  genColtzArr.collatzArr.\    Main    107   500000     0.9    2.2   66.8   51.3
   genColtzArr.collatz        Main    109  1182582    40.4   34.7   65.9   49.1
    genColtzArr.collatz.go    Main    110  1182581    25.5   14.4   25.5   14.4
Please note that -O2 is not the answer I'm looking for. I want to figure out what the problem is in this program and, in general, how I should spot time and memory inefficiencies in Haskell code. Specifically, I have no idea why this code, with tail recursion and bang patterns, can consume so much memory.
The same code run with +RTS -s produces this:
   1,347,869,264 bytes allocated in the heap
     595,901,528 bytes copied during GC
     172,105,056 bytes maximum residency (7 sample(s))
         897,704 bytes maximum slop
             315 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2408 colls,     0 par    0.412s   0.427s     0.0002s    0.0075s
  Gen  1         7 colls,     0 par    0.440s   0.531s     0.0759s    0.1835s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.828s  (  0.816s elapsed)
  GC      time    0.852s  (  0.958s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.004s  (  0.017s elapsed)
  Total   time    1.684s  (  1.791s elapsed)

  %GC     time      50.6%  (53.5% elapsed)

  Alloc rate    1,627,861,429 bytes per MUT second

  Productivity  49.4% of total user, 46.4% of total elapsed
So it takes about 300 MB, which is still too large.
2 Answers
It doesn't use 300 megabytes of heap; it peaks at a little over 20 megabytes. Total allocation is not peak allocation, and GHC makes short-lived allocations cheap, so total alloc isn't always a good heuristic for GC time or steady-state heap usage. The heap profiling machinery produces data designed for tuning code rather than for analytics, and peak residency is usually more helpful than total alloc when comparing programs for overall memory usage.
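To illustrate the gap between the two figures, here is a minimal sketch (an illustrative toy, not your program): it walks a lazily generated list, so it allocates a lot in total yet retains almost nothing at any moment.

-- Every cons cell is allocated, used once, and immediately becomes garbage,
-- so (without optimization) total allocation runs to hundreds of megabytes
-- while maximum residency stays small.
main :: IO ()
main = print (length [1 .. 10000000 :: Int])

Running it with +RTS -s shows a large "bytes allocated in the heap" figure next to a small "maximum residency", which is exactly the distinction described above.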
Here's a screenshot from a memory profile:
[memory profile screenshot]
I don't think it necessarily bit you here, but in the future, add explicit type signatures to functions you are benchmarking. See here for an example of why you want to do that.
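For instance, something like this at the top of the file (a sketch; using Int for the chain lengths is my assumption, matching how the function is exercised with n == 500000):

genColtzArr :: Int -> Array Int Int

Without a signature, the element type of the array is only constrained to Num, so a use site can silently default it to Integer, which changes both the speed and the allocations you end up measuring.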
Here are the commands I wrote to profile your code:
$ rm -f collatz
$ stack ghc -- -prof -fprof-auto -rtsopts -O2 collatz.hs -o collatz
$ ./collatz +RTS -hc -p
$ hp2ps collatz.hp
$ evince collatz.ps
The last bits just convert the .hp heap-profiling data to a PostScript file, which I then open in my PDF reader.
- "So what you are suggesting is that the program is well optimized and there is no memory problem? Em... but I see different numbers in top and ps." (Jason Hu, Mar 10, 2016)
- "No, I'm saying you should understand what the data tells you before you act on the data." (bitemyapp, Mar 10, 2016)
- "Could you make the profiling command use multiple lines for easier reading?" (oherrala, Mar 11, 2016)
"Here is one of my programs that uses memoization and an array to improve performance and memory usage. The performance seems satisfactory but the memory usage is ridiculous and I can't figure out what's wrong."
I don't actually know the answer, but I'd guess that you used memoization improperly. GHC already evaluates a lambda expression at most once per top-level function.
The bang patterns there are almost certainly the reason that GHC is taking up too much memory. Haskell is a lazy language; take advantage of this. Indeed, using strictness in combination with immutability (and yes, arrays are unfortunately immutable here) is probably just not a good idea.
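As a concrete illustration of that advice (my sketch, not code from the question or the rest of this answer), here is the same memo table built without bang patterns or an accumulator, so each cell stays a thunk until it is demanded:

import Data.Array (Array)
import qualified Data.Array as Arr

-- Lazily built memo table: cell i holds the length of the Collatz chain
-- from i down to 1; a cell is only forced when it is actually read.
genCollatzArr :: Int -> Array Int Int
genCollatzArr n = collatzArr
  where
    collatzArr = Arr.listArray (1, n) (map collatz [1 .. n])
    collatz 1 = 1
    collatz m
        | even m    = step (m `div` 2)
        | otherwise = step (3 * m + 1)
      where
        step l
            | l <= n    = 1 + collatzArr Arr.! l   -- reuse a memoized cell
            | otherwise = 1 + collatz l            -- fall back to plain recursion

Whether this actually improves the numbers is something to confirm with the same +RTS -s run.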
Here is an (admittedly advanced) example using recursion-schemes:
import Data.Functor.Foldable

coalgebra :: Int -> Either [Int] (ListF Int Int)
coalgebra 1 = Left [1]
coalgebra n
    | n `mod` 2 == 0 = Right $ Cons n (div n 2)
    | otherwise      = Right $ Cons n (3 * n + 1)

collatz :: Int -> [Int]
collatz = elgot embed coalgebra
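A quick sanity check (my addition; it assumes the recursion-schemes package is installed):

-- In GHCi:
-- >>> collatz 6
-- [6,3,10,5,16,8,4,2,1]

The coalgebra emits one Cons per step, and elgot short-circuits through the Left [1] case once the chain reaches 1, so the length of that list (9) matches the value the question's array stores at index 6.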
In general, while such an aggressively functional style will not always be the fastest, it will provide the best control flow and run 1-2x slower than a Rust implementation (for small enough inputs).
Indeed, here is a more involved version that precomputes some chains at compile time with Template Haskell:
{-# LANGUAGE TemplateHaskell #-}

import Data.Functor.Foldable
import Language.Haskell.TH.Syntax
import CollatzTH

coalgebra :: Int -> Either [Int] (ListF Int Int)
coalgebra 1  = Left [1]
coalgebra 2  = Left $( lift (collatzTH 2) )
coalgebra 3  = Left $( lift (collatzTH 3) )
coalgebra 6  = Left $( lift (collatzTH 6) )
coalgebra 7  = Left $( lift (collatzTH 7) )
coalgebra 9  = Left $( lift (collatzTH 9) )
coalgebra 18 = Left $( lift (collatzTH 18) )
coalgebra n
    | n `mod` 2 == 0 = Right $ Cons n (div n 2)
    | otherwise      = Right $ Cons n (3 * n + 1)

collatz :: Int -> [Int]
collatz = elgot embed coalgebra
We have to put collatzTH in another module, because Template Haskell is a pain in the butt: a splice can't use a function defined in the same module (the stage restriction).
collatzTH :: Int -> [Int]
collatzTH 1 = [1]
collatzTH n
    | n `mod` 2 == 0 = n : collatzTH (div n 2)
    | otherwise      = n : collatzTH (3 * n + 1)
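Two assumptions worth spelling out about the sketch above: the separate file needs a header along the lines of module CollatzTH (collatzTH) where, and lift relies on the Lift instance for [Int] that template-haskell provides. After compilation the spliced equations amount to plain list literals, roughly:

-- Hand-expanded view of the splices (not extra source code to write):
coalgebra 2 = Left [2, 1]
coalgebra 3 = Left [3, 10, 5, 16, 8, 4, 2, 1]
coalgebra 6 = Left [6, 3, 10, 5, 16, 8, 4, 2, 1]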
This is not memoized, nor is it exactly a supercompiler, but it is in fact faster than the naive Rust implementation for values I tested < 2223.
- "I was doing 500000. Collatz is an explosive function; the performance at 2k is not really representative enough. Maybe you can run it on your machine and try out which one is faster? I would love to see the profiling result." (Jason Hu, Aug 23, 2017)
- "The total alloc figure is the number of bytes allocated during the lifetime of the program; most of it quickly gets GC'ed. Running with +RTS -s will tell you how much it really took, and you can evaluate the minimum memory required by using -M30M and reducing from there."
- "Using vector roughly divides the memory usage by 2, giving me a 52 MB total memory use. It can be coerced down to 34 MB with -MxM."
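For reference, a minimal sketch of the vector idea mentioned in that comment (my code; the name genCollatzVec and the use of Data.Vector.Unboxed.constructN are my choices, not the commenter's):

{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as VU

-- Unboxed memo table: slot i holds the Collatz chain length for i
-- (slot 0 is unused padding).  constructN builds the vector left to
-- right, so each new cell may read the already-filled prefix.
genCollatzVec :: Int -> VU.Vector Int
genCollatzVec n = VU.constructN (n + 1) cell
  where
    cell prefix
        | i == 0    = 0                    -- padding slot
        | i == 1    = 1                    -- chain length of 1 is 1
        | otherwise = go i 0
      where
        i = VU.length prefix
        go !m !acc
            | m < i     = acc + prefix VU.! m      -- hit an already-memoized cell
            | even m    = go (m `div` 2) (acc + 1)
            | otherwise = go (3 * m + 1) (acc + 1)

By this sketch, genCollatzVec 500000 VU.! 6 should again come out as 9; the 52 MB figure from the comment is worth re-measuring with +RTS -s rather than taken on faith.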