4
\$\begingroup\$

Here is one of my programs that utilized memoization and array to improve performance and memory usage. The performance seems satisfactory but the memory usage is ridiculous and I can't figure out what's wrong:

{-# LANGUAGE BangPatterns #-}
import Data.Functor
import Data.Array (Array)
import qualified Data.Array as Arr
import Control.DeepSeq
genColtzArr n = collatzArr
 where collatzArr = Arr.array (1, n) $ take n $ map (\v -> (v, collatz v 0)) [1..] 
 collatz 1 !acc = 1 + acc
 collatz !m !acc
 | even m = go (m `div` 2) acc
 | otherwise = go (3 * m + 1) acc
 where go !l !acc
 | l <= n = let !v = collatzArr Arr.! l in 1 + acc + v
 | otherwise = collatz l $ 1 + acc

collatz here means this guy. This function is supposed to receive a number n, and then return an array indexing from 1 to n, and in which each cell contains the length of the link from the index to 1 by applying Collatz formula.

But the memory usage of this method is so high. Here is the profiler result (ghc option -prof -fprof-auto -rtsopts, run time option +RTS -p, n == 500000):

total alloc = 730,636,136 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
genColtzArr.collatz Main 40.4 34.7
genColtzArr.collatz.go Main 25.5 14.4
COST CENTRE MODULE no. entries %time %alloc %time %alloc 
 genColtzArr Main 105 1 0.0 0.0 74.7 72.1
 genColtzArr.collatzArr Main 106 1 8.0 20.8 74.7 72.1
 genColtzArr.collatzArr.\ Main 107 500000 0.9 2.2 66.8 51.3
 genColtzArr.collatz Main 109 1182582 40.4 34.7 65.9 49.1
 genColtzArr.collatz.go Main 110 1182581 25.5 14.4 25.5 14.4

Please note that -O2 is not a desired answer. I want to figure out what's the problem in this program and in general, how should I spot time and memory inefficiencies in Haskell code. Specifically, I have no idea why this code, with tail recursion and bang pattern, can consume so much memory.

UPDATE1:

the same code with -s produces this:

 1,347,869,264 bytes allocated in the heap
 595,901,528 bytes copied during GC
 172,105,056 bytes maximum residency (7 sample(s))
 897,704 bytes maximum slop
 315 MB total memory in use (0 MB lost due to fragmentation)
 Tot time (elapsed) Avg pause Max pause
 Gen 0 2408 colls, 0 par 0.412s 0.427s 0.0002s 0.0075s
 Gen 1 7 colls, 0 par 0.440s 0.531s 0.0759s 0.1835s
 INIT time 0.000s ( 0.000s elapsed)
 MUT time 0.828s ( 0.816s elapsed)
 GC time 0.852s ( 0.958s elapsed)
 RP time 0.000s ( 0.000s elapsed)
 PROF time 0.000s ( 0.000s elapsed)
 EXIT time 0.004s ( 0.017s elapsed)
 Total time 1.684s ( 1.791s elapsed)
 %GC time 50.6% (53.5% elapsed)
 Alloc rate 1,627,861,429 bytes per MUT second
 Productivity 49.4% of total user, 46.4% of total elapsed

so it takes 300 meg. that is still too large.

asked Mar 8, 2016 at 13:28
\$\endgroup\$
11
  • \$\begingroup\$ Did you add type signatures in your program? Or did you omit them? \$\endgroup\$ Commented Mar 8, 2016 at 14:04
  • \$\begingroup\$ You didn't mention how much memory this program was using. The total alloc figure is the number of bytes allocated during the lifetime of the program. Most of it quickly gets GC'ed. Running with -s will tell you how much it really took, and you can evaluate the minimum memory required by using the -M30M command and reducing from here. \$\endgroup\$ Commented Mar 8, 2016 at 16:23
  • \$\begingroup\$ Using vector roughly divides the memory usage by 2, giving me a 52MB total memory use. It can be coerced down to 34MB with -MxM. \$\endgroup\$ Commented Mar 8, 2016 at 16:42
  • \$\begingroup\$ @Zeta I just have one function here. How the signature matters? \$\endgroup\$ Commented Mar 8, 2016 at 18:25
  • 1
    \$\begingroup\$ @HuStmpHrrr ideally it should take way less than 10MB! However there is a trick here, and I think it would prevent the use of an unboxed data structure. It relies on the "cells" in your array being lazy, and the collatz function not looping. That means that you actually store all the thunks in the structure at first, not the final int values ... \$\endgroup\$ Commented Mar 9, 2016 at 16:55

2 Answers 2

6
\$\begingroup\$

It doesn't use 300 megabytes of heap, it peaks at a little over 20 megabytes. Total allocation is not peak allocation and Haskell has cheap short-lived allocations so total alloc isn't always a good heuristic for GC time or steady-state heap usage. The heap profiling stuff is giving data designed for tuning code rather than for analytics and total alloc is often more helpful when comparing code for overall memory usage.

Here's a screenshot from a memory profile:

collatz memory profile

I don't think it necessarily bit you here, but in future, add explicit types to functions you are benchmarking. See here for an example of why you want to do that.

Here's the command I wrote to profile your code:

$ rm -f collatz
$ stack ghc -- -prof -fprof-auto -rtsopts -O2 collatz.hs -o collatz 
$ ./collatz +RTS -hc -p
$ hp2ps collatz.hp
$ evince collatz.ps

Last bits are just converting the hp heap profiling data to a postscript file and then I'm opening it in my PDF reader.

answered Mar 9, 2016 at 23:33
\$\endgroup\$
3
  • \$\begingroup\$ so what you are suggesting the program is well optimized and no memory problem? em... but i see different numbers in top and ps \$\endgroup\$ Commented Mar 10, 2016 at 13:17
  • \$\begingroup\$ No, I'm saying you should understand what the data tells you before you act on the data. \$\endgroup\$ Commented Mar 10, 2016 at 18:12
  • \$\begingroup\$ Could you make profiling command use multiple lines for easier reading? \$\endgroup\$ Commented Mar 11, 2016 at 12:26
1
\$\begingroup\$

Here is one of my programs that utilized memoization and array to improve performance and memory usage. The performance seems satisfactory but the memory usage is ridiculous and I can't figure out what's wrong:

I don't actually know the answer, but I'd guess that you used memoization improperly. GHC already evaluates a lambda expression at most once per top-level function.

The bang patterns there are almost certainly the reason that GHC is taking up too much memory. Haskell is a lazy language; take advantage of this. Indeed, using strictness in combination with immutability (and yes, arrays are unfortunately immutable here) is probably just not a good idea.

Here is an (admittedly advanced) example using recursion-schemes:

import Data.Functor.Foldable
coalgebra :: Int -> Either [Int] (ListF Int Int)
coalgebra 1 = Left [1]
coalgebra n
 | n `mod` 2 == 0 = Right $ Cons n (div n 2)
 | otherwise = Right $ Cons n (3 * n + 1)
collatz :: Int -> [Int]
collatz = elgot embed coalgebra

In general, while such an aggressively functional style will not always be the fastest, it will provide the best control flow and run 1-2x slower than a Rust implementation (for small enough inputs).

Indeed, the more involed

{-# LANGUAGE TemplateHaskell #-}
import Data.Foldable.Functor
import Language.Haskell.TH.Syntax
import CollatzTH
coalgebra :: Int -> Either [Int] (ListF Int Int)
coalgebra 1 = Left [1]
coalgebra 2 = Left $( lift (collatzTH 2) )
coalgebra 3 = Left $( lift (collatzTH 3) )
coalgebra 6 = Left $( lift (collatzTH 6) )
coalgebra 7 = Left $( lift (collatzTH 7) )
coalgebra 9 = Left $( lift (collatzTH 9) )
coalgebra 18 = Left $( lift (collatzTH 18) )
coalgebra n
 | n `mod` 2 == 0 = Right $ Cons n (div n 2)
 | otherwise = Right $ Cons n (3 * n + 1)
collatz :: Int -> [Int]
collatz = elgot embed elgotCoalgebra

We have to put collatzTH in another module, because TemplateHaskell is a pain in the butt.

collatzTH :: Int -> [Int]
collatzTH 1 = [1]
collatzTH n
 | n `mod` 2 == 0 = n:(collatzPatternTH (div n 2))

This is not memoized, nor is it exactly a supercompiler, but it is in fact faster than the naive Rust implementation for values I tested < 2223.

answered Aug 23, 2017 at 3:44
\$\endgroup\$
1
  • \$\begingroup\$ i was doing 500000. collatz is an explosive function, the performance in 2k is not really representative enough. maybe you can run in your machine and try out which one is faster? i would love to see the profilingresult. \$\endgroup\$ Commented Aug 23, 2017 at 3:59

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.