Memoized Collatz sequence

Question 1

Here is one of my programs that utilized memoization and array to improve performance and memory usage. The performance seems satisfactory but the memory usage is ridiculous and I can't figure out what's wrong:

{-# LANGUAGE BangPatterns #-}
import Data.Functor
import Data.Array (Array)
import qualified Data.Array as Arr
import Control.DeepSeq
genColtzArr n = collatzArr
 where collatzArr = Arr.array (1, n) $ take n $ map (\v -> (v, collatz v 0)) [1..] 
 collatz 1 !acc = 1 + acc
 collatz !m !acc
 | even m = go (m `div` 2) acc
 | otherwise = go (3 * m + 1) acc
 where go !l !acc
 | l <= n = let !v = collatzArr Arr.! l in 1 + acc + v
 | otherwise = collatz l $ 1 + acc

collatz here means this guy. This function is supposed to receive a number n, and then return an array indexing from 1 to n, and in which each cell contains the length of the link from the index to 1 by applying Collatz formula.

But the memory usage of this method is so high. Here is the profiler result (ghc option -prof -fprof-auto -rtsopts, run time option +RTS -p, n == 500000):

total alloc = 730,636,136 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
genColtzArr.collatz Main 40.4 34.7
genColtzArr.collatz.go Main 25.5 14.4
COST CENTRE MODULE no. entries %time %alloc %time %alloc 
 genColtzArr Main 105 1 0.0 0.0 74.7 72.1
 genColtzArr.collatzArr Main 106 1 8.0 20.8 74.7 72.1
 genColtzArr.collatzArr.\ Main 107 500000 0.9 2.2 66.8 51.3
 genColtzArr.collatz Main 109 1182582 40.4 34.7 65.9 49.1
 genColtzArr.collatz.go Main 110 1182581 25.5 14.4 25.5 14.4

Please note that -O2 is not a desired answer. I want to figure out what's the problem in this program and in general, how should I spot time and memory inefficiencies in Haskell code. Specifically, I have no idea why this code, with tail recursion and bang pattern, can consume so much memory.

UPDATE1:

the same code with -s produces this:

 1,347,869,264 bytes allocated in the heap
 595,901,528 bytes copied during GC
 172,105,056 bytes maximum residency (7 sample(s))
 897,704 bytes maximum slop
 315 MB total memory in use (0 MB lost due to fragmentation)
 Tot time (elapsed) Avg pause Max pause
 Gen 0 2408 colls, 0 par 0.412s 0.427s 0.0002s 0.0075s
 Gen 1 7 colls, 0 par 0.440s 0.531s 0.0759s 0.1835s
 INIT time 0.000s ( 0.000s elapsed)
 MUT time 0.828s ( 0.816s elapsed)
 GC time 0.852s ( 0.958s elapsed)
 RP time 0.000s ( 0.000s elapsed)
 PROF time 0.000s ( 0.000s elapsed)
 EXIT time 0.004s ( 0.017s elapsed)
 Total time 1.684s ( 1.791s elapsed)
 %GC time 50.6% (53.5% elapsed)
 Alloc rate 1,627,861,429 bytes per MUT second
 Productivity 49.4% of total user, 46.4% of total elapsed

so it takes 300 meg. that is still too large.

Question 2

Did you add type signatures in your program? Or did you omit them?

Question 3

You didn't mention how much memory this program was using. The total alloc figure is the number of bytes allocated during the lifetime of the program. Most of it quickly gets GC'ed. Running with -s will tell you how much it really took, and you can evaluate the minimum memory required by using the -M30M command and reducing from here.

Question 4

Using vector roughly divides the memory usage by 2, giving me a 52MB total memory use. It can be coerced down to 34MB with -MxM.

Question 5

@Zeta I just have one function here. How the signature matters?

Question 6

@HuStmpHrrr ideally it should take way less than 10MB! However there is a trick here, and I think it would prevent the use of an unboxed data structure. It relies on the "cells" in your array being lazy, and the collatz function not looping. That means that you actually store all the thunks in the structure at first, not the final int values ...

Question 7

It doesn't use 300 megabytes of heap, it peaks at a little over 20 megabytes. Total allocation is not peak allocation and Haskell has cheap short-lived allocations so total alloc isn't always a good heuristic for GC time or steady-state heap usage. The heap profiling stuff is giving data designed for tuning code rather than for analytics and total alloc is often more helpful when comparing code for overall memory usage.

Here's a screenshot from a memory profile:

collatz memory profile

I don't think it necessarily bit you here, but in future, add explicit types to functions you are benchmarking. See here for an example of why you want to do that.

Here's the command I wrote to profile your code:

$ rm -f collatz
$ stack ghc -- -prof -fprof-auto -rtsopts -O2 collatz.hs -o collatz 
$ ./collatz +RTS -hc -p
$ hp2ps collatz.hp
$ evince collatz.ps

Last bits are just converting the hp heap profiling data to a postscript file and then I'm opening it in my PDF reader.

Question 8

so what you are suggesting the program is well optimized and no memory problem? em... but i see different numbers in top and ps

Question 9

No, I'm saying you should understand what the data tells you before you act on the data.

Question 10

Could you make profiling command use multiple lines for easier reading?

Question 11

Here is one of my programs that utilized memoization and array to improve performance and memory usage. The performance seems satisfactory but the memory usage is ridiculous and I can't figure out what's wrong:

I don't actually know the answer, but I'd guess that you used memoization improperly. GHC already evaluates a lambda expression at most once per top-level function.

The bang patterns there are almost certainly the reason that GHC is taking up too much memory. Haskell is a lazy language; take advantage of this. Indeed, using strictness in combination with immutability (and yes, arrays are unfortunately immutable here) is probably just not a good idea.

Here is an (admittedly advanced) example using recursion-schemes:

import Data.Functor.Foldable
coalgebra :: Int -> Either [Int] (ListF Int Int)
coalgebra 1 = Left [1]
coalgebra n
 | n `mod` 2 == 0 = Right $ Cons n (div n 2)
 | otherwise = Right $ Cons n (3 * n + 1)
collatz :: Int -> [Int]
collatz = elgot embed coalgebra

In general, while such an aggressively functional style will not always be the fastest, it will provide the best control flow and run 1-2x slower than a Rust implementation (for small enough inputs).

Indeed, the more involed

{-# LANGUAGE TemplateHaskell #-}
import Data.Foldable.Functor
import Language.Haskell.TH.Syntax
import CollatzTH
coalgebra :: Int -> Either [Int] (ListF Int Int)
coalgebra 1 = Left [1]
coalgebra 2 = Left $( lift (collatzTH 2) )
coalgebra 3 = Left $( lift (collatzTH 3) )
coalgebra 6 = Left $( lift (collatzTH 6) )
coalgebra 7 = Left $( lift (collatzTH 7) )
coalgebra 9 = Left $( lift (collatzTH 9) )
coalgebra 18 = Left $( lift (collatzTH 18) )
coalgebra n
 | n `mod` 2 == 0 = Right $ Cons n (div n 2)
 | otherwise = Right $ Cons n (3 * n + 1)
collatz :: Int -> [Int]
collatz = elgot embed elgotCoalgebra

We have to put collatzTH in another module, because TemplateHaskell is a pain in the butt.

collatzTH :: Int -> [Int]
collatzTH 1 = [1]
collatzTH n
 | n `mod` 2 == 0 = n:(collatzPatternTH (div n 2))

This is not memoized, nor is it exactly a supercompiler, but it is in fact faster than the naive Rust implementation for values I tested < 2223.

Question 12

i was doing 500000. collatz is an explosive function, the performance in 2k is not really representative enough. maybe you can run in your machine and try out which one is faster? i would love to see the profilingresult.

bitemyapp bitemyapp 3292 silver badges5 bronze badges · Accepted Answer · 2016-03-09 23:33:50Z

It doesn't use 300 megabytes of heap, it peaks at a little over 20 megabytes. Total allocation is not peak allocation and Haskell has cheap short-lived allocations so total alloc isn't always a good heuristic for GC time or steady-state heap usage. The heap profiling stuff is giving data designed for tuning code rather than for analytics and total alloc is often more helpful when comparing code for overall memory usage.

Here's a screenshot from a memory profile:

collatz memory profile

I don't think it necessarily bit you here, but in future, add explicit types to functions you are benchmarking. See here for an example of why you want to do that.

Here's the command I wrote to profile your code:

$ rm -f collatz
$ stack ghc -- -prof -fprof-auto -rtsopts -O2 collatz.hs -o collatz 
$ ./collatz +RTS -hc -p
$ hp2ps collatz.hp
$ evince collatz.ps

Last bits are just converting the hp heap profiling data to a postscript file and then I'm opening it in my PDF reader.

so what you are suggesting the program is well optimized and no memory problem? em... but i see different numbers in top and ps
No, I'm saying you should understand what the data tells you before you act on the data.
Could you make profiling command use multiple lines for easier reading?

Stack Exchange Network

Memoized Collatz sequence

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Memoized Collatz sequence

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions