Can someone explain the concept behind Haskell's memoization?

Question 1

(note I'm putting the question here because it's about the conceptual mechanics of it, rather than a coding problem)

I was working on a small program, that was using a sequence of fibonacci numbers in its equasion, but I noticed that if I got over a certain number it got painfully slow, googling around a bit I stumbled upon a technique in Haskell known as Memoization, they showed code working like this:

-- Traditional implementation of fibonacci, hangs after about 30
slow_fib :: Int -> Integer
slow_fib 0 = 0
slow_fib 1 = 1
slow_fib n = slow_fib (n-2) + slow_fib (n-1)
-- Memorized variant is near instant even after 10000
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!)
 where fib 0 = 0
 fib 1 = 1
 fib n = memoized_fib (n-2) + memoized_fib (n-1)

So my question to you guys is, how or rather why does this work?

Is it because it somehow manages to run through most of the list before the calculation catches up? But if haskell is lazy, there isn't really any calculation that needs to catch up... So how does it work?

Question 2

could you clarify what do you mean by the calculation catches up?. BTW, memoization is not specific to haskell: en.wikipedia.org/wiki/Memoization

Question 3

see my explanation under killan's answer

Question 4

Love your question; just a quick note: The technique is called memoization, not memorization.

Question 5

Just to explain the mechanics behind the actual memoization,

memo_fib = (map fib [1..] !!)

produces a list of "thunks", unevaluated computations. Think of these like unopened presents, as long as we don't touch them, they won't run.

Now once we evaluate a thunk, we never evaluate it again. This is actually the only form of mutation in "normal" haskell, thunks mutate once evaluated to become concrete values.

So back to your code, you've got a list of thunks, and you still do this tree recursion, but you recurse using the list, and once an element in the list is evaluated, it never gets computed again. Thus, we avoid the tree recursion in the naive fib function.

As an tangentially interesting note, this is especially fast over a series of fibonnaci numbers are computed since that list is only evaluated once, meaning that if you calculate memo_fib 10000 twice, the second time should be instantaneous. This is because Haskell only evaluated arguments to functions once and you're using partial application instead of a lambda.

TLDR: By storing calculations in a list, each element of the list is evaluated once, therefore, each fibonnacci number is calculated exactly once throughout the entire program.

Visualization:

 [THUNK_1, THUNK_2, THUNK_3, THUNK_4, THUNK_5]
 -- Evaluating THUNK_5
 [THUNK_1, THUNK_2, THUNK_3, THUNK_4, THUNK_3 + THUNK_4]
 [THUNK_1, THUNK_2, THUNK_1 + THUNK_2, THUNK_4, THUNK_3 + THUNK_4]
 [1, 1, 1 + 1, THUNK_4, THUNK_3 + THUNK_4]
 [1, 1, 2, THUNK_4, 2 + THUNK4]
 [1, 1, 2, 1 + 2, 2 + THUNK_4]
 [1, 1, 2, 3, 2 + 3]
 [1, 1, 2, 3, 5]

So you can see how for evaluating THUNK_4 is much faster since it's subexpressions are already evaluated.

Question 6

could you provide an example of how the values in the list behave for a short sequence? I think it may add to the visualization of how it's supposed to work... And while it's true that if I call memo_fib with the same value twice, the second time will be instant, but if I call it with a value 1 higher, it still takes forever to evaluate (like say going from 30 to 31)

Question 7

@ElectricCoffee Added

Question 8

@ElectricCoffee No it won't since memo_fib 29 and memo_fib 30 are already evaluated, it will take exactly as long as it takes to add those two numbers :) Once something is eval-ed, it stays evaled.

Question 9

@ElectricCoffee Your recursion has to go through the list, otherwise you don't gain any performance

Question 10

@ElectricCoffee Yes. but the 31st element of the list isn't using past computations, you're memoizing yes, but in a pretty useless way.. Computations that are repeated aren't computed twice, but you still have tree recursion for each new value which is very, very slow

Question 11

The point of memoization is never to compute the same function twice - this is extremely useful to speed up computations that are purely functional, i.e. without side effects, because for those the process can be entirely automated without affecting correctness. This is particularly necessary for functions like fibo, which lead to tree recursion, i.e. exponential effort, when implemented naively. (This is one reason why the Fibonacci numbers are actually a very bad example for teaching recursion - nearly all demo implementations you find in tutorials or books are unusable for large input values.)

If you trace the flow of execution, you will see that in the second case, the value for fib x will always be available when fib x+1 is executed, and the runtime system will be able to simply read it from memory rather than via another recursive call, while the first solution tries to compute the larger solution before the results for smaller values are available. This is ultimately because the iterator [0..n] is evaluated from left to right and will therefore start with 0, while the recursion in the first example starts with n and only then asks about n-1. This is what leads to the many, many unnecessary duplicate function calls.

Question 12

oh I understand the point of it, I just didn't understand how it works, like from what I can see in the code, is that when you write memorized_fib 20 for example, you're actually just writing map fib [0..] !! 20, it would still need to calculate the entire range of numbers up to 20, or am I missing something here?

Question 13

Yes, but only once for each number. The naive implementation calculates fib 2 so often it will make your head spin - go ahead, write down the call tree fur just a small value like n==5. You will never forget memoization again once you've seen what it saves you.

Question 14

@ElectricCoffee: Yes, it will calculate fib of 1 to 20. You don't gain anything from that call. Now try calculating fib 21, and you'll see that instead of calculating 1-21, you can just calculate 21 because you already have 1-20 calculated and don't need to do it again.

Question 15

I'm trying to write down the call tree for n = 5, and I've currently reached the point where n == 3, so far so good, but maybe it's just my imperative mind thinking this, but doesn't that just mean that for n == 3, you just get map fib [0..]!!3? which then goes into the fib n branch of the program... where exactly do I get the benefit of pre-computed data?

Question 16

No, memoized_fib is fine. It's slow_fib that will make you weep if you trace it.

score 12 · Accepted Answer · 2013-12-09 15:26:26Z

Just to explain the mechanics behind the actual memoization,

memo_fib = (map fib [1..] !!)

produces a list of "thunks", unevaluated computations. Think of these like unopened presents, as long as we don't touch them, they won't run.

Now once we evaluate a thunk, we never evaluate it again. This is actually the only form of mutation in "normal" haskell, thunks mutate once evaluated to become concrete values.

So back to your code, you've got a list of thunks, and you still do this tree recursion, but you recurse using the list, and once an element in the list is evaluated, it never gets computed again. Thus, we avoid the tree recursion in the naive fib function.

As an tangentially interesting note, this is especially fast over a series of fibonnaci numbers are computed since that list is only evaluated once, meaning that if you calculate memo_fib 10000 twice, the second time should be instantaneous. This is because Haskell only evaluated arguments to functions once and you're using partial application instead of a lambda.

TLDR: By storing calculations in a list, each element of the list is evaluated once, therefore, each fibonnacci number is calculated exactly once throughout the entire program.

Visualization:

 [THUNK_1, THUNK_2, THUNK_3, THUNK_4, THUNK_5]
 -- Evaluating THUNK_5
 [THUNK_1, THUNK_2, THUNK_3, THUNK_4, THUNK_3 + THUNK_4]
 [THUNK_1, THUNK_2, THUNK_1 + THUNK_2, THUNK_4, THUNK_3 + THUNK_4]
 [1, 1, 1 + 1, THUNK_4, THUNK_3 + THUNK_4]
 [1, 1, 2, THUNK_4, 2 + THUNK4]
 [1, 1, 2, 1 + 2, 2 + THUNK_4]
 [1, 1, 2, 3, 2 + 3]
 [1, 1, 2, 3, 5]

So you can see how for evaluating THUNK_4 is much faster since it's subexpressions are already evaluated.

could you provide an example of how the values in the list behave for a short sequence? I think it may add to the visualization of how it's supposed to work... And while it's true that if I call memo_fib with the same value twice, the second time will be instant, but if I call it with a value 1 higher, it still takes forever to evaluate (like say going from 30 to 31)
@ElectricCoffee No it won't since memo_fib 29 and memo_fib 30 are already evaluated, it will take exactly as long as it takes to add those two numbers :) Once something is eval-ed, it stays evaled.
@ElectricCoffee Your recursion has to go through the list, otherwise you don't gain any performance
@ElectricCoffee Yes. but the 31st element of the list isn't using past computations, you're memoizing yes, but in a pretty useless way.. Computations that are repeated aren't computed twice, but you still have tree recursion for each new value which is very, very slow

Stack Exchange Network

Can someone explain the concept behind Haskell's memoization?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Can someone explain the concept behind Haskell's memoization?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions