Is a memoized pure function itself considered pure?

Question 1

Let’s say fn(x) is a pure function that does something expensive, like returning a list of the prime factors of x.

And let’s say we make a memoized version of the same function called memoizedFn(x). It always returns the same result for a given input, but it maintains a private cache of previous results to improve performance.

Formally speaking, is memoizedFn(x) considered pure?

Or is there some other name or qualifying term used to refer to such a function in FP discussions? (i.e. a function with side effects that may affect the computational complexity of subsequent calls but that may not affect the return values.)

Question 2

Maybe it is not pure for purists, but "pure enough" for pragmatic people ;-)

Question 3

@DocBrown I agree, just wondering if there’s a more formal term for ‘pure enough’

Question 4

Executing a pure function will most likely modify the processor's instruction cache, branch predictors etc. But that's probably "pure enough" for purists as well - or you can forget about pure functions altogether.

Question 5

@callum No, there is no formal definition of "pure enough". When arguing about purity and the semantic equivalence of two "referentially transparent" calls, you always have to state exactly which semantics you are going to apply. At some low level of implementation detail it will always break down and have different memory effects or timings. That's why you have to be pragmatic: what level of detail is useful for reasoning about your code?

Question 6

Then for the sake of pragmatism, I'd say that the purity depends on whether or not you consider computation time to be part of the output. funcx(){sleep(cached_time--); return 0;} returns the same val every time, but will perform differently

Question 7

Yes. The memoized version of a pure function is also a pure function.

All that function purity cares about is the effect that input parameters on the return value of the function (passing the same input should always produce the same output) and any observable side effects (e.g. text to the terminal or UI or network). Computation time and extra memory usages is irrelevant to function purity.

Caches of a pure function is pretty much invisible to the program; a functional programming language is allowed to automatically optimise a pure function to a memoized version of the function if it can determine that it'll be beneficial to do so. In practice, automatically determining when memoization is beneficial is actually quite a difficult problem, but such optimisation would be valid.

Question 8

Caches are never invisible to a program, that's very naive. You can run out of memory because of it, depending on how badly it is done. From the purity point of view this can mean an exception for example. From practical point of view its simply a disaster. It's a hidden side effect, hard to debug and trace, and very real.

Question 9

In the theoretical world of function purity, it's common to assume you have infinite memory and time. Caches can have preallocated amount of memory, so you never get an out of memory situation during operation. The vast majority of programs aren't critical enough that it needs to continue working under an OOM; most applications are fine to just terminate, which also rollsback any ongoing transactions, and restart the application. In situations where you do need to work with such critical application, the API need to expose the possibility of OOM and other such situations.

Question 10

Wikipedia defines a "Pure Function" as a function that has the following properties:

Its return value is the same for the same arguments (no variation with local static variables, non-local variables, mutable reference arguments or input streams from I/O devices).
Its evaluation has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or I/O streams).

In effect, a pure function returns the same output given the same input, and doesn't affect anything else outside of the function. For purposes of purity, it doesn't matter how the function computes its return value, so long as it returns the same output given the same input.

Functionally pure languages like Haskell routinely use memoization to speed up a function by caching its previously computed results.

Question 11

I might miss something, but how are you going to keep cache without side effects?

Question 12

By keeping it inside the function.

Question 13

"no mutation of local static variable" seems to exclude local variables persistent between calls as well.

Question 14

This doesn't actually answer the question, even if you seem to imply that yes, it is pure.

Question 15

@val You’re correct: this condition needs to be relaxed a bit. The purely-functional memoization he refers to has no visible mutation of any static data. What happens is that the result is then computed and memoized the first time the function is called, and returns the same value whenever it’s called. Many languages have an idiom for that: a static const local variable in C++ (but not C), or a lazily-evaluated data structure in Haskell. There is one more condition you need: the initialization must be thread-safe.

Question 16

Yes, memoized pure functions are commonly referred to as pure. This is especially common in languages like Haskell, in which memoized, lazily-evaluated, immutable results are a built-in feature.

There is one important caveat: the memoizing function must be thread-safe, or else you might get a race condition when two threads both try to call it.

One example of a computer scientist using the term "purely functional" this way is this blog post by Conal Elliott about automatic memoization:

Perhaps surprisingly, memoization can be implemented simply and purely functionally in a lazy functional language.

There are many examples in the peer-reviewed literature, and have been for decades. For example, this paper from 1995, "Using Automatic Memoization as a Software Engineering Tool in Real-World AI Systems," uses very similar language in section 5.2 to describe what we would today call a pure function:

Memoization only works for true functions, not procedures. That is, if a function’s result is not completely and deterministically specified by its input parameters, using memoization will give incorrect results. The number of functions that can be memoized successfully will be increased by encouraging the use of a functional programming style throughout the system.

Some imperative languages have a similar idiom. For example, a static const variable in C++ is initialized only once, before its value is used, and never mutates.

Question 17

It depends on how you do it.

Usually people want to memoize by mutating some sort of cache dictionary. This has all the problems associated with impure mutation, such as having to worry about concurrency, worrying about the cache growing too large, etc.

However, you can memoize without impure memory mutation. One example is in this answer, where I track the memoized values externally by means of a lengths argument.

In the link Robert Harvey provided, lazy evaluation is used to avoid side effects.

Another technique sometimes seen is to explicitly mark the memoization as an impure side effect in the context of an IO type, such as with cats-effect's memoize function.

This last one brings up the point that sometimes the goal is just encapsulating mutation rather than eliminating it. Most functional programmers consider it "pure enough" to make impurity explicit and encapsulated.

If you want a term to differentiate it from a truly pure function, I think it's sufficient to just say "memoized with a mutable dictionary." That lets people know how to use it safely.

Question 18

I don't think any of the purer solution solves the above problems: While you lose any concurrency worries, you also lose any chance for two concurrently started calls like collatz(100) and collatz(200) to cooperate. And IIUIC, the problem with cache growing too large remains (though Haskell may have some nice tricks for this?).

Question 19

Note: IO is pure. All impure methods on IO and Cats are named unsafe. Async.memoize is also pure, so we don't have to settle for "pure enough" :)

Question 20

Usually, a function that returns a list is not pure at all because it requires allocation of storage and can thereby fail (e.g. by throwing an exception, which is non-pure). A language that has value types and can represent a list as a bounded-size value type may not have this issue. For this reason, your example is probably not pure.

In general, if the memoization can be done in a manner that's failure-case free (e.g. by having statically allocated storage for memoized results, and internal synchronization to control access to them if the language admits threads), it's reasonable to consider such a function pure.

Question 21

You can implement memoization without side-effects using the state monad.

[State monad] is basically a function S => (S, A), where S is the type that represents your state and A is the result the function produces - Cats State.

In your case the state would be the memoized value or nothing (i.e. Haskell Maybe or Scala Option[A]). If the memoized value is present it is returned as A, otherwise A is calculated and returned as both the transitioned state and the result.

Lie Ryan Lie Ryan 12.5k2 gold badges32 silver badges43 bronze badges · Answer 1 · 2019-08-26 15:08:11Z

Yes. The memoized version of a pure function is also a pure function.

All that function purity cares about is the effect that input parameters on the return value of the function (passing the same input should always produce the same output) and any observable side effects (e.g. text to the terminal or UI or network). Computation time and extra memory usages is irrelevant to function purity.

Caches of a pure function is pretty much invisible to the program; a functional programming language is allowed to automatically optimise a pure function to a memoized version of the function if it can determine that it'll be beneficial to do so. In practice, automatically determining when memoization is beneficial is actually quite a difficult problem, but such optimisation would be valid.

Caches are never invisible to a program, that's very naive. You can run out of memory because of it, depending on how badly it is done. From the purity point of view this can mean an exception for example. From practical point of view its simply a disaster. It's a hidden side effect, hard to debug and trace, and very real.
In the theoretical world of function purity, it's common to assume you have infinite memory and time. Caches can have preallocated amount of memory, so you never get an out of memory situation during operation. The vast majority of programs aren't critical enough that it needs to continue working under an OOM; most applications are fine to just terminate, which also rollsback any ongoing transactions, and restart the application. In situations where you do need to work with such critical application, the API need to expose the possibility of OOM and other such situations.

score 19 · Answer 2 · 2019-08-26 14:56:48Z

Wikipedia defines a "Pure Function" as a function that has the following properties:

Its return value is the same for the same arguments (no variation with local static variables, non-local variables, mutable reference arguments or input streams from I/O devices).
Its evaluation has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or I/O streams).

In effect, a pure function returns the same output given the same input, and doesn't affect anything else outside of the function. For purposes of purity, it doesn't matter how the function computes its return value, so long as it returns the same output given the same input.

Functionally pure languages like Haskell routinely use memoization to speed up a function by caching its previously computed results.

I might miss something, but how are you going to keep cache without side effects?
"no mutation of local static variable" seems to exclude local variables persistent between calls as well.
This doesn't actually answer the question, even if you seem to imply that yes, it is pure.
@val You’re correct: this condition needs to be relaxed a bit. The purely-functional memoization he refers to has no visible mutation of any static data. What happens is that the result is then computed and memoized the first time the function is called, and returns the same value whenever it’s called. Many languages have an idiom for that: a static const local variable in C++ (but not C), or a lazily-evaluated data structure in Haskell. There is one more condition you need: the initialization must be thread-safe.

Davislor Davislor 1,56311 silver badges14 bronze badges · Answer 3 · 2019-08-27 01:21:32Z

Yes, memoized pure functions are commonly referred to as pure. This is especially common in languages like Haskell, in which memoized, lazily-evaluated, immutable results are a built-in feature.

There is one important caveat: the memoizing function must be thread-safe, or else you might get a race condition when two threads both try to call it.

One example of a computer scientist using the term "purely functional" this way is this blog post by Conal Elliott about automatic memoization:

Perhaps surprisingly, memoization can be implemented simply and purely functionally in a lazy functional language.

There are many examples in the peer-reviewed literature, and have been for decades. For example, this paper from 1995, "Using Automatic Memoization as a Software Engineering Tool in Real-World AI Systems," uses very similar language in section 5.2 to describe what we would today call a pure function:

Memoization only works for true functions, not procedures. That is, if a function’s result is not completely and deterministically specified by its input parameters, using memoization will give incorrect results. The number of functions that can be memoized successfully will be increased by encouraging the use of a functional programming style throughout the system.

Some imperative languages have a similar idiom. For example, a static const variable in C++ is initialized only once, before its value is used, and never mutates.

score 3 · Answer 4 · 2019-08-26 17:37:29Z

It depends on how you do it.

Usually people want to memoize by mutating some sort of cache dictionary. This has all the problems associated with impure mutation, such as having to worry about concurrency, worrying about the cache growing too large, etc.

However, you can memoize without impure memory mutation. One example is in this answer, where I track the memoized values externally by means of a lengths argument.

In the link Robert Harvey provided, lazy evaluation is used to avoid side effects.

Another technique sometimes seen is to explicitly mark the memoization as an impure side effect in the context of an IO type, such as with cats-effect's memoize function.

This last one brings up the point that sometimes the goal is just encapsulating mutation rather than eliminating it. Most functional programmers consider it "pure enough" to make impurity explicit and encapsulated.

If you want a term to differentiate it from a truly pure function, I think it's sufficient to just say "memoized with a mutable dictionary." That lets people know how to use it safely.

I don't think any of the purer solution solves the above problems: While you lose any concurrency worries, you also lose any chance for two concurrently started calls like collatz(100) and collatz(200) to cooperate. And IIUIC, the problem with cache growing too large remains (though Haskell may have some nice tricks for this?).
Note: IO is pure. All impure methods on IO and Cats are named unsafe. Async.memoize is also pure, so we don't have to settle for "pure enough" :)

score 3 · Answer 5 · 2019-08-27 02:18:11Z

Usually, a function that returns a list is not pure at all because it requires allocation of storage and can thereby fail (e.g. by throwing an exception, which is non-pure). A language that has value types and can represent a list as a bounded-size value type may not have this issue. For this reason, your example is probably not pure.

In general, if the memoization can be done in a manner that's failure-case free (e.g. by having statically allocated storage for memoized results, and internal synchronization to control access to them if the language admits threads), it's reasonable to consider such a function pure.

Samuel Samuel 9,2471 gold badge27 silver badges43 bronze badges · Answer 6 · 2019-08-27 04:23:49Z

You can implement memoization without side-effects using the state monad.

[State monad] is basically a function S => (S, A), where S is the type that represents your state and A is the result the function produces - Cats State.

In your case the state would be the memoized value or nothing (i.e. Haskell Maybe or Scala Option[A]). If the memoized value is present it is returned as A, otherwise A is calculated and returned as both the transitioned state and the result.

Stack Exchange Network

Is a memoized pure function itself considered pure?

6 Answers 6

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Is a memoized pure function itself considered pure?

6 Answers 6

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions