Let’s say fn(x)
is a pure function that does something expensive, like returning a list of the prime factors of x
.
And let’s say we make a memoized version of the same function called memoizedFn(x)
. It always returns the same result for a given input, but it maintains a private cache of previous results to improve performance.
Formally speaking, is memoizedFn(x)
considered pure?
Or is there some other name or qualifying term used to refer to such a function in FP discussions? (i.e. a function with side effects that may affect the computational complexity of subsequent calls but that may not affect the return values.)
6 Answers 6
Yes. The memoized version of a pure function is also a pure function.
All that function purity cares about is the effect that input parameters on the return value of the function (passing the same input should always produce the same output) and any observable side effects (e.g. text to the terminal or UI or network). Computation time and extra memory usages is irrelevant to function purity.
Caches of a pure function is pretty much invisible to the program; a functional programming language is allowed to automatically optimise a pure function to a memoized version of the function if it can determine that it'll be beneficial to do so. In practice, automatically determining when memoization is beneficial is actually quite a difficult problem, but such optimisation would be valid.
-
Caches are never invisible to a program, that's very naive. You can run out of memory because of it, depending on how badly it is done. From the purity point of view this can mean an exception for example. From practical point of view its simply a disaster. It's a hidden side effect, hard to debug and trace, and very real.freakish– freakish11/06/2024 20:29:02Commented Nov 6, 2024 at 20:29
-
In the theoretical world of function purity, it's common to assume you have infinite memory and time. Caches can have preallocated amount of memory, so you never get an out of memory situation during operation. The vast majority of programs aren't critical enough that it needs to continue working under an OOM; most applications are fine to just terminate, which also rollsback any ongoing transactions, and restart the application. In situations where you do need to work with such critical application, the API need to expose the possibility of OOM and other such situations.Lie Ryan– Lie Ryan12/03/2024 04:53:51Commented Dec 3, 2024 at 4:53
Wikipedia defines a "Pure Function" as a function that has the following properties:
Its return value is the same for the same arguments (no variation with local static variables, non-local variables, mutable reference arguments or input streams from I/O devices).
Its evaluation has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or I/O streams).
In effect, a pure function returns the same output given the same input, and doesn't affect anything else outside of the function. For purposes of purity, it doesn't matter how the function computes its return value, so long as it returns the same output given the same input.
Functionally pure languages like Haskell routinely use memoization to speed up a function by caching its previously computed results.
-
17I might miss something, but how are you going to keep cache without side effects?val - disappointed in SE– val - disappointed in SE08/27/2019 00:01:28Commented Aug 27, 2019 at 0:01
-
1By keeping it inside the function.Robert Harvey– Robert Harvey08/27/2019 00:15:12Commented Aug 27, 2019 at 0:15
-
6"no mutation of local static variable" seems to exclude local variables persistent between calls as well.val - disappointed in SE– val - disappointed in SE08/27/2019 00:29:27Commented Aug 27, 2019 at 0:29
-
3This doesn't actually answer the question, even if you seem to imply that yes, it is pure.Mars– Mars08/27/2019 00:34:40Commented Aug 27, 2019 at 0:34
-
6@val You’re correct: this condition needs to be relaxed a bit. The purely-functional memoization he refers to has no visible mutation of any static data. What happens is that the result is then computed and memoized the first time the function is called, and returns the same value whenever it’s called. Many languages have an idiom for that: a
static const
local variable in C++ (but not C), or a lazily-evaluated data structure in Haskell. There is one more condition you need: the initialization must be thread-safe.Davislor– Davislor08/27/2019 14:24:42Commented Aug 27, 2019 at 14:24
Yes, memoized pure functions are commonly referred to as pure. This is especially common in languages like Haskell, in which memoized, lazily-evaluated, immutable results are a built-in feature.
There is one important caveat: the memoizing function must be thread-safe, or else you might get a race condition when two threads both try to call it.
One example of a computer scientist using the term "purely functional" this way is this blog post by Conal Elliott about automatic memoization:
Perhaps surprisingly, memoization can be implemented simply and purely functionally in a lazy functional language.
There are many examples in the peer-reviewed literature, and have been for decades. For example, this paper from 1995, "Using Automatic Memoization as a Software Engineering Tool in Real-World AI Systems," uses very similar language in section 5.2 to describe what we would today call a pure function:
Memoization only works for true functions, not procedures. That is, if a function’s result is not completely and deterministically specified by its input parameters, using memoization will give incorrect results. The number of functions that can be memoized successfully will be increased by encouraging the use of a functional programming style throughout the system.
Some imperative languages have a similar idiom. For example, a static const
variable in C++ is initialized only once, before its value is used, and never mutates.
It depends on how you do it.
Usually people want to memoize by mutating some sort of cache dictionary. This has all the problems associated with impure mutation, such as having to worry about concurrency, worrying about the cache growing too large, etc.
However, you can memoize without impure memory mutation. One example is in this answer, where I track the memoized values externally by means of a lengths
argument.
In the link Robert Harvey provided, lazy evaluation is used to avoid side effects.
Another technique sometimes seen is to explicitly mark the memoization as an impure side effect in the context of an IO
type, such as with cats-effect's memoize function.
This last one brings up the point that sometimes the goal is just encapsulating mutation rather than eliminating it. Most functional programmers consider it "pure enough" to make impurity explicit and encapsulated.
If you want a term to differentiate it from a truly pure function, I think it's sufficient to just say "memoized with a mutable dictionary." That lets people know how to use it safely.
-
I don't think any of the purer solution solves the above problems: While you lose any concurrency worries, you also lose any chance for two concurrently started calls like
collatz(100)
andcollatz(200)
to cooperate. And IIUIC, the problem with cache growing too large remains (though Haskell may have some nice tricks for this?).maaartinus– maaartinus08/26/2019 23:16:08Commented Aug 26, 2019 at 23:16 -
Note:
IO
is pure. All impure methods onIO
and Cats are namedunsafe
.Async.memoize
is also pure, so we don't have to settle for "pure enough" :)Samuel– Samuel08/27/2019 04:33:50Commented Aug 27, 2019 at 4:33
Usually, a function that returns a list is not pure at all because it requires allocation of storage and can thereby fail (e.g. by throwing an exception, which is non-pure). A language that has value types and can represent a list as a bounded-size value type may not have this issue. For this reason, your example is probably not pure.
In general, if the memoization can be done in a manner that's failure-case free (e.g. by having statically allocated storage for memoized results, and internal synchronization to control access to them if the language admits threads), it's reasonable to consider such a function pure.
You can implement memoization without side-effects using the state monad.
[State monad] is basically a function S => (S, A), where S is the type that represents your state and A is the result the function produces - Cats State.
In your case the state would be the memoized value or nothing (i.e. Haskell Maybe
or Scala Option[A]
). If the memoized value is present it is returned as A
, otherwise A
is calculated and returned as both the transitioned state and the result.
funcx(){sleep(cached_time--); return 0;}
returns the same val every time, but will perform differently