Compiler is g++ 4.2.
Yikes. That compiler is ancient. I’ve met competitive programmers younger than that compiler. If you’re planning to do any serious C++ programming, you really should look into upgrading your toolchain, if possible.
#define BIG unsigned long long int
This is absolutely the wrong way to create a new type.
First, there is a trend in C++ to avoid the preprocessor completely. So, no #defines, no #ifs, and so on. It is ALMOST completely unnecessary in the most recent version (C++20), but even in C++11 it is almost completely avoidable.
Rather than a #define, you should use a type alias:
using BIG = unsigned long long int;
But there's one more problem. By convention, ALL_CAPS names are reserved for preprocessor macros... which basically means you should never use them for anything else. That means you should do:
using big = unsigned long long int;
Or, even better, choose a more descriptive name, like:
using big_uint = unsigned long long int;
But even better still would be to create your own namespace, and put everything—this type alias and all the Fibonacci functions—in there:
namespace your_ns {
    using big_t = unsigned long long int;

    int fib(int n) {
        // ... etc. ...
    }
} // your_ns
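Just to show what the call sites look like afterwards, here's a tiny self-contained sketch. The naive body is only a placeholder (more on that just below), and your_ns is of course whatever name you actually pick:

#include <iostream>

namespace your_ns {
    using big_t = unsigned long long int;

    // placeholder body only; better implementations are discussed below
    big_t fib(int n) {
        if (n < 2) return big_t(n);
        return fib(n - 1) + fib(n - 2);
    }
} // your_ns

int main() {
    std::cout << your_ns::fib(10) << '\n'; // qualified call: prints 55

    using your_ns::fib; // or pull just that one name into scope
    std::cout << fib(20) << '\n'; // prints 6765
}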
Now I know your focus is on the memoized version of the Fibonacci algorithm, but the non-memoized one is really problematic:
int fib(int n) {
if (n < 2) {
return n;
}
return fib(n-1) + fib(n-2);
}
Imagine what happens if you call fib(5). Internally, that results in two calls, to fib(4) and fib(3):
fib(5)
↓
fib(4) + fib(3)
Except... fib(4) also degrades to fib(3) and fib(2), while fib(3) degrades to fib(2) and fib(1):
fib(5)
↓
fib(4) + fib(3)*
↓
fib(3)* + fib(2) + fib(2) + fib(1)
As you can see, fib(3) is called twice. And it gets even worse, because each fib(3) degrades to fib(2) and fib(1):
fib(5)
↓
fib(4) + fib(3)
↓
fib(3) + fib(2)* + fib(2)* + fib(1)
↓
fib(2)* + fib(1) + fib(1) + fib(0) + fib(1) + fib(0) + 1
Now you can see fib(2) is called three times. Each one of those times becomes fib(1) and fib(0).
fib(5)
↓
fib(4) + fib(3)
↓
fib(3) + fib(2) + fib(2) + fib(1)
↓
fib(2) + fib(1) + fib(1) + fib(0) + fib(1) + fib(0) + 1
↓
fib(1) + fib(0) + 1 + 1 + 0 + 1 + 0 + 1
So the net result of a single call to fib(5) is:
- 1 call to fib(4)
- 2 calls to fib(3)
- 3 calls to fib(2)
- 5 calls to fib(1)
- 3 calls to fib(0)
And that's just fib(5). Imagine how many repeated calls you'll get with fib(50).
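If you want to see the blow-up in actual numbers, here's a quick throwaway instrumentation of the naive version (the calls counter is just something I've added for the demo, not part of your code):

#include <iostream>

// demo only: a global counter bumped on every call to the naive fib()
static long long calls = 0;

int fib(int n) {
    ++calls;
    if (n < 2) {
        return n;
    }
    return fib(n-1) + fib(n-2);
}

int main() {
    fib(5);
    std::cout << calls << '\n'; // 15 calls just for fib(5)

    calls = 0;
    fib(30);
    std::cout << calls << '\n'; // over 2.6 million calls for fib(30)
}

For fib(50), that counter would end up around 40 billion, which is why the naive version becomes hopeless long before then.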
(This is obviously not such a big problem with the memoized version... but it’s still a problem!)
What you really want to look into is the tail-recursive Fibonacci algorithm. I’m not going to work it out for you, but I’ll give you the interface:
auto fib_impl(int n, big_t a, big_t b)
{
// ...
}
auto fib(int n)
{
// are you sure you don't want to check for negative numbers?
if (n < 2)
return big_t(n); // note: this line won't compile if you used BIG; this is why the preprocessor sucks
return fib_impl(n, 0, 1);
}
Now, you don't say what version of C++ you're working with, but if it's one of the more recent versions, then you can make all of this constexpr. That means that, when possible, the function may be computed at compile time, meaning it will have (effectively) zero run-time cost.
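For example, here's one possible way to fill in that interface, made constexpr. This is just a sketch: I'm assuming a and b carry the two most recent Fibonacci values, that big_t is the alias from earlier, and at least C++14 (so the branching fib() is allowed to be constexpr); I've also spelled out the return types instead of auto so it stands on its own.

using big_t = unsigned long long int;

// one possible constexpr, tail-recursive implementation (a = fib(i), b = fib(i + 1))
constexpr big_t fib_impl(int n, big_t a, big_t b)
{
    // walk n down to 1, carrying the two most recent values along
    return (n == 1) ? b : fib_impl(n - 1, b, a + b);
}

constexpr big_t fib(int n)
{
    // are you sure you don't want to check for negative numbers?
    if (n < 2)
        return big_t(n);
    return fib_impl(n, 0, 1);
}

// because it's constexpr, it can run entirely at compile time:
static_assert(fib(10) == 55, "fib(10) should be 55");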
BIG fib_with_memo(int n) {
if (umap.find(n) == umap.end()) { // if F_n not in cache, calculate, cache, and return it
BIG val = fib_with_memo(n-1) + fib_with_memo(n-2);
umap[n] = val;
return val;
}
return umap[n]; // otherwise return the cached value
}
Now, since umap has no purpose outside of this function, it makes more sense to hide it within the function as a static variable. If you don't know what function static variables are, they're basically global variables, but only visible within the function.
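If that's new to you, here's a trivial, made-up example:

#include <iostream>

int next_id() {
    static int counter = 0; // created and initialized only once, on the first call
    return ++counter;       // the value survives between calls
}

int main() {
    std::cout << next_id() << '\n'; // 1
    std::cout << next_id() << '\n'; // 2
    std::cout << next_id() << '\n'; // 3
}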
While you’re at it, you might as well replace:
umap[0] = 0;
umap[1] = 1;
with:
umap = std::unordered_map<int, big_t>{{0, 0}, {1, 1}};
Why? Because this will be MUCH more efficient. The first way means that each line first searches the map for the key, then creates the new pair. The second way doesn’t bother with the search, but rather just creates the two pairs directly.
So now your fib_with_memo() could look like this:
big_t fib_with_memo(int n) {
    // umap is constructed only once, the first time the function is called,
    // and initialized with {0, 0} and {1, 1}.
    static auto umap = std::unordered_map<int, big_t>{{0, 0}, {1, 1}};
    // note everything else is exactly the same (except BIG is now the big_t
    // alias; the return type can't be auto here, because the recursive calls
    // happen before the compiler could deduce it)
    if (umap.find(n) == umap.end()) { // if F_n not in cache, calculate, cache, and return it
        auto val = fib_with_memo(n-1) + fib_with_memo(n-2);
        umap[n] = val;
        return val;
    }
    return umap[n]; // otherwise return the cached value
}
Another small fix is that once you're in that if block, you know for a fact that n isn't in the map. So rather than doing umap[n], you can emplace the new pair directly:
if (umap.find(n) == umap.end()) {
auto val = fib_with_memo(n-1) + fib_with_memo(n-2);
umap.emplace(n, val);
return val;
}
Depending on your implementation, that might be more efficient.
Another possibly major optimization comes from the fact that you use find()... and then just discard the result. Why not use it?
if (auto p = umap.find(n); p == umap.end())
{
// we didn't find n
auto val = fib_with_memo(n-1) + fib_with_memo(n-2);
umap.emplace(n, val);
return val;
}
else
{
// we found n
return p->second;
}
So this is about where we’re at right now:
big_t fib_with_memo(int n) {
    static auto umap = std::unordered_map<int, big_t>{{0, 0}, {1, 1}};

    if (auto p = umap.find(n); p == umap.end())
    {
        auto val = fib_with_memo(n-1) + fib_with_memo(n-2);
        umap.emplace(n, val);
        return val;
    }
    else
    {
        // we found n
        return p->second;
    }
}
Now we get to the real problem: the same issue as with the non-memoized version, where you’re calling the same function over and over and over. The fix is the same as with the non-memoized version: refactor:
auto fib_with_memo_impl(int n, big_t a, big_t b) {
    static auto umap = std::unordered_map<int, big_t>{{0, 0}, {1, 1}};

    if (auto p = umap.find(n); p == umap.end())
    {
        auto val = // actual algorithm to calculate fib here
        umap.emplace(n, val);
        return val;
    }
    else
    {
        // we found n
        return p->second;
    }
}
auto fib_with_memo(int n) {
// are you sure you don't want to check for n < 0?
// trivial solutions, don't even bother diving into the calculation
if (n < 2)
return big_t(n);
return fib_with_memo_impl(n, 0, 1);
}
It's up to you to implement the actual Fibonacci calculation there (you have a and b to work with). But otherwise, that should be a drop-in replacement for your current design, except orders of magnitude faster.
An alternative design
As you said:
I used a std::unordered_map (hash table) because its lookup/insertion times are \$O(1)\$ (compared to the std::map, which has \$O(\log(n))\$ for both.)
Unfortunately, this thinking is misguided.
It turns out that, due to the way modern processors work (particularly with caching), big-O notation can be WILDLY misleading. I recall that the inventor of C++ himself was floored when someone showed him that a std::list, which has \$O(1)\$ insertion in the middle, is slower than a std::vector, which has \$O(n)\$ insertion in the middle, even for huge numbers of elements.
Forget all that theory crap, and instead take a step back and consider your actual problem. Imagine that you've called fib_with_memo(10). Okay, now what would you expect to see in the memo-map?
It would be this, right?:
0 → 0
1 → 1
2 → 1
3 → 2
4 → 3
5 → 5
6 → 8
7 → 13
8 → 21
9 → 34
10 → 55
Now look at the keys. It’s just 0 to 10, right? In order, with no holes. Just like an index.
So... why do you need a map? You’re just "mapping" an index to a value. That’s just what a basic array, or a vector, does.
Let's imagine that we're using a vector for the memoizing, with the "key" just being the index. That means after memoizing fib_with_memo(10), the vector would contain:
[index 0 ] → 0
[index 1 ] → 1
[index 2 ] → 1
[index 3 ] → 2
[index 4 ] → 3
[index 5 ] → 5
[index 6 ] → 8
[index 7 ] → 13
[index 8 ] → 21
[index 9 ] → 34
[index 10] → 55
So, after fib_with_memo(10), the vector contains 11 elements, and the element at index n (that is, vec[n]) is the nth Fibonacci number.
Simple, right?
So how do you tell whether a number is already memoized? Again, simple: just check the vector's size. If we call fib_with_memo(13), the vector must have at least 14 elements in it for vec[13] to be valid. So:
if (vec.size() > n)
return vec[n];
else
// calculate the nth fibonacci number
In fact, the implementation becomes trivial, too. That's because if the function is called with an unmemoized n, all you need to do is resize the vector to n + 1, and fill in all the numbers between the old size and the new size. Something like:
auto fib_with_memo(int n) {
// are you sure you don't want to check for n < 0?
// trivial solutions, don't even bother diving into the calculation
if (n < 2)
return big_t(n);
static auto vec = std::vector<big_t>{0, 1}; // start with fib(0) and fib(1)
// note that if you really want, you could start with more than just
// 0 and 1 - after testing your real-world use case, you could determine
// that you will need, say, up to fib(8), so you could precompute that
// much and put it in the vector initializer list above
//
// if you make the non-memoized fib function constexpr, then you could
// even be *really* clever, and do something like:
// static auto vec = std::vector{fib(0), fib(1), fib(2), fib(3), ...
// for as many precomputed values as you like.
//
// and then you could be *REALLY* clever (if you’re using a more modern
// version of c++), and do something like:
// static auto vec = [](auto initial_size) {
// auto v = std::vector<big_t>(std::max(2, initial_size + 1));
// v[0] = 0;
// v[1] = 1;
//
// for (auto p = std::next(v.begin(), 2); p != v.end(); ++p)
// *p = *(p - 1) + *(p - 2);
//
// return v;
// }(4); // 4 to calculate fib(0) through fib(4), but you can use any number
// but that's getting *really* fancy.
// memoized, so use cached value
if (vec.size() > n)
return vec[n];
// not memoized, so calculate all the values between what's been cached,
// and what's desired
auto old_size = vec.size();
vec.resize(n + 1);
// now fill in everything from vec[old_size] to vec[n] inclusive,
// where vec[i] = vec[i - 1] + vec[i - 2]
return vec[n];
}
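In case it helps, here's what that sketch might look like with the fill-in loop actually written out. The loop, the cast that silences the signed/unsigned comparison warning, and the tiny main() are the only things I've added; everything else just follows the skeleton above (minus the long comments about pre-seeding):

#include <iostream>
#include <vector>

using big_t = unsigned long long int;

auto fib_with_memo(int n) {
    // trivial solutions, don't even bother diving into the calculation
    if (n < 2)
        return big_t(n);

    static auto vec = std::vector<big_t>{0, 1}; // start with fib(0) and fib(1)

    // memoized, so use cached value
    if (vec.size() > static_cast<std::size_t>(n))
        return vec[n];

    // not memoized, so calculate all the values between what's been cached
    // and what's desired
    auto old_size = vec.size();
    vec.resize(n + 1);
    for (auto i = old_size; i < vec.size(); ++i)
        vec[i] = vec[i - 1] + vec[i - 2];

    return vec[n];
}

int main() {
    std::cout << fib_with_memo(50) << '\n'; // 12586269025
}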
Obviously I haven’t profiled this design, but I’d bet it is hundreds, if not thousands of times faster than a similar design using a hash map.