What do you think of this implementation of an LRU cache? I'm trying to migrate to modern C++, so it would be fantastic if you could give me any tips here. I've heard it's not desirable to use raw pointers, right? What about using them only internally, as in the example?
Also, since I can't return a nullptr value from the get() method, what is the best alternative? A default value, as in the example?
#ifndef __LRU_HPP__
#define __LRU_HPP__
#include <map>
#include<iostream>
#include<assert.h>
template<class T>
struct Default {
T value;
Default():value(){}
};
template<class K, class V>
struct node {
K key;
V value;
node<K,V>* next;
node<K,V>* prev;
explicit node(K key, V value);
};
template<class K, class V>
class lru {
private:
node<K,V>* head;
node<K,V>* tail;
std::map<K,node<K,V>*> map;
std::size_t capacity;
void replace(node<K,V>*);
public:
explicit lru(std::size_t);
void put(K key, V value);
V get(K key);
};
template<class K, class V>
inline node<K,V>::node(K key, V value) {
this->key = key;
this->value = value;
}
template<class K, class V>
inline lru<K,V>::lru(std::size_t capacity) {
assert(capacity > 1);
this->capacity = capacity;
this->head = tail = nullptr;
}
template<class K, class V>
void lru<K,V>::put(K key, V value) {
auto it = map.find(key);
if (it != map.end()) {
node<K,V>* temp = it->second;
temp->value = value;
replace(temp);
return;
}
if(capacity == map.size()) {
tail->prev->next = tail->next;
node<K,V>* target = tail;
map.erase(target->key);
tail = tail->prev;
delete tail;
}
node<K,V>* temp = new node<K,V>(key, value);
temp->prev = nullptr;
temp->next = head;
if(head != nullptr)
head->prev = temp;
else
tail = temp;
head = temp;
map.insert(std::pair<K,node<K,V>*> {key, temp});
}
template<class K, class V>
V lru<K,V>::get(K key) {
auto it = map.find(key);
if(it != map.end()) {
replace(it->second);
return it->second->value;
}
return Default<V>().value;
}
template<class K, class V>
void lru<K,V>::replace(node<K,V>* temp) {
if (temp->prev != nullptr)
temp->prev->next = temp->next;
if (temp->next != nullptr)
temp->next->prev = temp->prev;
if (tail == temp)
tail = temp->prev != nullptr ? temp->prev : temp;
if (head != temp) {
head->prev = temp;
temp->next = head;
head = temp;
}
}
#endif
- Two underscores are reserved anywhere for the implementation. Your program technically has undefined behavior because of your include guard. – Ben Steffan, Mar 22, 2018
- I have rolled back the last edit. Please see What to do when someone answers. – Zeta, Mar 23, 2018
4 Answers
Namespace Usage
I'd put all of this into some namespace, so it won't accidentally collide with other usage of the names it defines.
I'd probably also work even a little harder to do something to hide your node and Default templates. For example, node doesn't need to be visible to the outside world, so it would probably be better off defined inside of lru.
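As a rough sketch of both points (the namespace name cache is just a placeholder I'm using for illustration, and the nested node is shown as a plain aggregate rather than the original constructor-based version):
#include <cstddef>
#include <map>
namespace cache {
template <class K, class V>
class lru {
    // node is an implementation detail, so nesting it inside lru keeps it
    // out of the enclosing namespace entirely.
    struct node {
        K key;
        V value;
        node* next = nullptr;
        node* prev = nullptr;
    };
    node* head = nullptr;
    node* tail = nullptr;
    std::map<K, node*> map;
    std::size_t capacity;
public:
    explicit lru(std::size_t capacity) : capacity(capacity) {}
    void put(K key, V value);
    V get(K key);
};
} // namespace cache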
Prefer Initialization to Assignment
A constructor like this:
template<class K, class V>
inline node<K, V>::node(K key, V value) {
this->key = key;
this->value = value;
}
...is typically better written something like this instead:
template <class K, class V>
inline node<K, V>::node(K key, V value)
: key(key)
, value(value)
{}
In some cases (e.g., initializing a const member), you must use initialization (like this) instead of assignment. In this case, it's not mandatory, but I still consider it preferable.
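For a concrete (made-up) illustration of the mandatory case, a const member can only be set from the initializer list:
struct config {
    const int limit;  // const member: cannot be assigned inside the body
    explicit config(int limit) : limit(limit) {}  // must use initialization
};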
Consider passing by reference
Passing your keys and values by value generally requires copying. You might want to consider whether it's worth passing by reference instead. This generally accomplishes little (or nothing) if the type being passed is "small", but can improve speed substantially if the type is "large". In this case, "small" generally means something that will fit in a single machine register, so "large" is anything that won't. E.g., a char, short, or int would usually be considered "small", but a std::string would usually be thought of as "large".
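A minimal sketch of the changed declarations (only the signatures; the rest of lru would stay as in the question):
template <class K, class V>
class lru {
public:
    // const references avoid copying "large" key/value types such as
    // std::string; for register-sized types like int it makes no difference.
    void put(const K& key, const V& value);
    V get(const K& key);
    // ...rest of the interface as in the question
};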
Encapsulate knowledge of a type's internals in that type
Right now, your lru class "knows" (depends upon) the internals of the node class. I'd generally prefer if only node knew about its internals. For one possibility, you might pass a next and prev to the ctor, and let it put those values into the node itself:
// in the class definition (defaults for a class template's member must go here):
node(K key, V value, node* prev = nullptr, node* next = nullptr);

// out-of-class definition:
template <class K, class V>
inline node<K, V>::node(K key, V value, node *prev, node *next)
    : key(key)
    , value(value)
    , prev(prev)
    , next(next)
{}
In this case, your code like this:
node<K, V>* temp = new node<K, V>(key, value);
temp->prev = nullptr;
temp->next = head;
...would become something like:
node<K, V> *temp = new node<K, V>(key, value, nullptr, head);
Consider other Data Structures
Although a doubly-linked list does work for the task at hand, it's often fairly wasteful, using a couple of pointers per node in addition to the data you actually care about storing.
I'd consider using a singly linked list instead. Although it may not immediately be obvious how you can do this, I'm reasonably certain you can.
Consider pre-allocating your storage
Given that the total maximum number of objects is fixed when you create the cache, I'd also consider pre-allocating storage for both the objects and the linked-list nodes when you create the cache. An allocator for a fixed-size set of fixed-size blocks can be really trivial, which can improve speed considerably over using a couple of allocations on the free store for every item you insert in the cache.
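One way to sketch that idea (node_pool, acquire, and release are names I'm inventing for illustration; the sketch also assumes the node type is default-constructible, so the question's node would need a default constructor, or the pool would need to manage uninitialized storage instead):
#include <cstddef>
#include <vector>
// A trivial fixed-capacity pool: everything is allocated once up front, and
// acquire()/release() just move pointers on and off a free list, so no
// per-insertion heap allocation is needed.
template <class Node>
class node_pool {
    std::vector<Node> storage;    // the pre-allocated blocks
    std::vector<Node*> free_list; // blocks currently available for reuse
public:
    explicit node_pool(std::size_t capacity) : storage(capacity) {
        free_list.reserve(capacity);
        for (auto& n : storage)
            free_list.push_back(&n);
    }
    Node* acquire() {             // returns nullptr when the pool is exhausted
        if (free_list.empty())
            return nullptr;
        Node* n = free_list.back();
        free_list.pop_back();
        return n;
    }
    void release(Node* n) { free_list.push_back(n); }
};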
- I am not sure I understand how a circular buffer may be of help here. An entire point is to not change the locations of the values, that is, keep them consistent with the pointers in the map, while changing their order. – vnp, Mar 22, 2018
- For the node constructor, I recommend passing by value, as we store the arguments: node(K key, V value) : key{std::move(key)}, value{std::move(value)} {}. That works well with glvalue and rvalue arguments quite happily, and favours moving over copying. – Toby Speight, Mar 22, 2018
- @vnp: Hmmm... I'm not sure what I was thinking, but I think you're right. – Jerry Coffin, Mar 22, 2018
- Awesome review! Though I'm not sure how I can implement this with a singly linked list. The problem here is that I need to navigate back when I remove the last element. In my case, I can just use tail = tail->prev. Also, the idea of pre-allocating my storage sounds good, I should definitely add it. +1 – nullbyte, Mar 23, 2018
Header guards
Your code contains undefined behaviour since the underscores are reserved for the implementation:
17.6.4.3 Reserved names [reserved.names]
The C++ standard library reserves the following kinds of names:
- macros
- global names
- names with external linkage
If a program declares or defines a name in a context where it is reserved, other than as explicitly allowed by this Clause, its behavior is undefined.
[...]
17.6.4.3.2 Global names [global.names]
Certain sets of names and function signatures are always reserved to the implementation:
- Each name that contains a double underscore __ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.
- Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
So rename __LRU_HPP__ to LEAST_RECENTLY_USED_HPP or similar.
Default
Instead of
return Default<V>().value;
use
return V{};
Unless you want to use Default<V> to provide specializations.
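For example (a hypothetical use, not something in the question's code), a specialization could supply a sentinel value for a particular type:
#include <string>
template <class T>
struct Default {
    T value;
    Default() : value() {}
};
// Hypothetical specialization: report a recognisable sentinel for strings
// instead of an empty string.
template <>
struct Default<std::string> {
    std::string value;
    Default() : value("<missing>") {}
};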
Possible bug
I'm rather sure you meant delete target, not delete tail.
if(capacity == map.size()) {
tail->prev->next = tail->next;
node<K,V>* target = tail;
map.erase(target->key);
tail = tail->prev;
delete tail; // << ?
}
Otherwise you end up with a deleted, but not nulled, tail.
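A minimal corrected version of that block might read (one possible fix, relying on the capacity > 1 assertion so that tail->prev always exists here):
if (capacity == map.size()) {
    node<K,V>* target = tail;   // the least-recently-used node
    tail->prev->next = nullptr; // detach it from the list
    tail = tail->prev;          // its predecessor becomes the new tail
    map.erase(target->key);
    delete target;              // delete the detached node, not the new tail
}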
Call-by-reference
Use const & for arguments if you're not going to change them afterwards.
- There is no reason for node to know the key.
- The node constructor shall make a completely created node, that is, the prev and next fields should be initialized as well (to nullptr).
- Member initializer lists are preferred to constructor bodies. For example,

  lru<K,V>::lru(std::size_t capacity)
      : capacity(capacity)
      , head(nullptr)
      , tail(nullptr)
  {
      assert(capacity > 1);
  }

- Returning a Default<V>().value does not fit the purpose. At least it means that the client cannot put such a value. An idiomatic (i.e. STL) way is to make get return an iterator, with failure being signaled by returning the end iterator.
- The sequence in put:

  tail->prev->next = tail->next;
  ....
  tail = tail->prev;
  delete tail;

  is a definite bug, and has many grave consequences. Consider a rewrite:

  ....
  tail = tail->prev;
  delete tail->next;
  tail->next = nullptr;

- The reviewer immediately notices another "bug" in put - namely that the head is not updated - and it takes time and mental concentration to recall the capacity > 1 assertion, which makes the bug immaterial. I strongly recommend factoring the tail removal into a function with a name that reflects this fact. I suggest pop_back_unguarded or something along those lines (see the sketch after this list).
- I don't think replace reflects what the function actually does. bump_up, perhaps?
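A minimal sketch of such a helper (pop_back_unguarded would need to be declared as a private member of lru; the body below is my own illustration and assumes, as the assertion guarantees, that the list holds at least two nodes whenever the cache is full):
template <class K, class V>
void lru<K,V>::pop_back_unguarded() {
    // Precondition (hence "unguarded"): the list has at least two nodes.
    node<K,V>* target = tail;
    map.erase(target->key);
    tail = tail->prev;
    tail->next = nullptr;
    delete target;
}
The eviction in put then shrinks to:
if (capacity == map.size())
    pop_back_unguarded();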
- I use node->key (for the last element in the list) to remove it from the map object. I've decided to return V{} as the default value. I think iterator should be used for sequences only, right? Though I'm not sure how it's done in C++. Nice review +1, thanks – nullbyte, Mar 23, 2018
Expectations
When reviewing data structure objects, it's useful if the unit tests are presented alongside the code. At the very least, a simple test program such as this can help clarify expectations:
#include <string>
int main()
{
auto cache = lru<int,std::string>{2};
cache.put(1, "one");
assert(cache.get(1) == "one");
cache.put(2, "two");
assert(cache.get(1) == "one");
assert(cache.get(2) == "two");
cache.put(3, "three"); // should replace 1
assert(cache.get(1) == "");
assert(cache.get(2) == "two"); // FAIL!?
assert(cache.get(3) == "three");
}
Unfortunately, even this simple test fails at the penultimate assertion, so clearly I've misunderstood the intent.
If we comment out that assertion, then the program completes, but does expose leaked memory when run under Valgrind. So there's work required there to match allocations and deletions.
Detailed review
Don't use reserved names. __LRU_HPP__ is not available for user code as it contains two consecutive underscores.
Prefer C++ header files - specifically, <cassert> rather than <assert.h>.
We use std::pair but fail to include <utility>. Whilst many implementations bring this in transitively due to <map>, that's not required, so we should include it.
The Default class doesn't seem to add value. Looking ahead to the single place we use it, the only thing we do is immediately extract its value member. I'm not sure what it's doing here, really.
Since node is used only by the private interface of lru, it doesn't need to be exposed to the public. It seems to be used only as an implementation of a doubly-linked list - and we have std::list for that, so perhaps it would be better to use one of those rather than implementing our own.
It's good to see correct use of explicit for the constructor, so that we don't accidentally create a conversion for values.
put() and get() both take their arguments by value, which likely results in unnecessary copying. I see no reason not to accept const references instead.
get() returns a V, which is problematic on two fronts. Firstly, it limits V to default-constructible types. Secondly, it means there's no way to distinguish an absent value from one that happens to equal V{}. Consider returning a pointer or a std::optional (according to the lifetime guarantees your users expect).
I don't see the part in get() where we update the key's most-recently-used status. Without that, the eviction policy is not LRU, but more like least-recently-modified. Actually, it's worse - the first if statement in put means that it's least-recently-created.
Improved code
See what happens if we use std::list instead of implementing our own linked-list logic:
#ifndef LRU_HPP_
#define LRU_HPP_
#include <map>
#include <list>
#include <optional>
#include <utility>
template<class K, class V>
class lru
{
// The map points to list iterators
// List is stored most-recent first
using List = std::list<std::pair<K,V>>;
using Map = std::map<K,typename List::iterator>;
std::size_t capacity;
Map map = {};
mutable List list = {};
public:
explicit lru(std::size_t capacity)
: capacity{capacity}
{}
void put(K const& key, V const& value);
std::optional<V> get(K const& key) const;
};
template<class K, class V>
void lru<K,V>::put(K const& key, V const& value)
{
if (auto map_it = map.find(key); map_it != map.end()) {
auto list_it = map_it->second;
list_it->second = value;
// move it to the front
list.splice(list.begin(), list, list_it);
return;
}
if (map.size() == capacity) {
map.erase(list.back().first);
list.pop_back();
}
list.emplace_front(key, value);
map.emplace(key, list.begin());
}
template<class K, class V>
std::optional<V> lru<K,V>::get(const K& key) const
{
if (auto map_it = map.find(key); map_it != map.end()) {
auto list_it = map_it->second;
// move it to the front
list.splice(list.begin(), list, list_it);
return list_it->second;
}
return std::nullopt;
}
#endif
No raw new/delete, no tricky pointer manipulations, and much shorter and simpler.
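With the std::optional return, a caller checks for a hit before using the value. A small usage sketch (my own adaptation of the earlier test, assuming the improved header is saved as, say, lru.hpp):
#include <cassert>
#include <string>
#include "lru.hpp" // hypothetical file name for the improved header above
int main()
{
    auto cache = lru<int, std::string>{2};
    cache.put(1, "one");
    cache.put(2, "two");
    cache.put(3, "three");             // evicts 1, the least recently used

    assert(!cache.get(1).has_value()); // absent keys yield an empty optional
    assert(cache.get(2).value() == "two");
    assert(cache.get(3).value() == "three");
}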