What do you think of this implementation of an LRU cache? I'm trying to migrate to modern C++, so it would be fantastic if you could give me any tips here. I've heard it's not desirable to use raw pointers, right? What about using them only internally, as in the example?
Also, since I can't return a nullptr value from the get() method, what is the best alternative? A default value, as in the example?
#ifndef __LRU_HPP__
#define __LRU_HPP__
#include <map>
#include<iostream>
#include<assert.h>
template<class T>
struct Default {
T value;
Default():value(){}
};
template<class K, class V>
struct node {
K key;
V value;
node<K,V>* next;
node<K,V>* prev;
explicit node(K key, V value);
};
template<class K, class V>
class lru {
private:
node<K,V>* head;
node<K,V>* tail;
std::map<K,node<K,V>*> map;
std::size_t capacity;
void replace(node<K,V>*);
public:
explicit lru(std::size_t);
void put(K key, V value);
V get(K key);
};
template<class K, class V>
inline node<K,V>::node(K key, V value) {
this->key = key;
this->value = value;
}
template<class K, class V>
inline lru<K,V>::lru(std::size_t capacity) {
assert(capacity > 1);
this->capacity = capacity;
this->head = tail = nullptr;
}
template<class K, class V>
void lru<K,V>::put(K key, V value) {
auto it = map.find(key);
if (it != map.end()) {
node<K,V>* temp = it->second;
temp->value = value;
replace(temp);
return;
}
if(capacity == map.size()) {
tail->prev->next = tail->next;
node<K,V>* target = tail;
map.erase(target->key);
tail = tail->prev;
delete tail;
}
node<K,V>* temp = new node<K,V>(key, value);
temp->prev = nullptr;
temp->next = head;
if(head != nullptr)
head->prev = temp;
else
tail = temp;
head = temp;
map.insert(std::pair<K,node<K,V>*> {key, temp});
}
template<class K, class V>
V lru<K,V>::get(K key) {
auto it = map.find(key);
if(it != map.end()) {
replace(it->second);
return it->second->value;
}
return Default<V>().value;
}
template<class K, class V>
void lru<K,V>::replace(node<K,V>* temp) {
if (temp->prev != nullptr)
temp->prev->next = temp->next;
if (temp->next != nullptr)
temp->next->prev = temp->prev;
if (tail == temp)
tail = temp->prev != nullptr ? temp->prev : temp;
if (head != temp) {
head->prev = temp;
temp->next = head;
head = temp;
}
}
#endif
- Two underscores are reserved anywhere for the implementation. Your program technically has undefined behavior because of your include guard. – Ben Steffan, Mar 22, 2018
- I have rolled back the last edit. Please see What to do when someone answers. – Zeta, Mar 23, 2018
4 Answers
Namespace Usage
I'd put all of this into some namespace, so it won't accidentally collide with other usage of the names it defines.
I'd probably also work even a little harder to do something to hide your node and Default templates. For example, node doesn't need to be visible to the outside world, so it would probably be better off defined inside of lru.
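As a rough sketch of both points (the namespace name cache is just a placeholder I'm using for illustration, and the nested node is shown as a plain aggregate rather than the original constructor-based version):
#include <cstddef>
#include <map>
namespace cache {
template <class K, class V>
class lru {
    // node is an implementation detail, so nesting it inside lru keeps it
    // out of the enclosing namespace entirely.
    struct node {
        K key;
        V value;
        node* next = nullptr;
        node* prev = nullptr;
    };
    node* head = nullptr;
    node* tail = nullptr;
    std::map<K, node*> map;
    std::size_t capacity;
public:
    explicit lru(std::size_t capacity) : capacity(capacity) {}
    void put(K key, V value);
    V get(K key);
};
} // namespace cache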
Prefer Initialization to Assignment
A constructor like this:
template<class K, class V>
inline node<K, V>::node(K key, V value) {
this->key = key;
this->value = value;
}
...is typically better written something like this instead:
template <class K, class V>
inline node<K, V>::node(K key, V value)
: key(key)
, value(value)
{}
In some cases (e.g., initializing a const member), you must use initialization (like this) instead of assignment. In this case, it's not mandatory, but I still consider it preferable.
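For a concrete (made-up) illustration of the mandatory case, a const member can only be set from the initializer list:
struct config {
    const int limit;  // const member: cannot be assigned inside the body
    explicit config(int limit) : limit(limit) {}  // must use initialization
};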
Consider passing by reference
Passing your keys and values by value generally requires copying. You might want to consider whether it's worth passing by reference instead. This generally accomplishes little (or nothing) if the type being passed is "small", but can improve speed substantially if the type is "large". In this case, "small" generally means something that will fit in a single machine register, so "large" is anything that won't. E.g., a char, short, or int would usually be considered "small", but a std::string would usually be thought of as "large".
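A minimal sketch of the changed declarations (only the signatures; the rest of lru would stay as in the question):
template <class K, class V>
class lru {
public:
    // const references avoid copying "large" key/value types such as
    // std::string; for register-sized types like int it makes no difference.
    void put(const K& key, const V& value);
    V get(const K& key);
    // ...rest of the interface as in the question
};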
Encapsulate knowledge of a type's internals in that type
Right now, your lru class "knows" (depends upon) the internals of the node class. I'd generally prefer if only node knew about its internals. For one possibility, you might pass a next and prev to the ctor, and let it put those values into the node itself:
// in the class definition (defaults for a class template's member must go here):
node(K key, V value, node* prev = nullptr, node* next = nullptr);

// out-of-class definition:
template <class K, class V>
inline node<K, V>::node(K key, V value, node *prev, node *next)
    : key(key)
    , value(value)
    , prev(prev)
    , next(next)
{}
In this case, your code like this:
node<K, V>* temp = new node<K, V>(key, value);
temp->prev = nullptr;
temp->next = head;
...would become something like:
node<K, V> *temp = new node<K, V>(key, value, nullptr, head);
Consider other Data Structures
Although a doubly-linked list does work for the task at hand, it's often fairly wasteful, using a couple of pointers per node in addition to the data you actually care about storing.
I'd consider using a singly linked list instead. Although it may not immediately be obvious how you can do this, I'm reasonably certain you can.
Consider pre-allocating your storage
Given that the total maximum number of objects is fixed when you create the cache, I'd also consider pre-allocating storage for both the objects and the linked-list nodes when you create the cache. An allocator for a fixed-size set of fixed-size blocks can be really trivial, which can improve speed considerably over using a couple of allocations on the free store for every item you insert in the cache.
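One way to sketch that idea (node_pool, acquire, and release are names I'm inventing for illustration; the sketch also assumes the node type is default-constructible, so the question's node would need a default constructor, or the pool would need to manage uninitialized storage instead):
#include <cstddef>
#include <vector>
// A trivial fixed-capacity pool: everything is allocated once up front, and
// acquire()/release() just move pointers on and off a free list, so no
// per-insertion heap allocation is needed.
template <class Node>
class node_pool {
    std::vector<Node> storage;    // the pre-allocated blocks
    std::vector<Node*> free_list; // blocks currently available for reuse
public:
    explicit node_pool(std::size_t capacity) : storage(capacity) {
        free_list.reserve(capacity);
        for (auto& n : storage)
            free_list.push_back(&n);
    }
    Node* acquire() {             // returns nullptr when the pool is exhausted
        if (free_list.empty())
            return nullptr;
        Node* n = free_list.back();
        free_list.pop_back();
        return n;
    }
    void release(Node* n) { free_list.push_back(n); }
};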
- I am not sure I understand how a circular buffer may be of help here. An entire point is to not change the locations of the values, that is, keep them consistent with the pointers in the map, while changing their order. – vnp, Mar 22, 2018
- For the node constructor, I recommend passing by value, as we store the arguments: node(K key, V value) : key{std::move(key)}, value{std::move(value)} {}. That works well with glvalue and rvalue arguments quite happily, and favours moving over copying. – Toby Speight, Mar 22, 2018
- @vnp: Hmmm... I'm not sure what I was thinking, but I think you're right. – Jerry Coffin, Mar 22, 2018
- Awesome review! Though I'm not sure how I can implement this with a singly linked list. The problem here is that I need to navigate back when I remove the last element. In my case, I can just use tail = tail->prev. Also, the idea of pre-allocating my storage sounds good, I should definitely add it. +1 – nullbyte, Mar 23, 2018
Header guards
Your code contains undefined behaviour since the underscores are reserved for the implementation:
17.6.4.3 Reserved names [reserved.names]
The C++ standard library reserves the following kinds of names:
- macros
- global names
- names with external linkage
If a program declares or defines a name in a context where it is reserved, other than as explicitly allowed by this Clause, its behavior is undefined.
[...]
17.6.4.3.2 Global names [global.names]
Certain sets of names and function signatures are always reserved to the implementation:
- Each name that contains a double underscore __ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.
- Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
So rename __LRU_HPP__ to LEAST_RECENTLY_USED_HPP or similar.
Default
Instead of
return Default<V>().value;
use
return V{};
Unless you want to use Default<V> to provide specializations.
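For example (a hypothetical use, not something in the question's code), a specialization could supply a sentinel value for a particular type:
#include <string>
template <class T>
struct Default {
    T value;
    Default() : value() {}
};
// Hypothetical specialization: report a recognisable sentinel for strings
// instead of an empty string.
template <>
struct Default<std::string> {
    std::string value;
    Default() : value("<missing>") {}
};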
Possible bug
I'm rather sure you meant delete target, not delete tail.
if(capacity == map.size()) {
tail->prev->next = tail->next;
node<K,V>* target = tail;
map.erase(target->key);
tail = tail->prev;
delete tail; // << ?
}
Otherwise you end up with a deleted, but not nulled, tail.
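A minimal corrected version of that block might read (one possible fix, relying on the capacity > 1 assertion so that tail->prev always exists here):
if (capacity == map.size()) {
    node<K,V>* target = tail;   // the least-recently-used node
    tail->prev->next = nullptr; // detach it from the list
    tail = tail->prev;          // its predecessor becomes the new tail
    map.erase(target->key);
    delete target;              // delete the detached node, not the new tail
}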
Call-by-reference
Use const & for arguments if you're not going to change them afterwards.
- There is no reason for node to know the key.
- The node constructor shall make a completely created node, that is, the prev and next fields should be initialized as well (to nullptr).
- Member initializer lists are preferred to constructor bodies. For example,

  lru<K,V>::lru(std::size_t capacity)
      : capacity(capacity)
      , head(nullptr)
      , tail(nullptr)
  {
      assert(capacity > 1);
  }

- Returning a Default<V>().value does not fit the purpose. At least it means that the client cannot put such a value. An idiomatic (i.e. STL) way is to make get return an iterator, with failure being signaled by returning the end iterator.
- The sequence in put:

  tail->prev->next = tail->next;
  ....
  tail = tail->prev;
  delete tail;

  is a definite bug, and has many grave consequences. Consider a rewrite:

  ....
  tail = tail->prev;
  delete tail->next;
  tail->next = nullptr;

- The reviewer immediately notices another "bug" in put - namely that the head is not updated - and it takes time and mental concentration to recall the capacity > 1 assertion, which makes the bug immaterial. I strongly recommend factoring the tail removal into a function with a name that reflects this fact. I suggest pop_back_unguarded or something along those lines (see the sketch after this list).
- I don't think replace reflects what the function actually does. bump_up, perhaps?
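A minimal sketch of such a helper (pop_back_unguarded would need to be declared as a private member of lru; the body below is my own illustration and assumes, as the assertion guarantees, that the list holds at least two nodes whenever the cache is full):
template <class K, class V>
void lru<K,V>::pop_back_unguarded() {
    // Precondition (hence "unguarded"): the list has at least two nodes.
    node<K,V>* target = tail;
    map.erase(target->key);
    tail = tail->prev;
    tail->next = nullptr;
    delete target;
}
The eviction in put then shrinks to:
if (capacity == map.size())
    pop_back_unguarded();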
- I use node->key (for the last element in the list) to remove it from the map object. I've decided to return V{} as the default value. I think iterator should be used for sequences only, right? Though I'm not sure how it's done in C++. Nice review +1, thanks – nullbyte, Mar 23, 2018
Expectations
When reviewing data structure objects, it's useful if the unit tests are presented alongside the code. At the very least, a simple test program such as this can help clarify expectations:
#include <string>
int main()
{
auto cache = lru<int,std::string>{2};
cache.put(1, "one");
assert(cache.get(1) == "one");
cache.put(2, "two");
assert(cache.get(1) == "one");
assert(cache.get(2) == "two");
cache.put(3, "three"); // should replace 1
assert(cache.get(1) == "");
assert(cache.get(2) == "two"); // FAIL!?
assert(cache.get(3) == "three");
}
Unfortunately, even this simple test fails at the penultimate assertion, so clearly I've misunderstood the intent.
If we comment out that assertion, then the program completes, but does expose leaked memory when run under Valgrind. So there's work required there to match allocations and deletions.
Detailed review
Don't use reserved names. __LRU_HPP__ is not available for user code as it contains two consecutive underscores.
Prefer C++ header files - specifically, <cassert> rather than <assert.h>.
We use std::pair but fail to include <utility>. Whilst many implementations bring this in transitively due to <map>, that's not required, so we should include it.
The Default class doesn't seem to add value. Looking ahead to the single place we use it, the only thing we do is immediately extract its value member. I'm not sure what it's doing here, really.
Since node is used only by the private interface of lru, it doesn't need to be exposed to the public. It seems to be used only as an implementation of a doubly-linked list - and we have std::list for that, so perhaps it would be better to use one of those rather than implementing our own.
It's good to see correct use of explicit for the constructor, so that we don't accidentally create a conversion for values.
put() and get() both take their arguments by value, which likely results in unnecessary copying. I see no reason not to accept const references instead.
get() returns a V, which is problematic on two fronts. Firstly, it limits V to default-constructible types. Secondly, it means there's no way to distinguish an absent value from one that happens to equal V{}. Consider returning a pointer or a std::optional (according to the lifetime guarantees your users expect).
I don't see the part in get() where we update the key's most-recently-used status. Without that, the eviction policy is not LRU, but more like least-recently-modified. Actually, it's worse - the first if statement in put means that it's least-recently-created.
Improved code
See what happens if we use std::list instead of implementing our own linked-list logic:
#ifndef LRU_HPP_
#define LRU_HPP_
#include <map>
#include <list>
#include <optional>
#include <utility>
template<class K, class V>
class lru
{
// The map points to list iterators
// List is stored most-recent first
using List = std::list<std::pair<K,V>>;
using Map = std::map<K,typename List::iterator>;
std::size_t capacity;
Map map = {};
mutable List list = {};
public:
explicit lru(std::size_t capacity)
: capacity{capacity}
{}
void put(K const& key, V const& value);
std::optional<V> get(K const& key) const;
};
template<class K, class V>
void lru<K,V>::put(K const& key, V const& value)
{
if (auto map_it = map.find(key); map_it != map.end()) {
auto list_it = map_it->second;
list_it->second = value;
// move it to the front
list.splice(list.begin(), list, list_it);
return;
}
if (map.size() == capacity) {
map.erase(list.back().first);
list.pop_back();
}
list.emplace_front(key, value);
map.emplace(key, list.begin());
}
template<class K, class V>
std::optional<V> lru<K,V>::get(const K& key) const
{
if (auto map_it = map.find(key); map_it != map.end()) {
auto list_it = map_it->second;
// move it to the front
list.splice(list.begin(), list, list_it);
return list_it->second;
}
return std::nullopt;
}
#endif
No raw new/delete, no tricky pointer manipulations, and much shorter and simpler.
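With the std::optional return, a caller checks for a hit before using the value. A small usage sketch (my own adaptation of the earlier test, assuming the improved header is saved as, say, lru.hpp):
#include <cassert>
#include <string>
#include "lru.hpp" // hypothetical file name for the improved header above
int main()
{
    auto cache = lru<int, std::string>{2};
    cache.put(1, "one");
    cache.put(2, "two");
    cache.put(3, "three");             // evicts 1, the least recently used

    assert(!cache.get(1).has_value()); // absent keys yield an empty optional
    assert(cache.get(2).value() == "two");
    assert(cache.get(3).value() == "three");
}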