I want to implement a multithreaded LRU cache in C++. Here is the code. Can you tell me what I should do to make it safer for multithreaded use?
    #include <iostream>
    #include <unordered_map>
    #include <thread>
    #include <chrono>
    #include <mutex>
    using namespace std;

    class LRUCache {
    public:
        class Node {
        public:
            int key, val;
            Node *left, *right;
            Node(int _key, int _val) : key(_key), val(_val), left(NULL), right(NULL) {}
        } *L, *R;
        unordered_map<int, Node*> m;
        int n;
        mutex mtx;
        condition_variable cv;
        void insert(Node* node) {
            Node* next = L->right;
            L->right = node;
            node->right = next;
            node->left = L;
            next->left = node;
        }
        void erase(Node* node) {
            node->left->right = node->right;
            node->right->left = node->left;
        }
        LRUCache(int capacity) {
            n = capacity;
            L = new Node(-1, -1), R = new Node(-1, -1);
            L->right = R;
            R->left = L;
        }
        int get(int key) {
            mtx.lock();
            if (m.count(key) == 0) { mtx.unlock(); return -1; }
            Node* node = m[key];
            erase(node);
            insert(node);
            cout << "get:" << node->val << endl;
            mtx.unlock();
            return node->val;
        }
        void put(int key, int value) {
            mtx.lock();
            if (m.count(key)) {
                m[key]->val = value;
                erase(m[key]);
                insert(m[key]);
            } else {
                Node* node = new Node(key, value);
                if (m.size() == n) {
                    Node* tmp = R->left;
                    erase(tmp);
                    m.erase(tmp->key);
                }
                insert(node);
                m[key] = node;
            }
            cout << "put:" << value << endl;
            mtx.unlock();
        }
        void putcache() {
            for (int i = 1; i <= 10; i++) {
                put(i, i + 10);
                this_thread::sleep_for(std::chrono::seconds(1));
            }
        }
        void getcache() {
            for (int i = 1; i <= 10; i++) {
                get(i);
                this_thread::sleep_for(std::chrono::seconds(1));
            }
        }
    };

    int main() {
        LRUCache c1(100);
        std::thread t1(&LRUCache::putcache, &c1);
        std::thread t2(&LRUCache::getcache, &c1);
        t1.join();
        t2.join();
    }
2 Answers
Don't `using namespace std;`, especially in headers - that brings every identifier from `std` into the global namespace, from every header that's included (and that set of identifiers is subject to change with future standards). We have namespaces for a reason - let's use them!
This program doesn't compile, because `condition_variable` is not defined. If that's intended to be `std::condition_variable`, then we need to include `<condition_variable>` before using it.
It doesn't make sense to have all the members declared `public`. That means we have no control over the class's invariants, and any of its state can be modified unsafely.
Why are we using `std::cout` in member functions `get()` and `put()`? I wouldn't want my program's standard output stream corrupted like that (to be honest, I wouldn't want such tracing even on `std::clog` unless specifically requested). And don't flush the stream (with `std::endl`) when that's not necessary - just use a plain newline.
There's a risk of a memory leak here:

    L = new Node(-1, -1), R = new Node(-1, -1);

If the creation of the second `Node` fails, then we lose the one and only reference to the first (i.e. `L`), and are unable to `delete` it.
We have an even bigger memory leak, since no `Node` ever gets deleted at all.
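One way to plug that leak, while keeping the original raw-pointer design, is a destructor that walks the sentinel list and deletes every node (sentinels included). This is only a minimal sketch, assuming the same `Node`, `L` and `R` layout as the original; the `live` counter is instrumentation added here purely to make the (non-)leak observable, and is not part of the suggested design:

    #include <unordered_map>

    class LRUCache {
    public:
        struct Node {
            inline static int live = 0;   // instrumentation only: count of allocated nodes
            int key, val;
            Node *left, *right;
            Node(int k, int v) : key(k), val(v), left(nullptr), right(nullptr) { ++live; }
            ~Node() { --live; }
        };
        Node *L, *R;
        std::unordered_map<int, Node*> m;

        LRUCache(int) : L(new Node(-1, -1)), R(new Node(-1, -1)) {
            L->right = R;
            R->left = L;
        }
        ~LRUCache() {
            // Walk from the left sentinel to the end, deleting as we go;
            // this frees every list node, including both sentinels.
            Node* cur = L;
            while (cur != nullptr) {
                Node* next = cur->right;
                delete cur;
                cur = next;
            }
        }
    };

Note this only fixes the "never deleted" leak; the construction-time leak mentioned earlier is better addressed with smart pointers such as `std::unique_ptr`, or by avoiding manual allocation altogether as shown further down.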
Why do we have arbitrary sleeps in the `putcache()` and `getcache()` member functions? I don't see any benefit to delaying these operations.
Instead of manually locking and unlocking the mutex `mtx`, prefer `std::lock_guard`, which will unlock the mutex on all return paths, including when an exception is thrown.
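For illustration, here is the pattern on a deliberately simplified map-only cache (the LRU list handling is omitted so the sketch stays short); note that the early-return path no longer needs its own unlock:

    #include <mutex>
    #include <unordered_map>

    class Cache {
        std::unordered_map<int, int> m;
        std::mutex mtx;
    public:
        int get(int key) {
            std::lock_guard<std::mutex> lock(mtx);  // unlocked on every return path
            auto it = m.find(key);
            if (it == m.end()) return -1;           // no manual unlock needed here
            return it->second;
        }
        void put(int key, int value) {
            std::lock_guard<std::mutex> lock(mtx);
            m[key] = value;
        }
    };

If an operation can throw between lock and unlock (as `m[key]` can, on allocation failure), the guard still releases the mutex, which the manual `lock()`/`unlock()` pairs in the original do not.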
`Node` requires (possibly deleted) copy and assignment operators, as copies shouldn't be sharing the pointers `left` and `right`. Similarly, `LRUCache` needs those members, due to its pointers `L` and `R`. Initialise its members, rather than letting them default-initialise and then overwriting them in the constructor.
Make more use of the standard library

You are leveraging `std::unordered_map` to provide \$O(1)\$ lookups, but you have manually implemented a linked list to keep track of which nodes were most recently used. You can use `std::list` to do this for you. In fact, the list will store both the keys and values, while the `std::unordered_map` will map keys to entries in the list. You can make use of the guarantee that list iterators will not be invalidated, even if elements in a list are added or (re)moved. The `splice()` function can be used to move an element to the front.
    #include <cstddef>
    #include <list>
    #include <mutex>
    #include <unordered_map>
    #include <utility>

    class LRUCache {
        std::list<std::pair<int, int>> nodes;
        using node_iterator = decltype(nodes)::iterator;
        std::unordered_map<int, node_iterator> index;
        std::mutex mtx;
        std::size_t n;

        void update(node_iterator node_it) {
            nodes.splice(nodes.begin(), nodes, node_it);
        }

    public:
        LRUCache(std::size_t n): n(n) {}

        void put(int key, int value) {
            std::lock_guard lock(mtx);
            if (auto it = index.find(key); it != index.end()) {
                auto node_it = it->second;
                node_it->second = value;
                update(node_it);
            } else {
                nodes.emplace_front(key, value);
                index[key] = nodes.begin();
                if (nodes.size() > n) {
                    index.erase(nodes.back().first);
                    nodes.pop_back();
                }
            }
        }

        int get(int key) {
            std::lock_guard lock(mtx);
            if (auto it = index.find(key); it != index.end()) {
                update(it->second);
                return it->second->second;
            } else {
                return -1;
            }
        }
    };
There are other variations possible, see for example this StackOverflow question.
Ambiguous return value

There is a problem with `get()`: if it returns -1, it is not possible to tell whether this was because the key was not found, or if the key was found but the value was set to -1. Consider adding another way to report whether or not the key was found. A good way in C++17 or later is to use `std::optional` for the return value.
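The idea, shown here on a deliberately simplified map-only cache (the LRU list handling is omitted so the return-type change stays in focus): `std::nullopt` means "not found", and any `int` value, including -1, is an unambiguous hit.

    #include <mutex>
    #include <optional>
    #include <unordered_map>

    class Cache {
        std::unordered_map<int, int> m;
        std::mutex mtx;
    public:
        void put(int key, int value) {
            std::lock_guard<std::mutex> lock(mtx);
            m[key] = value;
        }
        // std::nullopt means "not found"; any stored int, including -1, is a real hit.
        std::optional<int> get(int key) {
            std::lock_guard<std::mutex> lock(mtx);
            auto it = m.find(key);
            if (it == m.end()) return std::nullopt;
            return it->second;
        }
    };

Callers then test `if (auto v = cache.get(k)) use(*v);` rather than comparing against a sentinel value.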
Make it a template

Your cache is hardcoded to use `int` for both keys and values. What if you want to have `std::string` keys and `float` values? You can easily make your cache work with any type of key and value by making it a template:
    template<typename Key, typename Value>
    class LRUCache {
        std::list<std::pair<Key, Value>> nodes;
        ...
    public:
        void put(const Key& key, const Value& value) {
            ...
        }
        std::optional<Value> get(const Key& key) {
            ...
        }
    };
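Filling in the elided bodies with the `std::list` implementation shown earlier (and adopting the `std::optional` return), a complete version might look like this; it is a sketch derived from the listings above rather than a tested production implementation:

    #include <cstddef>
    #include <list>
    #include <mutex>
    #include <optional>
    #include <string>
    #include <unordered_map>
    #include <utility>

    template<typename Key, typename Value>
    class LRUCache {
        std::list<std::pair<Key, Value>> nodes;            // front = most recently used
        using node_iterator = typename decltype(nodes)::iterator;
        std::unordered_map<Key, node_iterator> index;
        std::mutex mtx;
        std::size_t n;

        void update(node_iterator node_it) {
            nodes.splice(nodes.begin(), nodes, node_it);   // move to front, O(1)
        }

    public:
        explicit LRUCache(std::size_t n) : n(n) {}

        void put(const Key& key, const Value& value) {
            std::lock_guard<std::mutex> lock(mtx);
            if (auto it = index.find(key); it != index.end()) {
                it->second->second = value;
                update(it->second);
            } else {
                nodes.emplace_front(key, value);
                index[key] = nodes.begin();
                if (nodes.size() > n) {                    // evict least recently used
                    index.erase(nodes.back().first);
                    nodes.pop_back();
                }
            }
        }

        std::optional<Value> get(const Key& key) {
            std::lock_guard<std::mutex> lock(mtx);
            if (auto it = index.find(key); it != index.end()) {
                update(it->second);
                return it->second->second;
            }
            return std::nullopt;
        }
    };

With this in place, an `LRUCache<std::string, int>` or `LRUCache<std::string, float>` works with no further changes.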