Fast pool allocator for games in C++

Question 1

I have implemented the following pool allocator in C++:

template <typename T>
struct pool {
private:
 struct node {
 node* next;
 T element;
 };
private:
 std::vector<node*> m_Chunks;
 node* m_Head = nullptr;
 uint64 m_MaxElements = 0;
 bool m_Resizable;
public:
 pool(pool const&) = delete;
 pool& operator=(pool const&) = delete;
 pool(uint64 nElems, bool resiz = false)
 : m_Resizable{ resiz } {
 m_Head = alloc_chunk(nElems);
 }
 pool(pool&& o) 
 : m_Chunks{ std::move(o.m_Chunks) }, m_Head{ o.m_Head },
 m_MaxElements{ o.m_MaxElements }, m_Resizable{ o.m_Resizable } {
 }
 pool& operator=(pool&& o) {
 for (auto n : m_Chunks) {
 std::free(n);
 }
 m_Chunks = std::move(o.m_Chunks);
 m_Head = o.m_Head;
 m_MaxElements = o.m_MaxElements;
 m_Resizable = o.m_Resizable;
 return *this;
 }
 ~pool() {
 for (auto n : m_Chunks) {
 std::free(n);
 }
 }
 operator bool() const {
 return m_Chunks.size();
 }
 T* alloc() {
 if (!m_Head) {
 if (m_Resizable) {
 m_Head = alloc_chunk(m_MaxElements);
 if (!m_Head) {
 return nullptr;
 }
 }
 else {
 return nullptr;
 }
 }
 auto h = m_Head;
 m_Head = m_Head->next;
 return &h->element;
 }
 void free(T* ptr) {
 if (!ptr) {
 return;
 }
 uint8* mem_raw = reinterpret_cast<uint8*>(ptr);
 mem_raw -= offsetof(node, element);
 node* mem_head = reinterpret_cast<node*>(mem_raw);
 mem_head->next = m_Head;
 m_Head = mem_head;
 }
private:
 node* alloc_chunk(uint64 num) {
 uint64 alloc_sz = sizeof(node) * num;
 node* mem = reinterpret_cast<node*>(std::malloc(alloc_sz));
 if (!mem) {
 return nullptr;
 }
 m_Chunks.push_back(mem);
 node* it = mem;
 for (uint64 i = 1; i < num; ++i, ++it) {
 it->next = it + 1;
 }
 it->next = nullptr;
 m_MaxElements += num;
 return mem;
 }
};

Is the implementation good/correct? Can I make the code higher quality somehow? I have written a small test that stress-tests it and it seems OK, performance is about 5 to 10 times better than default operator new.

Are there modern C++ elements I could use? I've learnt C++11 and 14 this year in college and I've tried to use my knowledge here. I know this should mean that I should make it exception-safe, but games usually not use exceptions (that's why I've included operator bool for minimal error checking) so I've decided not to.

Edit: The test code I used

template <typename FN>
void measure_exec(const char* name, FN f) {
 auto start = std::chrono::steady_clock::now();
 f();
 auto t = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - start);
 std::cout << name << " took " << t.count() << "ms." << std::endl;
}
struct message {
 int id;
 double timestamp;
};
int main(void) {
 std::srand(std::time(nullptr));
 std::vector<message*> control;
 std::vector<message*> test;
 measure_exec("Pool", [&]{
 mem::pool<message> pool{ 32, true };
 for (uint64 i = 0; i < 200000; ++i) {
 if (i % 15) {
 // Allocate
 int r_id = std::rand();
 double r_time = double(std::rand()) / std::rand();
 auto t = pool.alloc();
 t->id = r_id;
 t->timestamp = r_time;
 test.push_back(t);
 }
 else if (control.size()) {
 // Delete
 uint64 idx = std::rand() % control.size();
 test.erase(test.begin() + idx);
 }
 }
 });
 measure_exec("New", [&]{
 for (uint64 i = 0; i < 200000; ++i) {
 if (i % 15) {
 // Allocate
 int r_id = std::rand();
 double r_time = double(std::rand()) / std::rand();
 control.push_back(new message{ r_id, r_time });
 }
 else if (control.size()) {
 // Delete
 uint64 idx = std::rand() % control.size();
 control.erase(control.begin() + idx);
 }
 }
 });
 std::cin.get();
 return 0;
}

Note that I haven't really tested anything like this before, this testing method was a complete shot in the dark. I know because of randomness it can be unfair but repeating the test yielded similar results. Yes I know I don't free memory but I don't think it matters that much.

Question 2

I am surprised you can get that kind of speed increase. The C++ memory pool is relatively well optimized. Now you can get that kind of speed increase if you basically do very little in terms of management and throw away the results at the end. Do you want to publish your tests so we can also give those a once over.

Question 3

@MartinYork Sure, I'll update my question with the test code.

Question 4

WIthout writing a full answer, you should have a look at std::aligned_storage. You are still hitting the memory system a lot more than you could/should.

Question 5

@Frank Do you mean that I should replace the node or the node element with aligned storage? Shouldn't the compiler provide a good alignment?

Question 6

@PeterLenkefi Ideally, you would batch allocate an array of aligned_storage and in-place new objects on top of that storage on a per-demand basis. This way, objects only exist when allocated, instead of being batch constructed.

Question 7

Comment on testing

If you are not going to bother with delete then you may as well write a much simpler version.

template <typename T>
struct pool
{
 private:
 node* m_head = nullptr;
 node* m_tail = nullptr;
 std::size_t m_max = 0;
 bool m_resize= false;
 
 public:
 pool(pool const&) = delete;
 pool& operator=(pool const&) = delete;
 pool(uint64 nElems, bool resiz = false)
 : m_max(nElems)
 , m_resize(resiz) {
 allocBlock();
 }
 pool(pool&& o) nothrow {
 swap(o);
 }
 pool& operator=(pool&& o) nothrow {
 swap(o);
 return *this;
 }
 void swap(pool& other) nothrow {
 using std::swap;
 swap(m_head, other.m_head);
 swap(m_tail, other.m_tail);
 swap(m_max, other.m_max);
 swap(m_resize, other.m_resize);
 }
 operator bool() const {
 return m_head != m_tail;
 }
 T* alloc() {
 if ((m_head == m_tail) && m_resizable) {
 allocBlock();
 }
 return m_head == m_tail ? nullptr : m_head++;
 }
 
 void free(T* ptr) {}
 
 private:
 void allocBlock() {
 uint64 alloc_sz = sizeof(T) * m_max;
 m_head = m_tail = reinterpret_cast<T*>(std::malloc(alloc_sz));
 if (m_head == nullptr) {
 throw std::bad_alloc;
 }
 m_tail += m_max;
 }
};

Code Review

You should mark your move constructor as noexcept.

 pool(pool&& o) noexcept;

They should be noexcept anyway; good to let the language police that for you (it terminates if it actually throws). Also it can help if you were to use your pool with standard libraries as it enables some optimizations.

Sure your object is invalid after a move. And will not cause any problems if used correctly.

 pool(pool&& o) 
 : m_Chunks{ std::move(o.m_Chunks) }, m_Head{ o.m_Head },
 m_MaxElements{ o.m_MaxElements }, m_Resizable{ o.m_Resizable } {
 }

But I think it will cause problems if used incorrectly. The m_Head pointer still points at the chain of elements. So you can still accidentally use this and cause real problems for the other allocator that just took ownership of the memory.

The standard way to implement this is using swap(). See my implementation above. This handles situations like this in a safe way.

Same comments about the move assignment operator.

But this looks weird.

 pool& operator=(pool&& o) {
 for (auto n : m_Chunks) {
 std::free(n);
 }

You free all the memory. BUT your head still points into this chain and can be-reused. I think that is definitely a bug. A move should not be freeing the memory.

This is strange.

 operator bool() const {
 return m_Chunks.size();
 }

I would have made that return true if there is available memory. Not if we ever successfully allocated memory. But I have not read the standard allocator documentation. So that may be what you want (though I doubt it).

Why are you using uint8?

 uint8* mem_raw = reinterpret_cast<uint8*>(ptr);

Should this not be char*? The operator offsetof() returns the offset in bytes. The char by definition fits into 1 byte, thus a char* is a 1 byte addressable range. Also the standard has special properties for char* that other integer types don't have.

Also the uint8 type is not standard, though std::uint8_t is standard. But it is not required to be defined. If it is defined then it has exactly 8 bits but if that is not available the type is not available.

Comments on testing code

Don't allow the vector to expand.

std::vector<message*> control;
std::vector<message*> test;

Otherwise you are testing the memory manager code. Which is very complex and will give you different results (because memory has already been messed with in a previous test).

So pre-allocate the size of the vectors (so they don't reallocate during the test).

control.reserve(2000000);
test.reserve(2000000);

That is way too small a test.

 for (uint64 i = 0; i < 200000; ++i) {

Both of these allocators will finish in the blink of an eye. You want a test that lasts a couple of seconds to get real information.

You are not running the constructor.

 // Allocation but no construction
 auto t = pool.alloc();
 // Allocation and construction.
 new message{ r_id, r_time };

You could use placement-new to make them more similar. But personally I would override the new and delete operators so that they use your pool.

The test time will be dominated by this call.

 uint64 idx = std::rand() % control.size();
 test.erase(test.begin() + idx);

You are forcing a move of all the iterators in the vector. Rather than erasing it, just free the memory and set the pointer to nullptr. That would be a much better test.

You should use exactly the same code for both tests. The only difference should be the allocator used. You should have one test that uses your custom allocator. While the second test uses an allocator that simply calls new/delete underneath the hood.

template<typename T>
class normal
{
 public:
 T* alloc() {return reinterpret_cast<T*>(new char[sizeof(T)]);}
 void free(T* ptr) {delete [] reinterpret_cast<char*>(ptr);}
};
int main(void)
{
 std::size_t testSize = 200'000'000;
 std::vector<message*> control;
 std::vector<message*> test;
 control.reserve(testSize);
 test.reserve(testSize);
 measure_exec("Pool", [&]{mem::pool<message> pool{ 32, true }; runtTest(testSize, test, pool);});
 measure_exec("New", [&]{mem::normal<message> pool;runtTest(testSize, control, pool);});
 std::cin.get();
}

Question 8

"They should be no-throw anyway good to let the language put that extra testing in for you." 1. I assume you mean noexcept, since nothrow is a facility used to request different behavior from operator new and doesn't make sense in the context of a move constructor. 2. Even if you add noexcept, this will not do any extra checking from the language side for you. The only thing that will happen is that exceptions get converted into calls to std::terminate (unless that is what you mean by testing).

Question 9

@BenSteffan: Yes that's what I was trying to say. Updated answer.

Question 10

Thank you! Actually I forgot to test freeing, that was really stupid of me. Why shouldn't the move assignment free? I don't get that. I set the head into the other pool which should mean I don't point into the old memory.

Question 11

Don't abbreviate unnecessarily

Most of the names you've chosen for variables and functions look fine to me, but there are some weird, unnecessary abbreviations:

resiz -> just write resize
alloc_sz -> I would write alloc_size or size

Some abbreviations are quite commonly used so I'd keep those (like Elems and alloc instead of Elements and allocation). However, be consistent: why is it nElems but m_MaxElements?

Create a `class chunk`

Your pool consists of multiple chunks, and each chunk is an array of nodes. It makes sense to make a proper type for chunks as well. While you are at it, avoid doing manual memory allocations, and let std::vector do it for you. The alloc_chunk() function can become the constructor of chunk.

struct chunk {
 std::vector<node> nodes;
 chunk(size_t nElems): nodes(nElems) {
 for (auto &node: nodes)
 node.next = &node + 1;
 nodes.back().next = nullptr;
 }
};

But this also leads me to:

Consider using a `std::deque` to store `node`s

Your allocator requires each node, once allocated, to be at the same place in memory for the rest of its life. If you resize a std::vector or use realloc(), the storage might be moved. This is why you are using chunks. However, there is a STL container that gives you stable pointers and the ability to resize: std::deque. This allows you to get rid of chunks.

template <typename T>
class pool {
 struct node {/* same as before */};
 std::deque<node> m_Nodes;
 node *m_Head{};
public:
 pool(pool const&) = delete;
 pool& operator=(pool const&) = delete;
 pool() = default;
 pool(pool&& o) = default;
 ...
 T *alloc() {
 if (!m_Head) {
 m_Nodes.emplace_back({});
 m_Head = &m_Nodes.back();
 }
 ...
 }
 ...
};

Performance

Your allocator is very good when storing large objects and when the game doesn't need to allocate large batches of objects at a time. However, it has the overhead of one pointer per element, so if you are storing small objects, you might be wasting a lot of memory. Furthermore, there is no better way to allocate a batch of objects than to allocate them one by one.

If you want to store small objects (for example, struct message is not that big), then instead of having a linked list of free nodes, it might be better to use (a tree of) bitmaps to store which elements of the chunk are used or not. Finding a region of consecutive free elements might also be much easier this way.

Use a testing framework

Consider using an existing (unit) testing framework, like CppUnit or Google Test, and a benchmarking framework like Google Benchmark. You still need to write the actual tests yourself, but they take away some of the burden and impose some order to what you are doing.

score 8 · Answer 1 · 2018-02-06 23:22:33Z

Comment on testing

If you are not going to bother with delete then you may as well write a much simpler version.

template <typename T>
struct pool
{
 private:
 node* m_head = nullptr;
 node* m_tail = nullptr;
 std::size_t m_max = 0;
 bool m_resize= false;
 
 public:
 pool(pool const&) = delete;
 pool& operator=(pool const&) = delete;
 pool(uint64 nElems, bool resiz = false)
 : m_max(nElems)
 , m_resize(resiz) {
 allocBlock();
 }
 pool(pool&& o) nothrow {
 swap(o);
 }
 pool& operator=(pool&& o) nothrow {
 swap(o);
 return *this;
 }
 void swap(pool& other) nothrow {
 using std::swap;
 swap(m_head, other.m_head);
 swap(m_tail, other.m_tail);
 swap(m_max, other.m_max);
 swap(m_resize, other.m_resize);
 }
 operator bool() const {
 return m_head != m_tail;
 }
 T* alloc() {
 if ((m_head == m_tail) && m_resizable) {
 allocBlock();
 }
 return m_head == m_tail ? nullptr : m_head++;
 }
 
 void free(T* ptr) {}
 
 private:
 void allocBlock() {
 uint64 alloc_sz = sizeof(T) * m_max;
 m_head = m_tail = reinterpret_cast<T*>(std::malloc(alloc_sz));
 if (m_head == nullptr) {
 throw std::bad_alloc;
 }
 m_tail += m_max;
 }
};

Code Review

You should mark your move constructor as noexcept.

 pool(pool&& o) noexcept;

They should be noexcept anyway; good to let the language police that for you (it terminates if it actually throws). Also it can help if you were to use your pool with standard libraries as it enables some optimizations.

Sure your object is invalid after a move. And will not cause any problems if used correctly.

 pool(pool&& o) 
 : m_Chunks{ std::move(o.m_Chunks) }, m_Head{ o.m_Head },
 m_MaxElements{ o.m_MaxElements }, m_Resizable{ o.m_Resizable } {
 }

But I think it will cause problems if used incorrectly. The m_Head pointer still points at the chain of elements. So you can still accidentally use this and cause real problems for the other allocator that just took ownership of the memory.

The standard way to implement this is using swap(). See my implementation above. This handles situations like this in a safe way.

Same comments about the move assignment operator.

But this looks weird.

 pool& operator=(pool&& o) {
 for (auto n : m_Chunks) {
 std::free(n);
 }

You free all the memory. BUT your head still points into this chain and can be-reused. I think that is definitely a bug. A move should not be freeing the memory.

This is strange.

 operator bool() const {
 return m_Chunks.size();
 }

I would have made that return true if there is available memory. Not if we ever successfully allocated memory. But I have not read the standard allocator documentation. So that may be what you want (though I doubt it).

Why are you using uint8?

 uint8* mem_raw = reinterpret_cast<uint8*>(ptr);

Should this not be char*? The operator offsetof() returns the offset in bytes. The char by definition fits into 1 byte, thus a char* is a 1 byte addressable range. Also the standard has special properties for char* that other integer types don't have.

Also the uint8 type is not standard, though std::uint8_t is standard. But it is not required to be defined. If it is defined then it has exactly 8 bits but if that is not available the type is not available.

Comments on testing code

Don't allow the vector to expand.

std::vector<message*> control;
std::vector<message*> test;

Otherwise you are testing the memory manager code. Which is very complex and will give you different results (because memory has already been messed with in a previous test).

So pre-allocate the size of the vectors (so they don't reallocate during the test).

control.reserve(2000000);
test.reserve(2000000);

That is way too small a test.

 for (uint64 i = 0; i < 200000; ++i) {

Both of these allocators will finish in the blink of an eye. You want a test that lasts a couple of seconds to get real information.

You are not running the constructor.

 // Allocation but no construction
 auto t = pool.alloc();
 // Allocation and construction.
 new message{ r_id, r_time };

You could use placement-new to make them more similar. But personally I would override the new and delete operators so that they use your pool.

The test time will be dominated by this call.

 uint64 idx = std::rand() % control.size();
 test.erase(test.begin() + idx);

You are forcing a move of all the iterators in the vector. Rather than erasing it, just free the memory and set the pointer to nullptr. That would be a much better test.

You should use exactly the same code for both tests. The only difference should be the allocator used. You should have one test that uses your custom allocator. While the second test uses an allocator that simply calls new/delete underneath the hood.

template<typename T>
class normal
{
 public:
 T* alloc() {return reinterpret_cast<T*>(new char[sizeof(T)]);}
 void free(T* ptr) {delete [] reinterpret_cast<char*>(ptr);}
};
int main(void)
{
 std::size_t testSize = 200'000'000;
 std::vector<message*> control;
 std::vector<message*> test;
 control.reserve(testSize);
 test.reserve(testSize);
 measure_exec("Pool", [&]{mem::pool<message> pool{ 32, true }; runtTest(testSize, test, pool);});
 measure_exec("New", [&]{mem::normal<message> pool;runtTest(testSize, control, pool);});
 std::cin.get();
}

"They should be no-throw anyway good to let the language put that extra testing in for you." 1. I assume you mean noexcept, since nothrow is a facility used to request different behavior from operator new and doesn't make sense in the context of a move constructor. 2. Even if you add noexcept, this will not do any extra checking from the language side for you. The only thing that will happen is that exceptions get converted into calls to std::terminate (unless that is what you mean by testing).
@BenSteffan: Yes that's what I was trying to say. Updated answer.
Thank you! Actually I forgot to test freeing, that was really stupid of me. Why shouldn't the move assignment free? I don't get that. I set the head into the other pool which should mean I don't point into the old memory.

score 2 · Answer 2 · 2021-08-08 10:28:05Z

Don't abbreviate unnecessarily

Most of the names you've chosen for variables and functions look fine to me, but there are some weird, unnecessary abbreviations:

resiz -> just write resize
alloc_sz -> I would write alloc_size or size

Some abbreviations are quite commonly used so I'd keep those (like Elems and alloc instead of Elements and allocation). However, be consistent: why is it nElems but m_MaxElements?

Create a `class chunk`

Your pool consists of multiple chunks, and each chunk is an array of nodes. It makes sense to make a proper type for chunks as well. While you are at it, avoid doing manual memory allocations, and let std::vector do it for you. The alloc_chunk() function can become the constructor of chunk.

struct chunk {
 std::vector<node> nodes;
 chunk(size_t nElems): nodes(nElems) {
 for (auto &node: nodes)
 node.next = &node + 1;
 nodes.back().next = nullptr;
 }
};

But this also leads me to:

Consider using a `std::deque` to store `node`s

Your allocator requires each node, once allocated, to be at the same place in memory for the rest of its life. If you resize a std::vector or use realloc(), the storage might be moved. This is why you are using chunks. However, there is a STL container that gives you stable pointers and the ability to resize: std::deque. This allows you to get rid of chunks.

template <typename T>
class pool {
 struct node {/* same as before */};
 std::deque<node> m_Nodes;
 node *m_Head{};
public:
 pool(pool const&) = delete;
 pool& operator=(pool const&) = delete;
 pool() = default;
 pool(pool&& o) = default;
 ...
 T *alloc() {
 if (!m_Head) {
 m_Nodes.emplace_back({});
 m_Head = &m_Nodes.back();
 }
 ...
 }
 ...
};

Performance

Your allocator is very good when storing large objects and when the game doesn't need to allocate large batches of objects at a time. However, it has the overhead of one pointer per element, so if you are storing small objects, you might be wasting a lot of memory. Furthermore, there is no better way to allocate a batch of objects than to allocate them one by one.

If you want to store small objects (for example, struct message is not that big), then instead of having a linked list of free nodes, it might be better to use (a tree of) bitmaps to store which elements of the chunk are used or not. Finding a region of consecutive free elements might also be much easier this way.

Use a testing framework

Consider using an existing (unit) testing framework, like CppUnit or Google Test, and a benchmarking framework like Google Benchmark. You still need to write the actual tests yourself, but they take away some of the burden and impose some order to what you are doing.

Stack Exchange Network

Fast pool allocator for games in C++

2 Answers 2

Comment on testing

Code Review

Comments on testing code

Don't abbreviate unnecessarily

Create a `class chunk`

Consider using a `std::deque` to store `node`s

Performance

Use a testing framework

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Fast pool allocator for games in C++

2 Answers 2

Comment on testing

Code Review

Comments on testing code

Don't abbreviate unnecessarily

Create a class chunk

Consider using a std::deque to store nodes

Performance

Use a testing framework

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Create a `class chunk`

Consider using a `std::deque` to store `node`s