Today I tried to learn how to make a simple garbage collector for a future project, it didn't need to be fast nor complex nor optimal, it just needed to work. After lots of searching and reading, I was able to make something work.
So, how can I improve it from here? At some point I guess I'll have to ditch the std::array
(the small size is just for testing purposes). Do I actually need to use new
and delete
, or can I just keep using STL containers?
For what I've read, I have a couple options: mark and compact, copying garbage collection, generational garbage collector. Which of them would be the more natural evolution from what I've done?
#include <array>
#include <iostream>
#include <string>
using std::cout;
using std::endl;
template <typename T>
struct Handle {
bool marked {};
T value {};
Handle * next {};
};
template <typename T>
class Heap {
std::array<Handle<T>, 10> heap;
Handle<T> * root {};
Handle<T> * free_list {};
void push(Handle<T> * handle) {
handle->next = free_list;
free_list = handle;
}
void reset() {
for (size_t i = 0; i < heap.size(); ++i) {
heap[i].marked = false;
}
}
void mark(Handle<T> * handle) {
if (handle && !handle->marked) {
handle->marked = true;
mark(handle->next);
}
}
void sweep() {
free_list = nullptr;
for (size_t i = 0; i < heap.size(); ++i) {
if (!heap[i].marked) {
heap[i].value = {};
push(&heap[i]);
}
}
}
public:
Heap() {
for (size_t i = 0; i < heap.size(); ++i) {
push(&heap[i]);
}
}
template <typename U>
Handle<T> * allocate(U && u) {
if (!free_list) {
reset();
mark(root);
sweep();
if (!free_list) {
throw std::bad_alloc();
}
}
Handle<T> * handle = free_list;
free_list = free_list->next;
handle->value = std::forward<T>(u);
handle->next = nullptr;
return handle;
}
Handle<T> * keep_alive(Handle<T> * handle) {
handle->next = root;
root = handle;
return handle;
}
void signal() {
root = nullptr;
}
};
int main() {
Heap<std::string> heap;
auto a = heap.allocate("A");
auto b = heap.keep_alive(a);
a = heap.allocate("B");
auto c = heap.keep_alive(a);
a = heap.allocate("C");
a = heap.allocate("D");
a = heap.allocate("E");
cout << a->value << endl;
cout << b->value << endl;
cout << c->value << endl;
return 0;
}
2 Answers 2
From a design and interface point of view, I find the use of:
auto a = heap.allocate("A");
cout << a->value << endl;
unnatural. I would expect a
to act like a pointer to std::string
since it's the value returned from a function named allocate
using an object of type Heap<std::string>
.
It would be more natural if:
*a
evaluated tostd::string&
(std::string const&
ifa
wereconst
)a->
evaluated tostd::string*
(std::string const*
ifa
wereconst
)
Then, you could replace a->value
by *a
.
auto a = heap.allocate("A");
cout << *a << endl;
P.S. I haven't spent enough time delving into the implementation of Heap
to suggest how it can be done.
-
\$\begingroup\$ I believe that implementing the pointer-like semantics for the allocated objects can be done by implementing the
Handle::operator*()
andHandle::operator->()
operators. \$\endgroup\$Maarten Bamelis– Maarten Bamelis2017年02月17日 10:51:45 +00:00Commented Feb 17, 2017 at 10:51 -
\$\begingroup\$ @MaartenBamelis, that was my first thought. However, the return type of
Heap::allocate
isHandle<T>*
, notHandle<T>
. The internals ofHeap
need to be changed significantly to allow for such a change. \$\endgroup\$R Sahu– R Sahu2017年02月17日 16:11:21 +00:00Commented Feb 17, 2017 at 16:11 -
\$\begingroup\$ an important detail that I missed! In my opinion, the added value of the pointer-like semantics does warrant rethinking the
Heap
class. As you recognized before, it feels more natural and intuitive that way. \$\endgroup\$Maarten Bamelis– Maarten Bamelis2017年02月17日 17:21:46 +00:00Commented Feb 17, 2017 at 17:21
I'm not sure there's such a thing as a simple mark-sweep garbage-collector. Many start off that way, but they all seem to grow more complex, particularly as they get tuned for performance.
I'm not an expert in the domain, but I can review some of the C++ style issues.
Firstly, thanks for providing a well-written and complete program with test cases. That always helps the reviewer! I compiled using GCC with my usual warnings enabled, and the only criticism it made was an Effective C++ recommendation: class `Heap’ has pointer data members but does not override its copy constructor or assignment operator. I don't think you need them, so:
Heap(const Heap&) = delete;
Heap operator=(const Heap&) = delete;
There's a numeric literal 10
which seems to be arbitrary - it seems to be maximum number of roots. It's probably better as a named constant, to make this more obvious. You might even want to make it a template parameter of your class.
Loops such as this one:
void reset() {
for (size_t i = 0; i < heap.size(); ++i) {
heap[i].marked = false;
}
}
are (IMO) more readable if written with range-based for
:
void reset() {
for (auto& handle: heap)
handle.marked = false;
}
You use std::forward()
but don't include <utility>
; similarly for std::bad_alloc
and <new>
. Don't rely on your implementation's transitive includes, but always include the documented headers for what you use. I'd also move the <iostream>
stuff down to live with your main()
, as that's really only part of your test suite.
My edited version:
#include <array>
#include <new>
#include <utility>
template <typename T>
struct Handle {
bool marked {};
T value {};
Handle * next {};
};
static const size_t heap_capacity = 10;
template <typename T, size_t N = heap_capacity>
class Heap {
Heap(const Heap&) = delete;
Heap operator=(const Heap&) = delete;
std::array<Handle<T>, N> heap{};
Handle<T> * root {};
Handle<T> * free_list {};
void push(Handle<T> * handle) {
handle->next = free_list;
free_list = handle;
}
void reset() {
for (auto& handle: heap)
handle.marked = false;
}
void mark(Handle<T> * handle) {
if (handle && !handle->marked) {
handle->marked = true;
mark(handle->next);
}
}
void sweep() {
free_list = nullptr;
for (auto& handle: heap) {
if (!handle.marked) {
handle.value = {};
push(&handle);
}
}
}
public:
Heap() {
for (auto& handle: heap) {
push(&handle);
}
}
template <typename U>
Handle<T> * allocate(U && u) {
if (!free_list) {
reset();
mark(root);
sweep();
if (!free_list) {
throw std::bad_alloc();
}
}
Handle<T> * handle = free_list;
free_list = free_list->next;
handle->value = std::forward<T>(u);
handle->next = nullptr;
return handle;
}
Handle<T> * keep_alive(Handle<T> * handle) {
handle->next = root;
root = handle;
return handle;
}
void signal() {
root = nullptr;
}
};
#include <string>
#include <iostream>
int main() {
using std::cout;
using std::endl;
Heap<std::string> heap;
auto a = heap.allocate("A");
auto b = heap.keep_alive(a);
a = heap.allocate("B");
auto c = heap.keep_alive(a);
a = heap.allocate("C");
a = heap.allocate("D");
a = heap.allocate("E");
cout << a->value << endl;
cout << b->value << endl;
cout << c->value << endl;
return 0;
}
std::shared_ptr
s won't work if there's cyclic references. \$\endgroup\$std::weak_ptrs
where you can introduce a cycle. That is what they are there for. \$\endgroup\$