Are the final templated functions `allocate` and `deallocate` thread safe? The `Cache` object is `thread_local`, which also makes `FreeList` `thread_local`. The only parts I'm still in doubt about are the calls to `std::malloc` and `std::free`. Are they always thread safe, or does that depend on the implementation? Would it be better to put a lock around them?
I think the cached memory doesn't need locks, since all the objects here are `thread_local` and each thread has its own copy. The only real question is about `std::malloc` and `std::free`: are they already synchronized?
Here's a self-contained example of all of my code:
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <new>

class FreeList {
    struct Node {
        Node * next;
    };

    Node * this_head;

public:
    static constexpr auto const NODE_SIZE = sizeof(Node);

    FreeList() noexcept
        : this_head() {
    }

    FreeList(FreeList &&) = delete;
    FreeList(FreeList const &) = delete;
    FreeList & operator =(FreeList &&) = delete;
    FreeList & operator =(FreeList const &) = delete;

    void push(void * const address) noexcept {
        auto const head = static_cast<Node *>(address);
        head->next = this_head;
        this_head = head;
    }

    auto pop() noexcept {
        void * const head = this_head;
        if (head) {
            this_head = this_head->next;
        }
        return head;
    }
};

template <uint64_t const SIZE>
class Cache {
    static_assert(SIZE >= FreeList::NODE_SIZE);

    FreeList this_free_list;

public:
    Cache() noexcept
        : this_free_list() {
    }

    Cache(Cache &&) = delete;
    Cache(Cache const &) = delete;

    ~Cache() noexcept {
        while (auto const address = this_free_list.pop()) {
            // Do I need a lock here?
            std::free(address);
        }
    }

    Cache & operator =(Cache &&) = delete;
    Cache & operator =(Cache const &) = delete;

    auto allocate() {
        if (auto const address = this_free_list.pop()) {
            return address;
        }
        // Do I need a lock here?
        if (auto const address = std::malloc(SIZE)) {
            return address;
        }
        throw std::bad_alloc();
    }

    void deallocate(void * const address) noexcept {
        if (address) {
            this_free_list.push(address);
        }
    }
};

template <typename TYPE>
auto & get() noexcept {
    thread_local TYPE local;
    return local;
}

template <typename TYPE>
auto & get_cache() noexcept {
    return get<Cache<sizeof(TYPE)>>();
}

// Are these thread safe?
template <typename TYPE>
auto allocate() {
    auto const address = get_cache<TYPE>().allocate();
    return static_cast<TYPE *>(address);
}

template <typename TYPE>
void deallocate(void * const address) noexcept {
    get_cache<TYPE>().deallocate(address);
}

int main() {
    auto x = allocate<int64_t>();
    *x = 5;
    std::cout << *x << '\n';
    deallocate<int64_t>(x);
}
- \$\begingroup\$ Which C++ version is this targeting? C++11? C++14? C++17? \$\endgroup\$ — hoffmale, Jul 23, 2018 at 23:44
- \$\begingroup\$ I'm targeting mainly C++17, but the more versions, the better. \$\endgroup\$ — João Pires, Jul 23, 2018 at 23:49
2 Answers
Are the final templated functions `allocate` and `deallocate` thread safe?

As every other state used is thread-local, the answer hinges on this: is the `malloc()`/`free()` memory-management system thread-safe?
The C++ Standard just defers to the C standard:
Effects: These functions have the semantics specified in the C standard library.
And the C Standard says something like (from C11 final draft n1570):
- For purposes of determining the existence of a data race, memory allocation functions behave as though they accessed only memory locations accessible through their arguments and not other static duration storage. These functions may, however, visibly modify the storage that they allocate or deallocate. A call to `free` or `realloc` that deallocates a region p of memory synchronizes with any allocation call that allocates all or part of the region p. This synchronization occurs after any access of p by the deallocating function, and before any such access by the allocating function.
So, in the end, the answer is:
Yes, it's thread-safe.
Now, let's critique your code:
- You are only using `FreeList` for `Cache`, and don't call any of its methods more than once. Abstraction and encapsulation are tools for managing complexity, but all that useless boilerplate increases it instead.
- There is no valid reason `Cache` and `FreeList` aren't literal types, which don't depend on dynamic initialization. Use `constexpr`, or better yet move to an in-class initializer and `= default` the default constructor.
- To make a type non-copy- and non-move-constructible and -assignable, it suffices to declare an explicitly deleted move constructor or move-assignment operator.
- While it's a good thing `Cache` `static_assert`s that the block size is big enough, make `get_cache()` fix the request if needed. The caller should not have to care.
- Consider writing an Allocator using your caching system. That way, it can be used by standard containers, instead of just manually. Though admittedly, as the block size is a compile-time constant, it can only be used in containers allocating single nodes instead of whole arrays.
- 1 \$\begingroup\$ Yes, it's thread-safe. Whether it's a good idea depends on your use-profile. And changing to use the interface an Allocator should provide is also recommended, so you can use it for standard containers. \$\endgroup\$ — Deduplicator, Jul 24, 2018 at 0:14
- 1 \$\begingroup\$ @hoffmale: If you look at the code, `std::malloc()` and `std::free()` are the only places where shared data structures are used. Everything else is `thread_local` (whether directly or indirectly), and thus already thread-safe. \$\endgroup\$ — Deduplicator, Jul 24, 2018 at 0:16
- 1 \$\begingroup\$ @JoãoPires: The problem is that the "unsafe middle" is `public`ly accessible. If I want a non-static `Cache`, I can simply create one - and that one isn't thread-safe (other than the constructor and destructor). \$\endgroup\$ — hoffmale, Jul 24, 2018 at 0:24
- 1 \$\begingroup\$ @hoffmale Sorry, but if you follow that logic, just about nothing is thread-safe, because it's built out of non-thread-safe building blocks, most of which are publicly available. \$\endgroup\$ — Deduplicator, Jul 24, 2018 at 0:26
- 1 \$\begingroup\$ @JoãoPires: That's some context not obvious from the question. // @Deduplicator: My problem is that a blanket statement "this is thread-safe" doesn't hold if there are non-thread-safe access paths. This doesn't mean building blocks all have to be thread-safe, just that the non-thread-safe parts must not be `public`ly accessible and have to be properly regulated. (Like having the safest door doesn't make a house burglar-proof if the ground-level windows are wide open.) \$\endgroup\$ — hoffmale, Jul 24, 2018 at 1:08
Thread safety

`std::malloc` and `std::free` are thread-safe by themselves... but `FreeList` and `Cache` aren't, unless exclusively accessed through `allocate` or `deallocate`.

In all other cases some synchronization is needed in `FreeList::push` and `FreeList::pop` (or alternatively, in `Cache::allocate` and `Cache::deallocate`).
There are some options to make those two classes thread-safe for all access paths:

- Move `get`, `get_cache`, `Cache` and `FreeList` into a `class` as `private` nested classes / member functions (so they aren't publicly accessible anymore) and make `allocate` and `deallocate` a `friend` of that class. (A private/anonymous `namespace` won't work, since it has to be just as accessible as `allocate`/`deallocate` because it has to reside in a header.)

  Why move `get` and `get_cache`? Because references and pointers to `thread_local` objects can be shared with other threads - so a thread might get access to another thread's `thread_local Cache` inside `get`.

  `thread_local` objects are only thread-safe if no reference or pointer to them gets shared with other threads (which might happen accidentally, e.g. by a mismanaged lambda capture). This means `allocate` and `deallocate` are not thread-safe as long as references to the underlying `Cache` (via `get` or `get_cache`) could be shared by someone else. And since `allocate` and `deallocate` have exactly the same level of access as `get` or `get_cache`, the posted implementation is not thread-safe in the general case.

- Add some locking inside `FreeList`, or make `FreeList` lock-free by making `this_head` a `std::atomic<Node*>`.

  This would make the critical sections inside `Cache::allocate` and `Cache::deallocate` thread-safe as well.
Memory management

- `Cache::allocate` won't provide correctly aligned memory for alignments greater than `alignof(std::max_align_t)` (`std::malloc` is only specified to support alignments up to that value). This might cause problems if higher alignments are needed, e.g. for SSE or AVX instructions. `std::aligned_alloc` could be used as a replacement. (Beware: `std::aligned_alloc` isn't available for MSVC yet.)
- It isn't possible to create objects whose size is less than `sizeof(void*)` - so exchanging `int64_t` for `int32_t` inside `main` will fail to compile on 64-bit systems. This could be fixed by always allocating chunks of memory that are at least `sizeof(void*)` (or `FreeList::NODE_SIZE`, if you prefer).
Naming

- `Cache` doesn't actually cache objects - only chunks of memory. `FreeListAllocator` might be a better name.
- `get` is a very generic name. `(get_)thread_local_instance` or `singleton` might be more descriptive.
- Similarly, `get_cache` might better be `get_allocator` or `get_thread_local_allocator`.
General stuff

- With some slight modifications `Cache` could be changed into a standard-library-compatible `std::pmr::memory_resource` - which would allow it to be used with standard containers.
- If I had to design this, I'd likely make `FreeList` an adaptor over another allocator (taken as a template parameter) which would be asked for memory if the list was empty. This design would allow easy composition of different allocation strategies. There's an excellent talk by Andrei Alexandrescu about this topic.
- \$\begingroup\$ Adding synchronisation to code which is never used with data shared across threads is quite wasteful. \$\endgroup\$ — Deduplicator, Jul 24, 2018 at 16:17
- \$\begingroup\$ @Deduplicator: "never used with data shared across threads" is not a quality that OP's implementation provides. `get` returns a reference to a `thread_local` object - but that reference can be passed to other threads, which then would still reference the original object. As long as `get` and `get_cache` are as accessible as `allocate` and `deallocate`, none of those functions are thread-safe. So to make it thread-safe, either the access via `get`/`get_cache` needs to be made `private`, or the critical operations (at least `FreeList::push`/`FreeList::pop`) need to be synchronized. \$\endgroup\$ — hoffmale, Jul 24, 2018 at 16:22
- \$\begingroup\$ Just using `thread_local` doesn't mean "inaccessible from other threads". Yes, using the corresponding name refers to a different instance on each thread, but pointers or references to those instances can still be shared. \$\endgroup\$ — hoffmale, Jul 24, 2018 at 16:23
- \$\begingroup\$ It's about as good as it reasonably gets (meaning without encumbering usability, maintainability, or efficiency unreasonably), but C++ isn't really designed to enforce such things. If the user wants, he can circumvent any obstacle, reasonable or not. \$\endgroup\$ — Deduplicator, Jul 24, 2018 at 16:43
- \$\begingroup\$ @Deduplicator: I don't think it's an obstacle if the "circumvention" is as simple as passing the result from `get_cache` to a function or lambda which in turn then gets called by a thread pool. The caller gets a reference, passes a reference, and suddenly he has data races - by only using an API that your answer advertised as thread-safe. \$\endgroup\$ — hoffmale, Jul 24, 2018 at 16:50