C++ shared_ptr implemented as a coding practice and learning purposes. It uses std::shared_ptr interface. Basic tests are included (using single header Catch 2) Some methods are omitted to keep the code shorter and readable. For simplicity it expects C++ 20. See comments for more details.
Is this implementation correct?
Does it fulfill multithreaded guarantees?
Does it fulfill exception guarantees?
Can this code be simplified? I mean made more idiomatic and readable?
Can it be speed up? (In parallel/concurrent use)
What unit test to add?
Thank you in advance for your comments.
Find the code bellow and on: https://github.com/tintera/SharedPtr (Repo contains cmake, tests, project, sln)
#pragma once
#include <atomic>
/// Lock free smart ptr similar to shared ptr.
/// - Destructor of pointed object must not throw. Or operators =, == have undefined behavior.
/// - Published as single header file for readability.
///
/// Known limits:
/// - Some race condition exist. Best to fix them and keep implementation lock free. And keep default constructor noexcept (as in std::)
/// - Owned object is not part of control block.
/// - No custom deleter or allocator.
/// - No separate template type for constructors. (std::shared_ptr constructor has another template type Y)
/// - No make_shared function.
/// - No std::hash<std::shared_ptr>
/// - No std::atomic<std::shared_ptr>
///
/// Omitted (not much to learn in implementing them IMHO)
/// - reset
/// - swap
/// - operator[] managing arrays not implemented at all.
/// - unique as it's removed in C++ 20
/// - owner_before
/// - operator<<(std::shared_ptr)
///
namespace smart_ptr
{
template<typename T>
class weak_ptr;
template<typename T>
class shared_ptr
{
friend class weak_ptr<T>;
struct control_block
{
explicit control_block(T* payload)
: payload_(payload)
{
}
std::atomic<int> usages_{1};
std::atomic<int> weak_usages_{0};
T* payload_{nullptr};
};
control_block* control_{nullptr};
void finish_one_instance_()
{
if (!control_)
{
return;
}
--control_->usages_;
if (control_->usages_ == 0)
{
// Last owner. We are in a single threaded scenario now.
delete control_->payload_;
if (control_->weak_usages_ == 0)
{
delete control_;
}
}
}
public:
constexpr shared_ptr() noexcept = default;
constexpr explicit shared_ptr(nullptr_t) noexcept{}
explicit shared_ptr(T* ptr)
try
: control_(new control_block(ptr))
{
}
catch(...)
{
delete ptr;
throw;
}
explicit shared_ptr(std::unique_ptr<T>&& ptr)
: control_(new control_block(ptr.get()))
{
}
~shared_ptr() noexcept
{
finish_one_instance_();
}
shared_ptr(const shared_ptr& other) noexcept
: control_{other.control_}
{
if(control_)
{
// TODO: Race condition, control can be deleted before usages_ is incremented.
++control_->usages_;
}
}
shared_ptr(shared_ptr&& other) noexcept
{
std::swap(control_, other.control_);
}
template< class Y >
explicit shared_ptr( const weak_ptr<Y>& r )
{
if (r.expired())
{
throw std::bad_weak_ptr{};
}
control_ = r.control_;
++control_->usages_;
}
shared_ptr& operator=(const shared_ptr& other) noexcept
{
if(this == &other) {
return *this;
}
finish_one_instance_();
control_ = other.control_;
++other.use_count();
return *this;
}
shared_ptr& operator=(shared_ptr&& other) noexcept
{
std::swap(this, shared_ptr{other});
return *this;
}
[[nodiscard]] explicit operator bool() const noexcept
{
return static_cast<bool>(control_);
}
void reset() noexcept
{
finish_one_instance_();
control_->payload_ = nullptr;
}
[[nodiscard]] T* get() const noexcept
{
// No race condition. The control exists while it's owned by this shared_ptr.
return control_ ? control_->payload_ : nullptr;
}
[[nodiscard]] T& operator*() const noexcept
{
return *get();
}
[[nodiscard]] T* operator->() const noexcept
{
return get();
}
[[nodiscard]] long use_count() const noexcept
{
// No race condition. The control_ exists while it's owned by this shared_ptr
return control_ ? control_->usages_.load() : 0;
}
};
template< class T, class U >
std::strong_ordering operator<=>( const shared_ptr<T>& lhs, const shared_ptr<U>& rhs ) noexcept
{
return lhs.control_ <=> rhs.control_;
};
template<typename T>
class weak_ptr
{
friend class shared_ptr<T>;
typename shared_ptr<T>::control_block* control_{nullptr};
public:
constexpr weak_ptr() noexcept = default;
~weak_ptr()
{
if (control_)
{
--control_->weak_usages_;
if (control_->usages_ == 0 && control_->weak_usages_ == 0)
{
delete control_;
}
}
}
explicit weak_ptr( const shared_ptr<T>& r ) noexcept
: control_(r.control_)
{
if (control_)
{
// TODO: Race condition, control can be deleted before weak_usages_ is incremented.
++control_->weak_usages_;
}
}
// TODO: explicit dod not compile
weak_ptr( const weak_ptr& r ) noexcept
{
control_ = r.control_;
if (control_)
{
// TODO: Race condition, control can be deleted before weak_usages_ is incremented.
++r.control_->weak_usages_;
}
}
explicit weak_ptr(weak_ptr&& r) noexcept
{
std::swap(*this, r);
}
weak_ptr& operator=(const weak_ptr& other) noexcept
{
if(this == &other)
{
return *this;
}
std::swap(*this, weak_ptr{other});
return *this;
}
weak_ptr& operator=(weak_ptr&& r) noexcept
{
std::swap(*this, r);
return *this;
}
[[nodiscard]] bool expired() const noexcept
{
return (!control_) || (control_->usages_ == 0);
}
shared_ptr<T> lock()
{
return expired() ? shared_ptr<T>() : shared_ptr<T>(*this);
}
};
}
4 Answers 4
Critical Bugs
Bug 1
explicit shared_ptr(std::unique_ptr<T>&& ptr)
: control_(new control_block(ptr.get()))
{
}
This is broken. You pass by R-Value reference so the object is moved into ptr
. Which means this parameter is now the owning copy. You get()
the pointer and store it in the control block.
BUT ptr
now goes out of scope and calls delete on the object. So you are now storing a pointer to a released object in your shared_ptr
!
Fix: Call release()
rather than get()
.
Bug 2
The increment here is on a temporary:
shared_ptr& operator=(const shared_ptr& other) noexcept
{
if(this == &other) {
return *this;
}
finish_one_instance_();
control_ = other.control_;
// This line does not increment the count.
// This returns a value (not a reference).
// So here you are incrementing a local temporary.
++other.use_count();
// This should probably be (something like this).
if (countrol_) {
++control->usages_.load();
}
return *this;
}
Code Review Opinion:
If you don't support custom deleter's then you should not support taking ownership of the generic unique_ptr
.
explicit shared_ptr(std::unique_ptr<T>&& ptr)
: control_(new control_block(ptr.get()))
{
}
Please specialize so you can only take a standard unique_ptr that does not have a specialized destructor (or support custom deleters). What I would not like to see is somebody accidentally converting the unique pointer into your shared ptr and then it not being correctly deleted.
Note: if ptr
has the value nullptr
. Then you are still creating a control block. Are you sure you want to do this? Nearly every function you have check for the existence of the control block (and when you explicitly pass a nullptr
value you don't create a control_block, so I would expect the behavior to be the same.
explicit shared_ptr(T* ptr)
try
: control_(new control_block(ptr))
/// I think I would change the above line to:
: control_(ptr ? new control_block(ptr) : nullptr)
{
}
catch(...)
{
delete ptr;
throw;
}
I will admit this is how I used to do it as well.
shared_ptr(shared_ptr&& other) noexcept
{
std::swap(control_, other.control_);
}
But I think a better solution is:
shared_ptr(shared_ptr&& other) noexcept
: control_(std::exchange(other.control_, nullptr))
{}
The reason for this being a better solution (in my opinion) is that the consensus is that you should (when reasonable) use the initializer list to make sure that members are set correctly. Otherwise, you will end up initializing members twice (once in the implicit initializer list and then once in the body). For simple POD types this may not be an issue, but anything with a constructor or assignment operator this can become expensive. So having the habit of always using the initializer list puts you into a good habit (then when you have experience you can then make an argument for alternatives (as long as you comment why you are not following standard techniques)).
Assignment:
Checking for self assignment. Normally I would say this is an anti-pattern and you should use the copy and swap odiom. (削除) But shared_ptr may be a special case. Let me think about this some more. (削除ここまで)
shared_ptr& operator=(const shared_ptr& other) noexcept
{
if(this == &other) {
return *this;
}
finish_one_instance_();
control_ = other.control_;
++other.use_count();
return *this;
}
shared_ptr& operator=(shared_ptr&& other) noexcept
{
std::swap(this, shared_ptr{other});
return *this;
}
(削除) OK. Not sure about this. But (削除ここまで) I still think the modern pattern of using a single assignment operation to support both copy and move works in this situation.
// Pass by value.
// Usually achieves the optimum code for both R-Value and L-Values.
//
// Your multi threading and locking and increment/decrement throw some intricacies in there. But I still think this is a better solution to your two options above.
shared_ptr& operator=(shared_ptr other) noexcept
{
std::swap(this, shared_ptr{other});
return *this;
}
Lets actually see the code for Copy/Move for your original compared to the single assignment version.
Here is the code in the move:
// MOVE ORIG MOVE Single Method
// Explicit Move // Implicit Move.
// Identical
: control_(std::exchange(other.control_,nullptr))
// Body
// Identical
std::swap(this, shared_ptr{other});
// Destructor of param // Destructor of param
// Identical
finish_one_instance_();
// Return // Return
// Identical
return *this;
Here is the code in the copy.
// COPY ORIG COPY Single method
// Implicit Copy (ptr)
: control_{other.control_}
if(control_) {
++control_->usages_;
}
// Assignment operator // Assignment operator
if(this == &other) { std::swap(this, shared_ptr{other});
return *this;
}
// Implicit destructor (ptr)
finish_one_instance_(); finish_one_instance_();
control_ = other.control_;
if (countrol_) {
++control->usages_;
}
return *this; return *this;
So the order of instructions are different. The only significant difference is the check for self assignment Vs swap().
Swap is a trivial operation, while branch (if statement) is a pain for optimization and thus most likely self-deprecating. So same number of increments and decrements. So I would go with the single assignment method version.
Note: I deliberately don't look at threading. There is more complexity here, but I don't think it changes this analysis. When you get it working correctly in a threading environment, the single assignment method will still be optimal.
Same as above I would use std::exchange
.
explicit weak_ptr(weak_ptr&& r) noexcept
{
std::swap(*this, r);
}
Personal Stuff:
You can feel free to ignore this section.
Don't like snake_case
(In C++). I don't think I have seen any C++ code that uses it.
This is not portable.
#pragma once
Prefer to use proper include guards. That way it will work on all conforming compilers.
Threadingness
I think others have covered the threading issues. I will ignore anything to do with it (because it is hard and I don't want to think that hard :-)
struct control_block
{
explicit control_block(T* payload)
: payload_(payload)
{
}
std::atomic<int> usages_{1};
std::atomic<int> weak_usages_{0};
T* payload_{nullptr};
};
Extra:
I wrote a lot about creating a smart pointer here:
-
\$\begingroup\$ Upvoted for the critical bug. Some of the other points are debatable. \$\endgroup\$Davislor– Davislor2023年09月26日 16:25:29 +00:00Commented Sep 26, 2023 at 16:25
-
2\$\begingroup\$ Comment on
snake_case
... google c++ style guide says to use snake_case. a lot of places do it that way ime as well. google.github.io/styleguide/cppguide.html#Variable_Names \$\endgroup\$g19fanatic– g19fanatic2023年09月26日 17:36:01 +00:00Commented Sep 26, 2023 at 17:36 -
1\$\begingroup\$ @g19fanatic I generally see snake_case in C programming, personally I use camelCase in C++. \$\endgroup\$2023年09月26日 17:46:54 +00:00Commented Sep 26, 2023 at 17:46
-
\$\begingroup\$ @g19fanatic Though Google Style guide has improved over the years, the last time I looked I still not agree with a lot of stuff. I personally don't think it is a good guide and don't recommend it. Maybe it has improved since I last looked. \$\endgroup\$Loki Astari– Loki Astari2023年09月26日 18:36:45 +00:00Commented Sep 26, 2023 at 18:36
-
\$\begingroup\$ @g19fanatic But in this specific instance, the snake case is just a personal preference. I don't think there is anything wrong apart from it being slightly strange looking (for C++). \$\endgroup\$Loki Astari– Loki Astari2023年09月26日 18:40:04 +00:00Commented Sep 26, 2023 at 18:40
Much of the same feedback I’ve given on other shared_ptr
implementations applies here. In particular:
This design of control_block
is Never Going to Work
You’re unfortunately going to need to start over from scratch.
You Need One Lock-Free Atomic Update of Both Counters
Your stated goal, appropriately, is a lock-free, thread-safe data structure. The counters you have now are:
std::atomic<int> usages_{1};
std::atomic<int> weak_usages_{0};
First, int
is a signed type, and negative values are always errors, so it makes more sense to use unsigned
types for the counter and get better range. Second, int
has been anything from 16-bit to 64-bit on real-world implementations, which makes it a bit of a newbie trap. Here, there are real compilers where std::atomic<int>
would overflow to a negative value after only 32,767 copies. Third, and most importantly, two separate atomic variables cannot meet the requirements.
If you have two separate atomic
counters, you can only update them with two separate atomic transactions. If one thread is decrementing usages
and another is decrementing weak_usages
at the same time, there is no guarantee that one and only one thread will see both counters equal to 0, which is neccessary to properly free the control block. Memory ordering will not help: sequential consistency only applies to updates from the same thread. For example, the thread scheduler could choose to suspend your thread in between the instructions that check the two counters.
Therefore, you might pack the two counters (which should be an unsigned type at least 32 bits wide) into a single atomic variable, and update both values with a single atomic compare-and-swap. Possible implementation choices: a bitfield containing two 32-bit fields; an atomic struct
of two uint32_t
values with alignas(8)
, or a std::atomic<uint_least64_t>
with some utility functions to update either value. All mainstream x86_64
compilers will also generate cmpxchg16b
instructions if you select an instruction set that has them and use a naturally-aligned atomic struct
of two 64-bit values.
Whatever you choose, you may want to static_assert
that your atomic counter-pair is_always_lock_free
.
Edit: That isn’t the only approach that works. It’s possible to fix the algorithm along the lines of Deduplicator’s suggestions, by among other things:
Changing lines such as
--control_->usages_;
if (control_->usages_ == 0)
to the atomic
if (--control->usages == 0)
And having the functions that increment the strong count (such as copy construction) do so only if it is not 0
, in a single atomic operation. This could allow the destructors to use an algorithm with one or two decrement operations rather than a CAS loop, which might be a more-optimal wait-free algorithm on some architectures such as x86, or less efficient on other architectures where the atomic decrements would need to be two store-conditional loops rather than one.
You Cannot Store the Data Pointer in the Control Block
This implementation will not meet the requirement (in [util.smartptr.shared.const]) that a shared_ptr<T>
constructed from a shared_ptr<Y>
, where T*
and Y*
are compatible types, share ownership of the object. In particular, the T*
payload
can have a different object representation than the Y*
from which it is constructed. (For example, the virtual
base class of a derived class with multiple inheritance.)
Doing it that way will make each shared_ptr
slightly smaller, but at the cost of requiring an extra dereference each time it accesses the data pointer. Since a shared_pointer
will access its data often and its counters rarely, it makes sense anyway to optimize for the common case by putting both the control_block
and payload
pointers in the shared_ptr
object itself. On many architectures, a shared_ptr
will still fit in a register and a std::atomic<shared_ptr>
will still be lock-free. (On some, forcing natural alignment of the shared_ptr
will generate slightly-more-optimal code.)
Comparisons
In general, your comparison operators should be non-member friend
functions, so that you can properly resolve overloads. For example, you might want it to be possible to compare a shared_ptr
to a different type of pointer.
If you implement the <=>
spaceship operator, you must also implement ==
separately (for complex technical reasons involving inheritance and overriding). The compiler will then generate the remaining comparison operators for you.
Use Copy-and-Swap
You mention that you omitted swap
because there’s "not much to learn from" it. Granted, this is true for now, because your class trivially wraps a pointer to the control block. You then say you want to implement the required overload of std::atomic<shared_ptr>
.
Once you move the payload
from shared_ptr::control_block
into shared_ptr
, this is going to stop being so trivial. You’ll now find it very useful to have a way to update the object that is atomic, lock-free and exception-free. You’re currently calling std::swap
on the control-block pointers in your move constructor and assignment operator. If you’d written swap
and used that in both places, you would not need to change them now, when you’re forced to alter the contents of shared_ptr
.
There Are Other Thread-Safety Bugs
You mention that you’re aware of some race conditions. As G. Sleipen mentions in more detail, a great many of these are caused by writing multiple separate atomic operations, when what you need is a single atomic operation (usually an atomic compare-and-swap). Read their advice, too.
Future Directions: get_deleter
You haven’t implemented custom deleters at all, but since you say you’re thinking about them: when you do, it might be very helpful to turn control_block
into an abstract base class that concrete_control_block<Deleter>
derives from. (Since the shared pointer no longer stores its payload in the control block, the abstract interface no longer needs to remember what T
is. The counter-manipulation and destroying the control block are both fully generic. However, if you are implementing get_deleter
, it is useful to remember the type of the stored deleter and store a copy of the original payload.) That lets you implement a helper such as
template<typename T, typename Deleter>
// requires clauses go here.
virtual void* concrete_control_block<T, Deleter>::get_deleter_helper(const std::type_info type) const override noexcept
which you can then call from get_deleter<Deleter, T>
, as
static_cast<Deleter*>(sp.control_->get_deleter_helper(typeid(Deleter)))
The simpler alternative you mentioned in your comment will work too: store a std::function<void(T*)>
in control_block<T>
and implement get_deleter
by calling std::function::target
.
-
\$\begingroup\$ Davislor wrote: "You’re unfortunately going to need to start over from scratch.". I plan to address the issues found. Is is then better to create new question (and link it here from a comment) or somehow edit this question (and how)? Note: I'm new to StackExchange Code review. \$\endgroup\$Tomas Tintera– Tomas Tintera2023年09月27日 11:08:40 +00:00Commented Sep 27, 2023 at 11:08
-
\$\begingroup\$ @TomasTintera Okay, that was an overstatement. The part I originally recommended you "start over from scratch" was
control_block
. Then Deduplicator showed up and corrected me that there is a way to get thread-safety with separate atomic counters in the control block. You might use either approach; it’s up to you. To implement all the requirements (includingget_deleter
), you will definitely need to add several other pieces of data to the control block, as well as a copy of the object pointer to eachshared_ptr
instance. \$\endgroup\$Davislor– Davislor2023年09月27日 14:57:29 +00:00Commented Sep 27, 2023 at 14:57 -
\$\begingroup\$ @TomasTintera But making a new second draft sounds like a great idea! \$\endgroup\$Davislor– Davislor2023年09月27日 14:59:47 +00:00Commented Sep 27, 2023 at 14:59
-
\$\begingroup\$ If you replace the payload pointer with a deleter
std::function
or similar, then you could keep that in the control block. (But then I guess you do need a payload pointer in theshared_ptr
/weak_ptr
to keep track of what subobject the current smart pointer object refers to.) \$\endgroup\$Daniel Schepler– Daniel Schepler2023年09月27日 16:37:12 +00:00Commented Sep 27, 2023 at 16:37 -
\$\begingroup\$ @DanielSchepler In order to implement
get_deleter
, you need to store both the deleter in the control block, as well as either a copy of the original object pointer or the object itself. This is in addition to theT*
that you need to store in theshared_ptr
instance, because the conversion fromshared_ptr<Y>
toshared_ptr<T>
could require aT*
different from the originalY*
to share the same control block. \$\endgroup\$Davislor– Davislor2023年09月27日 16:49:07 +00:00Commented Sep 27, 2023 at 16:49
Make sure your operations are really atomic
Using atomic variables does not automatically make the operations on your objects atomic. Consider these lines in finish_one_instance_()
:
--control_->usages_;
if (control_->usages_ == 0)
...
Those are two separate atomic operations. Together they are no longer atomic. And this immediately leads to a race condition: consider two threads, each holding a shared_ptr
that is pointing to the same control_block
. Now they both destroy their shared_ptr
object at exactly the same time. Then they first could both execute --control_->usages_
, and only then do they both execute the if
-statement. Both will now see that control_->usages_ == 0
, and a double free will happen that at best crashes your program, at worst causes unexpected things to happen.
You can fix this by writing this instead:
if (--control_->usages_ == 0)
...
This was an easy fix. But the same mistake happens when calling lock()
on a weak_ptr
:
return expired() ? shared_ptr<T>() : shared_ptr<T>(*this);
Don't let the fact that this is written on one line fool you. It is equivalent to:
if (expired())
return shared_ptr<T>();
else
return shared_ptr<T>(*this);
Consider that expired()
is executed and returns false
. However, another thread can then jump in and causes the usages_
to drop to zero. Then the first thread calls shared_ptr<T>(*this)
. Sure, the constructor of shared_ptr
that takes a weak_ptr
also checks expired()
, but this has, again, exactly the same problem.
The fix here is more complicated. You must increment usages_
atomically, but only if it isn't zero. You have to use a compare-and-exchange method in a loop here.
Finally, it can happen that one thread destroys the last remaining shared_ptr
, and another the last remaining weak_ptr
. See Davislor's answer about the issue and possible fix for that.
weak_ptr::lock()
should be noexcept
The standard says that lock()
is noexcept
. This should not be hard to do, just make sure you don't throw
anywhere when trying to lock an expired()
object.
Currently, your two use-counts are handled completely independent, the pitfalls of which others already explained (namely, if they are decremented to zero at the same time, who did it last?).
A better fix than stuffing them together or some such and hoping you get an efficient lock-free update for the resultant bigger structure, just make the existence of any strong claim at all hold down a single weak claim (properly counted in the member storing the weak count, like all others). Thus, you start out with a single strong claim and the single weak claim all strong claims collectively own.
Last strong claim removed means cleaning up the resource and then releasing the collective weak claim.
Last weak claim removed means cleaning up the control block.
No operation involving strong claims looks at weak claims, aside from the removal of the last strong claim (removing the collective weak claim at the end, proper sequencing is crucial).
No operation involving weak claims looks at strong claims, aside from trying to add a strong claim while only holding a weak one (which doesn't look at weak claims).
-
\$\begingroup\$ Let us continue this discussion in chat. \$\endgroup\$Davislor– Davislor2023年09月26日 22:13:46 +00:00Commented Sep 26, 2023 at 22:13
std::enable_shared_from_this
. \$\endgroup\$shared_ptr const&
, because you know with 100% certainty that, whatever else may be going on the universe,control_->usages_
is always at least 1... because you haveother
right there. You are correct that there is a race condition in theweak_ptr
copy constructor, but that can be easily fixed by:weak_ptr(weak_ptr const& r) noexcept : weak_ptr{r.lock()} {}
. Generally, making more of a shared pointer is easy, safe, and efficient... making less of a shared pointer is considerably trickier, and costlier. \$\endgroup\$