I have implemented a simple binary search tree class in C++ using std::unique_ptr
objects to hold the pointers to each node. In doing this I have come across a situation that is somewhat questionable in the Delete
method. Since I can't copy the unique pointers, but I need a way to traverse the tree with a "temporary" pointer, I had to resort to using a raw pointer to do the traversal.
This is the section I am talking about (In the Delete
method):
// Need to use a raw pointer because I can't assign to a std::unique_ptr
Node* n = node.get();
while (n->right) {
// Using a reference to a unique_ptr instead of a raw pointer will cause this line to fail to compile
n = n->right.get();
}
Is this an acceptable thing to do?
It compiles and runs. And since the underlying pointer is still managed by a unique_ptr object it will be deleted properly. Everything about this works fine, but it just doesn't feel right for some reason.
The full source code is shown below (Feel free to comment on anything else that may be improved as well):
#include <iostream>
#include <memory>
using std::make_unique;
using std::unique_ptr;
template <typename T>
class Tree {
struct Node {
Node(const T& value)
: value(value), left(nullptr), right(nullptr) {}
T value;
unique_ptr<Node> left;
unique_ptr<Node> right;
};
public:
Tree() : root_(nullptr) {}
// Insert a value into the tree
void Insert(const T& value) {
Insert(root_, value);
}
// Delete a value from the tree
bool Delete(const T& value) {
return Delete(root_, value);
}
// Search the tree for a node and return true if it is found
bool Contains(const T& value) const {
return Contains(root_, value);
}
private:
void Insert(unique_ptr<Node>& node, const T& value) {
if (not node) {
node = make_unique<Node>(value);
}
else {
value < node->value
? Insert(node->left, value)
: Insert(node->right, value);
}
}
bool Delete(unique_ptr<Node>& node, const T& value) {
if (not node) {
return false;
}
else if (value == node->value) {
if (node->left) {
unique_ptr<Node>& right = node->right;
node = move(node->left);
if (right) {
// Need to use a raw pointer because I can't assign to a std::unique_ptr
Node* n = node.get();
while (n->right) {
// Using a reference to a unique_ptr instead of a raw pointer will cause this line to fail to compile
n = n->right.get();
}
n->right = move(right);
}
}
else {
node = move(node->right);
}
return true;
}
else {
return value < node->value
? Delete(node->left, value)
: Delete(node->right, value);
}
}
bool Contains(const unique_ptr<Node>& node, const T& value) const {
if (not node) {
return false;
}
else if (node->value == value) {
return true;
}
else {
return value < node->value
? Contains(node->left, value)
: Contains(node->right, value);
}
}
unique_ptr<Node> root_;
};
4 Answers 4
On raw and smart pointers
First, I'll address your specific question, about using a raw pointer to refer to nodes in the tree. I believe you're doing exactly the right thing here, because smart pointers convey information about ownership of the resource.
We are looking at nodes that are owned by the tree (actually, each node is owned by its parent), and it's entirely appropriate that that ownership is encoded as a std::unique_ptr
; that says that this is the only place that's responsible for deleting the node when it's no longer required.
Our Delete
method walks nodes from the root downwards; it needs to refer to the nodes, but only during the execution of the method (it doesn't store pointers anywhere). A raw pointer is perfectly suitable here.
General review
Is <iostream>
used anywhere? I think it can be omitted.
If this is a header file, avoid bringing std::make_unique
and std::unique_ptr
into the global namespace, as that will affect every file that includes the header. It's reasonable to do so inside the function bodies, but at that point it's likely better to simply use the qualified names. Including them within the scope of the type is a middle ground where opinions will differ.
We should include <utility>
for std::move
(and the same comments apply to importing that name into ::
).
Consider turning tests around to test a positive first, which slightly reduces cognitive load. Instead of:
if (not node) { node = make_unique<Node>(value); } else { value < node->value ? Insert(node->left, value) : Insert(node->right, value); }
This is easier to reason about:
if (node) {
value < node->value
? Insert(node->left, value)
: Insert(node->right, value);
}
else {
node = make_unique<Node>(value);
}
Actually, that ternary can be narrowed to within the argument list (you can add more parens according to taste):
if (node) {
Insert(value < node->value ? node->left : node->right, value);
}
else {
node = make_unique<Node>(value);
}
On a related note, if we return early from within if
, we can just flow into the next code without else
:
if (not node) { return false; } else if (node->value == value) { return true; } else { return value < node->value ? Contains(node->left, value) : Contains(node->right, value); }
That can become
if (not node) {
return false;
}
if (node->value == value) {
return true;
}
return Contains(value < node->value ? node->left : node->right, value);
We might be able to remove some repetition by adding a method to find the insertion point for a value - Contains()
can return true if that insertion point has a value that compares equal, and Delete()
can start from that point in removing the value.
If the tree could be large, then perhaps iteration may be better than recursion for walking the tree. That means we'll need pointers in some places where we currently have references, and some things may have to lose const
qualification, but may give a performance improvement and save on stack usage (or may not, depending on how your compiler manages to optimise the recursive call - always check these things!).
Further directions
Try to evolve the public interface to supporting standard container operations. This probably begins with creating iterators and the member functions that return them.
Consider maintaining a balanced tree - you'll want to read up on the theory if you're not already familiar with the techniques to do this.
You haven't shown your unit tests here, but it's worthwhile occasionally running them with a memory checker such as Valgrind. As well as ensuring we don't leak memory, that will also help identify any uses of dangling pointers or uninitialized values, which an ordinary run might not.
-
\$\begingroup\$ How would I go about turning the recursion into iteration? I've heard that it can be much more efficient to do things that way but I can't really see how it should be done. Are there any good examples on how to do this? \$\endgroup\$tjwrona– tjwrona2019年03月06日 19:12:10 +00:00Commented Mar 6, 2019 at 19:12
-
\$\begingroup\$ And as for the other comments - iostream is used later in the file for some testing routines I wrote but you're right, it is not needed for the tree. Also the tree class will eventually be in a header so I will remove the using statements. \$\endgroup\$tjwrona– tjwrona2019年03月06日 19:17:52 +00:00Commented Mar 6, 2019 at 19:17
-
\$\begingroup\$ Additionally it seems in Visual Studio I don't need
#include <utility>
orusing std::move
in order to callmove
... The code compiles and runs fine... that's a little scary. \$\endgroup\$tjwrona– tjwrona2019年03月06日 19:17:56 +00:00Commented Mar 6, 2019 at 19:17 -
\$\begingroup\$ Worthwhile reading on SO: Way to go from recursion to iteration \$\endgroup\$Toby Speight– Toby Speight2019年03月06日 20:31:19 +00:00Commented Mar 6, 2019 at 20:31
-
\$\begingroup\$ I looked into replacing the recursion with iteration, but for binary trees the code starts to get pretty messy (you need to manage a lot of pointers to pointers). For the sake of readability I feel recursion is the better approach in this case. I don't foresee myself using it in a situation with huge datasets that could cause stack overflows. \$\endgroup\$tjwrona– tjwrona2019年03月11日 17:39:34 +00:00Commented Mar 11, 2019 at 17:39
Here are some ideas to help you improve your program.
Fix the bug
If we create a Tree<std::string> t
and then insert "epsilon", "gamma", "beta", "delta", "alpha", "mu", and "kappa" in that order, we have a tree like this:
If we then call t.Delete("beta");
we invoke undefined behavior and mess up the tree by effectively deleting two nodes (both "beta" and "delta" are now gone):
Use raw pointers
The usual advice is to replace raw pointers with smart pointers, but in this particular case, I would advise going the other way for a number of reasons. First, the std::unique_ptr
is not part of the public interface of the tree, so changing it to a raw pointer would not affect users of the tree at all. Second, the smart pointer is only getting in the way here, since it would be much simpler to write this code with plain pointers. Third, a plain pointer version would better allow you to have a single private concrete implementation class using a void *
for the data. The template would then only have a very light wrapper to convert to and from void *
to type T
. This makes it certain that if you have multiple kinds of trees, there will only be one code instance that actually implements the tree manipulations and each templated version will only have a small bit of code overhead per type.
Avoid polluting the global namespace with using
The using std::make_unique
and using std::unique_ptr
in this code are at file scope. If this is intended to be an include file, then that effectively brings those to into the global scope which could cause name collisions. Instead, you could simply remove those and simply add std::
where appropriate or omit them entirely if you follow the previous suggestion.
Don't define a constructor that only initializes data
The only thing your Tree
constructor does is to initialize root_
to a fixed value. It's generally better to instead use member initializers instead. See Cpp Guidelines C.45. I'd also simplify the Node
constructor by using member initializers for left
and right
.
Provide a usable interface
The way it's currently defined, it's not possible to do anything at all with nodes once they are stored in the Tree
except to determine whether they're stored in the tree. That's unlikely to be a useful interface in the real world.
Use only necessary #include
s
The #include <iostream>
line is not necessary and can be safely removed.
Consider using more idiomatic C++
While your use of not
is not technically wrong, you should be aware that many experienced C++ programmers will be unfamiliar with alternative operators, and so if others read your code, it might be an impediment to their understanding of it.
-
\$\begingroup\$ Could you expand on the idea of using
void *
in the implementation and then only having a template wrapper? From what I understand templates should always be preferred over usingvoid *
but maybe I have missed the point. Isvoid *
fine to use as long as it doesn't appear in the public interface? \$\endgroup\$tjwrona– tjwrona2019年03月08日 14:59:17 +00:00Commented Mar 8, 2019 at 14:59 -
\$\begingroup\$ I've fixed the bug. Thanks for pointing it out :) \$\endgroup\$tjwrona– tjwrona2019年03月08日 15:27:01 +00:00Commented Mar 8, 2019 at 15:27
-
\$\begingroup\$ The use of
void *
as the data type is just one way to have a concrete implementation which lies beneath (and is hidden under) a templated class. Imagine creating theTree
with a concrete type such asint
. It's usually easier to make sure everything works. Then once it works, change tovoid *
and you can store pointers to anything instead. There is very little reason to usevoid *
in modern C++, but IMHO, using them within a class is fine. Most implementations ofoperator new
use it just that way, for example. \$\endgroup\$Edward– Edward2019年03月08日 15:55:09 +00:00Commented Mar 8, 2019 at 15:55
unique_ptr<Node>& right = node->right;
node = move(node->left); // it will destroy the previous node and right nodes
if (right) // right will be nullptr as the right nodes would have been destructed in the previous line
{...}
In your Delete() function, this logic seems to be flawed. When you move your node->left into node, then all the existing right nodes will be destroyed because you just created a reference of unique_ptr which doesn't stop the following move operation from deleting these nodes.
if (right) {...}
So, in effect the above condition will never be true and also, you will lose the right nodes.
The correct way to do this will be to move the right nodes into a local variable something like this :
auto right = std::move(node->right);
node = move(node->left);
if (right) // Now, right will contain all the right nodes
{...}
Missing include guard.
#ifndef TJWRONA_CONTAINER_BINARY_SEARCH_TREE_H #define TJWRONA_CONTAINER_BINARY_SEARCH_TREE_H // code #endif // TJWRONA_CONTAINER_BINARY_SEARCH_TREE_H
#include
what you use. Don't#include
what you don't. You don't useiostream
in your code. You do usememory
(std::unique_ptr
) andutility
(std::move
).#include <memory> #include <utility>
Don't pollute the global namespace. Wrap your code in a namespace.
namespace container { // code } // namespace container
std::make_unique
doesn't exist in C++11 due to an oversight. It was introduced in C++14. You can implement it easily in C++11.Use
using
-directives at narrower scopes.namespace container { using std::unique_ptr; // unique_ptr to all code within container namespace. } // namespace container // container::unique_ptr to anyone outside the container namespace.
You can also alias types into logical names for simplification.
namespace container { template <typename T> struct Tree { private: struct Node; using link_type = std::unique_ptr<Node>; using value_type = T; struct Node { link_type prev_; link_type next_; value_type value_;
Give entities names that better describes what they represent.
template <typename ElementType> class BinarySearchTree {
When initializing data members to constant values, prefer in-class member initialization.
std::unique_ptr
default-initializes its value tonullptr
.Provide a constructor to initialize the object into a valid state. If you define a constructor yourself, you will need to explicitly define a default constructor if you want one.
struct Node { Node(value_type const& value) : value_(value) {} Node(value_type&& value, link_type&& left, link_type&& right) : value_(std::move(value)), left_(std::move(left)), right_(std::move(right)) {} Node(value_type const& value, link_type&& left, link_type&& right) : value_(value), left_(std::move(left)), right_(std::move(right)) {} value_type value_; link_type left_; link_type right_; }; // No other constructors for BinarySearchTree(). // The compiler will implicitly create one for you. // You can explicitly define a default constructor if you choose. // BinarySearchTree() = default; link_type root_;
Consider the rule of three/five/zero. Rule of Three/Five/Zero Since you haven't defined any of the five special member functions in your
BinarySearchTree
, this is what happens.- Copy Constructor - Implicitly deleted by the compiler because
std::unique_ptr
has a deleted copy constructor. If you want copy construction, you need to define this yourself. - Copy Assignment - Implicitly deleted by the compiler because
std::unique_ptr
has a deleted copy-assignment operator. If you want copy assignment, you need to define this yourself. - Move Constructor - Automatically generated by the compiler. Behavior is correct for
std::unique_ptr
, transferring ownership upon move. - Move Assignment - Automatically generated by the compiler. Behavior is correct for
std::unique_ptr
, transferring ownership upon move. - Destructor - Automatically generated by the compiler. However, the behavior of
std::unique_ptr
's destructor is a recursive chain of destructor calls for itsleft
andright
children. YourBinarySearchTree
isn't a balanced tree, so it's possible for the tree to take on the characteristic of a singly-linked list (elements inserted in either ascending or descending order). This tree of breadth 1 is going to have N recursive destruction calls. For a sufficiently large enough N, you are going to consume the available stack space and overflow. You need to declare the destructor and define the correct behavior.
By user declaring a destructor, the compiler will no longer implicitly declare/define the move operations. So you will need to declare/define those yourself as well. The copy operations would normally still remain implicitly defaulted, but
std::unique_ptr
being uncopyable leaves them as deleted.BinarySearchTree(BinarySearchTree const&) = delete; BinarySearchTree& operator=(BinarySearchTree const&) = delete; BinarySearchTree(BinarySearchTree&&) = default; BinarySearchTree& operator=(BinarySearchTree&&) = default; ~BinarySearchTree() { // Iterative destruction implementation }
- Copy Constructor - Implicitly deleted by the compiler because
std::shared_ptr
? \$\endgroup\$std::shared_ptr
, but usingstd::unique_ptr
is generally more efficient (theres no extra memory overhead and move operations are faster than copy). I guess using a raw pointer to refer to something owned by astd::unique_ptr
is essentially the same as creating astd::weak_ptr
from astd::shared_ptr
? \$\endgroup\$