I've recently made an attempt at creating a binary search tree class that allows for duplicate elements. I've included as many functions as I can think of so that I can use it later on.
I've tested the code and it works well, without any memory leaks when using -fsanitize=address.
I was wondering how my code looks overall, and if there were any ways I could improve / change the code. As a side note I do plan to implement a self balancing tree next.
Here is the code
#include <iostream>
#include <vector>
#include <iterator>
#include <functional>
template <class T>
class binary_search_tree {
private:
// tree node
struct bst_node {
// value stored at node
T value;
/* include a variable to keep count of
how many of this value there are. That way
we can have duplicates values in the tree */
int value_count;
// pointers to children and parent
bst_node *left;
bst_node *right;
bst_node *parent;
// node constructor
explicit bst_node(T value) {
this->value = value;
value_count = 1;
left = nullptr;
right = nullptr;
parent = nullptr;
}
};
// pointer to the root
bst_node *root;
// number of nodes
int amount_of_nodes;
// searches for given balue in bst
bst_node * search(T value);
// calculate max value from given node
bst_node * maximum(bst_node *node);
// calculate min value from given node
bst_node * minimum(bst_node *node);
// calculate successor from given node
bst_node * successor(bst_node *node);
// calculate predescessor from given node
bst_node * predecessor(bst_node *node);
// helper function to replace node u with node v
void shift_nodes(bst_node *u, bst_node *v);
/* traverses the bst in pre-order order, visiting node, left, then right.
The */
void preorder_traversal(bst_node *node, std::function<void(T)> callback);
void inorder_traversal(bst_node *node, std::function<void(T)> callback);
void postorder_traversal(bst_node *node, std::function<void(T)> callback);
// function to recursively clear nodes from memory
void delete_recursive(bst_node *node);
public:
binary_search_tree();
binary_search_tree(T value);
binary_search_tree(std::vector<T> &values);
~binary_search_tree();
// indicates whether given value is within bst
bool contains_value(T value);
T bst_max();
T bst_min();
bool insert(T value);
bool remove(T value);
int node_count();
std::vector<T> get_elements_preorder();
std::vector<T> get_elements_inorder();
std::vector<T> get_elements_postorder();
};
// create empty BST
template<class T>
binary_search_tree<T>::binary_search_tree() {
root = nullptr;
amount_of_nodes = 0;
}
// create single node BST
template<class T>
binary_search_tree<T>::binary_search_tree(T value) {
root = new bst_node(value);
amount_of_nodes = 1;
}
// create BST from a vector of values
template<class T>
binary_search_tree<T>::binary_search_tree(std::vector<T> &values) {
// set the node first
root = new bst_node(values[0]);
amount_of_nodes = 1;
// iterate over all but first value and insert them
typename std::vector<T>::iterator it;
for (it = values.begin() + 1; it != values.end(); it++) {
insert(*it);
}
}
template<class T>
binary_search_tree<T>::~binary_search_tree()
{
delete_recursive(root);
}
template<class T>
void binary_search_tree<T>::delete_recursive(bst_node *node) {
if (node) {
delete_recursive(node->left);
delete_recursive(node->right);
delete node;
node = NULL;
}
}
// iterative search find element in BST. Returns nullptr if not found
template<class T>
typename binary_search_tree<T>::bst_node * binary_search_tree<T>::search(T value) {
// return nullptr if empty bst
if (root == nullptr) {
return root;
}
// create raw ptr for searching, pointing to root
bst_node *temp = root;
// iteratively search the bst
while (value != temp->value) {
if (value < temp->value) {
temp = temp->left;
} else {
temp = temp->right;
}
// return nullptr if not found
if (temp == nullptr) {
return temp;
}
}
// return value if found
return temp;
}
// returns whether the value is in the bst or not
template<class T>
bool binary_search_tree<T>::contains_value(T value) {
if (search(value) == nullptr) {
return false;
} else {
return true;
}
}
// find node with the maximum value from the given node
template<class T>
typename binary_search_tree<T>::bst_node * binary_search_tree<T>::maximum(bst_node *node) {
bst_node *temp = node;
while (temp->right != nullptr) {
temp = temp->right;
}
return temp;
}
// find node with the minimum value from the given root
template<class T>
typename binary_search_tree<T>::bst_node * binary_search_tree<T>::minimum(bst_node *node) {
bst_node *temp = node;
while (temp->left != nullptr) {
temp = temp->left;
}
return temp;
}
/* calculate the successor of a given node.
The successor is the node with the smallest
value greater than the given node.
(If all values are placed in order,
it will be the value directly after it) */
template<class T>
typename binary_search_tree<T>::bst_node * binary_search_tree<T>::successor(bst_node *node) {
// return the leftmost child of the right child
if (node->right != nullptr) {
return minimum(node);
}
bst_node *temp = node;
bst_node *parent_node = node->parent;
// navigate up the parent ancestor nodes
while(parent_node != nullptr && temp == parent_node->right) {
temp = parent_node;
parent_node = parent_node->parent;
}
return parent_node;
}
/* calculate the predecessor of a given node.
The predecessor is the node with the largest
value smaller than the given node
(If all values are placed in order,
it will be the value directly before it) */
template<class T>
typename binary_search_tree<T>::bst_node * binary_search_tree<T>::predecessor(bst_node *node) {
// return the rightmost child of the left child
if (node->left != nullptr) {
return maximum(node);
}
bst_node *temp = node;
bst_node *parent_node = node->parent;
// navigate up the parent ancestor nodes
while (parent_node != nullptr && node == parent_node->left) {
temp = parent_node;
parent_node = parent_node->parent;
}
return parent_node;
}
// returns the value of the maximum value in the bst
template<class T>
T binary_search_tree<T>::bst_max() {
return maximum(root)->value;
}
// returns the value of the minimum value in the bst
template<class T>
T binary_search_tree<T>::bst_min() {
return minimum(root)->value;
}
// inserts a new node into the bst
template<class T>
bool binary_search_tree<T>::insert(T value) {
// if the tree is empty just create the node
if (root == nullptr) {
root = new bst_node(value);
// if a root already exists
} else {
bst_node *existing_node = search(value);
// if the value already exists in the tree, just increment the count
if (existing_node != nullptr) {
amount_of_nodes++;
existing_node->value_count++;
return true;
}
// create the new node
bst_node* new_node = new bst_node(value);
// set current node to root
bst_node* current_node = root;
// keep a pointer to the previous node
bst_node* prev_node = nullptr;
// iteratively traverse the tree
while (current_node != nullptr) {
prev_node = current_node;
// lower values to the left, higher values to the right
if (value < current_node->value) {
current_node = current_node->left;
} else {
current_node = current_node->right;
}
}
// assign the pointer to the parent
new_node->parent = prev_node;
// create pointer to current node for parent
if (value < prev_node->value) {
prev_node->left = new_node;
} else {
prev_node->right = new_node;
}
}
amount_of_nodes++;
return true;
}
// helper function to replace node u with v in the bst
template<class T>
void binary_search_tree<T>::shift_nodes(bst_node *u, bst_node *v) {
if (u == nullptr) {
return;
}
// if u is the root node, set v as the root node
if (u->parent == nullptr) {
root = v;
// if u is to the left of it's parent
} else if (u == u->parent->left) {
u->parent->left = v;
// if u is to the right of it's parent
} else {
u->parent->right = v;
}
// make u's parent v's parent
if (v != nullptr) {
v->parent = u->parent;
}
}
// removes node with matching value
template<class T>
bool binary_search_tree<T>::remove(T value) {
// find the node
bst_node *current_node = search(value);
// if value is not found, return
if (current_node == nullptr) {
return false;
}
int node_occurrences = current_node->value_count;
/* make a copy of the node to be removed so we can
delete it from memory afterwards */
bst_node *copy = current_node;
/* if the current node doesn't have a left child
then we can replace it with it's right child.
If it doesn't have a right child it gets set
to nullptr, effectively removing it from the bst */
if (current_node->left == nullptr) {
shift_nodes(current_node, current_node->right);
/* otherwise, if it doesn't have a right child,
replace it with it's left child. If it doesn't
have a left child it gets set to nullptr,
effectively removing it from the bst */
} else if (current_node->right == nullptr) {
shift_nodes(current_node, current_node->left);
/* if the current node has two children, then it's
successor takes it's position */
} else {
// find successor
bst_node *succ = successor(current_node);
/* if the successor isn't the current
nodes immediate child */
if (succ->parent != current_node) {
// replace successor with it's right child
shift_nodes(succ, succ->right);
succ->right = current_node->right;
succ->right->parent = succ;
}
// replace current node with the successor
shift_nodes(current_node, succ);
succ->left = current_node->left;
succ->left->parent = succ;
}
// delete the node from memory
delete copy;
copy = NULL;
amount_of_nodes -= node_occurrences;
return true;
}
// return how many nodes there are in the bst
template<class T>
int binary_search_tree<T>::node_count() {
return amount_of_nodes;
}
// traverse the tree in preorder fashion. Visit the node, then left, then right.
template<class T>
void binary_search_tree<T>::preorder_traversal(bst_node *node, std::function<void(T)> callback) {
if (node == nullptr) {
return;
}
// take into account each occurance of the element
for (int i = 0; i < node->value_count; i++) {
// invokes elements.push_back(node->value)
callback(node->value);
}
preorder_traversal(node->left, callback);
preorder_traversal(node->right, callback);
}
// traverse the tree in inorder fashion. Visit the left, then the node, then the right.
template<class T>
void binary_search_tree<T>::inorder_traversal(bst_node *node, std::function<void(T)> callback) {
if (node == nullptr) {
return;
}
inorder_traversal(node->left, callback);
for (int i = 0; i < node->value_count; i++) {
callback(node->value);
}
inorder_traversal(node->right, callback);
}
// traverse the tree in postorder fashion. Visit the left, then right, then the node
template<class T>
void binary_search_tree<T>::postorder_traversal(bst_node *node, std::function<void(T)> callback) {
if (node == nullptr) {
return;
}
postorder_traversal(node->left, callback);
postorder_traversal(node->right, callback);
for (int i = 0; i < node->value_count; i++) {
callback(node->value);
}
}
// store elements of bst into a vector in preorder order
template<class T>
std::vector<T> binary_search_tree<T>::get_elements_preorder() {
std::vector<T> elements;
/* call preorder_traversal and pass elements.push_back(node_value)
as a lambda function. This way this function can be called
in preorder_traversal via the std::function<void(int)> callback */
preorder_traversal(root, [&](T node_value) {
elements.push_back(node_value);
});
return elements;
}
// store elements of bst into a vector in inorder order
template<class T>
std::vector<T> binary_search_tree<T>::get_elements_inorder() {
std::vector<T> elements;
inorder_traversal(root, [&](T node_value) {
elements.push_back(node_value);
});
return elements;
}
// store elements of bst into a vector in postorder order
template<class T>
std::vector<T> binary_search_tree<T>::get_elements_postorder() {
std::vector<T> elements;
postorder_traversal(root, [&](T node_value) {
elements.push_back(node_value);
});
return elements;
}
3 Answers 3
Overview
Don't follow the rule of three or five and contains a managed resource. So this code is broken if you accidently make a copy of a tree.
If T is large (or expensive) it would be nice to use move semantics to "move" the value into the tree rather than copy the value into the tree.
Your class is not const correct. Non mutating functions should be marked const.
The function:
bool binary_search_tree<T>::insert(T value) {
Is not very effecient.
Personal Preferecnce
Ignore this section if you don't agree. This is my preference and I don't expect there to be a consensus.
Snake case binary_search_tree is not as common in C++. I prefer (and seem to seem more C++) as:
- Upper Camel Case: For user defined types.
BinarySearchTree - Camel Case: For objects and methods.
shiftNodes
CodeReview
Don't write redundant comments.
// pointer to the root
bst_node *root;
// number of nodes
int amount_of_nodes;
// searches for given balue in bst
bst_node * search(T value);
// calculate max value from given node
bst_node * maximum(bst_node *node);
// calculate min value from given node
bst_node * minimum(bst_node *node);
// calculate successor from given node
bst_node * successor(bst_node *node);
// calculate predescessor from given node
bst_node * predecessor(bst_node *node);
The name of the function says exactly the same as the comment. So why. Comment wrote is a real problem in production code so make sure your comments add meaningful information to the source. As a rule of thumb: comments should be "WHY" the code describes "HOW". And naming things is a real skill.
Arguments could be made to create a helper class that does the traversal (See Visitor Pattern).
void preorder_traversal(bst_node *node, std::function<void(T)> callback);
void inorder_traversal(bst_node *node, std::function<void(T)> callback);
void postorder_traversal(bst_node *node, std::function<void(T)> callback);
Why limit yourself to copy a vector.
binary_search_tree(std::vector<T> &values);
Would this not be better to say that any container that support iterators could be copied into the tree.
template<typebname C>
binary_search_tree(C& values);
Why is this returning a bool?
// indicates whether given value is within bst
bool contains_value(T value);
Since you can have more than one value I would expect this to return a count.
Don't return by value. T may be large or expensive to copy out of the object.
T bst_max();
T bst_min();
Return a reference to the object. The caller can then make a choice to decide if they need a local copy that can be manipulated.
Also non mutating methods should be mared const so the compiler can do its best.
T const& bst_max() const;
T const& bst_min() const;
Mars as const (it does not modify the object).
int node_count();
This seems like an expensive way to extract the elements. Why not get an iterator. Then the user of your tree can make the decision if they want to copy into a vector (or another container).
std::vector<T> get_elements_preorder();
std::vector<T> get_elements_inorder();
std::vector<T> get_elements_postorder();
std::vector<T> data{std::begin(tree), std::end(tree)};
Prefer to use initializer lists:
binary_search_tree<T>::binary_search_tree() {
root = nullptr;
amount_of_nodes = 0;
}
I would write:
binary_search_tree<T>::binary_search_tree()
: root{nullptr}
, amount_of_nodes{0}
{}
In this case no difference. But if the members are non trivial then they will have a cost to default iniitalize and then be reset with the assignment operator. By always using the initializer list you make sure you are consistent.
Don't like adding the node in the constructor like this.
binary_search_tree<T>::binary_search_tree(T value) {
root = new bst_node(value);
amount_of_nodes = 1;
}
binary_search_tree<T>::binary_search_tree(std::vector<T> &values) {
// set the node first
root = new bst_node(values[0]);
amount_of_nodes = 1;
// STUFF
}
If you change how the node is created then you have to modify this in multiple places. It would be better to have a function that creates and adds a node and call that from the constructor (like insert).
I would write like this:
binary_search_tree<T>::binary_search_tree(T value)
: binary_search_tree{}
{
insert(value);
}
binary_search_tree<T>::binary_search_tree(std::vector<T> &values)
: binary_search_tree{}
{
for(auto& value: values) {
insert(value);
}
}
This is totally fine in my opinion.
void binary_search_tree<T>::delete_recursive(bst_node *node) {
if (node) {
delete_recursive(node->left);
delete_recursive(node->right);
delete node;
node = NULL; // Note: We use nullptr in modern C++
// but this is not needed.
// I also think this is harmeful as it will
// hide errors that can be detected in debug
// mode (debug libraries may be able to detect
// double deletes).
}
}
But people may complain that that this can potentially blow the stack because of recursion. So you could re-write to use a linear loop.
void binary_search_tree<T>::getBottomLeft(bst_node *node) {
while (node->left) {
node = node->left;
}
return node;
}
void binary_search_tree<T>::delete_linear(bst_node *node) {
if (!node) {
return;
}
bst_node* bottomLeft = getBottomLeft(node);
while (node)
{
if (node->right) {
bottomLeft->left = node->right;
bottomLeft = getBottomLeft(bottomLeft);
}
bst_node* old = node;
node = node->left;
delete old;
}
}
Lots of extra code to test for null can be simplified:
typename binary_search_tree<T>::bst_node * binary_search_tree<T>::search(T value) {
bst_node* temp = root;
while (temp != nullptr && temp->value != value)
{
temp = (value < temp->value) ? temp->left : temp->right;
}
return temp;
}
A classic anti-pattern is:
if (test) {
return true;
}
else {
return false;
}
This can be simplified to:
return test;
We see this anti-pattern here:
bool binary_search_tree<T>::contains_value(T value) {
if (search(value) == nullptr) {
return false;
} else {
return true;
}
}
Let us simplify this to:
bool binary_search_tree<T>::contains_value(T value) {
return search(value) != nullptr;
}
BUG
You need to check for null here:
OR document that is only valid on non empty tree(s) but in that case you also need a method that allows the user to check if the tree is empty.
T binary_search_tree<T>::bst_max() {
return maximum(root)->value;
}
T binary_search_tree<T>::bst_min() {
return minimum(root)->value;
}
What happens if the tree is empty.
Does this function return false?
bool binary_search_tree<T>::insert(T value) {
Also this function is expensive as you do two traversals of the tree. One to search for existance the second to find the leaf node to add the valu under. Since this is not a balanced tree worst case senario is a linear traversal of the whole set.
Given how you have successor etc this could quite easily become a balanced tree with only a little more work.
The new guidence is to pass by value so that we can then move the object value into the tree. But you don't seem to use move semantics so you should be passing by T const& value to prevent an extra copy of the object.
Pass the value by T const& to prvent an extra copy of the value.
bool binary_search_tree<T>::remove(T value) {
int node_occurrences = current_node->value_count;
Misleading comment (this is the kind of comment that will confuse people and lead to comment rote).
/* make a copy of the node to be removed so we can
delete it from memory afterwards */
// YOU ARE NOT MAKING A COPY OF THE NODE!!!!!
bst_node *copy = current_node;
This should be const function
int binary_search_tree<T>::node_count() {
return amount_of_nodes;
}
-
\$\begingroup\$ As for
contains_value(), I would separate this into acontains()which matches the C++20 member function of the same name for STL containers, andcount()which has existed for a much longer time in the STL. \$\endgroup\$G. Sliepen– G. Sliepen2022年06月02日 18:43:38 +00:00Commented Jun 2, 2022 at 18:43
Something hinted at already in the answers of vnp and Martin York: think about the size of bst_node. If T is very small, then you will have the overhead of three pointers and an int for every item in your tree. And this is not counting the overhead from the memory allocator itself for storing metadata about each node you allocated on the heap. You do not really need a parent pointer, and you can get rid of value_count if you just allow multiple nodes with the same value to be in the tree. Of course, there is a trade-off; it will be less efficient then if you have lots of nodes with the same value, but it will be more efficient if you have few of those.
Also consider what happens when T is huge. The overhead of the pointers and value_count is small then, but then making copies of T becomes a performance issue. Make sure you pass parameters and return values as const references where appropriate, not only in the member functions of your classes, but also in the callbacks.
Returning
nullptronsearchfailure is very dubious. Consider what happens ininsert: whensearchfails, thewhile (current_node != nullptr)traverses the same path again, repeating the effort already made bysearch. Consider returning a pair ofparent, nodepointers.Keeping a parent pointer seems unnecessary. Correct me if I am wrong, it is only used in
remove- directly and viasuccessor. However, first thingremovedoes issearch. Keeping in mind the above bullet,searchis in the position to compute and return the parent at no cost.It is also used in
predecessor, but alaspredecessoritself is never used.insertalways returnstrue. Shouldn't it bevoid, and don't bother?An implementation of
contains_valueis anti-idiomatic. It should be justreturn search(value) != nullptr;get_elementsgroup of methods are of a very limited utility. Consider a more general form, as inapply_preorder(std::function<void(T)> callback). Passing a second user-supplied argument also looks like a good idea.BTW, you wouldn't need to
#include <vector>, which looks very out of place here.I don't see the value of maintaining the
amount_of_nodes.Why
#include <iostream>?
-
\$\begingroup\$ Thanks for the feedback. I'll try and implement what you mentioned. Also, the reason for the
amount_of_nodesvariable is for handling duplicate values in the tree. For example rather than having 3 seperate nodes for 3 nodes with the value 11, there will just be 1 node indicating that there are 3 occurances of it. This is also why I havesearchreturnnullptron failure, since if it doesn't returnnullptrthen we can increment theamount_of_nodesvalue instead. In regards to the parent node, wouldn't removing it hinder performance? \$\endgroup\$0xREDACTED– 0xREDACTED2022年06月02日 07:52:07 +00:00Commented Jun 2, 2022 at 7:52 -
1\$\begingroup\$ @G.Sliepen I am not opposed to
contains_value(), The implementation is anti-idiomatic. \$\endgroup\$vnp– vnp2022年06月02日 19:22:50 +00:00Commented Jun 2, 2022 at 19:22 -
\$\begingroup\$ @G.Sliepen Wording was indeed unclear, thanks for pointing it out. Edited. \$\endgroup\$vnp– vnp2022年06月02日 19:40:25 +00:00Commented Jun 2, 2022 at 19:40