As an exercise to get better in Python, I decided to implement a Binary Search Tree. This implementation uses two classes: BSTNode and BSTree.
Because most methods were implemented using recursion, I created a public method, for most of them, that wraps a private one, which is in charge of making the real work. I did that, because I didn't wanted to give the user the responsibility of passing the root of the tree to every method call. With that, the user can write my_tree.insert(10)
instead of my_tree.insert(my_tree.root, 10)
.
Is this a reasonable way of solve this problem? Is there a better one? Are private methods and attributes commonly used in Python?
The implementation:
class BSTNode(object):
""" Node class used by the Binary Search Tree. """
def __init__(self, value):
self.value = value
self.size = 1
self.left = None
self.right = None
def compute_size(self):
""" Computes the `self` size according to its children sizes. """
self.size = 1
if self.left:
self.size = self.size + self.left.size
if self.right:
self.size = self.size + self.right.size
class BSTree(object):
"""
Binary Search Tree implementation which doesn't allow repeated
Nodes.
"""
def __init__(self):
self.__root = None
self.__size = 0
###############
# Public API #
###############
def pre_order_traversal(self, fn=None):
"""
Traverse tree in pre-order and apply `fn` to every Node value.
"""
if fn is None:
def fn(x): return print(x)
self.__pre_order_traversal(self.__root, fn)
def in_order_traversal(self, fn=None):
"""
Traverse tree in in-order and apply `fn` to every Node value.
"""
if fn is None:
def fn(x): return print(x)
self.__in_order_traversal(self.__root, fn)
def post_order_traversal(self, fn=None):
"""
Traverse tree in post-order and apply `fn` to every Node value.
"""
if fn is None:
def fn(x): return
print(x)
self.__pre_order_traversal(self.__root, fn)
def insert(self, value):
""" Inserts a Node. Doesn't insert duplicated values."""
if self.__root is None:
self.__root = BSTNode(value)
else:
self.__root = self.__insert(self.__root, value)
self.__size = self.__root.size
def remove(self, value):
"""
Removes a Node which contains the value `value`.
To remove a Node, three cases must be handled.
Case 1: leaf node
-> delete it
Case 2: node has one child
-> delete node and put its child in its place
Case 3: node has two children
-> delete node and put its smallest child from its right branch in its place
"""
if self.__root:
self.__root = self.__remove(self.__root, value)
def contains(self, value):
""" Returns True if `value` is found. """
return self.__contains(self.__root, value)
def size(self):
""" Returns the number of elements inside the BST. """
return self.__size
###############
# Private API #
###############
def __pre_order_traversal(self, node, fn):
if node is None:
return
fn(node.value)
if node.left:
self.__pre_order_traversal(node.left, fn)
if node.right:
self.__pre_order_traversal(node.right, fn)
def __in_order_traversal(self, node, fn):
if node is None:
return
if node.left:
self.__in_order_traversal(node.left, fn)
fn(node.value)
if node.right:
self.__in_order_traversal(node.right, fn)
def __post_order_traversal(self, node, fn):
if node is None:
return
if node.left:
self.__post_order_traversal(node.left, fn)
if node.right:
self.__post_order_traversal(node.right, fn)
fn(node.value)
def __insert(self, node, value):
if node is None:
return BSTNode(value)
if node.value > value:
node.left = self.__insert(node.left, value)
elif node.value < value:
node.right = self.__insert(node.right, value)
node.compute_size()
return node
def __remove(self, node, value):
if node.value == value:
# Case 1
if node.left is None and node.right is None:
return None
# Case 2
elif node.left and node.right is None:
return node.left
# Case 2
elif node.left is None and node.right:
return node.right
# Case 3
else:
parent_node = node
smallest_node = node.right
while smallest_node.left:
parent_node = smallest_node
smallest_node = smallest_node.left
# The right Node is the smallest one
if parent_node == node:
smallest_node.left = node.left
# The smallest Node was found to the left of its right branch
else:
parent_node.left = smallest_node.right
smallest_node.left = node.left
smallest_node.right = node.right
return smallest_node
elif node.value > value and node.left:
node.left = self.__remove(node.left, value)
elif node.value < value and node.right:
node.right = self.__remove(node.right, value)
node.compute_size()
return node
def __contains(self, node, value):
if node.value == value:
return True
if node.value > value and node.left:
return self.__contains(node.left, value)
if node.value < value and node.right:
return self.__contains(node.right, value)
return False
if __name__ == "__main__":
tree = BSTree()
values = [100, 50, 150, 25, 75, 120, 200, 110, 115]
for v in values:
tree.insert(v)
print("####")
tree.pre_order_traversal()
tree.remove(100)
print("####")
tree.pre_order_traversal()
2 Answers 2
Bugs
You have a size-tracking bug. Also, the .contains()
method crashes on an empty tree.
>>> t = BSTree()
>>> t.size()
0
>>> t.insert(3)
>>> t.size()
1
>>> t.remove(3)
>>> t.size()
1
>>> t.contains(3)
Traceback (most recent call last):
... line 170, in __contains
if node.value == value:
AttributeError: 'NoneType' object has no attribute 'value'
Interface
It would be nice if the return value of the .insert()
and .remove()
methods indicated whether the operations succeeded or failed.
Furthermore, the size of the tree can either increase by 1 if .insert()
succeeds, decrease by 1 if .remove()
succeeds, or remain the same if either fails. There is no need for BSTNode
to have a .compute_size()
method.
I suggest defining .size()
as a read-only property, so that users can write t.size
instead of t.size()
.
Use a single underscore prefix to indicate private members. Double underscore is for name mangling.
Be consistent in your docstring wording. PEP 257 says that you should prefer the imperative to the indicative:
- The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".
Traversal API
You mis-wrote the preamble for .post_order_traversal()
:
if fn is None: def fn(x): return print(x)
But even when written correctly, the preamble is silly. In Python 3, print
is a function, so you could just make it the default value:
def post_order_traversal(self, fn=print):
...
For even more idiomatic Python, avoid passing fn
as a callback. Instead, do the traversal as a generator function, yield
ing each value. The caller would write instead:
for value in tree.post_order_traversal():
print(value)
Private helper methods
Each of your private helper method exists to serve one corresponding public method. For situations like this, just write an inner function, to reduce clutter, and to avoid having to jump around when reading the code.
class BSTree:
...
def pre_order_traversal(self):
def pre_order(node):
if node is None: return
yield node.value
yield from pre_order(node.left)
yield from pre_order(node.right)
yield from pre_order(self._root)
...
Suggested solution
For brevity, I have omitted the docstrings.
I've fixed the crash in .contains()
, and rewritten its helper method to be more compact.
In __remove()
, cases 1 and 2 can be condensed down to two lines. I've also renamed smallest_node
to more clearly indicate that it refers to the next larger node in the tree.
class BSTNode:
def __init__(self, value):
self.value = value
self.left = self.right = None
class BSTree:
def __init__(self):
self._root = None
self._size = 0
def pre_order_traversal(self):
def pre_order(node):
if node is None: return
yield node.value
yield from pre_order(node.left)
yield from pre_order(node.right)
yield from pre_order(self._root)
def in_order_traversal(self):
def in_order(node):
if node is None: return
yield from in_order(node.left)
yield node.value
yield from in_order(node.right)
yield from in_order(self._root)
def post_order_traversal(self):
def post_order(node):
if node is None: return
yield from post_order(node.left)
yield from post_order(node.right)
yield node.value
yield from post_order(self._root)
@property
def size(self):
return self._size
def contains(self, value):
def _contains(node, value):
return (
False if node is None else
_contains(node.left, value) if value < node.value else
_contains(node.right, value) if value > node.value else
True
)
return _contains(self._root, value)
def insert(self, value):
def _insert(node, value):
if node is None:
return BSTNode(value)
elif value == node.value:
return None
elif value < node.value:
node.left = _insert(node.left, value)
elif value > node.value:
node.right = _insert(node.right, value)
return node
self._root = _insert(self._root, value)
if self._root:
self._size += 1
return self._root is not None
def remove(self, value):
def _remove(node, value):
if node.value == value:
if not (node.left and node.right):
return node.left or node.right, True
else:
# Replace the node with its next larger successor
successor, parent = node.right, node
while successor.left:
successor, parent = successor.left, successor
successor.left = node.left
if parent != node:
parent.left = successor.right
successor.right = node.right
return successor, True
elif value < node.value and node.left:
node.left, removed = _remove(node.left, value)
return node, removed
elif value > node.value and node.right:
node.right, removed = _remove(node.right, value)
return node, removed
return node, False
if self._root is None:
return False
self._root, removed = _remove(self._root, value)
self._size -= int(removed)
return removed
-
\$\begingroup\$ Thank you for taking the time to give me such a thoughtful review. I really liked the idea of returning tuples to indicate an operation's result and using generators instead of callback functions. \$\endgroup\$rcanepa– rcanepa2017年09月17日 15:20:56 +00:00Commented Sep 17, 2017 at 15:20
self.size = self.size + self.left.size
This is more naturally expressed as self.size += self.left.size
self.__root = None
self.__size = 0
The double underscore invokes name mangling, which can be helpful it you anticipate inheritance. Here, a single underscore would probably suffice.
You organized the source code into Public and Private API sections, which clearly makes some sort of sense. Personally I would have found it more helpful, when reading code, to see each public method immediately followed by its private method. For example, your (excellent) docstrings matter a great deal when trying to understand implementations, but one must scroll to find them.
Methods like pre_order_traversal() are already using nested functions. Private helpers that start with _
are perfectly nice, but you might consider using the alternative of hiding each helper by making it a nested function.
values = [100, 50, 150, 25, 75, 120, 200, 110, 115]
Thank you for including test code that exercises the target code. That is certainly helpful. It would be more helpful to include a unittest.TestCase with good measured code coverage of the target code.
Overall, it looks solid. Ship it!
self
. They would probably be better as methods on the node class, such thatBSTree.foo
callsself.root.foo()
, etc. \$\endgroup\$