I'm working on some code to find the max depth/height in a tree. I came up with a recursive solution, and I find myself needing to pass an integer "by reference". Currently, I'm using a mutable container to achieve this, but is there a more Pythonic way?
class Node:
"""
Node in a tree, with children stored in a list.
"""
def __init__(self, keyParam, childrenParam = None):
self.key = keyParam #some id of the node; not strictly required, but often needed.
if childrenParam is None:
self.children = []
else:
self.children = childrenParam
def dfsWithDepth(node, currentDepth, currentMaxContainer):
currentDepth += 1
if currentDepth > currentMaxContainer[0]:
currentMaxContainer[0] = currentDepth
for child in node.children:
if child is not None:
dfsWithDepth(child, currentDepth, currentMaxContainer)
# %% Construct tree
rootNode = Node('n1')
rootNode.children = [Node('n2'), Node('n3')]
tempNode = rootNode.children[0] # n2
tempNode.children = [Node('n3'), Node('n4')]
tempNode = tempNode.children[1] # n4
tempNode.children = [None, Node('n5')]
currentMaxContainer = [0]
dfsWithDepth(rootNode, 0, currentMaxContainer)
print(currentMaxContainer)
5 Answers 5
Optimization and restructuring
Node
class constructor (__init__
method)
- the arguments
keyParam, childrenParam
introduce unneeded verbosity and better named as justkey
andchildren
- the whole
if ... else ...
condition for ensuringchildren
list is simply replaced withself.children = children or []
expression
The initial dfsWithDepth
function should be renamed to say find_max_depth
to follow PEP8 style guide naming conventions.
The former parameters currentDepth, currentMaxContainer
become unneeded.
Within the crucial recursive function it's worth to capture "empty" node and node with no children beforehand. That's achieved by the following condition:
if node is None or not node.children:
return 1
The remaining recursive search relies on the crucial find_max_depth
and builtin max
functions calls and accumulates the final max depth through traversing all subsequent sub-trees (child nodes).
The final optimized version:
class Node:
"""
Node in a tree, with children stored in a list.
"""
def __init__(self, key, children=None):
self.key = key # some id of the node; not strictly required, but often needed.
self.children = children or []
def find_max_depth(node):
if node is None or not node.children:
return 1
return 1 + max(map(find_max_depth, node.children))
Testing (I've added additional 5th level for demonstration):
# Construct tree
rootNode = Node('n1')
rootNode.children = [Node('n2'), Node('n3')]
tempNode = rootNode.children[0] # n2
tempNode.children = [Node('n3'), Node('n4')]
tempNode = tempNode.children[1] # n4
tempNode.children = [None, Node('n5')]
tempNode = tempNode.children[1] # n5
tempNode.children = [Node('name7')]
print(find_max_depth(rootNode)) # 5
-
\$\begingroup\$ Even better default children to [] in the first place, rather than use a needless None flag value. \$\endgroup\$richardb– richardb2020年01月12日 08:50:32 +00:00Commented Jan 12, 2020 at 8:50
-
9\$\begingroup\$ default
[]
is the #1 Python gotcha. Don't do it! docs.python-guide.org/writing/gotchas \$\endgroup\$Samwise– Samwise2020年01月12日 09:24:03 +00:00Commented Jan 12, 2020 at 9:24 -
\$\begingroup\$ @SamStafford Thanks Sam. I did wonder why nobody else had spotted the apparently obvious but I figured somebody would set me straight soon enough if I'd suggested something dumb. \$\endgroup\$richardb– richardb2020年01月12日 09:32:20 +00:00Commented Jan 12, 2020 at 9:32
-
1\$\begingroup\$ It bites everyone once! :D \$\endgroup\$Samwise– Samwise2020年01月12日 16:55:23 +00:00Commented Jan 12, 2020 at 16:55
-
\$\begingroup\$ @SamStafford I learned something new today! \$\endgroup\$Aleksandr Hovhannisyan– Aleksandr Hovhannisyan2020年01月12日 16:55:26 +00:00Commented Jan 12, 2020 at 16:55
It seems simpler to me to return the value:
def dfsWithDepth(node, currentDepth, currentMax):
currentDepth += 1
if currentDepth > currentMax:
currentMax = currentDepth
for child in node.children:
if child is not None:
currentMax = dfsWithDepth(child, currentDepth, currentMax)
return currentMax
# ...
print(dfsWithDepth(rootNode, 0, 0))
You could even set default values for the two depth arguments to simplify the use of the recursive function:
def dfsWithDepth(node, currentDepth=0, currentMax=0):
# ...
print(dfsWithDepth(rootNode))
Modifying state in a recursive function is a bad idea if you can avoid it. If a function does not modify state, it becomes a pure function; such functions are easier to reason about and test. To produce output, the best way is usually to return a value.
But first, a word on the terminology you chose. In computer science, the depth of a tree node refers to how far it is from the root node; in other words, how far you can walk up. The value you are computing, is how far you can walk down from a node, which is commonly called the height of a node.
Your original function, renamed to height
, is:
def height(node, currentHeight, currentMaxContainer):
currentHeight += 1
if currentHeight > currentMaxContainer[0]:
currentMaxContainer[0] = currentHeight
for child in node.children:
if child is not None:
height(child, currentHeight, currentMaxContainer)
Let's modify it so it doesn't modify currentMaxContainer
, but returns a value instead:
def height(node, currentHeight):
for child in node.children:
if child is not None:
child_height = height(child, currentHeight)
if child_height > currentHeight:
currentHeight = child_height
return currentHeight + 1
Now it finds the maximum height of its children, adds 1 (for the current node), and returns that. Usage is now print(height(startNode, startHeight))
, or print(height(rootNode, 0))
.
We can make the fact that it finds the maximum height of its children more explicit:
def height(node, currentHeight):
max_height = 0
for child in node.children:
if child is not None:
child_height = height(child, 0)
if child_height > max_height:
max_height = child_height
return currentHeight + max_height + 1
This brings into question: why is currentHeight
passed as an argument? It's only used once, and added to the total result. Really, the function doesn't care what some height (depth?) above the current node is, it should only concern itself with the height below the current node. It's a bit of outside state that is irrelevant. After all, you can just add any currentHeight
value to the return value of height()
!
def height(node):
max_height = 0
for child in node.children:
if child is not None:
child_height = height(child)
if child_height > max_height:
max_height = child_height
return max_height + 1
This finds the height of the subtree referred to by node
(a definition now completely consistent with common CS terminology). Usage is now print(height(startNode) + startHeight)
, or print(height(rootNode) + 0)
, which obviously becomes print(height(rootNode))
.
Having removed unnecessary state from the function, it becomes easier to simplify. First, let's consider that None
case; a child that is None
should not increase max_height
, that's what the if
is for. But if we specify that height(None) = 0
, then we don't need that test in the loop:
def height(node):
if node is None:
return 0
max_height = 0
for child in node.children:
child_height = height(child)
if child_height > max_height:
max_height = child_height
return max_height + 1
The remaining loop is just finding the maximum of height(child)
for all children, which can be expressed a lot shorter and more Pythonic:
def height(node):
if node is None:
return 0
return max(map(height, node.children), default=0) + 1
Alternatively, we could filter out the None
children like we did before with the explicit test:
def height(node):
return max(map(height, filter(None, node.children)), default=0) + 1
But I like the previous version better, because it is more explicit.
Those None
children are a bit of a recurring headache though, and that will be the case for any code that has to iterate over a node's children. I recommend simply disallowing that case. In other words, node.children
should only contain (0 or more) valid children, never any None
s. This should simplify a lot of code that interacts with your trees. The function then is simply:
def height(node):
return max(map(height, node.children), default=0) + 1
And finally, this function should really be a method on the class:
class Node:
...
def height(self):
return max(c.height() for c in self.children), default=0) + 1
Usage then finally becomes print(startNode.height() + startHeight)
, or print(rootNode.height())
.
Note that tree structures are very common, so there are libraries implementing them along with common tree operations. You may want to use one of those, or perhaps look at their APIs and/or source for inspiration. One such library for Python is anytree (no affiliation or endorsement).
My preferred way to solve the mutable immutable, is to just us a closure. This has the benefit that you don't have to pass currentDepthContainer
, and you can return just the maximum at the end.
Note I have changed your variable names to be Pythonic too.
def maximum_depth(root):
def inner(node, depth):
nonlocal max_depth
depth += 1
if depth > max_depth:
max_depth = depth
for child in node.children:
if child is not None:
inner(child, depth)
max_depth = 0
inner(root, 0)
return max_depth
But really, I don't think that your solution is that great readability wise. As you could just return the maximum depth so far and then just use max
.
def maximum_depth(node, depth=0):
depth += 1
return max(
(
maximum_depth(child, depth)
for child in node.children
if child is not None
),
default=depth,
)
-
2\$\begingroup\$ Wouldn't
inner(child, depth+1)
be both shorter and cleaner? \$\endgroup\$hyde– hyde2020年01月12日 19:38:06 +00:00Commented Jan 12, 2020 at 19:38 -
\$\begingroup\$ @hyde You also have to do
depth + 1 > max_depth
andmax_depth = depth + 1
. So yeah you save one line, to then add 3+ 1
s. Beauty is in the eye of the beholder. I personally don't think either is better. \$\endgroup\$2020年01月13日 06:37:29 +00:00Commented Jan 13, 2020 at 6:37 -
\$\begingroup\$ Why? Now you do "pass depth, depth+=1" and there's nothing between. So just "pass depth+1" should do exactly the same thing. \$\endgroup\$hyde– hyde2020年01月13日 06:53:42 +00:00Commented Jan 13, 2020 at 6:53
I would suggest a turned around approach: compute the depth upon construction of the nodes. This is pure, and makes sure it is only computed once. It requires you to treat the class as immutable, though (i.e., no later extending of children
), which might or might not be fine in your use case.
class Node:
"""
Node in a tree, with children stored in a list.
"""
def __init__(self, key, children = None):
self.key = key
if children is None:
self.children = []
else:
self.children = children
self.depth = max((c.depth for c in self.children), default = 0) + 1
root = Node('n1', [Node('n2'), Node('n3', [Node('n4'), Node('n5')])])
print(root.depth) # 3
print(root.children[0].depth) # 1
print(root.children[1].depth) # 2
print(root.children[1].children[0].depth) # 1
default
is an extra argument to max
which tells it what to return in the case of children
being empty.
lower_case_with_underscores
style. \$\endgroup\$return
. \$\endgroup\$