Python: pass "mutable integer" in recursion

Question 1

I'm working on some code to find the max depth/height in a tree. I came up with a recursive solution, and I find myself needing to pass an integer "by reference". Currently, I'm using a mutable container to achieve this, but is there a more Pythonic way?

class Node:
"""
Node in a tree, with children stored in a list.
"""
 def __init__(self, keyParam, childrenParam = None):
 self.key = keyParam #some id of the node; not strictly required, but often needed.
 if childrenParam is None:
 self.children = []
 else:
 self.children = childrenParam
def dfsWithDepth(node, currentDepth, currentMaxContainer):
 currentDepth += 1
 if currentDepth > currentMaxContainer[0]:
 currentMaxContainer[0] = currentDepth
 for child in node.children:
 if child is not None:
 dfsWithDepth(child, currentDepth, currentMaxContainer)
# %% Construct tree
rootNode = Node('n1')
rootNode.children = [Node('n2'), Node('n3')]
tempNode = rootNode.children[0] # n2
tempNode.children = [Node('n3'), Node('n4')]
tempNode = tempNode.children[1] # n4
tempNode.children = [None, Node('n5')]
currentMaxContainer = [0]
dfsWithDepth(rootNode, 0, currentMaxContainer)
print(currentMaxContainer)

Question 2

Variable and function names should follow the lower_case_with_underscores style.

Question 3

You don't need mutable parameters. Just return.

Question 4

Optimization and restructuring

Node class constructor (__init__ method)

the arguments keyParam, childrenParam introduce unneeded verbosity and better named as just key and children
the whole if ... else ... condition for ensuring children list is simply replaced with self.children = children or [] expression

The initial dfsWithDepth function should be renamed to say find_max_depth to follow PEP8 style guide naming conventions.

The former parameters currentDepth, currentMaxContainer become unneeded.
Within the crucial recursive function it's worth to capture "empty" node and node with no children beforehand. That's achieved by the following condition:

if node is None or not node.children:
 return 1

The remaining recursive search relies on the crucial find_max_depth and builtin max functions calls and accumulates the final max depth through traversing all subsequent sub-trees (child nodes).

The final optimized version:

class Node:
 """
 Node in a tree, with children stored in a list.
 """
 def __init__(self, key, children=None):
 self.key = key # some id of the node; not strictly required, but often needed.
 self.children = children or []
def find_max_depth(node):
 if node is None or not node.children:
 return 1
 return 1 + max(map(find_max_depth, node.children))

Testing (I've added additional 5th level for demonstration):

# Construct tree
rootNode = Node('n1')
rootNode.children = [Node('n2'), Node('n3')]
tempNode = rootNode.children[0] # n2
tempNode.children = [Node('n3'), Node('n4')]
tempNode = tempNode.children[1] # n4
tempNode.children = [None, Node('n5')]
tempNode = tempNode.children[1] # n5
tempNode.children = [Node('name7')]
print(find_max_depth(rootNode)) # 5

Question 5

Even better default children to [] in the first place, rather than use a needless None flag value.

Question 6

default [] is the #1 Python gotcha. Don't do it! docs.python-guide.org/writing/gotchas

Question 7

@SamStafford Thanks Sam. I did wonder why nobody else had spotted the apparently obvious but I figured somebody would set me straight soon enough if I'd suggested something dumb.

Question 8

It bites everyone once! :D

Question 9

@SamStafford I learned something new today!

Question 10

It seems simpler to me to return the value:

def dfsWithDepth(node, currentDepth, currentMax):
 currentDepth += 1
 if currentDepth > currentMax:
 currentMax = currentDepth
 for child in node.children:
 if child is not None:
 currentMax = dfsWithDepth(child, currentDepth, currentMax)
 return currentMax
# ...
print(dfsWithDepth(rootNode, 0, 0))

You could even set default values for the two depth arguments to simplify the use of the recursive function:

def dfsWithDepth(node, currentDepth=0, currentMax=0):
 # ...
print(dfsWithDepth(rootNode))

Question 11

Modifying state in a recursive function is a bad idea if you can avoid it. If a function does not modify state, it becomes a pure function; such functions are easier to reason about and test. To produce output, the best way is usually to return a value.

But first, a word on the terminology you chose. In computer science, the depth of a tree node refers to how far it is from the root node; in other words, how far you can walk up. The value you are computing, is how far you can walk down from a node, which is commonly called the height of a node.

Your original function, renamed to height, is:

def height(node, currentHeight, currentMaxContainer):
 currentHeight += 1
 if currentHeight > currentMaxContainer[0]:
 currentMaxContainer[0] = currentHeight
 for child in node.children:
 if child is not None:
 height(child, currentHeight, currentMaxContainer)

Let's modify it so it doesn't modify currentMaxContainer, but returns a value instead:

def height(node, currentHeight):
 for child in node.children:
 if child is not None:
 child_height = height(child, currentHeight)
 if child_height > currentHeight:
 currentHeight = child_height
 return currentHeight + 1

Now it finds the maximum height of its children, adds 1 (for the current node), and returns that. Usage is now print(height(startNode, startHeight)), or print(height(rootNode, 0)).

We can make the fact that it finds the maximum height of its children more explicit:

def height(node, currentHeight):
 max_height = 0
 for child in node.children:
 if child is not None:
 child_height = height(child, 0)
 if child_height > max_height:
 max_height = child_height
 return currentHeight + max_height + 1

This brings into question: why is currentHeight passed as an argument? It's only used once, and added to the total result. Really, the function doesn't care what some height (depth?) above the current node is, it should only concern itself with the height below the current node. It's a bit of outside state that is irrelevant. After all, you can just add any currentHeight value to the return value of height()!

def height(node):
 max_height = 0
 for child in node.children:
 if child is not None:
 child_height = height(child)
 if child_height > max_height:
 max_height = child_height
 return max_height + 1

This finds the height of the subtree referred to by node (a definition now completely consistent with common CS terminology). Usage is now print(height(startNode) + startHeight), or print(height(rootNode) + 0), which obviously becomes print(height(rootNode)).

Having removed unnecessary state from the function, it becomes easier to simplify. First, let's consider that None case; a child that is None should not increase max_height, that's what the if is for. But if we specify that height(None) = 0, then we don't need that test in the loop:

def height(node):
 if node is None:
 return 0
 max_height = 0
 for child in node.children:
 child_height = height(child)
 if child_height > max_height:
 max_height = child_height
 return max_height + 1

The remaining loop is just finding the maximum of height(child) for all children, which can be expressed a lot shorter and more Pythonic:

def height(node):
 if node is None:
 return 0
 return max(map(height, node.children), default=0) + 1

Alternatively, we could filter out the None children like we did before with the explicit test:

def height(node):
 return max(map(height, filter(None, node.children)), default=0) + 1

But I like the previous version better, because it is more explicit.

Those None children are a bit of a recurring headache though, and that will be the case for any code that has to iterate over a node's children. I recommend simply disallowing that case. In other words, node.children should only contain (0 or more) valid children, never any Nones. This should simplify a lot of code that interacts with your trees. The function then is simply:

def height(node):
 return max(map(height, node.children), default=0) + 1

And finally, this function should really be a method on the class:

class Node:
 ...
 def height(self):
 return max(c.height() for c in self.children), default=0) + 1

Usage then finally becomes print(startNode.height() + startHeight), or print(rootNode.height()).

Note that tree structures are very common, so there are libraries implementing them along with common tree operations. You may want to use one of those, or perhaps look at their APIs and/or source for inspiration. One such library for Python is anytree (no affiliation or endorsement).

Question 12

My preferred way to solve the mutable immutable, is to just us a closure. This has the benefit that you don't have to pass currentDepthContainer, and you can return just the maximum at the end.

Note I have changed your variable names to be Pythonic too.

def maximum_depth(root):
 def inner(node, depth):
 nonlocal max_depth
 depth += 1
 if depth > max_depth:
 max_depth = depth
 for child in node.children:
 if child is not None:
 inner(child, depth)
 max_depth = 0
 inner(root, 0)
 return max_depth

But really, I don't think that your solution is that great readability wise. As you could just return the maximum depth so far and then just use max.

def maximum_depth(node, depth=0):
 depth += 1
 return max(
 (
 maximum_depth(child, depth)
 for child in node.children
 if child is not None
 ),
 default=depth,
 )

Question 13

Wouldn't inner(child, depth+1) be both shorter and cleaner?

Question 14

@hyde You also have to do depth + 1 > max_depth and max_depth = depth + 1. So yeah you save one line, to then add 3 + 1s. Beauty is in the eye of the beholder. I personally don't think either is better.

Question 15

Why? Now you do "pass depth, depth+=1" and there's nothing between. So just "pass depth+1" should do exactly the same thing.

Question 16

I would suggest a turned around approach: compute the depth upon construction of the nodes. This is pure, and makes sure it is only computed once. It requires you to treat the class as immutable, though (i.e., no later extending of children), which might or might not be fine in your use case.

class Node:
 """
 Node in a tree, with children stored in a list.
 """
 def __init__(self, key, children = None):
 self.key = key
 if children is None:
 self.children = []
 else:
 self.children = children
 self.depth = max((c.depth for c in self.children), default = 0) + 1
root = Node('n1', [Node('n2'), Node('n3', [Node('n4'), Node('n5')])])
print(root.depth) # 3
print(root.children[0].depth) # 1
print(root.children[1].depth) # 2
print(root.children[1].children[0].depth) # 1

default is an extra argument to max which tells it what to return in the case of children being empty.

score 11 · Answer 1 · 2020-01-12 08:24:44Z

Optimization and restructuring

Node class constructor (__init__ method)

the arguments keyParam, childrenParam introduce unneeded verbosity and better named as just key and children
the whole if ... else ... condition for ensuring children list is simply replaced with self.children = children or [] expression

The initial dfsWithDepth function should be renamed to say find_max_depth to follow PEP8 style guide naming conventions.

The former parameters currentDepth, currentMaxContainer become unneeded.
Within the crucial recursive function it's worth to capture "empty" node and node with no children beforehand. That's achieved by the following condition:

if node is None or not node.children:
 return 1

The remaining recursive search relies on the crucial find_max_depth and builtin max functions calls and accumulates the final max depth through traversing all subsequent sub-trees (child nodes).

The final optimized version:

class Node:
 """
 Node in a tree, with children stored in a list.
 """
 def __init__(self, key, children=None):
 self.key = key # some id of the node; not strictly required, but often needed.
 self.children = children or []
def find_max_depth(node):
 if node is None or not node.children:
 return 1
 return 1 + max(map(find_max_depth, node.children))

Testing (I've added additional 5th level for demonstration):

# Construct tree
rootNode = Node('n1')
rootNode.children = [Node('n2'), Node('n3')]
tempNode = rootNode.children[0] # n2
tempNode.children = [Node('n3'), Node('n4')]
tempNode = tempNode.children[1] # n4
tempNode.children = [None, Node('n5')]
tempNode = tempNode.children[1] # n5
tempNode.children = [Node('name7')]
print(find_max_depth(rootNode)) # 5

Even better default children to [] in the first place, rather than use a needless None flag value.
default [] is the #1 Python gotcha. Don't do it! docs.python-guide.org/writing/gotchas
@SamStafford Thanks Sam. I did wonder why nobody else had spotted the apparently obvious but I figured somebody would set me straight soon enough if I'd suggested something dumb.

Cris Luengo Cris Luengo 7,0011 gold badge14 silver badges37 bronze badges · Answer 2 · 2020-01-12 07:35:10Z

It seems simpler to me to return the value:

def dfsWithDepth(node, currentDepth, currentMax):
 currentDepth += 1
 if currentDepth > currentMax:
 currentMax = currentDepth
 for child in node.children:
 if child is not None:
 currentMax = dfsWithDepth(child, currentDepth, currentMax)
 return currentMax
# ...
print(dfsWithDepth(rootNode, 0, 0))

You could even set default values for the two depth arguments to simplify the use of the recursive function:

def dfsWithDepth(node, currentDepth=0, currentMax=0):
 # ...
print(dfsWithDepth(rootNode))

marcelm marcelm 1714 bronze badges · Answer 3 · 2020-01-12 17:48:33Z

Modifying state in a recursive function is a bad idea if you can avoid it. If a function does not modify state, it becomes a pure function; such functions are easier to reason about and test. To produce output, the best way is usually to return a value.

But first, a word on the terminology you chose. In computer science, the depth of a tree node refers to how far it is from the root node; in other words, how far you can walk up. The value you are computing, is how far you can walk down from a node, which is commonly called the height of a node.

Your original function, renamed to height, is:

def height(node, currentHeight, currentMaxContainer):
 currentHeight += 1
 if currentHeight > currentMaxContainer[0]:
 currentMaxContainer[0] = currentHeight
 for child in node.children:
 if child is not None:
 height(child, currentHeight, currentMaxContainer)

Let's modify it so it doesn't modify currentMaxContainer, but returns a value instead:

def height(node, currentHeight):
 for child in node.children:
 if child is not None:
 child_height = height(child, currentHeight)
 if child_height > currentHeight:
 currentHeight = child_height
 return currentHeight + 1

Now it finds the maximum height of its children, adds 1 (for the current node), and returns that. Usage is now print(height(startNode, startHeight)), or print(height(rootNode, 0)).

We can make the fact that it finds the maximum height of its children more explicit:

def height(node, currentHeight):
 max_height = 0
 for child in node.children:
 if child is not None:
 child_height = height(child, 0)
 if child_height > max_height:
 max_height = child_height
 return currentHeight + max_height + 1

This brings into question: why is currentHeight passed as an argument? It's only used once, and added to the total result. Really, the function doesn't care what some height (depth?) above the current node is, it should only concern itself with the height below the current node. It's a bit of outside state that is irrelevant. After all, you can just add any currentHeight value to the return value of height()!

def height(node):
 max_height = 0
 for child in node.children:
 if child is not None:
 child_height = height(child)
 if child_height > max_height:
 max_height = child_height
 return max_height + 1

This finds the height of the subtree referred to by node (a definition now completely consistent with common CS terminology). Usage is now print(height(startNode) + startHeight), or print(height(rootNode) + 0), which obviously becomes print(height(rootNode)).

Having removed unnecessary state from the function, it becomes easier to simplify. First, let's consider that None case; a child that is None should not increase max_height, that's what the if is for. But if we specify that height(None) = 0, then we don't need that test in the loop:

def height(node):
 if node is None:
 return 0
 max_height = 0
 for child in node.children:
 child_height = height(child)
 if child_height > max_height:
 max_height = child_height
 return max_height + 1

The remaining loop is just finding the maximum of height(child) for all children, which can be expressed a lot shorter and more Pythonic:

def height(node):
 if node is None:
 return 0
 return max(map(height, node.children), default=0) + 1

Alternatively, we could filter out the None children like we did before with the explicit test:

def height(node):
 return max(map(height, filter(None, node.children)), default=0) + 1

But I like the previous version better, because it is more explicit.

Those None children are a bit of a recurring headache though, and that will be the case for any code that has to iterate over a node's children. I recommend simply disallowing that case. In other words, node.children should only contain (0 or more) valid children, never any Nones. This should simplify a lot of code that interacts with your trees. The function then is simply:

def height(node):
 return max(map(height, node.children), default=0) + 1

And finally, this function should really be a method on the class:

class Node:
 ...
 def height(self):
 return max(c.height() for c in self.children), default=0) + 1

Usage then finally becomes print(startNode.height() + startHeight), or print(rootNode.height()).

Note that tree structures are very common, so there are libraries implementing them along with common tree operations. You may want to use one of those, or perhaps look at their APIs and/or source for inspiration. One such library for Python is anytree (no affiliation or endorsement).

Peilonrayz ♦Peilonrayz 44.4k7 gold badges80 silver badges157 bronze badges · Answer 4 · 2020-01-12 08:22:02Z

My preferred way to solve the mutable immutable, is to just us a closure. This has the benefit that you don't have to pass currentDepthContainer, and you can return just the maximum at the end.

Note I have changed your variable names to be Pythonic too.

def maximum_depth(root):
 def inner(node, depth):
 nonlocal max_depth
 depth += 1
 if depth > max_depth:
 max_depth = depth
 for child in node.children:
 if child is not None:
 inner(child, depth)
 max_depth = 0
 inner(root, 0)
 return max_depth

But really, I don't think that your solution is that great readability wise. As you could just return the maximum depth so far and then just use max.

def maximum_depth(node, depth=0):
 depth += 1
 return max(
 (
 maximum_depth(child, depth)
 for child in node.children
 if child is not None
 ),
 default=depth,
 )

Wouldn't inner(child, depth+1) be both shorter and cleaner?
@hyde You also have to do depth + 1 > max_depth and max_depth = depth + 1. So yeah you save one line, to then add 3 + 1s. Beauty is in the eye of the beholder. I personally don't think either is better.
Why? Now you do "pass depth, depth+=1" and there's nothing between. So just "pass depth+1" should do exactly the same thing.

phipsgabler phipsgabler 4432 silver badges7 bronze badges · Answer 5 · 2020-01-12 16:13:28Z

I would suggest a turned around approach: compute the depth upon construction of the nodes. This is pure, and makes sure it is only computed once. It requires you to treat the class as immutable, though (i.e., no later extending of children), which might or might not be fine in your use case.

class Node:
 """
 Node in a tree, with children stored in a list.
 """
 def __init__(self, key, children = None):
 self.key = key
 if children is None:
 self.children = []
 else:
 self.children = children
 self.depth = max((c.depth for c in self.children), default = 0) + 1
root = Node('n1', [Node('n2'), Node('n3', [Node('n4'), Node('n5')])])
print(root.depth) # 3
print(root.children[0].depth) # 1
print(root.children[1].depth) # 2
print(root.children[1].children[0].depth) # 1

default is an extra argument to max which tells it what to return in the case of children being empty.

Stack Exchange Network

Python: pass "mutable integer" in recursion

5 Answers 5

Optimization and restructuring

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Python: pass "mutable integer" in recursion

5 Answers 5

Optimization and restructuring

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions