I'm interested in recursive generation of trees in Python. For my eventual application, it's important that the trees be represented as a list of nodes (rather than say a dictionary, or an edge list, or some other format). Also, in my eventual application, each node of the tree will have between 0 and hundreds of children. In this skeleton code, the number of children for each node is generated randomly.
The trees generated have a parameter-defined maximum depth, but because even nodes not at the maximum depth might have no children, some leaves will be closer to the root than others.
I wrote this function for the task:
import random
def generate_random_tree(nodelist=[], idx=0, parent=None, depth=0, max_children=2, max_depth=2):
"""Build a list of nodes in a random tree up to a maximum depth.
:param: nodelist list, the nodes in the tree; each node is a list with elements [idx, parent, depth]
:param: idx int, the index of a node
:param: parent int, the index of the node's parent
:param: depth int, the distance of a node from the root
:param: max_children int, the maximum number of children a node can have
:param: max_depth int, the maximum distance from the tree to the root"""
if depth < max_depth and depth >=0:
# add a random number of children
n = random.randint(0, max_children)
nodelist.extend([[idx+i, parent, depth] for i in xrange(n)])
# for each new child, add new children
[generate_random_tree(nodelist, len(nodelist), idx+i, depth+1, max_children, max_depth) for i in xrange(n)]
elif depth == max_depth:
# add a random number of leaves
n = random.randint(0, max_children)
nodelist.extend([[idx+i, parent, depth] for i in xrange(n)])
return
else: # this should never happen
raise ValueError('Algorithm halted because depth > max_depth or depth < 0.')
return
The function is used like this:
tree =[[0, None, 0]] # the algorithm starts with a root node which has no parents and depth 0
random.seed(0)
generate_random_tree(nodelist=tree, idx=len(tree), parent=0, depth=1, max_children=3, max_depth=3)
[0, None, 0]
[1, 0, 1]
[2, 0, 1]
[3, 0, 1]
[4, 1, 2]
[5, 1, 2]
[6, 1, 2]
[7, 4, 3]
[8, 5, 3]
[9, 6, 3]
[10, 6, 3]
[11, 2, 2]
[12, 11, 3]
[13, 11, 3]
[14, 11, 3]
[15, 3, 2]
[16, 15, 3]
This function does the task, but I'm eager for feedback on the quality of this implementation. Any feedback is welcome, of course, but particular questions I have include:
Are there any better ways to find appropriate indices for each new node other than calling
len()
?Are there simple, recursive Python implementations that
return
values rather than relying on list mutability?(This may be out of scope for se.cr; if so, please ignore) Is there a way to get the enumeration to be depth-first rather than the breadth-first implementation I currently have?
Although this code isn't going to go into a production in environment or be worked on by teams of professional software engineers, high-level comments about readability, best practices, and hidden corner cases would be much appreciated.
5 Answers 5
This is really nicely written code! I do have a few small tips, and then I'll try to answer some of your small questions.
- I noticed from looking at your code, that you only use the
random.randint
function from therandom
module/library. Rather than importing the entirerandom
module, you can just dofrom random import randint
. - In your
else
block at the very end of your function, you have areturn
after you raise an exception. Raising an exception will exit the program, so thisreturn
can be removed entirely. - You also mention in a comment by the
else
block, that this will never happen. If it never happens, then why do you need it? It can be removed as well. - I did find something odd about this function. Underneath the comment
# for each new child, add new children
you have an expression that's evaluated, but you don't return the value of the expression. Preferably, this line should be changed toreturn [ ... ]
. - Finally, using
len
to find the new appropriate indices is really the only good option, right now. You can always dotree[len(tree) - 1].index() + 1
, but that's horridly unclear. Just stick withlen
.
-
\$\begingroup\$ Thanks very much for the tips. Regarding point 4: I added a
return
to the line in question, and the program worked the same as before. My impression was thatlist
methods likeextend
andappend
actually returnNone
, so I don't think I'd use thereturn
ed value in any case. But is having thereturn
preferred for aesthetic reasons? \$\endgroup\$Curt F.– Curt F.2015年06月27日 16:22:06 +00:00Commented Jun 27, 2015 at 16:22 -
\$\begingroup\$ @CurtF. The
return
should return the generatednodelist
, and can then be used as a value. \$\endgroup\$Ethan Bierlein– Ethan Bierlein2015年06月27日 16:25:49 +00:00Commented Jun 27, 2015 at 16:25
Your code looks nice, but it could be improved in places:
You should improve your naming to fit PEP8 standards, like nodelist
should be node_list
.
Your comments should start on a new line like:
"""
stuff here
"""
instead of:
""" stuff
here """
Your return below the the Raise ValueError
is superfluous and can be removed.
You should add more whitespace between your operators, like so:
idx+i
=> idx + i
depth >=0
=> depth >= 0
Your if-statement comments are inconsistently formatted:
elif depth == max_depth: # add a random number of leaves else: # this should never happen
Please choose a single method, whether same-line or new-line.
Finally, idx
is a bad naming choice for index
, it seems really slack to simply remove two characters, and make the script much less readable.
It doesn't make sense to set a default value for the nodelist=[]
parameter in generate_random_tree
. Since the tree is never returned and only built in nodelist
,
it will not be accessible in a normal way when the method returns.
However, as @janne-karila explained in a comment:
It will grow each time the function is called without arguments, and can be inspected as
generate_random_tree.__defaults__[0]
. Note that accessing__defaults__
is meant for special purposes only, not as an alternative way of returning a value.
Instead of this:
if depth < max_depth and depth >=0:
Check out this awesome operator:
if 0 <= depth < max_depth:
This code appears twice:
n = random.randint(0, max_children) nodelist.extend([[idx+i, parent, depth] for i in xrange(n)])
Code duplication is bad: if later you need to change something, you have to remember to make the same change everywhere, which is error prone. It's better to extract common logic to a helper method.
Notes
def generate_random_tree(nodelist=[], ...):
It may be a bad idea to have mutable default values, but you certainly can do it.
Questions
Are there simple, recursive Python implementations that return values rather than relying on list mutability?
Nothing hard, just create and return list, like this:
from copy import copy
from random import randint
def make_tree(index, parent, depth, max_children, max_depth):
if depth > max_depth:
return []
nodes = []
for child in range(randint(1, max_children)):
nodes.append([index + child, parent, depth])
for node in copy(nodes):
nodes.extend(make_tree(index + len(nodes),
node[0],
depth + 1,
max_children,
max_depth))
return nodes
print(make_tree(0, None, 0, 2, 2))
NOTE: there is virtual root node. You can simply add it to the beginning.
Are there any better ways to find appropriate indices for each new node other than calling len()?
You can modify example above to return tuple with 2 params: subtree and number of nodes. In this case you should not evaluate number of nodes on each iteration using len
.
Can be done like this:
def make_tree(index, parent, depth, max_children, max_depth):
if depth > max_depth:
return [], 0
nodes = []
count = 0
children_count = randint(1, max_children)
count += children_count
for child in range(children_count):
nodes.append([index + child, parent, depth])
for node in copy(nodes):
sub_tree, sub_count = make_tree(
index + count, node[0], depth + 1, max_children, max_depth)
count += sub_count
nodes.extend(sub_tree)
return nodes, count
(This may be out of scope for se.cr; if so, please ignore) Is there a way to get the enumeration to be depth-first rather than the breadth-first implementation I currently have?
Just move creation of children into loop of current level nodes creation.
-
\$\begingroup\$ Thanks for the help. Good points and your version definitely makes depth-first easy. Can I ask you more about the bug you mentioned? When you say "can't do that", do you mean stylistically or PEP8 or something? My code ran for me at least in Python 2.7.10 (anaconda distribution on Mac OSX 10.10.3) \$\endgroup\$Curt F.– Curt F.2015年06月27日 21:31:42 +00:00Commented Jun 27, 2015 at 21:31
-
\$\begingroup\$ @CurtF.checked. Dynamic objects in named params was not supported 'til python 3. But now it works. Didn't find related news for more details. \$\endgroup\$outoftime– outoftime2015年06月27日 21:58:30 +00:00Commented Jun 27, 2015 at 21:58
-
2\$\begingroup\$ Can you rephrase "Actually, you can't do that. Named arguments have to be non mutable."? It may be a bad idea to have mutable default values, but you certainly can do it. \$\endgroup\$Gareth Rees– Gareth Rees2015年06月29日 19:24:56 +00:00Commented Jun 29, 2015 at 19:24
Based on feedback in the other answers, I wanted to present some new improved code, and to compare the performance of the various suggestions. (I am the OP who asked the question.)
First, here's an improved version of my function (which does not return
results but relies on list mutability in Python), which takes into account suggestions from janos, Quill, and Ethan Bierlein:
from random import randint
def generate_random_tree(node_list, idx=0, parent=None, depth=0, max_children=2, max_depth=2):
"""
Build a list of nodes in a random tree up to a maximum depth.
:param: node_list list of nodes in the tree; each node is a list with elements [idx, parent, depth]
:param: idx int, the index of a node
:param: parent int, the index of the node's parent
:param: depth int, the distance of a node from the root
:param: max_children int, the maximum number of children a node can have
:param: max_depth int, the maximum distance from the tree to the root
"""
def add_children(node_list, idx, parent, depth, max_children):
"""Helper function for generate_random_tree() that adds n random child nodes to node_list."""
n = randint(0, max_children)
node_list.extend([[idx+i, parent, depth] for i in xrange(n)])
return n
if 0 <= depth < max_depth:
# add a random number n of children
n = add_children(node_list, idx, parent, depth, max_children)
# for each new child, add new children
[generate_random_tree(node_list, len(node_list), idx+i, depth+1, max_children, max_depth) for i in xrange(n)]
elif depth == max_depth:
# add a random number of leaves
add_children(node_list, idx, parent, depth, max_children)
return
I was happy with how this turned out, but honestly, the solution presented by outoftime was even clearer and easier to read. So I compared his make_tree()
with this version of generate_random_tree()
using timeit
. Here are some results for generate_random_tree()
:
from random import seed
import timeit
tree =[[0, None, 0]]
seed(0)
max_depths = [3, 5, 8, 13]
for max_depth in max_depths:
%timeit generate_random_tree(node_list=tree, idx=len(tree), parent=0, depth=1, max_children=3, max_depth=max_depth)
The slowest run took 6.02 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 12.8 μs per loop
10000 loops, best of 3: 35.9 μs per loop
10000 loops, best of 3: 138 μs per loop
1000 loops, best of 3: 1.09 ms per loop
The same analysis for make_tree()
:
tree =[[0, None, 0]]
seed(0)
for max_depth in max_depths:
%timeit make_tree(index=len(tree), parent=0, depth=1, max_children=3, max_depth=max_depth)
The slowest run took 4.66 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 30.9 μs per loop
10000 loops, best of 3: 148 μs per loop
1000 loops, best of 3: 1.16 ms per loop
10 loops, best of 3: 41.2 ms per loop
So the option to rely on list mutability and have the function return
nothing, just modify the list passed in, seems significantly faster, probably because it does not have to copy()
any lists. The downside is that the code is harder to understand.
Thanks to everyone for their review!