Enumeration of random trees

Question 1

I'm interested in recursive generation of trees in Python. For my eventual application, it's important that the trees be represented as a list of nodes (rather than say a dictionary, or an edge list, or some other format). Also, in my eventual application, each node of the tree will have between 0 and hundreds of children. In this skeleton code, the number of children for each node is generated randomly.

The trees generated have a parameter-defined maximum depth, but because even nodes not at the maximum depth might have no children, some leaves will be closer to the root than others.

I wrote this function for the task:

import random
def generate_random_tree(nodelist=[], idx=0, parent=None, depth=0, max_children=2, max_depth=2):
 """Build a list of nodes in a random tree up to a maximum depth.
 :param: nodelist list, the nodes in the tree; each node is a list with elements [idx, parent, depth]
 :param: idx int, the index of a node
 :param: parent int, the index of the node's parent
 :param: depth int, the distance of a node from the root
 :param: max_children int, the maximum number of children a node can have
 :param: max_depth int, the maximum distance from the tree to the root"""
 if depth < max_depth and depth >=0:
 # add a random number of children
 n = random.randint(0, max_children)
 nodelist.extend([[idx+i, parent, depth] for i in xrange(n)]) 
 # for each new child, add new children
 [generate_random_tree(nodelist, len(nodelist), idx+i, depth+1, max_children, max_depth) for i in xrange(n)]
 elif depth == max_depth:
 # add a random number of leaves
 n = random.randint(0, max_children)
 nodelist.extend([[idx+i, parent, depth] for i in xrange(n)]) 
 return
 else: # this should never happen
 raise ValueError('Algorithm halted because depth > max_depth or depth < 0.')
 return

The function is used like this:

tree =[[0, None, 0]] # the algorithm starts with a root node which has no parents and depth 0
random.seed(0)
generate_random_tree(nodelist=tree, idx=len(tree), parent=0, depth=1, max_children=3, max_depth=3)
[0, None, 0]
[1, 0, 1]
[2, 0, 1]
[3, 0, 1]
[4, 1, 2]
[5, 1, 2]
[6, 1, 2]
[7, 4, 3]
[8, 5, 3]
[9, 6, 3]
[10, 6, 3]
[11, 2, 2]
[12, 11, 3]
[13, 11, 3]
[14, 11, 3]
[15, 3, 2]
[16, 15, 3]

This function does the task, but I'm eager for feedback on the quality of this implementation. Any feedback is welcome, of course, but particular questions I have include:

Are there any better ways to find appropriate indices for each new node other than calling len()?
Are there simple, recursive Python implementations that return values rather than relying on list mutability?
(This may be out of scope for se.cr; if so, please ignore) Is there a way to get the enumeration to be depth-first rather than the breadth-first implementation I currently have?

Although this code isn't going to go into a production in environment or be worked on by teams of professional software engineers, high-level comments about readability, best practices, and hidden corner cases would be much appreciated.

Question 2

This is really nicely written code! I do have a few small tips, and then I'll try to answer some of your small questions.

I noticed from looking at your code, that you only use the random.randint function from the random module/library. Rather than importing the entire random module, you can just do from random import randint.
In your else block at the very end of your function, you have a return after you raise an exception. Raising an exception will exit the program, so this return can be removed entirely.
You also mention in a comment by the else block, that this will never happen. If it never happens, then why do you need it? It can be removed as well.
I did find something odd about this function. Underneath the comment # for each new child, add new children you have an expression that's evaluated, but you don't return the value of the expression. Preferably, this line should be changed to return [ ... ].
Finally, using len to find the new appropriate indices is really the only good option, right now. You can always do tree[len(tree) - 1].index() + 1, but that's horridly unclear. Just stick with len.

Question 3

Thanks very much for the tips. Regarding point 4: I added a return to the line in question, and the program worked the same as before. My impression was that list methods like extend and append actually return None, so I don't think I'd use the returned value in any case. But is having the return preferred for aesthetic reasons?

Question 4

@CurtF. The return should return the generated nodelist, and can then be used as a value.

Question 5

Your code looks nice, but it could be improved in places:

You should improve your naming to fit PEP8 standards, like nodelist should be node_list.

Your comments should start on a new line like:

"""
 stuff here
"""

instead of:

""" stuff
here """

Your return below the the Raise ValueError is superfluous and can be removed.

You should add more whitespace between your operators, like so:
idx+i => idx + i
depth >=0 => depth >= 0

Your if-statement comments are inconsistently formatted:

elif depth == max_depth:
 # add a random number of leaves
else: # this should never happen

Please choose a single method, whether same-line or new-line.

Finally, idx is a bad naming choice for index, it seems really slack to simply remove two characters, and make the script much less readable.

Question 6

It doesn't make sense to set a default value for the nodelist=[] parameter in generate_random_tree. Since the tree is never returned and only built in nodelist, it will not be accessible in a normal way when the method returns. However, as @janne-karila explained in a comment:

It will grow each time the function is called without arguments, and can be inspected as generate_random_tree.__defaults__[0]. Note that accessing __defaults__ is meant for special purposes only, not as an alternative way of returning a value.

Instead of this:

if depth < max_depth and depth >=0:

Check out this awesome operator:

if 0 <= depth < max_depth:

This code appears twice:

 n = random.randint(0, max_children)
 nodelist.extend([[idx+i, parent, depth] for i in xrange(n)])

Code duplication is bad: if later you need to change something, you have to remember to make the same change everywhere, which is error prone. It's better to extract common logic to a helper method.

Question 7

Notes

def generate_random_tree(nodelist=[], ...):

It may be a bad idea to have mutable default values, but you certainly can do it.

Questions

Are there simple, recursive Python implementations that return values rather than relying on list mutability?

Nothing hard, just create and return list, like this:

from copy import copy
from random import randint
def make_tree(index, parent, depth, max_children, max_depth):
 if depth > max_depth: 
 return []
 nodes = []
 for child in range(randint(1, max_children)):
 nodes.append([index + child, parent, depth])
 for node in copy(nodes):
 nodes.extend(make_tree(index + len(nodes), 
 node[0], 
 depth + 1, 
 max_children, 
 max_depth))
 return nodes
print(make_tree(0, None, 0, 2, 2))

NOTE: there is virtual root node. You can simply add it to the beginning.

Are there any better ways to find appropriate indices for each new node other than calling len()?

You can modify example above to return tuple with 2 params: subtree and number of nodes. In this case you should not evaluate number of nodes on each iteration using len.

Can be done like this:

def make_tree(index, parent, depth, max_children, max_depth):
 if depth > max_depth: 
 return [], 0
 nodes = []
 count = 0
 children_count = randint(1, max_children)
 count += children_count
 for child in range(children_count):
 nodes.append([index + child, parent, depth])
 for node in copy(nodes):
 sub_tree, sub_count = make_tree(
 index + count, node[0], depth + 1, max_children, max_depth)
 count += sub_count
 nodes.extend(sub_tree)
 return nodes, count

(This may be out of scope for se.cr; if so, please ignore) Is there a way to get the enumeration to be depth-first rather than the breadth-first implementation I currently have?

Just move creation of children into loop of current level nodes creation.

Question 8

Thanks for the help. Good points and your version definitely makes depth-first easy. Can I ask you more about the bug you mentioned? When you say "can't do that", do you mean stylistically or PEP8 or something? My code ran for me at least in Python 2.7.10 (anaconda distribution on Mac OSX 10.10.3)

Question 9

@CurtF.checked. Dynamic objects in named params was not supported 'til python 3. But now it works. Didn't find related news for more details.

Question 10

Can you rephrase "Actually, you can't do that. Named arguments have to be non mutable."? It may be a bad idea to have mutable default values, but you certainly can do it.

Question 11

Based on feedback in the other answers, I wanted to present some new improved code, and to compare the performance of the various suggestions. (I am the OP who asked the question.)

First, here's an improved version of my function (which does not return results but relies on list mutability in Python), which takes into account suggestions from janos, Quill, and Ethan Bierlein:

from random import randint
def generate_random_tree(node_list, idx=0, parent=None, depth=0, max_children=2, max_depth=2):
 """
 Build a list of nodes in a random tree up to a maximum depth.
 :param: node_list list of nodes in the tree; each node is a list with elements [idx, parent, depth]
 :param: idx int, the index of a node
 :param: parent int, the index of the node's parent
 :param: depth int, the distance of a node from the root
 :param: max_children int, the maximum number of children a node can have
 :param: max_depth int, the maximum distance from the tree to the root
 """
 def add_children(node_list, idx, parent, depth, max_children):
 """Helper function for generate_random_tree() that adds n random child nodes to node_list."""
 n = randint(0, max_children)
 node_list.extend([[idx+i, parent, depth] for i in xrange(n)])
 return n
 if 0 <= depth < max_depth:
 # add a random number n of children
 n = add_children(node_list, idx, parent, depth, max_children)
 # for each new child, add new children
 [generate_random_tree(node_list, len(node_list), idx+i, depth+1, max_children, max_depth) for i in xrange(n)]
 elif depth == max_depth:
 # add a random number of leaves
 add_children(node_list, idx, parent, depth, max_children)
 return

I was happy with how this turned out, but honestly, the solution presented by outoftime was even clearer and easier to read. So I compared his make_tree() with this version of generate_random_tree() using timeit. Here are some results for generate_random_tree():

from random import seed
import timeit
tree =[[0, None, 0]] 
seed(0)
max_depths = [3, 5, 8, 13]
for max_depth in max_depths:
 %timeit generate_random_tree(node_list=tree, idx=len(tree), parent=0, depth=1, max_children=3, max_depth=max_depth)

The slowest run took 6.02 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 12.8 μs per loop
10000 loops, best of 3: 35.9 μs per loop
10000 loops, best of 3: 138 μs per loop
1000 loops, best of 3: 1.09 ms per loop

The same analysis for make_tree():

tree =[[0, None, 0]] 
seed(0)
for max_depth in max_depths:
 %timeit make_tree(index=len(tree), parent=0, depth=1, max_children=3, max_depth=max_depth)

The slowest run took 4.66 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 30.9 μs per loop
10000 loops, best of 3: 148 μs per loop
1000 loops, best of 3: 1.16 ms per loop
10 loops, best of 3: 41.2 ms per loop

So the option to rely on list mutability and have the function return nothing, just modify the list passed in, seems significantly faster, probably because it does not have to copy() any lists. The downside is that the code is harder to understand.

Thanks to everyone for their review!

score 5 · Accepted Answer · 2015-06-27 15:04:51Z

This is really nicely written code! I do have a few small tips, and then I'll try to answer some of your small questions.

I noticed from looking at your code, that you only use the random.randint function from the random module/library. Rather than importing the entire random module, you can just do from random import randint.
In your else block at the very end of your function, you have a return after you raise an exception. Raising an exception will exit the program, so this return can be removed entirely.
You also mention in a comment by the else block, that this will never happen. If it never happens, then why do you need it? It can be removed as well.
I did find something odd about this function. Underneath the comment # for each new child, add new children you have an expression that's evaluated, but you don't return the value of the expression. Preferably, this line should be changed to return [ ... ].
Finally, using len to find the new appropriate indices is really the only good option, right now. You can always do tree[len(tree) - 1].index() + 1, but that's horridly unclear. Just stick with len.

Thanks very much for the tips. Regarding point 4: I added a return to the line in question, and the program worked the same as before. My impression was that list methods like extend and append actually return None, so I don't think I'd use the returned value in any case. But is having the return preferred for aesthetic reasons?
@CurtF. The return should return the generated nodelist, and can then be used as a value.

Stack Exchange Network

Enumeration of random trees

5 Answers 5

Notes

Questions

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Enumeration of random trees

5 Answers 5

Notes

Questions

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions