Using range minimal query for lowest common ancestor (LCA)

Question 1

Here is my code to use range minimal query to resolve the LCA problem. Any advice on code bugs, performance improvement in terms of algorithm time complexity, code style are appreciated.

More specifically, my questions are:

I am using in-order traverse to build depth array. Should I use other traverse, like pre-order (dfs), level-order, or post order?
I am using two additional data structures, index_node_map and node_index_map. I'm wondering if there are any ways to remove them, as they make my code seem lengthy.

Major ideas of my idea are:

Build depth array of each node by in-order traverse
Build MRQ index for depth array built in first step
For any two nodes, their LCA are the lowest depth node between them (in depth array)

from collections import defaultdict
class TreeNode:
 def __init__(self, value, left, right):
 self.value = value
 self.left = left
 self.right = right
class Solution:
 def __init__(self):
 self.index_node_map = defaultdict(int)
 self.node_index_map = defaultdict(int)
 self.depth = []
 self.rmp_map = defaultdict(tuple) # key: (index, step), value: (min value, index)
 def in_order(self, root, level):
 if root.left:
 self.in_order(root.left, level+1)
 self.depth.append(level)
 self.index_node_map[len(self.depth)-1]=root.value
 self.node_index_map[root.value] = len(self.depth)-1
 if root.right:
 self.in_order(root.right, level+1)
 def build_depth_index(self, root, level):
 self.in_order(root, level)
 print self.depth
 print self.index_node_map
 print self.node_index_map
 self.build_rmq_map(self.depth)
 def build_rmq_map(self, result):
 step = len(result).bit_length()
 for i in range(len(result)):
 self.rmp_map[(i,0)]=(result[i],i)
 for s in range(1, step+1):
 for i in range(len(result)):
 if i + 2 **(s-1) < len(result):
 self.rmp_map[(i,s)] = min(self.rmp_map[(i,s-1)],self.rmp_map[(i+2**(s-1),s-1)])
 else:
 self.rmp_map[(i,s)] = self.rmp_map[(i, s-1)]
 def rmp_query(self, index1, index2):
 step = (index2 - index1).bit_length()-1
 x = self.rmp_map[(index1, step)]
 y = self.rmp_map[(index2+1-2**(step), step)]
 # return index of depth array min value
 if x[0] < y[0]:
 return x[1]
 else:
 return y[1]
 def lca_query(self, node1, node2):
 index1 = self.node_index_map[node1]
 index2 = self.node_index_map[node2]
 min_index = self.rmp_query(index1, index2)
 return self.index_node_map[min_index]
if __name__ == "__main__":
 root = TreeNode(1, TreeNode(2, TreeNode(4, None, None), TreeNode(5, None, None)), TreeNode(3, TreeNode(6, None, None),TreeNode(7,None,None)))
 s = Solution()
 s.build_depth_index(root, 0)
 print s.lca_query(4,5)
 print s.lca_query(4, 7)

Question 2

Please comment your code.

Question 3

This review describes my thoughts as a fellow competitive programmer.

About your questions

In-order traversal is a must here. Otherwise the RMQ-based algorithm won't work - it requires that for each vertexes A and B their LCA occurs between any occurrences of A and B.
You will need some kind of maps to map tree node to their position in the RMQ array. So, at least one of them should stay. However, you can get rid of index_node_map if you make RMQ store node's "value" (which is actually node's id, as far as I understood from your code).

Major problems

Your code does not satisfy PEP 8 at least in terms of whitespaces (it's easy to check and fix with tools like pep8, flake8, autopep8), that significantly reduces readability.
You have God object in the solution, which, well, "solves the problem". Meaning, it mixes together all of the following:
1. In-order traversal of a tree.
2. Mapping between tree nodes and their indexes in the traversal.
3. Calculating depths of vertexes.
4. Reducing LCA to RMQ with
5. Solution of RMQ problem with sparse table.
Moreover, all these are highly tied together and you cannot, say, debug RMQ separately. It is a very common problem among competitive programmers (especially beginners). Though, some do not consider it to be a problem, but you're posting on codereview.
You initialize object outside of constructor, in the build_depth_index function. It may lead to a false conclusion that one may re-use the object with different inputs by calling build_depth_index with new tree. Despite "work in the constructor" being a kind of anti-pattern, I feel it's OK to move all initialization to the constructor in this case:
1. You won't be able to reuse or not initialize the object.
2. You will have less mutable state to care about (as opposed to the "initialize everything in build_depth_index and make the object reusable" approach), which is generally helpful for debugging.
There are a bunch of implicit invariants, e.g.:
1. You assume that all values in the tree are different. It's not enforced with asserts, so it's possible to accidentally apply your algorithm to "wrong" tree and get "almost correct" results.
2. In the in_order function you assume that each vertex occurs in the in-order traversal only once (because you "reenumerate" vertexes), but it's not the case for non-binary trees. Algorithm is more general than that. I'd suggest to avoid writing general algorithms for some "specific" cases because that is prone to copy-pasting and makes the algorithm harder to debug.
There is some debug output in the middle, which cannot be switched off.
No need in defaultdicts. I'd suggest to avoid defaultdicts when implementing data structures (esp. RMQ) because typically you should initialize the structure first and use it afterwards.
self.rmp_map = defaultdict(tuple) # key: (index, step), value: (min value, index)

you represent two-dimensional array with dict. Please don't do that, use arrays.
Use namedtuple instead of regular tuples to improve readability.

My suggestions

Get rid of God object

I'd suggest the following objects and functions:

class Rmq:
 def __init__(self, array, key):
 """Initializes RMQ structure for 'array', assuming that the value
 being minimized is returned by the 'key' function, if applied to
 array element"""
 def get_minimal_element(self, l, r):
 """Returns minimal element of array from position l to r (inclusive)"""
def get_in_order_traverse(tree):
 """Traverses tree in-order and returns a list of nodes."""
def get_depths(tree):
 """Calculates depths of all vertexes in the tree and returns mapping
 from vertex's value to its depth (root's depth is 0)."""
class Lca:
 def __init__(self, tree):
 """Initializes LCA solver for the given tree."""
 def get_lca(self, a, b):
 """Given two vertexes a and b returns their LCA as a TreeNode."""

Of course, that design can be de-coupled even further. Say, Lca.__init__ can take an optional argument specifying Rmq's factory so we can plug different implementations of Rmq for debugging.

Other suggestions

Rename TreeNode.value to TreeNode.id, because LCA does care about node "values", it cares about their ids only.
Even better approach is to make TreeNodes hashable by overloading __hash__ and __eq__ so functions do not depend anymore on "what is node's id and how to find it".
Replace TreeNode.left and TreeNode.right with a list called TreeNode.children. For the sake of generality - algorithm does not care.
Use logging module for debug output so it can be switched on/off dynamically.
Use namedtuple instead of tuples.
In your sparse table implementation you can ignore all those segments which extend beyond array's length and get rid of if. You never need to access them, as all requests to RMQ lie inside the original array.

Question 4

Thanks yeputons, for your comments -- " if you make RMQ store node's "value"", wondering how to make RMQ store values? I think RMQ needs to store node depth in order to work since RMQ order by depth? If I mis-understand your points, please feel free to correct me.

Question 5

BTW, for your comments, "There is some debug output in the middle, which cannot be switched off.", what bug do you mean?

Question 6

@LinMa right now your sparse table stores pair of two numbers: depth (which participates in finding minimal value) and index (which is the "result" returned by RMQ). My suggestion is to store vertex's "value" instead of that "index" and avoid extra conversion. About "debug output": you have some print statements which are not required by problem's statement.

Question 7

Thanks yeputons, love your comments and mark your reply as answer. And grant the bounty points. I have make a new post here (codereview.stackexchange.com/questions/155076/…), your advice is highly appreciated. I applied your comments to break down God object (but in a different way), and also applied your comments to save one of the map (I only use node to index map).

Question 8

@LinMa you're welcome. Thank you. I have thought about id being builtin function and checked with this question.

yeputons yeputons 2911 silver badge3 bronze badges · Accepted Answer · 2017-02-07 07:33:00Z

This review describes my thoughts as a fellow competitive programmer.

About your questions

In-order traversal is a must here. Otherwise the RMQ-based algorithm won't work - it requires that for each vertexes A and B their LCA occurs between any occurrences of A and B.
You will need some kind of maps to map tree node to their position in the RMQ array. So, at least one of them should stay. However, you can get rid of index_node_map if you make RMQ store node's "value" (which is actually node's id, as far as I understood from your code).

Major problems

Your code does not satisfy PEP 8 at least in terms of whitespaces (it's easy to check and fix with tools like pep8, flake8, autopep8), that significantly reduces readability.
You have God object in the solution, which, well, "solves the problem". Meaning, it mixes together all of the following:
1. In-order traversal of a tree.
2. Mapping between tree nodes and their indexes in the traversal.
3. Calculating depths of vertexes.
4. Reducing LCA to RMQ with
5. Solution of RMQ problem with sparse table.
Moreover, all these are highly tied together and you cannot, say, debug RMQ separately. It is a very common problem among competitive programmers (especially beginners). Though, some do not consider it to be a problem, but you're posting on codereview.
You initialize object outside of constructor, in the build_depth_index function. It may lead to a false conclusion that one may re-use the object with different inputs by calling build_depth_index with new tree. Despite "work in the constructor" being a kind of anti-pattern, I feel it's OK to move all initialization to the constructor in this case:
1. You won't be able to reuse or not initialize the object.
2. You will have less mutable state to care about (as opposed to the "initialize everything in build_depth_index and make the object reusable" approach), which is generally helpful for debugging.
There are a bunch of implicit invariants, e.g.:
1. You assume that all values in the tree are different. It's not enforced with asserts, so it's possible to accidentally apply your algorithm to "wrong" tree and get "almost correct" results.
2. In the in_order function you assume that each vertex occurs in the in-order traversal only once (because you "reenumerate" vertexes), but it's not the case for non-binary trees. Algorithm is more general than that. I'd suggest to avoid writing general algorithms for some "specific" cases because that is prone to copy-pasting and makes the algorithm harder to debug.
There is some debug output in the middle, which cannot be switched off.
No need in defaultdicts. I'd suggest to avoid defaultdicts when implementing data structures (esp. RMQ) because typically you should initialize the structure first and use it afterwards.
self.rmp_map = defaultdict(tuple) # key: (index, step), value: (min value, index)

you represent two-dimensional array with dict. Please don't do that, use arrays.
Use namedtuple instead of regular tuples to improve readability.

My suggestions

Get rid of God object

I'd suggest the following objects and functions:

class Rmq:
 def __init__(self, array, key):
 """Initializes RMQ structure for 'array', assuming that the value
 being minimized is returned by the 'key' function, if applied to
 array element"""
 def get_minimal_element(self, l, r):
 """Returns minimal element of array from position l to r (inclusive)"""
def get_in_order_traverse(tree):
 """Traverses tree in-order and returns a list of nodes."""
def get_depths(tree):
 """Calculates depths of all vertexes in the tree and returns mapping
 from vertex's value to its depth (root's depth is 0)."""
class Lca:
 def __init__(self, tree):
 """Initializes LCA solver for the given tree."""
 def get_lca(self, a, b):
 """Given two vertexes a and b returns their LCA as a TreeNode."""

Of course, that design can be de-coupled even further. Say, Lca.__init__ can take an optional argument specifying Rmq's factory so we can plug different implementations of Rmq for debugging.

Other suggestions

Rename TreeNode.value to TreeNode.id, because LCA does care about node "values", it cares about their ids only.
Even better approach is to make TreeNodes hashable by overloading __hash__ and __eq__ so functions do not depend anymore on "what is node's id and how to find it".
Replace TreeNode.left and TreeNode.right with a list called TreeNode.children. For the sake of generality - algorithm does not care.
Use logging module for debug output so it can be switched on/off dynamically.
Use namedtuple instead of tuples.
In your sparse table implementation you can ignore all those segments which extend beyond array's length and get rid of if. You never need to access them, as all requests to RMQ lie inside the original array.

Thanks yeputons, for your comments -- " if you make RMQ store node's "value"", wondering how to make RMQ store values? I think RMQ needs to store node depth in order to work since RMQ order by depth? If I mis-understand your points, please feel free to correct me.
BTW, for your comments, "There is some debug output in the middle, which cannot be switched off.", what bug do you mean?
@LinMa right now your sparse table stores pair of two numbers: depth (which participates in finding minimal value) and index (which is the "result" returned by RMQ). My suggestion is to store vertex's "value" instead of that "index" and avoid extra conversion. About "debug output": you have some print statements which are not required by problem's statement.
Thanks yeputons, love your comments and mark your reply as answer. And grant the bounty points. I have make a new post here (codereview.stackexchange.com/questions/155076/…), your advice is highly appreciated. I applied your comments to break down God object (but in a different way), and also applied your comments to save one of the map (I only use node to index map).
@LinMa you're welcome. Thank you. I have thought about id being builtin function and checked with this question.

Stack Exchange Network

Using range minimal query for lowest common ancestor (LCA)

1 Answer 1

About your questions

Major problems

My suggestions

Get rid of God object

Other suggestions

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Using range minimal query for lowest common ancestor (LCA)

1 Answer 1

About your questions

Major problems

My suggestions

Get rid of God object

Other suggestions

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions