Here is my code to use range minimal query to resolve the LCA problem. Any advice on code bugs, performance improvement in terms of algorithm time complexity, code style are appreciated.
More specifically, my questions are:
- I am using in-order traverse to build depth array. Should I use other traverse, like pre-order (dfs), level-order, or post order?
- I am using two additional data structures,
index_node_map
andnode_index_map
. I'm wondering if there are any ways to remove them, as they make my code seem lengthy.
Major ideas of my idea are:
- Build depth array of each node by in-order traverse
- Build MRQ index for depth array built in first step
- For any two nodes, their LCA are the lowest depth node between them (in depth array)
from collections import defaultdict
class TreeNode:
def __init__(self, value, left, right):
self.value = value
self.left = left
self.right = right
class Solution:
def __init__(self):
self.index_node_map = defaultdict(int)
self.node_index_map = defaultdict(int)
self.depth = []
self.rmp_map = defaultdict(tuple) # key: (index, step), value: (min value, index)
def in_order(self, root, level):
if root.left:
self.in_order(root.left, level+1)
self.depth.append(level)
self.index_node_map[len(self.depth)-1]=root.value
self.node_index_map[root.value] = len(self.depth)-1
if root.right:
self.in_order(root.right, level+1)
def build_depth_index(self, root, level):
self.in_order(root, level)
print self.depth
print self.index_node_map
print self.node_index_map
self.build_rmq_map(self.depth)
def build_rmq_map(self, result):
step = len(result).bit_length()
for i in range(len(result)):
self.rmp_map[(i,0)]=(result[i],i)
for s in range(1, step+1):
for i in range(len(result)):
if i + 2 **(s-1) < len(result):
self.rmp_map[(i,s)] = min(self.rmp_map[(i,s-1)],self.rmp_map[(i+2**(s-1),s-1)])
else:
self.rmp_map[(i,s)] = self.rmp_map[(i, s-1)]
def rmp_query(self, index1, index2):
step = (index2 - index1).bit_length()-1
x = self.rmp_map[(index1, step)]
y = self.rmp_map[(index2+1-2**(step), step)]
# return index of depth array min value
if x[0] < y[0]:
return x[1]
else:
return y[1]
def lca_query(self, node1, node2):
index1 = self.node_index_map[node1]
index2 = self.node_index_map[node2]
min_index = self.rmp_query(index1, index2)
return self.index_node_map[min_index]
if __name__ == "__main__":
root = TreeNode(1, TreeNode(2, TreeNode(4, None, None), TreeNode(5, None, None)), TreeNode(3, TreeNode(6, None, None),TreeNode(7,None,None)))
s = Solution()
s.build_depth_index(root, 0)
print s.lca_query(4,5)
print s.lca_query(4, 7)
-
6\$\begingroup\$ Please comment your code. \$\endgroup\$Raziman T V– Raziman T V2017年02月03日 08:23:17 +00:00Commented Feb 3, 2017 at 8:23
1 Answer 1
This review describes my thoughts as a fellow competitive programmer.
About your questions
- In-order traversal is a must here. Otherwise the RMQ-based algorithm won't work - it requires that for each vertexes A and B their LCA occurs between any occurrences of A and B.
- You will need some kind of maps to map tree node to their position in the RMQ array. So, at least one of them should stay. However, you can get rid of
index_node_map
if you make RMQ store node's "value" (which is actually node's id, as far as I understood from your code).
Major problems
- Your code does not satisfy PEP 8 at least in terms of whitespaces (it's easy to check and fix with tools like
pep8
,flake8
,autopep8
), that significantly reduces readability. You have God object in the solution, which, well, "solves the problem". Meaning, it mixes together all of the following:
- In-order traversal of a tree.
- Mapping between tree nodes and their indexes in the traversal.
- Calculating depths of vertexes.
- Reducing LCA to RMQ with
- Solution of RMQ problem with sparse table.
Moreover, all these are highly tied together and you cannot, say, debug RMQ separately. It is a very common problem among competitive programmers (especially beginners). Though, some do not consider it to be a problem, but you're posting on codereview.
- You initialize object outside of constructor, in the
build_depth_index
function. It may lead to a false conclusion that one may re-use the object with different inputs by callingbuild_depth_index
with new tree. Despite "work in the constructor" being a kind of anti-pattern, I feel it's OK to move all initialization to the constructor in this case:- You won't be able to reuse or not initialize the object.
- You will have less mutable state to care about (as opposed to the "initialize everything in
build_depth_index
and make the object reusable" approach), which is generally helpful for debugging.
- There are a bunch of implicit invariants, e.g.:
- You assume that all values in the tree are different. It's not enforced with
assert
s, so it's possible to accidentally apply your algorithm to "wrong" tree and get "almost correct" results. - In the
in_order
function you assume that each vertex occurs in the in-order traversal only once (because you "reenumerate" vertexes), but it's not the case for non-binary trees. Algorithm is more general than that. I'd suggest to avoid writing general algorithms for some "specific" cases because that is prone to copy-pasting and makes the algorithm harder to debug.
- You assume that all values in the tree are different. It's not enforced with
- There is some debug output in the middle, which cannot be switched off.
- No need in
defaultdict
s. I'd suggest to avoiddefaultdict
s when implementing data structures (esp. RMQ) because typically you should initialize the structure first and use it afterwards. self.rmp_map = defaultdict(tuple) # key: (index, step), value: (min value, index)
you represent two-dimensional array with
dict
. Please don't do that, use arrays.- Use
namedtuple
instead of regular tuples to improve readability.
My suggestions
Get rid of God object
I'd suggest the following objects and functions:
class Rmq:
def __init__(self, array, key):
"""Initializes RMQ structure for 'array', assuming that the value
being minimized is returned by the 'key' function, if applied to
array element"""
def get_minimal_element(self, l, r):
"""Returns minimal element of array from position l to r (inclusive)"""
def get_in_order_traverse(tree):
"""Traverses tree in-order and returns a list of nodes."""
def get_depths(tree):
"""Calculates depths of all vertexes in the tree and returns mapping
from vertex's value to its depth (root's depth is 0)."""
class Lca:
def __init__(self, tree):
"""Initializes LCA solver for the given tree."""
def get_lca(self, a, b):
"""Given two vertexes a and b returns their LCA as a TreeNode."""
Of course, that design can be de-coupled even further. Say, Lca.__init__
can take an optional argument specifying Rmq
's factory so we can plug different implementations of Rmq
for debugging.
Other suggestions
- Rename
TreeNode.value
toTreeNode.id
, because LCA does care about node "values", it cares about their ids only. - Even better approach is to make
TreeNode
s hashable by overloading__hash__
and__eq__
so functions do not depend anymore on "what is node's id and how to find it". - Replace
TreeNode.left
andTreeNode.right
with a list calledTreeNode.children
. For the sake of generality - algorithm does not care. - Use
logging
module for debug output so it can be switched on/off dynamically. - Use
namedtuple
instead of tuples. - In your sparse table implementation you can ignore all those segments which extend beyond array's length and get rid of
if
. You never need to access them, as all requests to RMQ lie inside the original array.
-
\$\begingroup\$ Thanks yeputons, for your comments -- " if you make RMQ store node's "value"", wondering how to make RMQ store values? I think RMQ needs to store node depth in order to work since RMQ order by depth? If I mis-understand your points, please feel free to correct me. \$\endgroup\$Lin Ma– Lin Ma2017年02月09日 06:57:37 +00:00Commented Feb 9, 2017 at 6:57
-
\$\begingroup\$ BTW, for your comments, "There is some debug output in the middle, which cannot be switched off.", what bug do you mean? \$\endgroup\$Lin Ma– Lin Ma2017年02月09日 06:59:33 +00:00Commented Feb 9, 2017 at 6:59
-
1\$\begingroup\$ @LinMa right now your sparse table stores pair of two numbers: depth (which participates in finding minimal value) and index (which is the "result" returned by RMQ). My suggestion is to store vertex's "value" instead of that "index" and avoid extra conversion. About "debug output": you have some
print
statements which are not required by problem's statement. \$\endgroup\$yeputons– yeputons2017年02月09日 07:32:22 +00:00Commented Feb 9, 2017 at 7:32 -
\$\begingroup\$ Thanks yeputons, love your comments and mark your reply as answer. And grant the bounty points. I have make a new post here (codereview.stackexchange.com/questions/155076/…), your advice is highly appreciated. I applied your comments to break down God object (but in a different way), and also applied your comments to save one of the map (I only use node to index map). \$\endgroup\$Lin Ma– Lin Ma2017年02月11日 06:29:41 +00:00Commented Feb 11, 2017 at 6:29
-
1\$\begingroup\$ @LinMa you're welcome. Thank you. I have thought about
id
being builtin function and checked with this question. \$\endgroup\$yeputons– yeputons2017年02月12日 14:07:52 +00:00Commented Feb 12, 2017 at 14:07