I recently had an interview problem where I was asked to find a compact representation of nodes in a tree.
As an example, consider the following tree:
            A
          / | \
         B  C  D
       / | \   | \
      E  F  G  H  I
     / \
    J   K
The task was: given a list of node values, find a compact list of node values that completely describes the list given. A node value being in the list implicitly means that all its children are in the list as well, and vice versa.
As a few examples, for the tree given:
compact([A]) = [A]
compact([B, C, D, J]) = [A]
compact([J, K]) = [E]
compact([J, K, F, G]) = [B]
compact([E, F, G, H]) = [B, H]
compact([H, I, D]) = [D]
This was my recursive attempt in Python (from memory, so there may be some mistakes/syntax errors), which would be called with the root of the tree first. I was told:
- I don't need to check for empty trees
- I can assume that each node has basic tree methods (so I can check its value, whether it's a leaf node, etc.)
- The non-compact list of node values is passed in through node_list; I was told that there would be no duplicates in this list and that each value in it would correspond to a node in the tree
    def compacted_tree(node, node_list):
        compact_list = []
        if node.value in node_list:
            # if this node is in the list then it is its own concise representation
            return [node.value]
        allChildrenInCompactList = True
        for child in node.get_children():
            compact_list.extend(compacted_tree(child, node_list))
            if allChildrenInCompactList:
                # only keep checking this if it's still true
                allChildrenInCompactList = child.value in compact_list
        if allChildrenInCompactList and not node.isleaf():
            # all my children are in the list so the compact list should just be me
            return [node.value]
        return compact_list
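For reference, here is a rough sketch of a node class with the methods my attempt assumes (the names `value`, `get_children` and `isleaf` are from memory, so treat them as placeholders), together with the example tree, so the attempt can actually be run:

```python
class Node:
    """Minimal node sketch with the methods assumed above (names are placeholders)."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

    def get_children(self):
        return self.children

    def isleaf(self):
        return not self.children


# The example tree from the question.
tree = Node('A', [
    Node('B', [Node('E', [Node('J'), Node('K')]), Node('F'), Node('G')]),
    Node('C'),
    Node('D', [Node('H'), Node('I')]),
])

print(compacted_tree(tree, ['E', 'F', 'G', 'H']))   # ['B', 'H']
```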
I said that this would run with complexity \$O(n^2)\$ (with \$n\$ = number of nodes in the tree), as in the worst case every node needs to be visited and, in the worst case, every node needs to check all of its children against node_list, which is an \$O(n)\$ operation too. I said the space complexity should be \$O(n)\,ドル as in the worst case you may have to add all your nodes to compact_list before it can simplify.
It was hard to get a good read on how I went from the interview, but I'm yet to get a call back. The interviewer hinted that the task could have been done with better complexity, but I fail to see how. He also said that my time complexity should have been expressed not just in terms of the number of nodes in the tree but also in terms of the length of node_list.
Is anyone able to tell me how I should've approached this problem/analysis better?
1 Answer
This is, strictly speaking, off-topic for Code Review, because it's a question of algorithm design, not one of implementation. But I'll try to explain how I would approach it.
1. Notation
In order to describe an algorithm and its performance, we need some notation. So:
- \$n\$ is the number of nodes in the tree;
- \$M\$ is the set of nodes we want to compact;
- \$m = |M|\,ドル the initial number of nodes in \$M\$;
- \$C(x)\$ is the set of children of the node \$x\$ in the tree.
- \$P(x)\$ is the parent of the node \$x\$ in the tree.
I'm going to assume that I can test for membership in a set in amortized time \$O(1)\$ and add or remove an element from a set in amortized time \$O(1)\$. This means we need to use Python's set data structure.
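Concretely, the non-compact input can be converted to a set up front; after that, the membership tests and updates that the algorithms below rely on are expected \$O(1)\$ on average. A trivial sketch:

```python
node_list = ['E', 'F', 'G', 'H']   # the non-compact input from the question
M = set(node_list)                 # build the set in O(m)
print('E' in M)                    # amortized O(1) membership test -> True
M.discard('E')                     # amortized O(1) removal
M.add('B')                         # amortized O(1) insertion
```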
2. Eliminate redundancies
The set \$M\$ might be redundant: namely it might contain nodes \$x\$ and \$y\$ such that \$y\$ is an ancestor of \$x\$ (that is, \$y = P(x)\$ or \$y = P(P(x))\$ or ...). All such \$x\$ need to be removed from \$M\$. For example, the set \$\{B, C, D, J\}\$ is redundant because \$B = P(P(J))\$ and so \$J\$ can be removed.
This can be done in one depth-first traversal over the tree:
- Initialize \$k\$ to 0.
- On entering a node \$x ∈ M\$: if \$k = 0\,ドル set \$k = 1\,ドル otherwise remove \$x\$ from \$M\$.
- On leaving a node \$x ∈ M\$: set \$k = 0\$.
This visits each node once and so takes time \$O(n)\$.
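Here is one way this traversal could look in Python (a sketch reusing the `Node` class from the question's harness; it carries an "ancestor already in \$M\$" flag down the recursion instead of the single counter \$k\,ドル which has the same effect):

```python
def eliminate_redundancies(root, M):
    """Remove from the set M every value whose node has an ancestor in M (in place)."""
    def dfs(node, ancestor_in_m):
        in_m = node.value in M
        if in_m and ancestor_in_m:
            M.discard(node.value)   # an ancestor already covers this node
            in_m = False
        for child in node.get_children():
            dfs(child, ancestor_in_m or in_m)
    dfs(root, False)


# Example: {B, C, D, J} is redundant because B is an ancestor of J.
M = {'B', 'C', 'D', 'J'}
eliminate_redundancies(tree, M)
print(M)   # {'B', 'C', 'D'}
```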
3. The naïve algorithm
Any algorithm for this problem is going to have to find nodes all of whose children are in the set of nodes to be compacted (that is, to find nodes \$x\$ where \$C(x) ⊆ M\$), and update the set of nodes accordingly.
So the first algorithm that springs to mind is the simple one in which we repeatedly search the whole tree for a node \$x\$ matching that condition, stopping when we can't find any more such nodes. Formally:
1. Eliminate redundancies from \$M\$ as described above.
2. For each non-leaf node \$x\$ in the tree:
3. If \$C(x) ⊆ M\$ then:
4. Update \$M\$ to \$(M \setminus C(x)) ∪ \{x\}\$.
5. Go to step 2.
What's the complexity of this algorithm? Step 1 takes time \$O(n)\$. Step 3 takes \$O(m)\$ (since \$M\$ stays the same or gets smaller as the algorithm progresses) and so does step 4 (for the same reason), so each loop 2–5 over the nodes in the tree takes time \$O(mn)\,ドル and we run that loop \$O(n)\$ times. The overall complexity is thus \$O(mn^2)\$.
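A sketch of this naïve version in Python (reusing `eliminate_redundancies` and the `Node` harness from above; not the only possible phrasing):

```python
def compact_naive(root, M):
    """Repeatedly rescan the whole tree for a node whose children are all in M."""
    eliminate_redundancies(root, M)

    def all_nodes(node):
        yield node
        for child in node.get_children():
            yield from all_nodes(child)

    changed = True
    while changed:
        changed = False
        for x in all_nodes(root):
            children = x.get_children()
            if children and all(c.value in M for c in children):
                M.difference_update(c.value for c in children)
                M.add(x.value)
                changed = True
                break               # "go to step 2": restart the scan
    return M
```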
4. Looking at the nodes in the right order
How can we improve this algorithm? It would be nice not to have to restart the loop over the nodes in the tree each time we update \$M\$. The reason we restart the loop is because updating \$M\$ might make a compaction available that we already considered and rejected. So we have to start over from the beginning so that we don't miss any compactions.
But if we looked at the nodes in the right order: that is, if we always looked at the children of \$x\$ before looking at \$x\,ドル then this couldn't happen. This order of traversing a tree is known as post-order.
So the revised algorithm looks like this:
1. Eliminate redundancies from \$M\$.
2. For each non-leaf node \$x\$ in post-order:
3. If \$C(x) ⊆ M\$ then:
4. Update \$M\$ to \$(M \setminus C(x)) ∪ \{x\}\$.
Here the loop 2–4 takes \$O(mn)\$ as before, but we only execute it once, for an overall complexity of \$O(mn)\$.
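In Python this could look like the following sketch (again reusing the helpers above); the only change from the naïve version is that the single pass is a post-order traversal:

```python
def compact_postorder(root, M):
    """One post-order pass: children are always processed before their parent."""
    eliminate_redundancies(root, M)

    def visit(node):
        children = node.get_children()
        for child in children:
            visit(child)            # post-order: children first
        if children and all(c.value in M for c in children):
            M.difference_update(c.value for c in children)
            M.add(node.value)

    visit(root)
    return M


print(compact_postorder(tree, {'E', 'F', 'G', 'H'}))   # {'B', 'H'}
```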
5. Speeding up the subset test
The step that is contributing the most to the overall complexity is the subset test \$C(x) ⊆ M\,ドル which is executed \$O(n)\$ times and costs \$O(m)\$. How could we speed this up? The insight here is that all we need to know is the number of items in \$C(x) ∩ M\,ドル because when \$|C(x) ∩ M| = |C(x)|\$ then \$C(x) ⊆ M\$. So we can maintain a count for each node, of the number of its children that are in \$M\,ドル and just compare the counts. This results in the following algorithm:
1. Eliminate redundancies from \$M\$.
2. Initialize a map \$K\$ from nodes to non-negative integers such that \$K(x) = |C(x) ∩ M|\,ドル the number of children of \$x\$ that are in the set of nodes to be compacted.
3. For each non-leaf node \$x\$ in post-order:
4. If \$K(x) = |C(x)|\$ then:
5. Update \$M\$ to \$(M \setminus C(x)) ∪ \{x\}\$.
6. Update \$K(P(x))\$ to \$K(P(x)) + 1\$.
Here step 1 takes \$O(n)\,ドル step 2 takes \$O(m)\,ドル the loop 3–6 is executed \$O(n)\$ times, steps 4 and 6 take \$O(1)\$ each time they are executed, so \$O(n)\$ overall, and step 5 takes \$O(n)\$ overall since each node can be removed from \$M\$ at most once and added to \$M\$ at most once. So the overall complexity is \$O(n)\$.
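A Python sketch of the final algorithm (reusing the helpers above; here \$K\$ is computed with one extra traversal, which is \$O(n)\$ rather than \$O(m)\$ but does not change the overall bound):

```python
def compact_linear(root, M):
    """O(n) compaction: track, per node, how many of its children are currently in M."""
    eliminate_redundancies(root, M)

    K = {}        # K[x] = |C(x) ∩ M|
    parent = {}   # parent[x] = P(x), by value

    def setup(node):
        K[node.value] = sum(child.value in M for child in node.get_children())
        for child in node.get_children():
            parent[child.value] = node.value
            setup(child)

    setup(root)

    def visit(node):
        children = node.get_children()
        for child in children:
            visit(child)                                   # post-order
        if children and K[node.value] == len(children):    # C(x) ⊆ M, tested in O(1)
            M.difference_update(c.value for c in children)
            M.add(node.value)
            if node.value in parent:                       # the root has no parent
                K[parent[node.value]] += 1

    visit(root)
    return M


print(compact_linear(tree, {'J', 'K', 'F', 'G'}))   # {'B'}
```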
Comments:

- [A] is always an answer. What are the expected results for [B, C] and [E, H]?
- A node counts as being in the list if its value is in node_list or all of its children are in the list. I guess you could say the task was to repeatedly compact the set of nodes using the two above rules until you get the smallest possible set of nodes. Really sorry if this wasn't clear before.