I recently had an interview problem where I was asked to find a compact representation of nodes in a tree.
As an example, consider the following tree:
            A
          / | \
         B  C  D
       / | \   | \
      E  F  G  H  I
     / \
    J   K
The task was: given a list of node values, find a compact list of node values that completely describes the list given. A node value being in the list implicitly means that all its children are in the list as well, and vice versa.
As a few examples, for the tree given:
compact([A]) = [A]
compact([B, C, D, J]) = [A]
compact([J, K]) = [E]
compact([J, K, F, G]) = [B]
compact([E, F, G, H]) = [B, H]
compact([H, I, D]) = [D]
This was my recursive attempt in Python (from memory, so there may be some mistakes/syntax errors), which would be called with the root of the tree first. I was told:
- I don't need to check for empty trees
- I can assume that each node has basic tree methods (so I can check its value, whether it's a leaf node, etc.)
- The non-compact list of node values is passed in through node_list; I was told that there would be no duplicates in this list and that each value in it would correspond to a node in the tree
    def compacted_tree(node, node_list):
        compact_list = []
        if node.value in node_list:
            # if this node is in the list then it is its own concise representation
            return [node.value]
        allChildrenInCompactList = True
        for child in node.get_children():
            compact_list.extend(compacted_tree(child, node_list))
            if allChildrenInCompactList:
                # only keep checking this if it's still true
                allChildrenInCompactList = child.value in compact_list
        if allChildrenInCompactList and not node.isleaf():
            # all my children are in the list so the compact list should just be me
            return [node.value]
        return compact_list
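For reference, here is a rough sketch of a node class with the methods my attempt assumes (the names `value`, `get_children` and `isleaf` are from memory, so treat them as placeholders), together with the example tree, so the attempt can actually be run:

```python
class Node:
    """Minimal node sketch with the methods assumed above (names are placeholders)."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

    def get_children(self):
        return self.children

    def isleaf(self):
        return not self.children


# The example tree from the question.
tree = Node('A', [
    Node('B', [Node('E', [Node('J'), Node('K')]), Node('F'), Node('G')]),
    Node('C'),
    Node('D', [Node('H'), Node('I')]),
])

print(compacted_tree(tree, ['E', 'F', 'G', 'H']))   # ['B', 'H']
```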
I said that this would run with complexity \$O(n^2)\$ (with \$n\$ = number of nodes in the tree), as in the worst case every node needs to be visited and, in the worst case, every node needs to check all of its children against node_list, which is an \$O(n)\$ operation too. I said the space complexity should be \$O(n)\,ドル as in the worst case you may have to add all your nodes to compact_list before it can simplify.
It was hard to get a good read on how I went from the interview, but I'm yet to get a call back. The interviewer hinted that the task could have been done with better complexity, but I fail to see how. He also said that my time complexity should have been expressed not just in terms of the number of nodes in the tree but also in terms of the length of node_list.
Is anyone able to tell me how I should've approached this problem/analysis better?
1 Answer
This is, strictly speaking, off-topic for Code Review, because it's a question of algorithm design, not one of implementation. But I'll try to explain how I would approach it.
1. Notation
In order to describe an algorithm and its performance, we need some notation. So:
- \$n\$ is the number of nodes in the tree;
- \$M\$ is the set of nodes we want to compact;
- \$m = |M|\,ドル the initial number of nodes in \$M\$;
- \$C(x)\$ is the set of children of the node \$x\$ in the tree.
- \$P(x)\$ is the parent of the node \$x\$ in the tree.
I'm going to assume that I can test for membership in a set in amortized time \$O(1)\$ and add or remove an element from a set in amortized time \$O(1)\$. This means we need to use Python's set data structure.
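Concretely, the non-compact input can be converted to a set up front; after that, the membership tests and updates that the algorithms below rely on are expected \$O(1)\$ on average. A trivial sketch:

```python
node_list = ['E', 'F', 'G', 'H']   # the non-compact input from the question
M = set(node_list)                 # build the set in O(m)
print('E' in M)                    # amortized O(1) membership test -> True
M.discard('E')                     # amortized O(1) removal
M.add('B')                         # amortized O(1) insertion
```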
2. Eliminate redundancies
The set \$M\$ might be redundant: namely it might contain nodes \$x\$ and \$y\$ such that \$y\$ is an ancestor of \$x\$ (that is, \$y = P(x)\$ or \$y = P(P(x))\$ or ...). All such \$x\$ need to be removed from \$M\$. For example, the set \$\{B, C, D, J\}\$ is redundant because \$B = P(P(J))\$ and so \$J\$ can be removed.
This can be done in one depth-first traversal over the tree:
- Initialize \$k\$ to 0.
- On entering a node \$x ∈ M\$: if \$k = 0\,ドル set \$k = 1\,ドル otherwise remove \$x\$ from \$M\$.
- On leaving a node \$x ∈ M\$: set \$k = 0\$.
This visits each node once and so takes time \$O(n)\$.
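Here is one way this traversal could look in Python (a sketch reusing the `Node` class from the question's harness; it carries an "ancestor already in \$M\$" flag down the recursion instead of the single counter \$k\,ドル which has the same effect):

```python
def eliminate_redundancies(root, M):
    """Remove from the set M every value whose node has an ancestor in M (in place)."""
    def dfs(node, ancestor_in_m):
        in_m = node.value in M
        if in_m and ancestor_in_m:
            M.discard(node.value)   # an ancestor already covers this node
            in_m = False
        for child in node.get_children():
            dfs(child, ancestor_in_m or in_m)
    dfs(root, False)


# Example: {B, C, D, J} is redundant because B is an ancestor of J.
M = {'B', 'C', 'D', 'J'}
eliminate_redundancies(tree, M)
print(M)   # {'B', 'C', 'D'}
```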
3. The naïve algorithm
Any algorithm for this problem is going to have to find nodes all of whose children are in the set of nodes to be compacted (that is, to find nodes \$x\$ where \$C(x) ⊆ M\$), and update the set of nodes accordingly.
So the first algorithm that springs to mind is the simple one in which we repeatedly search the whole tree for a node \$x\$ matching that condition, stopping when we can't find any more such nodes. Formally:
1. Eliminate redundancies from \$M\$ as described above.
2. For each non-leaf node \$x\$ in the tree:
3. If \$C(x) ⊆ M\$ then:
4. Update \$M\$ to \$(M \setminus C(x)) ∪ \{x\}\$.
5. Go to step 2.
What's the complexity of this algorithm? Step 1 takes time \$O(n)\$. Step 3 takes \$O(m)\$ (since \$M\$ stays the same or gets smaller as the algorithm progresses) and so does step 4 (for the same reason), so each loop 2–5 over the nodes in the tree takes time \$O(mn)\,ドル and we run that loop \$O(n)\$ times. The overall complexity is thus \$O(mn^2)\$.
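A sketch of this naïve version in Python (reusing `eliminate_redundancies` and the `Node` harness from above; not the only possible phrasing):

```python
def compact_naive(root, M):
    """Repeatedly rescan the whole tree for a node whose children are all in M."""
    eliminate_redundancies(root, M)

    def all_nodes(node):
        yield node
        for child in node.get_children():
            yield from all_nodes(child)

    changed = True
    while changed:
        changed = False
        for x in all_nodes(root):
            children = x.get_children()
            if children and all(c.value in M for c in children):
                M.difference_update(c.value for c in children)
                M.add(x.value)
                changed = True
                break               # "go to step 2": restart the scan
    return M
```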
4. Looking at the nodes in the right order
How can we improve this algorithm? It would be nice not to have to restart the loop over the nodes in the tree each time we update \$M\$. The reason we restart the loop is because updating \$M\$ might make a compaction available that we already considered and rejected. So we have to start over from the beginning so that we don't miss any compactions.
But if we looked at the nodes in the right order: that is, if we always looked at the children of \$x\$ before looking at \$x\,ドル then this couldn't happen. This order of traversing a tree is known as post-order.
So the revised algorithm looks like this:
1. Eliminate redundancies from \$M\$.
2. For each non-leaf node \$x\$ in post-order:
3. If \$C(x) ⊆ M\$ then:
4. Update \$M\$ to \$(M \setminus C(x)) ∪ \{x\}\$.
Here the loop 2–4 takes \$O(mn)\$ as before, but we only execute it once, for an overall complexity of \$O(mn)\$.
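In Python this could look like the following sketch (again reusing the helpers above); the only change from the naïve version is that the single pass is a post-order traversal:

```python
def compact_postorder(root, M):
    """One post-order pass: children are always processed before their parent."""
    eliminate_redundancies(root, M)

    def visit(node):
        children = node.get_children()
        for child in children:
            visit(child)            # post-order: children first
        if children and all(c.value in M for c in children):
            M.difference_update(c.value for c in children)
            M.add(node.value)

    visit(root)
    return M


print(compact_postorder(tree, {'E', 'F', 'G', 'H'}))   # {'B', 'H'}
```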
5. Speeding up the subset test
The step that is contributing the most to the overall complexity is the subset test \$C(x) ⊆ M\,ドル which is executed \$O(n)\$ times and costs \$O(m)\$. How could we speed this up? The insight here is that all we need to know is the number of items in \$C(x) ∩ M\,ドル because when \$|C(x) ∩ M| = |C(x)|\$ then \$C(x) ⊆ M\$. So we can maintain a count for each node, of the number of its children that are in \$M\,ドル and just compare the counts. This results in the following algorithm:
1. Eliminate redundancies from \$M\$.
2. Initialize a map \$K\$ from nodes to non-negative integers such that \$K(x) = |C(x) ∩ M|\,ドル the number of children of \$x\$ that are in the set of nodes to be compacted.
3. For each non-leaf node \$x\$ in post-order:
4. If \$K(x) = |C(x)|\$ then:
5. Update \$M\$ to \$(M \setminus C(x)) ∪ \{x\}\$.
6. Update \$K(P(x))\$ to \$K(P(x)) + 1\$.
Here step 1 takes \$O(n)\,ドル step 2 takes \$O(m)\,ドル the loop 3–6 is executed \$O(n)\$ times, steps 4 and 6 take \$O(1)\$ each time they are executed, so \$O(n)\$ overall, and step 5 takes \$O(n)\$ overall since each node can be removed from \$M\$ at most once and added to \$M\$ at most once. So the overall complexity is \$O(n)\$.
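A Python sketch of the final algorithm (reusing the helpers above; here \$K\$ is computed with one extra traversal, which is \$O(n)\$ rather than \$O(m)\$ but does not change the overall bound):

```python
def compact_linear(root, M):
    """O(n) compaction: track, per node, how many of its children are currently in M."""
    eliminate_redundancies(root, M)

    K = {}        # K[x] = |C(x) ∩ M|
    parent = {}   # parent[x] = P(x), by value

    def setup(node):
        K[node.value] = sum(child.value in M for child in node.get_children())
        for child in node.get_children():
            parent[child.value] = node.value
            setup(child)

    setup(root)

    def visit(node):
        children = node.get_children()
        for child in children:
            visit(child)                                   # post-order
        if children and K[node.value] == len(children):    # C(x) ⊆ M, tested in O(1)
            M.difference_update(c.value for c in children)
            M.add(node.value)
            if node.value in parent:                       # the root has no parent
                K[parent[node.value]] += 1

    visit(root)
    return M


print(compact_linear(tree, {'J', 'K', 'F', 'G'}))   # {'B'}
```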
Comments:

- [A] is always an answer. What are the expected results for [B, C] and [E, H]?
- A node counts as being in the list if its value is in node_list or all of its children are in the list. I guess you could say the task was to repeatedly compact the set of nodes using the two above rules until you get the smallest possible set of nodes. Really sorry if this wasn't clear before.