I am looking for a way to split a tree into $k$ clusters so that the cluster with largest diameter is as small as possible. All edges have the same length. I'm hoping for an algorithm that can handle arbitrary trees (with arbitrary branching factor), not just binary trees.
-
$\begingroup$ Welcome to CS.SE! What have you tried? What approaches have you considered? What are you looking for? Provable worst-case asymptotic running time, or efficiency in practice? How will you evaluate answers? We expect you to make a significant effort before asking here, and to show us what you came up with or where you got stuck. I'd discourage bare problem statements where you just give the requirements, without showing us any context, motivation, or indication of what you've tried or what your specific issue is. If your question is just 2 sentences, that's often a signal to expand on it. $\endgroup$D.W.– D.W. ♦2015年08月17日 02:50:15 +00:00Commented Aug 17, 2015 at 2:50
-
$\begingroup$ Hello. Actually I haven't tried anything except for looking for a answer online. From my experience my algorithms are very far from optimal and they usually don't work correctly for all cases so I just admitted to myself that I am powerless when it comes to creating them. Also most of algorithms I need already exist, I just don't know about them, that's why I asked here. $\endgroup$Bartosz– Bartosz2015年08月17日 07:21:01 +00:00Commented Aug 17, 2015 at 7:21
-
$\begingroup$ OK. Do you have a binary tree, or do you need to be able to handle arbitrary trees with arbitrary branching factor? Looks like dynamic programming should handle the former (and maybe the latter, but I'm not sure about that). $\endgroup$D.W.– D.W. ♦2015年08月17日 15:47:20 +00:00Commented Aug 17, 2015 at 15:47
-
$\begingroup$ I need to handle arbitrary trees. And thank you for your interest. $\endgroup$Bartosz– Bartosz2015年08月18日 09:42:37 +00:00Commented Aug 18, 2015 at 9:42
2 Answers 2
Your problem is equivalent to the following:
Input: a tree $T,ドル an integer $r$ (the radius)
Find: a set $S$ of $k$ vertices, such that every vertex in the tree is at distance $\le r$ from one of the vertices in $S$; or failure if no such set exists
Why is this equivalent? Well, if you have such a set $S,ドル then you immediately can find $k$ clusters of diameter $\le 2r$ (for each $s \in S,ドル you get a cluster of nodes that are at distance $\le r$ from $s$). Conversely, if you have $k$ clusters of diameter $\le 2r,ドル then you can find such a set $S$: given a cluster, pick a pair of nodes $x,y$ in the cluster that maximizes $d(x,y)$; then find a shortest path from $x$ to $y,ドル and let $s$ be a node halfway along that shortest path; then one can prove every node in the cluster is at distance $\le r$ from $s$. So, if you can solve the above problem, then linear search or binary search on $r$ will find the smallest such diameter.
The best algorithm I can find for this problem is the trivial $O(n^{k+1})$ algorithm: enumerate all subsets of size $k,ドル and test each to see whether it satisfies all the requirements. The number of sets you must enumerate is $C(n,k) = O(n^k),ドル and given a candidate set you can test whether it is valid in $O(n)$ time. Unfortunately, for modest or large values of $k,ドル this will be totally impractical.
As an optimization, you can let $S'$ be the set of vertices that are at height $\ge r$. (The height of a vertex $v$ is the length of the longest downward path from $v$ to a leaf; in other words, the height of a leaf is 0ドル,ドル and the height of an internal node is one more than the height of its highest child.) Then, you only need to enumerate subsets of $S'$ of size $k$. In practice, this might provide a significant improvement, even though asymptotically it might not make much of a difference to the worst-case running time.
I don't know whether it is possible to do better, or whether a polynomial-time algorithm exists. I think the problem can be solved in polynomial time for binary trees using dynamic programming, but you want to handle arbitrary trees, and I don't know what happens in that case.
An algorithm for the equivalent problem denoted by D.W. :
The problem :
Input: a tree $T,ドル an integer $r$ (the radius)
Find: a set $S$ of $k$ vertices, such that every vertex in the tree is at distance $\le r$ from one of the vertices in $S$; or failure if no such set exists
The algorithm :
- Sort the nodes by decreasing depth using a BFS.
- Pick the first one (a leaf).
- Go up in the tree to its $r^{th}$ ancester (the parent of the parent ... of the leaf) and add this ancester to $S$. The parent of the root is the root itself.
- Remove (both from the tree and from the sorted container) all the nodes covered by this new cluster center.
- If none of the nodes are remaining then return $S$. Else, if $|S| = k$ then return failure.
- Go to 2.
Correctness :
Since the algorithm adds at most $k$ centers and since a node can not be removed if it is not covered by a center, the algorithm returns a failure when it has to return a failure.
Then, when it has to return a solution. The size of $S$ is still increasing and will reach $k,ドル so it's enough to prove by reccurence that $S$ is always a subset of a solution, while the algorithm is running.
At the beginning $S$ is empty and a solution exists by hypothesis. Then, assume that, at some point in the algorithm, $S$ is a subset of a solution $S^*$. Let's prove that we can still reach a solution after adding a new center in $S$.
Let $u$ be the node chosen in the step 2. Let $v$ be the node chosen in the step 3, $v$ is the next center which will be added. Whatever are the centers in $S^*,ドル $u$ has to be covered by at least one of them, let's denote it $w$. The centers selected before don't cover $u,ドル else $u$ would not be still in the tree. Thus $w$ is in the remaining nodes and not in $S$ and there is only 2 cases :
If $w$ is a descendant of $v,ドル since $v$ has the max depth among the remaining nodes, $w$ cover a subset of the remaining nodes which are covered by $v$. Thus $( S^* \setminus \{ w \} ) \cup \{ v \}$ is also a solution and we can reach it from $S \cup \{ v \}$ since $S \cup \{ v \} \subset ( S^* \setminus \{ w \} ) \cup \{ v \}$.
Else, $w$ is further from $u$ than $v,ドル so the distance between $u$ and $w$ is longer than $r$ and $w$ is too far for covering $u,ドル it's absurd.
Hence the reccurence holds and we can conclude that the greedy choice of the centers lead to a solution that the algorithm returns.
Complexity
Linear : the step 1 is a BFS and then each node is seen and removed once at most. Including the binary search on $r,ドル we get an $O(n \log n)$ algorithm for the problem of the question.
In addition, the height is an upper bound for the minimal $r,ドル hence for trees which have $O(\log n)$ height we have a $O(n \log \log n)$ algorithm for the problem of the question.
-
1$\begingroup$ Beautiful. In case anyone was thinking that step 1 needs a radix/bucket sort to stay O(n): a DFS or BFS directly gives all node depths, so explicit "sorting" isn't needed. $\endgroup$j_random_hacker– j_random_hacker2015年08月19日 14:56:43 +00:00Commented Aug 19, 2015 at 14:56
-
$\begingroup$ @j_random_hacker Good idea, added. I have also extended the sketch of proof. $\endgroup$François– François2015年08月19日 16:44:14 +00:00Commented Aug 19, 2015 at 16:44