The Wikipedia page on Disjoint-Set data structures presents \4ドル\$ distinct algorithms for finding the root node of the tree, and \2ドル\$ distinct algorithms for performing the union operation. In this post, I will present all \4ドル * 2\$ combinations:
com.github.coderodde.util.disjointset.DisjointSet
:
package com.github.coderodde.util.disjointset;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
/**
* This class implements the
* <a href="https://en.wikipedia.org/wiki/Disjoint-set_data_structure">
* disjoint-set data structure</a>.
*
* @param <E> the satellite data type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 5, 2021)
* @since 1.6 (Sep 5, 2021).
*/
public final class DisjointSet<E> {
private final Map<E, DisjointSetNode<E>> disjointSetMap = new HashMap<>();
/**
* The disjoint set root finder.
*/
private final AbstractDisjointSetRootFinder<E> disjointSetRootFinder;
/**
* The disjoint set operation provider.
*/
private final AbstractDisjointSetUnionComputer<E> disjointSetUnionComputer;
/**
* Constructs a disjoint-set data structure with specific operation
* implementations.
*
* @param disjointSetRootFinder the root finder operation implementation
* object.
*
* @param disjointSetUnionComputer the union operation implementation
* object.
*/
public DisjointSet(AbstractDisjointSetRootFinder<E> disjointSetRootFinder,
AbstractDisjointSetUnionComputer<E> disjointSetUnionComputer) {
this.disjointSetRootFinder =
Objects.requireNonNull(
disjointSetRootFinder,
"The input DisjointSetRootFinder is null.");
this.disjointSetUnionComputer =
Objects.requireNonNull(
disjointSetUnionComputer,
"The input DisjointSetUnionComputer is null.");
this.disjointSetRootFinder.ownerDisjointSet = this;
this.disjointSetUnionComputer.ownerDisjointSet = this;
}
/**
* Finds the root of the tree to which {@code item} belongs to.
*
* @param item the target item.
* @return the root of the tree to which {@code item} belongs to.
*/
public E find(E item) {
return disjointSetRootFinder.find(item);
}
/**
* Unites the two trees into one, unless the two items already belong to the
* same tree.
*
* @param item1 the first item.
* @param item2 the second item.
*/
public void union(E item1, E item2) {
disjointSetUnionComputer.union(item1, item2);
}
DisjointSetNode<E> find(DisjointSetNode<E> node) {
if (node == node.getParent()) {
return node;
}
return find(node.getParent());
}
DisjointSetNode<E> getNode(E item) {
DisjointSetNode<E> node = disjointSetMap.get(item);
if (node == null) {
node = new DisjointSetNode<E>(item);
disjointSetMap.put(item, node);
}
return node;
}
}
com.github.coderodde.util.disjointset.DisjointSetNode
:
package com.github.coderodde.util.disjointset;
/**
* This class implements the nodes of the disjoint-set data structures.
*
* @param <E> the satellite data type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 4, 2021)
* @since 1.6 (Sep 4, 2021)
*/
final class DisjointSetNode<E> {
/**
* The actual item being held.
*/
private final E item;
/**
* The parent node of this node.
*/
private DisjointSetNode<E> parent;
/**
* The rank of this node. The rank of a node is its maximum height from any
* leaf node.
*/
private int rank;
/**
* The size of this node. The size of a node is the number of nodes under
* it.
*/
private int size;
public DisjointSetNode(E item) {
this.item = item;;
this.parent = this;
}
public E getItem() {
return item;
}
public DisjointSetNode<E> getParent() {
return parent;
}
public int getRank() {
return rank;
}
public int getSize() {
return size;
}
@Override
public String toString() {
return "[" + item + "]";
}
void setParent(DisjointSetNode<E> parent) {
this.parent = parent;
}
void setRank(int rank) {
this.rank = rank;
}
void setSize(int size) {
this.size = size;
}
}
com.github.coderodde.util.disjointset.AbstractDisjointSetRootFinder
:
package com.github.coderodde.util.disjointset;
/**
* This abstract class defines the API for for root finding algorithms in a
* disjoint-set data structure.
*
* @param <E> the satellite data type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 5, 2021)
* @since 1.6 (Sep 5, 2021)
*/
public abstract class AbstractDisjointSetRootFinder<E> {
DisjointSet<E> ownerDisjointSet;
/**
* If {@code item} belongs to a tree, returns the root of that three.
* Otherwise, a trivial, empty tree {@code T} is created, and {@code item}
* is added to {@code T}.
*
* @param item the query item.
* @return the root of the tree to which {@code item} belongs to.
*/
public abstract E find(E item);
}
com.github.coderodde.util.disjointset.AbstractDisjointSetUnionComputer
:
package com.github.coderodde.util.disjointset;
/**
* This abstract class defines the API for the union computer algorithms in a
* disjoint-set data structure.
*
* @param <E> the satellite date type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 5, 2021)
* @since 1.6 (Sep 5, 2021)
*/
public abstract class AbstractDisjointSetUnionComputer<E> {
DisjointSet<E> ownerDisjointSet;
/**
* If both {@code item1} and {@code item2} belong to the same tree, does
* nothing. Otherwise, unites the two into a single tree.
*
* @param item1 the first item.
* @param item2 the second item;
*/
public abstract void union(E item1, E item2);
}
com.github.coderodde.util.disjointset.DisjointSetIterativePathCompressionNodeFinder
:
package com.github.coderodde.util.disjointset;
/**
* This class implements the root finding algorithm for the disjoint-set data
* structure using iterative path compression.
*
* @param <E> the satellite data type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 5, 2021)
* @since 1.6 (Sep 5, 2021)
*/
public final class DisjointSetIterativePathCompressionNodeFinder<E>
extends AbstractDisjointSetRootFinder<E> {
/**
* {@inheritDoc }
*/
@Override
public E find(E item) {
DisjointSetNode<E> node =
ownerDisjointSet.find(ownerDisjointSet.getNode(item));
DisjointSetNode<E> root = node;
while (root.getParent() != root) {
root = root.getParent();
}
while (node.getParent() != root) {
DisjointSetNode<E> parent = node.getParent();
node.setParent(root);
node = parent;
}
return root.getItem();
}
}
com.github.coderodde.util.disjointset.DisjointSetRecursivePathCompressionNodeFinder
:
package com.github.coderodde.util.disjointset;
/**
* This class implements the root finding algorithm for the disjoint-set data
* structure using recursive path compression.
*
* @param <E> the satellite data type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 5, 2021)
* @since 1.6 (Sep 5, 2021)
*/
public final class DisjointSetRecursivePathCompressionNodeFinder<E>
extends AbstractDisjointSetRootFinder<E> {
/**
* {@inheritDoc }
*/
@Override
public E find(E item) {
DisjointSetNode<E> node =
ownerDisjointSet.find(ownerDisjointSet.getNode(item));
if (node == node.getParent()) {
return node.getItem();
}
node.setParent(ownerDisjointSet.find(node.getParent()));
return node.getParent().getItem();
}
}
com.github.coderodde.util.disjointset.DisjointSetPathHalvingNodeFinder
:
package com.github.coderodde.util.disjointset;
/**
* This class implements the root finding algorithm for the disjoint-set data
* structure using path halving.
*
* @param <E> the satellite data type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 5, 2021)
* @since 1.6 (Sep 5, 2021)
*/
public final class DisjointSetPathHalvingNodeFinder<E>
extends AbstractDisjointSetRootFinder<E> {
/**
* {@inheritDoc }
*/
@Override
public E find(E item) {
DisjointSetNode<E> node =
ownerDisjointSet.find(ownerDisjointSet.getNode(item));
while (node.getParent() != node) {
node.setParent(node.getParent().getParent());
node = node.getParent();
}
return node.getItem();
}
}
com.github.coderodde.util.disjointset.DisjointSetPathSplittingNodeFinder
:
package com.github.coderodde.util.disjointset;
/**
*
* This class implements the root finding algorithm for the disjoint-set data
* structure using path splitting.
*
* @param <E> the satellite data type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 5, 2021)
* @since 1.6 (Sep 5, 2021)
*/
public final class DisjointSetPathSplittingNodeFinder<E>
extends AbstractDisjointSetRootFinder<E> {
@Override
public E find(E item) {
DisjointSetNode<E> node =
ownerDisjointSet.find(ownerDisjointSet.getNode(item));
while (node.getParent() != node) {
DisjointSetNode<E> parent = node.getParent();
node.setParent(parent.getParent());
node = parent;
}
return node.getItem();
}
}
com.github.coderodde.util.disjointset.DisjointSetUnionByRankComputer
:
package com.github.coderodde.util.disjointset;
/**
* This class implements the union operation by rank.
*
* @param <E> the satellite data type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 5, 2021)
* @since 1.6 (Sep 5, 2021)
*/
public final class DisjointSetUnionByRankComputer<E>
extends AbstractDisjointSetUnionComputer<E> {
/**
* {@inheritDoc }
*/
@Override
public void union(E item1, E item2) {
DisjointSetNode<E> node1 =
ownerDisjointSet.find(ownerDisjointSet.getNode(item1));
DisjointSetNode<E> node2 =
ownerDisjointSet.find(ownerDisjointSet.getNode(item2));
if (node1 == node2) {
return;
}
if (node1.getRank() < node2.getRank()) {
DisjointSetNode<E> tempNode = node1;
node1 = node2;
node2 = tempNode;
}
node2.setParent(node1);
if (node1.getRank() == node2.getRank()) {
node1.setRank(node1.getRank() + 1);
}
}
}
com.github.coderodde.util.disjointset.DisjointSetUnionBySizeComputer
:
package com.github.coderodde.util.disjointset;
/**
* This class implements the union operation by tree size.
*
* @param <E> the satellite data type.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Sep 5, 2021)
* @since 1.6 (Sep 5, 2021)
*/
public final class DisjointSetUnionBySizeComputer<E>
extends AbstractDisjointSetUnionComputer<E> {
/**
* {@inheritDoc }
*/
@Override
public void union(E item1, E item2) {
DisjointSetNode<E> node1 =
ownerDisjointSet.find(ownerDisjointSet.getNode(item1));
DisjointSetNode<E> node2 =
ownerDisjointSet.find(ownerDisjointSet.getNode(item2));
if (node1 == node2) {
return;
}
if (node1.getSize() < node2.getSize()) {
DisjointSetNode<E> tempNode = node1;
node1 = node2;
node2 = tempNode;
}
// Here, node1.getSize() >= node2.getSize()
node2.setParent(node1);
node1.setSize(node1.getSize() + node2.getSize());
}
}
Repository
The entire story is here.
The typical benchmark output might be:
Seed: 1630845583522
The benchmark graph is built in 222 milliseconds.
................................................................................
Root finder: DisjointSetRecursivePathCompressionNodeFinder, union computer: DisjointSetUnionByRankComputer
Duration: 149 ms. Total edges: 199997. Total weight: 36417.94367839121.
................................................................................
Root finder: DisjointSetRecursivePathCompressionNodeFinder, union computer: DisjointSetUnionBySizeComputer
Duration: 4670 ms. Total edges: 199997. Total weight: 36417.94367839121.
................................................................................
Root finder: DisjointSetIterativePathCompressionNodeFinder, union computer: DisjointSetUnionByRankComputer
Duration: 153 ms. Total edges: 199997. Total weight: 36417.94367839121.
................................................................................
Root finder: DisjointSetIterativePathCompressionNodeFinder, union computer: DisjointSetUnionBySizeComputer
Duration: 4649 ms. Total edges: 199997. Total weight: 36417.94367839121.
................................................................................
Root finder: DisjointSetPathHalvingNodeFinder, union computer: DisjointSetUnionByRankComputer
Duration: 157 ms. Total edges: 199997. Total weight: 36417.94367839121.
................................................................................
Root finder: DisjointSetPathHalvingNodeFinder, union computer: DisjointSetUnionBySizeComputer
Duration: 4645 ms. Total edges: 199997. Total weight: 36417.94367839121.
................................................................................
Root finder: DisjointSetPathSplittingNodeFinder, union computer: DisjointSetUnionByRankComputer
Duration: 141 ms. Total edges: 199997. Total weight: 36417.94367839121.
................................................................................
Root finder: DisjointSetPathSplittingNodeFinder, union computer: DisjointSetUnionBySizeComputer
Duration: 4633 ms. Total edges: 199997. Total weight: 36417.94367839121.
................................................................................
1. Root finder: DisjointSetPathSplittingNodeFinder, union computer: DisjointSetUnionByRankComputer, 141 milliseconds.
2. Root finder: DisjointSetRecursivePathCompressionNodeFinder, union computer: DisjointSetUnionByRankComputer, 149 milliseconds.
3. Root finder: DisjointSetIterativePathCompressionNodeFinder, union computer: DisjointSetUnionByRankComputer, 153 milliseconds.
4. Root finder: DisjointSetPathHalvingNodeFinder, union computer: DisjointSetUnionByRankComputer, 157 milliseconds.
5. Root finder: DisjointSetPathSplittingNodeFinder, union computer: DisjointSetUnionBySizeComputer, 4633 milliseconds.
6. Root finder: DisjointSetPathHalvingNodeFinder, union computer: DisjointSetUnionBySizeComputer, 4645 milliseconds.
7. Root finder: DisjointSetIterativePathCompressionNodeFinder, union computer: DisjointSetUnionBySizeComputer, 4649 milliseconds.
8. Root finder: DisjointSetRecursivePathCompressionNodeFinder, union computer: DisjointSetUnionBySizeComputer, 4670 milliseconds.
Critique request
Please, tell me anything that comes to mind. In particular, is there any way to put the implementation classes into an implementation package (perhaps com.github.coderodde.util.disjointset.impl
)?
3 Answers 3
There aren't many best practices on how to package classes, a rule of thumb is to package classes of a module/feature together in a way so that they aren't strongly coupled together. If you were to make a ...disjointset.impl
and a ...disjointset.abstr
pack for example, you'd couple them together strongly since you can't use the implementations without the abstract base classes.
On the other hand, creating something like ...disjointset.finder
and ...disjointset.computer
works if the finders can compile without the computers and vice versa. I might have overlooked something, but they only seem to rely on DisjointSet
and DisjointSetNode
which would stay in the "root" ...disjointset
package so it would work.
The result would look like this:
com/github/coderodde/util/disjointset/
--> computers/
--> finders/
--> DisjointSet.java
--> DisjointSetNode.java
com/github/coderodde/util/disjointset/computers
--> AbstractDisjointSetUnionComputer.java
--> DisjointSetUnionByRankComputer.java
--> DisjointSetUnionBySizeComputer.java
com/github/coderodde/util/disjointset/finders
--> AbstractDisjointSetRootFinder.java
--> DisjointSetIterativePathCompressionRootFinder.java
--> DisjointSetPathHalvingRootFinder.java
--> DisjointSetPathSplittingRootFinder.java
--> DisjointSetRecursivePathCompressionRootFinder.java
You could also keep the package as it is. It's not too big and it all has to do with manipulating a DisjointSet
, so adding further layers might not even be necessary! For comparison, this mp3 library just puts all files into its main package and it has a lot more files.
For some further info, this text on how to divide an app into packages is a very interesting read.
Logic
If you're using Java 8+, in getNode
I think you can simplify the logic using Map#computeIfAbsent
like this
return disjointMapSet.computeIfAbsent(item, () -> new DisjointSetNode<>(item));
Is there a reason not to implement equals
/ hashcode
for the DisjointSetNode
classes? Relying on reference equality seems less easier to reason about than using the equals
method.
In DisjointSetIterativePathCompressionNodeFinder#find
, what is the purpose of
while (root.getParent() != root) {
root = root.getParent();
}
shouldn't ownerDisjointSet.find(ownerDisjointSet.getNode(item))
return the root of the tree or am I misreading this?
In DisjointSetRecursivePathCompressionNodeFinder
, what is the purpose of node == node.getParent()
- won't the if
always be true since DisjointSet#find
only returns when node == node.getParent()
?
Naming and Organization
Is there a reason why DisjointSetNode
is not an inner class
called Node
scoped to the DisjointSet
class itself? It feels like that would narrow the scoping of DisjointSetNode
without having to worry about name-spacing with a prefix like DisjointSet
.
The relationship between the Abstract
classes and the DisjointSet
class seems circular - DisjointSet
relies on these Abstract
classes but then the Abstract
classes rely on the DisjointSet
. I feel like the Abstract
classes are better represented by interface
s (since there's really no shared logic between the implementations of the Abstract
classes except the shared dependency. The only thing these Abstract
classes depend on the DisjointSet
to get are the root
nodes - what if these were passed to the concrete implementations of the Abstract
classes?
Some other minor things are the naming for variables like disjointSetMap
which is probably more descriptive as itemsToNodes
or something of that nature. Or getNode
is more like getOrCreateNode
.
-
\$\begingroup\$ In
DisjointSetIterativePathCompressionNodeFinder.find()
, what is the purpose of [iteratively finding the root]? I think that part implements the first pass of the two-pass algorithm presented in the hyperlinked en.wikipedia article. Now about all those introductory calls toownerDisjointSet.find()
... \$\endgroup\$greybeard– greybeard2021年10月16日 09:48:48 +00:00Commented Oct 16, 2021 at 9:48 -
\$\begingroup\$ (One conceivable reason for
DisjointSetNode
not being an inner class ofDisjointSet
is that it doesn't need to know its instantiating instance - that leaves the question: Why not nested?) \$\endgroup\$greybeard– greybeard2021年10月16日 09:57:48 +00:00Commented Oct 16, 2021 at 9:57
Jae Bradley raises a very valid concern:
The use of ownerDisjointSet.find()
on the node found by owner....getNode(item)
prevents the differences in root finding algorithms to have any effect.
To compound to the problem, union computation does *not use disjointSetRootFinder
, but ownerDisjointSet.find()
.
From one step back, you
- try multiple inheritance without language support using manual double dispatch.
- miss the baseline cases:
- no path shortening
- union by seat-of-pants (first/second/random item gets to be parent)
Many problems start with the way we think about things, intimately related to naming.
And one crux in this regard is unhelpful established names such as
disjoint-set data structure,
union–find data structure,
disjoint-set forest - every one of them too long (and an indication that a concept may not (yet) be optimally formed&established). While DisjointSets
wouldn't be wrong, I'd slightly prefer PartitionedSet
.
One thing that doesn't sit well with me is DisjointSetNode
sporting an int rank
and an int size
at the same time.
What if we ditch ...Node
?
- Have a Map not from item(payload) to
...Node
, but to parent (item).- asked for "the" representative of the set
- an unknown item i is in, return i
- for known items, do "the walk"
- union of sets specified by two items as usual
- by seat-of-pants (use with path shortening, only)
(a pity Java does not provide an attractive way to return more than one value - the number of steps taken in finding each representative would open an interesting can of worms) - for by-rank or by-size (do you want to update these in case of path shortening?)
have Mapsitem2size
&item2rank
,get()
overridden usinggetOrDefault()
suitably instead of creating entries for leaves, initially null.
On first access, instantiate appropriate Map.
On subsequent accesses, check other Map for being null.
- by seat-of-pants (use with path shortening, only)
- asked for "the" representative of the set
My benchmark results using a 1.8 JRE stopped to look arbitrary after urging to collect garbage before duration = runKruskalsAlgorithm(rootFinder, unionComputer)
(From GitHub repository, not presented for review.
(findMinimumSpanningTree(graph, weightFunction)
is odd for constructing the result not from its minimumSpanningTree
.))
Explore related questions
See similar questions with these tags.
abstract class
tointerface
for[defining] the API
? \$\endgroup\$RootFinder
to find the root disjoint-set forest node. (How did you build confidence the intended ones got called?) \$\endgroup\$