(See the previous version here.)
(See the next version here.)
This time, I have incorporated all the suggestions made by Roman; my new version follows.
package com.github.coderodde.codereview.notepad;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
/**
* This class implements a method for computing clusterness measure (CM). CM
* asks for a set of points {@code x_1, x_2, ..., x_n}, and it returns a
* number within {@code [0, 1]}. The idea behind CM is that it returns 0 when
* all the data points are equal and 1 whenever all adjacent points are
* equidistant.
*
* @author Rodion "rodde" Efremov
* @version 1.61 (Jun 10, 2019)
* @since 1.6 (Feb 21, 2019)
*/
public final class ClusternessMeasure {
public double computeClusternessMeasure(Collection<Double> points) {
return computeClusternessMeasure(new SortedMeasurementPoints(points));
}
public double computeClusternessMeasure(SortedMeasurementPoints points) {
double minimumPoint = points.get(0);
double maximumPoint = points.get(points.indexOfLastEntry());
double range = maximumPoint - minimumPoint;
if (range == 0.0) {
// Once here, all data points are equal and so CM must be 1.0.
return 1.0;
}
double expectedDifference = range / points.indexOfLastEntry();
double sum = 0.0;
for (int i = 0; i < points.indexOfLastEntry(); i++) {
double currentDifference = points.get(i + 1) - points.get(i);
sum += Math.min(
Math.abs(expectedDifference - currentDifference),
expectedDifference);
}
return sum / range;
}
public static final class SortedMeasurementPoints {
private final List<Double> points;
public SortedMeasurementPoints(Collection<Double> points) {
List<Double> pointsAsList = new ArrayList<>(points);
Collections.sort(pointsAsList);
this.points = pointsAsList;
}
public double get(int index) {
return points.get(index);
}
public int indexOfLastEntry() {
return points.size() - 1;
}
}
public static void main(String[] args) {
ClusternessMeasure cm = new ClusternessMeasure();
// CM = 0.0
System.out.println(
cm.computeClusternessMeasure(
Arrays.asList(1.0, 2.0, 3.0, 4.0, 5.0)));
// CM = 1.0
System.out.println(
cm.computeClusternessMeasure(
Arrays.asList(2.0, 2.0, 2.0)));
System.out.println(
cm.computeClusternessMeasure(
Arrays.asList(1.0, 2.0, 3.0, 5.0, 4.0, 6.0, 7.1)));
System.out.println(
cm.computeClusternessMeasure(
Arrays.asList(1.0, 1.2, 3.0, 3.0, 3.0)));
}
}
Critique request
I would like to receive the comments regarding both the code and idea. Thank you in advance.
1 Answer 1
Tests
Unit tests are better than printing to the console. One example:
@Test
public void testReturn0WhenEquidistantPoints() {
ClusternessMeasure cm = new ClusternessMeasure();
List<Double> points = List.of(1.0, 2.0, 3.0, 4.0, 5.0);
double expected = 0.0;
double delta = 0.0;
double actual = cm.computeClusternessMeasure(points);
assertEquals(expected, actual, delta);
}
In this way, there is no need to check if the code prints the correct output to the console, Junit will take care of it.
Static
From what I see, ClusternessMeasure
holds no state, so its methods could be static. You could have something like:
double result = ClusternessMeasure.compute(points);
Streams
Depending on the version of Java you plan to use, in this part:
public SortedMeasurementPoints(Collection<Double> points) {
List<Double> pointsAsList = new ArrayList<>(points);
Collections.sort(pointsAsList);
this.points = pointsAsList;
}
you could use Streams:
public SortedMeasurementPoints(Collection<Double> points) {
this.points = points.stream().sorted()
.collect(Collectors.toUnmodifiableList());
}
It is also handy that the result will be an unmodifiable list since points
don't need to change.
Design
The class ClusternessMeasure
exposes two public methods:
double computeClusternessMeasure(Collection<Double> points)
double computeClusternessMeasure(SortedMeasurementPoints points)
Both give the same result, but the second requires an instance of SortedMeasurementPoints
. The caller (likely) won't already have an instance of it so it needs to study that class to know wheater he needs it or not. Finally, realizing that passing an unsorted collection is the same as wrapping it in a SortedMeasurementPoints
. For this reason, I think that the second method is an unnecessary burden for the caller, and it can be private.
Input validation
List<Double> buggyList = Arrays.asList(1.0,null);
cm.computeClusternessMeasure(buggyList); // NullPointerException
Explore related questions
See similar questions with these tags.