I've written a class for a project of mine whose responsibility is to perform basic statistic calculations like mean, median and mode. The aim is for it to be a helper class, so it should not be instantiable. I later want to extend it to include weighted mean and range/interquartile range as well as variance.
Here is the code:
import java.util.Arrays;
import java.util.HashMap;
public class BasicStatistics {
//Prevent instantiation...
//throws UnsupportedOperationException to defend against reflection
private BasicStatistics() throws UnsupportedOperationException{
throw new UnsupportedOperationException();
}
public static class Mode {
private static int mode;
private static HashMap<Integer, Integer> hm = new HashMap<Integer, Integer>();
private Mode() throws UnsupportedOperationException{
throw new UnsupportedOperationException();
}
public static int calculateMode(int ints[]){
int max = 0;
int temp = 0;
for (int i = 0; i < ints.length; i++){
if (hm.get(ints[i]) != null){
int count = hm.get(ints[i]);
count += 1;
hm.put(ints[i], count);
if (count > max){
max = count;
temp = ints[i];
}
} else {
hm.put(ints[i], 1);
}
}
if (temp == 0){
mode = ints[0];
} else {
mode = temp;
}
return mode;
}
}
public static class Mean {
private static float mean;
private Mean() throws UnsupportedOperationException{
throw new UnsupportedOperationException();
}
public static float calculateMean(int ints[]){
for (int element : ints){
mean += element;
}
return mean / ints.length;
}
}
public static class Median {
private static float median;
private Median() throws UnsupportedOperationException{
throw new UnsupportedOperationException();
}
public static float calculateMedian(int ints[]){
Arrays.sort(ints);
if (ints.length % 2 == 0){
median = (float) (ints[ints.length / 2] + ints[(ints.length / 2) - 1]);
median /= 2;
} else {
median = (float) ints[ints.length / 2];
}
return median;
}
}
}
-
1\$\begingroup\$ Welcome to Code Review! I hope you get some great answers. \$\endgroup\$Phrancis– Phrancis2016年08月20日 03:10:16 +00:00Commented Aug 20, 2016 at 3:10
-
1\$\begingroup\$ Please see "What you can and cannot do after receiving answers. If you're looking for more feedback, then what you want to do is post a new question with the revised code. \$\endgroup\$mdfst13– mdfst132016年08月21日 13:12:35 +00:00Commented Aug 21, 2016 at 13:12
2 Answers 2
Don't overdo Single Responsibility Principle
You have four classes, three of which calculate the mean, median, and mode respectively. The thing is though that their responsibility doesn't match their names. They have responsibility over the data, which they then express as the statistics. But if all the Mode
is responsible for is the mode, then it shouldn't know about the data. All it should know about is the Mode
itself.
Consider instead making one class with multiple methods. Then the class can have the data as a member.
Don't overuse static
private static int mode; private static HashMap<Integer, Integer> hm = new HashMap<Integer, Integer>();
This means that every single instance of this class has the same mode
and hm
.
private int mode;
private Map<Integer, Integer> counts = new HashMap<>();
I also changed HashMap
to Map
on the left. Usually you make types the interface rather than the implementation. That ensures that you can change the implementation easily later if desired.
I changed hm
to counts
as being more descriptive.
The <>
notation will only work in newer versions of Java but saves typing when available.
If you need access to fields, then you should instantiate the class and access the fields through normal methods. E.g.
StatisticsCalculator statCalc = new StatisticsCalculator(data);
statCalc.calculate();
Set<Integer> modes = getModes();
Now you can create a second object with different data without losing the first. Or worse, merging with it.
The presumption here is that calculate
would calculate all the stats and you could just access them afterwards. If you always use them together, this is more efficient. You could also calculate them on a just-in-time basis.
The Map
is unnecessary
First, you don't need to make the Map
a class field. It would be fine as a local variable.
Second, since you sort the data to get the median anyway, you might as well use the sorted data to get the mode as well. With sorted data, you can simply track the mode of the data that you've seen.
private Set<Integer> calculateMode() {
Set<Integer> modes = new HashSet<>();
int largestCount = 0;
int count = 0;
int previous = sortedData[0];
for (int datum : sortedData) {
if (previous == datum) {
count++
} else {
count = 1;
}
if (count > largestCount) {
largestCount = count;
modes.clear();
modes.add(datum);
} else if (count == largestCount) {
modes.add(datum);
}
}
return modes;
}
This version returns a Set
, since there is no guarantee that there will be a unique mode.
This method is private
because it requires sorted data. A public
version could check if the data was sorted prior to calling the private
version.
I renamed temp
to previous
as being more descriptive.
Because sorting clumps all the like values together, we don't need to store anything but the current and largest counts.
Saying count++
is more idiomatic than count += 1
.
I don't verify that there is at least one element in the data. You might want to throw
an exception in that case. As is, both your original version and this version would crash, as they attempt to access the first element. It's possible that the correct place to do that validation is outside this method.
What if there's a 0 in the data?
if (temp == 0){ mode = ints[0];
If the correct answer is a 0 that is not in the first position, this will return the wrong answer.
Consider making a calculateSum
method
If you had a calculateSum
method, you could make calculateMean
a one liner. And report the sum of the data if you wanted.
-
\$\begingroup\$ Thanks for your detailed response. I've put my new code into the OP. Do you have any suggestions on how I could make this class extensible so it could work on number types other than integers, for example floats or BigDecimals? \$\endgroup\$PythonNewb– PythonNewb2016年08月21日 11:26:43 +00:00Commented Aug 21, 2016 at 11:26
- Throwing an
UnsupportedOperationException
looks like overdoing it. In normal code, your classes cannot be instantiated, and whoever does it by reflection doesn't gain anything from it. Therefore, there is no point in protecting against it. - Prefer
double
overfloat
, since it is more precise. - Don't store intermediate results from integer operations in a floating-point variables (
Mean.mean
). This will lead to wrong results as soon as the sum of the ints gets over 17 millions. - Is there a chance that the given
int
arrays have length 0? Then, protect againstArrayIndexOutOfBoundsException
(incalculateMedian
).