I am trying to make general class to find mode. I am looking for some general feedback on how I can improve the structure and efficiency of my code.
package analysis.statistic;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
public class Mode {
/**
* Return objects which appears most often.
*
* @return Map<object,countAppears> most appears objects.
*/
@SuppressWarnings("unchecked")
public static <T> Map<T, Integer> mode(T... objects) {
Objects.requireNonNull(objects, "objects must not be null");
if (objects.length == 0) {
return new HashMap<>();
}
Arrays.sort(objects);
ModeCalc<T> calc = new ModeCalc<T>(objects[0]);
for (T t : objects) {
calc.checkMaxAppears(t);
}
return calc.getMode();
}
/**
* Work like {@link #mode(Object...)}.
*/
@SuppressWarnings("unchecked")
public static <T> Map<T, Integer> mode(Comparator<? super T> c, T... objects) {
Objects.requireNonNull(objects, "objects must not be null");
if (objects.length == 0) {
return new HashMap<>();
}
Arrays.sort(objects, c);
ModeCalc<T> calc = new ModeCalc<T>(objects[0]);
for (T t : objects) {
calc.checkMaxAppears(t);
}
return calc.getMode();
}
/**
* Work like {@link #mode(Object...)}.
*/
public static Map<Integer, Integer> mode(int... numbers) {
Objects.requireNonNull(numbers, "numbers must not be null");
if (numbers.length == 0) {
return new HashMap<>();
}
Arrays.sort(numbers);
ModeCalc<Integer> calc = new ModeCalc<Integer>(numbers[0]);
for (int t : numbers) {
calc.checkMaxAppears(t);
}
return calc.getMode();
}
/**
* Work like {@link #mode(Object...)}.
*/
public static Map<Long, Integer> mode(long... numbers) {
Objects.requireNonNull(numbers, "numbers must not be null");
if (numbers.length == 0) {
return new HashMap<>();
}
Arrays.sort(numbers);
ModeCalc<Long> calc = new ModeCalc<>(numbers[0]);
for (long t : numbers) {
calc.checkMaxAppears(t);
}
return calc.getMode();
}
/**
* Work like {@link #mode(Object...)}.
*/
public static Map<Double, Integer> mode(double... numbers) {
Objects.requireNonNull(numbers, "numbers must not be null");
if (numbers.length == 0) {
return new HashMap<>();
}
Arrays.sort(numbers);
ModeCalc<Double> calc = new ModeCalc<Double>(numbers[0]);
for (double t : numbers) {
calc.checkMaxAppears(t);
}
return calc.getMode();
}
/**
* Work like {@link #mode(Object...)}.
*/
public static Map<Float, Integer> mode(float... numbers) {
Objects.requireNonNull(numbers, "numbers must not be null");
if (numbers.length == 0) {
return new HashMap<>();
}
Arrays.sort(numbers);
ModeCalc<Float> modeCalc = new ModeCalc<Float>(numbers[0]);
for (float t : numbers) {
modeCalc.checkMaxAppears(t);
}
return modeCalc.getMode();
}
/**
* Work like {@link #mode(Object...)}.
*/
public static Map<String, Integer> mode(String... strings) {
Objects.requireNonNull(strings, "strings must not be null");
if (strings.length == 0) {
return new HashMap<>();
}
Arrays.sort(strings);
ModeCalc<String> state = new ModeCalc<>(strings[0]);
for (String t : strings) {
state.checkMaxAppears(t);
}
return state.getMode();
}
private static class ModeCalc<T> {
private int nTimesLastObjectAppears = 0;
private int maxTimeObjectAppears = 0;
private T prevObject;
Map<T, Integer> mostAppearsObjects;
public ModeCalc(T firstObjectInArray) {
prevObject = firstObjectInArray;
mostAppearsObjects = new HashMap<>();
}
void checkMaxAppears(T currentObject) {
if (currentObject.equals(prevObject)) {
nTimesLastObjectAppears += 1;
} else {
addObjectToMap();
prevObject = currentObject;
nTimesLastObjectAppears = 1;
}
}
void addObjectToMap() {
if (nTimesLastObjectAppears > maxTimeObjectAppears) {
mostAppearsObjects.clear();
mostAppearsObjects.put(prevObject, nTimesLastObjectAppears);
maxTimeObjectAppears = nTimesLastObjectAppears;
} else if (nTimesLastObjectAppears == maxTimeObjectAppears) {
mostAppearsObjects.put(prevObject, nTimesLastObjectAppears);
}
}
Map<T, Integer> getMode() {
// to check appears of last object of loop and add it to map
addObjectToMap();
return mostAppearsObjects;
}
}
}
3 Answers 3
Result class
Since you are returning all the possible modes ... and all of them have the same number of occurrences then you don't need to return a Map since the integer values is the same for all of them. Perhaps you can create you own custom Result class that contains a unmodifiable Set of all the mode objects and the common mode frequency:
public class Mode {
// ...
public static class Result<T> {
public final Set<T> values;
public final int frequency;
private Result(final Set<T> values, final int frequency) {
this.values = values;
this.frequency = frequency;
}
}
// ...
}
Depending on what style you want to adopt you might want to keep those fields private and provide accessors to access them. The constructor is kept private as Mode
's code will be resposible to pass appropriate and valid values for those to fields (e.g. the values set must not contain a null and not to be modifiable, the frequency cannot less than 1...).
Support for comparables and non-comparables:
Bug alert: as per Arrays.sort(Object[]) javadoc map(Object ...) will fail if the user passes objects that do not implement Comparable or they are not mutually comparable. – Valentin Ruano 8 hours ago
You can change the signature so that to enforce T to be Comparable to fix that. Then map(T ... o) can delegate on map(Comparator, T ... o)by providing a trivial comparator that uses T's own compareTo. Perhaps Comparator class has some static to create it (Comparator.naturalOrder()
).
public static Mode {
//...
public static final <T> Result<T> modes(final Comparator<? super T> cmp, final T ... values) {
final List<T> sortedValues = Stream.of(values)
.sorted(cmp)
.collect(Collectors.toList());
final ModeCalc<T> modeCalc = ...
for (final T value : sortedValues) {
//...
}
//...
return new Result<>(Collections.unmodifiableSet(modeValues), maxFrequency);
}
public static final <T extends Comparable<? super T>> Result<T> modes(final T ... values) {
return modes(Comparator.naturalOrder(), values);
}
}
Support for primitives
In the case of primitives you might get some performance gain by providing specialized code for each primitive.... however currently your code ends up
wrapping those primitive values in the corresponding wrapper class (e.g. Long for long) and so in that case you can simply delegate on the general solution for Object
s:
public static final Result<Double> modes(final double ... values) {
return modes(DoubleStream.of(values).boxed().toArray(Double[]::new));
}
Your code does a relatively simple task in a horribly complicated manner. It accomplishes this by two means: First, the logic itself is slightly more complicated than it needs to be. But the main problem is the code's design, which is so obscure that the actual purpose of the methods, fields, variables, classes etc. is almost incomprehensible without going through every line of code and seeing what exactly it does. The names of your methods do not help in this regard, either. I will first try to address the second issue, that is, make the code clearer and more managable, and then review the algorithm itself.
The Code
The main brainsore is the class
ModeCalc<T>
. Let's look at its implementation:- The method name
checkMaxAppears
is confusing, because, apart from the fact that it is grammatically incoherent, the method itself does not check whether the passed item appears more often than other items. It just increases the internal count for that item and, if it's a new item, evaluates the previous item's count. So a more appropriate method name would be simplyprocessObject
. - The method name
addObjectToMap
is a flat-out lie. An object is only added to the map if it doesn't appear less often then previously encountered objects. Also, what object is added to the map? To what map? A better name, i.e. a name that describes what the method actually does, would be, for example,evaluateCountOfPreviousObject
. - The fields
nTimesLastObjectAppears
andprevObject
are directly related to each other. Their relationship can be described as a key-value-pair, with the previous object being the key and the number of times it has appeared until now being the value. The code's design does not reflect this at all, these two values being stored in separate fields that are completely independent of each other. A design that would instead reflect this relationship could be storing the two values in a singleMap.Entry<T, Integer>
. There are two publicly available implementations ofMap.Entry
:AbstractMap.SimpleEntry
andAbstractMap.SimpleImmutableEntry
. In this case, we need aSimpleEntry
, because the number of appearances of an item increases with every encounter of that item. - Next, about the field
mostAppearsObjects
. As Valentin Ruano has already pointed out in his answer, it is a complete misdesign to make this field aMap<T, Integer>
, because its actual intention is to map multipleT
items to one common integer, and not to map everyT
item to a separate value. What's even worse is that you store this common integer value in yet another field, namelymaxTimeObjectAppears
. So how to clean this up? First, makemostAppearsObjects
aSet<T>
instead of aMap<T, Integer>
, since you already have a field where you store the integer value representing the number of appearances. And then, apply the logic described in step 3., that is, replace the fieldsmostAppearsObjects
andmaxTimeObjectAppears
with a single field of typeMap.Entry<Set<T>, Integer>
(which is, in essence, what Valentin Ruano suggested). In this case, theMap.Entry
can even be aSimpleImmutableEntry
, because the value for a givenSet
of already countedT
items is not going to change. After this redesign, it is necessary to change the return type ofgetMode()
toMap.Entry<Set<T>, Integer>
. - Passing a
T
item to the constructor is unnecessary and creates ambiguity, because it might appear as if the passed item is already counted. Instead, the field that is used to store the previous object can be initialized to a default value, which would have to benull
since we do not know what typeT
represents andnull
is valid for all types. Note that this logic works even if the first item passed toprocessObject(T)
isnull
. - Speaking about
null
values, it is better to useObjects.equals(Object, Object)
rather thanObject.equals(Object)
if you are dealing with potentialnull
values, because the latter will throw aNullPointerException
if the firstObject
reference isnull
. - About the access modifiers in this class: Making the class itself
private
and the constructorpublic
is an oxymoron. I presume that you did this because you are not aware that an inner class' members are always accessible from the outer class, evenprivate
members. So even if you declared theModeCalc
constructorprivate
, you would still be able to access it fromMode
.
So much for the class
ModeCalc<T>
. Let's take a look at the redesigned version:private static class ModeCalc<T> { private Entry<T, Integer> previousObjectAndItsFrequency; private Entry<Set<T>, Integer> mostFrequentObjectsAndTheirFrequency; ModeCalc() { previousObjectAndItsFrequency = new SimpleEntry<>(null, 0); mostFrequentObjectsAndTheirFrequency = new SimpleImmutableEntry<>(new HashSet<>(), 0); } void processObject(T currentObject) { if (Objects.equals(currentObject, previousObjectAndItsFrequency.getKey())) { previousObjectAndItsFrequency.setValue(previousObjectAndItsFrequency.getValue() + 1); } else { evaluateCountOfPreviousObject(); previousObjectAndItsFrequency = new SimpleEntry<>(currentObject, 1); } } void evaluateCountOfPreviousObject() { if (previousObjectAndItsFrequency.getValue() > mostFrequentObjectsAndTheirFrequency.getValue()) { Set<T> mostFrequentObjects = new HashSet<>(); mostFrequentObjects.add(previousObjectAndItsFrequency.getKey()); mostFrequentObjectsAndTheirFrequency = new SimpleImmutableEntry<>(mostFrequentObjects, previousObjectAndItsFrequency.getValue()); } else if (previousObjectAndItsFrequency.getValue().equals(mostFrequentObjectsAndTheirFrequency.getValue())) { mostFrequentObjectsAndTheirFrequency.getKey().add(previousObjectAndItsFrequency.getKey()); } } Entry<Set<T>, Integer> getMode() { evaluateCountOfPreviousObject(); return mostFrequentObjectsAndTheirFrequency; } }
It is unfortunately a bit more verbose, but it is also much more compact, because the
ModeCalc
's state (i.e. its fields) is organized so that related values are stored together, and no state is stored multiple times. By the way, you should be aware thatModeCalc
's responsibility is merely counting the maximum number of consecutive appearances of identical elements, and that you use this feature from outsideModeCalc
in order to calculate the maximum number of total appearances of identical elements by sorting the elements first. So be sure not to mix up the resposibilities of your classes and methods, even if, in this case, it is just aprivate
inner class that is only used for one purpose.- The method name
- Excessive code duplication with the
mode
methods, culminating in the most pointless duplication of all – a method that can be omitted entirely: The methodmode(String...)
is completely redundant, because you already have<T> mode(T...)
which can accept an array ofString
s. You have already been shown ways to reduce code duplication here, which can be applied almost one-to-one to this case. However, you chose not to try them but instead copy the same code over and over again just to replace a single type, so I won't go into that and simply assume that you have a preference for excessive code duplication and that this is a conscious design decision. - I think you can actually use
@SafeVarargs
with the first twomode
methods because the methods don't depend on the runtime type of the array, but I'm not 100% sure, maybe someone else can come along and confirm that. See also here.
OK, so much for the code. Now, for ...
The Algorithm
Your algorithm works, but it requires a condition that is not necessary for the applicability of the task of finding the maximum frequency of elements in an array. This condition is, of course, that the elements must be comparable, either through their natural ordering, or through a Comparator
. What's more, this ordering must be consistent with equals, otherwise, the method will be out of control.
This greatly reduces the usability of your code, even though the task it carries out is a relatively simple one. But you could simply iterate through the array, counting the frequencies of all elements along the way, and then determine the highest frequency and get all elements with that frequency. That way, you won't have to worry about the elements being comparable, and because your code doesn't do more than it needs to do, it will probably also be faster.
Update
Another problem just occurred to me: By sorting it, you are modifying the array passed to the mode
methods (in case it is passed as an array and not as varargs), but you don't state this in the method's documentation. This is inconsiderate. You should either specify this in the documentation, or create a copy and sort this copy, or simply require that the array passed to the method be already sorted (like the binarySearch
methods in the Arrays
class).
Also I fail to note that for the sake of calculating the mode there is no need to assume that the elements are comparable. With a properly defined equal
and hashCode
functions you can obtain the same results with a complexity O(n) where n is the number of input elements assuming that equals
and hashCode
are themselves O(1). You current solution is O(n log n) at best.
public final class Mode {
// prevent instantiation of this helper class.
private Mode() {}
public static class Result<T> {
// see code above in my first answer.
}
public static <T> Result<T> mode(T ... values) {
public Map<T, Integer> frequencies = new HashMap<>();
public Set<T> modes = new HashSet<>();
int modeFrequency = 0;
for (final T value : values) {
final Integer before = frequencies.getOrDefault(value, 0);
final Integer after = before + 1;
frequencies.put(value, after);
if (after == modeFrequency) {
modes.add(value);
} else if (after > modeFrequency) {
modes.clear();
modes.add(value);
modeFrequency = after;
}
}
return new Result<>(Collections.unmodifiableSet(modes), modeFrequency);
}
//...
}
The primitive methods can delegate on this one as described above in my previous answer.
Explore related questions
See similar questions with these tags.
Map
since the integer values is the same for all of them. Perhaps you can create you own customResult
class that contains a unmodifiableSet<T>
of all the mode objects and the common mode frequency. Alternatively Third-party common libraries containTuple
orPair
to return two values. \$\endgroup\$Arrays.sort(Object[])
javadoc map(Object ...) will fail if the user passes objects that do not implementComparable
or they are not mutually comparable. \$\endgroup\$T
to beComparable<? super T>
to fix that. Thenmap(T ... o)
can delegate onmap(Comparator<? super T>, T ... o)
by providing a trivial comparator that uses T's owncompareTo
. PerhapsComparator
class has some static to create it. \$\endgroup\$