Simplified the actual problem to focus on comparing/removing performance.
Given a set S with n elements , find the most optimal way to compare each element to all others until a condition is met that decides if and which item to remove. I experience long runtimes when n is high (e.g. n=3000 has a runtime of 2500ms on an intel i5-6500 cpu)
An element is a Tuple object (itv, t) where itv holds an integer interval value (between 0 and 4, inclusive) and t holds an integer total benefit value.
The condition: Given Tuple s and another Tuple s', if itv == itv', then if t <= t' remove s, else remove s'.
A naïve example. S={(1,20),(0,40),(1,35)}
Compare (1,20) with (0,40): itv != itv', continue.
Compare (1,20) with (1,35): itv == itv', t < t', remove (1,20) from S.
Compare (0,40) with (1,34): itv != itv', return S.
Due to Java's constraints on removing elements while iterating, for-loops are wildly inefficient and I currently use a nested foreach-loop inside an Iterator loop. However, I still feel like it underperforms. This thought is mostly because the code looks terrible, in my opinion.
Below is my implementation. It tries to reduce the amount of unnecessary comparisons as much as possible. My code uses an outer Iterator for comparing the main Tuple to the others. In the case that the interval matches and the main Tuple's total benefit is worse, the inner loop is stopped, the main Tuple is removed and we continue with the next Iterator item. If the inner Tuple is worse, however, then the inner Tuple is marked for deletion and will be deleted at the end of the outer loop iteration.
Does anyone see any glaring mistakes, have any comments about performance (expensive calls) or refactoring for code clarity?
public static HashSet<Tuple> compareAndRemove(HashSet<Tuple> set) {
HashSet<Tuple> bestTuples = new HashSet<>(set.size()); //Return set
Collection<Tuple> tupleRemovals = new HashSet<>(); //Tuples to remove this iteration
Iterator<Tuple> s_iter;
do {
set.removeAll(tupleRemovals);
removedElements += tupleRemovals.size();
tupleRemovals = new HashSet<>();
s_iter = set.iterator();
if (s_iter.hasNext()) {
Tuple s = s_iter.next();
if (!s_iter.hasNext()) { //Last element; nothing to compare anymore
bestTuples.add(s);
s_iter.remove();
break;
}
//Compare with other
boolean removeMainTuple = false;
for (Tuple s_prime : set) {
boolean equalIntervals = true;
if (s.equals(s_prime)) continue; //Skip same element
//Check interval
if (s.itv == s_prime.itv) {
//Decide which Tuple to discard
if (s.t <= s_prime.t) {
removeMainTuple = true; //Remove Iterator element
break;
} else {
tupleRemovals.add(s_prime); //Remove foreach element
}
}
}
//Move current item to return set if it was the best
if (!removeMainTuple) {
bestTuples.add(s);
}
s_iter.remove();
}
} while (s_iter.hasNext());
bestTuples.addAll(set);
return bestTuples;
}
-
\$\begingroup\$ This looks like a great first post, good job! \$\endgroup\$IEatBagels– IEatBagels2019年10月02日 18:38:45 +00:00Commented Oct 2, 2019 at 18:38
-
\$\begingroup\$ Thank you! I'm curious if I designed my algorithm wrong, or if I just made a few simple mistakes like using the wrong data structure... \$\endgroup\$Arjen– Arjen2019年10月02日 18:45:17 +00:00Commented Oct 2, 2019 at 18:45
-
\$\begingroup\$ Are you just trying to find the best tuples, or are you trying to mutate the Set in the parameter to remove the ones that aren't the best, or both? (If both, shouldn't the result of the mutation and the Set returned be the same Set?) \$\endgroup\$Jeremy Hunt– Jeremy Hunt2019年10月02日 21:55:08 +00:00Commented Oct 2, 2019 at 21:55
-
\$\begingroup\$ Runtime has precedence over space complexity for this, although I do appreciate a solution that does not have an unnecessarily large space complexity. \$\endgroup\$Arjen– Arjen2019年10月03日 13:40:48 +00:00Commented Oct 3, 2019 at 13:40
1 Answer 1
In the case where you just want to find the best tuples, the code can be much simpler:
- find the
Tuple
with the bestt
for each value ofitv
(sounds like a job forMap
) - put the best bunch of
Tuple
into a Set, and return it
public static Set<Tuple> bestTuples(Set<Tuple> set) {
Map<Integer, Tuple> bestTuplesByItv = new HashMap<Integer, Tuple>();
Tuple bestTupleSoFar;
for (Tuple t: set) {
if ((bestTupleSoFar = bestTuplesByItv.get(t.itv)) == null || t.t > bestTupleSoFar.t) {
bestTuplesByItv.put(t.itv, t);
}
}
Set<Tuple> bestTuples = new HashSet<Tuple>();
bestTuples.addAll(bestTuplesByItv.values());
return bestTuples;
}
This should also be faster - your solution has some nested loops, O(n^2), but the suggestion above just iterates through the entire set
once. My gut tells me that removing things from a Set is slower than just making a new set with the things we want, but I'm not sure about HashSet.
Since the values for the Map
are coming from a HashSet
, all the values are guaranteed to have unique hash value. Since that's the case, we don't need a LinkedHashMap
- a plain HashMap
will work just fine (LinkedHashMap
creates linked lists to deal with hash collisions - but that's unnecessary as we won't have any).
Your code seems to be mutating the Set
that is passed in as a parameter though - if that's desired behaviour, it's a trivial modification from the code above - just clear the values in set
and add the values we want before returning:
public static Set<Tuple> compareAndRemove(Set<Tuple> set) {
Map<Integer, Tuple> bestTuplesByItv = new HashMap<Integer, Tuple>();
Tuple bestTupleSoFar;
for (Tuple t: set) {
if ((bestTupleSoFar = bestTuplesByItv.get(t.itv)) == null || t.t > bestTupleSoFar.t) {
bestTuplesByItv.put(t.itv, t);
}
}
set.clear();
set.addAll(bestTuplesByItv.values());
return set;
}