Remove elements compared with other elements in a Set based on a condition

Question 1

Simplified the actual problem to focus on comparing/removing performance.

Given a set S with n elements , find the most optimal way to compare each element to all others until a condition is met that decides if and which item to remove. I experience long runtimes when n is high (e.g. n=3000 has a runtime of 2500ms on an intel i5-6500 cpu)

An element is a Tuple object (itv, t) where itv holds an integer interval value (between 0 and 4, inclusive) and t holds an integer total benefit value.

The condition: Given Tuple s and another Tuple s', if itv == itv', then if t <= t' remove s, else remove s'.

A naïve example. S={(1,20),(0,40),(1,35)}

Compare (1,20) with (0,40): itv != itv', continue.
Compare (1,20) with (1,35): itv == itv', t < t', remove (1,20) from S.
Compare (0,40) with (1,34): itv != itv', return S.

Due to Java's constraints on removing elements while iterating, for-loops are wildly inefficient and I currently use a nested foreach-loop inside an Iterator loop. However, I still feel like it underperforms. This thought is mostly because the code looks terrible, in my opinion.

Below is my implementation. It tries to reduce the amount of unnecessary comparisons as much as possible. My code uses an outer Iterator for comparing the main Tuple to the others. In the case that the interval matches and the main Tuple's total benefit is worse, the inner loop is stopped, the main Tuple is removed and we continue with the next Iterator item. If the inner Tuple is worse, however, then the inner Tuple is marked for deletion and will be deleted at the end of the outer loop iteration.

Does anyone see any glaring mistakes, have any comments about performance (expensive calls) or refactoring for code clarity?

public static HashSet<Tuple> compareAndRemove(HashSet<Tuple> set) {
 HashSet<Tuple> bestTuples = new HashSet<>(set.size()); //Return set
 Collection<Tuple> tupleRemovals = new HashSet<>(); //Tuples to remove this iteration
 Iterator<Tuple> s_iter;
 do {
 set.removeAll(tupleRemovals);
 removedElements += tupleRemovals.size();
 tupleRemovals = new HashSet<>();
 s_iter = set.iterator();
 if (s_iter.hasNext()) {
 Tuple s = s_iter.next();
 if (!s_iter.hasNext()) { //Last element; nothing to compare anymore
 bestTuples.add(s);
 s_iter.remove();
 break;
 }
 //Compare with other
 boolean removeMainTuple = false;
 for (Tuple s_prime : set) {
 boolean equalIntervals = true;
 if (s.equals(s_prime)) continue; //Skip same element
 //Check interval
 if (s.itv == s_prime.itv) {
 //Decide which Tuple to discard
 if (s.t <= s_prime.t) {
 removeMainTuple = true; //Remove Iterator element
 break;
 } else {
 tupleRemovals.add(s_prime); //Remove foreach element
 }
 }
 }
 //Move current item to return set if it was the best
 if (!removeMainTuple) {
 bestTuples.add(s);
 }
 s_iter.remove();
 }
 } while (s_iter.hasNext());
 bestTuples.addAll(set);
 return bestTuples;
}

Question 2

This looks like a great first post, good job!

Question 3

Thank you! I'm curious if I designed my algorithm wrong, or if I just made a few simple mistakes like using the wrong data structure...

Question 4

Are you just trying to find the best tuples, or are you trying to mutate the Set in the parameter to remove the ones that aren't the best, or both? (If both, shouldn't the result of the mutation and the Set returned be the same Set?)

Question 5

Runtime has precedence over space complexity for this, although I do appreciate a solution that does not have an unnecessarily large space complexity.

Question 6

In the case where you just want to find the best tuples, the code can be much simpler:

find the Tuple with the best t for each value of itv (sounds like a job for Map)
put the best bunch of Tuple into a Set, and return it

public static Set<Tuple> bestTuples(Set<Tuple> set) {
 Map<Integer, Tuple> bestTuplesByItv = new HashMap<Integer, Tuple>();
 Tuple bestTupleSoFar;
 for (Tuple t: set) {
 if ((bestTupleSoFar = bestTuplesByItv.get(t.itv)) == null || t.t > bestTupleSoFar.t) {
 bestTuplesByItv.put(t.itv, t);
 }
 }
 Set<Tuple> bestTuples = new HashSet<Tuple>();
 bestTuples.addAll(bestTuplesByItv.values());
 return bestTuples;
}

This should also be faster - your solution has some nested loops, O(n^2), but the suggestion above just iterates through the entire set once. My gut tells me that removing things from a Set is slower than just making a new set with the things we want, but I'm not sure about HashSet.

Since the values for the Map are coming from a HashSet, all the values are guaranteed to have unique hash value. Since that's the case, we don't need a LinkedHashMap - a plain HashMap will work just fine (LinkedHashMap creates linked lists to deal with hash collisions - but that's unnecessary as we won't have any).

Your code seems to be mutating the Set that is passed in as a parameter though - if that's desired behaviour, it's a trivial modification from the code above - just clear the values in set and add the values we want before returning:

public static Set<Tuple> compareAndRemove(Set<Tuple> set) {
 Map<Integer, Tuple> bestTuplesByItv = new HashMap<Integer, Tuple>();
 Tuple bestTupleSoFar;
 for (Tuple t: set) {
 if ((bestTupleSoFar = bestTuplesByItv.get(t.itv)) == null || t.t > bestTupleSoFar.t) {
 bestTuplesByItv.put(t.itv, t);
 }
 }
 set.clear();
 set.addAll(bestTuplesByItv.values());
 return set;
}

Jeremy Hunt Jeremy Hunt 8434 silver badges9 bronze badges · Accepted Answer · 2019-10-02 23:18:43Z

In the case where you just want to find the best tuples, the code can be much simpler:

find the Tuple with the best t for each value of itv (sounds like a job for Map)
put the best bunch of Tuple into a Set, and return it

public static Set<Tuple> bestTuples(Set<Tuple> set) {
 Map<Integer, Tuple> bestTuplesByItv = new HashMap<Integer, Tuple>();
 Tuple bestTupleSoFar;
 for (Tuple t: set) {
 if ((bestTupleSoFar = bestTuplesByItv.get(t.itv)) == null || t.t > bestTupleSoFar.t) {
 bestTuplesByItv.put(t.itv, t);
 }
 }
 Set<Tuple> bestTuples = new HashSet<Tuple>();
 bestTuples.addAll(bestTuplesByItv.values());
 return bestTuples;
}

This should also be faster - your solution has some nested loops, O(n^2), but the suggestion above just iterates through the entire set once. My gut tells me that removing things from a Set is slower than just making a new set with the things we want, but I'm not sure about HashSet.

Since the values for the Map are coming from a HashSet, all the values are guaranteed to have unique hash value. Since that's the case, we don't need a LinkedHashMap - a plain HashMap will work just fine (LinkedHashMap creates linked lists to deal with hash collisions - but that's unnecessary as we won't have any).

Your code seems to be mutating the Set that is passed in as a parameter though - if that's desired behaviour, it's a trivial modification from the code above - just clear the values in set and add the values we want before returning:

public static Set<Tuple> compareAndRemove(Set<Tuple> set) {
 Map<Integer, Tuple> bestTuplesByItv = new HashMap<Integer, Tuple>();
 Tuple bestTupleSoFar;
 for (Tuple t: set) {
 if ((bestTupleSoFar = bestTuplesByItv.get(t.itv)) == null || t.t > bestTupleSoFar.t) {
 bestTuplesByItv.put(t.itv, t);
 }
 }
 set.clear();
 set.addAll(bestTuplesByItv.values());
 return set;
}

Stack Exchange Network

Remove elements compared with other elements in a Set based on a condition

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Remove elements compared with other elements in a Set based on a condition

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions