I am learning to use sets. My question is : Sets do not contain duplicates. When we try to insert duplicates, it does not throw any error and automatically removes duplicates. Is it a good practice to check each value before inserting into set whether it exists or not? Or is it OK to do something like the below code? I think Java would be internally doing the check using .contains(value) . What do you think?
What would be the Big O complexity in both the cases considering there are n elements going into the set?
import java.util.HashSet;
import java.util.Set;
public class DuplicateTest {
public static void main(String[] args) {
// TODO Auto-generated method stub
Set<Integer> mySet = new HashSet<Integer>();
mySet.add(10);
mySet.add(20);
mySet.add(30);
mySet.add(40);
mySet.add(50);
mySet.add(50);
mySet.add(50);
mySet.add(50);
mySet.add(50);
mySet.add(50);
System.out.println("Contents of the Hash Set :"+mySet);
}
}
5 Answers 5
As per the docs:
public boolean add(E e)Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.
So the add() method already returns you a true or a false. So you don't need to do the additional check.
Comments
Compare with the API documentation of Set.add(E)
The add method checks if the element is already in the Set. If the element is already present, then the new element is not added, and the Set remains unchanged. In most situations, you don't need to check anything.
The complexity of the method depends of the concrete implementation of Set that you are using.
Comments
Its ok not to check. This is the main advantage over Sets of Lists, as they will automatically filter out duplicates.
HashSet has constant time performance (http://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html)
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets
1 Comment
The add function returns a boolean which you can check to determine if the item was already in the Set. This is of course based on your needs and isn't a best practice. Its good to know that it will not remove an item that is already there so it can't be depended on to update the existing value with new information if you are defining equals based on surrogate keys from your database. This is opposite the way Maps work as a map will return any existing value and replace it with the new value.
Comments
Here are answers to your questions:
When we try to insert duplicates, it does not throw any error and automatically removes duplicates.
Your understanding is not correct. The call to Set.add() will not add a new item if it is already in the set; this statement applies to all implementations of Set, including HashSet and TreeSet.
Is it a good practice to check each value before inserting into set whether it exists or not? or is it okay to do something like the below code? I think java would be internally doing the check using .contains(value) . What do you think?
Because your understanding was incorrect from the start, then you do not need to check each value before inserting into the set to see if it already exists. Yes, internally, it is doing something like contains().
What would be the Big Oh complexity in both the cases considering there are "n" elements going into the set?
For HashSet, the time complexity is O(1) for each add(). For TreeSet() -- which you didn't use -- the time complexity is O(lg N) for each add().
1 Comment
O(n) if the hashing algorithm is not optimal
HashSetis backed by aHashMap, your answer can be found here: stackoverflow.com/a/4553642/4490686containsinstead it just won't add an element which is already there, i.e. it doesn't add any overhead to do this.for(int x: set) { set.add(x); }andfor(int x:set) { set.contains(x); set.add(x); }have the same Big Oh complexity, as long asaddandcontainshave the same complexity. Because O(C*n) == O(n), for any constant number C.addreturns abooleantelling you whether the element has been added or was a duplicate, there is never any reason to check before adding.