34

I am learning to use sets. My question is : Sets do not contain duplicates. When we try to insert duplicates, it does not throw any error and automatically removes duplicates. Is it a good practice to check each value before inserting into set whether it exists or not? Or is it OK to do something like the below code? I think Java would be internally doing the check using .contains(value) . What do you think?

What would be the Big O complexity in both the cases considering there are n elements going into the set?

import java.util.HashSet;
import java.util.Set;
public class DuplicateTest {
 public static void main(String[] args) {
 // TODO Auto-generated method stub
 Set<Integer> mySet = new HashSet<Integer>();
 mySet.add(10);
 mySet.add(20);
 mySet.add(30);
 mySet.add(40);
 mySet.add(50);
 mySet.add(50);
 mySet.add(50);
 mySet.add(50);
 mySet.add(50);
 mySet.add(50);
 System.out.println("Contents of the Hash Set :"+mySet);
 }
}
Raedwald
49.2k49 gold badges162 silver badges250 bronze badges
asked Jan 10, 2016 at 23:50
9
  • Because a HashSet is backed by a HashMap, your answer can be found here: stackoverflow.com/a/4553642/4490686 Commented Jan 10, 2016 at 23:55
  • 2
    It doesn't do a contains instead it just won't add an element which is already there, i.e. it doesn't add any overhead to do this. Commented Jan 11, 2016 at 0:00
  • 1
    Just FYI you can't change Big Oh complexity by adding another operation with the same complexity as the one that is already applied. What I mean, is that both for(int x: set) { set.add(x); } and for(int x:set) { set.contains(x); set.add(x); } have the same Big Oh complexity, as long as add and contains have the same complexity. Because O(C*n) == O(n), for any constant number C. Commented Jan 11, 2016 at 0:25
  • It depends on what you want to do with your set. Do you want to find out if sth. is duplicated, then such a check would be necessary, if you only want to remove doublettes, then the set already does everything you want. Commented Jan 11, 2016 at 10:27
  • 1
    @Aron_dc: since add returns a boolean telling you whether the element has been added or was a duplicate, there is never any reason to check before adding. Commented Jan 11, 2016 at 11:13

5 Answers 5

47

As per the docs:

public boolean add(E e)

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

So the add() method already returns you a true or a false. So you don't need to do the additional check.

answered Jan 10, 2016 at 23:56
Sign up to request clarification or add additional context in comments.

Comments

10

Compare with the API documentation of Set.add(E)

The add method checks if the element is already in the Set. If the element is already present, then the new element is not added, and the Set remains unchanged. In most situations, you don't need to check anything.

The complexity of the method depends of the concrete implementation of Set that you are using.

Holger
301k43 gold badges481 silver badges827 bronze badges
answered Jan 10, 2016 at 23:56

Comments

4

Its ok not to check. This is the main advantage over Sets of Lists, as they will automatically filter out duplicates.

HashSet has constant time performance (http://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html)

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets

answered Jan 10, 2016 at 23:52

1 Comment

@YassinHajaj - have linked to APIi and provided the relevant section.
2

The add function returns a boolean which you can check to determine if the item was already in the Set. This is of course based on your needs and isn't a best practice. Its good to know that it will not remove an item that is already there so it can't be depended on to update the existing value with new information if you are defining equals based on surrogate keys from your database. This is opposite the way Maps work as a map will return any existing value and replace it with the new value.

answered Jan 10, 2016 at 23:59

Comments

1

Here are answers to your questions:

When we try to insert duplicates, it does not throw any error and automatically removes duplicates.

Your understanding is not correct. The call to Set.add() will not add a new item if it is already in the set; this statement applies to all implementations of Set, including HashSet and TreeSet.

Is it a good practice to check each value before inserting into set whether it exists or not? or is it okay to do something like the below code? I think java would be internally doing the check using .contains(value) . What do you think?

Because your understanding was incorrect from the start, then you do not need to check each value before inserting into the set to see if it already exists. Yes, internally, it is doing something like contains().

What would be the Big Oh complexity in both the cases considering there are "n" elements going into the set?

For HashSet, the time complexity is O(1) for each add(). For TreeSet() -- which you didn't use -- the time complexity is O(lg N) for each add().

answered Jan 11, 2016 at 0:16

1 Comment

A HashSet can have a complexity of O(n) if the hashing algorithm is not optimal

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.