Should you check for a duplicate before inserting into a set

Question 1

I am learning to use sets. My question is : Sets do not contain duplicates. When we try to insert duplicates, it does not throw any error and automatically removes duplicates. Is it a good practice to check each value before inserting into set whether it exists or not? Or is it OK to do something like the below code? I think Java would be internally doing the check using .contains(value) . What do you think?

What would be the Big O complexity in both the cases considering there are n elements going into the set?

import java.util.HashSet;
import java.util.Set;
public class DuplicateTest {
 public static void main(String[] args) {
 // TODO Auto-generated method stub
 Set<Integer> mySet = new HashSet<Integer>();
 mySet.add(10);
 mySet.add(20);
 mySet.add(30);
 mySet.add(40);
 mySet.add(50);
 mySet.add(50);
 mySet.add(50);
 mySet.add(50);
 mySet.add(50);
 mySet.add(50);
 System.out.println("Contents of the Hash Set :"+mySet);
 }
}

Question 2

Because a HashSet is backed by a HashMap, your answer can be found here: stackoverflow.com/a/4553642/4490686

Question 3

It doesn't do a contains instead it just won't add an element which is already there, i.e. it doesn't add any overhead to do this.

Question 4

Just FYI you can't change Big Oh complexity by adding another operation with the same complexity as the one that is already applied. What I mean, is that both for(int x: set) { set.add(x); } and for(int x:set) { set.contains(x); set.add(x); } have the same Big Oh complexity, as long as add and contains have the same complexity. Because O(C*n) == O(n), for any constant number C.

Question 5

It depends on what you want to do with your set. Do you want to find out if sth. is duplicated, then such a check would be necessary, if you only want to remove doublettes, then the set already does everything you want.

Question 6

@Aron_dc: since add returns a boolean telling you whether the element has been added or was a duplicate, there is never any reason to check before adding.

Question 7

As per the docs:

public boolean add(E e)

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

So the add() method already returns you a true or a false. So you don't need to do the additional check.

Question 8

Compare with the API documentation of Set.add(E)

The add method checks if the element is already in the Set. If the element is already present, then the new element is not added, and the Set remains unchanged. In most situations, you don't need to check anything.

The complexity of the method depends of the concrete implementation of Set that you are using.

Question 9

Its ok not to check. This is the main advantage over Sets of Lists, as they will automatically filter out duplicates.

HashSet has constant time performance (http://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html)

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets

Question 10

@YassinHajaj - have linked to APIi and provided the relevant section.

Question 11

The add function returns a boolean which you can check to determine if the item was already in the Set. This is of course based on your needs and isn't a best practice. Its good to know that it will not remove an item that is already there so it can't be depended on to update the existing value with new information if you are defining equals based on surrogate keys from your database. This is opposite the way Maps work as a map will return any existing value and replace it with the new value.

Question 12

Here are answers to your questions:

When we try to insert duplicates, it does not throw any error and automatically removes duplicates.

Your understanding is not correct. The call to Set.add() will not add a new item if it is already in the set; this statement applies to all implementations of Set, including HashSet and TreeSet.

Is it a good practice to check each value before inserting into set whether it exists or not? or is it okay to do something like the below code? I think java would be internally doing the check using .contains(value) . What do you think?

Because your understanding was incorrect from the start, then you do not need to check each value before inserting into the set to see if it already exists. Yes, internally, it is doing something like contains().

What would be the Big Oh complexity in both the cases considering there are "n" elements going into the set?

For HashSet, the time complexity is O(1) for each add(). For TreeSet() -- which you didn't use -- the time complexity is O(lg N) for each add().

Question 13

A HashSet can have a complexity of O(n) if the hashing algorithm is not optimal

Atri 5,8835 gold badges32 silver badges40 bronze badges · Accepted Answer · 2016-01-10 23:56:02Z

As per the docs:

public boolean add(E e)

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

So the add() method already returns you a true or a false. So you don't need to do the additional check.

CollectivesTM on Stack Overflow

Should you check for a duplicate before inserting into a set

5 Answers 5

Comments

Comments

1 Comment

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

5 Answers 5

Comments

Comments

1 Comment

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related