Java - Removing duplicates in an ArrayList

Question 1

I'm working on a program that uses an ArrayList to store Strings. The program prompts the user with a menu and allows the user to choose an operation to perform. Such operations are adding Strings to the List, printing the entries etc. What I want to be able to do is create a method called removeDuplicates(). This method will search the ArrayList and remove any duplicated values. I want to leave one instance of the duplicated value(s) within the list. I also want this method to return the total number of duplicates removed.

I've been trying to use nested loops to accomplish this but I've been running into trouble because when entries get deleted, the indexing of the ArrayList gets altered and things don't work as they should. I know conceptually what I need to do but I'm having trouble implementing this idea in code.

Here is some pseudo code:

start with first entry; check each subsequent entry in the list and see if it matches the first entry; remove each subsequent entry in the list that matches the first entry;

after all entries have been examined, move on to the second entry; check each entry in the list and see if it matches the second entry; remove each entry in the list that matches the second entry;

repeat for each entry in the list

Here's the code I have so far:

public int removeDuplicates()
{
 int duplicates = 0;
 for ( int i = 0; i < strings.size(); i++ )
 {
 for ( int j = 0; j < strings.size(); j++ )
 {
 if ( i == j )
 {
 // i & j refer to same entry so do nothing
 }
 else if ( strings.get( j ).equals( strings.get( i ) ) )
 {
 strings.remove( j );
 duplicates++;
 }
 }
 }
 return duplicates;
}

UPDATE: It appears that Will is looking for a homework solution that involves developing the algorithm to remove duplicates, rather than a pragmatic solution using Sets. See his comment:

Thx for the suggestions. This is part of an assignment and I believe the teacher had intended for the solution to not include sets. In other words, I am to come up with a solution that will search for and remove duplicates without implementing a HashSet. The teacher suggested using nested loops which is what I'm trying to do but I've been having some problems with the indexing of the ArrayList after certain entries are removed.

Question 2

If running them through a Set (which people have already suggested) is not possible, then it would help to know whether there are any other constraints, for instance a required time complexity, O(?). Your current solution is O(n^2), which CS courses commonly consider too slow for a task like this.

Question 3

If your teacher asks you to do your homework in Java, then give him that pragmatic solution with Set =)

Question 4

Why not use a collection such as Set (and an implementation like HashSet) which naturally prevents duplicates?

Question 5

+1, using a Set is the best option. If you want to count the number of duplicates removed, store the entries in a List as before, then construct a Set by passing the List into the constructor and compare the sizes of the two to get the number of duplicates.

Question 6

+1 for the solution -1 for not a suitable solution for a homework = 0 pts. :( @Will didn't tag it as such though

Question 7

what if preservation of order is important?

Question 8

@Carl - use a LinkedHashSet then.

Question 9

To use a Set with user-created objects, you will have to implement equals() (and, for a HashSet, hashCode()) in order for the Set to work correctly.

Question 10

You can use nested loops without any problem:

public static int removeDuplicates(ArrayList<String> strings) {
 int size = strings.size();
 int duplicates = 0;
 // not using a method in the check also speeds up the execution
 // also i must be less than size-1 so that j doesn't
 // throw IndexOutOfBoundsException
 for (int i = 0; i < size - 1; i++) {
 // start from the next item after strings[i]
 // since the ones before are checked
 for (int j = i + 1; j < size; j++) {
 // no need for if ( i == j ) here
 if (!strings.get(j).equals(strings.get(i)))
 continue;
 duplicates++;
 strings.remove(j);
 // decrease j because the array got re-indexed
 j--;
 // decrease the size of the array
 size--;
 } // for j
 } // for i
 return duplicates;
}

Question 11

Without testing it, this looks to be ideal. Note that the inner index starts one after the outer (you don't need to check from the beginning of the list every time because you've already checked up to the outer index value for duplicates). Most importantly, it seems to actually answer the question asked!

Question 12

@Azder - Does it really throw IndexOutOfBoundsException? Your condition j < size would take care of it. Wouldn't it? So there is no need to restrict it to i < size -1.

Question 13

yeah, might be the case, still this way one extra unneeded cycle for i is avoided

Question 14

You could try this one-liner to take a copy of the list, preserving order.

List<String> list;
List<String> dedupped = new ArrayList<String>(new LinkedHashSet<String>(list));

This approach is also O(n) amortized instead of O(n^2)

Question 15

With set, running time should be O(n)

Question 16

Just to clarify my comment on matt b's answer, if you really want to count the number of duplicates removed, use this code:

List<String> list = new ArrayList<String>();
// list gets populated from user input...
Set<String> set = new HashSet<String>(list);
int numDuplicates = list.size() - set.size();

Question 17

Well, I've thought about a HashSet, but this is part of an assignment and the teacher didn't mention the HashSet as a possible solution. I think we're supposed to come up with an implementation without using HashSet.

Question 18

Okay, so it's your understanding that this is an assignment to see if you can develop the proper algorithm for removing duplicates, rather than just "getting it done"? I will clarify your initial question/post.

Question 19

List<String> lst = new ArrayList<String>();
lst.add("one");
lst.add("one");
lst.add("two");
lst.add("three");
lst.add("three");
lst.add("three");
Set<String> se = new HashSet<String>(lst); // note: HashSet does not preserve insertion order
lst.clear();
lst = new ArrayList<String>(se);
for (String ls : lst) {
    System.out.println("Resulting output---------" + ls);
}

Question 20

I've been trying to use nested loops to accomplish this but I've been running into trouble because when entries get deleted, the indexing of the ArrayList gets altered and things don't work as they should

Why don't you just decrease the counter each time you delete an entry?

When you delete an entry the elements will move too:

e.g.:

String[] a = {"a", "a", "b", "c"};

positions:

a[0] = "a";
a[1] = "a"; 
a[2] = "b";
a[3] = "c";

After you remove your first "a" the indexes are:

a[0] = "a";
a[1] = "b";
a[2] = "c";

So, you should take this into consideration and decrease the value of j ( j--) to avoid "jumping" over a value.

See this screenshot (image not preserved; its caption reads "it's working").

Question 21

Give it a good try and let me know if you need to see that missing snippet. You're almost there!!

Question 22

@BalusC: I have no idea. I'll try asking on SuperUser (although I'm pretty sure it will be closed as "not computer related").

Question 23

@BalusC: It is Monaco superuser.com/questions/121123/whats-the-name-of-this-font

Question 24

I see. Mac only. There's however a Windows clone webdevkungfu.com/textmate-envy-aka-monaco-font-for-windows Thanks :)

Question 25

public Collection removeDuplicates(Collection c) {
// Returns a new collection with duplicates removed from passed collection.
 Collection result = new ArrayList();
 for(Object o : c) {
 if (!result.contains(o)) {
 result.add(o);
 }
 }
 return result;
}

or

public void removeDuplicates(List l) {
// Removes duplicates in place from an existing list
 Object last = null;
 Collections.sort(l);
 Iterator i = l.iterator();
 while(i.hasNext()) {
 Object o = i.next();
 if (o.equals(last)) {
 i.remove();
 } else {
 last = o;
 }
 }
}

Both untested.

Question 26

I like first approach; it is easy to understand, and makes use of all the possible optimizations that are coded in "contains()"

Question 27

I think the method declaration should be: public <E> Collection<E> removeDuplicates(Collection<E> c) in order to return the same kind of Collection as entered. In your example, for a Collection<String> passed, a Collection<Object> will be returned. But the basic idea is nice!
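A minimal sketch of what that generic declaration could look like when applied to the first method above. The class name GenericDedup is only for illustration, and the method is made static here so it can be called without an instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class GenericDedup {
    // Returns a new collection holding the first occurrence of each element,
    // in encounter order. The element type E is carried through to the result.
    static <E> Collection<E> removeDuplicates(Collection<E> c) {
        Collection<E> result = new ArrayList<E>();
        for (E e : c) {
            if (!result.contains(e)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("one", "one", "two", "three", "three");
        Collection<String> unique = removeDuplicates(input); // no cast needed
        System.out.println(unique); // prints [one, two, three]
    }
}
```

With the raw-typed version the caller would have to cast each element back to String; here the compiler keeps track of the element type.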

Question 28

Collections.sort() requires the items to be comparable.
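For element types that do not implement Comparable, one option is to pass a Comparator to the two-argument Collections.sort(). The sketch below is only an illustration: the Tag class, the BY_NAME comparator and the class name SortedDedup are made up for the example, and equality is decided by the comparator rather than by equals().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

class SortedDedup {
    // Hypothetical element type that does NOT implement Comparable.
    static class Tag {
        final String name;
        Tag(String name) { this.name = name; }
    }

    // Hypothetical ordering: compare tags by name.
    static final Comparator<Tag> BY_NAME = new Comparator<Tag>() {
        public int compare(Tag a, Tag b) {
            return a.name.compareTo(b.name);
        }
    };

    // Sorts with the given comparator, then removes neighbours that compare
    // as equal. Returns the number of removed elements. The original order
    // of the list is lost.
    static <T> int removeDuplicates(List<T> list, Comparator<? super T> order) {
        Collections.sort(list, order);
        int removed = 0;
        boolean first = true;
        T last = null;
        Iterator<T> it = list.iterator();
        while (it.hasNext()) {
            T current = it.next();
            if (!first && order.compare(current, last) == 0) {
                it.remove();
                removed++;
            } else {
                last = current;
                first = false;
            }
        }
        return removed;
    }
}
```

For Strings this is not needed, since String already implements Comparable.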

Question 29

Assuming you can't use a Set like you said, the easiest way of solving the problem is to use a temporary list, rather than attempting to remove the duplicates in place:

public class Duplicates {
 public static void main(String[] args) {
 List<String> list = new ArrayList<String>();
 list.add("one");
 list.add("one");
 list.add("two");
 list.add("three");
 list.add("three");
 list.add("three");
 System.out.println("Prior to removal: " +list);
 System.out.println("There were " + removeDuplicates(list) + " duplicates.");
 System.out.println("After removal: " + list);
 }
 public static int removeDuplicates(List<String> list) {
 int removed = 0;
 List<String> temp = new ArrayList<String>();
 for(String s : list) {
 if(!temp.contains(s)) {
 temp.add(s);
 } else {
 //if the string is already in the list, then ignore it and increment the removed counter
 removed++;
 }
 }
 //put the contents of temp back in the main list
 list.clear();
 list.addAll(temp);
 return removed;
 }
}

Question 30

a temporary list doubles the memory footprint of the list.

Question 31

You could do something like this. Most of what people answered above is one alternative, but here's another.

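for (int i = 0; i < strings.size(); i++) {
    // start at the element after i and run to the end of the list
    for (int j = i + 1; j < strings.size(); j++) {
        // compare contents with equals(), not references with ==
        if (strings.get(i).equals(strings.get(j))) {
            strings.remove(j);
            j--; // later elements shifted left, so re-check this index
        }
    }
}
return strings;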

Question 32

Using a set is the best option to remove the duplicates:

If you have an ArrayList, you can remove the duplicates and still keep the ArrayList features:

 List<String> strings = new ArrayList<String>();
 //populate the array
 ...
 List<String> dedupped = new ArrayList<String>(new HashSet<String>(strings));
 int numdups = strings.size() - dedupped.size();

If you can't use a Set, sort the list (Collections.sort()) and iterate over it, checking whether the current element is equal to the previous element; if it is, remove it. Note that sorting changes the original order of the entries.

Question 33

Using a set is the best option (as others suggested).

If you want to compare all elements in a list with each other, you should slightly adapt your for loops:

for(int i = 0; i < max; i++)
 for(int j = i+1; j < max; j++)

This way you compare each pair of elements only once instead of twice, because the second loop starts at the element after the one the first loop is on.

Also when removing from a list when iterating over them (even when you use a for loop instead of an iterator), keep in mind that you reduce the size of the list. A common solution is to keep another list of items you want to delete, and then after you finished deciding which to delete, you delete them from the original list.
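A rough sketch of the "collect first, delete afterwards" idea, assuming a List<String> as in the question. The class name DeferredRemoval is made up for the example, and the indexes are stored rather than the items themselves so that the first occurrence of each value survives.

```java
import java.util.ArrayList;
import java.util.List;

class DeferredRemoval {
    // Records the index of every entry that repeats an earlier one, then
    // deletes those entries in a second pass. Returns the number removed.
    static int removeDuplicates(List<String> strings) {
        List<Integer> toRemove = new ArrayList<Integer>();
        for (int j = 1; j < strings.size(); j++) {
            for (int i = 0; i < j; i++) {
                if (strings.get(i).equals(strings.get(j))) {
                    toRemove.add(j); // entry j repeats an earlier entry
                    break;
                }
            }
        }
        // Indexes were collected in ascending order; deleting from the back
        // keeps the indexes that are still pending valid.
        for (int k = toRemove.size() - 1; k >= 0; k--) {
            // intValue() makes sure remove(int index) is called,
            // not remove(Object)
            strings.remove(toRemove.get(k).intValue());
        }
        return toRemove.size();
    }
}
```

The list is never modified while it is being scanned, so the loop bounds and indexes stay stable during the comparison pass.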

Question 34

public ArrayList<String> removeDuplicates(ArrayList<String> inArray)
{
 ArrayList<String> outArray = new ArrayList<String>();
 boolean doAdd = true;
 for (int i = 0; i < inArray.size(); i++)
 {
 String testString = inArray.get(i);
 for (int j = 0; j < inArray.size(); j++)
 {
 if (i == j)
 {
 break;
 }
 else if (inArray.get(j).equals(testString))
 {
 doAdd = false;
 break;
 }
 }
 if (doAdd)
 {
 outArray.add(testString);
 }
 else
 {
 doAdd = true;
 }
 }
 return outArray;
}

Question 35

You could replace the duplicate with an empty string*, thus keeping the indexing intact. Then, after you've finished, you can strip out the empty strings.

*But only if an empty string isn't valid in your implementation.
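Something along these lines, as a sketch only. It assumes the empty string is never a legitimate entry, and the class name PlaceholderDedup and the MARKER constant are made up for the example.

```java
import java.util.Collections;
import java.util.List;

class PlaceholderDedup {
    // Assumption: "" is never a valid entry, so it can mark deleted slots.
    private static final String MARKER = "";

    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            String current = strings.get(i);
            if (current.equals(MARKER)) {
                continue; // this slot was already blanked as a duplicate
            }
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(current)) {
                    strings.set(j, MARKER); // blank it; indexes do not shift
                    duplicates++;
                }
            }
        }
        // strip out every placeholder in one go
        strings.removeAll(Collections.singleton(MARKER));
        return duplicates;
    }
}
```

Because set() does not change the size of the list, neither loop needs any index correction.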

Question 36

The problem you are seeing in your code is that you remove an entry during iteration, thus invalidating the iteration location.

For example:

{"a", "b", "c", "b", "b", "d"}
       i         j

Now you are removing strings[j].

{"a", "b", "c", "b", "d"}
       i         j

The inner loop ends and j is incremented.

{"a", "b", "c", "b", "d"}
       i              j

Only one duplicate 'b' detected...oops.

Best practice in these cases is to store the locations that have to be removed, and remove them after you have finished iterating through the ArrayList. (One bonus: the strings.size() call can then be moved outside of the loops, by you or by the compiler.)

Tip, you can start iterating with j at i+1, you've already checked the 0 - i!

Question 37

The inner for loop is invalid. If you delete an element, you cannot increment j, since j is now pointing at the element after the one you deleted, and you will need to inspect it.

In other words, you should use a while loop instead of a for loop, and only increment j if the elements at i and j do not match. If they do match, remove the element at j. size() will decrease by 1 and j will now be pointing at the following element, so there is no need to increase j.

Also, there is no reason to inspect all elements in the inner loop, just the ones following i, since duplicates before i have already been removed by prior iterations.
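A minimal sketch of that while-loop version, assuming a List<String> like the one in the question (the class name WhileLoopDedup is just for the example):

```java
import java.util.List;

class WhileLoopDedup {
    // Removes later duplicates in place, keeping the first occurrence.
    // Returns the number of entries removed.
    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            int j = i + 1; // only entries after i need checking
            while (j < strings.size()) {
                if (strings.get(j).equals(strings.get(i))) {
                    // After removal, j already points at the next entry,
                    // so it is deliberately not incremented here.
                    strings.remove(j);
                    duplicates++;
                } else {
                    j++;
                }
            }
        }
        return duplicates;
    }
}
```

Calling strings.size() in the loop conditions means the bounds follow the list as it shrinks.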

Question 38

public <Foo> Entry<Integer,List<Foo>> uniqueElementList(List<Foo> listWithPossibleDuplicates) {
 List<Foo> result = new ArrayList<Foo>();//...might want to pre-size here, if you have reliable info about the number of dupes
 Set<Foo> found = new HashSet<Foo>(); //...again with the pre-sizing
 for (Foo f : listWithPossibleDuplicates) if (found.add(f)) result.add(f);
 return entryFactory(listWithPossibleDuplicates.size()-found.size(), result);
}

and then some entryFactory(Integer key, List<Foo> value) method. If you want to mutate the original list (possibly not a good idea, but whatever) instead:

public <Foo> int removeDuplicates(List<Foo> listWithPossibleDuplicates) {
 int original = listWithPossibleDuplicates.size();
 Iterator<Foo> iter = listWithPossibleDuplicates.iterator();
 Set<Foo> found = new HashSet<Foo>();
 while (iter.hasNext()) if (!found.add(iter.next())) iter.remove();
 return original - found.size();
}

for your particular case using strings, you may need to deal with some additional equality constraints (e.g., are upper and lower case versions the same or different?).
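As one possible way to handle the upper/lower case point, the sketch below swaps the HashSet for a TreeSet built with String.CASE_INSENSITIVE_ORDER, so "One" and "one" count as the same entry. This is just an illustration, not the only way to do it; the class and method names are made up, and it assumes the list contains no null entries.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CaseInsensitiveDedup {
    // Removes entries that differ from an earlier entry only by case.
    // Keeps the first spelling seen. Returns the number removed.
    static int removeDuplicatesIgnoreCase(List<String> list) {
        // The TreeSet decides "already seen" with the comparator,
        // not with equals()/hashCode().
        Set<String> found = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        int removed = 0;
        Iterator<String> iter = list.iterator();
        while (iter.hasNext()) {
            if (!found.add(iter.next())) {
                iter.remove(); // safe removal during iteration
                removed++;
            }
        }
        return removed;
    }
}
```

A TreeSet lookup is O(log n) rather than the expected O(1) of a HashSet, so the whole pass is O(n log n).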

EDIT: ah, this is homework. Look up Iterator/Iterable in the Java Collections framework, as well as Set, and see if you don't come to the same conclusion I offered. The generics part is just gravy.

Question 39

I am a bit late to this question, but I have come up with a solution that uses a generic type and also works for lists of user-defined classes. The solutions above do work, but they add to the run-time cost.

RemoveDuplicacy.java

We can reduce that cost with a technique that does the required work at load time.

Example: suppose you are using an ArrayList of a class type, such as:

ArrayList<User> usersList = new ArrayList<User>();
 usersList.clear();
 User user = new User();
 user.setName("A");
 user.setId("1"); // duplicate
 usersList.add(user);
 user = new User();
 user.setName("A");
 user.setId("1"); // duplicate
 usersList.add(user);
 user = new User();
 user.setName("AB");
 user.setId("2"); // duplicate
 usersList.add(user);
 user = new User();
 user.setName("C");
 user.setId("4");
 usersList.add(user);
 user = new User();
 user.setName("A");
 user.setId("1"); // duplicate
 usersList.add(user);
 user = new User();
 user.setName("A");
 user.setId("2"); // duplicate
 usersList.add(user);
}

The Class for which is the base for the arraylist used above : User class

class User {
 private String name;
 private String id;
 /**
 * @param name
 * the name to set
 */
 public void setName(String name) {
 this.name = name;
 }
 /**
 * @return the name
 */
 public String getName() {
 return name;
 }
 /**
 * @param id
 * the id to set
 */
 public void setId(String id) {
 this.id = id;
 }
 /**
 * @return the id
 */
 public String getId() {
 return id;
 }

}

Java's Object class (the parent of every class) has two methods that can be overridden to serve our purpose here. They are:

@Override
 public int hashCode() {
 final int prime = 31;
 int result = 1;
 result = prime * result + ((id == null) ? 0 : id.hashCode());
 return result;
 }
 @Override
 public boolean equals(Object obj) {
 if (this == obj)
 return true;
 if (obj == null)
 return false;
 if (getClass() != obj.getClass())
 return false;
 User other = (User) obj;
 if (id == null) {
 if (other.id != null)
 return false;
 } else if (!id.equals(other.id))
 return false;
 return true;
 }

You have to override these methods in the User class

Here is the complete code :

https://gist.github.com/4584310

Let me know if you have any queries.

Question 40

You can add the list into a HashSet and then again convert that hashset into list to remove the duplicates.

public static int removeDuplicates(List<String> duplicateList){
 List<String> correctedList = new ArrayList<String>();
 Set<String> a = new HashSet<String>();
 a.addAll(duplicateList);
 correctedList.addAll(a);
 return (duplicateList.size()-correctedList.size());
}

Here it returns the number of duplicates. correctedList holds all the unique values (in no particular order, since HashSet does not keep insertion order), so you could return or use that as well.

Question 41

Below is the code to remove duplicate elements from a list without changing the order of the list, without using a temporary list and without using any Set variables. This code saves memory and boosts performance.

This is a generic method which works with any kind of list.

This question was asked in one of my interviews. I searched many forums for a solution but could not find one, so I thought this was the right forum to post the code.

public List<?> removeDuplicate(List<?> listWithDuplicates) {
    // duplicate[j] is true when element j repeats an earlier element.
    // (Saving every matching pair in an int array of list size can overflow
    // when a value occurs many times, so one flag per index is used instead.)
    boolean[] duplicate = new boolean[listWithDuplicates.size()];
    for (int i = 0; i < listWithDuplicates.size(); i++) {
        if (duplicate[i]) {
            continue; // already known to repeat an earlier element
        }
        for (int j = i + 1; j < listWithDuplicates.size(); j++) {
            if (listWithDuplicates.get(j).equals(listWithDuplicates.get(i))) {
                duplicate[j] = true; // saving the duplicate index
            }
        }
    }
    // remove from the end so the remaining flagged indexes stay valid
    for (int k = duplicate.length - 1; k >= 0; k--) {
        if (duplicate[k]) {
            listWithDuplicates.remove(k);
        }
    }
    return listWithDuplicates;
}

matt b matt b 140k66 gold badges287 silver badges352 bronze badges · Accepted Answer · 2010-03-12 19:19:42Z

38

Why not use a collection such as Set (and an implementation like HashSet) which naturally prevents duplicates?


Comments

Peter · 2010-03-12 19:23:59Z

+1, using a Set is the best option. If you want to count the number of duplicates removed, store the entries in a List as before, then construct a Set by passing the List into the constructor and compare the sizes of the two to get the number of duplicates.

OscarRyz · 2010-03-12 19:48:18Z

+1 for the solution -1 for not a suitable solution for a homework = 0 pts. :( @Will didn't tag it as such though

Carl · 2010-03-12 20:41:36Z

What if preservation of order is important?

Ravi Wallau · 2010-03-12 20:47:12Z

@Carl - use a LinkedHashSet then.

monksy · 2010-08-06 06:39:58Z

To use a Set with user-created objects, you will have to implement equals() (and, for a HashSet, hashCode()) in order for the Set to work correctly.
