20

I'm working on a program that uses an ArrayList to store Strings. The program prompts the user with a menu and allows the user to choose an operation to perform. The operations include adding Strings to the list, printing the entries, etc. What I want to be able to do is create a method called removeDuplicates(). This method will search the ArrayList and remove any duplicated values. I want to leave one instance of each duplicated value within the list. I also want this method to return the total number of duplicates removed.

I've been trying to use nested loops to accomplish this but I've been running into trouble because when entries get deleted, the indexing of the ArrayList gets altered and things don't work as they should. I know conceptually what I need to do but I'm having trouble implementing this idea in code.

Here is some pseudo code:

start with first entry; check each subsequent entry in the list and see if it matches the first entry; remove each subsequent entry in the list that matches the first entry;

after all entries have been examined, move on to the second entry; check each entry in the list and see if it matches the second entry; remove each entry in the list that matches the second entry;

repeat for each entry in the list

Here's the code I have so far:

public int removeDuplicates()
{
 int duplicates = 0;
 for ( int i = 0; i < strings.size(); i++ )
 {
 for ( int j = 0; j < strings.size(); j++ )
 {
 if ( i == j )
 {
 // i & j refer to same entry so do nothing
 }
 else if ( strings.get( j ).equals( strings.get( i ) ) )
 {
 strings.remove( j );
 duplicates++;
 }
 }
 }
 return duplicates;
}

UPDATE: It appears that Will is looking for a homework solution that involves developing the algorithm to remove duplicates, rather than a pragmatic solution using Sets. See his comment:

Thx for the suggestions. This is part of an assignment and I believe the teacher had intended for the solution to not include sets. In other words, I am to come up with a solution that will search for and remove duplicates without implementing a HashSet. The teacher suggested using nested loops which is what I'm trying to do but I've been having some problems with the indexing of the ArrayList after certain entries are removed.

Dariusz
asked Mar 12, 2010 at 19:16
2
  • If running them through a Set (which people have already suggested) is not possible, it would be helpful to know whether there are any other limits, for instance on complexity, O(?). Your current solution is O(n^2), which CS courses very commonly consider too heavy for something like this. Commented Mar 12, 2010 at 19:38
  • If your teacher asks you to do your homework in Java, then give him that pragmatic solution with Set =) Commented Mar 12, 2010 at 21:45

19 Answers

38

Why not use a collection such as Set (and an implementation like HashSet) which naturally prevents duplicates?

answered Mar 12, 2010 at 19:19

6 Comments

+1, using a Set is the best option. If you want to count the number of duplicates removed, store the values in a List as before, then construct a Set by passing the List into the constructor, and compare the sizes of the two to get the number of duplicates.
+1 for the solution, -1 for not being a suitable solution for homework = 0 pts. :( @Will didn't tag it as such, though
what if preservation of order is important?
@Carl - use a LinkedHashSet then.
To use a Set with user-created objects, you will have to implement equals() (and, for a HashSet, hashCode()) in order for the Set to work correctly.
17

You can use nested loops without any problem:

public static int removeDuplicates(ArrayList<String> strings) {
 int size = strings.size();
 int duplicates = 0;
 // not using a method in the check also speeds up the execution
 // also i must be less than size-1 so that j doesn't
 // throw IndexOutOfBoundsException
 for (int i = 0; i < size - 1; i++) {
 // start from the next item after strings[i]
 // since the ones before are checked
 for (int j = i + 1; j < size; j++) {
 // no need for if ( i == j ) here
 if (!strings.get(j).equals(strings.get(i)))
 continue;
 duplicates++;
 strings.remove(j);
 // decrease j because the array got re-indexed
 j--;
 // decrease the size of the array
 size--;
 } // for j
 } // for i
 return duplicates;
}
answered Mar 13, 2010 at 10:06

3 Comments

Without testing it, this looks to be ideal. Note that the inner index starts one after the outer (you don't need to check from the beginning of the list every time because you've already checked up to the outer index value for duplicates). Most importantly, it seems to actually answer the question asked!
@Azder - Does it really throw IndexOutOfBoundsException? Your condition j < size would take care of it. Wouldn't it? So there is no need to restrict it to i < size -1.
yeah, might be the case, still this way one extra unneeded cycle for i is avoided
14

You could try this one-liner to take a copy of the list with duplicates removed, preserving order.

List<String> list = new ArrayList<String>(); // the list to de-duplicate, populated elsewhere
List<String> dedupped = new ArrayList<String>(new LinkedHashSet<String>(list));

This approach is also O(n) amortized instead of O(n^2)

answered Mar 13, 2010 at 10:37

1 Comment

With set, running time should be O(n)
8

Just to clarify my comment on matt b's answer, if you really want to count the number of duplicates removed, use this code:

List<String> list = new ArrayList<String>();
// list gets populated from user input...
Set<String> set = new HashSet<String>(list);
int numDuplicates = list.size() - set.size();
answered Mar 12, 2010 at 19:27

2 Comments

Well, I've thought about a HashSet, but this is part of an assignment and the teacher didn't mention HashSet as a possible solution. I think we're supposed to come up with an implementation without using HashSet.
Okay, so it's your understanding that this is an assignment to see if you can develop the proper algorithm for removing duplicates, rather than just "getting it done"? I will clarify your initial question/post.
4
List<String> lst = new ArrayList<String>();
lst.add("one");
lst.add("one");
lst.add("two");
lst.add("three");
lst.add("three");
lst.add("three");
// A HashSet keeps one copy of each value; its iteration order is not guaranteed
Set<String> se = new HashSet<String>(lst);
lst.clear();
lst = new ArrayList<String>(se);
for (String ls : lst) {
    System.out.println("Resulting output---------" + ls);
}
Devin Burke
answered May 30, 2011 at 9:40

Comments

4

I've been trying to use nested loops to accomplish this but I've been running into trouble because when entries get deleted, the indexing of the ArrayList gets altered and things don't work as they should

Why don't you just decrease the counter each time you delete an entry?

When you delete an entry the elements will move too:

e.g.:

String[] a = {"a", "a", "b", "c"};

positions:

a[0] = "a";
a[1] = "a"; 
a[2] = "b";
a[3] = "c";

After you remove your first "a" the indexes are:

a[0] = "a";
a[1] = "b";
a[2] = "c";

So, you should take this into consideration and decrease the value of j ( j--) to avoid "jumping" over a value.
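
A minimal sketch of that adjustment, applied to nested loops like the ones in the question. The class name is made up for illustration, and the method takes the list as a parameter instead of using the asker's strings field:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original question.
class DecrementIndexDemo {
    // Removes later duplicates in place and returns how many were removed.
    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(strings.get(i))) {
                    strings.remove(j);
                    duplicates++;
                    // The next element slid into position j,
                    // so step back and look at index j again.
                    j--;
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> strings =
                new ArrayList<String>(Arrays.asList("a", "a", "b", "c"));
        int removed = removeDuplicates(strings);
        System.out.println(removed + " removed, remaining: " + strings);
    }
}
```

Because the condition calls strings.size() on every pass, the loop bounds shrink along with the list.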

[Screenshot: it's working]

Boann
answered Mar 12, 2010 at 19:56

4 Comments

Give it a good try and let me know if you need to see that missing snippet. You're almost there!!
@BalusC: I have no idea. I'll try asking on SuperUser ( although I'm pretty sure it will be closed as "no computer related )
I see. Mac only. There's however a Windows clone webdevkungfu.com/textmate-envy-aka-monaco-font-for-windows Thanks :)
3
public Collection removeDuplicates(Collection c) {
// Returns a new collection with duplicates removed from passed collection.
 Collection result = new ArrayList();
 for(Object o : c) {
 if (!result.contains(o)) {
 result.add(o);
 }
 }
 return result;
}

or

public void removeDuplicates(List l) {
// Removes duplicates in place from an existing list
 Object last = null;
 Collections.sort(l);
 Iterator i = l.iterator();
 while(i.hasNext()) {
 Object o = i.next();
 if (o.equals(last)) {
 i.remove();
 } else {
 last = o;
 }
 }
}

Both untested.

answered Mar 12, 2010 at 20:08

3 Comments

I like first approach; it is easy to understand, and makes use of all the possible optimizations that are coded in "contains()"
I think the method declaration should be: public <E> Collection<E> removeDuplicates(Collection<E> c) in order to return the same kind of Collection as entered. In your example, for a Collection<String> passed, a Collection<Object> will be returned. But the basic idea is nice!
Collections.sort() requires the items to be comparable.
1

Assuming you can't use a Set like you said, the easiest way of solving the problem is to use a temporary list, rather than attempting to remove the duplicates in place:

import java.util.ArrayList;
import java.util.List;

public class Duplicates {
 public static void main(String[] args) {
 List<String> list = new ArrayList<String>();
 list.add("one");
 list.add("one");
 list.add("two");
 list.add("three");
 list.add("three");
 list.add("three");
 System.out.println("Prior to removal: " +list);
 System.out.println("There were " + removeDuplicates(list) + " duplicates.");
 System.out.println("After removal: " + list);
 }
 public static int removeDuplicates(List<String> list) {
 int removed = 0;
 List<String> temp = new ArrayList<String>();
 for(String s : list) {
 if(!temp.contains(s)) {
 temp.add(s);
 } else {
 //if the string is already in the list, then ignore it and increment the removed counter
 removed++;
 }
 }
 //put the contents of temp back in the main list
 list.clear();
 list.addAll(temp);
 return removed;
 }
}
answered Mar 13, 2010 at 9:23

1 Comment

a temporary list doubles the memory footprint of the list.
1

You could do something like this. Most of what people answered above is one alternative, but here's another.

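for (int i = 0; i < strings.size(); i++) {
    // start at the element after i and run to the end of the list
    for (int j = i + 1; j < strings.size(); j++) {
        // compare contents with equals(); == only compares references
        if (strings.get(i).equals(strings.get(j))) {
            strings.remove(j);
            j--; // the list shifted left, so check index j again
        }
    }
}
return strings;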
answered Aug 27, 2015 at 3:36

Comments

0

Using a set is the best option to remove the duplicates:

If you have an ArrayList, you can remove the duplicates and still retain the ArrayList features:

 List<String> strings = new ArrayList<String>();
 //populate the array
 ...
 List<String> dedupped = new ArrayList<String>(new HashSet<String>(strings));
 int numdups = strings.size() - dedupped.size();

If you can't use a set, sort the list (Collections.sort()) and iterate over it, checking whether the current element is equal to the previous element; if it is, remove it.
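
A rough sketch of that sort-then-scan idea, assuming the list contains no null entries and that losing the original order is acceptable (the class name is invented for the example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class for the sort-based approach.
class SortedDedupDemo {
    // Sorts the list, then removes every element equal to its predecessor.
    // Returns the number of elements removed.
    static int removeDuplicates(List<String> strings) {
        Collections.sort(strings);
        int removed = 0;
        // Walk backwards so a removal never shifts elements not yet visited.
        for (int i = strings.size() - 1; i > 0; i--) {
            if (strings.get(i).equals(strings.get(i - 1))) {
                strings.remove(i);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>(
                Arrays.asList("two", "one", "three", "one", "three", "three"));
        int removed = removeDuplicates(strings);
        System.out.println(removed + " removed, remaining: " + strings);
    }
}
```

After sorting, equal values sit next to each other, so a single pass is enough and the overall cost is dominated by the sort.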

answered Mar 12, 2010 at 19:32

Comments

0

Using a set is the best option (as others suggested).

If you want to compare all elements in a list with each other, you should slightly adapt your for loops:

for(int i = 0; i < max; i++)
 for(int j = i+1; j < max; j++)

This way you compare each pair of elements only once instead of twice, because the second loop starts at the element after the one the first loop is on.

Also when removing from a list when iterating over them (even when you use a for loop instead of an iterator), keep in mind that you reduce the size of the list. A common solution is to keep another list of items you want to delete, and then after you finished deciding which to delete, you delete them from the original list.
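
One way the "decide first, delete afterwards" idea could look, as a sketch only. The class name and the toDelete list are assumptions for illustration; the indexes are collected in ascending order and removed from the back so that earlier indexes stay valid:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class for deferred removal.
class DeferredRemovalDemo {
    static int removeDuplicates(List<String> strings) {
        // Pass 1: record the index of every element that repeats an earlier one.
        List<Integer> toDelete = new ArrayList<Integer>();
        for (int j = 1; j < strings.size(); j++) {
            for (int i = 0; i < j; i++) {
                if (strings.get(j).equals(strings.get(i))) {
                    toDelete.add(j);
                    break; // one earlier match is enough
                }
            }
        }
        // Pass 2: delete from the highest index down.
        for (int k = toDelete.size() - 1; k >= 0; k--) {
            // intValue() makes sure remove(int index) is called,
            // not remove(Object).
            strings.remove(toDelete.get(k).intValue());
        }
        return toDelete.size();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>(
                Arrays.asList("a", "b", "c", "b", "b", "d"));
        int removed = removeDuplicates(strings);
        System.out.println(removed + " removed, remaining: " + strings);
    }
}
```

Nothing is removed while the comparison loops run, so the indexes i and j never go stale.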

answered Mar 12, 2010 at 19:51

Comments

0
public ArrayList<String> removeDuplicates(ArrayList<String> inArray)
{
 ArrayList<String> outArray = new ArrayList<String>();
 boolean doAdd = true;
 for (int i = 0; i < inArray.size(); i++)
 {
 String testString = inArray.get(i);
 for (int j = 0; j < inArray.size(); j++)
 {
 if (i == j)
 {
 break;
 }
 else if (inArray.get(j).equals(testString))
 {
 doAdd = false;
 break;
 }
 }
 if (doAdd)
 {
 outArray.add(testString);
 }
 else
 {
 doAdd = true;
 }
 }
 return outArray;
}
answered Mar 12, 2010 at 20:14

Comments

0

You could replace each duplicate with an empty string*, thus keeping the indexing intact. Then, after you've finished, you can strip out the empty strings.

*But only if an empty string isn't valid in your implementation.
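
A small sketch of the placeholder approach, under the stated assumption that the empty string is never a legitimate entry (the class name is made up for the example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class for the placeholder approach.
class PlaceholderDemo {
    static int removeDuplicates(List<String> strings) {
        // Pass 1: overwrite later duplicates with "" so no index shifts.
        for (int i = 0; i < strings.size(); i++) {
            if (strings.get(i).isEmpty()) {
                continue; // already marked as a duplicate
            }
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(strings.get(i))) {
                    strings.set(j, "");
                }
            }
        }
        // Pass 2: strip the placeholders, walking backwards.
        int removed = 0;
        for (int k = strings.size() - 1; k >= 0; k--) {
            if (strings.get(k).isEmpty()) {
                strings.remove(k);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>(
                Arrays.asList("one", "one", "two", "three", "three", "three"));
        int removed = removeDuplicates(strings);
        System.out.println(removed + " removed, remaining: " + strings);
    }
}
```

Note that any empty strings the user entered on purpose would be stripped and counted as well, which is why the caveat above matters.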

answered Mar 12, 2010 at 20:17

Comments

0

The problem you are seeing in your code is that you remove an entry during iteration, thus invalidating the iteration location.

For example:

{"a", "b", "c", "b", "b", "d"}
       i         j

Now you are removing strings[j].

{"a", "b", "c", "b", "d"}
       i         j

The inner loop ends and j is incremented.

{"a", "b", "c", "b", "d"}
       i              j

Only one duplicate 'b' detected...oops.

Best practice in these cases is to store the locations that have to be removed, and remove them after you have finished iterating through the ArrayList. (One bonus: the strings.size() call can then be hoisted out of the loops by you or the compiler.)

Tip, you can start iterating with j at i+1, you've already checked the 0 - i!

answered Mar 13, 2010 at 22:15

Comments

0

The inner for loop is invalid. If you delete an element, you cannot increment j, since j is now pointing at the element after the one you deleted, and you will need to inspect it.

In other words, you should use a while loop instead of a for loop, and only increment j if the elements at i and j do not match. If they do match, remove the element at j. size() will decrease by 1 and j will now be pointing at the following element, so there is no need to increase j.

Also, there is no reason to inspect all elements in the inner loop, just the ones following i, since duplicates before i have already been removed by prior iterations.
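
The while-loop variant described above might look roughly like this (a sketch; the class name is an assumption, and the list is passed in as a parameter):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class for the while-loop variant.
class WhileLoopDemo {
    static int removeDuplicates(List<String> strings) {
        int removed = 0;
        for (int i = 0; i < strings.size(); i++) {
            int j = i + 1; // only elements after i can still be duplicates
            while (j < strings.size()) {
                if (strings.get(j).equals(strings.get(i))) {
                    // After the removal, j already points at the next element,
                    // so it must not be incremented.
                    strings.remove(j);
                    removed++;
                } else {
                    j++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>(
                Arrays.asList("a", "b", "c", "b", "b", "d"));
        int removed = removeDuplicates(strings);
        System.out.println(removed + " removed, remaining: " + strings);
    }
}
```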

answered Mar 14, 2010 at 12:56

Comments

0
public <Foo> Entry<Integer,List<Foo>> uniqueElementList(List<Foo> listWithPossibleDuplicates) {
 List<Foo> result = new ArrayList<Foo>();//...might want to pre-size here, if you have reliable info about the number of dupes
 Set<Foo> found = new HashSet<Foo>(); //...again with the pre-sizing
 for (Foo f : listWithPossibleDuplicates) if (found.add(f)) result.add(f);
 return entryFactory(listWithPossibleDuplicates.size()-found.size(), result);
}

and then some entryFactory(Integer key, List<Foo> value) method. If you want to mutate the original list (possibly not a good idea, but whatever) instead:

public <Foo> int removeDuplicates(List<Foo> listWithPossibleDuplicates) {
 int original = listWithPossibleDuplicates.size();
 Iterator<Foo> iter = listWithPossibleDuplicates.iterator();
 Set<Foo> found = new HashSet<Foo>();
 while (iter.hasNext()) if (!found.add(iter.next())) iter.remove();
 return original - found.size();
}

For your particular case using strings, you may need to deal with some additional equality constraints (e.g., are upper and lower case versions the same or different?).
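
If upper and lower case versions should count as the same entry, one possible adaptation of the iterator version is to track a normalized key instead of the string itself. This is only a sketch: the class and method names are invented, and lower-casing with Locale.ROOT is an assumed definition of "same", which may not suit every language.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class for case-insensitive de-duplication.
class IgnoreCaseDedupDemo {
    // Keeps the first spelling of each entry and removes later ones that
    // differ only by case. Returns the number of entries removed.
    static int removeDuplicatesIgnoreCase(List<String> strings) {
        int original = strings.size();
        Set<String> seen = new HashSet<String>();
        Iterator<String> iter = strings.iterator();
        while (iter.hasNext()) {
            String key = iter.next().toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                iter.remove(); // add() returned false: key was already seen
            }
        }
        return original - strings.size();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>(
                Arrays.asList("One", "one", "TWO", "two", "three"));
        int removed = removeDuplicatesIgnoreCase(strings);
        System.out.println(removed + " removed, remaining: " + strings);
    }
}
```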

EDIT: ah, this is homework. Look up Iterator/Iterable in the Java Collections framework, as well as Set, and see if you don't come to the same conclusion I offered. The generics part is just gravy.

answered Mar 12, 2010 at 20:33

Comments

0

I am a bit late to this question, but I have come up with a better solution using a generic type. The solutions provided above do work, but they add to the complexity of the code at run time.

RemoveDuplicacy.java

We can minimize that by using a technique that does the required work at load time.

Example: suppose you are using an ArrayList of a class type, such as:

ArrayList<User> usersList = new ArrayList<User>();
 usersList.clear();
 User user = new User();
 user.setName("A");
 user.setId("1"); // duplicate
 usersList.add(user);
 user = new User();
 user.setName("A");
 user.setId("1"); // duplicate
 usersList.add(user);
 user = new User();
 user.setName("AB");
 user.setId("2"); // duplicate
 usersList.add(user);
 user = new User();
 user.setName("C");
 user.setId("4");
 usersList.add(user);
 user = new User();
 user.setName("A");
 user.setId("1"); // duplicate
 usersList.add(user);
 user = new User();
 user.setName("A");
 user.setId("2"); // duplicate
 usersList.add(user);

The Class for which is the base for the arraylist used above : User class

class User {
 private String name;
 private String id;
 /**
 * @param name
 * the name to set
 */
 public void setName(String name) {
 this.name = name;
 }
 /**
 * @return the name
 */
 public String getName() {
 return name;
 }
 /**
 * @param id
 * the id to set
 */
 public void setId(String id) {
 this.id = id;
 }
 /**
 * @return the id
 */
 public String getId() {
 return id;
 }

}

Now, in Java there are two methods inherited from the Object (parent) class that can be overridden to serve our purpose here. They are:

@Override
 public int hashCode() {
 final int prime = 31;
 int result = 1;
 result = prime * result + ((id == null) ? 0 : id.hashCode());
 return result;
 }
 @Override
 public boolean equals(Object obj) {
 if (this == obj)
 return true;
 if (obj == null)
 return false;
 if (getClass() != obj.getClass())
 return false;
 User other = (User) obj;
 if (id == null) {
 if (other.id != null)
 return false;
 } else if (!id.equals(other.id))
 return false;
 return true;
 }

You have to override these methods in the User class
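
To illustrate why these overrides matter, here is a cut-down sketch. It does not reproduce the linked gist: the nested User class is a simplified stand-in with an assumed two-argument constructor, and the LinkedHashSet step is just one possible way to use the overridden methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical demo class; User here is a simplified stand-in.
class UserDedupDemo {
    static class User {
        private final String id;
        private final String name;

        User(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getName() {
            return name;
        }

        // Two users are considered equal when their ids are equal.
        @Override
        public int hashCode() {
            return (id == null) ? 0 : id.hashCode();
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof User)) return false;
            User other = (User) obj;
            return (id == null) ? other.id == null : id.equals(other.id);
        }
    }

    // Returns a copy without duplicates, keeping the first user seen per id.
    static List<User> withoutDuplicates(List<User> users) {
        return new ArrayList<User>(new LinkedHashSet<User>(users));
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<User>();
        users.add(new User("1", "A"));
        users.add(new User("1", "A"));
        users.add(new User("2", "AB"));
        users.add(new User("4", "C"));
        System.out.println(withoutDuplicates(users).size() + " unique users");
    }
}
```

Without the overrides, every User object would compare as different from every other, and no hash-based or contains()-based approach would find any duplicates.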

Here is the complete code :

https://gist.github.com/4584310

Let me know if you have any queries.

answered Jan 21, 2013 at 7:42

Comments

0

You can add the list into a HashSet and then again convert that hashset into list to remove the duplicates.

public static int removeDuplicates(List<String> duplicateList){
 List<String> correctedList = new ArrayList<String>();
 Set<String> a = new HashSet<String>();
 a.addAll(duplicateList);
 correctedList.addAll(a);
 return (duplicateList.size()-correctedList.size());
}

Here it returns the number of duplicates. You can also use correctedList, which holds only the unique values.

answered Aug 3, 2013 at 22:46

Comments

0

Below is code to remove duplicate elements from a list without changing the order of the list, without using a temporary list, and without using any Set variables. This code saves memory and boosts performance.

This is a generic method which works with any kind of list.

This question was asked in an interview. I searched many forums for a solution but could not find one, so I thought this was the right place to post the code.

 public List<?> removeDuplicate(List<?> listWithDuplicates) {
     int size = listWithDuplicates.size();
     // isDuplicate[j] is true when element j repeats an earlier element.
     // One flag per index cannot overflow, however many repeats there are.
     boolean[] isDuplicate = new boolean[size];
     for (int i = 0; i < size; i++) {
         if (isDuplicate[i])
             continue; // already matched against its first occurrence
         for (int j = i + 1; j < size; j++) {
             if (listWithDuplicates.get(j).equals(listWithDuplicates.get(i)))
                 isDuplicate[j] = true; // saving duplicate indexes
         }
     }
     // Remove from the end so that the remaining indexes stay valid
     for (int k = size - 1; k >= 0; k--) {
         if (isDuplicate[k])
             listWithDuplicates.remove(k);
     }
     return listWithDuplicates;
 }
answered Jul 19, 2014 at 14:54

Comments
