15

I have an Array of Objects that need the duplicates removed/filtered. I was going to just override equals & hashCode on the Object elements, and then stick them in a Set... but I figured I should at least poll stackoverflow to see if there was another way, perhaps some clever method of some other API?

asked Dec 10, 2008 at 20:07
3
  • 1
    Why place yourself in this spot? Why not prevent the duplicates in the first place? Commented Dec 10, 2008 at 20:11
  • 5
    Ask a question about removing duplicates... get a bunch of duplicate answers. The irony! Commented Dec 10, 2008 at 20:19
  • The way you're describing is perfect. Commented Dec 11, 2008 at 0:35

9 Answers

21

I would agree with your approach to override hashCode() and equals() and use something that implements Set.

Doing so also makes it absolutely clear to any other developers that the non-duplicate characteristic is required.

Another reason - you get to choose an implementation that meets your needs best now:

  • HashSet (no ordering guarantees, fastest general-purpose choice)
  • LinkedHashSet (preserves insertion order)
  • TreeSet (keeps elements sorted by natural order or a Comparator)

and you don't have to change your code to change the implementation in the future.
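
For illustration, here is a minimal sketch of the approach, assuming a hypothetical Person class (the class and field names are made up for the example):

    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.Objects;
    import java.util.Set;

    public class DedupDemo {

        // Hypothetical element type; the important part is that
        // equals() and hashCode() agree on what counts as a duplicate.
        static final class Person {
            final String name;

            Person(String name) { this.name = name; }

            @Override
            public boolean equals(Object o) {
                return o instanceof Person && name.equals(((Person) o).name);
            }

            @Override
            public int hashCode() {
                return Objects.hash(name);
            }
        }

        public static void main(String[] args) {
            Person[] people = { new Person("a"), new Person("b"), new Person("a") };
            // LinkedHashSet drops duplicates and preserves insertion order;
            // swap in HashSet or TreeSet without touching the rest of the code.
            Set<Person> unique = new LinkedHashSet<>(Arrays.asList(people));
            System.out.println(unique.size()); // prints 2
        }
    }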

Ken Gentle
13.4k • 2 gold badges • 44 silver badges • 49 bronze badges
answered Dec 10, 2008 at 20:13

9

I found this on the web:

Here are two methods that allow you to remove duplicates in an ArrayList. removeDuplicate does not maintain the order, whereas removeDuplicateWithOrder maintains the order with some performance overhead.

  1. The removeDuplicate Method:

    /** List order not maintained **/
    public static <T> void removeDuplicate(ArrayList<T> arlList)
    {
        // HashSet discards duplicates as it is populated
        HashSet<T> h = new HashSet<>(arlList);
        arlList.clear();
        arlList.addAll(h);
    }
    
  2. The removeDuplicateWithOrder Method:

    /** List order maintained **/
    public static <T> void removeDuplicateWithOrder(ArrayList<T> arlList)
    {
        Set<T> set = new HashSet<>();
        List<T> newList = new ArrayList<>();
        for (Iterator<T> iter = arlList.iterator(); iter.hasNext();) {
            T element = iter.next();
            // Set.add returns false if the element was already seen
            if (set.add(element))
                newList.add(element);
        }
        arlList.clear();
        arlList.addAll(newList);
    }
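
As one of the comments below points out, a LinkedHashSet keeps insertion order, which shortens the order-preserving variant; a minimal sketch (the method name here is made up):

    /** List order maintained via LinkedHashSet's insertion order **/
    public static <T> void removeDuplicatePreservingOrder(List<T> list)
    {
        // LinkedHashSet drops duplicates and remembers insertion order
        Set<T> set = new LinkedHashSet<>(list);
        list.clear();
        list.addAll(set);
    }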
    
Ken Gentle
13.4k • 2 gold badges • 44 silver badges • 49 bronze badges
answered Dec 10, 2008 at 20:16

3 Comments

Good answer, but your 2nd example isn't in a code format block
Thanks to Ken G... I tried it a couple of times but I could not fix the second code block.
LinkedHashSet keeps it in order. That may optimize it further.
3

Overriding equals and hashCode and creating a set was my first thought too. It's good practice to have an overridden version of these methods somewhere in your inheritance hierarchy anyway.

I think that if you use a LinkedHashSet you'll even preserve the order of unique elements...

answered Dec 10, 2008 at 20:14

3 Comments

Yep, LinkedHashSet will maintain the insertion order.
It is not good practice to override equals and hashCode "anyway", especially in any class that will sit in an inheritance hierarchy. See Effective Java (Bloch) for more.
McDowell, I wasn't really clear - I meant that there should be an overridden version somewhere in your inheritance hierarchy, and I've amended the answer to reflect that. I've not got a copy of Effective Java - is this what Bloch is getting at?
2

Basically, you want a LinkedHashSet<T> implementation that supports the List<T> interface for random access. Hence, this is what you need:

public class LinkedHashSetList<T> extends LinkedHashSet<T> implements List<T> {

// Implementations for List<T> methods here ...

}

The implementation of the List<T> methods would access and manipulate the underlying LinkedHashSet<T>. The trick is to have this class behave correctly when one attempts to add duplicates via the List<T> add methods (for example, by throwing an exception or re-adding the item at a different index; you can pick one behaviour or make it configurable by users of the class).
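
Implementing the full List<T> contract is a fair amount of code. As a rough sketch of the duplicate-rejecting policy described above (the class name is made up, and this delegates to a backing list and set rather than extending LinkedHashSet<T>):

    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    // Illustrative sketch only: rejects duplicate adds with an exception,
    // one of the policies suggested above, while keeping list-style access.
    public class UniqueList<T> {
        private final Set<T> seen = new LinkedHashSet<>();
        private final List<T> items = new ArrayList<>();

        public void add(T element) {
            if (!seen.add(element)) { // Set.add returns false for duplicates
                throw new IllegalArgumentException("duplicate element: " + element);
            }
            items.add(element);
        }

        public T get(int index) { return items.get(index); } // random access

        public int size() { return items.size(); }
    }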

answered Dec 11, 2008 at 0:28


2

Use a List distinctList to record each element the first time the iterator encounters it, and return distinctList as the list with all duplicates removed:

    private static <T> List<T> removeDups(List<T> list) {
        Set<T> tempSet = new HashSet<>();
        List<T> distinctList = new ArrayList<>();
        for (Iterator<T> it = list.iterator(); it.hasNext();) {
            T next = it.next();
            // add returns false when the element has been seen before
            if (tempSet.add(next)) {
                distinctList.add(next);
            }
        }
        return distinctList;
    }
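
A quick usage sketch, assuming the method lives in the same class:

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c");
        // the input list is left untouched; a new distinct list is returned
        System.out.println(removeDups(input)); // prints [a, b, c]
    }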

answered Jun 19, 2012 at 3:50

9 Comments

the complexity is very high, since List.contains has O(n) time complexity; thus the overall complexity is O(n²)
@FilipLuchianenco you are right, I have updated my implementation
Then you only need to keep adding the new value; if it already exists, add will just return false. As a result, you end up with an iterator and a Set to which you keep adding unique values. The only downside is the order, as a plain HashSet does not preserve it. The other solution is to have a list and a set: if set.add(object) returns true, you add it to the new list as well, then return the list.
Why do we need to care about the order of the Set, since what we change is the List passed into the function, and using the iterator to remove duplicates doesn't change the relative order of the remaining List elements?
well, removing an item from an ArrayList is very inefficient, since remove() has to shift all subsequent elements each time, so your complexity now is O(n²) in the worst case. So I did not even want to consider that as an option.
1

I'd like to reiterate the point made by Jason in the comments:

Why place yourself at that point at all?

Why use an array for a data structure that shouldn't hold duplicates at all?

Always use a Set, or a SortedSet (when the elements have a natural order as well), to hold the elements. If you need to keep the insertion order, you can use a LinkedHashSet, as has been pointed out.

Having to post-process some data structure is often a hint that you should have chosen a different one to begin with.

answered Dec 10, 2008 at 21:41

1 Comment

I agree with all comments about the concerns of the initial data structure being an Array. I am trying to lobby the developer to refactor to a Set. Thanks all for your feedback and wisdom!
1

Of course the original post raises the question, "How did you get that array (that might contain duplicated entries) in the first place?"

Do you need the array (with duplicates) for other purposes, or could you simply use a Set from the beginning?

Alternatively, if you need to know the number of occurrences of each value, you could use a Map<CustomObject, Integer> to track counts. Also, the Google Collections Multimap classes may be of use.
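
A minimal sketch of the counting idea, assuming the elements already implement equals and hashCode sensibly (Map.merge is a Java 8 convenience; the method name is made up):

    import java.util.HashMap;
    import java.util.Map;

    // Count occurrences: keySet() is the de-duplicated view,
    // and the values record how many of each element the array held.
    static <T> Map<T, Integer> countOccurrences(T[] array) {
        Map<T, Integer> counts = new HashMap<>();
        for (T element : array) {
            counts.merge(element, 1, Integer::sum);
        }
        return counts;
    }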

answered Dec 10, 2008 at 22:03


0

A Set is definitely your best bet. The only way to remove things from an array (without creating a new one) is to null them out, and then you end up with a lot of null-checks later.
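
For illustration, nulling out duplicates in place might look like the sketch below; note that every later traversal has to skip the holes (the method name is made up):

    // In-place "removal": null out later duplicates, leaving holes behind.
    static void nullOutDuplicates(Object[] array) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] == null) continue; // already cleared
            for (int j = i + 1; j < array.length; j++) {
                if (array[i].equals(array[j])) {
                    array[j] = null; // every consumer now needs a null-check
                }
            }
        }
    }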

answered Dec 10, 2008 at 20:14


0

Speaking from a general programming standpoint, you could always double-enumerate the collection, comparing each source element against the rest.

And if your inner enumeration always starts one entry after the source, you avoid redundant comparisons, though the approach is still O(n²) overall (pseudocode to follow):

place = -1;
foreach ( array as source )
{
    // keep track of where we are in the array
    place++;
    // loop the array starting at the entry AFTER the current one we are comparing to
    for ( i = place + 1; i < max(array); i++ )
    {
        if ( source === array[i] )
        {
            destroy(array[i]);
        }
    }
}

You could arguably add a break statement after the destroy, but then you only remove the first duplicate; if that's all you will ever have, it would be a nice small optimization.
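
In Java, the same nested-loop idea might look like the sketch below; it copies survivors into a new array instead of destroying entries in place, and it stays O(n²) like the pseudocode (the method name is made up):

    import java.util.ArrayList;
    import java.util.List;

    // O(n^2) de-duplication that needs equals() but no hashCode()/Set:
    static Object[] dedup(Object[] source) {
        List<Object> kept = new ArrayList<>();
        for (Object candidate : source) {
            boolean duplicate = false;
            for (Object k : kept) {
                if (k.equals(candidate)) {
                    duplicate = true;
                    break; // one match is enough to disqualify
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        return kept.toArray();
    }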

answered Dec 10, 2008 at 20:17

