Finding the Intersection of Arrays

Question 1

I would like advice on how to improve this code. To clarify what this code considers an intersection: the intersection of [3,3,4,6,7] and [3,3,3,6,7] is [3,6,7], so each common value appears only once. I would appreciate any improvements that make the code more readable or faster.

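import java.util.ArrayList;
import java.util.Arrays;

public ArrayList<Integer> findIt(int[] a, int[] b) {
    ArrayList<Integer> result = new ArrayList<Integer>();
    // Sort both arrays (in place) so that equal values line up
    Arrays.sort(a);
    Arrays.sort(b);
    int i = 0;
    int j = 0;
    while (i < a.length && j < b.length) {
        if (a[i] < b[j])
            ++i;
        else if (a[i] > b[j])
            ++j;
        else {
            // Common value: add it once, then advance both indices
            if (!result.contains(a[i]))
                result.add(a[i]);
            ++i;
            ++j;
        }
    }
    return result;
}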

Question 2

The example arrays are already sorted. Are all input arrays sorted? If the result ArrayList must not contain duplicate elements, maybe a Set would be better?
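One way to act on this suggestion, sketched under the assumption that the inputs are not guaranteed to be sorted: keep the question's two-pointer walk, but collect matches in a LinkedHashSet so that the Set itself rejects repeats instead of a linear result.contains() scan. The class name SortedIntersection is hypothetical, and cloning the arrays before sorting is my own addition so the caller's arrays are not reordered.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class SortedIntersection {
    // Two-pointer intersection that collects matches in a LinkedHashSet.
    // The Set discards repeated values, so no result.contains() scan is needed,
    // and LinkedHashSet keeps the ascending order produced by the sorted walk.
    public static Set<Integer> intersection(int[] a, int[] b) {
        // Sort copies so the caller's arrays are left untouched
        // (assumption: inputs are not guaranteed to be sorted).
        int[] x = a.clone();
        int[] y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        Set<Integer> result = new LinkedHashSet<Integer>();
        int i = 0;
        int j = 0;
        while (i < x.length && j < y.length) {
            if (x[i] < y[j]) {
                i++;
            } else if (x[i] > y[j]) {
                j++;
            } else {
                result.add(x[i]); // no effect if the value is already present
                i++;
                j++;
            }
        }
        return result;
    }
}
```

With the question's example, intersection(new int[]{3,3,4,6,7}, new int[]{3,3,3,6,7}) yields [3, 6, 7]. Returning a Set also documents in the type that the result has no duplicates.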

Question 3

Since the answers use Lists instead of arrays as input, I'll ask directly: is there any special reason you are using arrays? It's very unusual to use arrays in Java, because Lists are much more flexible. Generally, unless you need to optimize heavily for speed, you should reconsider using arrays at all.
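If the data really does arrive as int[], here is a minimal sketch (Java 8+, hypothetical helper name toList) of boxing it into a List<Integer> so the List-based answers below can be reused. Note that Arrays.asList(intArray) does not work for this: given a primitive array, it produces a List<int[]> containing a single element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ArrayToList {
    // Box an int[] into a List<Integer> so that List/Set-based
    // helpers can be reused when the data arrives as a primitive array.
    public static List<Integer> toList(int[] values) {
        return Arrays.stream(values)   // IntStream over the array
                .boxed()               // Stream<Integer>
                .collect(Collectors.toList());
    }
}
```

For example, toList(new int[]{3, 3, 4}) gives a List<Integer> equal to [3, 3, 4], which can be passed straight to a method taking List<Integer>.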

Question 4

You mean "intersection".

Question 5

Algorithm

You're using an O(n log n) algorithm here, which can degrade to O(n^2) in difficult cases, because contains() on an ArrayList is O(n) and is called inside the loop. Instead, use the property of HashSet: a lookup takes O(1) expected time, which means you can achieve this in O(n) time. My algorithm below simply keeps track of what it can still add to the result: I can add everything that exists in the first list, but I should not add an item I have already added.

Here is my tested implementation:

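import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public static List<Integer> intersection(List<Integer> a, List<Integer> b) {
    // Elements of a that have not been added to the result yet
    Set<Integer> canAdd = new HashSet<Integer>(a);
    List<Integer> result = new ArrayList<Integer>();
    for (int n : b) {
        if (canAdd.contains(n)) {
            result.add(n);
            // we wish to add only one n
            canAdd.remove(n);
        }
    }
    return result;
}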

Comments on your code

You should return a List, not an ArrayList, since the concrete class is an implementation detail. Using an ArrayList is OK here because appending to it has O(1) amortized complexity. Otherwise, the list should be a LinkedList (when adding to the beginning rather than the end, as @Landei points out).
Be careful about the names you choose. findIt is not descriptive, whereas removeDuplicates or intersection describe what the code does.
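Applying these comments to the question's two-pointer code gives something like the following sketch (not the answerer's code): the method is named intersection, returns a List, and replaces result.contains() with a comparison against the last added element. Because the arrays are sorted, matches arrive in ascending order, so a repeat can only equal the last element; that O(1) check keeps the loop linear after the O(n log n) sorts. Sorting clones rather than the caller's arrays is my own assumption, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPointerIntersection {
    // The question's sorted two-pointer walk with a descriptive name,
    // a List return type, and no O(n) result.contains() call.
    public static List<Integer> intersection(int[] a, int[] b) {
        int[] x = a.clone(); // sort copies rather than the caller's arrays
        int[] y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        List<Integer> result = new ArrayList<Integer>();
        int i = 0;
        int j = 0;
        while (i < x.length && j < y.length) {
            if (x[i] < y[j]) {
                i++;
            } else if (x[i] > y[j]) {
                j++;
            } else {
                // Matches arrive in ascending order, so a repeat can only
                // equal the last element added (compared as int values).
                if (result.isEmpty() || result.get(result.size() - 1) != x[i]) {
                    result.add(x[i]);
                }
                i++;
                j++;
            }
        }
        return result;
    }
}
```

Comparing result.get(...) with the int x[i] unboxes the Integer, so the check compares values, not references, even outside the small-integer cache.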

Question 6

I agree with everything except the LinkedList comment. Benchmarks show that ArrayList is almost always preferable. See e.g. javaspecialists.eu/archive/Issue111.html

Question 7

You use a reference named first, but it isn't defined (your code won't compile). If it is meant to be a or b, you have a nasty side effect.

Question 8

@Landei Oops, I meant to use addFirst. Since it's not always available in the List interface, I'll switch back to ArrayList. @X-Zero Oops, that's canAdd, I forgot to change that one.

Question 9

@seand Haha, so this is why I couldn't understand what an "interception" was.

Question 10

@Cygal: Assuming that HashSet.contains() really is faster than List.contains(), this could be the ultimate code :-)

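import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public static List<Integer> intersection(List<Integer> a, List<Integer> b) {
    Set<Integer> aSet = new HashSet<Integer>(a);
    Set<Integer> bSet = new HashSet<Integer>(b);
    // Drop every element of aSet that is not in bSet, using the iterator's own remove()
    for (Iterator<Integer> it = aSet.iterator(); it.hasNext();) {
        if (!bSet.contains(it.next())) it.remove();
    }
    return new ArrayList<Integer>(aSet);
}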

Question 11

The new HashSet trick is nice, but "the behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling remove()". Nice try!

Question 12

aSet is a local variable, so it cannot be modified from anywhere else. It should be fine :-) Thanks!

Question 13

I don't know whether this note is about concurrent access or about "removing while in a for loop". Maybe this works and is specified, but I would not do this in production code unless I were sure. (That's forbidden in C++, by the way.)

Question 14

In Java, the only safe way to remove an element from a collection while iterating over it is the Iterator's own remove() method. The Javadoc note applies only when the collection is modified by other means, such as concurrent access. Trust me, it works well ;-)
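To illustrate the point being debated, here is a sketch of both forms: the explicit loop that calls Iterator.remove(), and Java 8's Collection.removeIf, whose default implementation performs the same iterate-and-remove internally. Using a LinkedHashSet for the first set is my substitution so the result follows the order of a; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class IteratorRemoval {
    // Explicit form: Iterator.remove() is the supported way to delete
    // the current element while iterating over a collection.
    public static List<Integer> withIterator(List<Integer> a, List<Integer> b) {
        Set<Integer> aSet = new LinkedHashSet<Integer>(a); // keeps a's order
        Set<Integer> bSet = new HashSet<Integer>(b);
        for (Iterator<Integer> it = aSet.iterator(); it.hasNext();) {
            if (!bSet.contains(it.next())) {
                it.remove();
            }
        }
        return new ArrayList<Integer>(aSet);
    }

    // Java 8+ shorthand: removeIf does the iterate-and-remove for us.
    public static List<Integer> withRemoveIf(List<Integer> a, List<Integer> b) {
        Set<Integer> aSet = new LinkedHashSet<Integer>(a);
        Set<Integer> bSet = new HashSet<Integer>(b);
        aSet.removeIf(x -> !bSet.contains(x));
        return new ArrayList<Integer>(aSet);
    }
}
```

aSet.retainAll(bSet) would have the same effect as either method. What the Javadoc warns about is calling aSet.remove(...) directly inside the loop; HashSet's iterators are fail-fast and, on a best-effort basis, throw ConcurrentModificationException in that case.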

Question 15

What about this? It's clean.

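import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public static List<Integer> intersection(List<Integer> a, List<Integer> b) {
    List<Integer> result = new ArrayList<Integer>();
    for (int v : a) {
        // keep v if it is in b and has not been added yet
        if (b.contains(v) && !result.contains(v)) {
            result.add(v);
        }
    }
    // sort if you need
    Collections.sort(result);
    return result;
}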

Question 16

It's simply a performance issue: List.contains() takes much longer than HashSet.contains(), O(n) versus O(1) expected. Otherwise, it's nice and easy to read; here's an upvote. :)

Question 17

Side note: you should write for (Integer v : a) instead of for (int v : a), because otherwise each element is unnecessarily unboxed and then boxed again when it is passed to contains() and add().
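Combining the two remarks above, here is a sketch that keeps the readable loop but moves both membership tests to HashSets and iterates with Integer. The class name and the variables inB and added are illustrative. Unlike the original, it does not sort; append Collections.sort(result) if sorted output is needed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReadableIntersection {
    // Same loop as the "clean" version, but both membership tests hit a
    // HashSet (expected O(1)) instead of a List (O(n)), and the loop
    // variable is an Integer, so elements are not unboxed and reboxed.
    public static List<Integer> intersection(List<Integer> a, List<Integer> b) {
        Set<Integer> inB = new HashSet<Integer>(b);
        Set<Integer> added = new HashSet<Integer>();
        List<Integer> result = new ArrayList<Integer>();
        for (Integer v : a) {
            // added.add(v) returns false if v was already added
            if (inB.contains(v) && added.add(v)) {
                result.add(v);
            }
        }
        return result;
    }
}
```

The result keeps the order of first appearance in a, e.g. [5, 1, 5, 2] and [2, 5] give [5, 2].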

Question 18

You can use the CollectionUtils class from Apache Commons Collections, which has an intersection method. Here is a utility for your case:

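import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import org.apache.commons.collections.CollectionUtils;

// T[] needs an object array such as Integer[]; an int[] will not match.
// CollectionUtils.intersection keeps each element min(count in one, count in two)
// times, so [3,3,4,6,7] and [3,3,3,6,7] give two 3s, not [3,6,7].
@SuppressWarnings("unchecked") // commons-collections 3.x returns a raw Collection
public <T> Collection<T> intersection(T[] one, T[] two) {
    List<T> listOne = Arrays.asList(one);
    List<T> listTwo = Arrays.asList(two);
    return (Collection<T>) CollectionUtils.intersection(listOne, listTwo);
}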
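For comparison, here is a standard-library sketch (no Commons dependency; the class name GenericIntersection is hypothetical) that follows the question's definition rather than CollectionUtils' min-cardinality semantics: a LinkedHashSet built from one, reduced with retainAll against a HashSet of two. As with any T[] signature, it needs object arrays such as Integer[], not int[].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class GenericIntersection {
    // Each common element appears once, in the order of its
    // first occurrence in `one`.
    public static <T> List<T> intersection(T[] one, T[] two) {
        Set<T> common = new LinkedHashSet<T>(Arrays.asList(one));
        // retainAll keeps only elements also contained in the argument;
        // passing a HashSet keeps each lookup at expected O(1).
        common.retainAll(new HashSet<T>(Arrays.asList(two)));
        return new ArrayList<T>(common);
    }
}
```

With Integer[] inputs {3,3,4,6,7} and {3,3,3,6,7} this returns [3, 6, 7].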

I hope my answer will be useful to you.

The accepted answer (Question 5 above) is by Quentin Pradet, posted 2012-03-12 10:53:32Z.
