I would like any advice on how to improve this code. To clarify what this code considers an intersection to be, the intersection of [3,3,4,6,7] and [3,3,3,6,7] is [3,6,7]. I would appreciate any improvements to make the code more readable or perform faster.
public ArrayList<Integer> findIt (int[] a, int[] b){
ArrayList<Integer> result = new ArrayList<Integer>();
Arrays.sort(a);
Arrays.sort(b);
int i = 0;
int j = 0;
while(i < a.length && j < b.length){
if(a[i]<b[j])
++i;
else if (a[i] > b[j])
++j;
else{
if (!result.contains(a[i]))
result.add(a[i]);
++i;
++j;
}
}
return result;
}
4 Answers 4
Algorithm
You're using a O(n log(n)) algorithm here, which could be O(n2) for difficult cases since contains()
is O(n) and is called in a loop. Instead, use the property of HashSet
: access is O(1), which means you can achieve this in O(n) time. My algorithm below simply keeps track of what it can add to the result. I can add everything that exists in the first list, but I should not add an item I already added.
See my tested code for my implementation:
public static List<Integer> intersection(List<Integer> a, List<Integer> b){
Set<Integer> canAdd = new HashSet<Integer>(a);
List<Integer> result = new ArrayList<Integer>();
for (int n: b) {
if(first.contains(n)) {
result.add(n);
// we wish to add only one n
canAdd.remove(n);
}
}
return result;
}
Comments on your code
- You should return a
List
, not anArrayList
since it's an implementation detail. Using anArrayList
is OK here since adding at the end of it has O(1) amortized complexity. Otherwise, lists should beLinkedList
(when adding to the beginning, not to the end, as @Landei points out). - Be careful about the names you choose.
findIt
doesn't seem to be appropriate, butremoveDuplicates
orintersection
describes what the code does.
-
1\$\begingroup\$ I agree with everything except with the
LinkedList
comment. Benchmarks show thatArrayList
is almost always preferable. See e.g. javaspecialists.eu/archive/Issue111.html \$\endgroup\$Landei– Landei2012年03月12日 15:18:18 +00:00Commented Mar 12, 2012 at 15:18 -
\$\begingroup\$ You use a reference
first
, but it isn't defined (your code won't compile). If it'sa
orb
, you have a nasty side-effect. \$\endgroup\$Clockwork-Muse– Clockwork-Muse2012年03月12日 16:07:32 +00:00Commented Mar 12, 2012 at 16:07 -
\$\begingroup\$ @Landei Oops, I meant to use addFirst. Since it's not always available in the
List
interface, I'll switch back to ArrayList. @X-Zero Opps, that'scanAdd
, I forgot to change that one. \$\endgroup\$Quentin Pradet– Quentin Pradet2012年03月12日 16:53:22 +00:00Commented Mar 12, 2012 at 16:53 -
\$\begingroup\$ @seand Haha, so this is why I couldn't understand what an "interception" was. \$\endgroup\$Quentin Pradet– Quentin Pradet2012年03月13日 17:25:25 +00:00Commented Mar 13, 2012 at 17:25
@Cygal : Assuming that HashSet.contains() is really faster than the List one, this could be the ultimate code :-)
public static List<Integer> intersection(List<Integer> a, List<Integer> b) {
Set<Integer> aSet = new HashSet<Integer>(a);
Set<Integer> bSet = new HashSet<Integer>(b);
for(Iterator<Integer> it = aSet.iterator(); it.hasNext();) {
if(!bSet.contains(it.next())) it.remove();
}
return new ArrayList<Integer>(aSet);
}
-
\$\begingroup\$ The
new HashSet
trick is nice, but The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by callingremove()
.. Bien tenté! \$\endgroup\$Quentin Pradet– Quentin Pradet2012年03月13日 13:03:07 +00:00Commented Mar 13, 2012 at 13:03 -
\$\begingroup\$ 'aSet' is a local variable so it cannot be modified from anywhere else. It should be good :-) Merci ! \$\endgroup\$Xavier– Xavier2012年03月13日 13:15:29 +00:00Commented Mar 13, 2012 at 13:15
-
\$\begingroup\$ I don't know if this is about concurrent accesss or "removing while doing a for loop". Maybe this works and is specified, but I would not do this in production code unless I was sure. (That's forbidden in C++, by the way.) \$\endgroup\$Quentin Pradet– Quentin Pradet2012年03月13日 13:22:01 +00:00Commented Mar 13, 2012 at 13:22
-
1\$\begingroup\$ In Java, the only way to remove an Object from a collection is to use an Iterator. This note is applicable only if there is a concurrent access. Trust me, it works well ;-) \$\endgroup\$Xavier– Xavier2012年03月13日 13:30:07 +00:00Commented Mar 13, 2012 at 13:30
What about ? It's clean.
public static List<Integer> intersection(List<Integer> a, List<Integer> b){
List<Integer> result = new ArrayList<Integer>();
for(int v : a) {
if(b.contains(v) && !result.contains(v)) {
result.add(v);
}
}
// sort if you need
Collections.sort(result);
return result;
}
-
\$\begingroup\$ It's simply a performance issue:
List.contains
takes a lot of time compared toHashSet.contains()
: O(1) vs O(n). Otherwise, it's nice and easy to read, here's an upvote. :) \$\endgroup\$Quentin Pradet– Quentin Pradet2012年03月13日 10:05:06 +00:00Commented Mar 13, 2012 at 10:05 -
\$\begingroup\$ Side note: you should use
for (Integer
instead offor (int
because you're unnecessarily boxing and unboxing as a result. \$\endgroup\$Adam Rofer– Adam Rofer2012年03月15日 06:19:00 +00:00Commented Mar 15, 2012 at 6:19
You can use CollectionUtils
utility in Apache Commons Collection. There is intersection
method in this class. I will write my utility for your case in following:
public <T> Collection<T> intersection(T[] one, T[] two)
{
List<T> list_one = Arrays.asList(one);
List<T> list_two = Arrays.asList(two);
List<T> intersection = (List<T>) org.apache.commons.collections
.CollectionUtils.intersection(list_one, list_two);
return intersection;
}
I hope my answer will useful for you.
List
s instead of arrays as input, I'll ask directly: Any special reason you are using arrays? It's very unusual to use arrays in Java, becauseList
s are much more flexible. Generally, unless you need to optimize extremely for speed, you should reconsider using arrays at all. \$\endgroup\$