I implemented a method in Java that returns an array of integers with no duplicates (i.e. an array that has no repeated numbers).
My solution seems rather long. I would like to know of ways to improve it...
public class IntArrayProcessor {
private int[] a;
public IntArrayProcessor(int[] a) {
this.a = a;
}
/**
*
* @return Array with no repeated integers.
*/
public int[] getSet() {
/* creates an array with the same entries and length as this.a */
int[] duplicateA = new int[this.a.length];
/* stores the number of repeated entries in array this.a */
int numberOfDuplicates = 0;
/* is the integer a duplicate or not? */
boolean isDuplicate;
/**
* Counts the number of duplicates in array this.a
*/
for (int i = 0; i < this.a.length; i++) {
duplicateA[i] = this.a[i];
}
for (int i = 0; i < duplicateA.length; i++) {
isDuplicate = false;
for (int j = i + 1; j < this.a.length; j++) {
if (duplicateA[i] == this.a[j]) {
isDuplicate = true;
}
}
if (isDuplicate) {
numberOfDuplicates++;
}
}
/*
* the noDuplicate array has the lenght of the this.a array minus the
* number of repeated entries
*/
int[] noDuplicate = new int[this.a.length - numberOfDuplicates];
/* to keep track of the noDuplicate indexes */
numberOfDuplicates = 0;
/**
* An array with no repeated numbers
*/
for (int i = 0; i < duplicateA.length; i++) {
isDuplicate = false;
for (int j = i + 1; j < this.a.length; j++) {
if (duplicateA[i] == this.a[j]) {
isDuplicate = true;
}
}
if (!(isDuplicate)) {
noDuplicate[numberOfDuplicates] = duplicateA[i];
numberOfDuplicates++;
}
}
return noDuplicate;
}
}
4 Answers 4
Wouldn't it be easier to just use:
public static class IntArrayProcessor {
private Integer[] arr;
public IntArrayProcessor(Integer[] arr) {
this.arr = arr;
}
public Integer[] unique() {
final Set<Integer> uniqueSet = new HashSet<Integer>(Arrays.asList(this.arr));
return uniqueSet.toArray(new Integer[uniqueSet.size()]);
}
}
I changed the method name from getSet()
to the more descriptive unique()
.
This simply generates a hash set, which contains only the unique values of the array. The hash set (uniqueSet
) is then converted from Set
to Integer[]
, and returned.
Given that this is using built-in methods and objects, it would likely be faster than most custom implementations.
Notice that HashSet
:
makes no guarantees as to the insertion order of the set; in particular it does not guarantee that the order will remain constant over time.
If you need to keep insertion order, use LinkedHashSet
as suggested in @mrblewog's comment. If you want to have the internal Set
sorted, use SortedSet
or TreeSet
.
I also changed int[]
to Integer[]
so that it can be used with generics (generics do not support primitive types).
-
\$\begingroup\$ A
LinkedHashSet
would preserve insertion order, which would in effect take[1,2,7,3,2,8,3,2]
to[1,2,7,3,8]
\$\endgroup\$Rhys– Rhys2018年07月17日 14:49:51 +00:00Commented Jul 17, 2018 at 14:49 -
\$\begingroup\$ @mrblewog yes that could be an alternative. However, it will be slower because
LinkedHashSet
has to maintain aLinkedList
internally to keep track of insertion order. \$\endgroup\$esote– esote2018年07月17日 15:11:14 +00:00Commented Jul 17, 2018 at 15:11 -
\$\begingroup\$ Yep - a tradeoff for reviewers and the OP ;-) \$\endgroup\$Rhys– Rhys2018年07月17日 15:14:10 +00:00Commented Jul 17, 2018 at 15:14
-
\$\begingroup\$ @esote Typo in the last paragraph: "Notice that it returns an unsorted array" \$\endgroup\$Aaron Digulla– Aaron Digulla2018年07月23日 14:09:38 +00:00Commented Jul 23, 2018 at 14:09
-
\$\begingroup\$ @AaronDigulla Thanks, it slipped from my mind that
HashSet
usesSet
internally, notSortedSet
. I've updated my answer with details on this. \$\endgroup\$esote– esote2018年07月23日 19:17:24 +00:00Commented Jul 23, 2018 at 19:17
Using the distinct()
method of the stream()
class you can maintain the order and simplify your code:
public static Integer[] makeUnique(Integer[] arr){
return Arrays.stream(arr)
.distinct()
.toArray(Integer[]::new);
}
If you're not ready to learn about streams and their methods, the distinct()
method is fairly simple to emulate and still keep your code simple:
public static Integer[] makeUnique(Integer[] arr){
Map<Integer,Integer> tempMap = new HashMap<>();
for(int i = 0; i < arr.length;i++)
{
if(tempMap.containsValue(arr[i])){
continue;
}
tempMap.put(i,arr[i]);
}
return tempMap.values().toArray(new Integer[0]);
}
For sure, there are easier solutions.
On the other hand, it seems you want to implement the algorithm from scratch ... I try to bring these together.
Therefore, as a reviewer, I'd ask to think about the iterations (times) you need to look at the elements:
You obviously need to compare each element to every other element. Therefore the double nested loop is in the first place justified. Maybe you can memoize the results of some comparisons already made by using a suitable data structure, e.g. a tree, map or alike.
But why twice? Maybe think into the direction of reusing variables for other purposes. That can be discovered in your code multiple times. usually that is a source of problems. Give the variables one single distinct purpose and think of how you can rewrite it then.
To be more precise, with these regards:
- the
isDuplicate
is reused as an attribute for each next array element in a new iteration. You lose it in a new iteration and have to recalculate. Typically one stores the values then. - the
numberOfDuplicates
is reused as array index when building up the result. While not hinting to a better algorithm, the style/readability is a bit affected here and points towards the general rule of thumb.
-
1\$\begingroup\$ My statement was rather targeting the amount of comparisons that are necessary. But yes, tree-like structures save you a bunch of comparisons already made. \$\endgroup\$Andreas– Andreas2018年07月19日 05:38:30 +00:00Commented Jul 19, 2018 at 5:38
This is the code review stack, so I've going to focus on that aspect and you've received some good suggestions on the implementation approach.
1) Name things for the problem or business domain, not the programming domain, so instead of using IntArrayProcessor for the class name, use a class name that encapsulates its Single responsibility. This also applies to naming functions, so instead of getSet
use something that makes the purpose of the function clear; e.g deduplicate
or getUniqueFoo
.
2) Use abstractions over primitive types, Java has excellent flexible collections, which the others have shown. Using them.
3) Do not reinvent the wheel, another reason to use Java collections, learn what libraries are available for the platform you are using.
4) Every time you use a comment, think to yourself 'why?', what made it necessary to explain this snippet of code with a comment. Can I modify the code, rename the variable to make the function of this code clear without the need for a comment. I think nearly all your comments could be removed this way as redundant. Comments should be a last resort not the first way to explain semantics of the code.
5) Decompose each function, it has four loops, extract these to separate functions which improve clarity, this will facilitate reuse and enable testing with TDD.
6) Use Test Driven Development, it really will make learning to code easier, it provides continuous feedback and once you start writing live code it will make you far more productive.