A method that returns an array with no duplicates

Question 1

I implemented a method in Java that returns an array of integers with no duplicates (i.e. an array that has no repeated numbers).

My solution seems rather long. I would like to know of ways to improve it...

public class IntArrayProcessor {
 private int[] a;
 public IntArrayProcessor(int[] a) {
 this.a = a;
 }
 /**
 * 
 * @return Array with no repeated integers.
 */
 public int[] getSet() {
 /* creates an array with the same entries and length as this.a */
 int[] duplicateA = new int[this.a.length];
 /* stores the number of repeated entries in array this.a */
 int numberOfDuplicates = 0;
 /* is the integer a duplicate or not? */
 boolean isDuplicate;
 /**
 * Counts the number of duplicates in array this.a
 */
 for (int i = 0; i < this.a.length; i++) {
 duplicateA[i] = this.a[i];
 }
 for (int i = 0; i < duplicateA.length; i++) {
 isDuplicate = false;
 for (int j = i + 1; j < this.a.length; j++) {
 if (duplicateA[i] == this.a[j]) {
 isDuplicate = true;
 }
 }
 if (isDuplicate) {
 numberOfDuplicates++;
 }
 }
 /*
 * the noDuplicate array has the lenght of the this.a array minus the
 * number of repeated entries
 */
 int[] noDuplicate = new int[this.a.length - numberOfDuplicates];
 /* to keep track of the noDuplicate indexes */
 numberOfDuplicates = 0;
 /**
 * An array with no repeated numbers
 */
 for (int i = 0; i < duplicateA.length; i++) {
 isDuplicate = false;
 for (int j = i + 1; j < this.a.length; j++) {
 if (duplicateA[i] == this.a[j]) {
 isDuplicate = true;
 }
 }
 if (!(isDuplicate)) {
 noDuplicate[numberOfDuplicates] = duplicateA[i];
 numberOfDuplicates++;
 }
 }
 return noDuplicate;
 }
}

Question 2

Wouldn't it be easier to just use:

public static class IntArrayProcessor {
 private Integer[] arr;
 public IntArrayProcessor(Integer[] arr) {
 this.arr = arr;
 }
 public Integer[] unique() {
 final Set<Integer> uniqueSet = new HashSet<Integer>(Arrays.asList(this.arr));
 return uniqueSet.toArray(new Integer[uniqueSet.size()]);
 }
}

I changed the method name from getSet() to the more descriptive unique().

This simply generates a hash set, which contains only the unique values of the array. The hash set (uniqueSet) is then converted from Set to Integer[], and returned.

Given that this is using built-in methods and objects, it would likely be faster than most custom implementations.

Notice that HashSet:

makes no guarantees as to the insertion order of the set; in particular it does not guarantee that the order will remain constant over time.

— HashSet documentation

If you need to keep insertion order, use LinkedHashSet as suggested in @mrblewog's comment. If you want to have the internal Set sorted, use SortedSet or TreeSet.

I also changed int[] to Integer[] so that it can be used with generics (generics do not support primitive types).

Question 3

A LinkedHashSet would preserve insertion order, which would in effect take [1,2,7,3,2,8,3,2] to [1,2,7,3,8]

Question 4

@mrblewog yes that could be an alternative. However, it will be slower because LinkedHashSet has to maintain a LinkedList internally to keep track of insertion order.

Question 5

Yep - a tradeoff for reviewers and the OP ;-)

Question 6

@esote Typo in the last paragraph: "Notice that it returns an unsorted array"

Question 7

@AaronDigulla Thanks, it slipped from my mind that HashSet uses Set internally, not SortedSet. I've updated my answer with details on this.

Question 8

Using the distinct() method of the stream() class you can maintain the order and simplify your code:

public static Integer[] makeUnique(Integer[] arr){
 return Arrays.stream(arr)
 .distinct()
 .toArray(Integer[]::new);
}

If you're not ready to learn about streams and their methods, the distinct() method is fairly simple to emulate and still keep your code simple:

public static Integer[] makeUnique(Integer[] arr){
 Map<Integer,Integer> tempMap = new HashMap<>();
 for(int i = 0; i < arr.length;i++)
 {
 if(tempMap.containsValue(arr[i])){
 continue;
 }
 tempMap.put(i,arr[i]);
 }
 return tempMap.values().toArray(new Integer[0]);
}

Question 9

For sure, there are easier solutions.

On the other hand, it seems you want to implement the algorithm from scratch ... I try to bring these together.

Therefore, as a reviewer, I'd ask to think about the iterations (times) you need to look at the elements:

You obviously need to compare each element to every other element. Therefore the double nested loop is in the first place justified. Maybe you can memoize the results of some comparisons already made by using a suitable data structure, e.g. a tree, map or alike.

But why twice? Maybe think into the direction of reusing variables for other purposes. That can be discovered in your code multiple times. usually that is a source of problems. Give the variables one single distinct purpose and think of how you can rewrite it then.

To be more precise, with these regards:

the isDuplicate is reused as an attribute for each next array element in a new iteration. You lose it in a new iteration and have to recalculate. Typically one stores the values then.
the numberOfDuplicates is reused as array index when building up the result. While not hinting to a better algorithm, the style/readability is a bit affected here and points towards the general rule of thumb.

Question 10

My statement was rather targeting the amount of comparisons that are necessary. But yes, tree-like structures save you a bunch of comparisons already made.

Question 11

This is the code review stack, so I've going to focus on that aspect and you've received some good suggestions on the implementation approach.

1) Name things for the problem or business domain, not the programming domain, so instead of using IntArrayProcessor for the class name, use a class name that encapsulates its Single responsibility. This also applies to naming functions, so instead of getSet use something that makes the purpose of the function clear; e.g deduplicate or getUniqueFoo.

2) Use abstractions over primitive types, Java has excellent flexible collections, which the others have shown. Using them.

3) Do not reinvent the wheel, another reason to use Java collections, learn what libraries are available for the platform you are using.

4) Every time you use a comment, think to yourself 'why?', what made it necessary to explain this snippet of code with a comment. Can I modify the code, rename the variable to make the function of this code clear without the need for a comment. I think nearly all your comments could be removed this way as redundant. Comments should be a last resort not the first way to explain semantics of the code.

5) Decompose each function, it has four loops, extract these to separate functions which improve clarity, this will facilitate reuse and enable testing with TDD.

6) Use Test Driven Development, it really will make learning to code easier, it provides continuous feedback and once you start writing live code it will make you far more productive.

esote esote 3,8002 gold badges24 silver badges44 bronze badges · Answer 1 · 2018-07-16 23:19:38Z

Wouldn't it be easier to just use:

public static class IntArrayProcessor {
 private Integer[] arr;
 public IntArrayProcessor(Integer[] arr) {
 this.arr = arr;
 }
 public Integer[] unique() {
 final Set<Integer> uniqueSet = new HashSet<Integer>(Arrays.asList(this.arr));
 return uniqueSet.toArray(new Integer[uniqueSet.size()]);
 }
}

I changed the method name from getSet() to the more descriptive unique().

This simply generates a hash set, which contains only the unique values of the array. The hash set (uniqueSet) is then converted from Set to Integer[], and returned.

Given that this is using built-in methods and objects, it would likely be faster than most custom implementations.

Notice that HashSet:

makes no guarantees as to the insertion order of the set; in particular it does not guarantee that the order will remain constant over time.

— HashSet documentation

If you need to keep insertion order, use LinkedHashSet as suggested in @mrblewog's comment. If you want to have the internal Set sorted, use SortedSet or TreeSet.

I also changed int[] to Integer[] so that it can be used with generics (generics do not support primitive types).

A LinkedHashSet would preserve insertion order, which would in effect take [1,2,7,3,2,8,3,2] to [1,2,7,3,8]
@mrblewog yes that could be an alternative. However, it will be slower because LinkedHashSet has to maintain a LinkedList internally to keep track of insertion order.
@esote Typo in the last paragraph: "Notice that it returns an unsorted array"
@AaronDigulla Thanks, it slipped from my mind that HashSet uses Set internally, not SortedSet. I've updated my answer with details on this.

user33306user33306 · Answer 2 · 2018-07-17 04:11:01Z

Using the distinct() method of the stream() class you can maintain the order and simplify your code:

public static Integer[] makeUnique(Integer[] arr){
 return Arrays.stream(arr)
 .distinct()
 .toArray(Integer[]::new);
}

If you're not ready to learn about streams and their methods, the distinct() method is fairly simple to emulate and still keep your code simple:

public static Integer[] makeUnique(Integer[] arr){
 Map<Integer,Integer> tempMap = new HashMap<>();
 for(int i = 0; i < arr.length;i++)
 {
 if(tempMap.containsValue(arr[i])){
 continue;
 }
 tempMap.put(i,arr[i]);
 }
 return tempMap.values().toArray(new Integer[0]);
}

Andreas Andreas 1114 bronze badges · Answer 3 · 2018-07-17 12:51:46Z

For sure, there are easier solutions.

On the other hand, it seems you want to implement the algorithm from scratch ... I try to bring these together.

Therefore, as a reviewer, I'd ask to think about the iterations (times) you need to look at the elements:

You obviously need to compare each element to every other element. Therefore the double nested loop is in the first place justified. Maybe you can memoize the results of some comparisons already made by using a suitable data structure, e.g. a tree, map or alike.

But why twice? Maybe think into the direction of reusing variables for other purposes. That can be discovered in your code multiple times. usually that is a source of problems. Give the variables one single distinct purpose and think of how you can rewrite it then.

To be more precise, with these regards:

the isDuplicate is reused as an attribute for each next array element in a new iteration. You lose it in a new iteration and have to recalculate. Typically one stores the values then.
the numberOfDuplicates is reused as array index when building up the result. While not hinting to a better algorithm, the style/readability is a bit affected here and points towards the general rule of thumb.

My statement was rather targeting the amount of comparisons that are necessary. But yes, tree-like structures save you a bunch of comparisons already made.

Martin Spamer Martin Spamer 5251 gold badge4 silver badges13 bronze badges · Answer 4 · 2018-07-17 15:40:42Z

This is the code review stack, so I've going to focus on that aspect and you've received some good suggestions on the implementation approach.

1) Name things for the problem or business domain, not the programming domain, so instead of using IntArrayProcessor for the class name, use a class name that encapsulates its Single responsibility. This also applies to naming functions, so instead of getSet use something that makes the purpose of the function clear; e.g deduplicate or getUniqueFoo.

2) Use abstractions over primitive types, Java has excellent flexible collections, which the others have shown. Using them.

3) Do not reinvent the wheel, another reason to use Java collections, learn what libraries are available for the platform you are using.

4) Every time you use a comment, think to yourself 'why?', what made it necessary to explain this snippet of code with a comment. Can I modify the code, rename the variable to make the function of this code clear without the need for a comment. I think nearly all your comments could be removed this way as redundant. Comments should be a last resort not the first way to explain semantics of the code.

5) Decompose each function, it has four loops, extract these to separate functions which improve clarity, this will facilitate reuse and enable testing with TDD.

6) Use Test Driven Development, it really will make learning to code easier, it provides continuous feedback and once you start writing live code it will make you far more productive.

Stack Exchange Network

A method that returns an array with no duplicates

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

A method that returns an array with no duplicates

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions