Count the number of unique elements in a sorted array

Question 1

Write a method named numUnique that accepts a sorted array of integers as a parameter and that returns the number of unique values in the array. The array is guaranteed to be in sorted order, which means that duplicates will be grouped together.

I'm mainly concerned with following proper conventions and maximizing readability; both of which I feel I know little about. In the last question I posted, I learned a bit about structuring if-statements. How can I think about these statements so that it's easier for me to write them, as well as easier for others to read them? Is there a sort of PEMDAS of programming logic that would help me understand writing conditional code with precedence? Any feedback is appreciated!

public static int numUnique(int[] array){
 if(array.length > 0){
 int countUnique = 1;
 for(int i = 0; i < array.length - 1; i++){
 if(array[i] != array[i+1]){
 countUnique++;
 }
 }
 return countUnique;
 } else {
 return 0;
 }
}

Question 2

Possibly a misunderstanding but this seems like counting discrete values rather than counting uniques. e.g. for an array [3, 2, 3, 2] I'd expect it to return 0. Could be a naming issue rather than a misunderstanding of the requirements.

Question 3

@Legato : probably, given that [3, 2, 3, 2] doesn't look a sorted array. Naming is important, and I find it non-trivial here, not liking countUnique for almost implying how to, and numberOfDistinctValues() looks an alphabetic procession.

Question 4

Just a few minor suggestions:

If you invert the condition on the length, then you can reduce the level of nesting, which is slightly easier to read
An array will never have a negative length. It's pure paranoia to check for array.length <= 0, I suggest to replace with array.length == 0
The spacing is not perfect. Use an IDE to reformat the code
This maybe a matter of taste, but I would rename countUnique to simply count, and the method from numUnique to countUnique

Putting it together:

public static int countUnique(int[] array) {
 if (array.length == 0) {
 return 0;
 }
 int count = 1;
 for (int i = 0; i < array.length - 1; i++) {
 if (array[i] != array[i + 1]) {
 count++;
 }
 }
 return count;
}

Question 5

[rename] method from numUnique to countUnique. How about countUniqueMembers or countUniqueElements

Question 6

We could do a little bit better. It depends on what array ends up looking like, but what if we had like 100 0s and then 500 1s. Do we really need to compare each one individually?

No, we don't. We can binary search the next element.

private static int upper_bound(int[] array, int lo, int hi, int val) {
 int count = hi - lo;
 while (count > 0) {
 int step = count/2;
 int cur = lo + step;
 if (val >= array[cur]) {
 lo = cur + 1;
 count -= step + 1;
 }
 else {
 count = step;
 }
 }
 return lo;
}

And then use that to increment:

public static int countUnique(int[] array) {
 int count = 0;
 int idx = 0;
 while (idx < array.length) {
 ++count;
 // exponential search to find the next elem
 int bound = 1;
 while (idx+bound < array.length && array[idx+bound] == array[idx]) {
 bound *= 2;
 }
 idx = upper_bound(array,
 idx+bound/2,
 Math.min(idx+bound, array.length),
 array[idx]
 );
 }
 return count;
}

Question 7

@JS1 The exponential search bound (just added) should keep it at \$O(n)\$ worst case, though still worse than simply walking the list.

Question 8

Add this condition though :

if(idx < array.length-1 && array[idx]!=array[idx+1] || idx == array.length-1) { System.out.println(array[idx]); count++; }

Question 9

Expanding @janos's answer a bit:

Instead of:
```
for (int i = 0; i < array.length - 1; i++) {
 if (array[i] != array[i + 1]) {
```
I would usually do:
```
for (int i = 1; i < array.length; i++) {
 if (array[i] != array[i - 1]) {
 // ...
```
Because, in my opinion, is easier to understand. I like this because I don't like the - sign in the condition. Of course, this is a matter of preference.
Since you use array.length twice, I would make it a variable, but, your choice.
I don't really like edge-case checks. I prefer the algorithm to be able to handle all cases, but in this case, it looks impossible. You can do:
```
public static int countUnique(int[] array) {
 int length = array.length;
 int count = 1;
 for (int i = 1; i < length; i++) {
 if (array[i] != array[i - 1]) {
 count++;
 }
 }
 return length == 0 ? 0 : count;
}
```
But the ternary is like edge-case handling anyways. I don't think that's a good solution.

Result:

public static int countUnique(int[] array) {
 int length = array.length;
 if (length == 0) {
 return 0;
 }
 int count = 1;
 for (int i = 1; i < length; i++) {
 if (array[i] != array[i - 1]) {
 count++;
 }
 }
 return count;
}

Question 10

Building upon @Barry's answer, Java already has Arrays.binarySearch(int[], int, int, int) that can perform the binary search without having to write additional code ourselves. :)

Using a very similar approach of modeling the next index \$m\$ as where the next larger value (\$V\$) of the current value (\$arr[n]\$) should be inserted, we have the following BiFunction<int[], Integer, Integer> implementation:

private static final BiFunction<int[], Integer, Integer> STEPPER = (sorted, i) -> {
 int r = Arrays.binarySearch(sorted, i, sorted.length, sorted[i] + 1);
 return r < 0 ? Math.abs(r + 1) : r;
 };

We need to handle for r < 0 as \$V\$ may not exist in the array, in which case Arrays.binarySearch() will return (-(insertion point) - 1). For pre-Java 8, this can easily be converted into a method as well.

The full method can then be:

private static int countUnique(int[] sortedArray) {
 if (sortedArray.length == 0) {
 return 0;
 }
 int counter = 1;
 for (int i = STEPPER.apply(sortedArray, 0); i < sortedArray.length;
 i = STEPPER.apply(sortedArray, i)) {
 counter++;
 }
 return counter;
}

After taking care of handling an empty array, we initialize counter as 1 (for the first element) and i to be the index of the next larger value than the first element. After each loop, i is then assigned the index of the next larger value than the current iteration's element.

Visualizing the steps taken can be done with a slight modification to STEPPER:

private static final BiFunction<int[], Integer, Integer> STEPPER = (sorted, i) -> {
 int r = Arrays.binarySearch(sorted, i, sorted.length, sorted[i] + 1);
 r = r < 0 ? Math.abs(r + 1) : r;
 System.out.printf("DEBUG: current (index = %d, value = %d); "
 + "next (index = %d, value = %s)%n",
 i, sorted[i], r, 
 r >= sorted.length ? "END" : sorted[r]);
 return r;
 };

Given this example:

countUnique(new int[]{0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 4, 5, 5, 5, 5, 5});
// Output
DEBUG: current (index = 0, value = 0); next (index = 4, value = 1)
DEBUG: current (index = 4, value = 1); next (index = 7, value = 2)
DEBUG: current (index = 7, value = 2); next (index = 11, value = 4)
DEBUG: current (index = 11, value = 4); next (index = 14, value = 5)
DEBUG: current (index = 14, value = 5); next (index = 19, value = END)
Input(19): [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 4, 5, 5, 5, 5, 5]

The answer is the number of DEBUG lines printed, i.e. 5. It works by jumping across any indices of the numbers 0 -> 1 -> 2 -> 4 -> 5. The any index part is a known effect from the method's Javadoc:

If the range contains multiple elements with the specified value, there is no guarantee which one will be found.

This still suffices for our usage here as we only need to hit any of the next larger number. Also, since 3 is not present in the list, 4 is returned after 2, and this is where the check for r < 0 is required to convert the result of Arrays.binarySearch() to a positive, usable index.

Question 11

Nope, the r < 0 handling takes care of that by turning (-(insertion point) - 1) to insertion point by Math.abs(r + 1).

Question 12

Ah, I misunderstood what the error case meant. I'm out of cotes today already, here's a phantom +1

Question 13

I am really impressed by efficient algorithms. But for me it is really difficult to identifiy what this is doing and why. For example I am missing an expression like "array[0] == array[1]" to give a hint whats this algorithm all about. If I really need an efficient solution then I would really use your code. But I would put it under test, trust the name of the function and don't care about how it comes to the solution.

Question 14

It's using binary search to find what is the next largest number from the current element, let me update my answer...

Question 15

Upvote comment for adding explanation. But I still have an issue with readability.

Question 16

Solution 1

Maybe for readability a recursive function is a proper solution in this case. But think about it carefully because it will strain the stack.

public static int getUniqueValueCount(int[] array) {
 if (array.length == 0) {
 return 0;
 } else if (array.length == 1) {
 return 1;
 } else {
 return getUniqueValueCount(Arrays.copyOfRange(array, 1, array.length)) 
 + ((array[0] == array[1]) ? 0 : 1);
 }
}

Pros

getUniqueValueCount instead of numUnique --> more expressive name
no for-int-i iteration --> less control structures
no state involved, nearly functional
positive formulation of main condition

Cons

strains the stack
additional array operations O(n^2)

Solution 2

The following solution is a summary of this discussion with some modifications to improve readability. It uses the binary search approach for efficiency and speed. Personally I am not that convinced of it because there are some specifics about Arrays.binarySearch(..) in the case of application and interpretation of the return value.

The return value can hold (as far as I understand) up to two things. In the normal case (key was found) the index of the (any?) key is returned. In the case when the binary search returns a negative value it is considered as "not found". Furthermore the value can be multiplied by (-1) to receive an insertion index. Finally you have to decrement the insertion index by 1.

 private static int getUniqueValueCount(int[] sorted) {
 int count = 0;
 for (int curentIndx = 0; (curentIndx = nextDeviationIndex(sorted, curentIndx)) != -1;) {
 count++;
 }
 return count;
 }
 private static int nextDeviationIndex(int[] sorted, int currentIndex) {
 if (sorted.length == 1 && currentIndex == 0) {
 return 1;
 } else if (currentIndex < sorted.length) {
 return binarySearchNextDeviationIndex(sorted, currentIndex, sorted[currentIndex] + 1);
 } else {
 return -1;
 }
 }
 private static int binarySearchNextDeviationIndex(int[] sorted, int startIndex, int key) {
 int searchResult = Arrays.binarySearch(sorted, startIndex, sorted.length, key);
 // this is very hard to get ...
 return searchResult < 0 ? -searchResult - 1 : searchResult;
 }

Although the solution inflates to three methods nobody can avoid the responsibilites involved to produce the result:

Counting while iterating over the interesting elements until there are no left
Decide how to find the next interesting element and provide the information if there is no element left
Provide an efficient search for the case of mass data

Solution 3 (OO)

This is an solution that shows how the responsibilities are cut in the OO. I guess this is the most verbose one for the given problem.

public static class DeviationIterator implements Iterator<Integer> {
 private int[] sorted;
 private int nextIndex;
 public DeviationIterator(int[] sorted) {
 super();
 this.sorted = sorted;
 this.nextIndex = calculateNextIndex();
 }
 @Override
 public boolean hasNext() {
 return nextIndex != -1;
 }
 @Override
 public Integer next() {
 if (nextIndex != -1) {
 int currentIndex = nextIndex;
 nextIndex = calculateNextIndex();
 return currentIndex;
 } else {
 throw new NoSuchElementException();
 }
 }
 private int calculateNextIndex() {
 if (sorted.length == 1 && nextIndex == 0) {
 return 1;
 } else if (nextIndex < sorted.length) {
 return binarySearchNextDeviationIndex(sorted, nextIndex,
 sorted[nextIndex] + 1);
 } else {
 return -1;
 }
 }
}
private static int getUniqueValueCount(int[] sorted) {
 int count = 0;
 DeviationIterator deviationIterator = new DeviationIterator(sorted);
 while(deviationIterator.hasNext()) {
 deviationIterator.next();
 count++;
 }
 return count;
}

The iterator encapsulates the search algorithm
the getUniqueValueCount uses the iterator to count

The most interesting part is you can clearly see, where the responsibilities are located. Using the iterator pattern identifies the fragment -1 as the EOF for the iterator. The user of the iterator via contract has to ask for if another element is available, so the "Solution 2". But there -1 is not translated into false so the the caller has to interpret it by himself.

Question 17

Why downvoted? Explain please ;-).

Question 18

Recursion for this problem is not a good suggestion.

Question 19

Arrays.copyOfRange() is a really bad option. You shouldn't need to copy an array for this simple operation.

Question 20

The focus of my suggestion was readablitity. And the problem with "Arrays.copyRange()" I mentioned in the main points. I provided a solution together with its limits. I would understand "not voting" for this answer... but downvoting? But after all doesn't matter.

Question 21

@OOP-Expert You're proposing an \$O(n^2)\$ solution to a problem which has a very simple \$O(n)\$ one, with potential for better. Even if you didn't copy the range (for no reason?), recursion offers no benefits here. This is a poor proposal. And it's not great for readability - you're unnecessarily duplicating the recursive call.

janos janos 113k15 gold badges154 silver badges396 bronze badges · Accepted Answer · 2015-12-15 20:07:35Z

Just a few minor suggestions:

If you invert the condition on the length, then you can reduce the level of nesting, which is slightly easier to read
An array will never have a negative length. It's pure paranoia to check for array.length <= 0, I suggest to replace with array.length == 0
The spacing is not perfect. Use an IDE to reformat the code
This maybe a matter of taste, but I would rename countUnique to simply count, and the method from numUnique to countUnique

Putting it together:

public static int countUnique(int[] array) {
 if (array.length == 0) {
 return 0;
 }
 int count = 1;
 for (int i = 0; i < array.length - 1; i++) {
 if (array[i] != array[i + 1]) {
 count++;
 }
 }
 return count;
}

[rename] method from numUnique to countUnique. How about countUniqueMembers or countUniqueElements

Stack Exchange Network

Count the number of unique elements in a sorted array

5 Answers 5

Solution 1

Solution 2

Solution 3 (OO)

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Count the number of unique elements in a sorted array

5 Answers 5

Solution 1

Solution 2

Solution 3 (OO)

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions