Find the first unique character in a string

Question 1

Implementation for a relatively basic programming challenge:

Find the first unique character in a string

So evaluating the string ABCA would return B.

My implementation does the following:

Initialize a boolean array that is the size of the input string - note that the default initialization value is false
The boolean array's values correspond to whether or not the input string character that shares the same index value is unique or not.
Compare each character in the input string against all the characters to the right of it
If there is a match, set the values in the boolean array to true for each character index value.
Iterate through the boolean array and return the first value that is false.
If iteration completes, throw a NoUniqueCharacters Exception

Is there a better approach?

I briefly considered using some sort of Ordered Map (if such a thing exists) that maps characters to their counts where the character key values are ordered by their index in the string

public Character identifyFirstUniqueCharacterInString(final String string) throws NoUniqueCharactersException {
 final char[] chars = string.toCharArray();
 final boolean[] repeatedCharacterIndices = new boolean[chars.length];
 for (int candidateCharacterIndex = 0; candidateCharacterIndex < chars.length; candidateCharacterIndex++) {
 for (int comparisonCharacterIndex = candidateCharacterIndex + 1; comparisonCharacterIndex < chars.length; comparisonCharacterIndex++) {
 if (chars[candidateCharacterIndex] == chars[comparisonCharacterIndex]) {
 repeatedCharacterIndices[candidateCharacterIndex] = true;
 repeatedCharacterIndices[comparisonCharacterIndex] = true;
 }
 }
 }
 for (int repeatedCharacterIndex = 0; repeatedCharacterIndex < repeatedCharacterIndices.length; repeatedCharacterIndex++) {
 if (!repeatedCharacterIndices[repeatedCharacterIndex]) {
 return chars[repeatedCharacterIndex];
 }
 }
 throw new NoUniqueCharactersException();
 }

Question 2

@JaeBradley docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html -> Ordered Map that keep the insertion order, might be useful if you want to try the Ordered map way.

Question 3

@JaeBradley I have actually provided a correct solution, not too sure why it was originally accepted 5 years and 11 months ago...

Question 4

2021年10月31日:

Wells... as pointed out by @RiaD's comment, the original answer below failed to account for values seen. That means a simple string like aabc will return the second a, as it wrongly thinks it's unique having not appeared again.

I think where we got befuddled was trying to keep a mix of arrays in order to know what's unique and where they are. An alternative, arguably simpler way, is to use a map and then iterating on it. (so two iterations...?)

The first iteration collects into a Map<Character, Integer>, with a merging function that returns a -1 when we encounter repeated values.

The next iteration on the new map keeps only the valid positions (>= 0), sorts by the value and returns the minimum one.

 public static char firstUnique(String input) {
 return IntStream.range(0, input.length()).boxed() // first iteration
 .collect(Collectors.toMap(input::charAt, i -> i, (a, b) -> -1))
 .entrySet().stream() // second iteration
 .filter(entry -> entry.getValue() >= 0)
 .min(Map.Entry.comparingByValue())
 .map(Map.Entry::getKey)
 .orElseThrow(NoUniqueCharactersException::new);
 }

(wrong answer provided below, keeping it as a learning lesson...)

One alternative is to let String.indexOf(int, int) do the iteration for you:

 return IntStream.range(0, input.length())
 .filter(i -> input.indexOf(input.charAt(i), i + 1) == -1)
 .mapToObj(i -> Character.valueOf(input.charAt(i)))
 .findFirst().orElseThrow(NoUniqueCharactersException::new);

Loop with IntStream from 0 (inclusive) to the length of input (exclusive).

Filter for indices i where input.indexOf(input.charAt(i), i + 1) gives -1, i.e. there is no occurrence of the character at index i past that position.

For any matches, convert to our desired Character wrapper class using mapToObj().

In order to return the first character, we findFirst() from the IntStream, orElseThrow() our custom NoUniqueCharactersException.

A second alternative is to use String.lastIndexOf(int):

 // instead of this
 // .filter(i -> input.indexOf(input.charAt(i), i + 1) == -1)
 // use this
 .filter(i -> i == input.lastIndexOf(input.charAt(i)))

In this case, if the last index of the character at position i is indeed i, that means we have found our unique character.

Question 5

I find the code very hard to read because of the very long variable names. I therefore rewrote it to:

@Nullable
public static Character firstUnique(String str) {
 char[] chars = str.toCharArray();
 boolean[] repeated = new boolean[chars.length];
 for (int i = 0; i < chars.length; i++) {
 for (int j = i + 1; j < chars.length; j++) {
 if (chars[i] == chars[j]) {
 repeated[i] = true;
 repeated[j] = true;
 }
 }
 }
 for (int i = 0; i < repeated.length; i++) {
 if (!repeated[i]) {
 return chars[i];
 }
 }
 return null;
}

Now the code fits easily on the screen and looks like most other code that is doing string operations. I also removed all redundant parts from the method name:

identify was wrong. A character does not have an identity since Character and char are value types.
Character was redundant because the return type already says Character.
InString was redundant because the method parameter is of type String.

Regarding the algorithmic complexity: Your code works fine for short strings. But you should not run this code on strings of 16M characters since it would take \2ドル^{47}\$ basic operations. Therefore, if this code is ever going to work with long strings, you should have an alternative algorithm:

@Nullable
static OptionalInt firstUniqueLong(String str) {
 BitSet once = new BitSet();
 BitSet dups = new BitSet();
 str.codePoints().forEach(codePoint -> (once.get(codePoint) ? dups : once).set(codePoint));
 return str.codePoints().filter(codePoint -> !dups.get(codePoint)).findFirst();
}

This code also works for small strings.

The execution time is \$\mathcal O(\text{str.length})\,ドル as opposed to the \$\mathcal O(\text{str.length}^2)\$ of your code.

The space requirement of this code depends not on the length of the string but on the actual characters (well, Unicode code points) in it. For ASCII it works well, but since the current implementation for BitSet allocates a simple array, inserting a single emoji in the string allocates 2 times 16 kilobytes of memory. Characters with even higher code points might raise the memory requirement to 2 times 140 kilobytes (= Character.MAX_CODE_POINT / 8).

Question 6

The algorithm is inefficient. For each character, it checks if it's repeated by checking all following characters. The time complexity is \$O(n^2)\$, space complexity is \$O(n)\$.

You can do better. Consider this alternative, and very simple algorithm:

Build a map of counts from the characters in the input.
Loop over the characters in the input again, and check the count.
- If you find count == 1, you have the answer, return it
- If you reach the end of the input, that means no character is unique

With two linear passes over the input, the time complexity is \$O(n)\$, and with the map of counts, space complexity is \$O(1)\$, because the maximum size is the size of the alphabet, a constant.

This exercise is on leetcode, here's one way to implement it:

public int firstUniqChar(String s) {
 Map<Character, Integer> counts = new HashMap<>();
 s.chars().forEach(c -> counts.merge((char) c, 1, Integer::sum));
 
 for (int index = 0; index < s.length(); index++) {
 if (counts.get(s.charAt(index)) == 1) {
 return index;
 }
 }
 
 return -1;
}

When you have additional information about the alphabet to support, you may be able to make a significant optimizations. For example in the leetcode exercise the alphabet is lowercase English letters, which makes it possible to use a simple array for the counts. It doesn't change the order of complexity, but it can significantly improve the execution time.

public int firstUniqChar(String s) {
 int[] counts = new int[26];
 for (int index = 0; index < s.length(); index++) {
 counts[s.charAt(index) - 'a']++;
 }
 
 for (int index = 0; index < s.length(); index++) {
 if (counts[s.charAt(index) - 'a'] == 1) {
 return index;
 }
 }
 
 return -1;
}

h.j.k. h.j.k. 19.3k3 gold badges37 silver badges93 bronze badges · Accepted Answer · 2015-11-30 15:15:46Z

2021年10月31日:

Wells... as pointed out by @RiaD's comment, the original answer below failed to account for values seen. That means a simple string like aabc will return the second a, as it wrongly thinks it's unique having not appeared again.

I think where we got befuddled was trying to keep a mix of arrays in order to know what's unique and where they are. An alternative, arguably simpler way, is to use a map and then iterating on it. (so two iterations...?)

The first iteration collects into a Map<Character, Integer>, with a merging function that returns a -1 when we encounter repeated values.

The next iteration on the new map keeps only the valid positions (>= 0), sorts by the value and returns the minimum one.

 public static char firstUnique(String input) {
 return IntStream.range(0, input.length()).boxed() // first iteration
 .collect(Collectors.toMap(input::charAt, i -> i, (a, b) -> -1))
 .entrySet().stream() // second iteration
 .filter(entry -> entry.getValue() >= 0)
 .min(Map.Entry.comparingByValue())
 .map(Map.Entry::getKey)
 .orElseThrow(NoUniqueCharactersException::new);
 }

(wrong answer provided below, keeping it as a learning lesson...)

One alternative is to let String.indexOf(int, int) do the iteration for you:

 return IntStream.range(0, input.length())
 .filter(i -> input.indexOf(input.charAt(i), i + 1) == -1)
 .mapToObj(i -> Character.valueOf(input.charAt(i)))
 .findFirst().orElseThrow(NoUniqueCharactersException::new);

Loop with IntStream from 0 (inclusive) to the length of input (exclusive).

Filter for indices i where input.indexOf(input.charAt(i), i + 1) gives -1, i.e. there is no occurrence of the character at index i past that position.

For any matches, convert to our desired Character wrapper class using mapToObj().

In order to return the first character, we findFirst() from the IntStream, orElseThrow() our custom NoUniqueCharactersException.

A second alternative is to use String.lastIndexOf(int):

 // instead of this
 // .filter(i -> input.indexOf(input.charAt(i), i + 1) == -1)
 // use this
 .filter(i -> i == input.lastIndexOf(input.charAt(i)))

In this case, if the last index of the character at position i is indeed i, that means we have found our unique character.

Stack Exchange Network

Find the first unique character in a string

3 Answers 3

2021年10月31日:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Find the first unique character in a string

3 Answers 3

2021年10月31日:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions