Implementation for a relatively basic programming challenge:
Find the first unique character in a string
So evaluating the string ABCA
would return B
.
My implementation does the following:
- Initialize a
boolean
array that is the size of the input string - note that the default initialization value isfalse
- The
boolean
array's values correspond to whether or not the input string character that shares the same index value is unique or not. - Compare each character in the input string against all the characters to the right of it
- If there is a match, set the values in the boolean array to true for each character index value.
- Iterate through the boolean array and return the first value that is false.
- If iteration completes, throw a
NoUniqueCharacters
Exception
Is there a better approach?
- I briefly considered using some sort of
Ordered Map
(if such a thing exists) that maps characters to their counts where the characterkey
values are ordered by their index in the string
public Character identifyFirstUniqueCharacterInString(final String string) throws NoUniqueCharactersException {
final char[] chars = string.toCharArray();
final boolean[] repeatedCharacterIndices = new boolean[chars.length];
for (int candidateCharacterIndex = 0; candidateCharacterIndex < chars.length; candidateCharacterIndex++) {
for (int comparisonCharacterIndex = candidateCharacterIndex + 1; comparisonCharacterIndex < chars.length; comparisonCharacterIndex++) {
if (chars[candidateCharacterIndex] == chars[comparisonCharacterIndex]) {
repeatedCharacterIndices[candidateCharacterIndex] = true;
repeatedCharacterIndices[comparisonCharacterIndex] = true;
}
}
}
for (int repeatedCharacterIndex = 0; repeatedCharacterIndex < repeatedCharacterIndices.length; repeatedCharacterIndex++) {
if (!repeatedCharacterIndices[repeatedCharacterIndex]) {
return chars[repeatedCharacterIndex];
}
}
throw new NoUniqueCharactersException();
}
-
\$\begingroup\$ @JaeBradley docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html -> Ordered Map that keep the insertion order, might be useful if you want to try the Ordered map way. \$\endgroup\$Marc-Andre– Marc-Andre2015年11月30日 17:43:34 +00:00Commented Nov 30, 2015 at 17:43
-
1\$\begingroup\$ @JaeBradley I have actually provided a correct solution, not too sure why it was originally accepted 5 years and 11 months ago... \$\endgroup\$h.j.k.– h.j.k.2021年10月31日 10:22:53 +00:00Commented Oct 31, 2021 at 10:22
3 Answers 3
2021年10月31日:
Wells... as pointed out by @RiaD's comment, the original answer below failed to account for values seen. That means a simple string like aabc
will return the second a
, as it wrongly thinks it's unique having not appeared again.
I think where we got befuddled was trying to keep a mix of arrays in order to know what's unique and where they are. An alternative, arguably simpler way, is to use a map and then iterating on it. (so two iterations...?)
The first iteration collects into a
Map<Character, Integer>
, with a merging function that returns a-1
when we encounter repeated values.The next iteration on the new map keeps only the valid positions (
>= 0
), sorts by the value and returns the minimum one.public static char firstUnique(String input) { return IntStream.range(0, input.length()).boxed() // first iteration .collect(Collectors.toMap(input::charAt, i -> i, (a, b) -> -1)) .entrySet().stream() // second iteration .filter(entry -> entry.getValue() >= 0) .min(Map.Entry.comparingByValue()) .map(Map.Entry::getKey) .orElseThrow(NoUniqueCharactersException::new); }
(wrong answer provided below, keeping it as a learning lesson...)
One alternative is to let
String.indexOf(int, int)
do the iteration for you:
return IntStream.range(0, input.length()) .filter(i -> input.indexOf(input.charAt(i), i + 1) == -1) .mapToObj(i -> Character.valueOf(input.charAt(i))) .findFirst().orElseThrow(NoUniqueCharactersException::new);
- Loop with
IntStream
from0
(inclusive) to the length ofinput
(exclusive).- Filter for indices
i
whereinput.indexOf(input.charAt(i), i + 1)
gives-1
, i.e. there is no occurrence of the character at indexi
past that position.- For any matches, convert to our desired
Character
wrapper class usingmapToObj()
.- In order to return the first character, we
findFirst()
from theIntStream
,orElseThrow()
our customNoUniqueCharactersException
.
A second alternative is to use
String.lastIndexOf(int)
:
// instead of this // .filter(i -> input.indexOf(input.charAt(i), i + 1) == -1) // use this .filter(i -> i == input.lastIndexOf(input.charAt(i)))
In this case, if the last index of the character at position
i
is indeedi
, that means we have found our unique character.
I find the code very hard to read because of the very long variable names. I therefore rewrote it to:
@Nullable
public static Character firstUnique(String str) {
char[] chars = str.toCharArray();
boolean[] repeated = new boolean[chars.length];
for (int i = 0; i < chars.length; i++) {
for (int j = i + 1; j < chars.length; j++) {
if (chars[i] == chars[j]) {
repeated[i] = true;
repeated[j] = true;
}
}
}
for (int i = 0; i < repeated.length; i++) {
if (!repeated[i]) {
return chars[i];
}
}
return null;
}
Now the code fits easily on the screen and looks like most other code that is doing string operations. I also removed all redundant parts from the method name:
identify
was wrong. A character does not have an identity sinceCharacter
andchar
are value types.Character
was redundant because the return type already saysCharacter
.InString
was redundant because the method parameter is of typeString
.
Regarding the algorithmic complexity: Your code works fine for short strings. But you should not run this code on strings of 16M characters since it would take \2ドル^{47}\$ basic operations. Therefore, if this code is ever going to work with long strings, you should have an alternative algorithm:
@Nullable
static OptionalInt firstUniqueLong(String str) {
BitSet once = new BitSet();
BitSet dups = new BitSet();
str.codePoints().forEach(codePoint -> (once.get(codePoint) ? dups : once).set(codePoint));
return str.codePoints().filter(codePoint -> !dups.get(codePoint)).findFirst();
}
This code also works for small strings.
The execution time is \$\mathcal O(\text{str.length})\,ドル as opposed to the \$\mathcal O(\text{str.length}^2)\$ of your code.
The space requirement of this code depends not on the length of the string but on the actual characters (well, Unicode code points) in it. For ASCII it works well, but since the current implementation for BitSet
allocates a simple array, inserting a single emoji in the string allocates 2 times 16 kilobytes of memory. Characters with even higher code points might raise the memory requirement to 2 times 140 kilobytes (= Character.MAX_CODE_POINT / 8
).
The algorithm is inefficient. For each character, it checks if it's repeated by checking all following characters. The time complexity is \$O(n^2)\$, space complexity is \$O(n)\$.
You can do better. Consider this alternative, and very simple algorithm:
- Build a map of counts from the characters in the input.
- Loop over the characters in the input again, and check the count.
- If you find count == 1, you have the answer, return it
- If you reach the end of the input, that means no character is unique
With two linear passes over the input, the time complexity is \$O(n)\$, and with the map of counts, space complexity is \$O(1)\$, because the maximum size is the size of the alphabet, a constant.
This exercise is on leetcode, here's one way to implement it:
public int firstUniqChar(String s) {
Map<Character, Integer> counts = new HashMap<>();
s.chars().forEach(c -> counts.merge((char) c, 1, Integer::sum));
for (int index = 0; index < s.length(); index++) {
if (counts.get(s.charAt(index)) == 1) {
return index;
}
}
return -1;
}
When you have additional information about the alphabet to support, you may be able to make a significant optimizations. For example in the leetcode exercise the alphabet is lowercase English letters, which makes it possible to use a simple array for the counts. It doesn't change the order of complexity, but it can significantly improve the execution time.
public int firstUniqChar(String s) {
int[] counts = new int[26];
for (int index = 0; index < s.length(); index++) {
counts[s.charAt(index) - 'a']++;
}
for (int index = 0; index < s.length(); index++) {
if (counts[s.charAt(index) - 'a'] == 1) {
return index;
}
}
return -1;
}