Description:
Given a string find if it contains duplicate characters. The string contains only ASCII
characters.
Code:
import java.util.Arrays;
class Main {
/**
* Given a string check if it duplicate characters.
* [] -> true
* [f, o, o] -> false
* [b, a, r] -> true
*
* @param {String} str
*
* @return false if string contains duplicate else true
*/
public static boolean isUnique(String str) {
if (str.length() == 0) {
return true;
}
boolean[] seen = new boolean[25];
for (int i = 0; i < str.length(); i++) {
int index = Character.toLowerCase(str.charAt(i)) - 'a';
if (seen[index]) {
return false;
}
else {
seen[index] = true;
}
}
// invariant
return true;
}
public static void main(String[] args) {
System.out.println(isUnique("")); // true
System.out.println(isUnique("a")); // true
System.out.println(isUnique("foo")); // false
System.out.println(isUnique("bar")); // true
System.out.println(isUnique("hello")); // false
System.out.println(isUnique("Hello")); // false
}
}
The above algorithms runs in O(n)
, the alternate solution is to sort the string and the check the adjacent elements but it would be O(nlogn)
. I need to understand if I am using Java correctly, any other styles related comments are most welcome.
PS: I would like to know if I need to include more tests to cover all cases.
-
\$\begingroup\$ how about null argument? \$\endgroup\$Sharon Ben Asher– Sharon Ben Asher2018年03月26日 19:47:56 +00:00Commented Mar 26, 2018 at 19:47
2 Answers 2
Documentation mismatch
The documentation and the functionality do not fit. Your function is supposed to "check if [a string] has duplicates". From the documentation, I expect a function hasDuplicates
that returns true
if I have duplicate characters and false
otherwise.
However, you provide isUnique
, which does the opposite: return true
if there are no duplicates and false
if there are duplicates. The internal documentation fits, but I would say "checks whether it contains no duplicate characters" or "contains unique characters only".
Possible bugs
You checked your string only on alphabetic strings. What happens on "12345678"
? You will index -48
due to '1' - 'a'
. That's a bug waiting to happen. Also, what happens if your string contains a 'z'
? What happens if str
is null
? Maybe str
shouldn't be null
, but that should get mentioned in the documentation (see above).
Also, you should mention that your function is case-insensitive. But why does "Aa"
count as a duplicate to begin with?
Both cases (non-alpha character and same letters in different cases) should get added to your tests.
Algorithm
Your seen
algorithm is fine if you know exactly how many buckets you need. But you often don't know that. A Map<char,bool>
can happen here. If you use a TreeMap
you end up with the mentioned \$\mathcal O(n \log n)\,ドル but if you use a HashMap
the asymptotic complexity is \$\mathcal O(n)\$ again. Alternatively, use a set.
But that's asymptotic complexity. If you don't consider any non-alpha strings (see section above), then you can immediately return false
if the string is longer than 26
characters due to the pigeon hole principle.
-
\$\begingroup\$ I don't know much about
TreeMap
and I will read about it more. I checked reference book and it says that we can assumeASCII
letters including digits i.e. all256
characters. If I have to I would preferSet
than map. The author also suggests that we could use bit vector too but the solution seem too complex to me. \$\endgroup\$CodeYogi– CodeYogi2018年03月26日 20:02:05 +00:00Commented Mar 26, 2018 at 20:02 -
\$\begingroup\$ @CodeYogi I should add that I don't know much Java. I can read and understand almost all of it, but I don't know the standard library well, for example, so I had my C++ glasses on. If you know that all characters are ASCII values then
boolean[256]
is fine and most likely the easiest and cache friendliest method. \$\endgroup\$Zeta– Zeta2018年03月27日 04:08:50 +00:00Commented Mar 27, 2018 at 4:08
There are considerably more than 25 ASCII characters. Did you mean ASCII letters? Cause if the test was for characters, this code will break. But even if it's just letters, (case aside,) there are 26 letters, not 25. Try adding "pizza" and "crazy" to your test, and see what happens. :P
The quick and dirty solution is to make your array 26 elements long.
A better solution, though, might be to use a
HashSet<Character>
rather than an array of booleans. While a bit slower, it also removes the limitation of "only ASCII letters". Use it a lot like your array: for each character, if it's already in the set, returnfalse
; otherwise add it.Your tests don't include cases like, say, "America" (where upper and lower case appear in the same word). They should.
Speaking of tests, you might consider using an actual testing framework of some kind.
System.out.println
lets you see something's being returned, which is good for just starting out...but something likeTest.assertFalse(isUnique("pizza"))
would let you run the test even if you didn't remember what it was supposed to do.
-
\$\begingroup\$ Good explanation but I am preparing for an interview and I fear there will be any test framework available in editor. \$\endgroup\$CodeYogi– CodeYogi2018年03月26日 20:03:43 +00:00Commented Mar 26, 2018 at 20:03