Check if the given string has duplicates

Question 1

Description:

Given a string find if it contains duplicate characters. The string contains only ASCII characters.

Code:

import java.util.Arrays;
class Main {
 /**
 * Given a string check if it duplicate characters.
 * [] -> true
 * [f, o, o] -> false
 * [b, a, r] -> true
 * 
 * @param {String} str
 * 
 * @return false if string contains duplicate else true
 */ 
 public static boolean isUnique(String str) {
 if (str.length() == 0) {
 return true;
 }
 boolean[] seen = new boolean[25];
 for (int i = 0; i < str.length(); i++) {
 int index = Character.toLowerCase(str.charAt(i)) - 'a';
 if (seen[index]) {
 return false;
 }
 else {
 seen[index] = true;
 }
 }
 // invariant
 return true;
 }
 public static void main(String[] args) {
 System.out.println(isUnique("")); // true
 System.out.println(isUnique("a")); // true
 System.out.println(isUnique("foo")); // false
 System.out.println(isUnique("bar")); // true
 System.out.println(isUnique("hello")); // false
 System.out.println(isUnique("Hello")); // false
 }
}

The above algorithms runs in O(n), the alternate solution is to sort the string and the check the adjacent elements but it would be O(nlogn). I need to understand if I am using Java correctly, any other styles related comments are most welcome.

PS: I would like to know if I need to include more tests to cover all cases.

Question 2

how about null argument?

Question 3

Documentation mismatch

The documentation and the functionality do not fit. Your function is supposed to "check if [a string] has duplicates". From the documentation, I expect a function hasDuplicates that returns true if I have duplicate characters and false otherwise.

However, you provide isUnique, which does the opposite: return true if there are no duplicates and false if there are duplicates. The internal documentation fits, but I would say "checks whether it contains no duplicate characters" or "contains unique characters only".

Possible bugs

You checked your string only on alphabetic strings. What happens on "12345678"? You will index -48 due to '1' - 'a'. That's a bug waiting to happen. Also, what happens if your string contains a 'z'? What happens if str is null? Maybe str shouldn't be null, but that should get mentioned in the documentation (see above).

Also, you should mention that your function is case-insensitive. But why does "Aa" count as a duplicate to begin with?

Both cases (non-alpha character and same letters in different cases) should get added to your tests.

Algorithm

Your seen algorithm is fine if you know exactly how many buckets you need. But you often don't know that. A Map<char,bool> can happen here. If you use a TreeMap you end up with the mentioned \$\mathcal O(n \log n)\,ドル but if you use a HashMap the asymptotic complexity is \$\mathcal O(n)\$ again. Alternatively, use a set.

But that's asymptotic complexity. If you don't consider any non-alpha strings (see section above), then you can immediately return false if the string is longer than 26 characters due to the pigeon hole principle.

Question 4

I don't know much about TreeMap and I will read about it more. I checked reference book and it says that we can assume ASCII letters including digits i.e. all 256 characters. If I have to I would prefer Set than map. The author also suggests that we could use bit vector too but the solution seem too complex to me.

Question 5

@CodeYogi I should add that I don't know much Java. I can read and understand almost all of it, but I don't know the standard library well, for example, so I had my C++ glasses on. If you know that all characters are ASCII values then boolean[256] is fine and most likely the easiest and cache friendliest method.

Question 6

There are considerably more than 25 ASCII characters. Did you mean ASCII letters? Cause if the test was for characters, this code will break. But even if it's just letters, (case aside,) there are 26 letters, not 25. Try adding "pizza" and "crazy" to your test, and see what happens. :P

The quick and dirty solution is to make your array 26 elements long.

A better solution, though, might be to use a HashSet<Character> rather than an array of booleans. While a bit slower, it also removes the limitation of "only ASCII letters". Use it a lot like your array: for each character, if it's already in the set, return false; otherwise add it.
Your tests don't include cases like, say, "America" (where upper and lower case appear in the same word). They should.
Speaking of tests, you might consider using an actual testing framework of some kind. System.out.println lets you see something's being returned, which is good for just starting out...but something like Test.assertFalse(isUnique("pizza")) would let you run the test even if you didn't remember what it was supposed to do.

Question 7

Good explanation but I am preparing for an interview and I fear there will be any test framework available in editor.

Zeta Zeta 19.6k2 gold badges57 silver badges90 bronze badges · Accepted Answer · 2018-03-26 19:48:43Z

Documentation mismatch

The documentation and the functionality do not fit. Your function is supposed to "check if [a string] has duplicates". From the documentation, I expect a function hasDuplicates that returns true if I have duplicate characters and false otherwise.

However, you provide isUnique, which does the opposite: return true if there are no duplicates and false if there are duplicates. The internal documentation fits, but I would say "checks whether it contains no duplicate characters" or "contains unique characters only".

Possible bugs

You checked your string only on alphabetic strings. What happens on "12345678"? You will index -48 due to '1' - 'a'. That's a bug waiting to happen. Also, what happens if your string contains a 'z'? What happens if str is null? Maybe str shouldn't be null, but that should get mentioned in the documentation (see above).

Also, you should mention that your function is case-insensitive. But why does "Aa" count as a duplicate to begin with?

Both cases (non-alpha character and same letters in different cases) should get added to your tests.

Algorithm

Your seen algorithm is fine if you know exactly how many buckets you need. But you often don't know that. A Map<char,bool> can happen here. If you use a TreeMap you end up with the mentioned \$\mathcal O(n \log n)\,ドル but if you use a HashMap the asymptotic complexity is \$\mathcal O(n)\$ again. Alternatively, use a set.

But that's asymptotic complexity. If you don't consider any non-alpha strings (see section above), then you can immediately return false if the string is longer than 26 characters due to the pigeon hole principle.

I don't know much about TreeMap and I will read about it more. I checked reference book and it says that we can assume ASCII letters including digits i.e. all 256 characters. If I have to I would prefer Set than map. The author also suggests that we could use bit vector too but the solution seem too complex to me.
@CodeYogi I should add that I don't know much Java. I can read and understand almost all of it, but I don't know the standard library well, for example, so I had my C++ glasses on. If you know that all characters are ASCII values then boolean[256] is fine and most likely the easiest and cache friendliest method.

Stack Exchange Network

Check if the given string has duplicates

2 Answers 2

Documentation mismatch

Possible bugs

Algorithm

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Check if the given string has duplicates

2 Answers 2

Documentation mismatch

Possible bugs

Algorithm

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions