5
\$\begingroup\$

Find the length of the longest substring without repeating characters

Any comments on my code?

public class LongestSubstring {
 /**
 * First version. O(n^2) runtime
 * For each character, find the longest substring starting
 * with this character and update max
 * @param s
 * @return
 */
 public int lengthOfLongestSubstringV1(String s) {
 char[] c = s.toCharArray();
 int n = c.length;
 int max = 0;
 for (int i = 0; i < n; i++){
 // localMax is the size of the longest substring starting with character i
 // and that has no repeated character
 int localMax = findLocalMax(c,i);
 // update max if localMax > max
 max = (max > localMax) ? max : localMax;
 }
 return max;
 }
 /**
 * find the largest substring that has no repeated character
 * starting at character i in array c
 * @param c
 * @param i
 * @return
 */
 public int findLocalMax(char[] c, int i){
 int n = c.length;
 // seen: characters already seen
 HashSet<Character> seen = new HashSet<Character>();
 int localMax = 0;
 for (int j = i; j < n; j++){
 // c[j] was seen already
 if (seen.contains(c[j])){
 return localMax;
 }
 else{
 seen.add(c[j]);
 localMax++;
 }
 }
 return localMax;
 }
 /**
 * Second version. O(n^2) runtime
 * if a character is seen again,
 * @param s
 * @return
 */
 public int lengthOfLongestSubstringV2(String s) {
 char[] c = s.toCharArray();
 int n = c.length;
 // currentLetters: HashSet of the letters contained in the current substring
 HashSet<Character> currentLetters = new HashSet<Character>();
 // pointer1 points to the beginning of the substring
 int pointer1 = 0;
 int max = 0;
 int currentCount = 0;
 // pointer2 points to the end of the substring
 for (int pointer2 = 0; pointer2 < n; pointer2++){
 if (currentLetters.contains(c[pointer2])){
 // if the letter c[pointer2] has already been seen
 // remove all the letters before the first occurrence
 // of this letter to consider the substring beginning after
 // this first occurrence
 while (c[pointer1] != c[pointer2]){
 currentCount--;
 currentLetters.remove(c[pointer1]);
 pointer1++;
 }
 }
 else{
 // otherwise, add the letter to the substring
 currentCount++;
 currentLetters.add(c[pointer2]);
 }
 max = (max > currentCount) ? max : currentCount;
 }
 return max;
 }
}
public class TestLongestSubstring {
 LongestSubstring lgs = new LongestSubstring();
 @Test
 public void testEmptyString(){
 String s = "";
 assertEquals(0,lgs.lengthOfLongestSubstringV1(s));
 assertEquals(0,lgs.lengthOfLongestSubstringV2(s));
 }
 @Test
 public void testSameCharacterExpects1(){
 String s = "bbbb";
 assertEquals(1,lgs.lengthOfLongestSubstringV1(s));
 assertEquals(1,lgs.lengthOfLongestSubstringV2(s));
 }
 @Test
 public void testabcabcbbExpects3(){
 String s = "abcabcbb";
 assertEquals(3,lgs.lengthOfLongestSubstringV1(s));
 assertEquals(3,lgs.lengthOfLongestSubstringV2(s));
 }
}
public class TestLongestSubstringRunner {
 public static void main(String args[]){
 Result result = JUnitCore.runClasses(TestLongestSubstring.class);
 for (Failure failure : result.getFailures()){
 System.out.println(failure.toString());
 }
 System.out.println(result.wasSuccessful());
 }
}
200_success
145k22 gold badges190 silver badges478 bronze badges
asked May 20, 2015 at 4:36
\$\endgroup\$

2 Answers 2

6
\$\begingroup\$

Time complexity

The time complexity is not \$O(N^2)\$ as you estimated, but actually \$O(N M)\,ドル where n is the length of the input string and m is the number of unique characters in the input string. This makes a big difference, because in practice m is typically bounded by a constant, the size of the alphabet, which in the case of ascii is 256. If m is a constant then the asymptotic complexity becomes simply \$O(N)\,ドル which is great news for you.

Optimizations

A small tweak can make the implementation significantly faster: change the loop condition in the outer function to go until n - max instead of n.

If you know the size of the alphabet in advance, then you can use a boolean[] instead of a set, with the character codes as indexes. It's simpler, and by reducing the autoboxing, faster too.

Use interface types instead of implementations

You could declare the HashSet as a Set instead.

answered May 20, 2015 at 6:31
\$\endgroup\$
5
\$\begingroup\$

Naming :

While it's so easy to give short naming to variable, it's just bad practice.

  • String s
  • char c
  • int n

These are examples of what it shouldn't be. It should be replaced so you know just by reading it what the var is coming from.
Example: String s could be String inputString.

Clear javadoc:

Actually "Find the length of the longest substring without repeating characters" does tell me more then the javadoc of public int lengthOfLongestSubstringV1(String s)
Why?
In your javadoc there is no mentioning about the absence of repeating characters.

Comments:

While code should be self explaining, sometimes it's good to put a comment.
This example is not so good:

// seen: characters already seen
HashSet<Character> seen = new HashSet<Character>();

The comment is there because you think that the person who reads your code will not understand what the HashSet is for.
Maybe it's the naming of the var why you did this?
Otherwise, there is no reason to say what it holds.

Variable to much, using set correct:

public int findLocalMax(char[] c, int i)

This method contains int localMax.
There is no reason to have this variable and counting them up, you just can return the size of your Set.
Actually I should rewrite this whole operation to :

public int findLocalMax(char[] c, int position) {
 int arrayLenght = c.length;
 HashSet<Character> charsSeen = new HashSet<Character>();
 for (int pointer = position; pointer < arrayLenght; pointer++){
 if (!charsSeen.add(c[pointer])){
 break;
 }
 }
 return charsSeen.size();
}

Explication : A set can't have duplicates, so if you try to insert a duplicated value, this would not be done and the add method will return false.
In fact, with your code you could even use a List.
So if this fails, you know it's time to break the for loop and return the result, the charsSeen his size.

I'm not the greatest mathematician so I'm not reviewing your mathematical solution, only improving your current solution to cleaner code.

Again early returns:

In the lengthOfLongestSubstringV1 there is just one thing what could be simple to improve.
Think of this your test case : "abcabcbb".
If your pointer is at the last "c" => your max result is 3.
The last c could only be max 3 long because you don't have more characters.
Why bother checking them?

So again, I would update the method to this :

 public int lengthOfLongestSubstringV1(String inputString) {
 char[] inputCharArray = inputString.toCharArray();
 int charArrayLength = inputCharArray.length;
 int currentMaxLength = 0;
 for (int pointer = 0; pointer < charArrayLength; pointer++){
 int localMax = findLocalMax(inputCharArray,pointer);
 currentMaxLength = (currentMaxLength > localMax) ? currentMaxLength : localMax;
 if (currentMaxLength >= charArrayLength - pointer) {
 break;
 }
 }
 return currentMaxLength;
}

Hope this helps.

h.j.k.
19.3k3 gold badges37 silver badges93 bronze badges
answered May 20, 2015 at 5:16
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.