2
\$\begingroup\$

After optimising my code it still seems pretty brute force and I am looking for ideas to fundamentally improve it.

I have tried improving performance by:

  • Avoiding creating substrings as much as possible - using while loop to compare string instead.
  • Intelligently anticipating ends ie. start + shift + max < str.length() - 1
  • use int instead of Integers

Is there anything obvious I have missed? I have found a few alternatives solutions on the internet and would love to hear an opinion how they are different/better/worse to my implementation:

And before I forget this is my code:

 public static String findLongestRepeatedSubString(String str) {
 int max = 0;
 String result = "";
 for (int start = 0; start + max < str.length() - 1; start++) {
 for (int shift = 1; start + shift + max < str.length() - 1; shift++) {
 int length = 0;
 // While characters match count the length
 while(str.charAt(start + length) == str.charAt(start + shift + length) // matching characters
 && start + shift + length < str.length() - 1){ // hasn't reached the end yet
 length++;
 }
 // If the length is larger - update the new max size
 if (length > max ) {
 max = length;
 result = str.substring(start, start + length + 1);
 }
 }
 }
 return result;
}
asked Jun 26, 2015 at 6:01
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

Off-by-one error

You have an off-by-one error in the innermost while loop: for input string "aa", this will not reach length 1, and give empty string as the result, which is incorrect.

I looked at this loop with suspicion, because I noticed that it checks the string bound in the second condition. The more natural code organization is to check the bound first, and then the second condition &&-ed to the first will be safe.

I suggest to rearrange that way, and for that you will need to rethink and verify all the indexes and ranges there. Which can be pretty hard to get right. So at this point, before touching anything, I suggest to add a bunch of Junit tests first, try to cover all the corner cases.

More bugs

I tried a bunch of random examples, and the program gives incorrect results:

  • "abac" -> "ab"
  • "abcac" -> "ab"
  • "abcba" -> "bc"

As I suggested in the previous point, it will be good to add Junit tests to verify the logic.

Longest repeated substring problem

Quoting from Wikipedia:

This problem can be solved in linear time and space by building a suffix tree for the string, and finding the deepest internal node in the tree. Depth is measured by the number of characters traversed from the root.

I think you should look into that and abandon your current implementation.

answered Jun 26, 2015 at 7:13
\$\endgroup\$
2
  • \$\begingroup\$ Thanks the suffix tree looks like a great way to solve this. \$\endgroup\$ Commented Jun 27, 2015 at 4:07
  • 1
    \$\begingroup\$ There is a great explanation of using suffix trees for this problem at geeksforgeeks.org/… \$\endgroup\$ Commented Jun 27, 2015 at 4:38

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.