The following code returns the length of longest common substring from given two strings in Java. For cases of around \10ドル^6\$ characters, it is still not fast enough to get the answer in a few seconds.
Any suggestions for efficiency will be greatly appreciated!
int longestCommonSubstring(String s, String t){
int maxlength = 0;
for(int i=0; i<s.length(); i++){
for(int j=0; j<t.length(); j++){
int count = 0;
int g = 0;
while(i+g<s.length() && j+g<t.length()){
if(t.charAt(j+g)==s.charAt(i+g)){
count++;
maxlength = Integer.max(count, maxlength);
}
else{
count = 0;
}
g++;
}
}
}
return maxlength;
}
-
\$\begingroup\$ Is this the solution to a programming-challenge perhaps? \$\endgroup\$Mast– Mast ♦2017年12月27日 16:21:03 +00:00Commented Dec 27, 2017 at 16:21
-
\$\begingroup\$ @Mast Yes it is \$\endgroup\$Sanjiv– Sanjiv2017年12月27日 16:50:19 +00:00Commented Dec 27, 2017 at 16:50
1 Answer 1
Stylistic review
You really need some whitespace here and there. Please put a space between operators and between keywords / brackets, etc. For example:
for(int i=0; i<s.length(); i++){ for(int j=0; j<t.length(); j++){
This would be more readable:
for (int i = 0; i < s.length(); i++) {
for (int j = 0; j < t.length(); j++) {
You also could use better variable names. s
and t
aren't very obvious, and neither is g
.
Overall, the code could potentially be more readable if it could be restructured into using some of the Stream
algorithms, but that's not an easy task.
Performance review
The easiest solutions to this problem take \$O(nm)\$ time, where \$n\$ and \$m\$ are the lengths of the 2 input strings. Your algorithm is slower than this. Here are some places that slow it down:
- You keep incrementing
g
even once the strings become dissimilar. You want to break out of thewhile
loop oncet.charAt(j + g) != s.charAt(i + g)
. Because we keep incrementingi
andj
, you don't have to worry about missing later common substrings. - If
s = "abcdefg"
andt = "abcdefg"
, one iteration of the innerfor
loop will find the correctmaxLength
. However, you then incrementj
and have to iterate overbcdefg
again with the innerwhile
loop even though we know thatbcdefg
is a shorter substring thanabcdefg
. You can keep yourcount
variable from the innerwhile
and use that to skip forward inj
, at the very least. - This isn't an algorithmic concern, but you compute
Integer.max
on every single char, but you only need to do it once you find a complete substring. This can be accomplished by moving that line into theelse
block (before thecount = 0
, which can be removed if you follow my first point).
Those are the easy performance improvements I can see. If you still need to speed up your code, the easiest thing to do would probably be to redesign your algorithm.
One of the more powerful techniques in algorithm design is known as dynamic programming. It is basically just breaking down the problem into smaller pieces, then combining it together. For example, if it was possible to find the largest common substring (LCS) of the left and right halves of s
and t
, then use that information to find the LCS of the whole thing, that would be solving the problem through dynamic programming.
Wikipedia helpfully has a page on this problem. For a dynamic programming solution, it shows that you can solve this by taking all the possible prefixes of the strings, and finding the longest common suffixes for each pair of possible prefixes. If that algorithm isn't efficient enough, it helpfully provides another one which is faster.
-
\$\begingroup\$ Thanks for the input. I really appreciate it. I want to clarify what I am actually trying to do. When you say I need to break out of the loop once I find two characters are unequal, it does miss possible common strings. For instance, in "jumper" & "romper", if I skip because the initials 'j' & 'r' are not equal, then I miss the matching sequence "mper". I would be moving onto comparing 'j' & the second letter 'o'. The i and j are there to determine where the comparison starts for each string, and g does the increment of the position of comparison. \$\endgroup\$Sanjiv– Sanjiv2017年12月30日 11:56:58 +00:00Commented Dec 30, 2017 at 11:56