I know i have some real problems with the following code (as others said). can you please help me to optimize it a little bit? Maybe some example of how to do it.
(find the longest repeating substring with length between x and y)
public static String longestDuplicate(String text, int x, int y)
{
String longest = "";
for (int i = 0; i < text.length() - 2 * longest.length() * 2; i++)
{
OUTER: for (int j = Math.min((text.length() - i -1)/2, y) ; j > longest.length() && j >=x; j--)
{
String find = text.substring(i, i + j);
for (int k = i + j; k <= text.length() - j; k++)
{
if (text.substring(k, k + j).equals(find))
{
longest = find;
continue OUTER;
}
}
break;
}
}
return longest;
}
2 Answers 2
You may want to roll your own comparison method to compare two pieces of a string in-place, instead of using substring(), which as I understand creates an unnecessary temporary copy. It may look like this:
bool CompareInPlace(String text, int from1, int len1, int from2, int len2)
{
// ...
}
Furthermore, instead of keeping a copy of the longest duplicat found, keep its indices from and len.
Then in the end you can return text.substring(longestFrom, longestLen);
.
-
3\$\begingroup\$ substring() in java doesn't create a temporary copy of underlying char array but refers to to the same array as original string (start and end position in this array are different though). \$\endgroup\$Andrey Taptunov– Andrey Taptunov2013年02月08日 14:38:20 +00:00Commented Feb 8, 2013 at 14:38
-
\$\begingroup\$ @Andrey - Agreed; due to the fact that
String
s are immutable, this is a trivial optimization. Still, a compare-in-place may be warranted. \$\endgroup\$Clockwork-Muse– Clockwork-Muse2013年02月08日 22:12:06 +00:00Commented Feb 8, 2013 at 22:12 -
1\$\begingroup\$ This probably depends on the JVM impelementation. If yours is not open source, benchmarking will tell. But even if the JVM is optimized in such a way, keeping only indices will at least save object creation and GC. \$\endgroup\$Gabriel– Gabriel2013年02月11日 11:57:30 +00:00Commented Feb 11, 2013 at 11:57
I'd try using longer variable names and making the code a little bit more flatten. Renaming
x
andy
tominLength
andmaxLength
seems a good start.Then I'd extract out some named variables for the following expressions:
Math.min((text.length() - i -1)/2, y)
i + j
text.length() - 2 * longest.length() * 2
Reference: Chapter 6. Composing Methods, Introduce Explaining Variable in Refactoring: Improving the Design of Existing Code by Martin Fowler
Put the result of the expression, or parts of the expression, in a temporary variable with a name that explains the purpose.
Here is a naive version which I think easier to understand:
public static String naiveLongestDuplicate(final String text, final int minLength,
final int maxLength) {
for (int length = maxLength; length >= minLength; length--) {
final String longestDuplicate = naiveLongestDuplicate(text, length);
if (StringUtils.isNotEmpty(longestDuplicate)) {
return longestDuplicate;
}
}
return "";
}
public static String naiveLongestDuplicate(final String text, final int length) {
final Set<String> substrings = new HashSet<String>();
for (int startIndex = 0; startIndex <= text.length() - length; startIndex++) {
final String substring = text.substring(startIndex, startIndex + length);
if (substrings.contains(substring)) {
return substring;
}
substrings.add(substring);
}
return "";
}
And some tests:
import static org.junit.Assert.assertEquals;
import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;
@RunWith(value = Parameterized.class)
public class SearcherTest {
private final String text;
private final int minLength;
private final int maxLength;
private final String expectedResult;
public SearcherTest(final String text, final int minLength,
final int maxLength, final String expectedResult) {
this.text = text;
this.minLength = minLength;
this.maxLength = maxLength;
this.expectedResult = expectedResult;
}
@Parameters
public static Collection<Object[]> data() {
//@formatter:off
final Object[][] data = new Object[][] {
{ "abab", 1, 2, "ab"}, // fails
{ "abcabc", 1, 2, "ab"},
{ "abcabc", 1, 3, "abc"}, // fails
{ "abcabcd", 1, 3, "abc"},
{ "abcdxabce", 1, 3, "abc"},
{ "abcdxabce", 3, 4, "abc"}, // fails
{ "abcdxabcd", 3, 4, "abcd"},
{ "abcdeabcde", 3, 7, "abcde"}, // fails
{ "aabbcc", 3, 3, ""},
{ "WXabcWYabcWZ", 1, 2, "ab"},
{ "WXabcWYabcWZ", 1, 3, "abc"},
{ "WXaaWYbbWZccWW", 3, 100, ""},
{ "WWaaaaXXaaaaaYYaaaaaaZZaaaaaa", 3, 6, "aaaaaa"},
{ "WWaaaaXXaaaaaYYaaaaaaZZaaaaaa", 3, 9, "aaaaaa"}
};
//@formatter:on
return Arrays.asList(data);
}
@Test
public void test() {
assertEquals(expectedResult,
Searcher.longestDuplicate(text, minLength, maxLength));
}
}
The ones which are marked with // fails
fails with the posted code. (I haven't checked what's the cause. The test data looks fine for me but I'm not sure that I understood the requirements completely.)
text.length()
if you do end up having to do math, make a int before you go into your for loop statement and put the math there. It will be easier to read. \$\endgroup\$