Given a string $s,ドル I would like to find the longest repeating (at least twice) subsequence. That is, I would like to find a string $w$ which is a subsequence (doesn't have to be a contiguous) of $s$ such that $w=w' \cdot w' $. That is, $w$ is a string whose halves appear twice in a row. Note that $w$ is a subsequence of $s,ドル but not necessarily a substring.
Examples:
For 'ababccabdc' it will be 'abcabc', because 'abc'='abc' and 'abc' appears (at least) twice in 'ababccabdc'.
For 'addbacddabcd' one option is 'dddd' because 'dd' appears twice (I cannot use the same letter several times, but here i have 4 'd's so its ok), but its of lebngth 4. I can find a better one of length 8: 'abcdabcd', because 'abcd' is a substring of 'addbacddabcd' that appears twice.
I'm interested in finding the longest repeating subsequence. This is also called "finding the longest/largest square", but I've read many articles in which a square is defined for a substring and not for a subsequence.
I can easily use a brute force algorithm that will take $O(n^3)$ by iterating on all options for a breakpoint in the string, and then I will have two strings in which I'll be looking for largest/longest common subsequence, but each check will take $O(n^2)$ using a dynamic programming technique, so the entire time will be $O(n^3)$. I found a more efficient algorithm for longest common subsequence which takes $O(\frac{n^2}{\log n})$ , so the running time will be $O(\frac{n^3}{\log n})$.
I am looking for a more efficient algorithm for longest repeating subsequence problem. Perhaps my idea of iterating over all breakpoints wastes too much time, and can be reduced to less iterations. Or perhaps an algorithm with a different attitude can solve this problem.
I've searched in many journals and previous questions, and most of the results I found were about a substring and not about a subsequence.
I've also read that this can be done using suffix trees, but this too was relevant to substrings and I am not sure if such an idea can be extended for subsequence.
I'm looking for a solution that runs in time $O(n^2)$. If there exists one in time $O(n \cdot \log n)$ that will be even better (I am not sure if such exists).
1 Answer 1
Here is a dynamic programming solution.
Suppose that the input string is $x_1\ldots x_n$. Create a table $T$ whose rows and columns are indexed by 0,ドル\ldots,n$ (where $n$ is the length of the string), populated by the rule $$ T[i,j] = \begin{cases} 0 & \text{if $i = 0$ or $j = 0$}, \\ T[i-1,j-1] + 1 & \text{if $x_i = x_j$ and $i \neq j$}, \\ \max(T[i-1,j],T[i,j-1]) & \text{otherwise}. \end{cases} $$ The answer is $T[n,n]$.
-
$\begingroup$ Suppose we are at some $i, j$ with $i=j+1,ドル and the condition in your
ifstatement is true. Thendp[i][j] = dp[i - 1][j - 1] + 1implies that the character at position $i-1=j$ is part of both subsequences. $\endgroup$j_random_hacker– j_random_hacker2016年12月16日 03:37:14 +00:00Commented Dec 16, 2016 at 3:37 -
4
-
$\begingroup$ @Raphael A recursive formula does not count as source code. $\endgroup$Number945– Number9452016年12月29日 07:49:00 +00:00Commented Dec 29, 2016 at 7:49
-
3$\begingroup$ @BreakingBenjamin Depending on your language of choice, you can write down the given recurrence more or less literally. The point is that there is no explanation here. $\endgroup$Raphael– Raphael2016年12月29日 08:40:57 +00:00Commented Dec 29, 2016 at 8:40
Explore related questions
See similar questions with these tags.
$that does not appear in either $u$ or $v,ドル and then run your $o(n^2)$-time algorithm on it. Both "halves" of the longest repeating subsequence will necessarily begin with $x,ドル so one half comes from each of $u$ and $v,ドル solving the LCS problem. $\endgroup$