1
$\begingroup$

I'm trying to understand this dynamic programming related problem, adapted from Kleinberg's Algorithm Design book.

You’re consulting for a group of people (who would prefer not to be mentioned here by name) whose jobs consist of monitoring and analyzing electronic signals coming from ships in coastal Atlantic waters. They want a fast algorithm for a basic primitive that arises frequently: "untangling" a superposition of two known signals. Specifically, they’re picturing a situation in which each of two ships is emitting a short sequence of 0ドルs$ and 1ドルs$ over and over, and they want to make sure that the signal they’re hearing is simply an interleaving of these two emissions, with nothing extra added in. This describes the whole problem; we can make it a little more explicit as follows.

Given a string $x$ consisting of 0ドルs$ and 1ドルs$, we write $x^k$ to denote $k$ copies of $x$ concatenated together. We say that a string $x$ is a repetition of $x$ if it is a prefix of $x^k$ for some number $k$. So $x = 10110110110$ is a repetition of $x = 101$. We say that a string $s$ is an interleaving of $x$ and $y$ if its symbols can be partitioned into two (not necessarily contiguous) subsequences $s'$ and $s''$, so that $s'$ is a repetition of $x$, and $s''$ is a repetition of $y$. (So each digit in $s$ must belong to exactly one of $s'$ or $s''$.) For example, if $x = 101$ and $y = 00$, then $s = 100010101$ is an interleaving of $x$ and $y$, since characters at position 1,2,5,7,8,9ドル$ form 101101ドル$—a repetition of $x$—and the remaining characters form 000ドル$—a repetition of $y$.

In terms of our application, $x$ and $y$ are the repeating sequences from the two ships, and $s$ is the signal we’re listening to: We want to make sure $s$ "unravels" into simple repetitions of $x$ and $y$. Give an efficient algorithm that takes strings $s, x,$ and $y$ and decides if $s$ is an interleaving of $x$ and $y$.

喜欢算法和数学
39.2k4 gold badges35 silver badges95 bronze badges
asked Mar 4, 2020 at 17:09
$\endgroup$

3 Answers 3

2
$\begingroup$

Let's use caps, S for input string, A for x and B for y. Furthermore, let |S| = lenS, |A| = lenA, |B| = lenB.

With any Dynamic Programming problem, it is useful to append to the problem:

"... demonstrate one such solution." or, equivalently "... provide one such proof certificate."

This is because a Dynamic Programming solution is equivalent to searching over proof certificates. For this problem, what's a proof certificate? There are many possibilities. One of them is: Strings over {a, b} of length lenS.

Certificate aabbaa encodes: S[1] = A[1], S[2] = A[2], S[3] = B[1], S[4] = B[2], S[5] = A[3],... where array indices in A & B wrap around i.e. A[r + lenA] = A[r].

Clearly, S is an interleaving of A & B if and only if such a proof certificate exists. The task now boils down to finding one such certificate, if it exists.

Source code in C++ (Repl: https://repl.it/@vemana/csstackexchange#interleaving/main.cpp)

#include <bits/stdc++.h>
using namespace std;
const int MAX_LENS = 100, MAX_LENA=100, MAX_LENB=100;
// Memoization array
// Initialize: to -1.
int memo[MAX_LENS][MAX_LENA][MAX_LENB];
// Initialize: to 0
// Encodes the Proof Certificate, C
// SELECT[r] = 1 implies C[r] = a
// SELECT[r] = 2 implies C[r] = b
int SELECT[MAX_LENS][MAX_LENA][MAX_LENB];
// The input strings
string S, A, B;
// Initialize: lenS = S.size(), lenA = A.size(), lenB = B.size()
// Expected: lenS > 0, lenA > 0, lenB > 0
int lenS, lenA, lenB;
vector<string> getCertificate() {
 if (SELECT[0][0][0] == 0)
 return vector<string>(); // No solution
 vector<string> ret(2);
 int s = 0, a = 0, b = 0;
 while(s < lenS) {
 if (SELECT[s][a][b] == 1) {
 ret[0] += A[a];
 ret[1] += '_';
 a = (a+1)%lenA;
 s = s + 1;
 } else {
 ret[0] += '_';
 ret[1] += B[b];
 b = (b+1)%lenB;
 s = s + 1;
 }
 }
 return ret;
}
// Finds certificate for S[s...lenS] where
// next index in A is a (0 based)
// next index in B is b (0 based)
//
// Returns 0 if not found
// Returns 1 if found
int findCertificate(int s, int a, int b) {
 if (s == lenS)
 return 1; // Found the "empty" certificate.
 int& ret = memo[s][a][b];
 if (ret != -1) // already computed
 return ret;
 // Try to match S with A
 bool takeA = (S[s] == A[a]) && findCertificate(s+1, (a+1)%lenA, b) == 1;
 // Try to match S with B
 bool takeB = (S[s] == B[b]) && findCertificate(s+1, a, (b+1)%lenB) == 1;
 if (takeA) {
 SELECT[s][a][b] = 1; // SELECT A to extend the certificate
 return ret = 1;
 }
 if (takeB) {
 SELECT[s][a][b] = 2; // SELECT B to extend the certificate
 return ret = 1;
 }
 return ret = 0; // Neither matched. So, no certificate found
}
// Just run with ./a.out < in
// File 'in' has three strings, one per line, representing A, B & C.
int main() {
 // initialize S, A, B, lenA, lenB, lenS, SELECT, memo
 cin >> A;
 cin >> B;
 cin >> S;
 lenA = A.size();
 lenB = B.size();
 lenS = S.size();
 memset(memo,-1,sizeof(memo));
 memset(SELECT, 0, sizeof(SELECT));
 int out = findCertificate(0, 0, 0);
 vector<string> cert = getCertificate();
 string output = cert.empty() ? "NOT INTERLEAVED" : ("INTERLEAVED\n" + cert[0] + "\n" + cert[1]);
 cout << output << endl;
 return 0;
}
answered Jun 7, 2020 at 17:23
$\endgroup$
9
  • $\begingroup$ Welcome to SE computer science. I cannot really comment your answer because I am not used to the kind of approach you are using. However, I suspect that people here often prefer an abstract algorithmic presentation to a computer program. And also some of us do not know all languages. So, if you actually use a programming language, it might be nice to say which, and if useful, which version. Some of these answers are still being used years after they have been posted. $\endgroup$ Commented Jun 8, 2020 at 23:33
  • $\begingroup$ Thanks for the advice ! Appreciate it. repl.it/repls/DistantCreativeInstitute#main.cpp is the repl. C++ (version 7.5.0). It prints out an interleaving given three strings. I left an input & instruction to run the program. I liked your elegant NFA argument. My solution is pretty much the same, but in code. I find it easier to present code rather than math; it is after-all one way of expressing mathematical thought and can be tested out on real inputs. $\endgroup$ Commented Jun 9, 2020 at 2:29
  • $\begingroup$ babou, my solution uses the NFA approach you outline. In findCertificate(s, a, b), s stands for position in the input string being recognized, a stands for the state of the first DFA (the one for A), b stands for the state in the second DFA (the one for B). Note that the DFA that recognizes repetitions of A has exactly |A| accepting states. The index 0<=a<|A| identifies which state the DFA is in. The function findCertificate(s,a,b) is just trying out both possibilities for which of the DFAs consumes the next character. My solution is O(|A| |B| |S|). $\endgroup$ Commented Jun 9, 2020 at 2:51
  • $\begingroup$ No way you can make me read that length of code, but I get it from your comments. Not sure how much it differs from the other answer, but ... The advantage of the automata approach is that you get for free all the know-how of automata theory, such as minimization of the number of states. I do try to avoid code, and even math when I can afford it. I believe that when I can do it, it means I understand what I am doing, and I am best able to convey the gist of the problem. Long ago, I once gave up on a class because I got the best grade without understanding what I was doing. Symbol pushing. $\endgroup$ Commented Jun 9, 2020 at 11:33
  • $\begingroup$ One reason I've gravitated towards code is that its runtime complexity can be evaluated. For e.g. my solution is O(|A| |B| |S|). I am not yet clear on how to convert the NFA to a DFA in polynomial time. The powerset construction could theoretically take exponential time in the number of states, which is roughly |A|.|B|. So, even with linear recognition time, the total runtime is dominated by NFA construction, which is exponential. A pathological case for NFA -> DFA construction could be when A = (a^n)x and B = (a^n)y. The NFA can be in one of k+1 states after consuming the first k<n 'a's. $\endgroup$ Commented Jun 9, 2020 at 12:39
1
$\begingroup$

For 0 ≤ k ≤ n, determine all pairs (i, j) such that the first k characters of s are an interleaving of a string X consisting of repeats of x followed by the i first characters of x, 0 ≤ i < length(x), and a string Y consisting of repeats of y followed by the j first characters of y.

For k = 0, this is { (0, 0) }.

For k+1, for every pair (i, j) in the set for k, check if the next character of s matches the next character of x or the next character of y and add 0, 1, or 2 pairs to the set for k+1, avoiding duplicates.

For fixed x, y, you have a regular language and can create a state machine for it. You can create that state machine on the fly, making the algorithm run very fast if the number is input symbols is much more than lcm (len(x), len(y)).

answered Mar 4, 2020 at 17:35
$\endgroup$
0
$\begingroup$

The easiest way to tackle this problem is to first solve it with a non-deterministic solution, then use dynamic programming to change it into a deterministic one.

The main difficulty here is that the problem is so specific that the specifics hide a simple solution that can be applied to a much wider class of problems.

So I will first généralise the problem. You may have noticed that what you call a repetition of $x$ is a regular set. Instead on considering this very peculiar family of regular sets, we will consider all regular sets on the given alphabet. So, instead of repetitions of $x$ and $y$, we consider two regular sets $X$ and $Y$. Then we call an interleaving of $X$ and $Y$ a sequence that can be partitioned in the way you describe into 2 strings $x\in X$ and $y\in Y$.

If you consider two FSA (finite state automata) $F_X$ and $F_Y$ for recognizing respectively $X$ and $Y$, it is quite sraightforward to build from then a non-deterministic FSA that recognizes te set of interleavings of $X$ and $Y$. It is very similar to the cross-product construction to get the intersection of two regular sets. You take as set of states the crossproduct of the states of both FSA, so that you can, nondeterministically in some cases, follow the recognition on the $X$ side of things, or on the $Y$ side of things (but not both as the same time as in the intersection construction). Dealing with initial and final states is left as an exercise.

No dynamic programming yet. Here it comes. As we all know, non-determinism does not work nicely in the real world, unless we accomodate it with various techniques, and dynamic programming is one of them. This also applies to non-deterministic finite-state automata, and in this special case, dynamic programming is called the powerset construction to turn NFA (non-deterministic finite automata) into DFA (deterministic finite automata). So you just do it, and there you can go spying of other ships. The recognition algorithm works in linear-time. The cost is in preprocessing.

The definition of a repetition od a string $x$ plays no role in this solution. It is just to be translated into a FSA that recognizes such a repetition.

answered Jun 8, 2020 at 15:30
$\endgroup$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.