Let $S[1..n] \in \Sigma^*$ be a string of $n$ symbols over the alphabet $\Sigma$ where $|\Sigma| \in \mathcal{O}(1)$. Determine a shortest string which cannot be obtained from $S$ by deleting some (possibly no) symbols.
For instance, we are given $\Sigma = \{ A, C, G, T \}$ and $S = ACGTACGT$. In this case, one valid answer is $CCA,ドル because it does not occur as a subsequence in $S$ (i.e. cannot be obtained by removing some of the symbols from $S$) and there is no such string of length 2ドル$.
I am looking for an algorithm which takes $\mathcal{O}(n)$ or $\mathcal{O}(n \log n)$ time. It is not sufficient to compute the length of the answer, an example should also be constructed.
Observations and Approaches. The length of the answer does not exceed $\displaystyle \frac{n}{|\Sigma|} + 1$ by pigeonhole principle. This upper bound is attained for $S = ACGTACGT...ACGT$ in the above example. If we are only interested in the length of the answer, the following approach should work. Let $f(i)$ denote the largest $k \in \mathbb{N}$ such that every string of length $k$ is a subsequence of the prefix $S[1..i]$. Our final answer is $f(n) + 1$ and $f$ can be computed by dynamic programming in $\mathcal{O}(n)$ time for 1ドル \le i \le n$ simply by recurrence
$\displaystyle f(i) = 1 + f\left(\min{(\ell(i, \sigma_1), \ell(i, \sigma_2), \dots, \ell(i, \sigma_{|\Sigma|})) - 1}\right)$
where $\ell(i, \sigma)$ denotes the last occurrence of $\sigma \in \Sigma$ in the prefix $S[1..i]$. Obviously, the function $f$ is increasing, i.e. $f(i + 1) \ge f(i)$ for 1ドル \le i \le n - 1$. It seems more difficult to construct an example from this information, that is where I am stuck.
Finally note that the problem is a lot easier, if we require the subsequence do be contigous. Since there are only $\mathcal{O}(n^2)$ substrings, but $\mathcal{O}(|\Sigma|^k)$ strings of length $k$ over $\Sigma,ドル $k$ turns out to be very small and even brute-force will succeed.
Any kind of help is appreciated, thanks in advance.
Original Source: CSES Problem Set (https://cses.fi/problemset/task/1087/).
-
$\begingroup$ Have you tried remembering on which character the minimum is achieved (in the recurrence) and using this information somehow to read off the answer? $\endgroup$Yuval Filmus– Yuval Filmus2018年03月02日 04:49:14 +00:00Commented Mar 2, 2018 at 4:49
-
$\begingroup$ Furthermore, when proving that $f(i)$ follows the recurrence, you need to give an upper bound on $f(i)$. This upper bound is basically a string which is not a subsequence of $S[1..i]$. $\endgroup$Yuval Filmus– Yuval Filmus2018年03月02日 04:58:52 +00:00Commented Mar 2, 2018 at 4:58
-
$\begingroup$ I tried remembering the characters and it may be successful, but I am still not able to construct a string from this information. That's why I ask this question. Why is an upper bound on $f(i)$ necessary? If $f(i) = k,ドル all strings of length $k$ consist of a prefix of length $k - 1$ and one letter in the end. Since it must be possible for all ending letters, the prefix from 1ドル$ to $\min{(\ell(i, \sigma_1), \dots, \ell(i, \sigma_{|\Sigma|}))} - 1$ must contain all strings of length $k - 1$ as subsequences. On the other hand, choosing this index is the best we can do ($f$ is increasing). $\endgroup$neutron-byte– neutron-byte2018年03月02日 12:59:09 +00:00Commented Mar 2, 2018 at 12:59
1 Answer 1
For each $i,ドル let $s(i)$ be a string of length $f(i)+1$ that is not a subsequence of $S[1\ldots i],ドル then what you seek for is exactly $s(n)$. Also denote $g(i) = \min\left(\ell\left(i,\sigma_1\right),\ldots,\ell\left(i,\sigma_{|\Sigma|}\right)\right)$. $g(i)$'s can be computed in $O(n)$ time in advance.
We have $$ s(i)=s(g(i)-1)S[g(i)]. $$ Firstly $s(g(i)-1)S[g(i)]$ is of length $f(i)+1$. Suppose by contradiction $s(g(i)-1)S[g(i)]$ is a subsequence of $S[1\ldots i],ドル then the index that the last character $S[g(i)]$ matches must be less than or equal to $g(i),ドル which means $s(g(i)-1)$ is a subsequence of $S[1\ldots (g(i)-1)],ドル a contradiction. So the recurrence is correct.
Now you can use this recurrence to compute $s(n)$:
- Let $i \leftarrow n,ドル and let $s$ be an empty string.
- If $g(i)$ does not exist, which means there is at least one character not appearing in $S[1\ldots i],ドル insert this character to the beginning of $s$ and return $s$.
- Insert $S[g(i)]$ to the beginning of $s$.
- Let $i\leftarrow g(i)-1$.
- Go back to step 2.
-
$\begingroup$ Doesn't require this approach $\mathcal{O}(n^2)$ time? $\endgroup$neutron-byte– neutron-byte2018年03月02日 13:10:37 +00:00Commented Mar 2, 2018 at 13:10
-
$\begingroup$ @neutron-byte Updated. $\endgroup$xskxzr– xskxzr2018年03月02日 13:23:32 +00:00Commented Mar 2, 2018 at 13:23
Explore related questions
See similar questions with these tags.