The algorithm I am looking for has the following requirements: Input is a set of strings. You are looking for a string containing all input strings. The resulting string should be as short as possible. At least shorter than just concatenating all input strings if possible.
Examples:
Input: a ab bc
Output: abc
Input: abcd bcde ef
Output: abcdef
Is there a name for this problem with which it is better to search for solutions?
-
3$\begingroup$ geeksforgeeks.org/shortest-superstring-problem $\endgroup$xskxzr– xskxzr2018年02月21日 08:02:30 +00:00Commented Feb 21, 2018 at 8:02
-
$\begingroup$ @xskxzr Looks like an answer! $\endgroup$Yuval Filmus– Yuval Filmus2018年02月21日 09:39:47 +00:00Commented Feb 21, 2018 at 9:39
1 Answer 1
This problem is called shortest superstring problem. John Gallant, David Maier and James Astorer proved it is NP-hard in 19791.
Given two strings $A$ and $B,ドル let $|A|$ denote the length of $A,ドル and let $S(A,B)$ denote the shortest superstring of $A$ and $B$ where $A$ occurs before $B$. It is easy to reduce this problem to travelling salesman problem, where nodes represent strings, and the distance from string $A$ to $B$ is $|S(A,B)|-|A|-|B|$. So you can use known algorithms for travelling salesman problem to solve this problem.
There are several polynomial approximation algorithms. The best known approximation factor is 2ドル\frac{11}{23},ドル and the corresponding algorithm is proposed by Marcin Mucha in 20132.
1Gallant, J., Maier, D., & Astorer, J. (1980). On finding minimal length superstrings. Journal of Computer and System Sciences, 20(1), 50-58.
2Mucha, M. (2013, January). Lyndon words and short superstrings. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms (pp. 958-972). Society for Industrial and Applied Mathematics.
-
$\begingroup$ If A is "abc" and B is "cde" S(A,B) is "abcde". Calculating the distance from A to B results in -1. I thought TSP doesn't handle negative distances? What have I missed? $\endgroup$Vel– Vel2018年02月22日 16:03:18 +00:00Commented Feb 22, 2018 at 16:03
-
1$\begingroup$ @Vel TSP can surely handle negative distances. At least you can add a big enough constant to all distances, and the result does not change. $\endgroup$xskxzr– xskxzr2018年02月22日 16:15:03 +00:00Commented Feb 22, 2018 at 16:15