Given a list of strings, replace each string by one of its non-empty substrings which is not a substring of any of the other strings in the list and as short as possible.
Example
Given the list ["hello","hallo","hola"]
, "hello"
should be replaced by just "e"
as this substring is not contained in "hallo"
and "hola"
and it is as short as possible. "hallo"
could be replaced by either "ha"
or "al"
and "hola"
by any of "ho"
, "ol"
or "la"
.
Rules
- You can assume that the strings will be non-empty and only contain alphabetical characters of the same case.
- You can assume that such a substring exists for each string in the list, i.e. no string in the list will be a substring of any of the other strings.
- Input and output can be in any reasonable format.
- This is code-golf, so try to use as few bytes as possible in the language of your choice.
Test Cases
Only one possible output is given for most cases.
["ppcg"] -> ["p"] (or ["c"] or ["g"])
["hello","hallo","hola"] -> ["e","ha","ho"]
["abc","bca","bac"] -> ["ab","ca","ba"]
["abc","abd","dbc"] -> ["abc","bd","db"]
["lorem","ipsum","dolor","sit","amet"] -> ["re","p","d","si","a"]
["abc","acb","bac","bca","cab","cba"] -> ["abc","acb","bac","bca","cab","cba"]
Related: Shortest Identifying Substring - similar idea, but more involved rules and cumbersome format.
10 Answers 10
Pyth, 12 bytes
mhf!ts}LTQ.:
How it works
Basicaly filters the substrings of each that only occur in one of the strings in the list (that is, it is unique to that string) and gets the first one.
mhf!ts}LTQ.: Full program, Q=eval(stdin_input())
m .: Map over Q and obtain all the substrings of each.
f And filter-keep those that satisfy (var: T)...
}LTQ ... For each string in Q, yield 1 if it contains T, else 0.
!ts ... Sum the list, decrement and negate.
h Head. Yields the first valid substring, which is always the shortest.
Prolog (SWI), (削除) 175 (削除ここまで) 163 bytes
S/L/R:-sub_string(S,_,L,_,R).
[H|T]+[I|R]:-string_length(H,L),between(1,L,X),H/X/I,T+R.
R+R.
L-R:-L+R,forall(member(E,L),findall(_,(member(F,R),\+ \+ E/_/F),[_])).
Most of the things here should be fairly obvious, but:
Explanation
Signatures: (+
= input, ?
= optional, -
= output, :
= expression)
sub_string(+String, ?Before, ?Length, ?After, ?SubString)
string_length(+String, -Length)
member(?Elem, ?List)
between(+Low, +High, ?Value)
findall(+Template, :Goal, -Bag)
forall(:Cond, :Action)
\+ \+
is just not not
(i.e. converts a match to boolean (in this case, prevents it from matching both p
s in ppcg
separately))
-
\$\begingroup\$ The right tool for the job :P except for the fact that it's mind-blowingly verbose \$\endgroup\$ASCII-only– ASCII-only2018年06月03日 22:18:23 +00:00Commented Jun 3, 2018 at 22:18
J, (削除) 30 29 (削除ここまで) 25 bytes
1(|:(0{-.&,)"_1]\.)<\\.&>
<\\.&> a 3-dimensional array of substrings
1 |: transpose each matrix to sort the substrings by length
1 ]\. all choices where one word is missing
(0{-.&,)"_1 for every matrix, flatten, remove substrings
that are present in the corresponding complement,
pick first
Python 2, 116 bytes
def f(a):g=lambda s,S:s not in`set(a)-{S}`[3:]and min(s,g(s[1:],S),g(s[:-1],S),key=len)or S;return[g(s,s)for s in a]
JavaScript (ES6), 93 bytes
a=>a.map(s=>(L=s.length,g=n=>a.every(S=>S==s|!~S.search(u=s.substr(n%L,n/L+1)))?u:g(n+1))(0))
How?
For each string s of length L in the input array a[ ] and starting with n = 0, we use the recursive function g() to generate all substrings u of s with:
u = s.substr(n % L, n / L + 1)
For instance, with s = "abc" and L = 3:
n | n%L | floor(n/L+1) | u
---+-----+--------------+-------
0 | 0 | 1 | "a"
1 | 1 | 1 | "b"
2 | 2 | 1 | "c"
3 | 0 | 2 | "ab"
4 | 1 | 2 | "bc"
5 | 2 | 2 | "c"
6 | 0 | 3 | "abc"
7 | 1 | 3 | "bc"
8 | 2 | 3 | "c"
Some substrings are generated several times, but it doesn't matter. What's important is that all substrings of length N have been generated before any substring of length N+1.
We stop the process as soon as u cannot be found in any other string S in a[ ], which is guaranteed to happen when u == s in the worst case, as per challenge rule #2:
no string in the list will be a substring of any of the other strings
Therefore, in the above example, steps 7 and 8 will actually never be processed.
PowerShell, 107 bytes
($a=$args)|%{$(for($i=0;$i++-lt($g=($s=$_)|% Le*)){0..($g-$i)|%{$s|% s*g $_ $i}|?{!($a-match$_-ne$s)}})[0]}
Explanation
For each string supplied (and assign the whole array to $a
):
- Do a
for
loop over each substring length (1 based) of the string (assigning the string itself to$s
and the length to$g
) - For each length (
$i
):- Make an index loop, from 0 to to length -
$i
, then for each index:- Get the substring of the current string (
$s
) at position$_
(index) and of length$i
- Pass that substring to
Where-Object
(?
), and return it if:- The subset of array (
$a
) that does not contain the current string$s
, does not have a match for the current substring$_
- The subset of array (
- Get the substring of the current string (
- Make an index loop, from 0 to to length -
Back at the string level, we have all of the substrings of this string that were not found in the others, so take the first one [0]
since we only need one of them, then continue with the next string.
C# (Visual C# Interactive Compiler), 149 bytes
a=>a.Select(s=>{var t=s;for(int j=0,k,l=s.Length;j++<l;)for(k=-1;j+k++<l;)if(!a.Where(u=>s!=u&u.Contains(t=s.Substring(k,j))).Any())j=k=l;return t;})
Less golfed...
// a is an input array of strings
a=>
// iterate over input array
a.Select(s=>{
// t is the result string
var t=s;
// j is the substring length
for(int j=0,k,l=s.Length;j++<l;)
// k is the start index
for(k=-1;j+k++<l;)
// LINQ query to check if substring is valid
// the tested string is collected in t
if(!a.Where(u=>s!=u&u.Contains(t=s.Substring(k,j))).Any())
// break loops
j=k=l;
// return result
return t;
})
""
(empty string) uniquely identifying for the single"ppcg"
case? \$\endgroup\$