Shortest Uniquely Identifying Substrings

Question 1

Given a list of strings, replace each string by one of its non-empty substrings which is not a substring of any of the other strings in the list and as short as possible.

Example

Given the list ["hello","hallo","hola"], "hello" should be replaced by just "e" as this substring is not contained in "hallo" and "hola" and it is as short as possible. "hallo" could be replaced by either "ha" or "al" and "hola" by any of "ho", "ol" or "la".

Rules

You can assume that the strings will be non-empty and only contain alphabetical characters of the same case.
You can assume that such a substring exists for each string in the list, i.e. no string in the list will be a substring of any of the other strings.
Input and output can be in any reasonable format.
This is code-golf, so try to use as few bytes as possible in the language of your choice.

Test Cases

Only one possible output is given for most cases.

["ppcg"] -> ["p"] (or ["c"] or ["g"])
["hello","hallo","hola"] -> ["e","ha","ho"]
["abc","bca","bac"] -> ["ab","ca","ba"]
["abc","abd","dbc"] -> ["abc","bd","db"]
["lorem","ipsum","dolor","sit","amet"] -> ["re","p","d","si","a"]
["abc","acb","bac","bca","cab","cba"] -> ["abc","acb","bac","bca","cab","cba"]

^{Related: Shortest Identifying Substring - similar idea, but more involved rules and cumbersome format.}

Question 2

Why isnt "" (empty string) uniquely identifying for the single "ppcg" case?

Question 3

@MooseBoys Given a list of strings, replace each string by one of it's non-empty substrings

Question 4

Harder related challenge

Question 5

Easier related challenge

Question 6

Jelly, (削除) 12 (削除ここまで) 9 bytes

Ẇ€ŒpẇþÞ8Ḣ

Try it online! Or see the test-suite (takes ~35s)

Question 7

Pyth, 12 bytes

mhf!ts}LTQ.:

Try it here!

How it works

Basicaly filters the substrings of each that only occur in one of the strings in the list (that is, it is unique to that string) and gets the first one.

mhf!ts}LTQ.: Full program, Q=eval(stdin_input())
m .: Map over Q and obtain all the substrings of each.
 f And filter-keep those that satisfy (var: T)...
 }LTQ ... For each string in Q, yield 1 if it contains T, else 0.
 !ts ... Sum the list, decrement and negate. 
 h Head. Yields the first valid substring, which is always the shortest.

Question 8

Prolog (SWI), (削除) 175 (削除ここまで) 163 bytes

S/L/R:-sub_string(S,_,L,_,R).
[H|T]+[I|R]:-string_length(H,L),between(1,L,X),H/X/I,T+R.
R+R.
L-R:-L+R,forall(member(E,L),findall(_,(member(F,R),\+ \+ E/_/F),[_])).

Try it online!

Most of the things here should be fairly obvious, but:

Explanation

Signatures: (+ = input, ? = optional, - = output, : = expression)

sub_string(+String, ?Before, ?Length, ?After, ?SubString)
string_length(+String, -Length)
member(?Elem, ?List)
between(+Low, +High, ?Value)
findall(+Template, :Goal, -Bag)
forall(:Cond, :Action)

\+ \+ is just not not (i.e. converts a match to boolean (in this case, prevents it from matching both ps in ppcg separately))

Question 9

The right tool for the job :P except for the fact that it's mind-blowingly verbose

Question 10

APL (Dyalog), 25 bytes

Thanks ngn for saving one byte

⊃ ̈1↓≢~⍨/(,⍨(,∘↑⍳∘≢,/ ̈⊂) ̈)

Try it online!

Question 11

J, (削除) 30 29 (削除ここまで) 25 bytes

1(|:(0{-.&,)"_1]\.)<\\.&>

Try it online!

 <\\.&> a 3-dimensional array of substrings
1 |: transpose each matrix to sort the substrings by length
1 ]\. all choices where one word is missing
 (0{-.&,)"_1 for every matrix, flatten, remove substrings
 that are present in the corresponding complement,
 pick first

Question 12

Jelly, 10 bytes

Ẇ€ṙƬ1ḟ/Ḣ$€

Try it online!

Question 13

Python 2, 116 bytes

def f(a):g=lambda s,S:s not in`set(a)-{S}`[3:]and min(s,g(s[1:],S),g(s[:-1],S),key=len)or S;return[g(s,s)for s in a]

Try it online!

Question 14

JavaScript (ES6), 93 bytes

a=>a.map(s=>(L=s.length,g=n=>a.every(S=>S==s|!~S.search(u=s.substr(n%L,n/L+1)))?u:g(n+1))(0))

Try it online!

How?

For each string s of length L in the input array a[ ] and starting with n = 0, we use the recursive function g() to generate all substrings u of s with:

u = s.substr(n % L, n / L + 1)

For instance, with s = "abc" and L = 3:

 n | n%L | floor(n/L+1) | u
---+-----+--------------+-------
 0 | 0 | 1 | "a"
 1 | 1 | 1 | "b"
 2 | 2 | 1 | "c"
 3 | 0 | 2 | "ab"
 4 | 1 | 2 | "bc"
 5 | 2 | 2 | "c"
 6 | 0 | 3 | "abc"
 7 | 1 | 3 | "bc"
 8 | 2 | 3 | "c"

Some substrings are generated several times, but it doesn't matter. What's important is that all substrings of length N have been generated before any substring of length N+1.

We stop the process as soon as u cannot be found in any other string S in a[ ], which is guaranteed to happen when u == s in the worst case, as per challenge rule #2:

no string in the list will be a substring of any of the other strings

Therefore, in the above example, steps 7 and 8 will actually never be processed.

Question 15

PowerShell, 107 bytes

($a=$args)|%{$(for($i=0;$i++-lt($g=($s=$_)|% Le*)){0..($g-$i)|%{$s|% s*g $_ $i}|?{!($a-match$_-ne$s)}})[0]}

Try it online!

Explanation

For each string supplied (and assign the whole array to $a):

Do a for loop over each substring length (1 based) of the string (assigning the string itself to $s and the length to $g)
For each length ($i):
- Make an index loop, from 0 to to length - $i, then for each index:
  - Get the substring of the current string ($s) at position $_ (index) and of length $i
  - Pass that substring to Where-Object (?), and return it if:
    - The subset of array ($a) that does not contain the current string $s, does not have a match for the current substring $_

Back at the string level, we have all of the substrings of this string that were not found in the others, so take the first one [0] since we only need one of them, then continue with the next string.

Question 16

C# (Visual C# Interactive Compiler), 149 bytes

a=>a.Select(s=>{var t=s;for(int j=0,k,l=s.Length;j++<l;)for(k=-1;j+k++<l;)if(!a.Where(u=>s!=u&u.Contains(t=s.Substring(k,j))).Any())j=k=l;return t;})

Try it online!

Less golfed...

// a is an input array of strings
a=>
 // iterate over input array 
 a.Select(s=>{
 // t is the result string
 var t=s;
 // j is the substring length
 for(int j=0,k,l=s.Length;j++<l;)
 // k is the start index
 for(k=-1;j+k++<l;)
 // LINQ query to check if substring is valid
 // the tested string is collected in t
 if(!a.Where(u=>s!=u&u.Contains(t=s.Substring(k,j))).Any())
 // break loops
 j=k=l;
 // return result
 return t;
 })

score 5 · Accepted Answer · 2018-06-02 23:13:53Z

5

\$\begingroup\$

Jelly, (削除) 12 (削除ここまで) 9 bytes

Ẇ€ŒpẇþÞ8Ḣ

Try it online! Or see the test-suite (takes ~35s)

Share

Improve this answer

edited Jun 2, 2018 at 23:41

answered Jun 2, 2018 at 23:13

Jonathan Allan's user avatar

Jonathan Allan Jonathan Allan

115k8 gold badges68 silver badges292 bronze badges

\$\endgroup\$

Add a comment |

Stack Exchange Network

Shortest Uniquely Identifying Substrings

Example

Rules

Test Cases

10 Answers 10

Jelly, (削除) 12 (削除ここまで) 9 bytes

Pyth, 12 bytes

How it works

Prolog (SWI), (削除) 175 (削除ここまで) 163 bytes

Explanation

APL (Dyalog), 25 bytes

J, (削除) 30 29 (削除ここまで) 25 bytes

Jelly, 10 bytes

Python 2, 116 bytes

JavaScript (ES6), 93 bytes

How?

PowerShell, 107 bytes

Explanation

C# (Visual C# Interactive Compiler), 149 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions