I've written a short python script that attempts to solve the problem of finding the longest common substring using the dynamic programming technique. It is meant to be generalised so I could plug in any number of strings and it would find the longest common substring.
def longest_common_substring(*strings):
table = defaultdict(int)
for pos in product(*(range(len(s)) for s in strings)):
same = len(set(s[i] for s, i in zip(strings, pos))) is 1
table[pos] = table[tuple(map(lambda n: n - 1, pos))] + 1 if same else 0
return max(table.items(), key=operator.itemgetter(1))
This works fine for a small number of short strings, but the space and time complexity absolutely blows up with longer strings.
I got this from wikipedia, and since this approach is clearly terrible for multiple longer strings (or maybe my implementation is just bad!?), I am wondering what I could do to improve it? Wikipedia also metions a generalised suffix trees... I am not familiar with them at all so would that be a better approach?
Also, if its my implementation, I'd love to know what's wrong and what I could do better in terms of space complexity.
1 Answer 1
Sorry I've not examined too closely your code to make a comment, but just considering the problem as stated and the fact you are using Py3, I would probably solve it with itertools.accumulate
, e.g.:
>>> import itertools as it
>>> import operator as op
>>> ss = ["thisishello", "dfdsishdllo", "ashsisdsdsf"]
>>> i, l = max(enumerate(it.accumulate(it.chain([0], zip(*ss)),
... lambda x, y: (x+1)*(len(set(y)) == 1))), key=op.itemgetter(1))
>>> i, l, ss[0][i-l:i]
(6, 3, 'sis')
Should work well for an arbitrary number of strings as it uses generators and doesn't create any intermediate data structures.
It does use the fact that a False
equates to 0 with len(set(y)) == 1
but if that is uncomfortable you could simply replace with 1 if len(set(y)) == 1 else 0
.
Note: I still wish that itertools.accumulate
had an initial value argument much like functools.reduce
has, would avoid the need of chain
ing the initial value to the iterable.
-
\$\begingroup\$ This is a very interesting approach, I've never really used
accumulate
much. Very interesting use case :) I'll be sure to test it out! \$\endgroup\$Pavlin– Pavlin2016年10月29日 18:06:47 +00:00Commented Oct 29, 2016 at 18:06
Explore related questions
See similar questions with these tags.