Longest common substring using dynamic programming

Question 1

I've written a short python script that attempts to solve the problem of finding the longest common substring using the dynamic programming technique. It is meant to be generalised so I could plug in any number of strings and it would find the longest common substring.

def longest_common_substring(*strings):
 table = defaultdict(int)
 for pos in product(*(range(len(s)) for s in strings)):
 same = len(set(s[i] for s, i in zip(strings, pos))) is 1
 table[pos] = table[tuple(map(lambda n: n - 1, pos))] + 1 if same else 0
 return max(table.items(), key=operator.itemgetter(1))

This works fine for a small number of short strings, but the space and time complexity absolutely blows up with longer strings.

I got this from wikipedia, and since this approach is clearly terrible for multiple longer strings (or maybe my implementation is just bad!?), I am wondering what I could do to improve it? Wikipedia also metions a generalised suffix trees... I am not familiar with them at all so would that be a better approach?

Also, if its my implementation, I'd love to know what's wrong and what I could do better in terms of space complexity.

Question 2

Sorry I've not examined too closely your code to make a comment, but just considering the problem as stated and the fact you are using Py3, I would probably solve it with itertools.accumulate, e.g.:

>>> import itertools as it
>>> import operator as op
>>> ss = ["thisishello", "dfdsishdllo", "ashsisdsdsf"]
>>> i, l = max(enumerate(it.accumulate(it.chain([0], zip(*ss)),
... lambda x, y: (x+1)*(len(set(y)) == 1))), key=op.itemgetter(1))
>>> i, l, ss[0][i-l:i]
(6, 3, 'sis')

Should work well for an arbitrary number of strings as it uses generators and doesn't create any intermediate data structures.
It does use the fact that a False equates to 0 with len(set(y)) == 1 but if that is uncomfortable you could simply replace with 1 if len(set(y)) == 1 else 0.

Note: I still wish that itertools.accumulate had an initial value argument much like functools.reduce has, would avoid the need of chaining the initial value to the iterable.

Question 3

This is a very interesting approach, I've never really used accumulate much. Very interesting use case :) I'll be sure to test it out!

AChampion AChampion 4962 silver badges9 bronze badges · Answer 1 · 2016-10-29 15:08:39Z

Sorry I've not examined too closely your code to make a comment, but just considering the problem as stated and the fact you are using Py3, I would probably solve it with itertools.accumulate, e.g.:

>>> import itertools as it
>>> import operator as op
>>> ss = ["thisishello", "dfdsishdllo", "ashsisdsdsf"]
>>> i, l = max(enumerate(it.accumulate(it.chain([0], zip(*ss)),
... lambda x, y: (x+1)*(len(set(y)) == 1))), key=op.itemgetter(1))
>>> i, l, ss[0][i-l:i]
(6, 3, 'sis')

Should work well for an arbitrary number of strings as it uses generators and doesn't create any intermediate data structures.
It does use the fact that a False equates to 0 with len(set(y)) == 1 but if that is uncomfortable you could simply replace with 1 if len(set(y)) == 1 else 0.

Note: I still wish that itertools.accumulate had an initial value argument much like functools.reduce has, would avoid the need of chaining the initial value to the iterable.

This is a very interesting approach, I've never really used accumulate much. Very interesting use case :) I'll be sure to test it out!

Stack Exchange Network

Longest common substring using dynamic programming

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Longest common substring using dynamic programming

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions