find longest common substring in space O(n)

Question 1

I am working on the class question find the longest common substring from 2 strings but I would like to return the longest common substring instead of just size of longest common substring.like

s1 = 'abcdex', s2 = 'xbcdey' => 'bcde'
s1 = 'abcxyx', s2 = 'xbcydey' => 'bc'

The way I do it is using dynamic programming to create a memo table and update each cell that required space \$O(n * m)\$. How can I implement the code so I can only using space as \$O(n)\$?

 def longestCommonSubstring(self, s1, s2):
 """
 :type s: str
 :rtype: str
 """
 m, n = len(s1), len(s2)
 dp = [[''] * (n + 1) for _ in range(m + 1)]
 max_len = ''
 for i, ic in enumerate(s1):
 for j, jc in enumerate(s2):
 dp[i][j] = dp[i - 1][j - 1] + ic if ic == jc else ''
 max_len = dp[i][j] if len(max_len) < len(dp[i][j]) else max_len
 return max_len

Question 2

I have rolled-back the latest edit to your question. Please see What should I do when someone answers my question for what you can and/or should do when someone answers your question; in particular, the What should I not do section.

Question 3

PEP 8

You are following most of the PEP 8 style guidelines. One that you are breaking is method names should be snake_case; your function should be named longest_common_substring.

Tricky Code

Your dp matrix is properly allocated to the size m+1 by n+1.

When you index your matrix, you access [i-1][j-1] with \0ドル \le i \lt m\$ and \0ドル \le j \lt n\$. This means you never access the last allocated row m or the last allocated column n, but instead rely on accessing the -1 row and the -1 column wrapping around to reach these "unused" spaces. This is "surprising" code at best, and "crashing" code if translated to a different programming language.

It would be better to add one to the indices used to index the dp matrix. The simplest way would be to start the i and j enumerations at one:

 for i, ic in enumerate(s1, 1):
 for j, jc in enumerate(s2, 1):

Useless `else`

Expand out this ... if ... else ... statement:

 max_len = dp[i][j] if len(max_len) < len(dp[i][j]) else max_len

Initially, this produces:

 if len(max_len) < len(dp[i][j]):
 max_len = dp[i][j]
 else:
 max_len = max_len

But we can immediately see the else: clause is a no-op, and can be removed:

 if len(max_len) < len(dp[i][j]):
 max_len = dp[i][j]

Which reads much more clearly than the original.

From \$O(n m)\$ to \$O(n)\$ space

During the first iteration of outer loop, you only access rows -1 and 0. During the second iteration of outer loop, you only access rows 0 and 1. During the third iteration of outer loop, you only access rows 1 and 2. Etc. You only need two rows of the dp matrix!

More over, you create the 0 row from the -1 row, you create the 1 from the 0 row, you create the 2 row from the 1 row, and so on.

Do you really need to keep the dp matrix? Or could you use a previous_row and a current_row? Only storing two length n rows reduces your space to \$O(2n)\$, which is simply \$O(n)\$.

Question 4

thank you for your hint, just update the question with space O(n) solution

Question 5

And I've just rolled-back your update. Editing your question after it has been answered invalidates answers. Adding new code creates confusion (are the answers referring to the old code, the new code, both?). If you want to post your updated code for additional review, it must be posted as a new question.

Question 6

You don't specify what n is, so I'm going to assume that n is the total combined length of the two strings s1 and s2 that are input.

Imagine you have a hashing function h(s) that computes a fixed-size hash value for a given string s. That is, given a string s = "abc" the function h(s) will return some number X that is computed from s, so that changing the contents of s will change the returned value X.

If you keep an array of hash values for the substrings of a given length k for each starting position i, you will have a total size of sizeof(h(s1)) * (n - 2k + 2) which is easily \$O(n)\$.

So, start from the longest possible substring and work your way down, computing the hashes of all substrings then checking to see if the two strings share a common substring. If two hashes are equal you can compare the resulting substrings and return if you find a match. If you get to k==1 and find no match, then there's no match to be had.

With that outline, I'll point out that python provides a built-in called hash() that does what you want.

Question 7

This is not a review of the OP's code, but an alternate solution.

AJNeufeld AJNeufeld 35.2k5 gold badges41 silver badges103 bronze badges · Answer 1 · 2019-11-12 04:33:36Z

PEP 8

You are following most of the PEP 8 style guidelines. One that you are breaking is method names should be snake_case; your function should be named longest_common_substring.

Tricky Code

Your dp matrix is properly allocated to the size m+1 by n+1.

When you index your matrix, you access [i-1][j-1] with \0ドル \le i \lt m\$ and \0ドル \le j \lt n\$. This means you never access the last allocated row m or the last allocated column n, but instead rely on accessing the -1 row and the -1 column wrapping around to reach these "unused" spaces. This is "surprising" code at best, and "crashing" code if translated to a different programming language.

It would be better to add one to the indices used to index the dp matrix. The simplest way would be to start the i and j enumerations at one:

 for i, ic in enumerate(s1, 1):
 for j, jc in enumerate(s2, 1):

Useless `else`

Expand out this ... if ... else ... statement:

 max_len = dp[i][j] if len(max_len) < len(dp[i][j]) else max_len

Initially, this produces:

 if len(max_len) < len(dp[i][j]):
 max_len = dp[i][j]
 else:
 max_len = max_len

But we can immediately see the else: clause is a no-op, and can be removed:

 if len(max_len) < len(dp[i][j]):
 max_len = dp[i][j]

Which reads much more clearly than the original.

From \$O(n m)\$ to \$O(n)\$ space

During the first iteration of outer loop, you only access rows -1 and 0. During the second iteration of outer loop, you only access rows 0 and 1. During the third iteration of outer loop, you only access rows 1 and 2. Etc. You only need two rows of the dp matrix!

More over, you create the 0 row from the -1 row, you create the 1 from the 0 row, you create the 2 row from the 1 row, and so on.

Do you really need to keep the dp matrix? Or could you use a previous_row and a current_row? Only storing two length n rows reduces your space to \$O(2n)\$, which is simply \$O(n)\$.

thank you for your hint, just update the question with space O(n) solution
And I've just rolled-back your update. Editing your question after it has been answered invalidates answers. Adding new code creates confusion (are the answers referring to the old code, the new code, both?). If you want to post your updated code for additional review, it must be posted as a new question.

aghast aghast 12.6k25 silver badges46 bronze badges · Answer 2 · 2019-11-12 03:42:29Z

You don't specify what n is, so I'm going to assume that n is the total combined length of the two strings s1 and s2 that are input.

Imagine you have a hashing function h(s) that computes a fixed-size hash value for a given string s. That is, given a string s = "abc" the function h(s) will return some number X that is computed from s, so that changing the contents of s will change the returned value X.

If you keep an array of hash values for the substrings of a given length k for each starting position i, you will have a total size of sizeof(h(s1)) * (n - 2k + 2) which is easily \$O(n)\$.

So, start from the longest possible substring and work your way down, computing the hashes of all substrings then checking to see if the two strings share a common substring. If two hashes are equal you can compare the resulting substrings and return if you find a match. If you get to k==1 and find no match, then there's no match to be had.

With that outline, I'll point out that python provides a built-in called hash() that does what you want.

This is not a review of the OP's code, but an alternate solution.

Stack Exchange Network

find longest common substring in space O(n)

2 Answers 2

PEP 8

Tricky Code

Useless `else`

From \$O(n m)\$ to \$O(n)\$ space

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

find longest common substring in space O(n)

2 Answers 2

PEP 8

Tricky Code

Useless else

From \$O(n m)\$ to \$O(n)\$ space

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Useless `else`