I am working on the class question find the longest common substring from 2 strings but I would like to return the longest common substring instead of just size of longest common substring.like
s1 = 'abcdex', s2 = 'xbcdey' => 'bcde'
s1 = 'abcxyx', s2 = 'xbcydey' => 'bc'
The way I do it is using dynamic programming to create a memo table and update each cell that required space \$O(n * m)\$. How can I implement the code so I can only using space as \$O(n)\$?
def longestCommonSubstring(self, s1, s2):
"""
:type s: str
:rtype: str
"""
m, n = len(s1), len(s2)
dp = [[''] * (n + 1) for _ in range(m + 1)]
max_len = ''
for i, ic in enumerate(s1):
for j, jc in enumerate(s2):
dp[i][j] = dp[i - 1][j - 1] + ic if ic == jc else ''
max_len = dp[i][j] if len(max_len) < len(dp[i][j]) else max_len
return max_len
-
3\$\begingroup\$ I have rolled-back the latest edit to your question. Please see What should I do when someone answers my question for what you can and/or should do when someone answers your question; in particular, the What should I not do section. \$\endgroup\$AJNeufeld– AJNeufeld2019年11月12日 05:25:25 +00:00Commented Nov 12, 2019 at 5:25
2 Answers 2
PEP 8
You are following most of the PEP 8 style guidelines. One that you are breaking is method names should be snake_case
; your function should be named longest_common_substring
.
Tricky Code
Your dp
matrix is properly allocated to the size m+1
by n+1
.
When you index your matrix, you access [i-1][j-1]
with \0ドル \le i \lt m\$ and \0ドル \le j \lt n\$. This means you never access the last allocated row m
or the last allocated column n
, but instead rely on accessing the -1
row and the -1
column wrapping around to reach these "unused" spaces. This is "surprising" code at best, and "crashing" code if translated to a different programming language.
It would be better to add one to the indices used to index the dp
matrix. The simplest way would be to start the i
and j
enumerations at one:
for i, ic in enumerate(s1, 1):
for j, jc in enumerate(s2, 1):
Useless else
Expand out this ... if ... else ...
statement:
max_len = dp[i][j] if len(max_len) < len(dp[i][j]) else max_len
Initially, this produces:
if len(max_len) < len(dp[i][j]):
max_len = dp[i][j]
else:
max_len = max_len
But we can immediately see the else:
clause is a no-op, and can be removed:
if len(max_len) < len(dp[i][j]):
max_len = dp[i][j]
Which reads much more clearly than the original.
From \$O(n m)\$ to \$O(n)\$ space
During the first iteration of outer loop, you only access rows -1
and 0
. During the second iteration of outer loop, you only access rows 0
and 1
. During the third iteration of outer loop, you only access rows 1
and 2
. Etc. You only need two rows of the dp
matrix!
More over, you create the 0
row from the -1
row, you create the 1
from the 0
row, you create the 2
row from the 1
row, and so on.
Do you really need to keep the dp
matrix? Or could you use a previous_row
and a current_row
? Only storing two length n
rows reduces your space to \$O(2n)\$, which is simply \$O(n)\$.
-
\$\begingroup\$ thank you for your hint, just update the question with space O(n) solution \$\endgroup\$jacobcan118– jacobcan1182019年11月12日 05:19:46 +00:00Commented Nov 12, 2019 at 5:19
-
4\$\begingroup\$ And I've just rolled-back your update. Editing your question after it has been answered invalidates answers. Adding new code creates confusion (are the answers referring to the old code, the new code, both?). If you want to post your updated code for additional review, it must be posted as a new question. \$\endgroup\$AJNeufeld– AJNeufeld2019年11月12日 05:27:46 +00:00Commented Nov 12, 2019 at 5:27
You don't specify what n
is, so I'm going to assume that n
is the total combined length of the two strings s1
and s2
that are input.
Imagine you have a hashing function h(s)
that computes a fixed-size hash value for a given string s
. That is, given a string s = "abc"
the function h(s)
will return some number X
that is computed from s
, so that changing the contents of s
will change the returned value X
.
If you keep an array of hash values for the substrings of a given length k
for each starting position i
, you will have a total size of sizeof(h(s1)) * (n - 2k + 2)
which is easily \$O(n)\$.
So, start from the longest possible substring and work your way down, computing the hashes of all substrings then checking to see if the two strings share a common substring. If two hashes are equal you can compare the resulting substrings and return if you find a match. If you get to k==1
and find no match, then there's no match to be had.
With that outline, I'll point out that python provides a built-in called hash()
that does what you want.
-
2\$\begingroup\$ This is not a review of the OP's code, but an alternate solution. \$\endgroup\$AJNeufeld– AJNeufeld2019年11月12日 04:40:52 +00:00Commented Nov 12, 2019 at 4:40
Explore related questions
See similar questions with these tags.