The following code is my solution to a LeetCode question - find the longest palindromic substring. My code is 100% correct, it passed once but it took too long, and in most of the reruns I hit a "time limit exceeded" error.
I would appreciate any suggestions on why it's taking too long and how to improve the performance.
class Solution:
def longestPalindrome(self, s: str) -> str:
n = len(s)
# edge cases
if n == 1:
return s
dp = [[0 for _ in range(n)] for _ in range(n)]
for i in range(n):
dp[i][i] = 1
substring = s[0]
# my dp is bottom-up and only works with the matrix cells to the right of the
# diagonal. I start from the very last row.
for row in range(n-2,-1,-1):
for col in range(row+1, n):
if s[row] == s[col]:
if col - row == 1 or dp[row+1][col-1]:
dp[row][col] = 1
if len(s[row:col+1]) > len(substring):
substring = s[row:col+1]
return substring
This is the last version of my code. Previous versions had slightly different implementations, for example:
I assigned
dp[i][i] = 1
while initializing the dp. It looked like this:dp = [[1 if (c==r) else 0 for c in range(n)] for r in range(n)]
.But thinking about this at the assembly level, it was adding an extra few lines for the conditional for every iteration, which adds unnecessary overhead.
For comparing the length of the new palindrome, I used a
maxLen
variable, and then replaced it with what I have now: I thought that finding the length of a string (which is readily available) might take less time than incrementing the variablemaxLen
. Although I'm not sure if that's correct.
2 Answers 2
Beware of the cost of string slicing
This piece of code creates an unnecessary string slice:
if len(s[row:col+1]) > len(substring): substring = s[row:col+1]
To create a string slice, Python has to allocate memory for the slice and copy memory content. This is equivalent without an unnecessary string slice:
if col + 1 - row > len(substring):
substring = s[row:col+1]
This change alone seems to improve the performance by more than 10%, greatly increasing the success rate.
Related to this, creating substring
is also unnecessary.
Although in this case slices are only created when a longer substring is found,
if this happens many times across all the test cases,
the performance difference can be noticeable.
Instead of create the longest substring using a slice, you could track the start and end points:
longest_start = 0
longest_end = 1
for row in range(n-2, -1, -1):
for col in range(row+1, n):
if s[row] == s[col]:
if col - row == 1 or dp[row+1][col-1]:
dp[row][col] = 1
if col + 1 - row > longest_end - longest_start:
longest_start = row
longest_end = col + 1
return s[longest_start:longest_end]
This seems to boosts the performance by another 10%+.
Minor things
A simpler way to initialize dp
:
dp = [[0] * n for _ in range(n)]
And the special treatment for the edge case of n == 1
is unnecessary,
the implementation handles that case correctly.
Variable naming
I'd like to recommend the book Clean code by Robert C. Martin (Amazon link), which gives very concrete hints about variable naming. Most of your variable names don't describe what that variable means. For example:
dp
describes the algorithm, not what the values in the matrix mean; maybelongest_palindrome
is a bit long, but describes it better,row
andcol
are clearly indices, but don't describe what the rows and columns mean;starting_pos
andpalindrome_length
describes better what is does (assuming this is the correct meaning ofrow
andcol
; I'm not sure of that,n
is used for the string length; so name it accordingly:len_s
Now dp[row][col]
doesn't tell me anything. While longest_palindrome[starting_pos][palindrome_length]
is a more meaningful name. Again: not sure about the row
/col
meaning, but I'm sure you get the point.
Speeding it up
Your core question is about performance. Right now you have an algorithm that takes a squared amount of time (and memory) when scaling up: O(n^2). This means that if you double the string length, you need 2*2 = 4 times the time and memory. In general O(n^2) is quite bad: takes a long time, takes a lot of memory, so you should look at improving that.
In most of these challenges, you can (or must) exploit the properties of the information in the question to create a solution. In this case, you know that a palindrome consists of a mirrored part of the same characters. A palindrome of even length has the same central characters, a palindrome of odd length has one central character. So one approach is to loop over s, and see what is the length of the palindrome at the current character. This algorithm has O(n * log(n)). The log(n) comes from checking the length of the palindrome from a given character.
In summary: another algorithm is needed to speed-up your solution.
A possible implementation is given below as a reference. This implementation works with indices instead of string slicing (see the other answer) to further speedup the algorithm.
class Solution:
def count_palindromic_chars(self, s: str, idx: int, even_length: bool) -> int:
""" Returns the number of palindromic characters in s at position idx """
len_s = len(s)
shift = 0
# Keep going until the string's end and characters are still palindrome
while (idx - shift >= 0 and idx + shift + even_length < len_s and
s[idx - shift] == s[idx + shift + even_length]):
shift += 1
return shift
def longest_palindrome(self, s: str) -> str:
""" Returns the first longest palindrome substring of s """
longest = ''
for n in range(0, len(s)):
# split after nth char / palindromes of even length
length = self.count_palindromic_chars(s, n, True)
if 2 * length > len(longest):
longest = s[n-length+1:n+length+1]
# split at nth char / palindromes of odd length
length = self.count_palindromic_chars(s, n, False)
if 2 * length - 1 > len(longest):
longest = s[n-length+1:n+length]
return longest
testsuite = {
# palindromes with odd length
'abcdcth': 'cdc',
'abacdfg': 'aba',
'abcdefe': 'efe',
# palindromes with even length
'abbacdefg': 'abba',
'abcdeffedcklm': 'cdeffedc',
# even length; entire string is palindrome
'abcddcba': 'abcddcba',
# all the same characters
'a': 'a',
'aa': 'aa',
'aaa': 'aaa',
'aaaa': 'aaaa',
# start with different character than all the same characters
'ba': 'b', # first palindrome is 'b', not 'a'
'baa': 'aa',
'baaa': 'aaa',
'baaaa': 'aaaa',
# all the same characters and end with different character
'aab': 'aa',
'aaab': 'aaa',
# palindrome of length 1
'abcdef': 'a',
# two palindromes in one string (last one longer)
'abcbdefghhgfekl': 'efghhgfe',
'abcbdefedh': 'defed'
}
s = Solution()
for case, result in testsuite.items():
observed = s.longest_palindrome(case)
print(f"{case} -> {observed}; "
f"this is{' NOT' if observed != result else ''} correct")
Explore related questions
See similar questions with these tags.