Making my DP algorithm faster - longest palindromic substring

Question 1

The following code is my solution to a LeetCode question - find the longest palindromic substring. My code is 100% correct, it passed once but it took too long, and in most of the reruns I hit a "time limit exceeded" error.

I would appreciate any suggestions on why it's taking too long and how to improve the performance.

class Solution:
 def longestPalindrome(self, s: str) -> str:
 n = len(s)
 # edge cases
 if n == 1:
 return s
 dp = [[0 for _ in range(n)] for _ in range(n)]
 for i in range(n):
 dp[i][i] = 1
 substring = s[0]
 
 # my dp is bottom-up and only works with the matrix cells to the right of the 
 # diagonal. I start from the very last row.
 
 for row in range(n-2,-1,-1):
 for col in range(row+1, n):
 if s[row] == s[col]:
 if col - row == 1 or dp[row+1][col-1]:
 dp[row][col] = 1
 if len(s[row:col+1]) > len(substring):
 substring = s[row:col+1]
 
 
 return substring

This is the last version of my code. Previous versions had slightly different implementations, for example:

I assigned dp[i][i] = 1 while initializing the dp. It looked like this:

dp = [[1 if (c==r) else 0 for c in range(n)] for r in range(n)].

But thinking about this at the assembly level, it was adding an extra few lines for the conditional for every iteration, which adds unnecessary overhead.
For comparing the length of the new palindrome, I used a maxLen variable, and then replaced it with what I have now: I thought that finding the length of a string (which is readily available) might take less time than incrementing the variable maxLen. Although I'm not sure if that's correct.

Question 2

Beware of the cost of string slicing

This piece of code creates an unnecessary string slice:

if len(s[row:col+1]) > len(substring):
 substring = s[row:col+1]

To create a string slice, Python has to allocate memory for the slice and copy memory content. This is equivalent without an unnecessary string slice:

if col + 1 - row > len(substring):
 substring = s[row:col+1]

This change alone seems to improve the performance by more than 10%, greatly increasing the success rate.

Related to this, creating substring is also unnecessary. Although in this case slices are only created when a longer substring is found, if this happens many times across all the test cases, the performance difference can be noticeable.

Instead of create the longest substring using a slice, you could track the start and end points:

longest_start = 0
longest_end = 1
for row in range(n-2, -1, -1):
 for col in range(row+1, n):
 if s[row] == s[col]:
 if col - row == 1 or dp[row+1][col-1]:
 dp[row][col] = 1
 if col + 1 - row > longest_end - longest_start:
 longest_start = row
 longest_end = col + 1
return s[longest_start:longest_end]

This seems to boosts the performance by another 10%+.

Minor things

A simpler way to initialize dp:

dp = [[0] * n for _ in range(n)]

And the special treatment for the edge case of n == 1 is unnecessary, the implementation handles that case correctly.

Question 3

Variable naming

I'd like to recommend the book Clean code by Robert C. Martin (Amazon link), which gives very concrete hints about variable naming. Most of your variable names don't describe what that variable means. For example:

dp describes the algorithm, not what the values in the matrix mean; maybe longest_palindrome is a bit long, but describes it better,
row and col are clearly indices, but don't describe what the rows and columns mean; starting_pos and palindrome_length describes better what is does (assuming this is the correct meaning of row and col; I'm not sure of that,
n is used for the string length; so name it accordingly: len_s

Now dp[row][col] doesn't tell me anything. While longest_palindrome[starting_pos][palindrome_length] is a more meaningful name. Again: not sure about the row/col meaning, but I'm sure you get the point.

Speeding it up

Your core question is about performance. Right now you have an algorithm that takes a squared amount of time (and memory) when scaling up: O(n^2). This means that if you double the string length, you need 2*2 = 4 times the time and memory. In general O(n^2) is quite bad: takes a long time, takes a lot of memory, so you should look at improving that.

In most of these challenges, you can (or must) exploit the properties of the information in the question to create a solution. In this case, you know that a palindrome consists of a mirrored part of the same characters. A palindrome of even length has the same central characters, a palindrome of odd length has one central character. So one approach is to loop over s, and see what is the length of the palindrome at the current character. This algorithm has O(n * log(n)). The log(n) comes from checking the length of the palindrome from a given character.

In summary: another algorithm is needed to speed-up your solution.

A possible implementation is given below as a reference. This implementation works with indices instead of string slicing (see the other answer) to further speedup the algorithm.

class Solution:
 def count_palindromic_chars(self, s: str, idx: int, even_length: bool) -> int:
 """ Returns the number of palindromic characters in s at position idx """
 len_s = len(s)
 shift = 0
 # Keep going until the string's end and characters are still palindrome
 while (idx - shift >= 0 and idx + shift + even_length < len_s and
 s[idx - shift] == s[idx + shift + even_length]):
 shift += 1
 return shift
 def longest_palindrome(self, s: str) -> str:
 """ Returns the first longest palindrome substring of s """
 longest = ''
 for n in range(0, len(s)):
 # split after nth char / palindromes of even length
 length = self.count_palindromic_chars(s, n, True)
 if 2 * length > len(longest):
 longest = s[n-length+1:n+length+1]
 # split at nth char / palindromes of odd length
 length = self.count_palindromic_chars(s, n, False)
 if 2 * length - 1 > len(longest):
 longest = s[n-length+1:n+length]
 return longest
testsuite = {
 # palindromes with odd length
 'abcdcth': 'cdc',
 'abacdfg': 'aba',
 'abcdefe': 'efe',
 # palindromes with even length
 'abbacdefg': 'abba',
 'abcdeffedcklm': 'cdeffedc',
 # even length; entire string is palindrome
 'abcddcba': 'abcddcba',
 # all the same characters
 'a': 'a',
 'aa': 'aa',
 'aaa': 'aaa',
 'aaaa': 'aaaa',
 # start with different character than all the same characters
 'ba': 'b', # first palindrome is 'b', not 'a'
 'baa': 'aa',
 'baaa': 'aaa',
 'baaaa': 'aaaa',
 # all the same characters and end with different character
 'aab': 'aa',
 'aaab': 'aaa',
 # palindrome of length 1
 'abcdef': 'a',
 # two palindromes in one string (last one longer)
 'abcbdefghhgfekl': 'efghhgfe',
 'abcbdefedh': 'defed'
}
s = Solution()
for case, result in testsuite.items():
 observed = s.longest_palindrome(case)
 print(f"{case} -> {observed}; "
 f"this is{' NOT' if observed != result else ''} correct")

Try it online!

janos janos 113k15 gold badges154 silver badges396 bronze badges · Accepted Answer · 2022-10-10 19:12:00Z

Beware of the cost of string slicing

This piece of code creates an unnecessary string slice:

if len(s[row:col+1]) > len(substring):
 substring = s[row:col+1]

To create a string slice, Python has to allocate memory for the slice and copy memory content. This is equivalent without an unnecessary string slice:

if col + 1 - row > len(substring):
 substring = s[row:col+1]

This change alone seems to improve the performance by more than 10%, greatly increasing the success rate.

Related to this, creating substring is also unnecessary. Although in this case slices are only created when a longer substring is found, if this happens many times across all the test cases, the performance difference can be noticeable.

Instead of create the longest substring using a slice, you could track the start and end points:

longest_start = 0
longest_end = 1
for row in range(n-2, -1, -1):
 for col in range(row+1, n):
 if s[row] == s[col]:
 if col - row == 1 or dp[row+1][col-1]:
 dp[row][col] = 1
 if col + 1 - row > longest_end - longest_start:
 longest_start = row
 longest_end = col + 1
return s[longest_start:longest_end]

This seems to boosts the performance by another 10%+.

Minor things

A simpler way to initialize dp:

dp = [[0] * n for _ in range(n)]

And the special treatment for the edge case of n == 1 is unnecessary, the implementation handles that case correctly.

Stack Exchange Network

Making my DP algorithm faster - longest palindromic substring

2 Answers 2

Beware of the cost of string slicing

Minor things

Variable naming

Speeding it up

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions