Improving efficiency for finding longest contiguous non-decreasing substring

Question 1

I was working on a problem set and I came across this problem:

Assume s is a string of lower case characters.

Write a program that prints the longest substring of s in which the letters occur in alphabetical order. For example, if s = 'azcbobobegghakl', then your program should print:
Longest substring in alphabetical order is: beggh
In the case of ties, print the first substring. For example, if s = 'abcbcd', then your program should print
Longest substring in alphabetical order is: abc

I actually was able to get a solution, but it was the most inefficient ever! And here's what I came up with:

def test():
 index = 1
 prev_index = 0
 count = 0
 global largest
 largest = ''
 test = s[prev_index]
 while count < len(s):
 if ord(s[index]) > ord(s[prev_index]):
 test += s[index]
 index += 1
 prev_index += 1
 elif ord(s[index]) == ord(s[prev_index]):
 test += s[index]
 index += 1
 prev_index += 1
 else:
 if len(largest) < len(test):
 largest = test[:]
 test = s[index]
 prev_index += 1
 index += 1
 count += 1
 return largest
try:
 test()
except IndexError:
 pass
finally:
 print largest

Can anyone give me an example of better code that produces the same output?

Question 2

Your algorithm is basically fine, but it is expressed in the code clumsily.

The biggest problem is the interface of the function: it should accept a string parameter and return a string. Instead, your function takes s from its environment and uses the global variable largest to store the answer. Although there is a return largest statement, it is very misleading. What actually happens is that s[index] raises an IndexError, so that the return largest is never reached, and the answer is passed back using the global variable instead.

The root cause for the IndexError is that there are too many variables, causing confusion. With each iteration through the loop, you increment index, prev_index, and count. That means that those three variables could be reduced to one, which could just be called i. (If you think about it, you'll find that count is always one less than index. Since the loop condition is count < len(s), s[index] will go one position beyond the end of the string.)

The next observation to make is that your if and elif blocks are almost identical. You can combine them by using >= for the test.

Python strings are immutable, so largest = test[:] could just be largest = test.

Putting everything together...

def longest_nondecreasing_substring(s):
 if s is None or len(s) <= 1:
 return s
 longest = test = s[0]
 for i in range(1, len(s)):
 if ord(s[i]) >= ord(s[i - 1]):
 test += s[i]
 else:
 if len(longest) < len(test):
 longest = test
 test = s[i]
 if len(longest) < len(test):
 longest = test
 return longest

Question 3

What, no docstring?

Question 4

This answer doesn't have docstrings (Python isn't my main language), but the basic idea is to just start at every character and see how far you can go:

def longest_nondecreasing(s):
 start = -1
 maxlen = -1
 for i in range(len(s)): 
 for j in range(i+1, len(s)):
 if s[j] < s[i]: 
 break
 else: # we made it to the end!
 j = len(s)
 #so now our nondecreasing string goes from i to j
 if j - i > maxlen:
 maxlen = j - i
 start = i
 return s[start:start + maxlen]

Once we have a working solution, we can work on making some optimizations. For instance, if we have one contiguous non-decreasing sequence from say... 4 to 9, we definitely know that the longest non-decreasing sequence starting at 5 will also end at 9. Hence after the j loop we can just set i to j.

Question 5

The Python tutorial explains how to write documentation strings. It's not hard!

Question 6

It's the famous Longest increasing subsequence problem. Convert the chars to ints using the ord function

def longest_increasing_subsequence(X):
 """
 Find and return longest increasing subsequence of S.
 If multiple increasing subsequences exist, the one that ends
 with the smallest value is preferred, and if multiple
 occurrences of that value can end the sequence, then the
 earliest occurrence is preferred.
 """
 n = len(X)
 X = [None] + X # Pad sequence so that it starts at X[1]
 M = [None]*(n+1) # Allocate arrays for M and P
 P = [None]*(n+1)
 L = 0
 for i in range(1,n+1):
 if L == 0 or X[M[1]] >= X[i]:
 # there is no j s.t. X[M[j]] < X[i]]
 j = 0
 else:
 # binary search for the largest j s.t. X[M[j]] < X[i]]
 lo = 1 # largest value known to be <= j
 hi = L+1 # smallest value known to be > j
 while lo < hi - 1:
 mid = (lo + hi)//2
 if X[M[mid]] < X[i]:
 lo = mid
 else:
 hi = mid
 j = lo
 P[i] = M[j]
 if j == L or X[i] < X[M[j+1]]:
 M[j+1] = i
 L = max(L,j+1)
 # Backtrack to find the optimal sequence in reverse order
 output = []
 pos = M[L]
 while L > 0:
 output.append(X[pos])
 pos = P[pos]
 L -= 1
 output.reverse()
 return output
def main():
 TEST_STRING = 'azcbobobegghakl'
 print(''.join(map(chr, longest_increasing_subsequence(map(ord, TEST_STRING)))))
if __name__ == '__main__':
 main()

Question 7

In the OP's problem it's clear that the sequence must be contiguous (consider the first example, where the answer is beggh, not abegghkl). So your answer is solving a different problem.

Question 8

The example was invisible and there aren't any word about contiguous sequence only.

Question 9

The OP may have been poorly formatted but the example was not invisible if you read it carefully.

200_success 200_success 146k22 gold badges190 silver badges479 bronze badges · Accepted Answer · 2013-10-26 22:16:51Z

Your algorithm is basically fine, but it is expressed in the code clumsily.

The biggest problem is the interface of the function: it should accept a string parameter and return a string. Instead, your function takes s from its environment and uses the global variable largest to store the answer. Although there is a return largest statement, it is very misleading. What actually happens is that s[index] raises an IndexError, so that the return largest is never reached, and the answer is passed back using the global variable instead.

The root cause for the IndexError is that there are too many variables, causing confusion. With each iteration through the loop, you increment index, prev_index, and count. That means that those three variables could be reduced to one, which could just be called i. (If you think about it, you'll find that count is always one less than index. Since the loop condition is count < len(s), s[index] will go one position beyond the end of the string.)

The next observation to make is that your if and elif blocks are almost identical. You can combine them by using >= for the test.

Python strings are immutable, so largest = test[:] could just be largest = test.

Putting everything together...

def longest_nondecreasing_substring(s):
 if s is None or len(s) <= 1:
 return s
 longest = test = s[0]
 for i in range(1, len(s)):
 if ord(s[i]) >= ord(s[i - 1]):
 test += s[i]
 else:
 if len(longest) < len(test):
 longest = test
 test = s[i]
 if len(longest) < len(test):
 longest = test
 return longest

\$\begingroup\$ What, no docstring? \$\endgroup\$

Gareth Rees
– Gareth Rees

2013年10月28日 10:26:22 +00:00
Commented Oct 28, 2013 at 10:26

Stack Exchange Network

Improving efficiency for finding longest contiguous non-decreasing substring

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Improving efficiency for finding longest contiguous non-decreasing substring

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions