I was working on a problem set and I came across this problem:
Assume
s
is a string of lower case characters.Write a program that prints the longest substring of
s
in which the letters occur in alphabetical order. For example, ifs = 'azcbobobegghakl'
, then your program should print:Longest substring in alphabetical order is: beggh
In the case of ties, print the first substring. For example, if
s = 'abcbcd'
, then your program should printLongest substring in alphabetical order is: abc
I actually was able to get a solution, but it was the most inefficient ever! And here's what I came up with:
def test():
index = 1
prev_index = 0
count = 0
global largest
largest = ''
test = s[prev_index]
while count < len(s):
if ord(s[index]) > ord(s[prev_index]):
test += s[index]
index += 1
prev_index += 1
elif ord(s[index]) == ord(s[prev_index]):
test += s[index]
index += 1
prev_index += 1
else:
if len(largest) < len(test):
largest = test[:]
test = s[index]
prev_index += 1
index += 1
count += 1
return largest
try:
test()
except IndexError:
pass
finally:
print largest
Can anyone give me an example of better code that produces the same output?
3 Answers 3
Your algorithm is basically fine, but it is expressed in the code clumsily.
The biggest problem is the interface of the function: it should accept a string parameter and return a string. Instead, your function takes s
from its environment and uses the global variable largest
to store the answer. Although there is a return largest
statement, it is very misleading. What actually happens is that s[index]
raises an IndexError
, so that the return largest
is never reached, and the answer is passed back using the global variable instead.
The root cause for the IndexError
is that there are too many variables, causing confusion. With each iteration through the loop, you increment index
, prev_index
, and count
. That means that those three variables could be reduced to one, which could just be called i
. (If you think about it, you'll find that count
is always one less than index
. Since the loop condition is count < len(s)
, s[index]
will go one position beyond the end of the string.)
The next observation to make is that your if
and elif
blocks are almost identical. You can combine them by using >=
for the test.
Python strings are immutable, so largest = test[:]
could just be largest = test
.
Putting everything together...
def longest_nondecreasing_substring(s):
if s is None or len(s) <= 1:
return s
longest = test = s[0]
for i in range(1, len(s)):
if ord(s[i]) >= ord(s[i - 1]):
test += s[i]
else:
if len(longest) < len(test):
longest = test
test = s[i]
if len(longest) < len(test):
longest = test
return longest
-
\$\begingroup\$ What, no docstring? \$\endgroup\$Gareth Rees– Gareth Rees2013年10月28日 10:26:22 +00:00Commented Oct 28, 2013 at 10:26
This answer doesn't have docstrings (Python isn't my main language), but the basic idea is to just start at every character and see how far you can go:
def longest_nondecreasing(s):
start = -1
maxlen = -1
for i in range(len(s)):
for j in range(i+1, len(s)):
if s[j] < s[i]:
break
else: # we made it to the end!
j = len(s)
#so now our nondecreasing string goes from i to j
if j - i > maxlen:
maxlen = j - i
start = i
return s[start:start + maxlen]
Once we have a working solution, we can work on making some optimizations. For instance, if we have one contiguous non-decreasing sequence from say... 4 to 9, we definitely know that the longest non-decreasing sequence starting at 5 will also end at 9. Hence after the j
loop we can just set i
to j
.
-
\$\begingroup\$ The Python tutorial explains how to write documentation strings. It's not hard! \$\endgroup\$Gareth Rees– Gareth Rees2013年10月28日 19:28:57 +00:00Commented Oct 28, 2013 at 19:28
It's the famous Longest increasing subsequence problem. Convert the chars to ints using the ord function
def longest_increasing_subsequence(X):
"""
Find and return longest increasing subsequence of S.
If multiple increasing subsequences exist, the one that ends
with the smallest value is preferred, and if multiple
occurrences of that value can end the sequence, then the
earliest occurrence is preferred.
"""
n = len(X)
X = [None] + X # Pad sequence so that it starts at X[1]
M = [None]*(n+1) # Allocate arrays for M and P
P = [None]*(n+1)
L = 0
for i in range(1,n+1):
if L == 0 or X[M[1]] >= X[i]:
# there is no j s.t. X[M[j]] < X[i]]
j = 0
else:
# binary search for the largest j s.t. X[M[j]] < X[i]]
lo = 1 # largest value known to be <= j
hi = L+1 # smallest value known to be > j
while lo < hi - 1:
mid = (lo + hi)//2
if X[M[mid]] < X[i]:
lo = mid
else:
hi = mid
j = lo
P[i] = M[j]
if j == L or X[i] < X[M[j+1]]:
M[j+1] = i
L = max(L,j+1)
# Backtrack to find the optimal sequence in reverse order
output = []
pos = M[L]
while L > 0:
output.append(X[pos])
pos = P[pos]
L -= 1
output.reverse()
return output
def main():
TEST_STRING = 'azcbobobegghakl'
print(''.join(map(chr, longest_increasing_subsequence(map(ord, TEST_STRING)))))
if __name__ == '__main__':
main()
-
\$\begingroup\$ In the OP's problem it's clear that the sequence must be contiguous (consider the first example, where the answer is
beggh
, notabegghkl
). So your answer is solving a different problem. \$\endgroup\$Gareth Rees– Gareth Rees2013年10月26日 18:54:54 +00:00Commented Oct 26, 2013 at 18:54 -
\$\begingroup\$ The example was invisible and there aren't any word about contiguous sequence only. \$\endgroup\$cat_baxter– cat_baxter2013年10月26日 19:16:44 +00:00Commented Oct 26, 2013 at 19:16
-
\$\begingroup\$ The OP may have been poorly formatted but the example was not invisible if you read it carefully. \$\endgroup\$Gareth Rees– Gareth Rees2013年10月26日 19:20:53 +00:00Commented Oct 26, 2013 at 19:20
Explore related questions
See similar questions with these tags.