Finding the longest substring in a string

Question 1

I am writing Python code to find the longest substring in a string which is an alphabetical order. I tried to minimize the number of loops to a single for loop.

Any improvements would be appreciated. Any details about the complexity of this algorithm would also be welcome, because I don't know Big-O notation all too well.

substr = '';
final_substr = '';
for index, char in enumerate(s):
 if index < len(s) - 1:
 #print('Comparing %s and %s'% (char, s[index+1]));
 if ord(char) <= ord(s[index + 1]):
 substr += char;
 print(substr);
 else:
 substr += char;
 if len(final_substr) < len(substr):
 final_substr = substr;
 substr = '';
print('Longest substring in alphabetical order is: '+final_substr);

Question 2

Double bug

One bug is that if the longest non-decreasing substring is at the end, it will be ignored.

A second bug is that the fix more complicated than adding this at the end:

if len(final_substr) < len(substr):
 final_substr = substr

This is not enough, because the last character will not have been appended yet.

The fix is not very pretty:

for index, char in enumerate(s):
 if index < len(s) - 1:
 if ord(char) <= ord(s[index + 1]):
 substr += char
 else:
 substr += char
 if len(final_substr) < len(substr):
 final_substr = substr
 substr = ''
 else:
 if index > 0 and s[index - 1] < char:
 substr += char
 if len(final_substr) < len(substr):
 final_substr = substr

The forced `enumerate`

enumerate is great. In many situations it gives you index and element pairs of iterables, which is truly awesome. But in this case it's just not a good fit:

for index, char in enumerate(s):
 if index < len(s) - 1:

For each character there is a check on length twice: once in the mechanism of enumerate itself, and one more time inside the loop.

I suggest to rewrite with either for pos in range(len(s) - 1) or for pos in range(1, len(s)).

Actually, even better, as @kyrill suggested, you could do for i, c in enumerate(s[:-1]):.

Don't repeat yourself

substr += char appears in both branches of this conditional, it can be easily lifted out:

if ord(char) <= ord(s[index + 1]):
 substr += char
else:
 substr += char
 if len(final_substr) < len(substr):
 final_substr = substr
 substr = ''

Compare characters directly

There's no need for ord. You can compare characters directly, for example:

if char > s[index + 1]:

Question 3

Ad enumerate – you probably meant range(len(s) - 1). A better alternative would be for i, c in enumerate(s[:-1]):.

Question 4

@kyrill thanks, yeah, that's what I meant, and great tip! (added to my answer)

Question 5

The algorithm has linear (\$O(n)\$) time complexity, which is good; you can't do better. It also has a linear space complexity, which is not so good; only a constant space is really required. Notice that you don't need to build substr (substr consumes memory!). It suffices to keep track of the start and end indices.
The test for index < len(s) - 1 is un-pythonic. Don't ask permission, ask forgiveness.
The intention to

minimize the number of loops

is dubious. Does it improve run time? Most likely, no (if you doubt, measure). Does it improve readability? Most likely, no. Consider the pseudocode
```
 while start_index < string_length:
 length = scan_ordered_substring(string, start_index)
 do_business_logic
 start_index += length
```
I don't think that
```
 print(substr);
```
was intended.

Question 6

"has linear time complexity" - why is that? Doesn't string concatenation have O(n+m), making the whole loop O(n^2)

Question 7

You're right, print(substr) was not intended. I left it in there by accident. Thanks for the input!

Question 8

You mentioned the algorithm complexity and the peace of code you wrote shows you love good coding ... I just wish you mentioned that if problem

Question 9

Don't bother with `ord`

Python already provide syntactic sugar for comparing characters in codepoint order, so ord(char) <= ord(s[index + 1]) can be shortened to char <= s[index+1]

Don't bother with the index

You only use index as a mean to look at the next character in s, you can dispense with it if you do

substr = s[0]
for char in s[1:]:
 if substr[-1] <= char:
 substr += char
 else:
 # Do your things

Don't use a string as an accumulator

Making substr a string is (probably) not the most efficient way to do it, as strings are immutable and you are constantly modifying it, better to make it a list of chars and join it only when needed

final_substr = []
substr = [s[0]]
for char in s[1:]:
 if substr[-1] <= char:
 substr.append(char)
 else:
 if len(substr) > len(final_substr):
 final_substr = substr
 substr = [char]
if len(substr) > len(final_substr):
 final_substr = substr
final_substr = ''.join(final_substr)

Extra fanciness

in the code above, the string slicing s[1:] copies s, which might be an issue if you have to apply this procedure to a lot of very long strings. You can avoid that copy by using an iterator over s, changing the lines above to

s_iter = iter(s)
final_substr = []
substr = [next(s_iter)]
for char in s_iter:
 # Nothing changes after this line

Or you could be more pedestrian and iterate over range(len(s)).

In the same vein, if you expect to have to deal with long substrings, you could transform everything to keep track of only the bounds of substr

final_bounds = [0, 1]
substr_bounds = [0, 1]
for i in range(1, len(s)):
 if s[i-1] <= s[i]:
 substr_bounds[1] += 1
 else:
 if final_bounds[1] - final_bounds[0] < substr_bounds[1] - substr_bounds[0]:
 final_bounds = substr
 substr_bounds = (i, i)
if final_bounds[1] - final_bounds[0] < substr_bounds[1] - substr_bounds[0]:
 final_bounds = substr
final_substr = s[final_bounds[0]:final_bounds[1]]

This version should be the most efficient of all memory-wise. I do find it disgraceful, though.

Question 10

Python does not require semicolons as a terminator. You should refrain from using them.

Question 11

The only reason I always make sure to use semicolons is because I've heard its a good habit to maintain across programming languages, and is something I picked up from JavaScript. Is using a semicolon not considered pythonic?

Question 12

@BillalBEGUERADJ this is a perfectly legit answer. Please read codereview.meta.stackexchange.com/questions/1463/…

Question 13

@BillalBEGUERADJ yes it says that. And then? I don't see the connection between that and this answer. Are you sure you didn't quote out of context? I think that post overall means that this answer is fine, do you disagree?

Question 14

@ZaidHumayun unnecessary semicolons are decidedly unpythonic. And the advice to use semicolons across programming languages is a very bad advice that should not be followed.

Question 15

Yes, I disagree. I can not write remove semicolon or remove space as an answer. Such things could be written either as a comment or mentioned in an elaborated answer. I think all SE websites look for good quality posts. @janos

janos janos 113k15 gold badges154 silver badges396 bronze badges · Accepted Answer · 2017-06-02 05:18:32Z

Double bug

One bug is that if the longest non-decreasing substring is at the end, it will be ignored.

A second bug is that the fix more complicated than adding this at the end:

if len(final_substr) < len(substr):
 final_substr = substr

This is not enough, because the last character will not have been appended yet.

The fix is not very pretty:

for index, char in enumerate(s):
 if index < len(s) - 1:
 if ord(char) <= ord(s[index + 1]):
 substr += char
 else:
 substr += char
 if len(final_substr) < len(substr):
 final_substr = substr
 substr = ''
 else:
 if index > 0 and s[index - 1] < char:
 substr += char
 if len(final_substr) < len(substr):
 final_substr = substr

The forced `enumerate`

enumerate is great. In many situations it gives you index and element pairs of iterables, which is truly awesome. But in this case it's just not a good fit:

for index, char in enumerate(s):
 if index < len(s) - 1:

For each character there is a check on length twice: once in the mechanism of enumerate itself, and one more time inside the loop.

I suggest to rewrite with either for pos in range(len(s) - 1) or for pos in range(1, len(s)).

Actually, even better, as @kyrill suggested, you could do for i, c in enumerate(s[:-1]):.

Don't repeat yourself

substr += char appears in both branches of this conditional, it can be easily lifted out:

if ord(char) <= ord(s[index + 1]):
 substr += char
else:
 substr += char
 if len(final_substr) < len(substr):
 final_substr = substr
 substr = ''

Compare characters directly

There's no need for ord. You can compare characters directly, for example:

if char > s[index + 1]:

Ad enumerate – you probably meant range(len(s) - 1). A better alternative would be for i, c in enumerate(s[:-1]):.
@kyrill thanks, yeah, that's what I meant, and great tip! (added to my answer)

Stack Exchange Network

Finding the longest substring in a string

4 Answers 4

Double bug

The forced `enumerate`

Don't repeat yourself

Compare characters directly

Don't bother with `ord`

Don't bother with the index

Don't use a string as an accumulator

Extra fanciness

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding the longest substring in a string

4 Answers 4

Double bug

The forced enumerate

Don't repeat yourself

Compare characters directly

Don't bother with ord

Don't bother with the index

Don't use a string as an accumulator

Extra fanciness

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

The forced `enumerate`

Don't bother with `ord`