I am writing Python code to find the longest substring in a string which is an alphabetical order. I tried to minimize the number of loops to a single for loop.
Any improvements would be appreciated. Any details about the complexity of this algorithm would also be welcome, because I don't know Big-O notation all too well.
substr = '';
final_substr = '';
for index, char in enumerate(s):
if index < len(s) - 1:
#print('Comparing %s and %s'% (char, s[index+1]));
if ord(char) <= ord(s[index + 1]):
substr += char;
print(substr);
else:
substr += char;
if len(final_substr) < len(substr):
final_substr = substr;
substr = '';
print('Longest substring in alphabetical order is: '+final_substr);
4 Answers 4
Double bug
One bug is that if the longest non-decreasing substring is at the end, it will be ignored.
A second bug is that the fix more complicated than adding this at the end:
if len(final_substr) < len(substr):
final_substr = substr
This is not enough, because the last character will not have been appended yet.
The fix is not very pretty:
for index, char in enumerate(s):
if index < len(s) - 1:
if ord(char) <= ord(s[index + 1]):
substr += char
else:
substr += char
if len(final_substr) < len(substr):
final_substr = substr
substr = ''
else:
if index > 0 and s[index - 1] < char:
substr += char
if len(final_substr) < len(substr):
final_substr = substr
The forced enumerate
enumerate
is great.
In many situations it gives you index and element pairs of iterables,
which is truly awesome.
But in this case it's just not a good fit:
for index, char in enumerate(s): if index < len(s) - 1:
For each character there is a check on length twice:
once in the mechanism of enumerate
itself,
and one more time inside the loop.
I suggest to rewrite with either for pos in range(len(s) - 1)
or for pos in range(1, len(s))
.
Actually, even better, as @kyrill suggested,
you could do for i, c in enumerate(s[:-1]):
.
Don't repeat yourself
substr += char
appears in both branches of this conditional,
it can be easily lifted out:
if ord(char) <= ord(s[index + 1]): substr += char else: substr += char if len(final_substr) < len(substr): final_substr = substr substr = ''
Compare characters directly
There's no need for ord
. You can compare characters directly, for example:
if char > s[index + 1]:
-
\$\begingroup\$ Ad
enumerate
– you probably meantrange(len(s) - 1)
. A better alternative would befor i, c in enumerate(s[:-1]):
. \$\endgroup\$kyrill– kyrill2017年06月02日 11:03:16 +00:00Commented Jun 2, 2017 at 11:03 -
\$\begingroup\$ @kyrill thanks, yeah, that's what I meant, and great tip! (added to my answer) \$\endgroup\$janos– janos2017年06月02日 11:37:38 +00:00Commented Jun 2, 2017 at 11:37
The algorithm has linear (\$O(n)\$) time complexity, which is good; you can't do better. It also has a linear space complexity, which is not so good; only a constant space is really required. Notice that you don't need to build
substr
(substr
consumes memory!). It suffices to keep track of the start and end indices.The test for
index < len(s) - 1
is un-pythonic. Don't ask permission, ask forgiveness.The intention to
minimize the number of loops
is dubious. Does it improve run time? Most likely, no (if you doubt, measure). Does it improve readability? Most likely, no. Consider the pseudocode
while start_index < string_length: length = scan_ordered_substring(string, start_index) do_business_logic start_index += length
I don't think that
print(substr);
was intended.
-
1\$\begingroup\$ "has linear time complexity" - why is that? Doesn't string concatenation have O(n+m), making the whole loop O(n^2) \$\endgroup\$en_Knight– en_Knight2017年06月02日 15:26:10 +00:00Commented Jun 2, 2017 at 15:26
-
\$\begingroup\$ You're right, print(substr) was not intended. I left it in there by accident. Thanks for the input! \$\endgroup\$redixhumayun– redixhumayun2017年06月02日 16:43:51 +00:00Commented Jun 2, 2017 at 16:43
-
\$\begingroup\$ You mentioned the algorithm complexity and the peace of code you wrote shows you love good coding ... I just wish you mentioned that
if
problem \$\endgroup\$Billal BEGUERADJ– Billal BEGUERADJ2017年06月02日 20:10:33 +00:00Commented Jun 2, 2017 at 20:10
Don't bother with ord
Python already provide syntactic sugar for comparing characters in codepoint order, so ord(char) <= ord(s[index + 1])
can be shortened to char <= s[index+1]
Don't bother with the index
You only use index
as a mean to look at the next character in s
, you can dispense with it if you do
substr = s[0]
for char in s[1:]:
if substr[-1] <= char:
substr += char
else:
# Do your things
Don't use a string as an accumulator
Making substr
a string is (probably) not the most efficient way to do it, as strings are immutable and you are constantly modifying it, better to make it a list of chars and join it only when needed
final_substr = []
substr = [s[0]]
for char in s[1:]:
if substr[-1] <= char:
substr.append(char)
else:
if len(substr) > len(final_substr):
final_substr = substr
substr = [char]
if len(substr) > len(final_substr):
final_substr = substr
final_substr = ''.join(final_substr)
Extra fanciness
in the code above, the string slicing s[1:]
copies s
, which might be an issue if you have to apply this procedure to a lot of very long strings. You can avoid that copy by using an iterator over s
, changing the lines above to
s_iter = iter(s)
final_substr = []
substr = [next(s_iter)]
for char in s_iter:
# Nothing changes after this line
Or you could be more pedestrian and iterate over range(len(s))
.
In the same vein, if you expect to have to deal with long substrings, you could transform everything to keep track of only the bounds of substr
final_bounds = [0, 1]
substr_bounds = [0, 1]
for i in range(1, len(s)):
if s[i-1] <= s[i]:
substr_bounds[1] += 1
else:
if final_bounds[1] - final_bounds[0] < substr_bounds[1] - substr_bounds[0]:
final_bounds = substr
substr_bounds = (i, i)
if final_bounds[1] - final_bounds[0] < substr_bounds[1] - substr_bounds[0]:
final_bounds = substr
final_substr = s[final_bounds[0]:final_bounds[1]]
This version should be the most efficient of all memory-wise. I do find it disgraceful, though.
Python does not require semicolons as a terminator. You should refrain from using them.
-
\$\begingroup\$ The only reason I always make sure to use semicolons is because I've heard its a good habit to maintain across programming languages, and is something I picked up from JavaScript. Is using a semicolon not considered pythonic? \$\endgroup\$redixhumayun– redixhumayun2017年06月02日 16:35:10 +00:00Commented Jun 2, 2017 at 16:35
-
2\$\begingroup\$ @BillalBEGUERADJ this is a perfectly legit answer. Please read codereview.meta.stackexchange.com/questions/1463/… \$\endgroup\$janos– janos2017年06月02日 18:01:23 +00:00Commented Jun 2, 2017 at 18:01
-
\$\begingroup\$ @BillalBEGUERADJ yes it says that. And then? I don't see the connection between that and this answer. Are you sure you didn't quote out of context? I think that post overall means that this answer is fine, do you disagree? \$\endgroup\$janos– janos2017年06月02日 18:32:49 +00:00Commented Jun 2, 2017 at 18:32
-
6\$\begingroup\$ @ZaidHumayun unnecessary semicolons are decidedly unpythonic. And the advice to use semicolons across programming languages is a very bad advice that should not be followed. \$\endgroup\$janos– janos2017年06月02日 18:35:49 +00:00Commented Jun 2, 2017 at 18:35
-
1\$\begingroup\$ Yes, I disagree. I can not write remove semicolon or remove space as an answer. Such things could be written either as a comment or mentioned in an elaborated answer. I think all SE websites look for good quality posts. @janos \$\endgroup\$Billal BEGUERADJ– Billal BEGUERADJ2017年06月02日 20:03:24 +00:00Commented Jun 2, 2017 at 20:03