This is the "Maximum sub-array" problem from CodeChef:
Find out the maximum sub-array of non negative numbers from an array.
The sub-array should be continuous. That is, a sub-array created by choosing the second and fourth element and skipping the third element is invalid.
Maximum sub-array is defined in terms of the sum of the elements in the sub-array. Sub-array A is greater than sub-array B if sum(A)> sum(B).
NOTE 1: If there is a tie then compare the segment length's and return segment which has maximum length.
NOTE 2: If there is still a tie, then return the segment with minimum starting index.
This is my solution:
def maxset(array):
"""Returns maximum sub array of non-negative numbers"""
max_sum = 0
current_sum = 0
start = 0
stop = 0
max_array = array[0]
for i in range(len(array)):
if array[i] >= 0:
current_sum += array[i]
stop += 1
if max_sum < current_sum:
max_sum = current_sum
if max_array < array[start:stop]:
max_array = array[start:stop]
elif max_sum == current_sum:
if len(max_array)< len(array[start:stop]):
max_array = array[start:stop]
else :
start = i+1
stop = start
current_sum = 0
return max_array
Test cases :
def main():
print maxset( [ 1, 2, 5, -7, 2, 5 ] )
print maxset( [ 1,-1, 1, -1, 1, -1])
print maxset( [ 1, 2, 5, 7, 2, 5 ] )
print maxset( [ 222])
print maxset( [ 1, 1])
print maxset( [ 3, -1, 1, 1, 1, -1, 3])
print maxset( [ 5, -1, 1, 1, 1, 1, -1, 5 ])
print maxset([ 3, -1, 1, 1, 1, -1, 2 ])
print maxset([ 3,-3, -1, 1, 1, 1, -1, 2 ])
if __name__ == '__main__':
main()
How can I improve this code ?
2 Answers 2
1. Bugs
maxset
returns the wrong answer for some inputs. For example:>>> maxset([2, 1, -1, 1, 3]) [2, 1]
The correct answer is
[1, 3]
. The cause of this bug is the line:if max_array < array[start:stop]:
which causes the sub-array with the higher sum (here,
[1, 3]
) to be ignored because it lexicographically precedesmax_array
(here,[2, 1]
).maxset
goes wrong when all the elements of the input are negative:>>> maxset([-1, -2]) -1
The number
-1
is not a valid result: it's not the maximum sub-array, it's just a number.It also goes wrong when the input is empty:
>>> maxset([]) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "cr193902.py", line 7, in maxset max_array = array[0] IndexError: list index out of range
To fix these two problems, instead of this initialization:
max_array = array[0]
start with an empty list:
max_array = []
2. Review
The code in the post uses the
print
statement and so only runs on Python 2.7. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. If you really can't use Python 3 then use:from __future__ import print_function
so that your code is portable to Python 3.
The name
maxset
is misleading since it does not return a set.When you take a slice of a list, for example
array[start:stop]
, Python has to copy out all the elements of the slice in order to construct a new list. This is a waste unless you are actually going to need the new list. So instead of:len(array[start:stop])
(which creates a list and then throws it away again) you could write:
stop - start
This line copies out a slice of the list each time you find a new maximum sub-array:
max_array = array[start:stop]
but in many cases you are never going to need this list, because later on you will find an even bigger sub-array. So instead of copying out a slice, you could remember the location of the maximum sub-array found so far:
max_start = start max_stop = stop
and at the end of the function
return array[max_start:max_stop]
The code has a loop over the indexes into
array
:for i in range(len(array)):
and then within the loop it looks up the element at that index using
array[i]
. Whenever you see this pattern, you can simplify it using the built-in functionenumerate
. In this case the loop becomes:for i, element in enumerate(array):
and wherever you had
array[i]
you now haveelement
.The two branches of the
if: ... elif: ...
both have the line:max_array = array[start:stop]
This duplication could be avoided by combining the two conditions into one:
if (max_sum < current_sum or (max_sum == current_sum and max_stop - max_start < stop - start)):
and this can be further simplified by comparing tuples:
if (max_sum, max_stop - max_start) < (current_sum, stop - start):
There's no need to maintain a running variable
stop
since it always has the valuei + 1
.The test cases don't check the results. When I run it, I get the output:
>>> main() [1, 2, 5] [1] [1, 2, 5, 7, 2, 5] [222] [1, 1] [1, 1, 1] [5] [1, 1, 1] [1, 1, 1]
Is this right? The usual way to write a test case is to raise an exception if the result is incorrect. There are various ways to do this. You could use the
assert
statement and write:assert maxset([ 1, 2, 5, -7, 2, 5 ]) == [1, 2, 5]
or you could use the
unittest
module from the Python standard library.
3. Revised code
def max_sub_array(array):
"""Return maximum sub-array of non-negative numbers."""
start = 0 # Start of current sub-array
current_sum = 0 # Sum of current sub-array
max_start = 0 # Start of maximum sub-array
max_stop = 0 # Stop of maximum sub-array
max_sum = 0 # Sum of maxium sub-array
for stop, element in enumerate(array, start=1):
if element >= 0:
current_sum += element
if (max_sum, max_stop - max_start) < (current_sum, stop - start):
max_sum = current_sum
max_start = start
max_stop = stop
else:
start = stop
current_sum = 0
return array[max_start:max_stop]
import unittest
class TestMaxSubArray(unittest.TestCase):
CASES = [
# array, expected result
([], []),
([-1, -2], []),
([2, 1, -1, 1, 3], [1, 3]),
([1, 2, 5, -7, 2, 5], [1, 2, 5]),
([1,-1, 1, -1, 1, -1], [1]),
([1, 2, 5, 7, 2, 5] , [1, 2, 5, 7, 2, 5]),
([222], [222]),
([1, 1], [1, 1]),
([3, -1, 1, 1, 1, -1, 3], [1, 1, 1]),
([5, -1, 1, 1, 1, 1, -1, 5], [5]),
([3, -1, 1, 1, 1, -1, 2], [1, 1, 1]),
([3, -3, -1, 1, 1, 1, -1, 2], [1, 1, 1]),
]
def test_max_sub_array(self):
for array, expected in self.CASES:
self.assertEqual(expected, max_sub_array(array))
4. A different approach
The single responsibility principle say that each piece of code should do just one thing. By splitting code into pieces with single responsibilities, the individual pieces become easier to understand, check and test.
So for this problem we could split the code into two parts:
- find all the sub-arrays of non-negative numbers in
array
; - find the maximum sub-array in the results of step 1.
For part 1 the Python standard library has a useful function itertools.groupby
for splitting an iterable into groups based on a key function. In this case the key function needs to be "is the element non-negative". Then for part 2 we can use the built-in max
function, again using a key function.
from itertools import groupby
def max_sub_array(array):
"""Return maximum sub-array of non-negative numbers."""
sub_arrays = [list(group)
for non_negative, group in groupby(array, lambda e: e >= 0)
if non_negative]
if sub_arrays:
return max(sub_arrays, key=lambda a: (sum(a), len(a)))
else:
return []
In Python 3.4 or later, max
has a default
keyword argument to handle the case where there are no items, and so we can combine this into a single expression:
from itertools import groupby
def max_sub_array(array):
"""Return maximum sub-array of non-negative numbers."""
return max((list(group)
for non_negative, group in groupby(array, lambda e: e >= 0)
if non_negative),
key=lambda a: (sum(a), len(a)),
default=[])
-
\$\begingroup\$ You say that the left operator in
if max_array < array[start:stop]:
is a number, but unless I am mistaken, it is a list (the best subarray found so far). The (lexicographical?) comparison still makes no sense there. \$\endgroup\$Martin R– Martin R2018年05月08日 11:04:54 +00:00Commented May 8, 2018 at 11:04 -
\$\begingroup\$ Solid answer. Every time I read one of your answers I find something I myself miss. One day I'll remember
groupby
... \$\endgroup\$2018年05月08日 11:06:04 +00:00Commented May 8, 2018 at 11:06 -
\$\begingroup\$ @MartinR It starts off as a number,
max_array = array[0]
. But is changed to a list just after the check,max_array = array[start:stop]
. \$\endgroup\$2018年05月08日 11:08:23 +00:00Commented May 8, 2018 at 11:08 -
\$\begingroup\$ @MartinR: The lexicographic comparison is in fact a bug — see §1.1 in my revised answer. Thanks for the correction. \$\endgroup\$Gareth Rees– Gareth Rees2018年05月08日 11:11:48 +00:00Commented May 8, 2018 at 11:11
Your code is splitting numbers by negatives whilst performing the max sum code. You can simplify this, if you move the splitting out of the function.
def split_negatives(array):
ret = tuple()
for item in array:
if item >= 0:
ret += (item,)
elif ret:
yield ret
ret = tuple()
if ret:
yield ret
After this you can just use max
. Since you're using Python 2, there's no default keyword if sections
is an empty generator, and so we would need to use a try
except
to deal with this.
def maxset(array):
sections = split_negatives(array)
try:
return max(sections, key=lambda i: (sum(i), len(i)))
except ValueError:
return 0
Explore related questions
See similar questions with these tags.