Maximum sub-array of non-negative numbers

Question 1

This is the "Maximum sub-array" problem from CodeChef:

Find out the maximum sub-array of non negative numbers from an array.

The sub-array should be continuous. That is, a sub-array created by choosing the second and fourth element and skipping the third element is invalid.

Maximum sub-array is defined in terms of the sum of the elements in the sub-array. Sub-array A is greater than sub-array B if sum(A)> sum(B).

NOTE 1: If there is a tie then compare the segment length's and return segment which has maximum length.

NOTE 2: If there is still a tie, then return the segment with minimum starting index.

This is my solution:

def maxset(array):
 """Returns maximum sub array of non-negative numbers"""
 max_sum = 0
 current_sum = 0
 start = 0
 stop = 0
 max_array = array[0]
 for i in range(len(array)):
 if array[i] >= 0:
 current_sum += array[i]
 stop += 1
 if max_sum < current_sum:
 max_sum = current_sum
 if max_array < array[start:stop]:
 max_array = array[start:stop]
 elif max_sum == current_sum:
 if len(max_array)< len(array[start:stop]):
 max_array = array[start:stop]
 else :
 start = i+1
 stop = start
 current_sum = 0
 return max_array

Test cases :

def main(): 
 print maxset( [ 1, 2, 5, -7, 2, 5 ] )
 print maxset( [ 1,-1, 1, -1, 1, -1])
 print maxset( [ 1, 2, 5, 7, 2, 5 ] )
 print maxset( [ 222])
 print maxset( [ 1, 1])
 print maxset( [ 3, -1, 1, 1, 1, -1, 3])
 print maxset( [ 5, -1, 1, 1, 1, 1, -1, 5 ])
 print maxset([ 3, -1, 1, 1, 1, -1, 2 ])
 print maxset([ 3,-3, -1, 1, 1, 1, -1, 2 ])
if __name__ == '__main__':
 main()

How can I improve this code ?

Question 2

1. Bugs

maxset returns the wrong answer for some inputs. For example:
```
>>> maxset([2, 1, -1, 1, 3])
[2, 1]
```
The correct answer is [1, 3]. The cause of this bug is the line:
```
if max_array < array[start:stop]:
```
which causes the sub-array with the higher sum (here, [1, 3]) to be ignored because it lexicographically precedes max_array (here, [2, 1]).
maxset goes wrong when all the elements of the input are negative:
```
>>> maxset([-1, -2])
-1
```
The number -1 is not a valid result: it's not the maximum sub-array, it's just a number.

It also goes wrong when the input is empty:
```
>>> maxset([])
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "cr193902.py", line 7, in maxset
 max_array = array[0]
IndexError: list index out of range
```
To fix these two problems, instead of this initialization:
```
max_array = array[0]
```
start with an empty list:
```
max_array = []
```

2. Review

The code in the post uses the print statement and so only runs on Python 2.7. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. If you really can't use Python 3 then use:
```
from __future__ import print_function
```
so that your code is portable to Python 3.
The name maxset is misleading since it does not return a set.
When you take a slice of a list, for example array[start:stop], Python has to copy out all the elements of the slice in order to construct a new list. This is a waste unless you are actually going to need the new list. So instead of:
```
len(array[start:stop])
```
(which creates a list and then throws it away again) you could write:
```
stop - start
```
This line copies out a slice of the list each time you find a new maximum sub-array:
```
max_array = array[start:stop]
```
but in many cases you are never going to need this list, because later on you will find an even bigger sub-array. So instead of copying out a slice, you could remember the location of the maximum sub-array found so far:
```
max_start = start
max_stop = stop
```
and at the end of the function
```
return array[max_start:max_stop]
```
The code has a loop over the indexes into array:
```
for i in range(len(array)):
```
and then within the loop it looks up the element at that index using array[i]. Whenever you see this pattern, you can simplify it using the built-in function enumerate. In this case the loop becomes:
```
for i, element in enumerate(array):
```
and wherever you had array[i] you now have element.

The two branches of the if: ... elif: ... both have the line:

max_array = array[start:stop]

This duplication could be avoided by combining the two conditions into one:

if (max_sum < current_sum
 or (max_sum == current_sum
 and max_stop - max_start < stop - start)):

and this can be further simplified by comparing tuples:

if (max_sum, max_stop - max_start) < (current_sum, stop - start):

There's no need to maintain a running variable stop since it always has the value i + 1.
The test cases don't check the results. When I run it, I get the output:
```
>>> main()
[1, 2, 5]
[1]
[1, 2, 5, 7, 2, 5]
[222]
[1, 1]
[1, 1, 1]
[5]
[1, 1, 1]
[1, 1, 1]
```
Is this right? The usual way to write a test case is to raise an exception if the result is incorrect. There are various ways to do this. You could use the assert statement and write:
```
assert maxset([ 1, 2, 5, -7, 2, 5 ]) == [1, 2, 5]
```
or you could use the unittest module from the Python standard library.

3. Revised code

def max_sub_array(array):
 """Return maximum sub-array of non-negative numbers."""
 start = 0 # Start of current sub-array
 current_sum = 0 # Sum of current sub-array
 max_start = 0 # Start of maximum sub-array
 max_stop = 0 # Stop of maximum sub-array
 max_sum = 0 # Sum of maxium sub-array
 for stop, element in enumerate(array, start=1):
 if element >= 0:
 current_sum += element
 if (max_sum, max_stop - max_start) < (current_sum, stop - start):
 max_sum = current_sum
 max_start = start
 max_stop = stop
 else:
 start = stop
 current_sum = 0
 return array[max_start:max_stop]
import unittest
class TestMaxSubArray(unittest.TestCase):
 CASES = [
 # array, expected result
 ([], []),
 ([-1, -2], []),
 ([2, 1, -1, 1, 3], [1, 3]),
 ([1, 2, 5, -7, 2, 5], [1, 2, 5]),
 ([1,-1, 1, -1, 1, -1], [1]),
 ([1, 2, 5, 7, 2, 5] , [1, 2, 5, 7, 2, 5]),
 ([222], [222]),
 ([1, 1], [1, 1]),
 ([3, -1, 1, 1, 1, -1, 3], [1, 1, 1]),
 ([5, -1, 1, 1, 1, 1, -1, 5], [5]),
 ([3, -1, 1, 1, 1, -1, 2], [1, 1, 1]),
 ([3, -3, -1, 1, 1, 1, -1, 2], [1, 1, 1]),
 ]
 def test_max_sub_array(self):
 for array, expected in self.CASES:
 self.assertEqual(expected, max_sub_array(array))

4. A different approach

The single responsibility principle say that each piece of code should do just one thing. By splitting code into pieces with single responsibilities, the individual pieces become easier to understand, check and test.

So for this problem we could split the code into two parts:

find all the sub-arrays of non-negative numbers in array;
find the maximum sub-array in the results of step 1.

For part 1 the Python standard library has a useful function itertools.groupby for splitting an iterable into groups based on a key function. In this case the key function needs to be "is the element non-negative". Then for part 2 we can use the built-in max function, again using a key function.

from itertools import groupby
def max_sub_array(array):
 """Return maximum sub-array of non-negative numbers."""
 sub_arrays = [list(group)
 for non_negative, group in groupby(array, lambda e: e >= 0)
 if non_negative]
 if sub_arrays:
 return max(sub_arrays, key=lambda a: (sum(a), len(a)))
 else:
 return []

In Python 3.4 or later, max has a default keyword argument to handle the case where there are no items, and so we can combine this into a single expression:

from itertools import groupby
def max_sub_array(array):
 """Return maximum sub-array of non-negative numbers."""
 return max((list(group)
 for non_negative, group in groupby(array, lambda e: e >= 0)
 if non_negative),
 key=lambda a: (sum(a), len(a)),
 default=[])

Question 3

You say that the left operator in if max_array < array[start:stop]: is a number, but unless I am mistaken, it is a list (the best subarray found so far). The (lexicographical?) comparison still makes no sense there.

Question 4

Solid answer. Every time I read one of your answers I find something I myself miss. One day I'll remember groupby...

Question 5

@MartinR It starts off as a number, max_array = array[0]. But is changed to a list just after the check, max_array = array[start:stop].

Question 6

@MartinR: The lexicographic comparison is in fact a bug — see §1.1 in my revised answer. Thanks for the correction.

Question 7

Your code is splitting numbers by negatives whilst performing the max sum code. You can simplify this, if you move the splitting out of the function.

def split_negatives(array):
 ret = tuple()
 for item in array:
 if item >= 0:
 ret += (item,)
 elif ret:
 yield ret
 ret = tuple()
 if ret:
 yield ret

After this you can just use max. Since you're using Python 2, there's no default keyword if sections is an empty generator, and so we would need to use a try except to deal with this.

def maxset(array):
 sections = split_negatives(array)
 try:
 return max(sections, key=lambda i: (sum(i), len(i)))
 except ValueError:
 return 0

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2018-05-08 10:48:43Z

1. Bugs

maxset returns the wrong answer for some inputs. For example:
```
>>> maxset([2, 1, -1, 1, 3])
[2, 1]
```
The correct answer is [1, 3]. The cause of this bug is the line:
```
if max_array < array[start:stop]:
```
which causes the sub-array with the higher sum (here, [1, 3]) to be ignored because it lexicographically precedes max_array (here, [2, 1]).
maxset goes wrong when all the elements of the input are negative:
```
>>> maxset([-1, -2])
-1
```
The number -1 is not a valid result: it's not the maximum sub-array, it's just a number.

It also goes wrong when the input is empty:
```
>>> maxset([])
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "cr193902.py", line 7, in maxset
 max_array = array[0]
IndexError: list index out of range
```
To fix these two problems, instead of this initialization:
```
max_array = array[0]
```
start with an empty list:
```
max_array = []
```

2. Review

The code in the post uses the print statement and so only runs on Python 2.7. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. If you really can't use Python 3 then use:
```
from __future__ import print_function
```
so that your code is portable to Python 3.
The name maxset is misleading since it does not return a set.
When you take a slice of a list, for example array[start:stop], Python has to copy out all the elements of the slice in order to construct a new list. This is a waste unless you are actually going to need the new list. So instead of:
```
len(array[start:stop])
```
(which creates a list and then throws it away again) you could write:
```
stop - start
```
This line copies out a slice of the list each time you find a new maximum sub-array:
```
max_array = array[start:stop]
```
but in many cases you are never going to need this list, because later on you will find an even bigger sub-array. So instead of copying out a slice, you could remember the location of the maximum sub-array found so far:
```
max_start = start
max_stop = stop
```
and at the end of the function
```
return array[max_start:max_stop]
```
The code has a loop over the indexes into array:
```
for i in range(len(array)):
```
and then within the loop it looks up the element at that index using array[i]. Whenever you see this pattern, you can simplify it using the built-in function enumerate. In this case the loop becomes:
```
for i, element in enumerate(array):
```
and wherever you had array[i] you now have element.

The two branches of the if: ... elif: ... both have the line:

max_array = array[start:stop]

This duplication could be avoided by combining the two conditions into one:

if (max_sum < current_sum
 or (max_sum == current_sum
 and max_stop - max_start < stop - start)):

and this can be further simplified by comparing tuples:

if (max_sum, max_stop - max_start) < (current_sum, stop - start):

There's no need to maintain a running variable stop since it always has the value i + 1.
The test cases don't check the results. When I run it, I get the output:
```
>>> main()
[1, 2, 5]
[1]
[1, 2, 5, 7, 2, 5]
[222]
[1, 1]
[1, 1, 1]
[5]
[1, 1, 1]
[1, 1, 1]
```
Is this right? The usual way to write a test case is to raise an exception if the result is incorrect. There are various ways to do this. You could use the assert statement and write:
```
assert maxset([ 1, 2, 5, -7, 2, 5 ]) == [1, 2, 5]
```
or you could use the unittest module from the Python standard library.

3. Revised code

def max_sub_array(array):
 """Return maximum sub-array of non-negative numbers."""
 start = 0 # Start of current sub-array
 current_sum = 0 # Sum of current sub-array
 max_start = 0 # Start of maximum sub-array
 max_stop = 0 # Stop of maximum sub-array
 max_sum = 0 # Sum of maxium sub-array
 for stop, element in enumerate(array, start=1):
 if element >= 0:
 current_sum += element
 if (max_sum, max_stop - max_start) < (current_sum, stop - start):
 max_sum = current_sum
 max_start = start
 max_stop = stop
 else:
 start = stop
 current_sum = 0
 return array[max_start:max_stop]
import unittest
class TestMaxSubArray(unittest.TestCase):
 CASES = [
 # array, expected result
 ([], []),
 ([-1, -2], []),
 ([2, 1, -1, 1, 3], [1, 3]),
 ([1, 2, 5, -7, 2, 5], [1, 2, 5]),
 ([1,-1, 1, -1, 1, -1], [1]),
 ([1, 2, 5, 7, 2, 5] , [1, 2, 5, 7, 2, 5]),
 ([222], [222]),
 ([1, 1], [1, 1]),
 ([3, -1, 1, 1, 1, -1, 3], [1, 1, 1]),
 ([5, -1, 1, 1, 1, 1, -1, 5], [5]),
 ([3, -1, 1, 1, 1, -1, 2], [1, 1, 1]),
 ([3, -3, -1, 1, 1, 1, -1, 2], [1, 1, 1]),
 ]
 def test_max_sub_array(self):
 for array, expected in self.CASES:
 self.assertEqual(expected, max_sub_array(array))

4. A different approach

The single responsibility principle say that each piece of code should do just one thing. By splitting code into pieces with single responsibilities, the individual pieces become easier to understand, check and test.

So for this problem we could split the code into two parts:

find all the sub-arrays of non-negative numbers in array;
find the maximum sub-array in the results of step 1.

For part 1 the Python standard library has a useful function itertools.groupby for splitting an iterable into groups based on a key function. In this case the key function needs to be "is the element non-negative". Then for part 2 we can use the built-in max function, again using a key function.

from itertools import groupby
def max_sub_array(array):
 """Return maximum sub-array of non-negative numbers."""
 sub_arrays = [list(group)
 for non_negative, group in groupby(array, lambda e: e >= 0)
 if non_negative]
 if sub_arrays:
 return max(sub_arrays, key=lambda a: (sum(a), len(a)))
 else:
 return []

In Python 3.4 or later, max has a default keyword argument to handle the case where there are no items, and so we can combine this into a single expression:

from itertools import groupby
def max_sub_array(array):
 """Return maximum sub-array of non-negative numbers."""
 return max((list(group)
 for non_negative, group in groupby(array, lambda e: e >= 0)
 if non_negative),
 key=lambda a: (sum(a), len(a)),
 default=[])

You say that the left operator in if max_array < array[start:stop]: is a number, but unless I am mistaken, it is a list (the best subarray found so far). The (lexicographical?) comparison still makes no sense there.
Solid answer. Every time I read one of your answers I find something I myself miss. One day I'll remember groupby...
@MartinR It starts off as a number, max_array = array[0]. But is changed to a list just after the check, max_array = array[start:stop].
@MartinR: The lexicographic comparison is in fact a bug — see §1.1 in my revised answer. Thanks for the correction.

Stack Exchange Network

Maximum sub-array of non-negative numbers

2 Answers 2

1. Bugs

2. Review

3. Revised code

4. A different approach

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Maximum sub-array of non-negative numbers

2 Answers 2

1. Bugs

2. Review

3. Revised code

4. A different approach

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions