Greedy algorithm for postive interval covering

Question 1

Consider the following problem from Jeff Erickson: Algorithms that also appears in this post, which wants us to prove a lower bound for the problem.

Suppose you are given an array $A[1 .. n]$ of integers, each of which may be positive, negative, or zero. A contiguous subarray $A[i .. j]$ is called a positive interval if the sum of its entries is greater than zero. Describe and analyze an algorithm to compute the minimum number of positive intervals that cover every positive entry in $A$. For example, given the following array as input, your algorithm should output 3ドル$. If every entry in the input array is negative, your algorithm should output 0ドル$.
$${+}3\ {-}5\ {+}7\ {-}4\ {+}1\ {-}8\ {+}3\ {-}7\ {+}5\ {-}9\ {+}5\ {-}2\ {+}4$$

My question is, is there a greedy approach for this problem?

I have an idea, find the largest positive number, then according to whether there is some positive numbers in each side, go left or right to cover more positive number. But I think this idea doesn't work.

Question 2

It looks like your idea is a bit too complicated and unlikely to work.

Question 3

We ask that references satisfy scholarly requirements: please include at least the title of the book and the full name of the author, and the chapter and exercise number, so that others with the same question are more likely to be able to find this page by search. Thank you!

Question 4

Please clarify in the question: are intervals allowed to overlap?

Question 5

There is a simple greedy algorithm.

A greedy approach

The greedy idea is obvious: starting from the very left, extend the coverage to the right as far as possible each time.

However, it may not be obvious to get it right. We should search each next positive interval by the following two criteria:

It should cover the next uncovered positive entry.
It should cover as many entries as possible after that entry among all positive intervals that covers that entry.

To boost performance, after the next $u$ncovered positive entry $A[u]$ is found, we will find the optimal start of the positive interval to cover $A[u]$ and the most entries after $A[u]$. That optimal start is an entry such that the sum of entries between it and $A[u]$ is the largest. In fact, that optimal start can be ignored; knowing that largest sum is enough.

Let $\text{sum}(A[i..j])=A[i]+A[i+1]+\cdots+A[j]$, the sum of entries in $A[i..j]$.

Here is the greedy algorithm.

$u\leftarrow0.$ $\ rightmost\leftarrow0.$ $\ answer\leftarrow 0.$ Repeat the following 3ドル$ steps.

$u\leftarrow$ the least index greater than $rightmost$ at which the entry of $A$ is positive.
If there is no such index, return $answer$.
Otherwise, $answer\leftarrow answer+1$. We will cover $A[u]>0$ and as many entries as possible after it with one positive interval.
$surplus\leftarrow$ the largest sum among $\text{sum}(A[u..u])$, $\text{sum}(A[u-1..u])$, $\cdots$, $\text{sum}(A[1..u])$.
For the sake of correctness proof, let $start$ be the corresponding starting index of $surplus$, which is the optimal start of the desired positive interval.
$rightmost\leftarrow$ the largest index $k$ such that $ surplus+\text{sum}(A[u+1..k])>0$, where $u+1\le k\le n$. If there is no such index, $rightmost\leftarrow u$.
Conceptually, we select the positive interval $A[start, rightmost]$.

Complexity Analysis

Each one of step 1, 2 and 3 takes $O(n)$ time.

$u$ starts at 0ドル$ and ends at most $n$. Since $u\le n$ always increases at step 1 unless the algorithm returns, step 1 and hence the whole loop body of step 1, 2, 3 can repeat at most $n$ times. Hence the whole algorithm takes $O(n^2)$ time.

The post here optimizes the approach above to run in $O(n)$ time.

Correctness proof

Let $A[s_1..r_1], A[s_2..r_2], \cdots$ be the positive intervals selected successively by the greedy algorithm. So $r_1<r_2<\cdots$.
Let $A[s_1'..r_1'], A[s_2'..r_2'], \cdots$ be some positive intervals that covers all positive entries in $A$. WLOG, assume $r_1'<r_2'<\cdots$.

Let $s_0=r_0=s_0'=r_0'=0$.

It is enough to prove the greedy algorithm "stays ahead", i.e., $r_i\ge r_i'$ for all $i$.

The case $i=0$ is trivial. Suppose $r_i\ge r_i'$ for some $i$.

Let $u$ and $start=s_{i+1}$ be the values found in the step 1 and 2 of the algorithm that leads to the selected interval $A[s_{i+1}..r_{i+1}]$, i.e., $A[u]$ is the positive entry after $A[r_i]$ and $\text{sum}(A[start..u])$ is the largest among $\text{sum}(A[*..u])$.

Since $r_i'\le r_i<u$, $A[u]$ is not covered by any interval $A[s_j'.. r_j']$, 1ドル\le j\le i$. Hence $A[u]$ is in some $A[s_k'..r_k']$, where $k\ge i+1$. $$ \text{sum}(A[u+1..r_k'])=\text{sum}(A[s_k'..r_k'])-\text{sum}(A[s_k'..u])>0-\text{sum}(A[start..u])$$ By the step 3ドル$ of the algorithm, the inequality above implies $r_{i+1}\ge r_k'$. Hence $r_{i+1}\ge r_k'\ge r_{i+1}'$.

A Python implementation

While the indices above start with 1ドル$, they start with 0ドル$ in the implementation.

from itertools import accumulate
def minimum_cover(a):
 n = len(a)
 answer = 0
 rightmost = -1
 while True:
 uncovered = rightmost + 1
 while uncovered < n and a[uncovered] <= 0:
 uncovered += 1
 if uncovered == n:
 break
 surplus = max(accumulate(a[uncovered::-1]))
 new_rightmost = uncovered + 1 + max(
 (idx for idx, sm in enumerate(accumulate(a[uncovered + 1:n]))
 if surplus + sm > 0), default=-1)
 if new_rightmost > rightmost:
 rightmost = new_rightmost
 answer += 1
 return answer
print(minimum_cover([1, 7, -9, 2, -10, -5, 5, -10, 4, -2, 3, -1, 2])) # 2
print(minimum_cover([1, -3, 3, -2, 1, -2, 1])) # 2

Question 6

Overlapping positive intervals are needed to cover [1, -2, 1, -2, 3, -2, 1, -2, 1] with minimum number of positive intervals. Ditto for [1, -7, 3, -5, 9, -6, 4, -8, 2].

Question 7

The accepted greedy solution doesn't work.

I don't think there is a greedy algorithm. There is a counterexample to the other solution, where they use the heuristic of adding the largest positive interval starting at the current positive integer.

Consider the counterexample:

[1, -10, 10, -8, 3, -6, 4, -10, 8]

The proposed greedy algorithm will cover the array with 3 intervals:

(1, -10, 10) (3, -6, 4) (8)

But the array can be optimally covered with just 2 intervals:

(1) (10, -8, 3, -6, 4, -10, 8)

Question 8

Thanks for catching my error. Please check my updated answer.

Question 9

When we have an array like this: [1, -6, 5, -100, 101, -8, 3, -6, 4, -90, 8],

and apply the above greedy algorithm to it, the outcome is 2

However, I think that the correct answer should be 3.

Question 10

It is not required that the positive intervals are disjoint. sum([1, -6, 5, -100, 101]$=1>0$.$\quad$ sum([101, -8, 3, -6, 4, -90, 8]$=12>0$. The correct outcome should be 2ドル$.

Question 11

Please clarify in your answer: Does above greedy algorithm refer to ErroR's or John L's?

Question 12

It refers to John L's. but I misunderstood the problem. I didn't know overlapping was possible. Thank you for pointing out the part I misunderstood.

喜欢算法和数学 39.2k4 gold badges35 silver badges96 bronze badges · Accepted Answer · 2022-04-30 00:41:44Z

There is a simple greedy algorithm.

A greedy approach

The greedy idea is obvious: starting from the very left, extend the coverage to the right as far as possible each time.

However, it may not be obvious to get it right. We should search each next positive interval by the following two criteria:

It should cover the next uncovered positive entry.
It should cover as many entries as possible after that entry among all positive intervals that covers that entry.

To boost performance, after the next $u$ncovered positive entry $A[u]$ is found, we will find the optimal start of the positive interval to cover $A[u]$ and the most entries after $A[u]$. That optimal start is an entry such that the sum of entries between it and $A[u]$ is the largest. In fact, that optimal start can be ignored; knowing that largest sum is enough.

Let $\text{sum}(A[i..j])=A[i]+A[i+1]+\cdots+A[j]$, the sum of entries in $A[i..j]$.

Here is the greedy algorithm.

$u\leftarrow0.$ $\ rightmost\leftarrow0.$ $\ answer\leftarrow 0.$ Repeat the following 3ドル$ steps.

$u\leftarrow$ the least index greater than $rightmost$ at which the entry of $A$ is positive.
If there is no such index, return $answer$.
Otherwise, $answer\leftarrow answer+1$. We will cover $A[u]>0$ and as many entries as possible after it with one positive interval.
$surplus\leftarrow$ the largest sum among $\text{sum}(A[u..u])$, $\text{sum}(A[u-1..u])$, $\cdots$, $\text{sum}(A[1..u])$.
For the sake of correctness proof, let $start$ be the corresponding starting index of $surplus$, which is the optimal start of the desired positive interval.
$rightmost\leftarrow$ the largest index $k$ such that $ surplus+\text{sum}(A[u+1..k])>0$, where $u+1\le k\le n$. If there is no such index, $rightmost\leftarrow u$.
Conceptually, we select the positive interval $A[start, rightmost]$.

Complexity Analysis

Each one of step 1, 2 and 3 takes $O(n)$ time.

$u$ starts at 0ドル$ and ends at most $n$. Since $u\le n$ always increases at step 1 unless the algorithm returns, step 1 and hence the whole loop body of step 1, 2, 3 can repeat at most $n$ times. Hence the whole algorithm takes $O(n^2)$ time.

The post here optimizes the approach above to run in $O(n)$ time.

Correctness proof

Let $A[s_1..r_1], A[s_2..r_2], \cdots$ be the positive intervals selected successively by the greedy algorithm. So $r_1<r_2<\cdots$.
Let $A[s_1'..r_1'], A[s_2'..r_2'], \cdots$ be some positive intervals that covers all positive entries in $A$. WLOG, assume $r_1'<r_2'<\cdots$.

Let $s_0=r_0=s_0'=r_0'=0$.

It is enough to prove the greedy algorithm "stays ahead", i.e., $r_i\ge r_i'$ for all $i$.

The case $i=0$ is trivial. Suppose $r_i\ge r_i'$ for some $i$.

Let $u$ and $start=s_{i+1}$ be the values found in the step 1 and 2 of the algorithm that leads to the selected interval $A[s_{i+1}..r_{i+1}]$, i.e., $A[u]$ is the positive entry after $A[r_i]$ and $\text{sum}(A[start..u])$ is the largest among $\text{sum}(A[*..u])$.

Since $r_i'\le r_i<u$, $A[u]$ is not covered by any interval $A[s_j'.. r_j']$, 1ドル\le j\le i$. Hence $A[u]$ is in some $A[s_k'..r_k']$, where $k\ge i+1$. $$ \text{sum}(A[u+1..r_k'])=\text{sum}(A[s_k'..r_k'])-\text{sum}(A[s_k'..u])>0-\text{sum}(A[start..u])$$ By the step 3ドル$ of the algorithm, the inequality above implies $r_{i+1}\ge r_k'$. Hence $r_{i+1}\ge r_k'\ge r_{i+1}'$.

A Python implementation

While the indices above start with 1ドル$, they start with 0ドル$ in the implementation.

from itertools import accumulate
def minimum_cover(a):
 n = len(a)
 answer = 0
 rightmost = -1
 while True:
 uncovered = rightmost + 1
 while uncovered < n and a[uncovered] <= 0:
 uncovered += 1
 if uncovered == n:
 break
 surplus = max(accumulate(a[uncovered::-1]))
 new_rightmost = uncovered + 1 + max(
 (idx for idx, sm in enumerate(accumulate(a[uncovered + 1:n]))
 if surplus + sm > 0), default=-1)
 if new_rightmost > rightmost:
 rightmost = new_rightmost
 answer += 1
 return answer
print(minimum_cover([1, 7, -9, 2, -10, -5, 5, -10, 4, -2, 3, -1, 2])) # 2
print(minimum_cover([1, -3, 3, -2, 1, -2, 1])) # 2

Overlapping positive intervals are needed to cover [1, -2, 1, -2, 3, -2, 1, -2, 1] with minimum number of positive intervals. Ditto for [1, -7, 3, -5, 9, -6, 4, -8, 2].

Stack Exchange Network

Greedy algorithm for postive interval covering

3 Answers 3

A greedy approach

Complexity Analysis

Correctness proof

A Python implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Greedy algorithm for postive interval covering

3 Answers 3

A greedy approach

Complexity Analysis

Correctness proof

A Python implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions