Algorithm To Compute The Gaps Between A Set of Intervals

Question 1

Problem

Given a set of intervals with possibly non-distinct start and end points, find all maximal gaps. A gap is defined as an interval that does not overlap with any given interval. All endpoints are integers and inclusive.

For example, given the following set of intervals:

$\{[2,6], [1,9], [12,19]\}$

The set of all maximal gaps is:

$\{[10,11]\}$

For the following set of intervals:

$\{[2,6], [1,9], [3,12], [18,20]\}$

The set of all maximal gaps is:

$\{[13,17]\}$

because that produces the maximal gap.

Proposed Algorithm

My proposed algorithm (modified approach taken by John L.) to compute these gaps is:

Order the intervals by ascending start date.
Initialise an empty list gaps that will store gaps
Initialise a variable, last_covered_point, to the end point of the first interval.
Iterate through all intervals in the sorted order. For each interval [start, end], do the following.
1. If start > last_covered_point + 1, add the gap, [last_covered_point + 1, start - 1] to gaps.
2. Assign max(last_covered_point, end) to last_covered_point.
Return gaps

I have tested my algorithm on a few cases and it produces the correct results. But I cannot say with 100% guarantee that it works for every interval permutation and combination. Is there a way to prove this handles every permutation and combination?

Question 2

The key to prove your algorithm is correct is to find enough invariants of the loop, step 4 so that we apply use mathematical induction.

Let $I_1, I_2, \cdots, I_n$ denote the sorted intervals. When the algorithm has just finished processing $I_i$, we record the values of gaps and last_covered_point as $\text{gaps}_i$ and $\text{last_covered_point}_i$ respectively.

Let us prove the following proposition, $P(i)$, for $i=1, 2, \cdots, n$.

$\text{gaps}_i$ is the set of all maximal gaps for $I_1, I_2, \cdots, I_i$ and $\text{last_covered_point}_i$ is the maximum of all right endpoints of $I_1, I_2, \cdots, I_i$.

When $i=1$, $\text{gaps}_1$ is the empty set and $\text{last_covered_point}_1$ is the right endpoint of $I_1$. So $P(1)$ is correct.

For the sake of mathematical induction, assume $P(i)$ is correct, where 1ドル\le i\lt n$. Let $I_{i+1}=[s, e]$. There are two cases.

If $s\gt\text{last_covered_point}_i+1$, then $$\text{gaps}_i\cup[\text{last_covered_point}_i +1, s-1]=\text{gaps}_{i+1}.$$
Let $m$ be any point between the start point of $I_1$ and the maximum of all right endpoints of $I_1, I_2, \cdots, I_{i+1}$. Suppose $m$ not covered by any of $I_1, I_2, \cdots, I_{i+1}$.
- If $m\le\text{last_covered_point}_i$, the induction hypothesis says that $m$ is covered by some interval in $\text{gap}_i$.
- Otherwise, $m\gt\text{last_covered_point}_i$. Since $m$ is not covered by $I_{i+1}$, we know $m<s$. So $m$ is covered by $[\text{last_covered_point}_i +1, s-1]$.
In both cases, $m$ is covered by some interval in $\text{gaps}_{i+1}$. Since $\text{last_covered_point}_i$ is the largest point covered by one of $I_1, I_2, \cdots, I_i$ and $s$ is the smallest point covered by $I_{i+1}$, $[\text{last_covered_point}_i +1, s-1]$ is a maximal gap.
Otherwise, we have $s\le\text{last_covered_point}_i$+1. We can also verify that $\text{gaps}_{i+1}=\text{gaps}_{i}$ is the set of all maximal gaps for $I_1, I_2, \cdots, I_{i+1}$.

Finally, since step 4.2 says $\text{last_covered_point}_{i+1}=\max(\text{last_covered_point}_i, e)$ and $\text{last_covered_point}_i$ is the maximum of all right endpoints of $I_1, I_2, \cdots, I_i$, $\text{last_covered_point}_{i+1}$ is the maximum of all right endpoints of $I_1, I_2, \cdots, I_{i+1}$.

So, $P(i+1)$ is correct. $\quad\checkmark$.

Question 3

I probably didn't make my post precise enough and so I think you may have misunderstood what I meant by a gap. I have cleaned up my original post to define the problem better but I also took your example and customised it to what I think the algorithm should be. The main issue I have is that I am not sure if I have covered all possible interval permutations.

Question 4

@Mojo, thanks for noticing that last_covered_point should be updated in step 4.2 as well.

Question 5

@Mojo, I just wrote a proof. The proof is not completely formal, since "maximal gap" is given by a definition (although it is easy to define) and the case 2 is not proved in detail. However, the idea should be clear enough.

Question 6

In my last comment, 'since "maximal gap" is given by a definition' should have been 'since "maximal gap" is not given by a definition'.

Question 7

I'd like to propose a quite simple algorithm as well. The idea is this: we're going to place open and close parentheses on the number line at the boundaries of each interval. For example, for the intervals $(1,5), (2,7), (9, 10)$, the number line would look like this:

1 2 3 4 5 6 7 8 9 10
( ( ) ) ( )

Then we'll just scan left to right, counting parentheses. When all the parentheses get closed, we start a gap. Note in particular that in the above diagram, we have lost the information about which close parenthesis is paired with which open parenthesis -- because it doesn't actually matter. So:

Convert each interval $(a,b)$ to pairs $(a,\mathtt o),(b,\mathtt c)$. ($\mathtt o$ and $\mathtt c$ are for open and close, respectively.)
Sort all the pairs you get from this process. (When sorting, use lexicographic ordering and $\mathtt o < \mathtt c$.)
Iterate through them, keeping a counter that starts from 0ドル$.
- When you see a pair with $\mathtt o$ in the second part, increment the counter.
- When you see a pair with $\mathtt c$ in the second part, decrement the counter.
- When you decrement the counter, if that causes it to drop to 0ドル$, then look at the next element of the list to decide what to do; empty list means you're done, otherwise if the next pair's first part is just one bigger than the current one's you do nothing, and in the last case you record a maximal gap between the current end point and the next open point.

score 3 · Accepted Answer · 2020-12-15 03:45:48Z

The key to prove your algorithm is correct is to find enough invariants of the loop, step 4 so that we apply use mathematical induction.

Let $I_1, I_2, \cdots, I_n$ denote the sorted intervals. When the algorithm has just finished processing $I_i$, we record the values of gaps and last_covered_point as $\text{gaps}_i$ and $\text{last_covered_point}_i$ respectively.

Let us prove the following proposition, $P(i)$, for $i=1, 2, \cdots, n$.

$\text{gaps}_i$ is the set of all maximal gaps for $I_1, I_2, \cdots, I_i$ and $\text{last_covered_point}_i$ is the maximum of all right endpoints of $I_1, I_2, \cdots, I_i$.

When $i=1$, $\text{gaps}_1$ is the empty set and $\text{last_covered_point}_1$ is the right endpoint of $I_1$. So $P(1)$ is correct.

For the sake of mathematical induction, assume $P(i)$ is correct, where 1ドル\le i\lt n$. Let $I_{i+1}=[s, e]$. There are two cases.

If $s\gt\text{last_covered_point}_i+1$, then $$\text{gaps}_i\cup[\text{last_covered_point}_i +1, s-1]=\text{gaps}_{i+1}.$$
Let $m$ be any point between the start point of $I_1$ and the maximum of all right endpoints of $I_1, I_2, \cdots, I_{i+1}$. Suppose $m$ not covered by any of $I_1, I_2, \cdots, I_{i+1}$.
- If $m\le\text{last_covered_point}_i$, the induction hypothesis says that $m$ is covered by some interval in $\text{gap}_i$.
- Otherwise, $m\gt\text{last_covered_point}_i$. Since $m$ is not covered by $I_{i+1}$, we know $m<s$. So $m$ is covered by $[\text{last_covered_point}_i +1, s-1]$.
In both cases, $m$ is covered by some interval in $\text{gaps}_{i+1}$. Since $\text{last_covered_point}_i$ is the largest point covered by one of $I_1, I_2, \cdots, I_i$ and $s$ is the smallest point covered by $I_{i+1}$, $[\text{last_covered_point}_i +1, s-1]$ is a maximal gap.
Otherwise, we have $s\le\text{last_covered_point}_i$+1. We can also verify that $\text{gaps}_{i+1}=\text{gaps}_{i}$ is the set of all maximal gaps for $I_1, I_2, \cdots, I_{i+1}$.

Finally, since step 4.2 says $\text{last_covered_point}_{i+1}=\max(\text{last_covered_point}_i, e)$ and $\text{last_covered_point}_i$ is the maximum of all right endpoints of $I_1, I_2, \cdots, I_i$, $\text{last_covered_point}_{i+1}$ is the maximum of all right endpoints of $I_1, I_2, \cdots, I_{i+1}$.

So, $P(i+1)$ is correct. $\quad\checkmark$.

I probably didn't make my post precise enough and so I think you may have misunderstood what I meant by a gap. I have cleaned up my original post to define the problem better but I also took your example and customised it to what I think the algorithm should be. The main issue I have is that I am not sure if I have covered all possible interval permutations.
@Mojo, thanks for noticing that last_covered_point should be updated in step 4.2 as well.
@Mojo, I just wrote a proof. The proof is not completely formal, since "maximal gap" is given by a definition (although it is easy to define) and the case 2 is not proved in detail. However, the idea should be clear enough.
In my last comment, 'since "maximal gap" is given by a definition' should have been 'since "maximal gap" is not given by a definition'.

Stack Exchange Network

Algorithm To Compute The Gaps Between A Set of Intervals

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Algorithm To Compute The Gaps Between A Set of Intervals

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions