Problem
Given a set of intervals with possibly non-distinct start and end points, find all maximal gaps. A gap is defined as an interval that does not overlap with any given interval. All endpoints are integers and inclusive.
For example, given the following set of intervals:
$\{[2,6], [1,9], [12,19]\}$
The set of all maximal gaps is:
$\{[10,11]\}$
For the following set of intervals:
$\{[2,6], [1,9], [3,12], [18,20]\}$
The set of all maximal gaps is:
$\{[13,17]\}$
because that produces the maximal gap.
Proposed Algorithm
My proposed algorithm (modified approach taken by John L.) to compute these gaps is:
- Order the intervals by ascending start date.
- Initialise an empty list
gaps
that will store gaps - Initialise a variable,
last_covered_point
, to the end point of the first interval. - Iterate through all intervals in the sorted order. For each interval
[start, end]
, do the following.- If
start > last_covered_point + 1
, add the gap,[last_covered_point + 1, start - 1]
togaps
. - Assign
max(last_covered_point, end)
tolast_covered_point
.
- If
- Return
gaps
I have tested my algorithm on a few cases and it produces the correct results. But I cannot say with 100% guarantee that it works for every interval permutation and combination. Is there a way to prove this handles every permutation and combination?
2 Answers 2
The key to prove your algorithm is correct is to find enough invariants of the loop, step 4 so that we apply use mathematical induction.
Let $I_1, I_2, \cdots, I_n$ denote the sorted intervals. When the algorithm has just finished processing $I_i$, we record the values of gaps
and last_covered_point
as $\text{gaps}_i$ and $\text{last_covered_point}_i$ respectively.
Let us prove the following proposition, $P(i)$, for $i=1, 2, \cdots, n$.
$\text{gaps}_i$ is the set of all maximal gaps for $I_1, I_2, \cdots, I_i$ and $\text{last_covered_point}_i$ is the maximum of all right endpoints of $I_1, I_2, \cdots, I_i$.
When $i=1$, $\text{gaps}_1$ is the empty set and $\text{last_covered_point}_1$ is the right endpoint of $I_1$. So $P(1)$ is correct.
For the sake of mathematical induction, assume $P(i)$ is correct, where 1ドル\le i\lt n$. Let $I_{i+1}=[s, e]$. There are two cases.
If $s\gt\text{last_covered_point}_i+1$, then $$\text{gaps}_i\cup[\text{last_covered_point}_i +1, s-1]=\text{gaps}_{i+1}.$$
Let $m$ be any point between the start point of $I_1$ and the maximum of all right endpoints of $I_1, I_2, \cdots, I_{i+1}$. Suppose $m$ not covered by any of $I_1, I_2, \cdots, I_{i+1}$.- If $m\le\text{last_covered_point}_i$, the induction hypothesis says that $m$ is covered by some interval in $\text{gap}_i$.
- Otherwise, $m\gt\text{last_covered_point}_i$. Since $m$ is not covered by $I_{i+1}$, we know $m<s$. So $m$ is covered by $[\text{last_covered_point}_i +1, s-1]$.
In both cases, $m$ is covered by some interval in $\text{gaps}_{i+1}$. Since $\text{last_covered_point}_i$ is the largest point covered by one of $I_1, I_2, \cdots, I_i$ and $s$ is the smallest point covered by $I_{i+1}$, $[\text{last_covered_point}_i +1, s-1]$ is a maximal gap.
Otherwise, we have $s\le\text{last_covered_point}_i$+1. We can also verify that $\text{gaps}_{i+1}=\text{gaps}_{i}$ is the set of all maximal gaps for $I_1, I_2, \cdots, I_{i+1}$.
Finally, since step 4.2 says $\text{last_covered_point}_{i+1}=\max(\text{last_covered_point}_i, e)$ and $\text{last_covered_point}_i$ is the maximum of all right endpoints of $I_1, I_2, \cdots, I_i$, $\text{last_covered_point}_{i+1}$ is the maximum of all right endpoints of $I_1, I_2, \cdots, I_{i+1}$.
So, $P(i+1)$ is correct. $\quad\checkmark$.
-
$\begingroup$ I probably didn't make my post precise enough and so I think you may have misunderstood what I meant by a gap. I have cleaned up my original post to define the problem better but I also took your example and customised it to what I think the algorithm should be. The main issue I have is that I am not sure if I have covered all possible interval permutations. $\endgroup$Mojo– Mojo2020年12月17日 15:19:06 +00:00Commented Dec 17, 2020 at 15:19
-
$\begingroup$ @Mojo, thanks for noticing that last_covered_point should be updated in step 4.2 as well. $\endgroup$喜欢算法和数学– 喜欢算法和数学2020年12月18日 16:30:34 +00:00Commented Dec 18, 2020 at 16:30
-
$\begingroup$ @Mojo, I just wrote a proof. The proof is not completely formal, since "maximal gap" is given by a definition (although it is easy to define) and the case 2 is not proved in detail. However, the idea should be clear enough. $\endgroup$喜欢算法和数学– 喜欢算法和数学2020年12月20日 05:24:51 +00:00Commented Dec 20, 2020 at 5:24
-
$\begingroup$ In my last comment, 'since "maximal gap" is given by a definition' should have been 'since "maximal gap" is not given by a definition'. $\endgroup$喜欢算法和数学– 喜欢算法和数学2021年01月04日 01:35:00 +00:00Commented Jan 4, 2021 at 1:35
I'd like to propose a quite simple algorithm as well. The idea is this: we're going to place open and close parentheses on the number line at the boundaries of each interval. For example, for the intervals $(1,5), (2,7), (9, 10)$, the number line would look like this:
1 2 3 4 5 6 7 8 9 10
( ( ) ) ( )
Then we'll just scan left to right, counting parentheses. When all the parentheses get closed, we start a gap. Note in particular that in the above diagram, we have lost the information about which close parenthesis is paired with which open parenthesis -- because it doesn't actually matter. So:
- Convert each interval $(a,b)$ to pairs $(a,\mathtt o),(b,\mathtt c)$. ($\mathtt o$ and $\mathtt c$ are for open and close, respectively.)
- Sort all the pairs you get from this process. (When sorting, use lexicographic ordering and $\mathtt o < \mathtt c$.)
- Iterate through them, keeping a counter that starts from 0ドル$.
- When you see a pair with $\mathtt o$ in the second part, increment the counter.
- When you see a pair with $\mathtt c$ in the second part, decrement the counter.
- When you decrement the counter, if that causes it to drop to 0ドル$, then look at the next element of the list to decide what to do; empty list means you're done, otherwise if the next pair's first part is just one bigger than the current one's you do nothing, and in the last case you record a maximal gap between the current end point and the next open point.