algorithmic problem - Combining overlapping ranges

Question 1

I'm looking for an idea to efficiently solve following problem:

I have a set of pairs of ranges (range = a pair of numbers), each range is unique (but has same size) e.g.

[
 [(0,6),(34,40)],
 [(1,7),(35,41)],
 [(3,9),(12,18)],
 [(2,8),(36,42)],
 [(13,19),(22,28)],
 [(23,29),(14,20)] 
]

Now I'd like to combine pairs of ranges, if ranges are overlapping e.g.

[(0,6),(34,40)] overlaps with [(1,7),(35,41)] -result-> [(0,7),(34,41)]

So as a result for above set I'd like to get (now each pair may have ranges of different size)

[
 [(0,8),(34,42)],
 [(3,9),(12,18)],
 [(13,20),(22,29)]
]

The set might be pretty big, I'd like to avoid quadratic complexity if possible.

EDIT: My best idea so far (in Python) is below. I'd like to know if you know a better(faster) way. Also I'm not sure if my idea of removing already combined pairs is valid:

def ranges_overlap(range1, range2):
 return range1[0] < range2[1] and range2[0] < range1[1]
def combine_pairs(pair1, pair2):
 return [(min(r1[0], r2[0]), max(r1[1], r2[1])) for r1, r2 in zip(pair1, pair2)]
def combine_overlapping_pairs(pairs):
 combined = []
 while pairs:
 pair1 = pairs.pop()
 already_combined = []
 for pair2 in pairs:
 for pair2_perm in itertools.permutations(pair2):
 does_overlap = True
 for range1, range2 in zip(pair1, pair2_perm):
 if not ranges_overlap(range1, range2):
 does_overlap = False
 break
 if does_overlap:
 pair1 = combine_pairs(pair1, pair2_perm)
 already_combined.append(pair2)
 break
 combined.append(pair1)
 # Not sure if I can do that
 for pair in already_combined:
 pairs.remove(pair)
 return combined

Question 2

Why didn't (0,8) and (3,9) combine? Do both pairs have to overlap?

Question 3

@KarlBielefeldt yes, both pairs have to overlap

Question 4

Sharing your research helps everyone. Tell us what you've tried and why it didn’t meet your needs. This demonstrates that you’ve taken the time to try to help yourself, it saves us from reiterating obvious answers, and most of all it helps you get a more specific and relevant answer. Also see How to Ask

Question 5

I've appended my best idea to question, is there any faster solution?

Question 6

Another tactic you might try is to think about the two pairs as perpendicular segments, thus defining a rectangle. Then you might be able to leverage "union of rectangles" types of questions (since that is what overlap of both ranges could also mean), of which there are many in computational geometry...

Question 7

Third time's the charm, I hope. I combined my original idea to sort the list first with Snowman's suggestion of using set theory. The basic algorithm is:

Sort the pairs in lexicographic order by the first range.
Group adjacent pairs by if just the first ranges overlap. I prove below that lexicographically sorted ranges will never overlap without being adjacent, which allows this step to be done in O(n). This creates sets where the first ranges overlap.
Within each of those sets, sort again by the second range, and group by if the second ranges overlap. This splits sets where the first ranges overlap into sets where the second ranges also overlap.
Within each set of sets, repeatedly combine pairs until you have one merged pair.

Here's an O(n log n) implementation in Haskell:

import Data.List
import Data.Tuple
type Range = (Int, Int)
type RangePair = (Range, Range)
rangesOverlap :: Range -> Range -> Bool
rangesOverlap (a,b) (c,d) =
 c <= a && a <= d ||
 c <= b && b <= d ||
 a <= c && c <= b ||
 a <= d && d <= b
pairsOverlap :: RangePair -> RangePair -> Bool
pairsOverlap (a, b) (c, d) = rangesOverlap a c
combineRanges :: Range -> Range -> Range
combineRanges (a,b) (c,d) = (min a c, max b d)
combinePairs :: RangePair -> RangePair -> RangePair
combinePairs (a, b) (c, d) = (combineRanges a c, combineRanges b d)
combineOverlapping :: [RangePair] -> [RangePair]
combineOverlapping = map swap . concat . 
 map (combineSets. makeSet . map swap) . makeSet 
 where makeSet = groupBy pairsOverlap . sort
 combineSets = map (foldl1 combinePairs)

We wish to prove that if the ranges are lexicographically sorted, all the overlapping ranges will occur in adjacent rows. Assume a counterexample exists where overlapping ranges are not in adjacent rows. It will take the form of:

(a, b)
(c, d)
(e, f)

where (a,b) and (e,f) overlap with each other, but (c,d) does not overlap with either.

In order for (c,d) not to overlap with (a,b), d must be less than a or b must be less than c. c <= d because it is a range, and a <= c because of lexicographical sorting, therefore d >= a. e <= b because of overlapping, and c <= e because of lexicographical sorting, therefore c <= b. Therefore, if (a,b) and (e,f) overlap, then (a,b) and (c,d) always overlap, and no counterexample exists.

Question 8

Not quite. [(1,3),(8,10)],[(2,4),(6,7)],[(3,4),(9,11)] - the first and third overlap but the first and second doesn't even though they're in sorted order.

Question 9

That's an excellent catch. I think there's still a solution in there somewhere if you run it twice, once sorted on the first tuple and once on the second. The merge is tricky, though. I'll ponder on it.

Question 10

Hrmm... This sorting thing feels like a trap. A hunch tells me that the two pairs may need to be treated as independent dimensions, so a topological sort or something may be needed to fully make it go...

Question 11

See my new answer, I believe looking at this using set theory will produce more accurate results, but the performance might not be optimal.

Question 12

He already knows how to solve it quadratically. You have to sort it somehow to do better.

Question 13

I will try my hand at an answer, and fully expect to hear what's wrong with it soon. :)

So first, I think the problem may be more easily tackled by thinking of the pair of ranges instead as defining a rectangle. Suppose you have [(0,2), (3,4)]. Another way to view this is the rectangle in Cartesian coordinates of (0,3) to (2,4). Overlap of both ranges can then be thought of as rectangle-to-rectangle intersection.

With that in mind, I think a spatial structure is possibly most appropriate. The path my tired brain can conjure up is to use an R-tree or one of its variants. Finding the union of rectangles then becomes similar to building the tree. At each step, use the rectangle being added first as a search. If it finds any rectangles, replace all found rectangles with the "union" rectangle - the min/max of x/y coordinates, else add the rectangle as is. Once the tree is built, simply iterate through the rectangles and voila, convert them back to your ranges.

The unfortunate reality is that the runtime is not going to be easily analyzable. Let's note some of the potential issues:

R-trees themselves do not guarantee good worst-case performance (as noted by Wikipedia)
When adding a new rectangle, the number of overlapping rectangles plays a role in the complexity of the step. This seems to imply density is a sneaky factor.
R-trees are likely complex enough that any constant factor is probably pretty high, which may mean that other methods win out depending on your data set size.

Given a large enough set, I believe this approach would win out. I believe it to be quadratic only in "bad" cases. But R-trees are not simple data structures, so it needs to be thought through carefully. Good luck!

Question 14

With me being not familiar with R-tress and you writing these are not simple data structures, makes me think that this is too complicated for me. Maybe I'll try it when I understand these structures. Thanks for an answer!

Question 15

Ah, well don't be scared by R-trees. I meant more that they're not simple on the inside but not that the API to use them isn't simple. I'm not a Python guy but this didn't look too hard if you're ever inclined to give it a shot.

score 2 · Accepted Answer · 2014-09-08 20:24:27Z

Third time's the charm, I hope. I combined my original idea to sort the list first with Snowman's suggestion of using set theory. The basic algorithm is:

Sort the pairs in lexicographic order by the first range.
Group adjacent pairs by if just the first ranges overlap. I prove below that lexicographically sorted ranges will never overlap without being adjacent, which allows this step to be done in O(n). This creates sets where the first ranges overlap.
Within each of those sets, sort again by the second range, and group by if the second ranges overlap. This splits sets where the first ranges overlap into sets where the second ranges also overlap.
Within each set of sets, repeatedly combine pairs until you have one merged pair.

Here's an O(n log n) implementation in Haskell:

import Data.List
import Data.Tuple
type Range = (Int, Int)
type RangePair = (Range, Range)
rangesOverlap :: Range -> Range -> Bool
rangesOverlap (a,b) (c,d) =
 c <= a && a <= d ||
 c <= b && b <= d ||
 a <= c && c <= b ||
 a <= d && d <= b
pairsOverlap :: RangePair -> RangePair -> Bool
pairsOverlap (a, b) (c, d) = rangesOverlap a c
combineRanges :: Range -> Range -> Range
combineRanges (a,b) (c,d) = (min a c, max b d)
combinePairs :: RangePair -> RangePair -> RangePair
combinePairs (a, b) (c, d) = (combineRanges a c, combineRanges b d)
combineOverlapping :: [RangePair] -> [RangePair]
combineOverlapping = map swap . concat . 
 map (combineSets. makeSet . map swap) . makeSet 
 where makeSet = groupBy pairsOverlap . sort
 combineSets = map (foldl1 combinePairs)

We wish to prove that if the ranges are lexicographically sorted, all the overlapping ranges will occur in adjacent rows. Assume a counterexample exists where overlapping ranges are not in adjacent rows. It will take the form of:

(a, b)
(c, d)
(e, f)

where (a,b) and (e,f) overlap with each other, but (c,d) does not overlap with either.

In order for (c,d) not to overlap with (a,b), d must be less than a or b must be less than c. c <= d because it is a range, and a <= c because of lexicographical sorting, therefore d >= a. e <= b because of overlapping, and c <= e because of lexicographical sorting, therefore c <= b. Therefore, if (a,b) and (e,f) overlap, then (a,b) and (c,d) always overlap, and no counterexample exists.

Not quite. [(1,3),(8,10)],[(2,4),(6,7)],[(3,4),(9,11)] - the first and third overlap but the first and second doesn't even though they're in sorted order.
That's an excellent catch. I think there's still a solution in there somewhere if you run it twice, once sorted on the first tuple and once on the second. The merge is tricky, though. I'll ponder on it.
Hrmm... This sorting thing feels like a trap. A hunch tells me that the two pairs may need to be treated as independent dimensions, so a topological sort or something may be needed to fully make it go...
See my new answer, I believe looking at this using set theory will produce more accurate results, but the performance might not be optimal.
He already knows how to solve it quadratically. You have to sort it somehow to do better.

Stack Exchange Network

algorithmic problem - Combining overlapping ranges

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

algorithmic problem - Combining overlapping ranges

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions