Algorithm for flattening overlapping ranges

Question 1

I am looking for a nice way of flattening (splitting) a list of potentially-overlapping numeric ranges. The problem is very similar to that of this question: Fastest way to split overlapping date ranges, and many others.

However, the ranges are not only integers, and I am looking for a decent algorithm that can be easily implemented in Javascript or Python, etc.

Example Data: Example data

Example Solution: enter image description here

Apologies if this is a duplicate, but I am yet to find a solution.

Question 2

How do you determine that green is on top of blue, but under yellow and orange? Are the color ranges applied in order? If that's the case, the algorithm seems obvious; just ...erm, apply the color ranges in order.

Question 3

Yes, they are applied in order. But that's the problem—how would you 'apply' the ranges?

Question 4

Do you often add/remove colors, or do you need to optimize for query speed? How many "ranges" will you usually have? 3? 3000?

Question 5

Won't be adding/removing colours very frequently, and there'll be anywhere between 10-20 ranges, with 4+ digit precision. That's why the set method isn't quite suitable, because the sets will have to be 1000+ items long. The method I've gone with is the one I posted in Python.

Question 6

Walk from left to right, using a stack to keep track of what color you're on. Instead of a discrete map, use the 10 numbers in your dataset as the break-points.

Starting with an empty stack, and setting start to 0, loop until we reach the end:

If stack is empty:
- Look for the first color starting at or after start, and push it and all lower-ranked colors onto the stack. In your flattened list, mark the beginning of that color.
else (If not empty):
- Find next beginning point for any higher-ranked color at or after start, and find the end of the current color
  - If the next color starts first, push it and anything else on the way to it onto the stack. Update the end of the current color as the beginning of this one, and add the start of this color to the flattened list.
  - If there are none and the current color ends first, set start to the end of this color, pop it off of the stack, and check the next-highest ranked color
    - If start is within the next color's range, add this color to the flattened list, starting at start.
    - If the stack empties, just continue the loop (go back to the first bullet point).

This is a mental run-through given your example data:

# Initial data.
flattened = []
stack = []
start = 0
# Stack is empty. Look for the next starting point at 0 or later: "b", 0 - Push it and all lower levels onto stack
flattened = [ (b, 0, ?) ]
stack = [ r, b ]
start = 0
# End of "b" is 5.4, next higher-colored start is "g" at 2 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, ?) ]
stack = [ r, b, g ]
start = 2
# End of "g" is 12, next higher-colored start is "y" at 3.5 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, ?) ]
stack = [ r, b, g, y ]
start = 3.5
# End of "y" is 6.7, next higher-colored start is "o" at 6.7 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, ?) ]
stack = [ r, b, g, y, o ]
start = 6.7
# End of "o" is 10, and there is nothing starting at 12 or later in a higher color. Next off stack, "y", has already ended. Next off stack, "g", has not ended. Delimit and continue.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, ?) ]
stack = [ r, b, g ]
start = 10
# End of "g" is 12, there is nothing starting at 12 or later in a higher color. Next off stack, "b", is out of range (already ended). Next off stack, "r", is out of range (not started). Mark end of current color:
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12) ]
stack = []
start = 12
# Stack is empty. Look for the next starting point at 12 or later: "r", 12.5 - Push onto stack
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, ?) ]
stack = [ r ]
start = 12
# End of "r" is 13.8, and there is nothing starting at 12 or higher in a higher color. Mark end and pop off stack.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, 13.8) ]
stack = []
start = 13.8
# Stack is empty and nothing is past 13.8 - We're done.

Question 7

what do you mean by "anything else on the way to it onto the stack" ?

Question 8

@Guillaume07 Anything of ranks between the current and the chosen next start. The sample data doesn't show it, but imagine the yellow was shifted to start before green - you have to push both green and yellow onto the stack so that when yellow ends, the end of green is still at the right place in the stack so it still shows up in the final result

Question 9

Another think I don't understand, please, is why you tell firstly "If stack is empty: Look for the first color starting at or before start," then in the code sample you comment "# Stack is empty. Look for the next starting point at 0 or later". So once it is before and once it is later

Question 10

@Guillaume07 Yep, a typo, the correct version is in the code block twice (the second being the comment near the bottom that starts "Stack is empty."). I've edited that bullet point.

Question 11

This solution seems the simplest. (Or at least, the easiest to grasp)

All that is needed is a function to subtract two ranges. In other words, something that will give this:

A ------ A ------ A ----
B ------- and B ------ and B ---------
= ---- = ---- = --- --

Which is simple enough. Then you can simply iterate through each of the ranges, starting from the lowest, and for each one, subtract from it all the ranges above it, in turn. And there you have it.

Here is an implementation of the range subtractor in Python:

def subtractRanges((As, Ae), (Bs, Be)):
 '''SUBTRACTS A FROM B'''
 # e.g, A = ------
 # B = -----------
 # result = -- ---
 # Returns list of new range(s)
 if As > Be or Bs > Ae: # All of B visible
 return [[Bs, Be]]
 result = []
 if As > Bs: # Beginning of B visible
 result.append([Bs, As])
 if Ae < Be: # End of B visible
 result.append([Ae, Be])
 return result

Using this function, the rest can be done like this: (A 'span' means a range, as 'range' is a Python keyword)

spans = [["red", [12.5, 13.8]],
["blue", [0.0, 5.4]],
["green", [2.0, 12.0]],
["yellow", [3.5, 6.7]],
["orange", [6.7, 10.0]]]
i = 0 # Start at lowest span
while i < len(spans):
 for superior in spans[i+1:]: # Iterate through all spans above
 result = subtractRanges(superior[1], spans[i][1])
 if not result: # If span is completely covered
 del spans[i] # Remove it from list
 i -= 1 # Compensate for list shifting
 break # Skip to next span
 else: # If there is at least one resulting span
 spans[i][1] = result[0]
 if len(result) > 1: # If there are two resulting spans
 # Insert another span with the same name
 spans.insert(i+1, [spans[i][0], result[1]])
 i += 1
print spans

This gives [['red', [12.5, 13.8]], ['blue', [0.0, 2.0]], ['green', [2.0, 3.5]], ['green', [10.0, 12.0]], ['yellow', [3.5, 6.7]], ['orange', [6.7, 10.0]]], which is correct.

Question 12

Your output at the end doesn't match expected output in the question...

Question 13

@Izkata Gosh, I was careless. That must have been the output from another test. Fixed now, thanks

Question 14

If the data really is similar in scope to your sample data, you could create a map like this:

map = [0 .. 150]
for each color:
 for loc range start * 10 to range finish * 10:
 map[loc] = color

Then just walk through this map to generate the ranges

curcolor = none
for loc in map:
 if map[loc] != curcolor:
 if curcolor:
 rangeend = loc / 10
 make new range
 rangecolor = map[loc]
 rangestart = loc / 10

To work, the values have to be in a relatively small range as in your sample data.

Edit: to work with true floats, use the map to generate a high level mapping and then refer to the original data to create the boundaries.

map = [0 .. 15]
for each color:
 for loc round(range start) to round(range finish):
 map[loc] = color
curcolor = none
for loc in map
 if map[loc] != curcolor:
 make new range
 if loc = round(range[map[loc]].start) 
 rangestart = range[map[loc]].start
 else
 rangestart = previous rangeend
 rangecolor = map[loc]
 if curcolor:
 if map[loc] == none:
 last rangeend = range[map[loc]].end
 else
 last rangeend = rangestart
 curcolor = rangecolor

Question 15

This is a very nice solution, I have come across it before. However, I am looking for an more generic solution that can manage any arbitrary float ranges... (this wouldn't be the best for something like 563.807 - 770.100)

Question 16

I think you could generalize it by rounding the values, and generating the map, but marking a location on the edges as having two colors. Then when you see a location with two colors, go back to the original data to determine the boundary.

Question 17

Here's a relatively straightforward solution in Scala. It shouldn't be too difficult to port to another language.

case class Range(name: String, left: Double, right: Double) {
 def overlapsLeft(other: Range) =
 other.left < left && left < other.right
 def overlapsRight(other: Range) =
 other.left < right && right < other.right
 def overlapsCompletely(other: Range) =
 left <= other.left && right >= other.right
 def splitLeft(other: Range) = 
 Range(other.name, other.left, left)
 def splitRight(other: Range) = 
 Range(other.name, right, other.right)
}
def apply(ranges: Set[Range], newRange: Range) = {
 val left = ranges.filter(newRange.overlapsLeft)
 val right = ranges.filter(newRange.overlapsRight)
 val overlaps = ranges.filter(newRange.overlapsCompletely)
 val leftSplit = left.map(newRange.splitLeft)
 val rightSplit = right.map(newRange.splitRight)
 ranges -- left -- right -- overlaps ++ leftSplit ++ rightSplit + newRange
}
val ranges = Vector(
 Range("red", 12.5, 13.8),
 Range("blue", 0.0, 5.4),
 Range("green", 2.0, 12.0),
 Range("yellow", 3.5, 6.7),
 Range("orange", 6.7, 10.0))
val flattened = ranges.foldLeft(Set.empty[Range])(apply)
val sorted = flattened.toSeq.sortBy(_.left)
sorted foreach println

apply takes in a Set of all the ranges that have already been applied, finds the overlaps, then returns a new set minus the overlaps and plus the new range and the newly split ranges. foldLeft repeatedly calls apply with each input range.

Question 18

Just keep a set of ranges sorted by start. Add range that covers everything (-oo..+oo). To add a range r:

let pre = last range that starts before r starts
let post = earliest range that starts before r ends
now iterate from pre to post: split ranges that overlap, remove ranges that are covered, then add r

Izkata Izkata 6,1187 gold badges30 silver badges44 bronze badges · Accepted Answer · 2014-05-29 04:34:19Z

Walk from left to right, using a stack to keep track of what color you're on. Instead of a discrete map, use the 10 numbers in your dataset as the break-points.

Starting with an empty stack, and setting start to 0, loop until we reach the end:

If stack is empty:
- Look for the first color starting at or after start, and push it and all lower-ranked colors onto the stack. In your flattened list, mark the beginning of that color.
else (If not empty):
- Find next beginning point for any higher-ranked color at or after start, and find the end of the current color
  - If the next color starts first, push it and anything else on the way to it onto the stack. Update the end of the current color as the beginning of this one, and add the start of this color to the flattened list.
  - If there are none and the current color ends first, set start to the end of this color, pop it off of the stack, and check the next-highest ranked color
    - If start is within the next color's range, add this color to the flattened list, starting at start.
    - If the stack empties, just continue the loop (go back to the first bullet point).

This is a mental run-through given your example data:

# Initial data.
flattened = []
stack = []
start = 0
# Stack is empty. Look for the next starting point at 0 or later: "b", 0 - Push it and all lower levels onto stack
flattened = [ (b, 0, ?) ]
stack = [ r, b ]
start = 0
# End of "b" is 5.4, next higher-colored start is "g" at 2 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, ?) ]
stack = [ r, b, g ]
start = 2
# End of "g" is 12, next higher-colored start is "y" at 3.5 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, ?) ]
stack = [ r, b, g, y ]
start = 3.5
# End of "y" is 6.7, next higher-colored start is "o" at 6.7 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, ?) ]
stack = [ r, b, g, y, o ]
start = 6.7
# End of "o" is 10, and there is nothing starting at 12 or later in a higher color. Next off stack, "y", has already ended. Next off stack, "g", has not ended. Delimit and continue.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, ?) ]
stack = [ r, b, g ]
start = 10
# End of "g" is 12, there is nothing starting at 12 or later in a higher color. Next off stack, "b", is out of range (already ended). Next off stack, "r", is out of range (not started). Mark end of current color:
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12) ]
stack = []
start = 12
# Stack is empty. Look for the next starting point at 12 or later: "r", 12.5 - Push onto stack
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, ?) ]
stack = [ r ]
start = 12
# End of "r" is 13.8, and there is nothing starting at 12 or higher in a higher color. Mark end and pop off stack.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, 13.8) ]
stack = []
start = 13.8
# Stack is empty and nothing is past 13.8 - We're done.

what do you mean by "anything else on the way to it onto the stack" ?
@Guillaume07 Anything of ranks between the current and the chosen next start. The sample data doesn't show it, but imagine the yellow was shifted to start before green - you have to push both green and yellow onto the stack so that when yellow ends, the end of green is still at the right place in the stack so it still shows up in the final result
Another think I don't understand, please, is why you tell firstly "If stack is empty: Look for the first color starting at or before start," then in the code sample you comment "# Stack is empty. Look for the next starting point at 0 or later". So once it is before and once it is later
@Guillaume07 Yep, a typo, the correct version is in the code block twice (the second being the comment near the bottom that starts "Stack is empty."). I've edited that bullet point.

Stack Exchange Network

Algorithm for flattening overlapping ranges

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Algorithm for flattening overlapping ranges

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions