Averaging a list of intersecting rectangles

Question 1

I have a list of rectangles as tuples (x, y, w, h). I need to find which ones intersect and if they do, average all the intersecting rectangles into a new rectangle. I also track the resultant averaged rectangle "weight" (the number of intersecting rectangles) as an addition member in the new tuple. (x, y, w, h, weight)

The final return is resultant list of averaged sorted by weight.

I am looking for feedback on how to improve performance and make it even more pythonic. Or, as I have discovered many times in python, there may a completely different and better way to approach the problem (such as with NumPy).

I pass a list of rect tuples to ordered_average_intersecting_rects:

import numpy as np
def rects_intersect(r1, r2):
 hoverlaps = (r1[0] <= r2[0] + r2[2]) and (r1[0] + r1[2] >= r2[0])
 voverlaps = (r1[1] <= r2[1] + r2[3]) and (r1[1] + r1[3] >= r2[1])
 return hoverlaps and voverlaps
def average_rects(rects):
 x = np.average([coord[0] for coord in rects])
 y = np.average([coord[1] for coord in rects])
 w = np.average([coord[2] for coord in rects])
 h = np.average([coord[3] for coord in rects])
 return x, y, w, h, len(rects)
def ordered_average_intersecting_rects(rects):
 intersecting_rect_clusters = []
 while rects: # modified from `while len(rect) > 0:` (janos feedback)
 r1 = rects[0]
 rect_cluster = [r1]
 for r2 in rects[1:]:
 if rects_intersect(r1, r2):
 rect_cluster.append(r2)
 rects.remove(r2)
 rects.remove(r1)
 intersecting_rect_clusters.append(rect_cluster)
 averaged_rects = [average_rects(cluster) for cluster in intersecting_rect_clusters]
 averaged_rects.sort(key=lambda w: w[4], reverse=True)
 return averaged_rects

Here is my unit test to help understand how the code is being used:

def test_ordered_average_intersecting_rects(self):
 # Arrange
 rects = [(150, 55, 40, 20, 0),
 (160, 40, 20, 20, 0),
 (100, 120, 20, 20, 0),
 (45, 10, 15, 30, 0),
 (50, 25, 25, 20, 0),
 (35, 35, 20, 30, 0)]
 # Act
 with test_helpers.Timer('average_intersecting_rects') as t:
 result = ordered_average_intersecting_rects(rects)
 # Assert
 assert len(result) == 3
 cluster1 = result[0]
 assert cluster1[4] == 3
 assert int(cluster1[0]) == 43
 assert int(cluster1[1]) == 23
 assert int(cluster1[2]) == 20
 assert int(cluster1[3]) == 26
 cluster2 = result[1]
 assert cluster2[4] == 2
 assert cluster2[0] == 155
 assert cluster2[1] == 47.5
 assert cluster2[2] == 30
 assert cluster2[3] == 20
 cluster3 = result[2]
 assert cluster3[4] == 1
 assert cluster3[0] == 100
 assert cluster3[1] == 120
 assert cluster3[2] == 20
 assert cluster3[3] == 20

Question 2

What if the first rect intersects the second and the third, but the second and third rect do not intersect? Then all three rects are put into one "cluster", but the intersection of all three is empty.

Question 3

Or 1 and 3 intersect, and 2 and 4 intersect, but 1 doesn't intersect 2 (nor does 3 intersect 4)? So, what do you want to do in the face of multiple sets (or resulting empty intersections)?

Question 4

Good questions, I thought of that but in the case of my dataset the clusters are usually pretty tight and the size of the rects are almost always large enough to ensure overlap. So you are right, there might be a very few mis-formed clusters but for my application the accuracy is good enough so I decided to not add the incremental complexity to the algorithm to handle the few times I would run into the cases you mention. Unless you have an idea on how to add those cases more easily then my original attempts.

Question 5

To improve performance, wait for another review with some clever numpy trick.

To make it more Pythonic, it would be better to represent a rectangle with a Rectangle class instead of a simple tuple, for example:

class Rectangle(object):
 def __init__(self, x, y, width, height):
 self.x = x
 self.y = y
 self.width = width
 self.height = height
 def intersect(self, other):
 hoverlaps = (self.x <= other.x + other.width) and (self.x + self.width >= other.x)
 voverlaps = (self.y <= other.y + other.height) and (self.y + self.height >= other.y)
 return hoverlaps and voverlaps

The intersect method fits naturally into it, and now the code is easier to read.

The averaging method becomes:

def average_rects(rects):
 x = np.average([rect.x for rect in rects])
 y = np.average([rect.y for rect in rects])
 w = np.average([rect.width for rect in rects])
 h = np.average([rect.height for rect in rects])
 return x, y, w, h, len(rects)

The main part of ordered_average_intersecting_rects becomes:

while rects:
 r1 = rects[0]
 rect_cluster = [r1]
 for r2 in rects[1:]:
 if r1.intersect(r2):
 rect_cluster.append(r2)
 rects.remove(r2)

And in your testing method you can create rects with:

rects = [Rectangle(150, 55, 40, 20),
 Rectangle(160, 40, 20, 20),
 Rectangle(100, 120, 20, 20),
 Rectangle(45, 10, 15, 30),
 Rectangle(50, 25, 25, 20),
 Rectangle(35, 35, 20, 30)]

And of course you can (and should) go further:

average_rects could return a tuple: Rectangle(x, y, w, h), len(rects)
The assertions could use cluster1.x, .y, .width, instead of cryptic [0], [1], ...

You can replace while len(rects) > 0: with simply while rects:, because in Python empty sequences are false. This is recommended on PEP8 too.

Question 6

I stated with a Rect class but moved to tuples because performance is critical - the data sets can get large. It was my understanding that an object creation and object member lookup was slower than tuple creation and accessing indexed tuple elements. If this is not the case (or is not significant) then I would prefer to go back to the Class for readability.

Question 7

If performance is critical, tuples should be faster. My suggestions concern mostly readability. I updated now to make this point clear.

Question 8

Thanks, and you are right, "To improve performance,"... I am wait for another review with some clever numpy trick :) added numpy tag

Question 9

Also update the while len(rects) > 0: every bit of code simplification counts.

Question 10

You're not supposed to edit your original code, as that invalidates reviews

janos janos 113k15 gold badges154 silver badges396 bronze badges · Answer 1 · 2014-09-07 09:29:54Z

To improve performance, wait for another review with some clever numpy trick.

To make it more Pythonic, it would be better to represent a rectangle with a Rectangle class instead of a simple tuple, for example:

class Rectangle(object):
 def __init__(self, x, y, width, height):
 self.x = x
 self.y = y
 self.width = width
 self.height = height
 def intersect(self, other):
 hoverlaps = (self.x <= other.x + other.width) and (self.x + self.width >= other.x)
 voverlaps = (self.y <= other.y + other.height) and (self.y + self.height >= other.y)
 return hoverlaps and voverlaps

The intersect method fits naturally into it, and now the code is easier to read.

The averaging method becomes:

def average_rects(rects):
 x = np.average([rect.x for rect in rects])
 y = np.average([rect.y for rect in rects])
 w = np.average([rect.width for rect in rects])
 h = np.average([rect.height for rect in rects])
 return x, y, w, h, len(rects)

The main part of ordered_average_intersecting_rects becomes:

while rects:
 r1 = rects[0]
 rect_cluster = [r1]
 for r2 in rects[1:]:
 if r1.intersect(r2):
 rect_cluster.append(r2)
 rects.remove(r2)

And in your testing method you can create rects with:

rects = [Rectangle(150, 55, 40, 20),
 Rectangle(160, 40, 20, 20),
 Rectangle(100, 120, 20, 20),
 Rectangle(45, 10, 15, 30),
 Rectangle(50, 25, 25, 20),
 Rectangle(35, 35, 20, 30)]

And of course you can (and should) go further:

average_rects could return a tuple: Rectangle(x, y, w, h), len(rects)
The assertions could use cluster1.x, .y, .width, instead of cryptic [0], [1], ...

You can replace while len(rects) > 0: with simply while rects:, because in Python empty sequences are false. This is recommended on PEP8 too.

I stated with a Rect class but moved to tuples because performance is critical - the data sets can get large. It was my understanding that an object creation and object member lookup was slower than tuple creation and accessing indexed tuple elements. If this is not the case (or is not significant) then I would prefer to go back to the Class for readability.
If performance is critical, tuples should be faster. My suggestions concern mostly readability. I updated now to make this point clear.
Thanks, and you are right, "To improve performance,"... I am wait for another review with some clever numpy trick :) added numpy tag
Also update the while len(rects) > 0: every bit of code simplification counts.
You're not supposed to edit your original code, as that invalidates reviews

Stack Exchange Network

Averaging a list of intersecting rectangles

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Averaging a list of intersecting rectangles

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions