I have a list of rectangles as tuples (x, y, w, h). I need to find which ones intersect and if they do, average all the intersecting rectangles into a new rectangle. I also track the resultant averaged rectangle "weight" (the number of intersecting rectangles) as an addition member in the new tuple. (x, y, w, h, weight)
The final return is resultant list of averaged sorted by weight.
I am looking for feedback on how to improve performance and make it even more pythonic. Or, as I have discovered many times in python, there may a completely different and better way to approach the problem (such as with NumPy).
I pass a list of rect
tuples to ordered_average_intersecting_rects
:
import numpy as np
def rects_intersect(r1, r2):
hoverlaps = (r1[0] <= r2[0] + r2[2]) and (r1[0] + r1[2] >= r2[0])
voverlaps = (r1[1] <= r2[1] + r2[3]) and (r1[1] + r1[3] >= r2[1])
return hoverlaps and voverlaps
def average_rects(rects):
x = np.average([coord[0] for coord in rects])
y = np.average([coord[1] for coord in rects])
w = np.average([coord[2] for coord in rects])
h = np.average([coord[3] for coord in rects])
return x, y, w, h, len(rects)
def ordered_average_intersecting_rects(rects):
intersecting_rect_clusters = []
while rects: # modified from `while len(rect) > 0:` (janos feedback)
r1 = rects[0]
rect_cluster = [r1]
for r2 in rects[1:]:
if rects_intersect(r1, r2):
rect_cluster.append(r2)
rects.remove(r2)
rects.remove(r1)
intersecting_rect_clusters.append(rect_cluster)
averaged_rects = [average_rects(cluster) for cluster in intersecting_rect_clusters]
averaged_rects.sort(key=lambda w: w[4], reverse=True)
return averaged_rects
Here is my unit test to help understand how the code is being used:
def test_ordered_average_intersecting_rects(self):
# Arrange
rects = [(150, 55, 40, 20, 0),
(160, 40, 20, 20, 0),
(100, 120, 20, 20, 0),
(45, 10, 15, 30, 0),
(50, 25, 25, 20, 0),
(35, 35, 20, 30, 0)]
# Act
with test_helpers.Timer('average_intersecting_rects') as t:
result = ordered_average_intersecting_rects(rects)
# Assert
assert len(result) == 3
cluster1 = result[0]
assert cluster1[4] == 3
assert int(cluster1[0]) == 43
assert int(cluster1[1]) == 23
assert int(cluster1[2]) == 20
assert int(cluster1[3]) == 26
cluster2 = result[1]
assert cluster2[4] == 2
assert cluster2[0] == 155
assert cluster2[1] == 47.5
assert cluster2[2] == 30
assert cluster2[3] == 20
cluster3 = result[2]
assert cluster3[4] == 1
assert cluster3[0] == 100
assert cluster3[1] == 120
assert cluster3[2] == 20
assert cluster3[3] == 20
-
\$\begingroup\$ What if the first rect intersects the second and the third, but the second and third rect do not intersect? Then all three rects are put into one "cluster", but the intersection of all three is empty. \$\endgroup\$Martin R– Martin R2014年09月07日 08:20:57 +00:00Commented Sep 7, 2014 at 8:20
-
\$\begingroup\$ Or 1 and 3 intersect, and 2 and 4 intersect, but 1 doesn't intersect 2 (nor does 3 intersect 4)? So, what do you want to do in the face of multiple sets (or resulting empty intersections)? \$\endgroup\$Clockwork-Muse– Clockwork-Muse2014年09月07日 10:38:46 +00:00Commented Sep 7, 2014 at 10:38
-
\$\begingroup\$ Good questions, I thought of that but in the case of my dataset the clusters are usually pretty tight and the size of the rects are almost always large enough to ensure overlap. So you are right, there might be a very few mis-formed clusters but for my application the accuracy is good enough so I decided to not add the incremental complexity to the algorithm to handle the few times I would run into the cases you mention. Unless you have an idea on how to add those cases more easily then my original attempts. \$\endgroup\$Lars Laakes– Lars Laakes2014年09月07日 13:36:52 +00:00Commented Sep 7, 2014 at 13:36
1 Answer 1
To improve performance, wait for another review with some clever numpy
trick.
To make it more Pythonic, it would be better to represent a rectangle with a Rectangle
class instead of a simple tuple, for example:
class Rectangle(object):
def __init__(self, x, y, width, height):
self.x = x
self.y = y
self.width = width
self.height = height
def intersect(self, other):
hoverlaps = (self.x <= other.x + other.width) and (self.x + self.width >= other.x)
voverlaps = (self.y <= other.y + other.height) and (self.y + self.height >= other.y)
return hoverlaps and voverlaps
The intersect
method fits naturally into it, and now the code is easier to read.
The averaging method becomes:
def average_rects(rects):
x = np.average([rect.x for rect in rects])
y = np.average([rect.y for rect in rects])
w = np.average([rect.width for rect in rects])
h = np.average([rect.height for rect in rects])
return x, y, w, h, len(rects)
The main part of ordered_average_intersecting_rects
becomes:
while rects:
r1 = rects[0]
rect_cluster = [r1]
for r2 in rects[1:]:
if r1.intersect(r2):
rect_cluster.append(r2)
rects.remove(r2)
And in your testing method you can create rects
with:
rects = [Rectangle(150, 55, 40, 20),
Rectangle(160, 40, 20, 20),
Rectangle(100, 120, 20, 20),
Rectangle(45, 10, 15, 30),
Rectangle(50, 25, 25, 20),
Rectangle(35, 35, 20, 30)]
And of course you can (and should) go further:
average_rects
could return a tuple:Rectangle(x, y, w, h), len(rects)
- The assertions could use
cluster1.x
,.y
,.width
, instead of cryptic[0]
,[1]
, ...
You can replace while len(rects) > 0:
with simply while rects:
, because in Python empty sequences are false. This is recommended on PEP8 too.
-
\$\begingroup\$ I stated with a Rect class but moved to tuples because performance is critical - the data sets can get large. It was my understanding that an object creation and object member lookup was slower than tuple creation and accessing indexed tuple elements. If this is not the case (or is not significant) then I would prefer to go back to the Class for readability. \$\endgroup\$Lars Laakes– Lars Laakes2014年09月07日 13:40:13 +00:00Commented Sep 7, 2014 at 13:40
-
\$\begingroup\$ If performance is critical, tuples should be faster. My suggestions concern mostly readability. I updated now to make this point clear. \$\endgroup\$janos– janos2014年09月07日 13:52:13 +00:00Commented Sep 7, 2014 at 13:52
-
\$\begingroup\$ Thanks, and you are right, "To improve performance,"... I am wait for another review with some clever numpy trick :) added numpy tag \$\endgroup\$Lars Laakes– Lars Laakes2014年09月07日 13:55:21 +00:00Commented Sep 7, 2014 at 13:55
-
\$\begingroup\$ Also update the
while len(rects) > 0:
every bit of code simplification counts. \$\endgroup\$Lars Laakes– Lars Laakes2014年09月07日 13:59:58 +00:00Commented Sep 7, 2014 at 13:59 -
1\$\begingroup\$ You're not supposed to edit your original code, as that invalidates reviews \$\endgroup\$janos– janos2014年09月07日 14:05:46 +00:00Commented Sep 7, 2014 at 14:05