This is a continued discussion from here, and I use new a approach (bounding box/or envelop algorithm) to improve the algorithm efficiency. Any code bugs, algorithm time complexity improvement or general code advice is highly appreciated.
Major structure of my code:
assign_box
is used to build the bounding box or envelopfind_min
is finding min points among all candidate points for a given target point. It first uses a bounding box/envelop solution. If it cannot find an answer, it will use a brute force solutionbruce_force
is a brute force solution which is used to check the correctness offind_min
distance
is a utility method used to check the distance between two points
import sys
import random
from collections import defaultdict
def assign_box(box_map, points, k):
x_min = min([p[0] for p in points])
x_max = max([p[0] for p in points])
y_min = min([p[1] for p in points])
y_max = max([p[1] for p in points])
x_distance = (x_max - x_min) / k
y_distance = (y_max - y_min) / k
for p in points:
i = (p[0]-x_min)/x_distance
j = (p[1]-y_min)/y_distance
box_map[(i,j)].append(p)
return (x_min, x_distance, y_min, y_distance)
def distance(p1, p2):
return (p1[0]-p2[0]) ** 2 + (p1[1]-p2[1]) ** 2
def find_min(points, box_map, target_point, x_min, x_distance, y_min, y_distance):
# try 9 neighbour first
row = (target_point[0] - x_min) / x_distance
col = (target_point[1] - y_min) / y_distance
result = sys.maxint
result_point = None
for r in range(row-1, row+2):
for c in range(col-1, col+2):
if (r,c) in box_map:
for p in box_map[(r,c)]:
d = distance(p, target_point)
if d < result:
result = d
result_point = p
if result_point:
return (result, result_point)
else:
# try brute force solution
result = sys.maxint
result_point = None
for p in points:
d = distance(p, target_point)
if d < result:
result = d
result_point = p
return (result, result_point)
def bruce_force(points, target_point):
result = sys.maxint
result_point = None
for p in points:
d = distance(p, target_point)
if d < result:
result = d
result_point = p
return (result, result_point)
if __name__ == "__main__":
# key is a tuple, (box row index, box col index)
# value is a list of points fall into it
box_map = defaultdict(list)
points = []
for i in range(100):
points.append((random.randint(0,10), random.randint(0,10)))
x_min, x_distance, y_min, y_distance = assign_box(box_map, points, 5)
target_point = (random.randint(0,10), random.randint(0,10))
print 'target point, ', target_point
print 'find min result, ', find_min(points, box_map, target_point, x_min, x_distance, y_min, y_distance)
print 'brute force result, ', bruce_force(points, target_point)
1 Answer 1
Mostly this looks good.
- assign_box is used to build the bounding box or envelope
I benchmarked the speed of assign_box
. Please see https://github.com/jhanley634/testing-tools/blob/master/problem/bench/min_of_copied_list.py#L104..L109 . It turns out one can compute a pair of min's / max's slightly faster:
0.017s scan_manually_with_if
0.019s scan_with_two_copies
0.030s assign_box
0.049s scan_with_key
0.078s scan_manually_with_min_max
The first two scan approaches are faster than your code.
- find_min
A comment of # try 8 neighbors first
would be more natural than 9
, since distance won't be less when row & column offsets are zero.
- bruce_force
Maybe this is a Bruce Lee joke. I recommend renaming it def brute_force
.
- distance is a utility method ...
I recommend renaming it def distance_squared
, as it clearly does not compute Euclidean distance. It's only used in comparisons, and N^2 is monotonic w.r.t. N, so there's no harm done. But you should say what you mean, and mean what you say.
Explore related questions
See similar questions with these tags.