It makes no sense to save the distances to disk (disk I/O is slower than the computation), but it might make sense to "memoize" your function:
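```python
# Cache keyed by the (p1, p2) pair; points must be hashable, e.g. tuples.
distance2_cache = {}

def distance2(p1, p2):
    "Compute the distance squared, using cache."
    try:
        return distance2_cache[(p1, p2)]
    except KeyError:
        # Not cached yet: compute the squared distance, store it, return it.
        distance2_cache[(p1, p2)] = d2 = (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2
        return d2
```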
Note that this might actually be slower than your original function, because a dictionary lookup may be more expensive than two multiplications (which dwarf the two subtractions and one addition).
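If you prefer not to manage the dictionary yourself, Python's standard library offers `functools.lru_cache`, which memoizes a function by its (hashable) arguments. This is a swapped-in alternative, not the hand-rolled cache above, and a minimal sketch assuming points are passed as tuples; the same caveat about lookup cost applies, so it is worth timing both versions on your own data.

```python
from functools import lru_cache

# maxsize=None makes the cache unbounded, like the plain dict version;
# the arguments must be hashable, e.g. (x, y) tuples.
@lru_cache(maxsize=None)
def distance2(p1, p2):
    "Compute the distance squared, cached by functools.lru_cache."
    return (p1[0] - p2[0])**2 + (p1[1] - p2[1])**2

print(distance2((0, 0), (3, 4)))    # 25 (computed)
print(distance2((0, 0), (3, 4)))    # 25 (served from the cache)
print(distance2.cache_info().hits)  # 1
```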
Edit: envelopes
If your pool of points is huge and you have a stream of points that have to be matched against the pool (i.e., for each input point, find the pool element closest to it), you can use lattices/envelopes: suppose your pool points have coordinates from 0 to 1. You can split them into 100 boxes by splitting each coordinate range into 10 equal intervals. E.g., box number 35 (first digit for x, second for y, both counted from 1) will have 0.2<=x<0.3 and 0.4<=y<0.5. Then for each new point you only need to check "very few" boxes (instead of all 100).
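Assigning a point to its box is just a scaling and truncation of each coordinate. A minimal sketch, assuming the pool is a list of `(x, y)` tuples in the unit square; `box_of` and `build_grid` are hypothetical helper names, and boxes are numbered from 0 here, so the box 35 described above is `(2, 4)`.

```python
from collections import defaultdict

def box_of(point, k=10):
    """Return the (column, row) box of a point on a k-by-k grid.

    Assumes 0 <= x <= 1 and 0 <= y <= 1.
    """
    x, y = point
    # min() keeps coordinates equal to 1.0 inside the last box.
    return (min(int(x * k), k - 1), min(int(y * k), k - 1))

def build_grid(pool, k=10):
    """Group the pool points by the box they fall into."""
    grid = defaultdict(list)
    for p in pool:
        grid[box_of(p, k)].append(p)
    return grid

print(box_of((0.25, 0.45)))  # (2, 4): 0.2<=x<0.3 and 0.4<=y<0.5
```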
Specifically, you first find the closest point in the box where the new point landed, and then compare the distance to that closest point with the distance from the new point to the boundary of the box.
If the closest point is closer than the boundary, you are done.
Otherwise, you also need to check those neighboring boxes that are closer than the closest point found so far (up to 11 of them, if the target point and its closest point are near opposite corners of the box). However, this should rarely happen if each box contains a "large" number of points.
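The search can be sketched as follows. This is a slightly simpler variant of the rule just described, not the answer's exact procedure: instead of picking out individual neighboring boxes, it scans whole rings of boxes outward from the landing box and stops as soon as the best distance is no larger than the distance from the query point to the edge of the block scanned so far (for ring 0 that is exactly the comparison with the landing box's boundary). It also copes with an empty landing box. The helper names are hypothetical and the pool is assumed to be `(x, y)` tuples in the unit square.

```python
import math
import random
from collections import defaultdict

def box_of(p, k):
    """(column, row) of the box holding p on a k-by-k grid over the unit square."""
    return (min(int(p[0] * k), k - 1), min(int(p[1] * k), k - 1))

def build_grid(pool, k):
    """Map each box to the list of pool points inside it."""
    grid = defaultdict(list)
    for p in pool:
        grid[box_of(p, k)].append(p)
    return grid

def nearest(q, grid, k):
    """Nearest pool point to q, scanning rings of boxes outward from q's box."""
    cx, cy = box_of(q, k)
    h = 1.0 / k                                 # side length of one box
    best, best_d2 = None, math.inf
    for r in range(k):                          # ring r: Chebyshev distance r from q's box
        for i in range(cx - r, cx + r + 1):
            for j in range(cy - r, cy + r + 1):
                if max(abs(i - cx), abs(j - cy)) != r:
                    continue                    # inner boxes were scanned in earlier rings
                for p in grid.get((i, j), ()):  # .get does not insert empty boxes
                    d2 = (p[0] - q[0])**2 + (p[1] - q[1])**2
                    if d2 < best_d2:
                        best, best_d2 = p, d2
        # Distance from q to the edge of the block of boxes scanned so far;
        # for r == 0 this is the distance to the boundary of q's own box.
        margin = max(0.0, min(q[0] - (cx - r) * h, (cx + r + 1) * h - q[0],
                              q[1] - (cy - r) * h, (cy + r + 1) * h - q[1]))
        if best_d2 <= margin * margin:
            break                               # no unscanned box can hold a closer point
    return best

random.seed(0)
pool = [(random.random(), random.random()) for _ in range(10_000)]
grid = build_grid(pool, 10)
print(nearest((0.25, 0.45), grid, 10))
```

Scanning full rings may look at a few more boxes than strictly necessary, but it keeps the stopping test to a single comparison per ring.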
A good rule of thumb is that each box should contain as many points as there are boxes, e.g., if you have 10,000 points, there should be 100 boxes.
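The rule of thumb amounts to choosing about √N boxes for N points, so that each box holds about √N points; on a square grid that is roughly N^(1/4) boxes per side. A small helper illustrating the arithmetic; `grid_side` is a hypothetical name, and rounding to a whole number of boxes per side is an assumption.

```python
import math

def grid_side(n_points):
    """Boxes per side so that (number of boxes) ~= (points per box)."""
    n_boxes = math.sqrt(n_points)              # B such that N / B == B
    return max(1, round(math.sqrt(n_boxes)))   # k such that k * k ~= B

k = grid_side(10_000)
print(k, k * k, 10_000 // (k * k))  # 10 100 100
```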
PS. Developing this approach further leads to the k-d tree.
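For reference, here is a bare-bones 2-d tree (a k-d tree with k = 2) in plain Python, with a nearest-neighbor query that prunes a subtree using the same kind of boundary test as the boxes: the far side of a splitting line is visited only if the line is closer than the best point found so far. A sketch only: the helper names and the tuple-based node layout are assumptions, it re-sorts at every level for simplicity, and for real workloads a tested library implementation such as `scipy.spatial.KDTree` is likely the better choice.

```python
import math

def build_kdtree(points, depth=0):
    """Build a 2-d tree as nested tuples (point, left, right, axis)."""
    if not points:
        return None
    axis = depth % 2                          # alternate splitting on x and y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                    # the median point becomes this node
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1),
            axis)

def kd_nearest(node, q, best=None, best_d2=math.inf):
    """Return (point, squared distance) of the tree point nearest to q."""
    if node is None:
        return best, best_d2
    point, left, right, axis = node
    d2 = (point[0] - q[0])**2 + (point[1] - q[1])**2
    if d2 < best_d2:
        best, best_d2 = point, d2
    diff = q[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best, best_d2 = kd_nearest(near, q, best, best_d2)
    # Visit the far side only if the splitting line is closer than the best so far.
    if diff * diff < best_d2:
        best, best_d2 = kd_nearest(far, q, best, best_d2)
    return best, best_d2

pool = [(0.1, 0.2), (0.8, 0.9), (0.4, 0.5), (0.7, 0.1)]
tree = build_kdtree(pool)
print(kd_nearest(tree, (0.45, 0.45))[0])  # (0.4, 0.5)
```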