Optimize finding index of nearest point in 2d arrays

Question 1

2d NumPy array x_array contains positional information in x-direction, y_array positions in y-direction. I then have a list of x,y points. For each point in the list I find the array index of the location closest to that point, based on this code:

import time
import numpy
def find_index_of_nearest_xy(y_array, x_array, y_point, x_point):
 distance = (y_array-y_point)**2 + (x_array-x_point)**2
 idy,idx = numpy.where(distance==distance.min())
 return idy[0],idx[0]
def do_all(y_array, x_array, points):
 store = []
 for i in xrange(points.shape[1]):
 store.append(find_index_of_nearest_xy(y_array,x_array,points[0,i],points[1,i]))
 return store
# Create some dummy data
y_array = numpy.random.random(10000).reshape(100,100)
x_array = numpy.random.random(10000).reshape(100,100)
points = numpy.random.random(10000).reshape(2,5000)
# Time how long it takes to run
start = time.time()
results = do_all(y_array, x_array, points)
end = time.time()
print 'Completed in: ',end-start

I want to speed it up.

Question 2

Just a guess but maybe a k-d tree would help. I don't know if Python has an implementation.

Question 3

No need to create a list and to transpose 'points'. Use an array instead and ravel the indexes.

Question 4

From the docs re KDTree re cKDTree:

cKDTree is functionally identical to KDTree. Prior to SciPy v1.6.0, cKDTree had better performance and slightly different functionality but now the two names exist only for backward-compatibility reasons. If compatibility with SciPy < 1.6 is not a concern, prefer KDTree.

Question 5

@PeteW Do not include solution to question please (post a separate answer instead).

Question 6

Here is a scipy.spatial.KDTree example

In [1]: from scipy import spatial
In [2]: import numpy as np
In [3]: A = np.random.random((10,2))*100
In [4]: A
Out[4]:
array([[ 68.83402637, 38.07632221],
 [ 76.84704074, 24.9395109 ],
 [ 16.26715795, 98.52763827],
 [ 70.99411985, 67.31740151],
 [ 71.72452181, 24.13516764],
 [ 17.22707611, 20.65425362],
 [ 43.85122458, 21.50624882],
 [ 76.71987125, 44.95031274],
 [ 63.77341073, 78.87417774],
 [ 8.45828909, 30.18426696]])
In [5]: pt = [6, 30] # <-- the point to find
In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([ 8.45828909, 30.18426696])
#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)
In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393
In [9]: index # <-- The locations of the neighbors
Out[9]: 9
#then 
In [10]: A[index]
Out[10]: array([ 8.45828909, 30.18426696])

Question 7

scipy.spatial also has a k-d tree implementation: scipy.spatial.KDTree.

The approach is generally to first use the point data to build up a k-d tree. The computational complexity of that is on the order of N log N, where N is the number of data points. Range queries and nearest neighbour searches can then be done with log N complexity. This is much more efficient than simply cycling through all points (complexity N).

Thus, if you have repeated range or nearest neighbor queries, a k-d tree is highly recommended.

Question 8

OK, using scipy.spatial.cKDTree seems to be the way to go. Testing with my test data showed that the standard scipy.spatial.KDTree doesn't give much/any improvement over my naive solution.

Question 9

If you can massage your data into the right format, a fast way to go is to use the methods in scipy.spatial.distance:

http://docs.scipy.org/doc/scipy/reference/spatial.distance.html

In particular pdist and cdist provide fast ways to calculate pairwise distances.

Question 10

I call that massaging too, it pretty much describes what we do with data. :D

Question 11

Scipy.spatil.distance is great tool but be aware that if you have lots of distances to calculate cKdtree is so much faster than cdist.

Question 12

If i am not misunderstood, using cdist() or other Numpy method is shown in this answer codereview.stackexchange.com/a/134918/156228

Question 13

Search methods have two phases:

build a search structure, e.g. a KDTree, from the npt data points (your x y)
lookup nq query points.

Different methods have different build times, and different query times. Your choice will depend a lot on npt and nq:
scipy cdist has build time 0, but query time ~ npt * nq.
KDTree build times are complicated, lookups are very fast, ~ ln npt * nq.

On a regular (Manhatten) grid, you can do much better: see (ahem) find-nearest-value-in-numpy-array.

A little testbench: : building a KDTree of 5000 ×ばつ 5000 2d points takes about 30 seconds, then queries take microseconds; scipy cdist 25 million ×ばつ 20 points (all pairs, 4G) takes about 5 seconds, on my old iMac.

Question 14

I have been trying to follow along with this, but new to Jupyter Notebooks, Python and the various tools being discussed here, but I have managed to get some way down the road I'm travelling.

BURoute = pd.read_csv('C:/Users/andre/BUKP_1m.csv', header=None)
NGEPRoute = pd.read_csv('c:/Users/andre/N1-06.csv', header=None)

I create a combined XY array from my BURoute dataframe

combined_x_y_arrays = BURoute.iloc[:,[0,1]]

And I create the points with the following command

points = NGEPRoute.iloc[:,[0,1]]

I then do the KDTree magic

def do_kdtree(combined_x_y_arrays, points): 
 mytree = scipy.spatial.cKDTree(combined_x_y_arrays)
 dist, indexes = mytree.query(points)
 return indexes
results2 = do_kdtree(combined_x_y_arrays, points)

This gives me an array of the indexes. I'm now trying to figure out how to calculate the distance between the points and the indexed points in the results array.

Question 15

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

Question 16

def find_nearest_vector(self,arrList, value):
 
 y,x = value
 offset =10
 
 x_Array=[]
 y_Array=[]
 for p in arrList:
 x_Array.append(p[1])
 y_Array.append(p[0])
 
 x_Array=np.array(x_Array)
 y_Array=np.array(y_Array)
 difference_array_x = np.absolute(x_Array-x)
 difference_array_y = np.absolute(y_Array-y)
 index_x = np.where(difference_array_x<offset)[0]
 index_y = np.where(difference_array_y<offset)[0]
 index = np.intersect1d(index_x, index_y, assume_unique=True)
 nearestCootdinate = (arrList[index][0][0],arrList[index][0][1])
 
 return nearestCootdinate

Question 17

An answer with code only, no explanation, is generally considered as not useful. What is the algorithm? Its efficiency?

Question 18

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.

efirvida efirvida 4,8855 gold badges50 silver badges76 bronze badges · Accepted Answer · 2015-09-25 11:59:48Z

Here is a scipy.spatial.KDTree example

In [1]: from scipy import spatial
In [2]: import numpy as np
In [3]: A = np.random.random((10,2))*100
In [4]: A
Out[4]:
array([[ 68.83402637, 38.07632221],
 [ 76.84704074, 24.9395109 ],
 [ 16.26715795, 98.52763827],
 [ 70.99411985, 67.31740151],
 [ 71.72452181, 24.13516764],
 [ 17.22707611, 20.65425362],
 [ 43.85122458, 21.50624882],
 [ 76.71987125, 44.95031274],
 [ 63.77341073, 78.87417774],
 [ 8.45828909, 30.18426696]])
In [5]: pt = [6, 30] # <-- the point to find
In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([ 8.45828909, 30.18426696])
#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)
In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393
In [9]: index # <-- The locations of the neighbors
Out[9]: 9
#then 
In [10]: A[index]
Out[10]: array([ 8.45828909, 30.18426696])

CollectivesTM on Stack Overflow

Optimize finding index of nearest point in 2d arrays

6 Answers 6

Comments

1 Comment

3 Comments

Comments

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

6 Answers 6

Comments

1 Comment

3 Comments

Comments

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related