I need to write a program that does the following:
- Take an input list of objects whose properties include latitude and longitude to, say, 5 decimal places
- Store them in a data structure once
- Provide a "nearby" lookup function that can efficiently return the N closest objects for a given lat/long
Currently, I'm doing the following, which is suboptimal:
- Store all objects in a hash, with array keys like
[integer_latitude, integer_longitude] - At search time, find all objects in an arbitrary-sized circle around the target. Eg, if the search is at
[0,0], I can get all objects within 1 degree by pulling[-1,0], then[0,0], then[1,0], then[0,-1], etc. - Order the found objects by actual distance to the target and take the top N
This is obviously inefficient, because often there are many more matches than N.
One improvement could be to examine locations in concentric squares outward from the center: all points 0 degrees from the center, all points 1 degree from the center, 2 degrees, etc, and stop after the first square when I have at least the number of objects needed. Then I could sort those by actual, fine-grained distance and take the top N.
Is there some well-established way of doing this search efficiently?
-
$\begingroup$ Another idea would be to use a tree of geohashes, where the geohash gets longer as you go toward the leaves. You could move to a parent to find "near" objects. But I think this fails when two objects are very close together but just opposite a boundary like the equator. $\endgroup$Nathan Long– Nathan Long2015年03月02日 22:33:14 +00:00Commented Mar 2, 2015 at 22:33
-
3$\begingroup$ Have you read this: en.wikipedia.org/wiki/Nearest_neighbor_search ? $\endgroup$HEKTO– HEKTO2015年03月02日 22:47:25 +00:00Commented Mar 2, 2015 at 22:47
-
$\begingroup$ How about dictionary? Or 4 directional linked lists? $\endgroup$S.Dan– S.Dan2015年03月03日 04:37:40 +00:00Commented Mar 3, 2015 at 4:37
-
1$\begingroup$ Closely related question; the comment applies. $\endgroup$Raphael– Raphael2015年03月03日 07:37:35 +00:00Commented Mar 3, 2015 at 7:37
-
2$\begingroup$ I think quadtrees might give you a first approximation. They are usually good to explore some part of the plane, as a 2-dimensional counterpart to binary-trees. But I am no expert on that. $\endgroup$babou– babou2015年03月03日 08:28:20 +00:00Commented Mar 3, 2015 at 8:28
3 Answers 3
Yes, you could you could do better than a naive approach - you could use a k-d tree data structure.
Clever implementations, such as sklearn k-d tree do the nearest neighbour lookup in O(log(n)) which, unless you have a high dimensional and sparse dataset, will work significantly faster than a naive O(n) approach.
As well as k-d trees, it's worth knowing about R-trees, as these are arguably the most popular data structure for indexing in the GIS world.
R-trees are to k-d trees as B-trees are to binary search trees. Like B-trees, R-trees tend to be very cache efficient, which makes them extremely useful for on-disk data structures as you tend to find in databases.
There are obviously many different data structures out there - you could also use a cover tree
Explore related questions
See similar questions with these tags.