1
$\begingroup$

I need to write a program that does the following:

  • Take an input list of objects whose properties include latitude and longitude to, say, 5 decimal places
  • Store them in a data structure once
  • Provide a "nearby" lookup function that can efficiently return the N closest objects for a given lat/long

Currently, I'm doing the following, which is suboptimal:

  • Store all objects in a hash, with array keys like [integer_latitude, integer_longitude]
  • At search time, find all objects in an arbitrary-sized circle around the target. Eg, if the search is at [0,0], I can get all objects within 1 degree by pulling [-1,0], then [0,0], then [1,0], then [0,-1], etc.
  • Order the found objects by actual distance to the target and take the top N

This is obviously inefficient, because often there are many more matches than N.

One improvement could be to examine locations in concentric squares outward from the center: all points 0 degrees from the center, all points 1 degree from the center, 2 degrees, etc, and stop after the first square when I have at least the number of objects needed. Then I could sort those by actual, fine-grained distance and take the top N.

Is there some well-established way of doing this search efficiently?

asked Mar 2, 2015 at 22:32
$\endgroup$
9
  • $\begingroup$ Another idea would be to use a tree of geohashes, where the geohash gets longer as you go toward the leaves. You could move to a parent to find "near" objects. But I think this fails when two objects are very close together but just opposite a boundary like the equator. $\endgroup$ Commented Mar 2, 2015 at 22:33
  • 3
    $\begingroup$ Have you read this: en.wikipedia.org/wiki/Nearest_neighbor_search ? $\endgroup$ Commented Mar 2, 2015 at 22:47
  • $\begingroup$ How about dictionary? Or 4 directional linked lists? $\endgroup$ Commented Mar 3, 2015 at 4:37
  • 1
    $\begingroup$ Closely related question; the comment applies. $\endgroup$ Commented Mar 3, 2015 at 7:37
  • 2
    $\begingroup$ I think quadtrees might give you a first approximation. They are usually good to explore some part of the plane, as a 2-dimensional counterpart to binary-trees. But I am no expert on that. $\endgroup$ Commented Mar 3, 2015 at 8:28

3 Answers 3

1
$\begingroup$

Yes, you could you could do better than a naive approach - you could use a k-d tree data structure.

Clever implementations, such as sklearn k-d tree do the nearest neighbour lookup in O(log(n)) which, unless you have a high dimensional and sparse dataset, will work significantly faster than a naive O(n) approach.

answered Jul 17, 2018 at 23:37
$\endgroup$
0
$\begingroup$

As well as k-d trees, it's worth knowing about R-trees, as these are arguably the most popular data structure for indexing in the GIS world.

R-trees are to k-d trees as B-trees are to binary search trees. Like B-trees, R-trees tend to be very cache efficient, which makes them extremely useful for on-disk data structures as you tend to find in databases.

answered Jul 18, 2018 at 3:08
$\endgroup$
0
$\begingroup$

There are obviously many different data structures out there - you could also use a cover tree

answered Jul 18, 2018 at 3:08
$\endgroup$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.