Cover Tree for Nearest Neighbor calculations
Alina Beygelzimer,
Sham Kakade, and
John Langford,
Cover Trees for Nearest Neighbor, ICML 2006.
Video
A
longer version and
experimental results
addendum
Thomas Kollar found a
small
bug in the insert algorithm description. This doesn't appear in
the code because the code uses a batch insert implementation.
A Cover Tree is a datastructure helpful in calculating the nearest
neighbor of points given only a metric. A cover tree is particularly
motivating for a confluence of reasons:
- The running time of a nearest neighbor query is only O(log(n)) given a fixed intrinsic dimensionality. (like KR2002 and KL04)
- The space usage and query time are O(n) under no assumptions. (like the naive approach, sb(s), and ball trees)
- It's remarkably fast in practice.
code (v4) (Under LGPL/GPL license),
templated code (v2),
datasets,
and
sparse datasets (This is version 2, the templated version with both vector points and sparse vector points defined.) A
cover tree code faq. v4 fixes a subtle bug that Ryan Curtin found and has a more robust zero-distance check.
Gordon Rios
notes a few details on porting to a Mac.
Application
Dinoj Surendran has
added the cover tree to
isomap yielding speedups, as
expected. The
code.
Performance Tests
Speedup of a cover tree over an optimized brute force approach for querying 1,2,3,5, and 10 (euclidean) nearest neighbors on a collection of datasets.
Speedup of a cover tree over the sb(S) datastructure for querying 1,2 (euclidean) nearest neighbors
Speedup of a cover tree over the sb(S) datastructure for querying 1,2 (string) nearest neighbors