Cover tree
The cover tree is a type of data structure in computer science that is specifically designed to facilitate the speed-up of a nearest neighbor search. It is a refinement of the Navigating Net data structure, and related to a variety of other data structures developed for indexing intrinsically low-dimensional data.[1]
The tree can be thought of as a hierarchy of levels with the top level containing the root point and the bottom level containing every point in the metric space. Each level C is associated with an integer value i that decrements by one as the tree is descended. Each level C in the cover tree has three important properties:
- Nesting: {\displaystyle C_{i}\subseteq C_{i-1}}
- Covering: For every point {\displaystyle p\in C_{i-1}}, there exists a point {\displaystyle q\in C_{i}} such that the distance from {\displaystyle p} to {\displaystyle q} is less than or equal to {\displaystyle 2^{i}} and exactly one such {\displaystyle q} is a parent of {\displaystyle p}.
- Separation: For all points {\displaystyle p,q\in C_{i}}, the distance from {\displaystyle p} to {\displaystyle q} is greater than {\displaystyle 2^{i}}.
Complexity
[edit ]Find
[edit ]Like other metric trees the cover tree allows for nearest neighbor searches in {\displaystyle O(\eta *\log {n})} where {\displaystyle \eta } is a constant associated with the dimensionality of the dataset and n is the cardinality. To compare, a basic linear search requires {\displaystyle O(n)}, which is a much worse dependence on {\displaystyle n}. However, in high-dimensional metric spaces the {\displaystyle \eta } constant is non-trivial, which means it cannot be ignored in complexity analysis. Unlike other metric trees, the cover tree has a theoretical bound on its constant that is based on the dataset's expansion constant or doubling constant (in the case of approximate NN retrieval). The bound on search time is {\displaystyle O(c^{12}\log {n})} where {\displaystyle c} is the expansion constant of the dataset.
Insert
[edit ]Although cover trees provide faster searches than the naive approach, this advantage must be weighed with the additional cost of maintaining the data structure. In a naive approach adding a new point to the dataset is trivial because order does not need to be preserved, but in a cover tree it can take {\displaystyle O(c^{6}\log {n})} time. However, this is an upper-bound, and some techniques have been implemented that seem to improve the performance in practice.[2]
Space
[edit ]The cover tree uses implicit representation to keep track of repeated points. Thus, it only requires O(n) space.
See also
[edit ]References
[edit ]- Notes
- ^ Kenneth Clarkson. Nearest-neighbor searching and metric space dimensions. In G. Shakhnarovich, T. Darrell, and P. Indyk, editors, Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, pages 15--59. MIT Press, 2006.
- ^ "Cover Tree".
- Bibliography
- Alina Beygelzimer, Sham Kakade, and John Langford. Cover Trees for Nearest Neighbor. In Proc. International Conference on Machine Learning (ICML), 2006.
- JL's Cover Tree page. John Langford's page links to papers and code.
- A C++ Cover Tree implementation on GitHub.
- A cover tree implementation in Java.