Perhaps the most common use of nearest-neighbor search is nearest-neighbor classification. It turns out, however, that solving this problem need not be as hard as actually finding the k nearest neighbors and then taking the majority label. To determine the correct class label for a query point, one need only determine whether more of its k nearest neighbors belong to the positive class or to the negative class.
We demonstrate for the first time that this can be achieved exactly using
bounds on distances, in less time than it takes to actually identify
the
k points in question using the best state-of-the-art
algorithms. We show, for example, that this makes efficient search possible for a dataset with over 1 million dimensions and a few hundred thousand points, something that cannot be achieved with full
nearest-neighbor search. (Liu, Moore, and Gray, New Algorithms for Efficient High-dimensional Nonparametric Classification, NIPS 2003.) A journal version has been accepted to JMLR but hasn't appeared yet. Ting Liu has also developed a multi-class generalization of this algorithm in a separate paper.
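To make the decision question concrete, here is a small illustrative sketch, not the paper's algorithm: for odd k and distinct distances, the k nearest neighbors of a query contain a positive majority exactly when the ((k+1)/2)-th nearest positive point is closer to the query than the ((k+1)/2)-th nearest negative point. The Python code below checks that condition by brute force; the function and variable names (knn_majority_is_positive, X_pos, X_neg) are illustrative only, and the point of the paper's approach is that this kind of comparison can be resolved from bounds on distances rather than from the exhaustive distance computations done here.

import numpy as np

def knn_majority_is_positive(X_pos, X_neg, q, k):
    """Decide whether the k nearest neighbors of q are mostly positive,
    without identifying which k points those are.

    Illustrative sketch only: assumes odd k, distinct distances, and at
    least k points in total. Positives form the majority iff the m-th
    nearest positive point is closer to q than the m-th nearest negative
    point, where m = (k + 1) // 2.
    """
    assert k % 2 == 1, "odd k keeps the vote from tying"
    m = (k + 1) // 2  # number of positives needed for a majority

    d_pos = np.linalg.norm(X_pos - q, axis=1)
    d_neg = np.linalg.norm(X_neg - q, axis=1)

    # np.partition finds the m-th smallest distance without a full sort;
    # a class with fewer than m points cannot supply the majority.
    mth_pos = np.partition(d_pos, m - 1)[m - 1] if d_pos.size >= m else np.inf
    mth_neg = np.partition(d_neg, m - 1)[m - 1] if d_neg.size >= m else np.inf

    return mth_pos < mth_neg

# Sanity check against an explicit k-NN majority vote on synthetic data.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+1.0, size=(500, 20))
X_neg = rng.normal(loc=-1.0, size=(500, 20))
q = rng.normal(size=20)
k = 9
d_all = np.concatenate([np.linalg.norm(X_pos - q, axis=1),
                        np.linalg.norm(X_neg - q, axis=1)])
labels = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
vote_positive = labels[np.argsort(d_all)[:k]].sum() > k / 2
print(knn_majority_is_positive(X_pos, X_neg, q, k), vote_positive)  # should agree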