Scalable Distributed Algorithm for Approximate Nearest Neighbor Search Problem in High Dimensional General Metric Spaces

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7404)

Abstract

We propose a novel approach to the approximate nearest neighbor search problem in arbitrary metric spaces. Its distinctive feature is that a non-hierarchical distributed structure can be built incrementally for given metric space data, with complexity that scales logarithmically with the size of the structure and with probabilistic nearest neighbor queries of adjustable accuracy. The structure is based on a small world graph whose vertices correspond to the stored elements and whose edges are links between them; the greedy algorithm serves as the base search algorithm. Both the search and the addition algorithms require only local information from the structure. Simulations on data in Euclidean space show that the structure built using the proposed algorithm has navigable small world properties, with logarithmic search complexity at fixed accuracy, and scales weakly (as a power law) with the dimensionality of the stored data.
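
The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea: greedy search over a graph of stored elements, plus incremental insertion that uses nothing but local information. The class name `SmallWorldGraph`, the parameters `k` and `attempts`, the random choice of entry points, and the rule of linking a new element to the closest candidates found by a few greedy searches are all assumptions made for illustration; they are not taken from the paper.

```python
import random
from typing import Any, Callable, List, Optional, Set


class SmallWorldGraph:
    """Illustrative sketch only; names and linking rule are assumptions."""

    def __init__(self, dist: Callable[[Any, Any], float], seed: int = 0) -> None:
        self.dist = dist                    # any metric on the stored elements
        self.items: List[Any] = []          # vertex id -> stored element
        self.links: List[Set[int]] = []     # vertex id -> neighbor ids
        self.rng = random.Random(seed)

    def greedy_search(self, query: Any, start: int) -> int:
        """Walk to the neighbor closest to the query until none is closer."""
        current = start
        current_d = self.dist(query, self.items[current])
        while True:
            best, best_d = current, current_d
            for nb in self.links[current]:  # only local information is read
                d = self.dist(query, self.items[nb])
                if d < best_d:
                    best, best_d = nb, d
            if best == current:             # local minimum reached
                return current
            current, current_d = best, best_d

    def search(self, query: Any, attempts: int = 3) -> Optional[int]:
        """Repeat greedy search from random entry points; more attempts
        trade extra work for a higher chance of the true nearest neighbor."""
        if not self.items:
            return None
        found = {
            self.greedy_search(query, self.rng.randrange(len(self.items)))
            for _ in range(attempts)
        }
        return min(found, key=lambda i: self.dist(query, self.items[i]))

    def add(self, element: Any, k: int = 3, attempts: int = 3) -> int:
        """Insert an element and link it to the k closest candidates
        discovered by greedy searches (assumed linking rule)."""
        candidates: Set[int] = set()
        if self.items:
            for _ in range(attempts):
                start = self.rng.randrange(len(self.items))
                local_min = self.greedy_search(element, start)
                candidates.add(local_min)
                candidates.update(self.links[local_min])
        new_id = len(self.items)
        self.items.append(element)
        self.links.append(set())
        nearest = sorted(
            candidates, key=lambda i: self.dist(element, self.items[i])
        )[:k]
        for i in nearest:                   # links are kept bidirectional
            self.links[new_id].add(i)
            self.links[i].add(new_id)
        return new_id


# Usage: one-dimensional points with absolute difference as the metric.
graph = SmallWorldGraph(lambda a, b: abs(a - b), seed=1)
for value in [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0]:
    graph.add(value)
hit = graph.search(6.9, attempts=4)
print(graph.items[hit])
```

In this sketch the `attempts` parameter plays the role of the adjustable accuracy mentioned in the abstract: a single greedy walk can stop at a local minimum that is not the true nearest neighbor, and repeating it from different entry points lowers that probability at the cost of more distance computations.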

Author information

Authors and Affiliations

  1. MERA Labs LLC, Nizhny Novgorod, Russia

    Yury Malkov, Alexander Ponomarenko, Andrey Logvinov & Vladimir Krylov

Editor information

Editors and Affiliations

  1. Department of Computer Science, University of Chile, Chile

    Gonzalo Navarro

  2. Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, K1N 6N5, Ottawa, ON, Canada

    Vladimir Pestov

About this paper

Cite this paper

Malkov, Y., Ponomarenko, A., Logvinov, A., Krylov, V. (2012). Scalable Distributed Algorithm for Approximate Nearest Neighbor Search Problem in High Dimensional General Metric Spaces. In: Navarro, G., Pestov, V. (eds) Similarity Search and Applications. SISAP 2012. Lecture Notes in Computer Science, vol 7404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32153-5_10

  • DOI: https://doi.org/10.1007/978-3-642-32153-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32152-8

  • Online ISBN: 978-3-642-32153-5

  • eBook Packages: Computer Science, Computer Science (R0)
