The Open Distro for Elasticsearch k-NN plugin enables users to run nearest neighbor search on billions of documents across thousands of dimensions with the same ease as running any regular Elasticsearch query. Use aggregations and filter clauses to further refine your similarity search operations. Power use cases such as product recommendations, fraud detection, image and video search, related document search, and more.
The k-NN plugin leverages the lightweight open source Non-Metric Space Library (NMSLIB), which implements approximate k-NN search based on Hierarchical Navigable Small World (HNSW) graphs. NMSLIB is a highly efficient implementation of k-NN that has consistently outperformed most other solutions in the published ANN-Benchmarks. NMSLIB can also be easily extended with new search methods and distance functions.
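For readers curious about what HNSW search looks like at the library level, here is a minimal sketch that builds a small index directly with NMSLIB's Python bindings. The dataset, HNSW parameters, and k value are illustrative; the plugin performs the equivalent work for you inside Elasticsearch.

```python
# Illustrative only: a small HNSW index built directly with NMSLIB's Python
# bindings. The k-NN plugin manages this inside Elasticsearch, so this sketch
# just shows the underlying primitive.
import nmslib
import numpy as np

data = np.random.rand(1000, 128).astype(np.float32)   # 1,000 vectors, 128 dimensions

index = nmslib.init(method="hnsw", space="l2")          # HNSW graph, L2 distance
index.addDataPointBatch(data)
index.createIndex({"M": 16, "efConstruction": 100})     # typical HNSW build parameters

query = np.random.rand(128).astype(np.float32)
ids, distances = index.knnQuery(query, k=10)            # 10 approximate nearest neighbors
print(ids, distances)
```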
The plugin extends the Lucene codec to introduce a separate file format for storing and retrieving k-NN indices, delivering highly efficient k-NN search operations on Elasticsearch. Datasets in k-NN are vectors, represented in Elasticsearch fields by a new datatype called knn_vector. Each knn_vector field holds a single list of up to 10,000 floats.
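As a concrete illustration, the sketch below creates an index with a knn_vector field and indexes one document using the official Python client. The index name, field names, and dimension are placeholders, and a local cluster with the k-NN plugin installed is assumed; the index.knn setting and dimension parameter follow the plugin's documented mapping conventions.

```python
# A minimal sketch: create an index with a knn_vector field and index one
# document. Index and field names are placeholders; a local cluster with the
# k-NN plugin installed is assumed.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="products",
    body={
        "settings": {"index.knn": True},          # enable k-NN for this index
        "mappings": {
            "properties": {
                "item_vector": {"type": "knn_vector", "dimension": 4},
                "price": {"type": "float"},
            }
        },
    },
)

# The vector is stored as a plain list of floats alongside regular fields.
es.index(index="products", body={"item_vector": [1.5, 2.5, 3.5, 4.5], "price": 9.99})
```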
k-NN functionality integrates seamlessly with other Elasticsearch features. This gives users the flexibility to combine Elasticsearch’s extensive search features, such as aggregations and filtering, with k-NN to further slice and dice the data and increase the precision of their searches, as sketched below.
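For example, the following sketch runs an approximate k-NN query over the placeholder index created above, then applies a price filter and an aggregation to the nearest neighbors; the query vector, filter values, and field names are illustrative.

```python
# A sketch combining an approximate k-NN query with a post_filter and an
# aggregation; field names and values match the placeholder index above.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="products",
    body={
        "size": 5,
        "query": {
            "knn": {
                "item_vector": {"vector": [1.0, 2.0, 3.0, 4.0], "k": 5}
            }
        },
        # Narrow the nearest neighbors to a price range after the vector search.
        "post_filter": {"range": {"price": {"lte": 20.0}}},
        # Summarize the matched neighbors.
        "aggs": {"avg_price": {"avg": {"field": "price"}}},
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"])
print("avg price:", response["aggregations"]["avg_price"]["value"])
```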
By leveraging Elasticsearch’s distributed architecture, the k-NN plugin can ingest large datasets, support incremental updates, and efficiently serve vectors produced by machine learning models, delivering a highly performant and scalable similarity search engine.