Learning to Rank was the first integration of machine learning techniques into Apache Solr, allowing you to improve the ranking of your search results using training data.
One limitation is that documents must contain the keywords the user typed in the search box in order to be retrieved (and then reranked).
For example, the query “jaguar” won’t retrieve documents containing only the terms “panthera onca”.
This is called the vocabulary mismatch problem.
Neural search is an Artificial Intelligence technique that allows a search engine to reach documents that are semantically similar to the user's information need, even when they don't contain the query terms. It learns the similarity of terms and sentences in your collection through deep neural networks and numerical vector representations (so no manual synonyms are needed!).
This talk explores the first official Apache Solr contribution on this topic, available from Apache Solr 9.0.
We start with an overview of neural search (Don’t worry – we keep it simple!): we describe vector representations for queries and documents, and how Approximate K-Nearest Neighbor (KNN) vector search works.
We show how neural search can be used along with deep learning techniques (e.g., BERT) or directly on vector data, and how you can run these kinds of searches with Apache Solr!
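As a toy illustration of the core idea, here is a minimal sketch of KNN vector search: the vectors below are hand-crafted (a real system would produce them with a neural encoder such as BERT), and it uses exact brute-force search rather than an approximate index like HNSW.

```python
import math

# Toy 3-dimensional embeddings, hand-crafted for illustration only;
# in practice a neural model maps each document's text to a vector.
docs = {
    "doc1": [0.9, 0.1, 0.0],   # about "panthera onca"
    "doc2": [0.8, 0.2, 0.1],   # about the jaguar (the animal)
    "doc3": [0.0, 0.1, 0.9],   # about cars
}

def cosine(a, b):
    # Cosine similarity: dot product of the vectors over the
    # product of their lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def knn(query_vector, k=2):
    # Exact (brute-force) k-nearest neighbors by cosine similarity.
    # ANN structures such as HNSW approximate this ranking at scale.
    ranked = sorted(docs, key=lambda d: cosine(query_vector, docs[d]),
                    reverse=True)
    return ranked[:k]

query = [0.85, 0.15, 0.05]  # pretend embedding of the query "jaguar"
print(knn(query))
```

Note that the "panthera onca" document is retrieved for the "jaguar" query even though they share no terms: the match happens entirely in vector space.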
You will learn:
– how Approximate Nearest Neighbor (ANN) approaches work, with a focus on Hierarchical Navigable Small World Graph (HNSW)
– how the Apache Lucene implementation works
– how the Apache Solr implementation works, including the new field type and query parser introduced
– how to run KNN queries and how to use them to rerank a first-stage pass
– how to generate vectors from text and integrate large language models with Apache Solr
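To give a flavor of what this looks like in practice, here is a sketch of the Solr 9 field type and query (field names, dimensions, and vector values are illustrative; see the Solr Reference Guide for the exact syntax):

```xml
<!-- schema: a dense vector field; vectorDimension and
     similarityFunction here are example values -->
<fieldType name="knn_vector" class="solr.DenseVectorField"
           vectorDimension="4" similarityFunction="cosine"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>
```

A KNN query then passes the query embedding to the knn query parser:

```
q={!knn f=vector topK=10}[0.12, -0.3, 0.9, 0.4]
```

and the same parser can be plugged into Solr's reranking machinery to rescore the top results of a keyword-based first pass.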
Join us to explore this exciting new Apache Solr feature and learn how you can leverage it to improve your search experience!