Search Solutions 2022
Search Solutions is the BCS Information Retrieval specialist group’s annual event focused on practitioner issues in the area of search and information retrieval.
Tutorials are either full-day (5-6 hours including breaks and lunch) or half-day (2-3 hours including breaks). They will take place on Tuesday 22nd November 2022 at the BCS offices in London and/or online, depending on the situation nearer the time.
Location: BCS, The Chartered Institute for IT, Ground Floor, 25 Copthall Avenue, London, EC2R 7BP
Date: 22nd November 2022
Approaching Neural Search with Apache Solr and Open-source technologies
Learning To Rank was the first integration of machine learning techniques with Apache Solr, allowing you to improve the ranking of your search results using training data.
One limitation is that documents must contain the keywords the user typed in the search box in order to be retrieved (and then reranked).
For example, the query “jaguar” won’t retrieve documents containing only the terms “panthera onca”.
This is called the vocabulary mismatch problem.
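The mismatch can be illustrated with a minimal sketch of keyword retrieval. The `keyword_search` function and the two sample documents are hypothetical, made up for this illustration; this is a naive boolean AND over whitespace tokens, not how Apache Solr analyzes or scores text.

```python
def keyword_search(query, docs):
    """Return ids of documents that contain every query term (naive boolean AND)."""
    terms = query.lower().split()
    return [
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.lower().split() for term in terms)
    ]


# Toy collection (made up): both documents are about the same animal.
docs = {
    "d1": "The jaguar is the largest cat in the Americas",
    "d2": "Panthera onca lives in rainforests and wetlands",
}

hits = keyword_search("jaguar", docs)
print(hits)  # d2 is relevant but shares no term with the query, so it is missed
```

Only the document that literally contains "jaguar" comes back; the one using the scientific name is never retrieved, so no reranking stage can promote it.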
Neural search is an Artificial Intelligence technique that allows a search engine to reach documents that are semantically similar to the user’s information need without necessarily containing the query terms. It learns the similarity of terms and sentences in your collection through deep neural networks and numerical vector representations (so no manual synonyms are needed!).
This talk explores the first official Apache Solr contribution on this topic, available from Apache Solr 9.0.
We start with an overview of neural search (don’t worry, we keep it simple!): we describe vector representations for queries and documents, and how Approximate K-Nearest Neighbor (KNN) vector search works.
We show how neural search can be used along with deep learning techniques (e.g., BERT) or directly on vector data, and how you can run these types of search with Apache Solr!
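As a rough illustration of vector search, here is a minimal sketch of exact (brute-force) K-nearest-neighbor retrieval with cosine similarity. The three-dimensional vectors are hand-made assumptions, not the output of BERT or any other model, and the scan over every document is exactly what approximate methods such as HNSW are designed to avoid.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query (exact search)."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical embeddings: the two animal documents point in a similar
# direction, the car document points elsewhere.
doc_vecs = {
    "jaguar_cat": [0.9, 0.1, 0.0],
    "panthera_onca": [0.8, 0.2, 0.1],
    "jaguar_car": [0.1, 0.9, 0.2],
}
query_vec = [1.0, 0.0, 0.0]  # assumed embedding of the query "jaguar" (the animal)

top2 = knn(query_vec, doc_vecs, 2)
print(top2)
```

With these toy vectors the "panthera onca" document is ranked among the nearest neighbors even though it shares no keyword with the query.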
You will learn:
– how Approximate Nearest Neighbor (ANN) approaches work, with a focus on Hierarchical Navigable Small World Graph (HNSW)
– how the Apache Lucene implementation works
– how the Apache Solr implementation works, with the new field type and query parser introduced
– how to run KNN queries and how to use them to rerank a first-stage pass
– how to generate vectors from text and integrate large language models with Apache Solr
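To give a flavor of the query side, the sketch below only builds request parameters for a KNN query and for a KNN rerank of a keyword query; it does not contact a server. The `{!knn f=... topK=...}` and `{!rerank ...}` local-parameter syntax follows the Apache Solr 9 reference guide, while the field name `vector`, the four-dimensional example vector, and the helper function names are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed schema (not shown in the text above): a field named "vector" backed by
# a solr.DenseVectorField type whose vectorDimension is 4.


def knn_query(field, vector, top_k):
    """Build a Solr KNN query string for the given dense vector field."""
    vec = "[" + ", ".join(str(float(v)) for v in vector) + "]"
    return "{!knn f=%s topK=%d}%s" % (field, top_k, vec)


def knn_params(field, vector, top_k):
    """Parameters for a pure vector search."""
    return {"q": knn_query(field, vector, top_k), "fl": "id,score"}


def rerank_params(text_query, field, vector, top_k, rerank_docs):
    """Parameters for a keyword first stage reranked by vector similarity."""
    return {
        "q": text_query,
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=1}" % rerank_docs,
        "rqq": knn_query(field, vector, top_k),
        "fl": "id,score",
    }


q = knn_query("vector", [1, 2, 3, 4], 10)
print(q)
print(urlencode(rerank_params("jaguar", "vector", [1, 2, 3, 4], 10, 4)))
```

The encoded string can be appended to a collection's `/select` endpoint. In the rerank variant, documents still have to match the keyword query first, so it reorders results rather than fixing vocabulary mismatch on its own.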
Join us to explore this exciting new Apache Solr feature and learn how you can leverage it to improve your search experience!
9:00 – 9:20 – Introduction to Semantic Search Problems (vocabulary mismatch problem, semantic similarity)
9:20 – 9:40 – From Text to Vectors (Sparse vs Dense vector representation)
9:40 – 10:10 – How Approximate Nearest Neighbor (ANN) approaches work, with a focus on Hierarchical Navigable Small World Graph (HNSW)
10:10 – 10:40 – How the Apache Lucene implementation works
10:40 – 11:10 – How the Apache Solr implementation works, with the new field type and query parser introduced
11:10 – 11:30 – Break
11:30 – 12:00 – How to run KNN queries and how to use them to rerank a first-stage pass
12:00 – 12:35 – How to generate vectors from text and integrate large language models with Apache Solr
12:35 – 13:05 – Limitations and how to mitigate them
13:05 – 13:20 – Future Work