Sease at ECIR 2022!
ECIR 2022
ECIR features full-paper and poster presentations, system demonstrations, tutorials, workshops, an industry-oriented event, and traditionally has a strong focus on the active participation of early-career researchers.
Location: Stavanger, Norway
Date: 10th-14th April 2022

Dense Retrieval with Apache Solr Neural Search
Neural Search is an industry derivation of the academic field of Neural Information Retrieval. More and more frequently, we hear about how Artificial Intelligence (AI) permeates every aspect of our lives, and this also includes software engineering and Information Retrieval. In particular, the advent of Deep Learning introduced the use of deep neural networks to solve complex problems that could not be solved simply by an algorithm. For example, Deep Learning can be used to produce a vector representation of both the query and the documents in a corpus of information. Search, in general, comprises four primary steps:
- generate a representation of the query that describes the information need
- generate a representation of the document that captures the information contained in it
- match the query and the document representations from the corpus of information
- assign a score to each matched document in order to establish a meaningful document ranking by relevance in the results
With the Neural Search module, Apache Solr is introducing support for neural-network-based techniques that can improve these four aspects of search. This talk explores the first official contribution of Neural Search capabilities coming to Apache Solr 9.1, in the first quarter of 2022: Approximate K-Nearest Neighbor Vector Search for matching and ranking.
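To make the matching and scoring steps concrete, here is a minimal sketch of exact (brute-force) K-Nearest Neighbor search over dense vectors. It is not how Solr or Lucene implement it: the three-dimensional vectors are hand-made stand-ins for model embeddings, and the function names `cosine_similarity` and `exact_knn` are hypothetical. Approximate approaches trade a little accuracy to avoid this exhaustive scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query_vector, doc_vectors, k):
    """Score every document against the query and keep the k best.

    This exhaustive scan is the baseline that approximate
    nearest-neighbor indexes try to match at lower cost.
    """
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-made vectors standing in for embeddings (assumption for illustration).
doc_vectors = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.0],
}
top = exact_knn([1.0, 0.0, 0.0], doc_vectors, k=2)
top_ids = [doc_id for doc_id, _ in top]
```

In this toy corpus the query vector is identical to `doc1`, so `doc1` ranks first and the nearly parallel `doc2` second.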
You will learn:
- how Approximate Nearest Neighbor (ANN) approaches work, with a focus on Hierarchical Navigable Small World Graph (HNSW)
- how the Apache Lucene implementation works
- how the Apache Solr implementation works, with the new field type and query parser introduced
- how to run KNN queries and how to use them to rerank the results of a first-stage pass
- how the performance benchmarks compare with classic BM25 lexical retrieval and ranking
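As a rough illustration of the query side, the sketch below builds request parameters for a KNN query and for reranking a lexical first pass with it. It assumes the `knn` query parser syntax (`f`, `topK`) and the `rerank` parser parameters (`reRankQuery`, `reRankDocs`, `reRankWeight`) as described in the Solr Reference Guide; the field name `vector`, the helper names, and the sample values are hypothetical, so check the documentation for your Solr version before relying on it. No request is sent; the code only assembles strings.

```python
def knn_query(field, vector, top_k):
    """Build a Solr knn query string for a dense vector field."""
    vector_text = "[" + ", ".join(str(float(x)) for x in vector) + "]"
    return "{!knn f=%s topK=%d}%s" % (field, top_k, vector_text)


def rerank_params(lexical_query, field, vector, top_k, rerank_docs, rerank_weight):
    """Parameters for a lexical first pass reranked by a knn query.

    The main query `q` retrieves candidates; `rq` asks Solr to rescore
    the top `rerank_docs` of them with the query referenced by `$rqq`.
    """
    return {
        "q": lexical_query,
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%s}"
        % (rerank_docs, rerank_weight),
        "rqq": knn_query(field, vector, top_k),
    }


# "vector" is a hypothetical field name; the values are made up.
query = knn_query("vector", [1, 2.5, 3], top_k=10)
params = rerank_params("title:solr", "vector", [1, 2.5, 3], 10, 100, 2)
```

Here `query` evaluates to `{!knn f=vector topK=10}[1.0, 2.5, 3.0]`, and `params` could be passed to any HTTP client as the parameters of a `/select` request.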

Alessandro Benedetti
Founder @Sease
APACHE LUCENE/SOLR COMMITTER
APACHE SOLR PMC MEMBER
Elia Porciani
R&D SOFTWARE ENGINEER @Sease
SEARCH CONSULTANT
Evaluating Ranking Models in Production: a View on Offline and Online Experiences
Evaluation plays a key role in the field of information retrieval. Researchers and practitioners design and develop ranking models to represent the relationship between an information need expressed by a user (query) and information (search result) from the available resources (corpus). To validate any research paper on ranking innovation, it is fundamental to test the produced models by comparing their outcomes and calculating relevance metrics on a pre-defined ground truth (judgments).
What happens in the industry, where real users interact with the system, business interests affect the concept of relevance, and pre-defined relevance judgments are not available? This talk illustrates how companies in different domains approach the problem and implement offline and online testing/monitoring solutions. For each real-world application, this presentation describes:
- how it is approached and implemented (A/B testing, Interleaving, Statistical Significance calculations…)
- how the implicit/explicit feedback is collected and used to estimate the relevance (internal team of experts, user interactions with the system, revenue/profit signals, sponsored results…)
- how the experiments are designed and planned (how many models to compare at a time, what models to compare in the same test, how to test mobile/desktop/tablet platforms…)
- what Open Source technologies are used to facilitate the tasks
- the most common pitfalls and solutions to mitigate them
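As one example of the online techniques listed above, here is a minimal sketch of team-draft interleaving, a common way to compare two ranking models on the same result page. It is a simplified illustration under stated assumptions, not the approach of any specific company covered in the talk: the function names, document IDs, and click data are all hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng):
    """Merge two rankings, remembering which model contributed each result."""
    interleaved, team_of = [], {}
    picks = {"A": 0, "B": 0}
    rankings = {"A": ranking_a, "B": ranking_b}
    while len(interleaved) < length:
        # The model with fewer picks goes first; ties are broken by a coin flip.
        if picks["A"] != picks["B"]:
            order = ["A", "B"] if picks["A"] < picks["B"] else ["B", "A"]
        else:
            order = ["A", "B"] if rng.random() < 0.5 else ["B", "A"]
        placed = False
        for team in order:
            # Each model contributes its best result not already shown.
            candidate = next((d for d in rankings[team] if d not in team_of), None)
            if candidate is not None:
                interleaved.append(candidate)
                team_of[candidate] = team
                picks[team] += 1
                placed = True
                break
        if not placed:
            break  # both rankings are exhausted
    return interleaved, team_of


def credit_clicks(clicked_docs, team_of):
    """Count clicks per model; the model with more clicks wins the impression."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team_of:
            wins[team_of[doc]] += 1
    return wins


rng = random.Random(0)  # seeded only to make the sketch repeatable
interleaved, team_of = team_draft_interleave(
    ["d1", "d2", "d3"], ["d2", "d4", "d1"], length=4, rng=rng
)
wins = credit_clicks(["d1"], team_of)  # hypothetical click on d1
```

Aggregating `wins` over many impressions gives the preference signal, which would then need a statistical significance check before declaring one model better.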

Alessandro Benedetti
Founder @Sease
APACHE LUCENE/SOLR COMMITTER
APACHE SOLR PMC MEMBER
Anna Ruggero
R&D SOFTWARE ENGINEER @Sease
SEARCH CONSULTANT
Author
Lisa Biella
Lisa Biella is a creative digital marketer and a geek at heart who is enthusiastic about technology and how it affects people’s lives.