Community Over Code EU 2024
Community Over Code is where you will learn about the latest innovations from Apache Software Foundation (ASF) projects and their communities in a collaborative, vendor-neutral environment.
Location: Bratislava, Slovakia
Date: June 3-5, 2024
// our talk
Hybrid Search with Apache Solr
Mon 11:50 am - 12:20 pm
Vector-based search has gained incredible popularity in the last few years: Large Language Models fine-tuned for sentence similarity have proved quite effective at encoding text as vectors and representing some of the semantics of sentences in numerical form.
These vectors can be used to run a K-nearest neighbour search and look for documents/paragraphs close to the query in an n-dimensional vector space, effectively mimicking a similarity search in the semantic space (Apache Solr KNN Query Parser).
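As a rough illustration of the idea, the sketch below runs a brute-force K-nearest neighbour search over a few toy vectors and builds the query string that Solr's KNN Query Parser expects. The field name `vector`, the document ids and the two-dimensional embeddings are made up for the example; a real deployment would use model-generated embeddings and an approximate index rather than an exhaustive scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_search(query_vector, doc_vectors, k):
    """Return the k (doc_id, score) pairs most similar to the query.

    Exhaustive scan for illustration only; doc_vectors maps id -> vector.
    """
    scored = [
        (doc_id, cosine_similarity(query_vector, vec))
        for doc_id, vec in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def solr_knn_query(field, query_vector, top_k):
    """Build a query string in the KNN Query Parser local-params syntax."""
    vector = ", ".join(str(x) for x in query_vector)
    return f"{{!knn f={field} topK={top_k}}}[{vector}]"


# Hypothetical toy data: ids and embeddings are invented for this sketch.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
nearest = knn_search([1.0, 0.0], docs, k=2)
query_string = solr_knn_query("vector", [1.0, 0.0], 2)
```

Here `nearest` holds the two documents closest to the query, and `query_string` is the value that would be sent as the `q` parameter to a collection whose `vector` field is a dense vector field.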
Although exciting, vector-based search nowadays still presents some limitations:
– it’s very difficult to explain – e.g. why is document A returned and why at position K?
– it doesn’t care about exact keyword matching (and users still rely on keyword searches a lot)
To mitigate these problems, lexical (traditional keyword-based) search can be combined with neural (vector-based) search.
So, what does it mean to combine these two worlds?
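One well-known way to merge a lexical ranking with a vector ranking is Reciprocal Rank Fusion (RRF), sketched below. The abstract does not say which combination strategies the talk covers, so this is only an assumed example of the general idea; the function name, the document ids and the constant `k=60` (a commonly used default) are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); documents are returned by descending total
    score, with ties broken by id for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword query and a KNN query.
lexical_results = ["d1", "d2", "d3"]
neural_results = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([lexical_results, neural_results])
```

Documents that rank well in both lists (`d1`, `d3`) rise above those found by only one of them, and because RRF uses ranks rather than raw scores, the lexical and vector scores do not need to be on a comparable scale.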
Join us as we explore various ways of running hybrid search in Apache Solr, including tricks, suggestions, pros/cons and future work on this exciting new search approach!
// our speaker
Alessandro Benedetti
FOUNDER @ SEASE
APACHE LUCENE/SOLR COMMITTER
APACHE SOLR PMC MEMBER
Alessandro is a Senior Search Software Engineer whose focus is on R&D in Information Retrieval, Information Extraction, Natural Language Processing, and Machine Learning.
He firmly believes in Open Source as a way to build a bridge between Academia and Industry and facilitate the progress of applied research.





