Apache Lucene Elasticsearch Main Blog

Elasticsearch _source, doc_values and store Performance

In this blog post I want to explore, from a performance point of view, the options Elasticsearch gives us for storing fields and retrieving them at query time. In fact, Lucene, the underlying library upon which Elasticsearch and Solr are built, provides two ways of storing and retrieving fields: stored fields and doc values. In addition, Elasticsearch…

Apache Solr Elasticsearch Main Blog search quality evaluation

Offline Search Quality Evaluation: Rated Ranking Evaluator (RRE)

Introduction With Rated Ranking Evaluator Enterprise approaching, we take the opportunity to explain in detail why Offline Search Quality Evaluation is so important nowadays and what you can already do with the Rated Ranking Evaluator open-source libraries. More news will follow as we approach the V1 release date. Stay tuned! Search Quality…

Explaining Learning to Rank Models with Tree Shap

Introduction A common problem with machine learning models is their interpretability and explainability. We create a dataset and train a model to perform a task, and then we would like to understand how the model obtains its results. This is often quite difficult, especially with very complex models. In this blog post, I would…

London Information Retrieval Meetup June

After the very warm reception of the first year, the fifth London Information Retrieval Meetup is approaching (23/06/2020) and we are excited to add more details about our speakers and talks! The event is going to be fully remote (given the COVID-19 situation) and free! You are invited to register. Our second speaker is Martin…

Apache Lucene Apache Solr Elasticsearch Main Blog Synonyms

Introducing Weighted Synonyms in Apache Lucene/Solr

This blog post is about our latest contribution to the Apache Lucene/Solr project: introducing the ability to assign different weights to synonyms. This contribution aims to help users who deal with complex synonym dictionaries where it’s important to associate a numerical weight with each synonym, for example to boost the ones that are more important in…

London Information Retrieval Meetup February

After the very warm reception of the first year, the fourth London Information Retrieval Meetup is approaching (11/02/2020) and we are excited to add more details about our speakers and talks! The event is free and you are invited to register: https://www.eventbrite.com/e/london-information-retrieval-meetup-february-2020-tickets-89056738101 Our first speaker is Anna Ruggero, one of our R&D software engineers: Anna…

Digging in the Solr code: 5 minutes howto

Let’s say you need to write a component, a request handler, or in general some piece of custom code that needs to be plugged into Solr. Or you need a deeper understanding of some Lucene/Solr internals, following what actually happens within the code. I know: unit tests, integration tests, everything to make…

Apache Lucene Apache Solr Elasticsearch Main Blog RRE

Road to Rated Ranking Evaluator Enterprise

It was the spring of 2018. Andrea was working strenuously on a customer project, continuously tuning search configurations and manually checking the ground truth for certain queries. That was pretty much the standard at the time: the brilliant Quepid[1] from our friends at OpenSource Connections helped in some use cases, but there was nothing…

London Information Retrieval Meetup October

After the very warm reception of the first and second editions, the third London Information Retrieval Meetup is approaching (21/10/2019) and we are excited to add more details about our speakers and talks! The event is free and you are invited to register: https://www.eventbrite.com/e/london-information-retrieval-meetup-october-tickets-74403100677 Our second speaker is Andrea Gazzarini, our founder and software engineer:…

London Information Retrieval Meetup June

After the very warm reception of the first edition, the second London Information Retrieval Meetup is approaching (25/06/2019) and we are excited to add more details about our speakers and talks! The event is free and you are invited to register: https://www.eventbrite.com/e/london-information-retrieval-meetup-june-tickets-62261343354 Our first speaker is René Kriegler, freelance search consultant and search engineer: René Kriegler René…

Apache Lucene Apache Solr Elasticsearch Enterprise Search Learning To Rank Main Blog Search

Haystack 2019 Experience

This blog post is a quick summary of my (subjective) experience at Haystack 2019: the Search Relevance Conference, hosted in Charlottesville (Virginia, USA) from 24/04/2019 to 25/04/2019. References to the slides will be updated as soon as they become available. First of all, my feedback on the Haystack Conference is extremely positive. From my perspective the conference…

London Information Retrieval Meetup

The London Information Retrieval Meetup is approaching (19/02/2019) and we are excited to add more details about the speakers and talks! The event is free and you are invited to register: https://www.eventbrite.com/e/information-retrieval-meetup-tickets-54542417840 After Sambhav Kothari, software engineer at Bloomberg, and Elia Porciani, R&D software engineer at Sease, our last speaker is Andrea Gazzarini, founder and software engineer at…

Apache Solr Distributed Facets

The Apache Solr distributed faceting feature was introduced back in 2008 with the first versions of Solr (1.3 according to this Jira[1]). Until now, I always assumed it just worked, without diving too much into the details. Nowadays distributed search and faceting are extremely popular; you can find them pretty much everywhere (in the…

Synonyms and Stopwords: Vademecum

In this post we’ll cover two additional synonym scenarios and we’ll try to summarise all previous tips in a concise form. Following the approach of the previous posts [1] [2] [3], everything can be applied both to Apache Solr and Elasticsearch. Preconditions Synonyms and stopwords at query time: this is not just a “theoretical” constraint;…

Still Synonyms + Stopwords?? Mamma mia!

The Context A brief recap of where we arrived in the preceding article: we had the following synonyms and stopwords settings: synonyms = {“out of warranty”, “oow”} stopwords = {“of”} Both of those filters were configured exclusively at query time, the synonym filter first and then the stopwords filter. Using the built-in StopFilter we had a synonym detection…

Synonyms + Stopwords?? OMG!

The Context The scenario description is quite simple: we want to use synonyms and stopwords. Following the path of our previous article, we will introduce an additional component in the analysis chain: a StopFilter, which, as the name suggests, removes a set of words from an incoming token stream. We will use the following data…

Apache Solr/Elasticsearch: How to Manage Multi-term Concepts out of the Box?

This flash blog post will address a very specific and common problem: how to manage entities/concepts composed of multiple terms in a vanilla Apache Solr/Elasticsearch instance (no plugins or extensions to install). The (deployment) context An Elasticsearch or Apache Solr infrastructure where you cannot install third-party components (e.g. plugins, filters, query parsers). This can…

Apache Solr Apache Zookeeper Distributed Search Tips And Tricks

SolrCloud exceptions with Apache Zookeeper

At the time of writing (Solr 7.3.1), SolrCloud is a reliable and stable distributed architecture for Apache Solr. But it is not perfect and failures happen. Apache Zookeeper[1] is the system responsible for managing communication across the SolrCloud cluster. It contains the shared collection configurations and it has the view of the…

SolrCloud Leader Election Failing

At the time of writing (Solr 7.3.0), SolrCloud is a reliable and stable distributed architecture for Apache Solr. But it is not perfect and failures happen. This lightning blog post will present some practical tips to follow when a specific shard of a collection is down with no leader and the situation is…

Analysis Apache Lucene Apache Solr Feature Engineering Indexing Information Retrieval Lucene index Main Blog Query parsers Search Solr schema
Apache Solr quantity detection plugin

Give the height the right weight: quantities detection in Apache Solr

Quantity detection? What is a quantity? And why do we need to detect it? A quantity, as described by Martin Fowler in his “Analysis Patterns” [1], is a pair that combines an amount and a unit (such as 30 litres, 0.25 cl, or 140 cm). In search-based applications, there are many cases where you may…

ECIR 2018 Experience

This blog post is a quick summary of my (subjective) experience at ECIR 2018: the 40th European Conference on Information Retrieval, hosted in Grenoble (France) from 26/03/2018 to 29/03/2018. Deep Learning and Explicability Eight of the accepted long papers were about Deep Learning. The topics “Neural Network” and “Word Embedding” were the most frequent in the accepted…

Distributed Search Tips for Apache Solr

Distributed search is the foundation of Apache Solr scalability: it’s possible to distribute search across different Apache Solr nodes of the same collection (in either a legacy[1] or a SolrCloud[2] architecture), but it is also possible to distribute search across different collections in a SolrCloud cluster. Aggregating results from different collections may be useful…

Lucene Document Classification

Introduction This blog post describes the approach used in the Lucene Classification module to adapt text classification to document (multi-field) classification. Machine Learning and Search have always been closely associated. Machine Learning can help to improve the Search Experience in a lot of ways, extracting more information from the corpus of documents,…
