Synonyms and Stopwords: Vademecum

In this post we'll cover two additional synonym scenarios and we'll try to summarise all previous tips in a concise form. Following the approach of the previous posts [1] [2] [3], everything can be applied both to Apache Solr and Elasticsearch. Preconditions. Synonyms and stopwords at query time: this is not just a "theoretical" constraint; …

Still Synonyms + Stopwords?? Mamma mia!

The Context. A brief recap of where we arrived in the preceding article: we had the following synonym and stopword settings: synonyms = {"out of warranty","oow"} stopwords = {"of"}. Both of those filters were configured exclusively at query time: the synonym filter first and then the stopwords filter. Using the built-in StopFilter we had a synonym detection …

Synonyms + Stopwords?? OMG!

The Context. The scenario description is quite simple: we want to use synonyms and stopwords. Following the path of our previous article, we will introduce an additional component in the analysis chain: a StopFilter, which, as the name suggests, removes a set of words from an incoming token stream. We will use the following data …

Apache Solr/Elasticsearch: How to Manage Multi-term Concepts out of the Box?

This flash blog post will address a very specific and common problem: how to manage entities/concepts composed of multiple terms in a vanilla Apache Solr/Elasticsearch instance (no plugins or extensions to install). The (deployment) context. An Elasticsearch or Apache Solr infrastructure where you cannot install third-party components (e.g. plugins, filters, query parsers). This can …

Apache Solr: orchestrating Known item and Full-text search

Scenario. You’re working as a search engineer for XYZ Ltd, a company which sells electric components. XYZ provided you with the application logs of the last six months, and some business requirements. Two kinds of customers, two kinds of requirements, two kinds of search. The log analysis shows that XYZ mainly has two kinds of customers: …

Give the height the right weight: quantities detection in Apache Solr

Quantity detection? What is a quantity? And why do we need to detect it? A quantity, as described by Martin Fowler in his "Analysis Patterns" [1], is a pair which combines an amount and a unit (such as 30 litres, 0.25 cl, or 140 cm). In search-based applications, there are many cases where you may …

Apache Solr: Chaining SearchHandler instances: the CompositeRequestHandler

What are "Invisible Queries"? This is an extract of an article [1] on Lucidworks.com, by Grant Ingersoll, talking about invisible queries: "It is often necessary in many applications to execute more than one query for any given user query. For instance, in applications that require very high precision (only good results, forgoing marginal results), the …

Exploring Solr Internals: The Lucene Inverted Index

Introduction. This blog post is about the Lucene Inverted Index and how Apache Solr works internally. When playing with Solr systems, understanding and properly configuring the underlying Lucene Index is fundamental to deeply control your search. With a better knowledge of what the index looks like and how each component is used, you …