Analysis Apache Lucene Apache Solr Elasticsearch Information Retrieval Search Solr schema Tips And Tricks

Synonyms + Stopwords?? OMG!

The Context. The scenario is quite simple: we want to use synonyms and stopwords. Following the path of our previous article, we will introduce an additional component in the analysis chain: a StopFilter, which, as the name suggests, removes a set of words from an incoming token stream. We will use the following data…

Apache Solr Elasticsearch Information Retrieval Lucene index Search Solr schema Tips And Tricks

Apache Solr/Elasticsearch: How to Manage Multi-term Concepts out of the Box?

This flash blog post addresses a very specific and common problem: how to manage entities/concepts composed of multiple terms in a vanilla Apache Solr/Elasticsearch instance (no plugins or extensions to install). The (deployment) context: an Elasticsearch or Apache Solr infrastructure where you cannot install third-party components (e.g. plugins, filters, query parsers). This can…

Apache Lucene Apache Solr Elasticsearch Enterprise Search Information Retrieval Main Blog Search

Rated Ranking Evaluator: Help the poor (Search Engineer)

A Software Engineer is always required to give customers concrete evidence of the quality of the deliverables. A Search Engineer deals with a specialisation of this generic Software Quality, called Search Quality. What is Search Quality? And why is it so important in a search infrastructure? After all, “Software Quality” should be all-encompassing,…

Apache Lucene Apache Solr Autocomplete Autosuggestion Main Blog

Apache Lucene BlendedInfixSuggester: How It Works, Bugs And Improvements

The Apache Lucene/Solr suggesters are important to Sease: we explored the topic in the past [1], and we strongly believe the autocomplete feature is vital for many search applications. This blog post explores in detail the current status of the Lucene BlendedInfixSuggester, some bugs of the most recent version (with the…

Apache Lucene Apache Solr Indexing Information Retrieval Invisible Queries Main Blog Ngrams Search Solr schema Token filters Tokenizer Topic Modeling

Apache Solr: orchestrating Known item and Full-text search

Scenario: you’re working as a search engineer for XYZ Ltd, a company which sells electric components. XYZ provided you with the application logs of the last six months, and some business requirements. Two kinds of customers, two kinds of requirements, two kinds of search. The log analysis shows that XYZ has mainly two kinds of customers:…

Analysis Apache Lucene Apache Solr Feature Engineering Indexing Information Retrieval Lucene index Main Blog Query parsers Search Solr schema

Give the height the right weight: quantities detection in Apache Solr

Quantity detection? What is a quantity? And why do we need to detect it? A quantity, as described by Martin Fowler in his “Analysis Patterns” [1], is a pair which combines an amount and a unit (such as 30 litres, 0.25 cl, or 140 cm). In search-based applications, there are many cases where you may…

Apache Lucene Apache Solr Deep Learning ECIR European Conference Evaluation & User Behaviour Information Retrieval LambdaMART Learning To Rank Machine Learning Main Blog RankLib Recommender Systems Representation Search Topic Modeling

ECIR 2018 Experience

This blog post is a quick summary of my (subjective) experience at ECIR 2018: the 40th European Conference on Information Retrieval, hosted in Grenoble (France) from 26/03/2018 to 29/03/2018. Deep Learning and Explicability: eight of the accepted long papers were about Deep Learning. The topics “Neural Network” and “Word Embedding” were the most frequent in the accepted…

Apache Lucene Apache Solr Invisible Queries Search Search Library SearchHandler Solr schema

Apache Solr: Chaining SearchHandler instances: the CompositeRequestHandler

What are “Invisible Queries”? This is an extract from an article [1] on Lucidworks.com, by Grant Ingersoll, talking about invisible queries: “It is often necessary in many applications to execute more than one query for any given user query. For instance, in applications that require very high precision (only good results, forgoing marginal results), the…