Apache Lucene Apache Solr Indexing Information Retrieval Invisible Queries Main Blog Ngrams Search Solr schema Token filters Tokenizer Topic Modeling

Apache Solr: orchestrating Known item and Full-text search

Scenario: You’re working as a search engineer for XYZ Ltd, a company that sells electric components. XYZ has provided you with the application logs of the last six months, along with some business requirements. Two kinds of customers, two kinds of requirements, two kinds of search. The log analysis shows that XYZ mainly has two kinds of customers:…

Analysis Apache Lucene Apache Solr Feature Engineering Indexing Information Retrieval Lucene index Main Blog Query parsers Search Solr schema

Give the height the right weight: quantities detection in Apache Solr

Quantity detection? What is a quantity? And why do we need to detect it? A quantity, as described by Martin Fowler in his “Analysis Patterns” [1], is a pair that combines an amount and a unit (such as 30 litres, 0.25 cl, or 140 cm). In search-based applications, there are many cases where you may…

Apache Lucene Apache Solr Deep Learning ECIR European Conference Evaluation & User Behaviour Information Retrieval LambdaMART Learning To Rank Machine Learning Main Blog RankLib Recommender Systems Representation Search Topic Modeling

ECIR 2018 Experience

This blog post is a quick summary of my (subjective) experience at ECIR 2018: the 40th European Conference on Information Retrieval, hosted in Grenoble (France) from 26/03/2018 to 29/03/2018. Deep Learning and Explicability: Eight of the accepted long papers were about Deep Learning. The topics “Neural Network” and “Word Embedding” were the most frequent in the accepted…

Apache Solr Data Preparation Feature Engineering Learning To Rank Machine Learning Main Blog RankLib Search Signal Processing

Solr Is Learning To Rank Better – Part 1 – Data Collection

Learning To Rank In Apache Solr. Introduction: This blog post is about the journey necessary to bring Learning To Rank to Apache Solr search engines. Learning to Rank [1] is the application of Machine Learning in the construction of ranking models for Information Retrieval systems. Introducing supervised learning from user behaviour and signals can improve the relevancy…

Apache Lucene Apache Solr Document Frequency Indexing Indexing options Lucene index Main Blog Norms Solr schema Term Frequency Term offsets Term positions

Exploring Solr Internals: The Lucene Inverted Index

Introduction: This blog post is about the Lucene Inverted Index and how Apache Solr works internally. When working with Solr systems, understanding and properly configuring the underlying Lucene index is fundamental to controlling your search in depth. With a better knowledge of what the index looks like and how each component is used, you…

Analysis Apache Lucene Apache Solr Autocomplete Autosuggestion FST Lucene index Main Blog Ngrams Suggester Token filters Tokenizer

Solr: "You complete me!": The Apache Solr Suggester

This blog post is about the Apache Solr Autocomplete feature. The documentation currently available on the wiki is not enough to fully understand the Solr Suggester: this blog post will describe all the available implementations, with examples, tips and tricks. Introduction: If there’s one thing that months of Solr-user…

Apache Lucene Apache Solr Classification Indexing Machine Learning Main Blog Search Update Request Processor

Solr Document Classification – Part 1 – Indexing Time

Introduction: This blog post is about the Solr classification module and the way Lucene classification has been integrated at indexing time. In the previous blog post [1] we explored the world of Lucene Classification and the extension to use it for Document Classification. It is natural to integrate Solr with the Classification module and…