This is the last post of the Entity Search with graph embeddings series. In Part 2 and Part 3 we illustrated the core of the dissertation, describing in detail the implementation of our solution pipeline. In this final part we will present some evaluation measures and results. We will draw some conclusions explaining which were…
This series of blog posts describes my master's degree dissertation, supervised by Prof. Gianmaria Silvello at the University of Padova. The main focus of this project is the use of graph embeddings to create virtual documents for the Information Retrieval Entity Search task. This thesis description is…
This blog post is a quick summary of my (subjective) experience at Haystack 2019: the Search Relevance Conference, hosted in Charlottesville (Virginia, USA) from 24/04/2019 to 25/04/2019. References to the slides will be added as soon as they become available. First of all, my feedback on the Haystack Conference is extremely positive. From my perspective the conference…
How faceting is calculated in Apache Solr distributed architectures. The post explains the internal details and presents practical examples.
In this post we’ll cover two additional synonym scenarios and try to summarise all the previous tips in a concise form. Following the approach of the previous posts [1] [2] [3], everything can be applied to both Apache Solr and Elasticsearch. Preconditions. Synonyms and stopwords at query time: this is not just a “theoretical” constraint; imagine if you…
This flash blog post addresses a very specific and common problem: how to manage entities/concepts composed of multiple terms in a vanilla Apache Solr/Elasticsearch instance (no plugins or extensions to install). The (deployment) context: an Elasticsearch or Apache Solr infrastructure where you cannot install third-party components (e.g. plugins, filters, query parsers). This can…
Scenario: you’re working as a search engineer for XYZ Ltd, a company that sells electric components. XYZ has provided you with the application logs of the last six months and some business requirements. Two kinds of customers, two kinds of requirements, two kinds of search. The log analysis shows that XYZ has mainly two kinds of customers:…
At the time of writing (Solr 7.3.0), SolrCloud is a reliable and stable distributed architecture for Apache Solr. But it is not perfect, and failures happen. This lightning blog post presents some practical tips to follow when a specific shard of a collection is down with no leader and the situation is stuck. The following…
Distributed search is the foundation of Apache Solr scalability: it’s possible to distribute search across different Apache Solr nodes of the same collection (in both a legacy [1] and a SolrCloud [2] architecture), but it is also possible to distribute search across different collections in a SolrCloud cluster. Aggregating results from different collections may be useful when…
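As a minimal sketch of the multi-collection case, assuming a SolrCloud cluster reachable at a hypothetical `http://localhost:8983/solr` with two hypothetical collections `products` and `archive`, a single request can be fanned out by passing a comma-separated list in the `collection` parameter of a select request. The helper below only builds the URL; it does not contact Solr.

```python
from urllib.parse import urlencode


def multi_collection_query_url(base_url, entry_collection, collections, query):
    """Build a SolrCloud /select URL that searches across several collections.

    base_url, entry_collection and the collection names are hypothetical
    placeholders; the request is sent to one collection's endpoint and the
    'collection' parameter lists every collection to search.
    """
    params = {
        "q": query,
        # Comma-separated list of collections the query is distributed to
        "collection": ",".join(collections),
    }
    return f"{base_url}/{entry_collection}/select?{urlencode(params)}"


url = multi_collection_query_url(
    "http://localhost:8983/solr", "products", ["products", "archive"], "title:solr"
)
print(url)
```

Note that results from different collections are only meaningfully merged when the collections share compatible schemas for the fields being queried and returned.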
Last Stage Of The Journey. This blog post is about the Apache Solr Learning To Rank (LTR) integration. We modelled our dataset, collected the data and refined it in Part 1, trained the model in Part 2, and analysed and evaluated the model and training set in Part 3. We are ready to…