
Road to Rated Ranking Evaluator Enterprise

It was the spring of 2018, and Andrea was working hard on a customer project, continuously tuning search configurations and manually checking the ground truth for certain queries. That was pretty much the standard at the time. The brilliant Quepid, from our friends at OpenSource Connections, helped in some use cases, but nothing in the open source landscape made it possible to test relevance as part of a Continuous Integration approach, on large numbers of queries and documents, while easily exploring the results across various metrics.
Fast forward a couple of months: the first RRE commit happened on 17 May 2018, and Rated Ranking Evaluator was born, ready to help search software engineers all over the world.

Community Boom

We presented RRE for the first time at the Lucene/Solr London Meetup on 26 June 2018. We just wanted to show our progress in automating search quality evaluation, but it was so well received that we decided to spread the word. RRE talks were accepted at several international events: Haystack Europe (2 October 2018), FOSDEM (3 February 2019), ECIR, the European Conference on Information Retrieval (18 April 2019), Haystack US (24 April 2019) and New York University, NYU (1 May 2019).

Haystack US, Charlottesville, Virginia

It has been an extraordinary success, and special thanks go to Charlie Hull, Doug Turnbull and Tim Allison, who helped us build a strong community through their evangelisation and thoughtful feedback.
Currently, RRE counts 257 commits, 9 contributors (special thanks here to Matt Pearce and Max Irwin) and a good number of users all around the world!

Phase 2 - Enterprise Needs

We strongly believe in Open Source: we consider it the way to build a bridge between academia and industry and, more generally, to contribute to human progress.
This is why Sease is deeply involved in various Open Source projects, and why RRE was released as Open Source in the first place.
We plan to support it in the long term as much as we can, through our own resources and the help of the community; that is a firm commitment.
RRE is a great library: it has improved a lot since its publication thanks to the community, and it can improve a lot more.

However good it is for hands-on search software engineers, we also started receiving many requests for a standalone, enterprise-level solution that would wrap all of RRE's complexity and give companies a much simpler approach: one button to run an evaluation and tons of advanced analytics to browse the evaluation results.
That was not the only emerging requirement: more and more clients were interested in effective ways to generate the ratings, reducing the effort on their side as much as possible.
After countless brainstorming sessions, the first of which took place in this very specific location:

World War 2 Bunker, Tarquinia, Viterbo, Italy

The first RREE (Rated Ranking Evaluator Enterprise) design draft was ready.

Rated Ranking Evaluator Enterprise

“Enterprise software, also known as enterprise application software (EAS), is computer software used to satisfy the needs of an organization rather than individual users” Wikipedia

RRE-Enterprise is a standalone solution providing a full spectrum of search quality evaluation capabilities to our clients.
It is a REST server that lets you feed ratings to RRE, evaluate your search engine and explore the results. A user-friendly UI simplifies its usage: you no longer need to be a software engineer to use it, but if you are one, you get plenty of ways to explore the quality of your system in a systematic and fruitful way.

REST Standalone Server
User Friendly UI

Input

RRE-Enterprise takes seriously the challenge of simplifying the way you provide ratings to RRE. The legacy JSON approach is still supported, but on top of that it offers two new components:

Judgement Collector – This component takes care of explicit ratings: given a query, it lets the user assign a rating to random documents and upload the corresponding JSON.

User Interactions Logger – This component takes care of implicit ratings: it exposes a service to collect user interactions and estimates ratings through statistical analysis of that data. This is extremely useful to collect a huge number of ratings and to get an idea of the current online performance of the system (RRE-Enterprise will keep track of online Click Through Rate, Add To Cart Rate, etc.).
The approach is supervised, so you may refine and validate the machine-generated ratings with the explicit feedback of your team of domain experts.
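The post does not describe how the User Interactions Logger turns interactions into ratings. Purely as an illustration of the general idea, the following Python sketch derives graded ratings from the click-through rate of each query/document pair. The event schema, the 0-3 scale, the thresholds and the minimum-impressions cut-off are all hypothetical choices, not RRE-Enterprise's estimator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """A single logged event (hypothetical schema, not RRE-Enterprise's)."""
    query: str
    doc_id: str
    kind: str  # "impression" or "click"


# Hypothetical thresholds mapping click-through rate to a 0-3 graded rating,
# checked from the highest grade down.
CTR_THRESHOLDS = [(0.30, 3), (0.15, 2), (0.05, 1)]
MIN_IMPRESSIONS = 10  # pairs shown fewer times than this are not rated


def estimate_ratings(events):
    """Return {query: {doc_id: rating}} estimated from implicit feedback."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for e in events:
        key = (e.query, e.doc_id)
        if e.kind == "impression":
            impressions[key] += 1
        elif e.kind == "click":
            clicks[key] += 1

    ratings = defaultdict(dict)
    for key, shown in impressions.items():
        if shown < MIN_IMPRESSIONS:
            continue  # too little evidence to estimate a rating
        ctr = clicks[key] / shown
        rating = 0
        for threshold, grade in CTR_THRESHOLDS:
            if ctr >= threshold:
                rating = grade
                break
        query, doc_id = key
        ratings[query][doc_id] = rating
    return dict(ratings)
```

Pairs with too few impressions are skipped rather than rated 0, since a low count says little about relevance; in the supervised workflow described above, such machine-generated estimates would then be reviewed by domain experts.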

If you already have a system producing advanced ratings, you can import those ratings directly into RRE-Enterprise by following one of the supported JSON formats.

Let's Start the Evaluation!

In search systems using Apache Solr or Elasticsearch, it is common practice to have a Search-API between the front end and the search engine, to build queries as elaborate as you like.
From now on we’ll call it Black Box Search-API (we use Black Box here because neither RRE-Enterprise nor the tester needs to know the details of its internals).
The most controversial aspect of RRE is that you need to replicate your Search-API logic through complex query templates in JSON files.

Embedded Target – Once you have your input ratings, you can provide a Git repository URL from which to fetch the configuration of your system; RRE-Enterprise will spin up an embedded instance of it and populate it. This is pretty much an evolution of the current embedded approach in RRE (which is still supported).

Query Discovery – RRE-Enterprise still supports the legacy approach, but it is also capable of running queries against your Black Box Search-API to automatically discover the final Solr/Elasticsearch query. There is no need to replicate the logic of your API in complex and time-consuming template files: RRE-Enterprise will do this automatically for you.

External Target – You need to provide a QA instance of both the Black Box Search-API and Apache Solr/Elasticsearch. It will be your responsibility to deploy the latest configuration you want to test and populate it with meaningful data, compatible with the input ratings. This can be a brilliant solution in a continuous integration pipeline, where the Black Box Search-API code and the Solr/Elasticsearch configuration are deployed only after the integration tests succeed.
Relevance tests can (and should) then run before the production deployment happens.
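A relevance test in a pipeline ultimately boils down to a pass/fail decision. The sketch below shows one possible gate, assuming (hypothetically) that per-query metric values for a baseline run and a candidate run are available as plain Python dictionaries. The function name, the input format and the tolerance are illustrative only and are not part of RRE or RRE-Enterprise.

```python
from statistics import mean


def relevance_gate(baseline, candidate, tolerance=0.01):
    """Compare per-query metric values (e.g. NDCG@10) of two evaluation runs.

    baseline, candidate: {query: metric_value}. This input format is a
    hypothetical assumption made for the sketch.
    Returns (passed, regressed_queries).
    """
    # Only queries present in both runs are compared, so the averages
    # stay comparable if the query set changed between iterations.
    shared = baseline.keys() & candidate.keys()
    if not shared:
        raise ValueError("no queries in common between the two runs")

    base_avg = mean(baseline[q] for q in shared)
    cand_avg = mean(candidate[q] for q in shared)

    # Queries that dropped by more than the tolerance, for the report.
    regressed = sorted(
        q for q in shared if candidate[q] < baseline[q] - tolerance
    )
    # The gate fails if the average metric dropped by more than the tolerance.
    passed = cand_avg >= base_avg - tolerance
    return passed, regressed
```

In a pipeline step, something like `sys.exit(0 if passed else 1)` would turn a failed gate into a failed build and block the production deployment, while the list of regressed queries points to where to start digging.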

Output

Plenty of work has been done to simplify and enrich the user experience when it comes to analysing the evaluation results.

It is important to give a clear and quick dashboard to business-level users to assess the quality of the search engine, but it is equally important to offer advanced capabilities to the search software engineer to deep dive into the evaluations and debug relevance problems.

Overview – This tab is meant for busy business people: it offers an at-a-glance view of the relevant trends across the various collections.
It gives you a quick approximation of improvements/regressions, highlights warnings and allows for historical comparison. It is the entry point for exploring the evaluation results.

Explore/Compare – These interfaces are quite similar: Explore lets you dig into the results of a single iteration (compared with the previous one), while Compare lets you select two evaluation iterations and see the differences.
After you select the evaluation iteration, the collection and the metric(s) of interest, the hierarchical RRE model is presented in all its splendour:
Expand/Collapse guides you through the results down the hierarchy, all the way to the leaf nodes (the single queries).
At that level you can compare at the finest grain possible, seeing the query, the associated search engine query and the search results returned by the search engine at evaluation time (with their assigned relevance).
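To make the hierarchical navigation more concrete, here is a small Python sketch of a metric tree in which the leaves hold per-query values and each parent averages its children, walked top-down to compare two iterations. Averaging the children is an assumption made for illustration (the post does not say how RRE aggregates), and the `Node` class and `compare` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Node:
    """One level of a hypothetical evaluation hierarchy (topic, query group, query...)."""
    name: str
    children: list[Node] = field(default_factory=list)
    # Metric value per iteration; only set on leaf nodes (single queries).
    scores: dict[str, float] = field(default_factory=dict)

    def score(self, iteration: str) -> float:
        """Leaf: the stored value; inner node: mean of its children's scores."""
        if not self.children:
            return self.scores[iteration]
        return mean(child.score(iteration) for child in self.children)


def compare(node: Node, old: str, new: str, depth: int = 0, lines=None) -> list[str]:
    """Walk the tree top-down, one indented line per node with the delta."""
    if lines is None:
        lines = []
    before, after = node.score(old), node.score(new)
    lines.append(
        f"{'  ' * depth}{node.name}: {before:.2f} -> {after:.2f} ({after - before:+.2f})"
    )
    for child in node.children:  # "expanding" a node means visiting its children
        compare(child, old, new, depth + 1, lines)
    return lines
```

For instance, with a topic containing one query group of two queries whose values move from 0.6 and 0.4 to 0.8 and 0.4, the group, the topic and the root all show a +0.10 change, and expanding down to the leaves reveals that only the first query improved.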

C.I. Tools Integrations – Using RRE within a Continuous Integration pipeline will now be possible through Jenkins, Atlassian Bamboo and JetBrains TeamCity.
The integration will allow you to execute the Search Quality Evaluation as part of the pipeline and then look at the equivalent of the Overview panel directly in your favourite C.I. tool. Of course, it will also be possible to explore the evaluation in the RRE-Enterprise dashboard.

Future Work

This is just the beginning: the first version of RRE-Enterprise will cover all the enterprise-level requirements we have collected so far, but much more will come as more users get engaged.
We have already identified a few areas for version 2; here is an appetiser for your refined taste:

Judgement Collector Browser Plugin – Wouldn't it be nice to transform your search UI into a judgement collector? A nice automatic overlay that gives your experts a way of tagging relevant documents? We are working on it 🙂

Intelligent Explore – On top of the manual exploration of the evaluation results, this new widget will automatically perform a set of analyses and present the results back to the user: the most alarming queries and topics, patterns across low-performing queries and much more.

Advanced Diff – When comparing at the finest grain level, it would be useful to get a quick and clear understanding of the changes that happened in between: for example, how the search engine query has changed in terms of request parameters.
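The Advanced Diff is future work, so nothing here reflects its actual design. As an illustration of what comparing "from the request parameter perspective" could mean for a Solr-style GET request, this sketch uses the standard library's `urllib.parse` to diff the parameters of two query URLs. The function name and the example URLs are hypothetical, and an Elasticsearch JSON request body would need a structural JSON diff instead.

```python
from urllib.parse import parse_qs, urlsplit


def diff_params(old_url: str, new_url: str) -> dict:
    """Diff the request parameters of two search engine query URLs.

    Values are lists because a parameter (e.g. fq) may be repeated.
    """
    old = parse_qs(urlsplit(old_url).query)
    new = parse_qs(urlsplit(new_url).query)
    return {
        # Parameters present only in the newer query.
        "added": {k: new[k] for k in sorted(new.keys() - old.keys())},
        # Parameters that disappeared.
        "removed": {k: old[k] for k in sorted(old.keys() - new.keys())},
        # Parameters present in both but with different values: (old, new).
        "changed": {
            k: (old[k], new[k])
            for k in sorted(old.keys() & new.keys())
            if old[k] != new[k]
        },
    }
```

Comparing a query with `q=laptop&qf=title` against one with `q=laptop&qf=title^2%20description&mm=2` would report `mm` as added and `qf` as changed from `title` to `title^2 description`, while the unchanged `q` is left out of the report.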

Much more will follow!

Need Help With This Topic?

If you’re struggling with search quality evaluation, don’t worry – we’re here to help! Our team offers expert services and training to help you optimize your search engine and get the most out of your system. Contact us today to learn more!

