It was the spring of 2018, and Andrea was working strenuously on a customer project, continuously tuning search configurations and manually checking the ground truth for certain queries. That was pretty much the standard at the time: the brilliant Quepid from our friends at OpenSource Connections helped in some use cases, but nothing in the open source landscape let you test relevance with a Continuous Integration approach, on large numbers of queries and documents, while making it easy to explore the results across various metrics.
Fast forward a couple of months: the first RRE commit happened on the 17th of May 2018. Rated Ranking Evaluator was born, ready to help search software engineers all over the world.
We presented RRE for the first time at the Lucene/Solr London Meetup on 26/06/2018. We just wanted to show our progress in automating search quality evaluation, but it was so well received that we were convinced to spread the word. RRE talks were accepted at a number of international conferences: Haystack Europe (2/10/2018), FOSDEM (3/02/2019), ECIR (European Conference on Information Retrieval, 18/04/2019), Haystack US (24/04/2019) and at NYU (New York University, 1/05/2019).
Haystack US, Charlottesville, Virginia
It has been an extraordinary success, and special thanks go to Charlie Hull, Doug Turnbull and Tim Allison, who through their evangelisation and thoughtful feedback helped us build a strong community.
Currently RRE counts 257 commits, 9 contributors (special thanks here to Matt Pearce and Max Irwin) and a good number of users all around the world!
Phase 2 - Enterprise Needs
We strongly believe in Open Source: we consider it the way to build a bridge between academia and industry and, in general, to contribute back to human progress.
This is the reason Sease is profoundly involved in various Open Source projects and RRE was released as Open Source in the first place.
We plan to support it for the long term, as much as we can, through our resources and the help of the community. That's a solid point.
RRE is a great library and we firmly believe it has improved a lot since its publication thanks to the community, and that it can improve a lot more.
As good as it is for hands-on search software engineers, we also started to get many requests for a standalone, enterprise-level solution that wraps all of RRE's complexities and gives companies a much simpler approach: one button to evaluate and tons of advanced analytics to browse the evaluation results.
That was not the only emerging requirement: even more clients were interested in practical ways to generate the ratings that reduce the effort on their side as much as possible.
After countless brainstorming sessions, the first of which took place in this very specific location:
World War 2 Bunker, Tarquinia, Viterbo, Italy
The first RREE (Rated Ranking Evaluator Enterprise) design draft was ready.
Rated Ranking Evaluator Enterprise
“Enterprise software, also known as enterprise application software (EAS), is computer software used to satisfy the needs of an organization rather than individual users” Wikipedia
RRE-Enterprise is a standalone solution providing a full spectrum of search quality evaluation capabilities to our clients.
It is a REST server that lets you feed ratings to RRE, evaluate your search engine and explore the results. A user-friendly UI simplifies usage: you no longer need to be a software engineer to use it, but if you are one, you get plenty of ways to explore the quality of your system systematically and fruitfully.
- REST Standalone Server
- User Friendly UI
RRE-Enterprise takes seriously the challenge of simplifying the way you provide ratings to RRE. The legacy JSON approach is still supported, but on top of that we have:
Judgement Collector – This component takes care of explicit ratings: given a query, it lets the user assign a rating to random documents and upload the corresponding JSON.
User Interactions Logger – This component takes care of implicit ratings, exposing a service that collects user interactions and estimates ratings through statistical analysis of that data. This is extremely useful for collecting a huge number of ratings and for getting an idea of the current online system performance (RRE-Enterprise will keep track of online Click Through Rate, Add To Cart Rate, etc.).
The approach is supervised, so you may refine and validate the machine generated ratings with the explicit feedback of your team of domain experts.
If you have a system that is already producing advanced ratings, you can export those ratings directly into RRE-Enterprise by following one of the supported JSON formats.
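To make the idea of estimating ratings from user interactions more concrete, here is a minimal Python sketch. It is not the RRE-Enterprise implementation: the `estimate_ratings` function, the event names, the 0 to 3 rating scale and the normalisation by the best click-through rate per query are all assumptions chosen purely for illustration.

```python
from collections import defaultdict


def estimate_ratings(interactions, max_rating=3, min_impressions=2):
    """Estimate a rating per (query, doc_id) from click-through rate.

    `interactions` is an iterable of (query, doc_id, event) tuples, where
    event is "impression" or "click" (hypothetical event names).
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, doc_id, event in interactions:
        key = (query, doc_id)
        if event == "impression":
            impressions[key] += 1
        elif event == "click":
            clicks[key] += 1

    # Click-through rate, ignoring pairs with too few impressions to trust.
    ctr = {
        key: clicks[key] / shown
        for key, shown in impressions.items()
        if shown >= min_impressions
    }

    # Best CTR observed for each query, used to normalise its documents.
    best = defaultdict(float)
    for (query, _), value in ctr.items():
        best[query] = max(best[query], value)

    # Scale each CTR to the rating range relative to the query's best CTR.
    ratings = {}
    for (query, doc_id), value in ctr.items():
        top = best[query]
        ratings[(query, doc_id)] = round(max_rating * value / top) if top > 0 else 0
    return ratings
```

Ratings produced this way are only estimates, which is why a validation step by domain experts, as described above, remains valuable.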
Let's Start the Evaluation!
In search systems using Apache Solr or Elasticsearch, it is common practice to have a Search-API between the front end and the search engine, building queries as elaborate as you like.
From now on we’ll call it the Black Box Search-API (we use Black Box here because neither RRE-Enterprise nor the tester needs to know the details of its internals).
The most controversial aspect of RRE is that you need to replicate your Search-API logic through complex query templates in JSON files.
Embedded Target – Once you have your input ratings, you can provide a Git repository URL from which to fetch the configuration of your system; RRE-Enterprise will spin up an embedded instance of it and populate it. This is pretty much an evolution of the current embedded approach in RRE (which is still supported).
Query Discovery – RRE-Enterprise still supports the legacy approach, but it can also run queries against your Black Box Search-API to discover the final Solr/Elasticsearch query automatically. There is no need to replicate the logic of your API in complex and time-consuming template files: RRE-Enterprise will do this for you.
External Target – You need to provide a QA instance for both the Black Box Search-API and Apache Solr/Elasticsearch. It will be your responsibility to deploy the latest configuration you want to test and populate it with meaningful data, compatible with the input ratings. This can be a brilliant solution for a continuous integration pipeline, where the Black Box Search-API code and the Solr/Elasticsearch configuration are deployed only after the integration tests succeed.
The relevance test can (and should) then run before a production deployment happens.
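As an illustration of what such a relevance gate could look like in a pipeline step, here is a small Python sketch. It assumes the evaluation results are already available as plain dictionaries of metric values; the `find_regressions` and `gate_exit_code` helpers, the metric names and the tolerance are hypothetical and not part of RRE or RRE-Enterprise.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return the metrics whose value dropped by more than `tolerance`.

    `baseline` and `candidate` map a metric name to its value for the
    currently deployed configuration and the one under test.
    """
    regressions = {}
    for metric, old in baseline.items():
        new = candidate.get(metric)
        # A metric missing from the candidate run is treated as a regression.
        if new is None or new < old - tolerance:
            regressions[metric] = (old, new)
    return regressions


def gate_exit_code(baseline, candidate, tolerance=0.01):
    """Print any regression and return a process exit code (0 = pass)."""
    regressions = find_regressions(baseline, candidate, tolerance)
    for metric, (old, new) in sorted(regressions.items()):
        print(f"REGRESSION {metric}: {old} -> {new}")
    return 1 if regressions else 0
```

A pipeline step could pass the returned code to `sys.exit`, so that a relevance regression blocks the production deployment in the same way a failing integration test would.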
Plenty of work has gone into simplifying and enriching the user experience of analysing the evaluation results.
It is important to give business-level users a clear and quick dashboard to assess the quality of the search engine, but it is equally important to offer advanced capabilities to the search software engineer, to dive deep into the evaluations and debug relevance problems.
Overview – This tab is meant for busy busy business people: it offers a quick glance at the relevance trends of the various collections.
It gives you a quick approximation of improvements/regressions, highlights warnings and allows for historical comparison. It is the entry point for the evaluation results exploration.
Explore/Compare – These interfaces are pretty similar: Explore lets you go deep into the results of a single iteration (compared with the previous one), while Compare lets you select two evaluation iterations and see the differences.
After selecting the evaluation iteration, the collection and the metric(s) of interest, the hierarchical RRE model is offered, in all its splendour:
Expand/Collapse will guide you through an effective navigation of the results down through the hierarchy ’til the leaf nodes (the single query).
At that level you will be able to compare at the finest grain possible, seeing the query, the associated search engine query and the search results returned by the search engine at evaluation time (with their assigned relevance).
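The kind of hierarchical comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each iteration is modelled as nested dictionaries whose leaves are the metric value of a single query, and the score of an inner node is taken as the mean of its children. The `score` and `compare` helpers and the sample level names are hypothetical, not the RRE-Enterprise data model.

```python
from statistics import mean


def score(node):
    """Metric value of a node: a leaf is a number, an inner node is a dict."""
    if isinstance(node, (int, float)):
        return float(node)
    # Assumption: an inner node scores as the mean of its children.
    return mean(score(child) for child in node.values())


def compare(old, new, path=()):
    """Yield (path, old_score, new_score, delta) for nodes in both iterations."""
    old_score, new_score = score(old), score(new)
    yield path, old_score, new_score, new_score - old_score
    if isinstance(old, dict) and isinstance(new, dict):
        # Only descend into children that exist in both iterations.
        for name in sorted(old.keys() & new.keys()):
            yield from compare(old[name], new[name], path + (name,))


# Hypothetical iterations: a topic containing two queries.
previous = {"phones": {"iphone": 0.5, "galaxy": 1.0}}
current = {"phones": {"iphone": 1.0, "galaxy": 1.0}}
for path, before, after, delta in compare(previous, current):
    print("/".join(path) or "(root)", before, after, delta)
```

Walking the tree from the root to the leaves mirrors the expand/collapse navigation: a change at a high level can be traced down to the single queries responsible for it.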
C.I. Tools Integrations – Using RRE within a Continuous Integration pipeline will now be possible through Jenkins, Atlassian Bamboo and JetBrains TeamCity.
The integration will let you run the Search Quality Evaluation as part of the pipeline and then have a look at the equivalent of the overview panel, right in your favourite C.I. tool. Of course, it will also be possible to explore the evaluation by accessing the RRE-Enterprise dashboard.
This is just the beginning: the first version of RRE-Enterprise will cover all the enterprise-level requirements we have collected so far, but much more will come as soon as more users get engaged.
We have already identified a few areas for version 2; here is just an appetiser for your refined taste:
Judgement Collector Browser Plugin – Wouldn’t it be nice to turn your Search UI into a judgement collector? A nice automatic overlay that gives your experts a way of tagging relevant documents? We are working on it 🙂
Intelligent Explore – On top of the manual exploration of the evaluation results, this new widget will automatically perform a set of analyses and present the results back to the explorer: the most alarming queries and topics, patterns across low-performing queries and much more.
Advanced Diff – When comparing at the finest grain level, it would be useful to have a quick and clear understanding of the changes that happened in between. For example, how the search engine query has changed in terms of request parameters.
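To give a feeling for what a request parameter diff could look like, here is a minimal Python sketch built on the standard library. It is not a preview of the actual feature: the `diff_params` helper is hypothetical, and it assumes both search engine queries are available as URL query strings.

```python
from urllib.parse import parse_qs


def diff_params(old_query, new_query):
    """Return {param: (old_values, new_values)} for parameters that differ.

    A parameter that is absent on one side is reported with None there.
    """
    old, new = parse_qs(old_query), parse_qs(new_query)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


# Hypothetical Solr requests captured in two evaluation iterations.
before = "q=phone&defType=edismax&qf=title"
after = "q=phone&defType=edismax&qf=title%5E2+description&mm=2"
for name, (old_values, new_values) in diff_params(before, after).items():
    print(name, old_values, "->", new_values)
```

Unchanged parameters are left out, so the output concentrates on what was actually modified between the two iterations.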
Much more will follow!