RRE-Enterprise: How to Run an Evaluation
After configuring all the necessary building blocks (Rating Set, Target Search Engine and Data Collection), it’s now time to discuss running your first evaluation.
Let’s see the runtime parameters to pass:

- Iteration label: the human-readable label associated with this evaluation. It can contain any useful information that will help you recognize this evaluation.
- Rating tag: the Rating Set to use in the evaluation. You can choose from all the rating sets you have saved.
- Collections: the data collection(s) to use in the evaluation. N.B. you can only choose collections available in the Rating Set you have selected.
- Metrics: RRE-Enterprise supports all the metrics available in RRE Open-Source (a worked example follows the list):
- Precision: the fraction of retrieved documents that are relevant
- Recall: the fraction of relevant documents that are retrieved
- Reciprocal Rank: the multiplicative inverse of the rank of the first “correct” answer: 1 for first place, 1/2 for second place, 1/3 for third, and so on.
- Expected Reciprocal Rank (ERR): an extension of Reciprocal Rank with graded relevance; it measures the expected reciprocal length of time that the user will take to find a relevant document.
- Average Precision: the area under the precision-recall curve.
- NDCG: an evaluation metric that takes into account the graded relevance of a search result and the impact of its position.
- F-Measure: measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as precision. RRE provides the three most popular F-Measure instances: F0.5, F1 and F2; additionally, you may specify your own β value if required (see below).
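To make the definitions above concrete, here is a minimal sketch of how the binary-relevance metrics behave on a single query. The judged documents, the ranked list and the β value are invented for the example; this is not RRE’s code, just the standard formulas applied by hand.

```python
# Toy illustration of the binary-relevance metrics listed above.
# The judged documents and the ranked results are made up for the example.
relevant = {"doc1", "doc4", "doc7"}                  # documents judged relevant
ranking = ["doc3", "doc1", "doc9", "doc4", "doc8"]   # top results returned by the engine

retrieved_relevant = [d for d in ranking if d in relevant]

precision = len(retrieved_relevant) / len(ranking)   # 2/5 = 0.4
recall = len(retrieved_relevant) / len(relevant)     # 2/3 ≈ 0.67

# Reciprocal Rank: inverse rank of the first relevant result (doc1, rank 2 -> 0.5)
reciprocal_rank = next(
    (1 / (i + 1) for i, d in enumerate(ranking) if d in relevant), 0.0
)

# Average Precision: mean of the precision values at each relevant position,
# averaged over the total number of relevant documents
precisions_at_hits = [
    sum(1 for d in ranking[: i + 1] if d in relevant) / (i + 1)
    for i, d in enumerate(ranking)
    if d in relevant
]
average_precision = sum(precisions_at_hits) / len(relevant)

# F-Measure: weights recall beta times as much as precision
def f_measure(p, r, beta=1.0):
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p + r > 0 else 0.0

print(precision, recall, reciprocal_rank, average_precision,
      f_measure(precision, recall, beta=2.0))
```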
Some of the metrics supported offer additional parameters (a short sketch of how the graded parameters interact follows this list):
- F-Measure
  - k – the metric is calculated on the top k results from the search response.
  - beta – the balance factor between precision and recall.
- NDCG@K
  - k – the metric is calculated on the top k results from the search response.
  - maximumGrade – the maximum relevance grade available when judging documents (optional, default: 4.0).
  - missingGrade – the grade that should be assigned to documents where no judgement has been given. This is optional – the default value is either maximumGrade / 2 (if maximumGrade has been supplied), or 2.0.
  - name – the name used to record this metric in the output (optional, defaults to NDCG@k, where k is set as above). This allows the metric to be run multiple times with different missing grade values, for example.
- ERR@K – Expected Reciprocal Rank
  - k – the metric is calculated on the top k results from the search response.
  - maximumGrade – the maximum relevance grade available when judging documents (optional, default: 3.0).
  - missingGrade – the grade that should be assigned to documents where no judgement has been given. This is optional – the default value is either maximumGrade / 2 (if maximumGrade has been supplied), or 2.0.
  - name – the name used to record this metric in the output (optional, defaults to ERR@k, where k is set as above). This allows the metric to be run multiple times with different missing grade values, for example.
- RR@K – Reciprocal Rank
  - k – the metric is calculated on the top k results from the search response.
  - maximumGrade – the maximum relevance grade available when judging documents (optional, default: 3.0).
  - missingGrade – the grade that should be assigned to documents where no judgement has been given. This is optional – the default value is either maximumGrade / 2 (if maximumGrade has been supplied), or 2.0.
  - name – the name used to record this metric in the output (optional, defaults to RR@k, where k is set as above). This allows the metric to be run multiple times with different missing grade values, for example.

Both maximumGrade and missingGrade may be floating-point values.
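The interaction between maximumGrade and missingGrade is easiest to see in code. The sketch below uses common textbook formulations of NDCG and ERR (not necessarily RRE’s exact implementation) and applies the default rule described above: missingGrade = maximumGrade / 2 when maximumGrade is supplied, otherwise 2.0. The judgements, document ids and grades are invented for the example.

```python
# Sketch of how graded metrics can use maximumGrade / missingGrade.
# Formulas are the common textbook NDCG / ERR, not necessarily RRE's exact code.
import math

def grade_of(doc, judgements, maximum_grade=None, missing_grade=None):
    """Judged grade of a document, falling back to missingGrade for unjudged ones.
    Default missingGrade mirrors the rule above: maximumGrade / 2 if given, else 2.0."""
    if missing_grade is None:
        missing_grade = maximum_grade / 2 if maximum_grade is not None else 2.0
    return judgements.get(doc, missing_grade)

def ndcg_at_k(ranking, judgements, k, maximum_grade=4.0, missing_grade=None):
    grades = [grade_of(d, judgements, maximum_grade, missing_grade) for d in ranking[:k]]
    dcg = sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))
    # Simplification: the ideal ordering is taken from the retrieved grades only.
    ideal = sorted(grades, reverse=True)
    idcg = sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def err_at_k(ranking, judgements, k, maximum_grade=3.0, missing_grade=None):
    err, not_yet_satisfied = 0.0, 1.0
    for rank, d in enumerate(ranking[:k], start=1):
        g = grade_of(d, judgements, maximum_grade, missing_grade)
        stop_probability = (2 ** g - 1) / 2 ** maximum_grade  # chance the user stops here
        err += not_yet_satisfied * stop_probability / rank
        not_yet_satisfied *= 1 - stop_probability
    return err

judgements = {"doc1": 3.0, "doc4": 1.0}              # graded judgements (0..3 scale)
ranking = ["doc3", "doc1", "doc9", "doc4", "doc8"]   # doc3 / doc9 / doc8 are unjudged

print(ndcg_at_k(ranking, judgements, k=5, maximum_grade=3.0))  # unjudged docs get grade 1.5
print(err_at_k(ranking, judgements, k=5, maximum_grade=3.0))
```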
Once you have selected the metrics of interest, you can specify the target search engine approach: Embedded or External.
Embedded Target Search Engine

- Search Engine Type: the type of search engine to evaluate (Elasticsearch or Apache Solr).
- Search Engine Version: the version of the search engine to evaluate.
- Local configuration: the search engine configuration, provided locally.
- Git configuration: the search engine configuration, fetched from a Git repository, branch, and commit point. The path is used to locate the configuration in the Git repository (a rough equivalent is sketched below).
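For illustration only, resolving a Git configuration boils down to checking the repository out at the requested branch and commit and reading the configuration at the given path. RRE-Enterprise performs this step for you; the repository URL, branch, commit and path below are hypothetical.

```python
# Hypothetical illustration of resolving a "Git configuration": clone the
# repository, check out the requested commit, and read the configuration
# located at the given path. All values below are made up.
import subprocess
import tempfile
from pathlib import Path

repo_url = "https://example.com/acme/search-configs.git"  # hypothetical repository
branch = "main"
commit = "3f2a1bc"                                         # hypothetical commit point
config_path = "solr/conf"                                  # path inside the repository

workdir = tempfile.mkdtemp()
subprocess.run(["git", "clone", "--branch", branch, repo_url, workdir], check=True)
subprocess.run(["git", "checkout", commit], cwd=workdir, check=True)

configuration_dir = Path(workdir) / config_path
print(sorted(p.name for p in configuration_dir.iterdir()))  # files used by the embedded engine
```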
External Target Search Engine

- Black Box API: the search system is queried as a black box through its API (see the snippet below).
- Search Engine: Elasticsearch or Apache Solr.
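To give an idea of what an evaluation does against an external engine, the snippet below queries a Solr endpoint and collects the ranked document ids that the metrics would then be computed on. The host, collection, query and field names are purely illustrative; this is not RRE-Enterprise’s actual request format.

```python
# Illustrative only: query an external (black box) search endpoint and collect
# the ranked ids that the evaluation metrics are computed on.
# Host, collection, query and field names are invented.
import requests

solr_url = "http://localhost:8983/solr/products/select"   # hypothetical Solr collection
params = {"q": "title:laptop", "fl": "id", "rows": 10, "wt": "json"}

response = requests.get(solr_url, params=params, timeout=10)
response.raise_for_status()

ranking = [doc["id"] for doc in response.json()["response"]["docs"]]
print(ranking)   # ranked ids compared against the rated judgements
```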
Once everything is filled in, you can start the evaluation by pressing the ‘EVALUATE’ button.
You can keep track of the evaluation execution by checking the job list.


In the next blog post we’ll see how to display and navigate the results of an evaluation!