After configuring all the necessary building blocks (Rating Set, Target Search Engine and Data Collection), it’s time to run your first evaluation.
Let’s see the runtime parameters to pass:
Iteration label: the human-readable label associated with this evaluation. It can contain any useful information that will help you recognize this evaluation.
Rating tag: the Rating Set to use in the evaluation. You can choose from all the rating sets you have saved.
Collections: the data collection(s) to use in the evaluation. N.B. you can only choose collections available in the rating set you have selected.
Metrics:
RRE-Enterprise supports all the metrics available in RRE Open-Source:
- Precision: the fraction of retrieved documents that are relevant
- Recall: the fraction of relevant documents that are retrieved
- Reciprocal Rank: the multiplicative inverse of the rank of the first “correct” answer: 1 for first place, 1/2 for second place, 1/3 for third, and so on.
- Expected Reciprocal Rank (ERR): an extension of Reciprocal Rank with graded relevance; it measures the expected reciprocal length of time that the user will take to find a relevant document.
- Average Precision: the area under the precision-recall curve.
- NDCG (Normalised Discounted Cumulative Gain): an evaluation metric that takes into account both the graded relevance of each result and the impact of its position in the ranking.
- F-Measure: it measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as precision. RRE provides the three most popular F-Measure instances: F0.5, F1 and F2; additionally, you may specify your own β value if required (see below).
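To make these definitions concrete, here is a minimal sketch in Python (purely illustrative, not RRE’s implementation) showing how the binary-relevance metrics above could be computed for a single query; the document identifiers and judgements are invented for the example:

```python
# Illustrative only: how the listed metrics are typically computed for a
# single query, given a ranked result list and a set of judged relevant docs.

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len([d for d in retrieved if d in relevant]) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant documents that are retrieved."""
    if not relevant:
        return 0.0
    return len([d for d in retrieved if d in relevant]) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result (0 if none is found)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(retrieved, relevant):
    """Mean of precision@r over the ranks r that hold a relevant document."""
    hits, precisions = 0, []
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall (F0.5, F1, F2, ...)."""
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# Toy example: 5 results returned, documents "b" and "d" judged relevant.
retrieved = ["a", "b", "c", "d", "e"]
relevant = {"b", "d"}
p, r = precision(retrieved, relevant), recall(retrieved, relevant)
print(p, r)                                    # 0.4 1.0
print(reciprocal_rank(retrieved, relevant))    # 0.5 (first hit at rank 2)
print(average_precision(retrieved, relevant))  # (1/2 + 2/4) / 2 = 0.5
print(f_measure(p, r, beta=2.0))               # recall-weighted F2
```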
Some of the supported metrics offer additional parameters:
- F-Measure
  - k – the metric is calculated on the top K results from the search response
  - beta – the balance factor between precision and recall
- NDCG@K
  - k – the metric is calculated on the top K results from the search response
  - maximumGrade – the maximum relevance grade available when judging documents (optional, default: 4.0)
  - missingGrade – the grade that should be assigned to documents where no judgement has been given. This is optional – the default value is either maximumGrade / 2 (if maximumGrade has been supplied) or 2.0.
  - name – the name used to record this metric in the output (optional, defaults to NDCG@k, where k is set as above). This allows the metric to be run multiple times with different missing grade values, for example.
- ERR@K – Expected Reciprocal Rank
  - k – the metric is calculated on the top K results from the search response
  - maximumGrade – the maximum relevance grade available when judging documents (optional, default: 3.0)
  - missingGrade – the grade that should be assigned to documents where no judgement has been given. This is optional – the default value is either maximumGrade / 2 (if maximumGrade has been supplied) or 2.0.
  - name – the name used to record this metric in the output (optional, defaults to ERR@k, where k is set as above). This allows the metric to be run multiple times with different missing grade values, for example.
- RR@K – Reciprocal Rank
  - k – the metric is calculated on the top K results from the search response
  - maximumGrade – the maximum relevance grade available when judging documents (optional, default: 3.0)
  - missingGrade – the grade that should be assigned to documents where no judgement has been given. This is optional – the default value is either maximumGrade / 2 (if maximumGrade has been supplied) or 2.0.
  - name – the name used to record this metric in the output (optional, defaults to RR@k, where k is set as above). This allows the metric to be run multiple times with different missing grade values, for example.
Both maximumGrade and missingGrade may be floating-point values.
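To see how these parameters interact, here is a small illustrative sketch (again in Python, and again not RRE’s actual code) of a graded metric that applies the default rule described above for missing judgements; the helper names, grades and document identifiers are assumptions made for the example. The same resolution rule applies to the ERR@K and RR@K parameters.

```python
import math

def resolve_missing_grade(maximum_grade=None, missing_grade=None):
    """Default rule described above: missingGrade falls back to
    maximumGrade / 2 when maximumGrade is supplied, otherwise to 2.0."""
    if missing_grade is not None:
        return missing_grade
    if maximum_grade is not None:
        return maximum_grade / 2.0
    return 2.0

def ndcg_at_k(retrieved, judgements, k, maximum_grade=4.0, missing_grade=None):
    """NDCG@k (linear-gain form) where unjudged documents get the missing grade."""
    missing = resolve_missing_grade(maximum_grade, missing_grade)
    grades = [judgements.get(doc, missing) for doc in retrieved[:k]]
    dcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(grades, start=1))
    # Normalise against the best possible ordering of these same grades
    # (a simplification that keeps the sketch self-contained).
    ideal = sorted(grades, reverse=True)
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: "c" has no judgement, so it receives the missing grade
# (maximumGrade / 2 = 2.0 here) instead of being treated as irrelevant.
judgements = {"a": 4.0, "b": 0.0, "d": 3.0}
print(ndcg_at_k(["a", "b", "c", "d"], judgements, k=4, maximum_grade=4.0))
```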
Once you have selected the metrics of interest, you can specify the target search engine approach: Embedded or External.
Embedded Target Search Engine

Search Engine Type:
Search Engine Version:
Local configuration:
Git configuration: the configuration is fetched from a Git repository, branch, and commit point. The path is used to locate the configuration within the Git repository.
External Target Search Engine

Black Box API:
Search Engine: Elasticsearch or Apache Solr
Once everything is filled in, you can start the evaluation by pressing the ‘EVALUATE’ button.
You can keep track of the evaluation execution by checking the job list.
In the next blog post we’ll see how to display and navigate the results of an evaluation!
Need Help With This Topic?
If you’re struggling to run an evaluation, don’t worry – we’re here to help! Our team offers expert services and training to help you optimize your search engine and get the most out of your system. Contact us today to learn more!