RRE-Enterprise: How to Run an Evaluation

After configuring all the necessary building blocks (Rating Set, Target Search Engine and Data Collection), it’s now time to discuss running your first evaluation.

You open the Evaluation interface by clicking the ‘play’ button at the right-hand end of the top bar.

Let’s see the runtime parameters to pass:

  • Iteration label: the human-readable label associated with this evaluation; it can contain any useful information that will help you recognize this evaluation later.

  • Rating tag: the Rating Set to use in the evaluation. You can choose from all the rating sets you have saved.

  • Collections: the data collection(s) to use in the evaluation. N.B. you can only choose collections available in the rating set you have selected.

  • Metrics:

RRE-Enterprise supports all the metrics available in RRE Open-Source (a quick sketch of a few of them follows the list):

  • Precision: the fraction of retrieved documents that are relevant
  • Recall: the fraction of relevant documents that are retrieved
  • Reciprocal Rank: the multiplicative inverse of the rank of the first “correct” answer: 1 for first place, 1/2 for second, 1/3 for third, and so on.
  • Expected Reciprocal Rank (ERR): an extension of Reciprocal Rank with graded relevance; it measures the expected reciprocal length of time that the user will take to find a relevant document.
  • Average Precision: the area under the precision-recall curve.
  • NDCG (Normalized Discounted Cumulative Gain): an evaluation metric that takes into account both the graded relevance of a search result and the impact of its position.
  • F-Measure: it measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as to precision. RRE provides the three most popular F-Measure instances: F0.5, F1 and F2; additionally, you may specify your own β value if required (see below).
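
To make these definitions a bit more concrete, here is a minimal Python sketch of a few of the binary-relevance metrics above, computed on a made-up ranked list and judgement set. It is just an illustration of the standard formulas, not RRE’s implementation:

```python
def precision(ranked_ids, relevant_ids):
    """Fraction of retrieved documents that are relevant."""
    if not ranked_ids:
        return 0.0
    hits = sum(1 for doc in ranked_ids if doc in relevant_ids)
    return hits / len(ranked_ids)

def recall(ranked_ids, relevant_ids):
    """Fraction of relevant documents that are retrieved."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc in ranked_ids if doc in relevant_ids)
    return hits / len(relevant_ids)

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result (0 if none is found)."""
    for position, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return 1.0 / position
    return 0.0

def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@i over the positions i that hold a relevant document."""
    precisions, hits = [], 0
    for position, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

# Toy example: 5 retrieved documents, 3 documents judged relevant overall.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d7", "d1", "d5"}

print(precision(ranked, relevant))          # 2/5 = 0.4
print(recall(ranked, relevant))             # 2/3 ≈ 0.667
print(reciprocal_rank(ranked, relevant))    # first hit at rank 2 -> 0.5
print(average_precision(ranked, relevant))  # (1/2 + 2/3) / 3 ≈ 0.389
```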

Some of the supported metrics offer additional parameters (a couple of them are sketched below):

  • F-Measure
    • k – the metric is calculated on the top K results from the search response
    • beta – the balance factor between precision and recall.
  • NDCG@K
    • k – the metric is calculated on the top K results from the search response
    • maximumGrade – the maximum relevance grade available when judging documents (optional, default: 4.0).
    • missingGrade – the grade that should be assigned to documents where no judgement has been given. This is optional – the default value is either maximumGrade / 2 (if maximumGrade has been supplied), or 2.0.
    • name – the name used to record this metric in the output (optional, defaults to NDCG@k, where k is set as above). This allows the metric to be run multiple times with different missing grade values, for example.
  • ERR@K – Expected Reciprocal Rank
    • k – the metric is calculated on the top K results from the search response
    • maximumGrade – the maximum relevance grade available when judging documents (optional, default: 3.0).
    • missingGrade – the grade that should be assigned to documents where no judgement has been given. This is optional – the default value is either maximumGrade / 2 (if maximumGrade has been supplied), or 2.0.
    • name – the name used to record this metric in the output (optional, defaults to ERR@k, where k is set as above). This allows the metric to be run multiple times with different missing grade values, for example.
  • RR@K – Reciprocal Rank
    • k – the metric is calculated on the top K results from the search response
    • maximumGrade – the maximum relevance grade available when judging documents (optional, default: 3.0).
    • missingGrade – the grade that should be assigned to documents where no judgement has been given. This is optional – the default value is either maximumGrade / 2 (if maximumGrade has been supplied), or 2.0.
    • name – the name used to record this metric in the output (optional, defaults to RR@k, where k is set as above). This allows the metric to be run multiple times with different missing grade values, for example.
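
To see how k and beta interact for the F-Measure, here is a small sketch in the same spirit as the previous one (binary relevance, textbook F-beta formula, not RRE’s code): the result list is truncated at k, and a beta greater than 1 weights recall more than precision.

```python
def f_measure(ranked_ids, relevant_ids, k=10, beta=1.0):
    """F-beta on the top-k results: beta > 1 favours recall, beta < 1 favours precision."""
    top_k = ranked_ids[:k]                      # 'k' truncates the result list
    hits = sum(1 for doc in top_k if doc in relevant_ids)
    p = hits / len(top_k) if top_k else 0.0
    r = hits / len(relevant_ids) if relevant_ids else 0.0
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta                            # 'beta' balances precision vs recall
    return (1 + b2) * p * r / (b2 * p + r)

ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d7", "d1", "d5"}

print(f_measure(ranked, relevant, k=5, beta=1.0))  # F1   ≈ 0.500
print(f_measure(ranked, relevant, k=5, beta=2.0))  # F2   ≈ 0.588 (recall weighs more)
print(f_measure(ranked, relevant, k=5, beta=0.5))  # F0.5 ≈ 0.435 (precision weighs more)
```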

Both maximumGrade and missingGrade may be floating-point values.
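
maximumGrade and missingGrade only come into play for the graded metrics, so here is a sketch of NDCG@k and ERR@k showing where those parameters enter the computation. It assumes the common exponential gain 2^grade − 1 for DCG and the grade-to-probability mapping from the original ERR paper; for NDCG only missingGrade is used, since (as described above) maximumGrade mainly drives the default missing grade. The exact formulas used by RRE may differ:

```python
import math

def grade_of(doc_id, judgements, missing_grade):
    """Look up a document's relevance grade; unjudged documents get missing_grade."""
    return judgements.get(doc_id, missing_grade)

def ndcg_at_k(ranked_ids, judgements, k=10, missing_grade=2.0):
    """DCG of the top-k results, normalised by the DCG of an ideal ordering."""
    def dcg(grades):
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))
    actual = [grade_of(d, judgements, missing_grade) for d in ranked_ids[:k]]
    ideal = sorted(judgements.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0

def err_at_k(ranked_ids, judgements, k=10, maximum_grade=3.0, missing_grade=1.5):
    """Expected Reciprocal Rank: maximum_grade caps the satisfaction probability."""
    err, p_not_satisfied_yet = 0.0, 1.0
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        g = grade_of(doc, judgements, missing_grade)
        p_satisfied = (2 ** g - 1) / (2 ** maximum_grade)  # maps grade -> probability
        err += p_not_satisfied_yet * p_satisfied / rank
        p_not_satisfied_yet *= (1 - p_satisfied)
    return err

# Toy judgements on a 0..3 scale; "d3" and "d9" have no judgement at all.
judgements = {"d7": 3.0, "d1": 2.0, "d4": 0.0, "d5": 3.0}
ranked = ["d3", "d7", "d1", "d9", "d4"]

print(ndcg_at_k(ranked, judgements, k=5, missing_grade=0.0))
print(err_at_k(ranked, judgements, k=5, maximum_grade=3.0, missing_grade=0.0))
```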

Once you have selected the metrics of interest, you can specify the target search engine approach: Embedded or External.

Embedded Target Search Engine

  • Search Engine Type: the search engine to run embedded (Apache Solr or Elasticsearch)

  • Search Engine Version: the version of the chosen search engine

  • Local configuration: the search engine configuration, provided from a local source

  • Git configuration: the search engine configuration, fetched from a Git repository, branch, and commit point. The path is used to locate the configuration in the Git repository.

External Target Search Engine

  • Black Box API: the API endpoint of the externally running search engine, which is queried as a black box

  • Search Engine: Elasticsearch or Apache Solr
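
To summarise the difference between the two approaches, here is a purely hypothetical sketch of the information each one collects. The field names and values below are invented for illustration only and do not correspond to any actual RRE-Enterprise API or file format; the real values are entered through the form fields described above.

```python
# Purely illustrative: these dictionaries only mirror the form fields described
# above; they are NOT an actual RRE-Enterprise API or configuration format.

embedded_target = {
    "approach": "embedded",
    "searchEngineType": "Apache Solr",        # or "Elasticsearch"
    "searchEngineVersion": "8.11.1",          # made-up version number
    "configuration": {                        # either local or fetched from Git
        "source": "git",
        "repository": "https://example.com/search-config.git",  # hypothetical repo
        "branch": "main",
        "commit": "HEAD",
        "path": "solr/config",                # locates the configuration in the repo
    },
}

external_target = {
    "approach": "external",
    "searchEngine": "Elasticsearch",          # or "Apache Solr"
    "blackBoxApi": "https://search.example.com/api",  # hypothetical endpoint
}
```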

Once everything is filled in, you can start the evaluation by pressing the ‘EVALUATE’ button.

You can keep track of the evaluation’s execution by checking the job list.

Clicking on each evaluation, you can see its details:

These details also include failures and the reasons why an execution failed (wrong settings, unreachable hosts, configuration errors…).

In the next blog post we will see how to display and navigate the results of an evaluation!


Author

Alessandro Benedetti

Alessandro Benedetti is the founder of Sease Ltd. A Senior Search Software Engineer, his focus is on R&D in information retrieval, information extraction, natural language processing, and machine learning.
