
In our previous tutorial, we saw how to define the golden truth, a fundamental milestone for Search Quality Evaluation.
This post focuses on setting up the target of the evaluation: your search engine, built on top of Elasticsearch or Solr.
RRE-Enterprise supports two modes:

  • Embedded: RRE-Enterprise uses Docker to spin up an Elasticsearch/Solr instance with the configurations and test data you provide. The instance stays up only for the time needed to run the evaluation and is then switched off.
    N.B. you need Docker installed on your machine.
  • External: RRE-Enterprise points to an existing Elasticsearch/Solr instance.
    N.B. the instance must be reachable over the network from the RRE-Enterprise server.
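
Before pointing RRE-Enterprise at an external instance, it can help to check that the instance's port is reachable from the machine that will run RRE-Enterprise. The following is a minimal Python sketch of such a pre-flight check; it is not part of RRE-Enterprise, and the function name is hypothetical. It only confirms that a TCP connection can be opened, not that the service behind the port is healthy.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each address it gets back
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures (all OSError subclasses)
        return False
```

For example, `is_reachable("es.internal.example", 9200)` (a hypothetical host; 9200 and 8983 are the default Elasticsearch and Solr ports) returns False if a firewall or a wrong host name stands between the two machines.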

EMBEDDED Configuration Upload

When using the Embedded approach, you need to provide the search engine configuration.
This configuration depends on your search engine implementation (Elasticsearch or Solr), e.g.:
– solrconfig.xml, schema.xml, synonyms.txt, etc. for Solr
– mapping parameters and settings for Elasticsearch

RRE-Enterprise supports uploading this configuration directly: you just need to provide a zipped folder containing all the configuration files.
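
If you prepare the archive from a script (for example in a CI job), Python's standard library is enough. This is a minimal sketch; the folder names are hypothetical, and it assumes the configuration files should sit at the root of the archive, which may or may not match what RRE-Enterprise expects.

```python
import shutil
from pathlib import Path


def zip_config_folder(config_dir: str, archive_base: str) -> str:
    """Zip every file under config_dir and return the path of the created .zip archive."""
    src = Path(config_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"{config_dir} is not a directory")
    # root_dir stores paths relative to the config folder, e.g. "solrconfig.xml".
    # Keep archive_base outside config_dir, otherwise the zip would try to include itself.
    return shutil.make_archive(archive_base, "zip", root_dir=str(src))
```

To keep the folder itself as the top-level entry instead, pass `root_dir=str(src.parent)` and `base_dir=src.name` to `shutil.make_archive`.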


EMBEDDED Configuration from Git

It’s good practice to keep your search engine configuration under source control with Git.
This way you can tune your configuration incrementally while keeping track of every version.
RRE-Enterprise has you covered in this scenario: you can fetch the configuration directly from your Git repository, at a specific branch and a specific commit.

Once you have given the repository a unique identifier, provided its URL, and set up the public key RRE-Enterprise uses to access the repository programmatically, you are good to go: at evaluation time, RRE-Enterprise can fetch a specific version of your configuration.
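
If you want to reproduce locally the exact configuration version an evaluation used, the equivalent manual steps are a clone of the branch followed by a checkout of the commit. A minimal Python sketch, assuming `git` (and `ssh`, for SSH URLs) is installed; the function names and key path are hypothetical, and this is not how RRE-Enterprise fetches the repository internally, which this post doesn't describe.

```python
import os
import subprocess
from typing import List, Optional


def pinned_checkout_commands(repo_url: str, branch: str, commit: str, dest: str) -> List[List[str]]:
    """Git commands that clone one branch and pin the working tree to one commit."""
    return [
        # Clone only the branch of interest
        ["git", "clone", "--branch", branch, "--single-branch", repo_url, dest],
        # Check out the exact commit (detached HEAD); it must be reachable from `branch`
        ["git", "-C", dest, "checkout", commit],
    ]


def fetch_pinned_config(repo_url: str, branch: str, commit: str, dest: str,
                        ssh_key: Optional[str] = None) -> None:
    """Run the commands above, optionally authenticating over SSH with a given private key."""
    env = dict(os.environ)
    if ssh_key:
        # GIT_SSH_COMMAND tells git which ssh invocation to use for SSH remotes
        env["GIT_SSH_COMMAND"] = f"ssh -i {ssh_key} -o IdentitiesOnly=yes"
    for cmd in pinned_checkout_commands(repo_url, branch, commit, dest):
        subprocess.run(cmd, check=True, env=env)
```

Pinning to a commit rather than just a branch matters because a branch moves over time, while a commit always identifies the same configuration.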


EMBEDDED Query Templates

When using the Embedded approach, there is no black box search API in the middle (the assumption is that your front end talks directly to the search engine, Elasticsearch or Solr).
RRE-Enterprise spins up the search engine instance itself, so at evaluation time it runs the queries from the rating set directly against that instance.
To keep the JSON rating file simple, you can define a set of Query Templates that contain the query structure (an Elasticsearch DSL query or a Solr query).
The templates are referenced in the ratings, and RRE-Enterprise uses them at evaluation time to build the queries to run against the target search engine.
N.B. Query Templates are associated with a specific search engine configuration version, which you can specify when uploading them.


On the left: the JSON rating file; on the right: examples of query templates.

{
  "query": {
    "multi_match": {
      "query": "$query",
      "fields": ["title","overview","cast.name","directors.name"]
    }
  }
}

multi_match_with_titles_and_starring.json
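
Conceptually, building a query from a template is placeholder substitution. The following Python sketch shows one way to do it for the template above; it is an illustration, not RRE-Enterprise's implementation. The `template`/`placeholders` shape of the rated query is an assumption modeled on the open-source RRE ratings format, and `render_query` is a hypothetical helper.

```python
import json
from string import Template

# The multi_match template above, as text (e.g. read from multi_match_with_titles_and_starring.json)
MULTI_MATCH_TEMPLATE = """{
  "query": {
    "multi_match": {
      "query": "$query",
      "fields": ["title","overview","cast.name","directors.name"]
    }
  }
}"""


def render_query(template_text: str, placeholders: dict) -> dict:
    """Fill "$name" placeholders in a JSON query template and return the parsed query."""
    values = {}
    for key, value in placeholders.items():
        # Keys are assumed to be written as "$query"; string.Template wants bare names.
        # json.dumps escapes quotes/backslashes; [1:-1] drops the outer quotes,
        # because the template already wraps the placeholder in a JSON string.
        values[key.lstrip("$")] = json.dumps(value)[1:-1]
    # substitute() raises KeyError if the template uses a placeholder the rating doesn't supply
    return json.loads(Template(template_text).substitute(values))


# Hypothetical rated query referencing the template by file name
rated_query = {
    "template": "multi_match_with_titles_and_starring.json",
    "placeholders": {"$query": "star wars"},
}

es_query = render_query(MULTI_MATCH_TEMPLATE, rated_query["placeholders"])
print(json.dumps(es_query))
```

Escaping the value before substitution matters: a naive text replacement would produce invalid JSON as soon as a user query contains a double quote.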


External

Most of the time your front end doesn’t talk directly to Elasticsearch or Solr: there’s a search API layer in the middle.
This search API can build any kind of Elasticsearch or Solr query and may contain very complex business logic.
We call it a Black Box Search API because RRE-Enterprise doesn’t need to know the implementation details of this middle layer or the programming language it is written in: it just needs to know the REST endpoint to call to run the evaluation.
N.B. RRE-Enterprise supports REST Black Box Search APIs only.

When the target of your evaluation is an external search engine, you need to configure all the information RRE-Enterprise needs to communicate with that search engine instance.
With this approach, you can point directly to your custom Black Box Search API!

RRE-Enterprise can evaluate your search API directly (the API, in turn, talks to Elasticsearch or Solr internally).
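
To make the black box idea concrete: whatever happens inside the API, an evaluator only needs to send the rated query, read back the ranked document identifiers, and compare them with the ratings. The sketch below illustrates this with Precision@k, one common metric; the response shape and function names are hypothetical, and this is not RRE-Enterprise's internal code.

```python
from typing import Dict, List


def extract_doc_ids(response: dict, id_field: str = "id") -> List[str]:
    """Return document ids in ranked order from a (hypothetical) search API JSON response."""
    # Assumed response shape: {"results": [{"id": ...}, ...]}, best match first
    return [str(hit[id_field]) for hit in response.get("results", [])]


def precision_at_k(ranked_ids: List[str], ratings: Dict[str, int], k: int) -> float:
    """Fraction of the top-k results rated as relevant (rating > 0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Unrated documents count as not relevant; dividing by k (not by the number
    # of returned hits) penalizes an API that returns fewer than k results
    relevant = sum(1 for doc_id in ranked_ids[:k] if ratings.get(doc_id, 0) > 0)
    return relevant / k


# The black box only has to return ranked ids; how it queries Elasticsearch/Solr is irrelevant
api_response = {"results": [{"id": "d1"}, {"id": "d7"}, {"id": 3}]}
ratings = {"d1": 3, "3": 1}  # doc id -> graded relevance for this query
print(precision_at_k(extract_doc_ids(api_response), ratings, k=3))
```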

Let’s see how you can configure this:

First, define the endpoint of the Black Box Search API.

N.B. for each <query, document, rating> triplet, your ratings need to specify the full HTTP request to send to the Black Box Search API for that query.
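
As an illustration of what a full HTTP request for a rated query can look like, here is a minimal Python sketch that turns query parameters into a GET URL. The endpoint and parameter names (`q`, `lang`) are hypothetical; your Black Box Search API defines its own, and a POST-based API would need a request body instead.

```python
from urllib.parse import urlencode


def build_search_request(endpoint: str, params: dict) -> str:
    """Return the full GET URL for one rated query against a (hypothetical) search API."""
    # Assumes the endpoint has no query string yet; urlencode percent-encodes
    # values (e.g. "&" becomes %26) and turns spaces into "+"
    return f"{endpoint}?{urlencode(params)}"


url = build_search_request("https://search.example.com/api/v1/search",
                           {"q": "star wars", "lang": "en"})
print(url)
```

Encoding the parameters rather than concatenating raw strings keeps queries such as "tom & jerry" from being split into bogus parameters.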

Then, configure the endpoints used to access the search engine instance itself (Elasticsearch or Solr).

In this example, we are configuring an Elasticsearch instance.
You may notice an unusual setting: ‘Log Extractor port’.
It refers to a simple script you need to install in the environment hosting the Elasticsearch instance.
The script is available for free; let us know if you need help installing it!

And that’s it! You can now set up your target search engine and move on to the next episode:

RRE-Enterprise: How to Manage Your Data Collections [coming soon…]


Author

Alessandro Benedetti

Alessandro Benedetti is the founder of Sease Ltd. A Senior Search Software Engineer, he focuses on R&D in information retrieval, information extraction, natural language processing, and machine learning.
