Query Discovery

The role of the "Intruder" layers

One of the things that makes the open source version of RRE fast and immediate to use is its direct communication with the target search engine.

That means a search engineer can use RRE from within the IDE to bind and test a set of ratings directly against a Solr or an Elasticsearch instance.

While this is undoubtedly pragmatic for concretely improving the search quality of the system under test, it comes with a significant compromise: the queries defined in the system, which are executed to compute the search quality metrics, are not “user queries”. They need to be declared using the native search engine language. Here’s an example of an Elasticsearch query:

{
  "query": {
    "match": {
      "name": {
        "query": "$query",
        "minimum_should_match": "3<-75% 9<-85%"
      }
    }
  }
}
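To make the role of the "$query" placeholder concrete, here is a minimal Python sketch of how such a template could be filled with a concrete query string. It is only an illustration: the `fill` and `render` helpers are hypothetical names, and this is not necessarily how RRE resolves placeholders internally.

```python
import json

# Template mirroring the query above; "$query" is the placeholder to replace.
TEMPLATE = """
{
  "query": {
    "match": {
      "name": {
        "query": "$query",
        "minimum_should_match": "3<-75% 9<-85%"
      }
    }
  }
}
"""


def fill(node, values):
    """Recursively replace string values of the form "$name" with values[name]."""
    if isinstance(node, dict):
        return {key: fill(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [fill(child, values) for child in node]
    if isinstance(node, str) and node.startswith("$") and node[1:] in values:
        return values[node[1:]]
    return node


def render(template_text, **values):
    """Parse the JSON template first, then substitute, so quoting stays valid."""
    return fill(json.loads(template_text), values)


body = render(TEMPLATE, query='fender "jazz bass"')
print(json.dumps(body, indent=2))
```

Substituting on the parsed structure rather than on the raw text means that quotes inside the query string cannot break the resulting JSON.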

Again, that establishes a powerful and direct connection with the target search engine but, in our experience, it doesn’t reflect the widely used and proven three-tier architecture that distributes system responsibilities among:

  • a client application: typically a frontend layer (e.g., AngularJS or ReactJS)
  • an API layer: an intermediate component (actually a set of components in the case of a microservices architecture) that is in charge of hiding, abstracting, and implementing the system logic by coordinating and orchestrating the internal subsystems (e.g., an RDBMS, a search engine, a NoSQL storage)
  • a “datasource” layer, which consists of one or multiple storage subsystems. Each of them manages data, even the same data in some cases, to serve different purposes.

Martin Fowler, in his famous book “Patterns of Enterprise Application Architecture”, describes that layered architecture as composed of the Presentation, the Domain, and the Data Source layers.

Back to our search quality context: in a typical architecture, an “intruder” (the API layer) mediates between the user query and the corresponding search engine query.

The API layer implements the system logic. That means that, starting from a request (let’s simplify and call it a Search API request), there’s a business workflow that triggers several actions involving several components.

For the search engine, that means the API layer builds and executes a search-engine-specific query, for example taking into account ACLs, permission filters, boosting logic and so on.
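As a rough illustration of what that "intruder" logic can look like, the following Python sketch shows a hypothetical API-layer function that turns a user query into an Elasticsearch `bool` query. The function name, the field names (`name`, `title`, `allowed_groups`) and the boost value are assumptions made up for this example; a real API layer would apply its own rules.

```python
def build_engine_query(user_query, user_groups, boost_field="title"):
    """Hypothetical API-layer logic: user query in, search engine query out."""
    return {
        "query": {
            "bool": {
                # What the user actually asked for.
                "must": [{"match": {"name": {"query": user_query}}}],
                # ACL / permission filter: only documents visible to the user's groups.
                "filter": [{"terms": {"allowed_groups": user_groups}}],
                # Boosting logic: matches on another field raise the score.
                "should": [
                    {"match": {boost_field: {"query": user_query, "boost": 2.0}}}
                ],
            }
        }
    }


engine_query = build_engine_query("jazz bass", ["sales", "support"])
print(engine_query)
```

None of the filter or boosting clauses are visible in the user query itself, which is why evaluating only a hand-written engine query can miss what the production system really executes.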

If we want to actually measure the search quality of a system like that, is it correct to discard the role the API layer plays? Definitely not. And unfortunately, that is exactly what the open source version of RRE does.

Ideally, I would like to be able to consider the whole system, including the API layer, as something to measure.  

 

RRE Enterprise: the Query Discovery

RRE Enterprise fills the gap described above by implementing a query discovery mechanism. How does it work? Without going into technical details, the underlying idea is pretty simple:

  • Do not consider the presentation layer in the evaluation process
  • Split the Query entity into two related requests: the Search API request and the Search Engine request
  • Trigger a Search API request towards the Search API Layer
  • Capture the corresponding Search Engine request (on the Search Engine side)
  • Store the correlation between them

In the end, a rating definition will therefore include all the relevant pieces that contributed to a given query execution, including the Search API request and the corresponding Search Engine requests.
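The steps above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not RREE's implementation: `CapturingEngine` stands in for the search engine side, `fake_search_api` stands in for the Search API layer, and `DiscoveredQuery` is a hypothetical name for the stored correlation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiscoveredQuery:
    """Correlation between one Search API request and the engine requests it caused."""
    search_api_request: Dict
    search_engine_requests: List[Dict] = field(default_factory=list)


class CapturingEngine:
    """Stand-in for the search engine side: records every request it receives."""

    def __init__(self):
        self.captured = []

    def search(self, body):
        self.captured.append(body)
        return {"hits": []}


def fake_search_api(request, engine):
    """Hypothetical API layer: translates a Search API request into an engine query."""
    engine.search({"query": {"match": {"name": request["q"]}}})
    return {"results": []}


def discover(api_requests, api, engine):
    """Trigger each API request, capture what reaches the engine, store the pair."""
    discovered = []
    for request in api_requests:
        start = len(engine.captured)           # engine requests seen so far
        api(request, engine)                   # trigger the Search API request
        captured = engine.captured[start:]     # requests caused by this call
        discovered.append(DiscoveredQuery(request, captured))
    return discovered


for item in discover([{"q": "jazz bass"}, {"q": "strat"}], fake_search_api, CapturingEngine()):
    print(item)
```

Note that the presentation layer never appears: the sketch starts at the Search API request, as in the first step of the list.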
 
RREE (RRE Enterprise) implements the query discovery described above for both Apache Solr and Elasticsearch.
 
The only assumption required for a successful discovery is exclusive access to the target search engine during that process.
If some other process is using the search engine at the same time, it would be very hard, in the correlation phase, to distinguish the queries executed as a consequence of the RREE discovery from those issued by other applications.
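A small Python sketch shows why concurrent traffic is a problem. It assumes, purely for illustration, that captured engine queries are matched to a discovery run by a time window; the post does not say how RREE actually performs the correlation, and the timestamps and query strings below are invented.

```python
def correlate(window_start, window_end, engine_log):
    """Return the engine queries whose timestamp falls inside the discovery window."""
    return [query for ts, query in engine_log if window_start <= ts <= window_end]


# Exclusive access: only the discovery process talks to the engine.
exclusive_log = [(10.2, "name:bass")]

# Shared access: another application queries the engine in the same window.
shared_log = [(10.1, "name:drums"), (10.2, "name:bass")]

print(correlate(10.0, 10.5, exclusive_log))  # one candidate, unambiguous
print(correlate(10.0, 10.5, shared_log))     # two candidates, ambiguous
```

With exclusive access the window contains exactly the queries triggered by the discovery; with shared access, unrelated queries end up in the same window and cannot be told apart without extra information.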

Recap

RREE Query Discovery is a crucial component of the evaluation infrastructure: it allows the Search application to be considered as a whole, so that the system under evaluation is as close as possible to the real production environment.

It does so by including in the evaluation process the business and system logic carried out by intermediate API layers, which are a crucial part of the Search application.   

The purpose is to maximize the “trustability” of the evaluation process output. 

Author

Andrea Gazzarini

Andrea Gazzarini is a curious software engineer, mainly focused on the Java language and Search technologies. With more than 15 years of experience in various software engineering areas, his adventure in the search world began in 2010, when he met Apache Solr and later Elasticsearch.
