Hi everyone!
In this blog post, I would like to talk about the current usage of the Feature Vector Cache in Solr.

You can find a brief introduction to Learning To Rank in Solr in this blog post!

(This blog post has been written looking at the code that is likely to be in Apache Solr 9.0 and at the commit id=cfc953b6b90 in the main branch of the Solr git repository)

TL;DR

At the moment the feature vector cache is only used when you enable the feature transformer in the fl parameter (for both insertions and lookups).
It would be interesting to also use the feature vector cache at reranking time, independently of the feature transformer.

We are planning a contribution.

LTR query performance

FEATURE VECTOR CACHE

The benefits of the Feature Vector Cache are not obvious from the Apache Solr documentation.
Let’s see how it works directly from the Solr code.

Insertions and lookups are both done in one class: org.apache.solr.ltr.FeatureLogger

Insertions are done in the org.apache.solr.ltr.FeatureLogger#log method:

public boolean log(int docid, LTRScoringQuery scoringQuery, SolrIndexSearcher searcher,
    LTRScoringQuery.FeatureInfo[] featuresInfo) {
  final String featureVector = makeFeatureVector(featuresInfo);
  if (featureVector == null) {
    return false;
  }
  return searcher.cacheInsert(fvCacheName, fvCacheKey(scoringQuery, docid), featureVector) != null;
}

Lookups are done in the org.apache.solr.ltr.FeatureLogger#getFeatureVector method:

public String getFeatureVector(int docid, LTRScoringQuery scoringQuery, SolrIndexSearcher searcher) {
  return (String) searcher.cacheLookup(fvCacheName, fvCacheKey(scoringQuery, docid));
}

N.B. Both insertions and lookups happen only if you are using the feature transformer (in the fl query parameter).
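
To make the two methods above more concrete, here is a minimal, self-contained sketch of the same insert/lookup pattern. It does not use any Solr class: FvCacheSketch, the HashMap standing in for the Solr cache, the String standing in for the scoring query, and the way the key is derived from the query and the document id are all assumptions made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative stand-in for the feature vector cache; this is NOT Solr code.
class FvCacheSketch {
  // Hypothetical cache: the key is derived from (query, docid), the value is the feature vector string.
  private final Map<Integer, String> cache = new HashMap<>();

  // Assumed key scheme for this sketch only: combine the query with the document id.
  static int cacheKey(String scoringQuery, int docid) {
    return Objects.hash(scoringQuery, docid);
  }

  // Follows the shape of FeatureLogger#log: nothing is stored when there is no feature vector.
  boolean log(int docid, String scoringQuery, String featureVector) {
    if (featureVector == null) {
      return false;
    }
    cache.put(cacheKey(scoringQuery, docid), featureVector);
    return true;
  }

  // Follows the shape of FeatureLogger#getFeatureVector: returns null on a cache miss.
  String getFeatureVector(int docid, String scoringQuery) {
    return cache.get(cacheKey(scoringQuery, docid));
  }

  public static void main(String[] args) {
    FvCacheSketch fv = new FvCacheSketch();
    fv.log(7, "model=first_model", "f1=0.5,f2=1.0");
    System.out.println(fv.getFeatureVector(7, "model=first_model")); // hit
    System.out.println(fv.getFeatureVector(8, "model=first_model")); // miss: different document
  }
}
```

The point of the sketch is that the same (query, document) pair must be used for both the insert and the lookup, otherwise the lookup is a miss.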

Insertions

Let’s walk through the insertion process in depth.

For an insertion to happen, several conditions must hold.

The first condition we meet is in the org.apache.solr.ltr.LTRRescorer#scoreFeatures method:

org.apache.solr.ltr.LTRRescorer#scoreFeatures

...
if (scoreSingleHit(topN, docBase, hitUpto, hit, docID, scorer, reranked)) {
  logSingleHit(indexSearcher, modelWeight, hit.doc, scoringQuery);
}
...

Currently the condition on the org.apache.solr.ltr.LTRRescorer#scoreSingleHit method is always true.
We suspect it ended up this way because of code maintenance issues. We are investigating further and will update this post with more details later on.

Let’s see why.

Here is the called method:

org.apache.solr.ltr.LTRRescorer#scoreSingleHit

...
if (hitUpto < topN) {
  reranked[hitUpto] = hit;
  // if the heap is not full, maybe I want to log the features for this
  // document
  logHit = true;
} else if (hitUpto == topN) {
  // collected topN document, I create the heap
  heapify(reranked, topN);
}
...

This piece of code is called during the reranking phase of the topN documents (topN corresponds to the reRankDocs value chosen at query time inside the rq parameter).
Since we only iterate over these topN documents, the condition hitUpto < topN is always true for them, and therefore the logHit variable is always set to true.
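
The reasoning above can be checked with a small sketch. It is not the Solr implementation: LogHitSketch and logHitFlags are hypothetical names, and the loop bound encodes the assumption stated above, namely that the rescorer only iterates over the topN hits.

```java
// Illustrative sketch of the hitUpto/topN check; this is NOT the Solr implementation.
class LogHitSketch {
  // Returns, for each of the topN hits being reranked, whether logHit would be set.
  static boolean[] logHitFlags(int topN) {
    boolean[] flags = new boolean[topN];
    // Assumption from the text: only the topN hits are iterated, so hitUpto goes from 0 to topN - 1.
    for (int hitUpto = 0; hitUpto < topN; hitUpto++) {
      boolean logHit = false;
      if (hitUpto < topN) {
        // Same check as in scoreSingleHit: it cannot fail inside this loop.
        logHit = true;
      }
      flags[hitUpto] = logHit;
    }
    return flags;
  }

  public static void main(String[] args) {
    for (boolean flag : logHitFlags(5)) {
      System.out.println(flag);
    }
  }
}
```

Under that assumption the else branch (hitUpto == topN) is never reached, which is why the first condition does not filter anything.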

Then a second condition arises.

In org.apache.solr.ltr.LTRRescorer#logSingleHit both a FeatureLogger and a SolrIndexSearcher are required.

Here is the condition:

org.apache.solr.ltr.LTRRescorer#logSingleHit

...
if (featureLogger != null && indexSearcher instanceof SolrIndexSearcher) {
  featureLogger.log(docid, scoringQuery, (SolrIndexSearcher) indexSearcher, modelWeight.getFeaturesInfo());
}
...

The condition on the SolrIndexSearcher is always satisfied, so let’s focus on the FeatureLogger one.

A FeatureLogger is set if both 1 AND 2 are satisfied:

    1. the feature transformer in the fl query parameter is used.
      In this case the extractFeatures variable in the org.apache.solr.ltr.search.LTRQParserPlugin.LTRQParser#parse method is set to true at line 164.
    2. the feature store defined in the feature transformer is the same as the feature store defined in the model
      OR
      just the feature transformer [features] component has been set, without specifying the store to be used.

Therefore, for an insertion to happen, these two conditions must both be satisfied:

    1. the feature transformer in the fl query parameter is used.
    2. the feature store defined in the fl parameter is the same as the feature store defined in the model
      OR
      just the feature transformer [features] component has been set, without specifying the store to be used.

Only for the curious ones

If you want to go deeper into these conditions, you can take a look at the org.apache.solr.ltr.search.LTRQParserPlugin.LTRQParser#parse method.
Here the extractFeatures variable is set at line 164:

final boolean extractFeatures = SolrQueryRequestContextUtils.isExtractingFeatures(req);

Then the computed extractFeatures is used at line 183, where the condition on the feature store is implemented:

final boolean featuresRequestedFromSameStore =
    (modelFeatureStoreName.equals(tranformerFeatureStoreName) || tranformerFeatureStoreName == null)
        ? extractFeatures : false;

And then the computed featuresRequestedFromSameStore is used to set the FeatureLogger at line 198.
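
The two insertion conditions can be summarized as a single boolean function. The sketch below is only an illustration of the expression shown above, written without any Solr dependency: InsertionConditionSketch and featureLoggerIsSet are hypothetical names, and the store names in main are example values.

```java
// Illustrative sketch of the insertion conditions; this is NOT Solr code.
class InsertionConditionSketch {
  // extractFeatures: true when the [features] transformer is present in fl.
  // modelStore: feature store of the model used in rq.
  // transformerStore: store named in the transformer, or null when none is specified.
  static boolean featureLoggerIsSet(boolean extractFeatures, String modelStore, String transformerStore) {
    // Same logic as featuresRequestedFromSameStore: the stores must match,
    // or the transformer must not name a store at all.
    boolean sameStore = transformerStore == null || modelStore.equals(transformerStore);
    return sameStore && extractFeatures;
  }

  public static void main(String[] args) {
    // Transformer and model use the same store.
    System.out.println(featureLoggerIsSet(true, "first_model_store", "first_model_store"));
    // Transformer names a different store.
    System.out.println(featureLoggerIsSet(true, "first_model_store", "second_model_store"));
    // Transformer used without naming a store.
    System.out.println(featureLoggerIsSet(true, "first_model_store", null));
    // No transformer in fl.
    System.out.println(featureLoggerIsSet(false, "first_model_store", null));
  }
}
```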

Lookups

As mentioned before, lookups are part of the transformer process.
To enable the transformer, pass the <transformer name> (defined in solrconfig.xml) in the fl parameter,
e.g. fl=[features]

solrconfig.xml

<transformer name="features" 
class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
  <str name="fvCacheName">QUERY_DOC_FV</str>
</transformer>

Then at least one of these two conditions needs to be true:

    1. The query is not an OriginalRankingLTRScoringQuery.
    2. A feature store is explicitly defined in the fl parameter.

Here is where the two conditions are checked:

org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory.FeatureTransformer#implTransform

...
if (!(rerankingQuery instanceof OriginalRankingLTRScoringQuery) || hasExplicitFeatureStore) {
  Object featureVector = featureLogger.getFeatureVector(docid, rerankingQuery, searcher);
  ...
}
...

The first condition on the OriginalRankingLTRScoringQuery is always true when interleaving is not running and therefore we will always access the cache for lookups.
Interleaving gives you the possibility of comparing a learned model with the original Solr score.
The original Solr score doesn’t require features to be extracted, so unless hasExplicitFeatureStore is defined, the lookup wouldn’t be necessary.
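
As a rough illustration, the lookup decision can be written as a small predicate. This is a sketch under assumptions, not the transformer code: LookupConditionSketch and lookupHappens are hypothetical names, and the boolean isOriginalRanking stands in for the instanceof OriginalRankingLTRScoringQuery check.

```java
// Illustrative sketch of the lookup conditions; this is NOT the Solr transformer code.
class LookupConditionSketch {
  // transformerInFl: the [features] transformer is present in fl.
  // isOriginalRanking: stands in for "rerankingQuery instanceof OriginalRankingLTRScoringQuery".
  // hasExplicitFeatureStore: a store is explicitly named in the transformer.
  static boolean lookupHappens(boolean transformerInFl, boolean isOriginalRanking,
      boolean hasExplicitFeatureStore) {
    if (!transformerInFl) {
      // Without the transformer, implTransform is never involved, so no lookup.
      return false;
    }
    return !isOriginalRanking || hasExplicitFeatureStore;
  }

  public static void main(String[] args) {
    // Learned model, no explicit store.
    System.out.println(lookupHappens(true, false, false));
    // Original ranking (interleaving), no explicit store.
    System.out.println(lookupHappens(true, true, false));
    // Original ranking, explicit store.
    System.out.println(lookupHappens(true, true, true));
  }
}
```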

Here we can see that the cache is only used by the feature logger transformer and not during the reranking phase done through the rq parameter.
In other words, the cache is not making the reranking itself any faster.

Therefore, for a lookup to happen, these two conditions must both be satisfied:

    1. the feature transformer in the fl query parameter is used.
    2. the LTR model is a learned model (and not the original Solr score pre-reranking)
      OR
      a feature store has been explicitly defined in the feature transformer.

EXAMPLES

Let’s make some example queries to see how the Solr cache behaves.

First query

This is our first query. It has the [features] component defined in the fl query parameter and also the rq query parameter has been used.

For insertions we can see that:

| Condition | Is the condition verified? |
|---|---|
| The transformer is defined in fl (‘[features=…]’) | YES |
| The transformer store defined is the same as the model store (TRUE) OR the transformer [features] has been set, without specifying the store to be used (FALSE) | YES |

All conditions for insertions are satisfied and therefore insertions happen.

For lookups we can see that:

| Condition | Is the condition verified? |
|---|---|
| The transformer is defined in fl (‘[features=…]’) | YES |
| The model is a learned model (‘first_model’) (TRUE) OR an explicit store has been defined in the transformer (‘first_model_store’) (TRUE) | YES |

All the conditions for lookups are satisfied and therefore lookups happen.

Second query

This is our second query. It has the [features] component defined in the fl query parameter and also the rq query parameter has been used.

For insertions we can see that:

| Condition | Is the condition verified? |
|---|---|
| The transformer is defined in fl (‘[features=…]’) | YES |
| The transformer store defined is the same as the model store (FALSE) OR the transformer [features] has been set, without specifying the store to be used (FALSE) | NO (the model store is the ‘first_model_store’) |

Not all conditions for insertions are satisfied and therefore insertions will not be made.

For lookups we can see that:

| Condition | Is the condition verified? |
|---|---|
| The transformer is defined in fl (‘[features=…]’) | YES |
| The model is a learned model (‘first_model’) (TRUE) OR an explicit store has been defined in the transformer (‘second_model_store’) (TRUE) | YES (N.B. the model and transformer stores are different) |

All the conditions for lookups are satisfied and therefore lookups happen.

Third query

This is our third query. It has the [features] component defined in the fl query parameter but no rq query parameter has been used.

For insertions we can see that:

| Condition | Is the condition verified? |
|---|---|
| The transformer is defined in fl (‘[features=…]’) | YES |
| The transformer store defined is the same as the model store (FALSE) OR the transformer [features] has been set, without specifying the store to be used (FALSE) | NO (there is no rq parameter) |

Not all conditions for insertions are satisfied and therefore insertions will not be made.

For lookups we can see that:

| Condition | Is the condition verified? |
|---|---|
| The transformer is defined in fl (‘[features=…]’) | YES |
| The model is a learned model (FALSE) OR an explicit store has been defined in the transformer (‘second_model_store’) (TRUE) | YES |

Both conditions for lookups are satisfied and therefore lookups happen.

Fourth query

This is our fourth query. It has no [features] component defined in the fl query parameter but the rq query parameter has been used.

For insertions we can see that:

| Condition | Is the condition verified? |
|---|---|
| The transformer is defined in fl (‘[features=…]’) | NO |
| The transformer store defined is the same as the model store (FALSE) OR the transformer [features] has been set, without specifying the store to be used (FALSE) | NO |

Not all conditions for insertions are satisfied and therefore insertions will not be made.

For lookups we can see that:

| Condition | Is the condition verified? |
|---|---|
| The transformer is defined in fl (‘[features=…]’) | NO |
| The model is a learned model (TRUE) OR an explicit store has been defined in the transformer (FALSE) | YES |

Not all conditions for lookups are satisfied and therefore lookups will not be made.
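
The four examples can be replayed with a small sketch that encodes the conditions from the tables above. None of this is Solr code: ExampleQueriesSketch and its methods are hypothetical, a null modelStore is used here to represent a query without the rq parameter, and the store name passed for the fourth query is an assumption, since the text only says that its model is a learned one.

```java
// Illustrative recap of the four example queries; this is NOT Solr code.
class ExampleQueriesSketch {
  // modelStore is null when the query has no rq parameter (no reranking model).
  static boolean insertionHappens(boolean transformerInFl, String modelStore, String transformerStore) {
    if (!transformerInFl || modelStore == null) {
      return false;
    }
    return transformerStore == null || modelStore.equals(transformerStore);
  }

  // A lookup needs the transformer, plus a learned model or an explicit transformer store.
  static boolean lookupHappens(boolean transformerInFl, boolean learnedModel, String transformerStore) {
    return transformerInFl && (learnedModel || transformerStore != null);
  }

  static String summarize(String name, boolean transformerInFl, String modelStore, String transformerStore) {
    boolean learnedModel = modelStore != null;
    boolean insert = insertionHappens(transformerInFl, modelStore, transformerStore);
    boolean lookup = lookupHappens(transformerInFl, learnedModel, transformerStore);
    return name + ": insertions=" + (insert ? "YES" : "NO") + ", lookups=" + (lookup ? "YES" : "NO");
  }

  public static void main(String[] args) {
    System.out.println(summarize("first", true, "first_model_store", "first_model_store"));
    System.out.println(summarize("second", true, "first_model_store", "second_model_store"));
    System.out.println(summarize("third", true, null, "second_model_store"));
    // Assumed model store for the fourth query; only "a learned model" is stated in the text.
    System.out.println(summarize("fourth", false, "first_model_store", null));
  }
}
```

Only the first query both fills and reads the cache; the second and third read from it without ever inserting, and the fourth does not touch it at all.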

Future work

We will soon open a Jira issue to integrate the Feature Vector Cache in the reranking phase, independently of the feature transformer.
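
To give an idea of what using the cache at reranking time could look like, here is a purely hypothetical sketch of a "check the cache first, extract only on a miss" pattern. It is not the planned contribution and does not reflect any Solr API: RerankCacheSketch, featuresFor, the string key, and the extractor function are all invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical cache-first feature extraction for reranking; this is NOT Solr code.
class RerankCacheSketch {
  private final Map<String, float[]> cache = new HashMap<>();
  private int extractions = 0;

  // Returns the cached features for (queryKey, docid), extracting them only on a miss.
  float[] featuresFor(String queryKey, int docid, IntFunction<float[]> extractor) {
    String key = queryKey + "#" + docid; // assumed key scheme, for this sketch only
    float[] cached = cache.get(key);
    if (cached != null) {
      return cached;
    }
    extractions++; // count how many times the (expensive) extraction really runs
    float[] computed = extractor.apply(docid);
    cache.put(key, computed);
    return computed;
  }

  int extractions() {
    return extractions;
  }

  public static void main(String[] args) {
    RerankCacheSketch sketch = new RerankCacheSketch();
    IntFunction<float[]> extractor = docid -> new float[] {docid * 0.5f, 1.0f};
    sketch.featuresFor("q1", 3, extractor);
    sketch.featuresFor("q1", 3, extractor); // second call is served from the cache
    System.out.println("extractions: " + sketch.extractions());
  }
}
```

With a scheme like this, reranking the same document for the same query a second time would skip feature extraction, which is the saving the cache cannot provide today.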

Author

Anna Ruggero

Anna Ruggero is a software engineer passionate about Information Retrieval and Data Mining. She loves to find new solutions to problems, suggesting and testing new ideas, especially those that concern the integration of machine learning techniques into information retrieval systems.
