QueryResultCache and FilterCache in Apache Solr

This blog post explains how the queryResultCache and the filterCache are used during basic query processing in Apache Solr 8.11.0. It does not cover how these caches are used by more advanced components such as faceting.

Solr caches are associated with a specific instance of an Index Searcher. By default, cache entries do not expire after a time interval; they remain valid for the lifetime of the Index Searcher. Time-based expiration can be enabled with the maxIdleTime option. This attribute is expressed in seconds, and its default value of 0 means that no entries are evicted for exceeding the idle time.

In Solr, the following cache implementations are available: CaffeineCache, LRUCache, FastLRUCache, and LFUCache. CaffeineCache is recommended because it usually offers a lower memory footprint, a higher hit ratio, and better multi-threaded performance. All the other implementations are deprecated and will be removed in Solr 9.0.

The Statistics page in the Solr Admin UI displays information about the performance of all the active caches.

If we want more details about the cached keys, a new open-source tool has been available since December 2021.
The cacheViewHandler [1], implemented by Shawn Heisey, offers a REST endpoint that shows the cached keys and the number of documents cached for each key.

Let’s now focus on how the Solr searcher uses the queryResultCache and the filterCache.

QueryResultCache

Solr documentation [2]

The queryResultCache holds the results of previous searches, and it is the first cache consulted when a new query is submitted.

The cache stores an ordered list of the Lucene document IDs returned as the result of a previously submitted query. Each entry is associated with the query parameters q (query), fq (filter query), sort, and minExactCount. So, every time you submit a query in which at least one of those parameters has changed, you get a cache miss: the query is executed and a new entry is cached.

Let’s see what happens in some examples:

Lucene Query Parser

SOLR_URL/solr/films/select?q=name:bend&fq=-genre:film&sort=name asc

The key will be

{
    query: "name:bend",
    filterQuery: "-genre:film",
    sort: "name asc",
    minExactCount: Integer.MAX_VALUE
}

Note that the filter query is kept as it is: negative queries keep their negative meaning. We will see a different behavior when we talk about the filterCache.

Dismax Query Parser

SOLR_URL/solr/films/select?defType=dismax&fq=genre:film&q=drama bend&qf=genre name

The key will be

{
    query: "+((genre:drama | name:drama) (genre:bend | name:bend))",
    filterQuery: "genre:film",
    sort: null,
    minExactCount: Integer.MAX_VALUE
}
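To make the lookup behaviour concrete, here is a minimal sketch of a cache keyed by the four parameters listed above. It is a toy model, not Solr's implementation: the names QueryResultKey, ToyQueryResultCache and fake_search are made up for this example, and the document IDs are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE, as in the keys shown above


@dataclass(frozen=True)
class QueryResultKey:
    """Toy stand-in for a queryResultCache key (hypothetical, not Solr's class)."""
    query: str
    filter_queries: Tuple[str, ...]
    sort: Optional[str] = None
    min_exact_count: int = INT_MAX


class ToyQueryResultCache:
    """Maps a key to an ordered list of document IDs and counts hits and misses."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def search(self, key, execute):
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        doc_ids = execute(key)  # the query runs only on a miss
        self._entries[key] = doc_ids
        return doc_ids


def fake_search(key):
    """Placeholder for a real index search; returns made-up document IDs."""
    return [3, 1, 2]


cache = ToyQueryResultCache()
by_name = QueryResultKey("name:bend", ("-genre:film",), "name asc")
cache.search(by_name, fake_search)       # miss: first time this key is seen
cache.search(by_name, fake_search)       # hit: identical parameters
by_name_desc = QueryResultKey("name:bend", ("-genre:film",), "name desc")
cache.search(by_name_desc, fake_search)  # miss: only the sort changed

print(cache.hits, cache.misses)  # 1 2
```

Because the key is compared as a whole, changing any single parameter is enough to produce a different entry.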

FilterCache

Solr documentation [2]

The filterCache is used to store an unordered set containing the results of each fq search parameter and, in some cases, the q parameter too.

This cache always uses positive logic: if the query contains the parameter &fq=-field:value, the cached key will be field:value. For example, if I want to get all the films in the collection excluding the ones with the genre “drama”, I’ll submit the query q=*:*&fq=-genre:drama. The filterCache will be populated with the key genre:drama and the document set associated with it.

By default, the filterCache is used only to hold the results of each fq parameter (converted to a positive query if needed). However, the Solr searcher also uses the filterCache to store the results of the q parameter if both of the following conditions are met:

    • The parameter useFilterForSortedQuery is set to true in the solrconfig.xml file
    • The query sort clause does not include the score (if no sort clause is defined, the results are sorted by score by default, so the sort implicitly includes the score and this condition is not met)
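The two conditions above can be written as a small predicate. This is only a sketch of the rule as described in this post: the function names are hypothetical, the sort parsing handles plain field sorts such as "name asc" only, and the real searcher may apply further checks.

```python
from typing import Optional


def sort_uses_score(sort: Optional[str]) -> bool:
    """True when the sort is missing (implicit score sort) or names score."""
    if not sort or not sort.strip():
        return True
    # A simple sort spec looks like "name asc, score desc":
    # take the field name of each comma-separated clause.
    fields = [clause.split()[0] for clause in sort.split(",") if clause.strip()]
    return "score" in fields


def q_goes_to_filter_cache(use_filter_for_sorted_query: bool,
                           sort: Optional[str]) -> bool:
    """Both conditions from the list above must hold."""
    return use_filter_for_sorted_query and not sort_uses_score(sort)


print(q_goes_to_filter_cache(True, "name asc"))              # True
print(q_goes_to_filter_cache(True, None))                    # False
print(q_goes_to_filter_cache(False, "name asc"))             # False
print(q_goes_to_filter_cache(True, "name asc, score desc"))  # False
```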

Let’s execute some example queries and verify the cached content by using the cacheViewHandler [3] described above. Note that Solr was restarted before each example so that every example starts with empty caches.
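If you prefer to script this check, something along these lines should work. It is a sketch under assumptions: the endpoint path is the one used in the examples of this post, the helper names are hypothetical, and I am assuming that the handler returns a JSON object with a top-level "filterCacheEntries" key and that Solr listens on localhost:8983.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def cache_view_url(solr_url: str, collection: str, cache: str = "filter") -> str:
    """Build the cacheViewHandler URL for one collection."""
    query = urlencode({"cache": cache})
    return f"{solr_url.rstrip('/')}/solr/{collection}/admin/info/cache?{query}"


def filter_cache_entries(response_text: str) -> dict:
    """Extract the key -> document count map (response shape is an assumption)."""
    return json.loads(response_text).get("filterCacheEntries", {})


def fetch_filter_cache(solr_url: str, collection: str) -> dict:
    """Call the handler on a running Solr instance and parse its answer."""
    with urlopen(cache_view_url(solr_url, collection)) as response:
        return filter_cache_entries(response.read().decode("utf-8"))


# Offline demonstration with a hand-written response body.
sample = '{"filterCacheEntries": {"genre:drama": 569}}'
print(cache_view_url("http://localhost:8983", "films"))
print(filter_cache_entries(sample))
```

With a running instance, fetch_filter_cache("http://localhost:8983", "films") would return the same kind of dictionary.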

POSITIVE FQ CLAUSE: ONLY FQ CONDITION RESULT IS CACHED

We want all films whose name contains the word “bend”, and we filter the results to keep only the genre “drama”. As expected, the filterCache is used only to store the result of the filter “genre:drama”.

Query: ?q=name:bend&fq=genre:drama
useFilterForSortedQuery: false

/solr/films/admin/info/cache?cache=filter

  "filterCacheEntries": {
    "genre:drama": 569
  }

NEGATIVE FQ CLAUSE: FQ CONDITION IS TRANSFORMED INTO POSITIVE

This is the same example as before but, in this case, we use a negative condition for the filter query. Since all the filters are negative, the fq clause is automatically transformed into fq=*:* -genre:drama. The searcher executes two separate queries: *:* and genre:drama. The final result is the list of documents returned by the first query but not by the second one. In this example, the filterCache stores the results of both the *:* and genre:drama queries.

Note that, as described before, negative entries are transformed into positive ones.
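The following sketch mimics this behaviour on a tiny in-memory index. Everything in it is made up for illustration (the four documents, run_query, cached_doc_set, apply_fq), and it only models the caching logic described here, not how Solr evaluates queries.

```python
ALL_DOCS = "*:*"

# Made-up index: document ID -> genre.
INDEX = {1: "drama", 2: "comedy", 3: "drama", 4: "thriller"}

filter_cache = {}


def run_query(query: str) -> frozenset:
    """Tiny stand-in for an index search; supports '*:*' and 'genre:<value>'."""
    if query == ALL_DOCS:
        return frozenset(INDEX)
    field, value = query.split(":", 1)
    if field != "genre":
        raise ValueError(f"unsupported field in toy index: {field}")
    return frozenset(doc for doc, genre in INDEX.items() if genre == value)


def cached_doc_set(query: str) -> frozenset:
    """Return the unordered document set for a positive query, caching it."""
    if query not in filter_cache:
        filter_cache[query] = run_query(query)
    return filter_cache[query]


def apply_fq(fq: str) -> frozenset:
    """A negative filter becomes '*:*' minus the positive filter."""
    if fq.startswith("-"):
        return cached_doc_set(ALL_DOCS) - cached_doc_set(fq[1:])
    return cached_doc_set(fq)


matches = apply_fq("-genre:drama")
print(sorted(matches))       # [2, 4]
print(sorted(filter_cache))  # ['*:*', 'genre:drama']
```

Only positive keys end up in the cache, so a later fq=genre:drama can reuse the entry created by the negative filter.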

Query: ?q=name:bend&fq=-genre:drama
useFilterForSortedQuery: false

/solr/films/admin/info/cache?cache=filter

  "filterCacheEntries": {
    "*:*": 1100,
    "genre:drama": 569
  }

SORT CLAUSE SPECIFIED: Q RESULT IS CACHED

In this example, we set the parameter useFilterForSortedQuery to true and we sort all the results by name. During the query phase, an unordered set of documents is perfectly fine, since Solr must reorder all the documents by name anyway. This is exactly the case where the Solr searcher also uses the filterCache to store the result of the q clause. Indeed, when we check the content of the cache, we immediately notice the presence of the key “name:bend”.

Query: ?q=name:bend&fq=genre:drama&sort=name asc
useFilterForSortedQuery: true

/solr/films/admin/info/cache?cache=filter

  "filterCacheEntries": {
    "genre:drama": 569,
    "name:bend": 3
  }

SORT CLAUSE NOT SPECIFIED: Q RESULT IS NOT CACHED

This is the same example as before, but with the sort removed. Although useFilterForSortedQuery is true, the filterCache is not used for the q clause because the results are implicitly sorted by score. In this case, the Solr searcher behaves exactly as in the first example.

Query: ?q=name:bend&fq=genre:drama
useFilterForSortedQuery: true

filterCache content

  "filterCacheEntries": {
    "genre:drama": 569
  }

COMPOSED CONDITIONS

In all the previous examples, we used a single fq condition per query. If we want more filters, we can add multiple fq clauses, and the final query filter will be the intersection of all the fq conditions. The query &fq=condition1&fq=condition2 provides the same result as fq=condition1 AND condition2. If we want the union of multiple conditions, we use the syntax fq=condition1 OR condition2.

Let’s now see what happens in the filter cache when we submit queries with composed conditions.

Query    
?q=*:*&fq=genre:drama&fq=directed_by:russell    

filterCache content

  "filterCacheEntries": {
    "directed_by:russell": 2,
    "genre:drama": 569
  }

When executing this query, Solr computes each fq clause independently and stores each result in the cache; then it computes the intersection. With the other syntax listed above, the cache stores only the result of the final intersection of all the conditions.

Query    
?q=*:*&fq=genre:drama AND directed_by:russell    

filterCache content

  "filterCacheEntries": {
    "+genre:drama +directed_by:russell": 2
  }

With multiple fq clauses, data is stored in the filter cache with a finer granularity, so future searches are more likely to hit the cache. The trade-off is that, even when every clause is found in the filterCache, the intersection of all the conditions must still be computed for each query. On the other hand, with a single fq clause that combines multiple conditions with AND/OR, we hit the cache only if a future query contains exactly the same fq clause.
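A toy model helps to see the difference in granularity. The document sets, the ToyFilterCache class and the two helper functions below are invented for this sketch; only the key format of the combined clause follows the cache content shown above.

```python
# Made-up results for each single condition.
DOC_SETS = {
    "genre:drama": frozenset({1, 2, 3}),
    "directed_by:russell": frozenset({2, 3}),
}


class ToyFilterCache:
    """Stores unordered document sets by key and counts cache hits."""

    def __init__(self):
        self.entries = {}
        self.hits = 0

    def get(self, key, compute):
        if key in self.entries:
            self.hits += 1
        else:
            self.entries[key] = compute()
        return self.entries[key]


def separate_fqs(cache, conditions):
    """&fq=a&fq=b: one entry per clause, intersection computed on every call."""
    sets = [cache.get(c, lambda c=c: DOC_SETS[c]) for c in conditions]
    return frozenset.intersection(*sets)


def single_and_fq(cache, conditions):
    """&fq=a AND b: a single entry holding the final intersection."""
    key = " ".join(f"+{c}" for c in conditions)
    return cache.get(
        key, lambda: frozenset.intersection(*(DOC_SETS[c] for c in conditions))
    )


granular = ToyFilterCache()
separate_fqs(granular, ["genre:drama", "directed_by:russell"])
separate_fqs(granular, ["genre:drama"])  # reuses the cached clause
print(sorted(granular.entries), granular.hits)

combined = ToyFilterCache()
single_and_fq(combined, ["genre:drama", "directed_by:russell"])
single_and_fq(combined, ["genre:drama"])  # different key, nothing to reuse
print(sorted(combined.entries), combined.hits)
```

The first cache ends with one hit because the second query shares a clause with the first one; the second cache ends with no hits because the two keys differ.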

Query    
?q=*:*&fq=filter(genre:drama) OR filter(directed_by:russell)    

filterCache content

  "filterCacheEntries": {
    "directed_by:russell": 2,
    "filter(genre:drama) filter(directed_by:russell)": 569,
    "genre:drama": 569
  }

The filter() keyword tells Solr to use the filterCache to store the result of each single condition inside a composed fq clause as well. This lets us hit the cache whether a future query contains the same composed fq condition or just one of the single conditions.
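When the conditions come from application code, the composed clause can be assembled with a small helper. This is just a string-building sketch: cached_clause and combine are hypothetical names, and the conditions are the ones used in the example above.

```python
from urllib.parse import parse_qs, urlencode


def cached_clause(condition: str) -> str:
    """Wrap a single condition so that Solr caches it on its own."""
    return f"filter({condition})"


def combine(conditions, operator: str = "OR") -> str:
    """Join the wrapped conditions with AND or OR into one fq value."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(cached_clause(c) for c in conditions)


fq = combine(["genre:drama", "directed_by:russell"])
print(fq)  # filter(genre:drama) OR filter(directed_by:russell)

# URL-encode the parameters before appending them to the select URL.
params = urlencode({"q": "*:*", "fq": fq})
print(params)
```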

Author

Daniele Antuzi

Software engineer passionate about high-performance data structures and algorithms. He likes studying and experimenting with new technologies to improve the state of the art.
