QueryResultCache and FilterCache in Apache Solr

This blog post explains how the QueryResultCache and FilterCache are used during basic query processing in Apache Solr 8.11.0. It does not cover how these caches are used by more advanced components such as faceting.

Solr caches are associated with a specific instance of an Index Searcher. By default, elements in the caches don’t expire after a time interval; instead, they remain valid for the lifetime of the Index Searcher. Time-based expiration can be enabled with the maxIdleTime option. This attribute is expressed in seconds, and its default value of 0 means that no entries are evicted automatically for exceeding the idle time.

In Solr, the following cache implementations are available: CaffeineCache, LRUCache, FastLRUCache, and LFUCache. CaffeineCache is recommended because it usually offers a lower memory footprint, a higher hit ratio, and better multi-threaded performance. All the other implementations are deprecated and will be removed in Solr 9.0.

The Statistics page in the Solr Admin UI displays information about the performance of all the active caches.

If we want more details about the cached keys, an open-source tool has been available since December 2021.
The cacheViewHandler [1], implemented by Shawn Heisey, offers a REST endpoint that shows the cached keys and the number of documents cached for each key.

We now focus on how the Solr searcher uses the queryResultCache and the filterCache.

QueryResultCache

Solr documentation [2]

The queryResultCache holds the results of previous searches, and it is the first cache consulted when a new query is submitted.

The cache stores an ordered list of the Lucene document IDs returned as a result of a previously submitted query. Each entry is associated with the query parameters q (query), fq (filterQuery), sort, and minExactCount. So, every time you submit a query in which at least one of those parameters has changed, you will get a cache miss: the query will be executed and a new entry will be cached.
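To make the key structure concrete, here is a minimal Python sketch. It is a toy model, not Solr's Java code: the class name `ResultKey`, the `search` helper and the plain dictionary used as a cache are all hypothetical, and the document IDs are made up. Only the four key fields come from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ResultKey:
    """Toy stand-in for a queryResultCache key (hypothetical, not Solr's class)."""
    query: str
    filter_queries: Tuple[str, ...]
    sort: Optional[str]
    min_exact_count: int


cache = {}  # key -> ordered list of document IDs


def search(key, run_query):
    """Return (doc_ids, hit), where hit tells whether the cache answered."""
    if key in cache:
        return cache[key], True
    doc_ids = run_query()
    cache[key] = doc_ids
    return doc_ids, False


MAX_INT = 2**31 - 1  # same value as Java's Integer.MAX_VALUE
k1 = ResultKey("name:bend", ("-genre:film",), "name asc", MAX_INT)
k2 = ResultKey("name:bend", ("-genre:film",), "name desc", MAX_INT)  # only the sort differs

_, first = search(k1, lambda: [12, 7, 31])   # miss: nothing cached yet
_, second = search(k1, lambda: [12, 7, 31])  # hit: identical key
_, third = search(k2, lambda: [31, 7, 12])   # miss: one field changed
print(first, second, third)  # False True False
```

Because the key is compared field by field, changing any single field produces a different key and therefore a new cache entry.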

Let’s see what happens in some examples:

LUCENE QUERY PARSER

				
					SOLR_URL/solr/films/select?q=name:bend&fq=-genre:film&sort=name asc

The key will be

				
					{
    query: "name:bend",
    filterQuery: "-genre:film",
    sort: "name asc",
    minExactCount: Integer.MAX_VALUE
}

Note that the filter query is kept as it is: negative queries keep their negative meaning. We will see a different behaviour when we discuss the filterCache.

DISMAX QUERY PARSER

				
					SOLR_URL/solr/films/select?defType=dismax&fq=genre:film&q=drama bend&qf=genre name

The key will be

				
					{
    query: "+((genre:drama | name:drama) (genre:bend | name:bend))",
    filterQuery: "genre:film",
    sort: null,
    minExactCount: Integer.MAX_VALUE
}

FilterCache

Solr documentation [2]

The filterCache is used to store an unordered set containing the results of each fq search parameter and, in some cases, the q parameter too.

This cache always uses positive logic so, if the query contains the parameter &fq=-field:value, the cached query will be &fq=field:value. For example, to get all the films in the collection except those with the genre “drama”, I’ll submit the query q=*:*&fq=-genre:drama. The filterCache will be populated with the key genre:drama and the document set associated with it.

By default, the filterCache is used only to hold the results of each fq parameter (converted to a positive query if needed). However, the Solr searcher also uses the filterCache to store the results of the q parameter if both of the following conditions are met:

- The parameter useFilterForSortedQuery is set to true in the solrconfig.xml file
- The query sort clause does not include the score (if no sort clause is defined, the results are sorted by score by default, so the sort implicitly includes the score and this condition is not met)
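The two conditions can be written as a small predicate. This is only an illustrative Python sketch: the function names are hypothetical, and the sort parsing assumes simple comma-separated `field direction` clauses (function sorts that contain commas are not handled).

```python
def sort_uses_score(sort):
    """True when the sort is missing (score is the default) or names score explicitly."""
    if not sort or not sort.strip():
        return True
    # Take the first token of each comma-separated clause, e.g. "name" in "name asc".
    fields = [clause.split()[0] for clause in sort.split(",") if clause.strip()]
    return "score" in fields


def q_goes_to_filter_cache(use_filter_for_sorted_query, sort):
    """Both conditions from the list above must hold."""
    return use_filter_for_sorted_query and not sort_uses_score(sort)


print(q_goes_to_filter_cache(True, "name asc"))              # True
print(q_goes_to_filter_cache(True, None))                    # False: implicit sort by score
print(q_goes_to_filter_cache(True, "name asc, score desc"))  # False: score is in the sort
print(q_goes_to_filter_cache(False, "name asc"))             # False: option disabled
```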

Let’s execute some example queries and verify the cached content by using the cacheviewhandler [3] described above. Note that Solr was restarted before each example so that every example starts with empty caches.

POSITIVE FQ CLAUSE: ONLY FQ CONDITION RESULT IS CACHED

We want all films whose name contains the word “bend”, and we filter the results to keep only those with the genre “drama”. As expected, the filterCache is used only to store the result of the filter “genre:drama”.

Query	useFilterForSortedQuery
?q=name:bend&fq=genre:drama	false

				
					/solr/films/admin/info/cache?cache=filter

				
					"filterCacheEntries": {
    "genre:drama": 569
  }
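For readers who want to reproduce these checks from a script, the two URLs can be built as follows. This is a small Python sketch using only the standard library. The host and port in `BASE` are assumptions for a local installation, and the helper names are hypothetical. The `/admin/info/cache?cache=filter` path is the one shown above.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/films"  # assumed host, port and collection


def select_url(q, fqs=(), sort=None):
    """Build a /select URL with one fq parameter per filter clause."""
    params = [("q", q)] + [("fq", fq) for fq in fqs]
    if sort:
        params.append(("sort", sort))
    return f"{BASE}/select?{urlencode(params)}"


def filter_cache_url():
    """URL of the cache view endpoint used in the examples of this post."""
    return f"{BASE}/admin/info/cache?{urlencode({'cache': 'filter'})}"


print(select_url("name:bend", ["genre:drama"]))
# http://localhost:8983/solr/films/select?q=name%3Abend&fq=genre%3Adrama
print(filter_cache_url())
# http://localhost:8983/solr/films/admin/info/cache?cache=filter
```

`urlencode` percent-encodes the colon and turns spaces into `+`, so the same helper also works for values such as `name asc`.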

NEGATIVE FQ CLAUSE: FQ CONDITION IS TRANSFORMED INTO POSITIVE

This is the same example as before but, in this case, we use a negative condition for the filter query. Since the filter contains only negative clauses, the fq clause is automatically transformed into fq=*:* -genre:drama. The searcher executes two queries separately: *:* and genre:drama. The final result is the list of documents returned by the first query but not by the second one. In this example, the filterCache stores the results of both the *:* and genre:drama queries.

Note that, as described before, negative entries are transformed into positive.

Query	useFilterForSortedQuery
?q=name:bend&fq=-genre:drama	false

				
					/solr/films/admin/info/cache?cache=filter

				
					"filterCacheEntries": {
    "*:*": 1100,
    "genre:drama": 569
  }
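The behaviour of a purely negative filter can be modelled in a few lines of Python. This is a simplified sketch, not Solr's implementation: the tiny index, the helper names and the prefix-based detection of a negative clause are all assumptions made for illustration.

```python
ALL_DOCS = "*:*"


def split_negative(fq):
    """Return (positive_query, was_negative) for a single clause."""
    fq = fq.strip()
    if fq.startswith("-"):
        return fq[1:], True
    return fq, False


def apply_filter(fq, filter_cache, run_query):
    """Resolve one fq to a set of doc IDs, caching only positive queries."""
    def cached(query):
        if query not in filter_cache:
            filter_cache[query] = frozenset(run_query(query))
        return filter_cache[query]

    positive, negative = split_negative(fq)
    if negative:
        # All documents minus the documents matching the positive form.
        return cached(ALL_DOCS) - cached(positive)
    return cached(positive)


# Tiny made-up index: doc ID -> genre
index = {1: "drama", 2: "comedy", 3: "drama", 4: "horror"}


def run_query(query):
    if query == ALL_DOCS:
        return index.keys()
    _field, value = query.split(":", 1)
    return [doc for doc, genre in index.items() if genre == value]


filter_cache = {}
result = apply_filter("-genre:drama", filter_cache, run_query)
print(sorted(result))        # [2, 4]
print(sorted(filter_cache))  # ['*:*', 'genre:drama']
```

As in the cache dump above, the two cached keys are `*:*` and the positive form `genre:drama`. The negative clause itself is never stored.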

SORT CLAUSE SPECIFIED: Q RESULT IS CACHED

In this example, we set the parameter useFilterForSortedQuery to true and sort the results by name. During the query phase, an unordered set of documents is perfectly fine, since Solr must reorder all the documents by name anyway. This is exactly the case where the Solr searcher also uses the filterCache to store the result of the q clause. Indeed, when we check the content of the cache, we immediately notice the key “name:bend”.

Query	useFilterForSortedQuery
?q=name:bend&fq=genre:drama&sort=name asc	true

				
					/solr/films/admin/info/cache?cache=filter

				
					"filterCacheEntries": {
    "genre:drama": 569,
    "name:bend": 3
  }

SORT CLAUSE NOT SPECIFIED: Q RESULT IS NOT CACHED

This is the same example as before, but with the sort removed. Although the parameter useFilterForSortedQuery is true, the filterCache will not be used for the q clause because the results are implicitly sorted by score. In this case, the Solr searcher behaves exactly as in the first example.

Query	useFilterForSortedQuery
?q=name:bend&fq=genre:drama	true

filterCache content

				
					"filterCacheEntries": {
    "genre:drama": 569
  }

COMPOSED CONDITIONS

In all the examples so far, we used a single fq condition per query. If we want more filters, we can add multiple fq clauses, and the final query filter will be the intersection of all the fq conditions. The query &fq=condition1&fq=condition2 returns the same result as fq=condition1 AND condition2. If we want the union of multiple conditions, we use the syntax fq=condition1 OR condition2.

Let’s now see what happens in the filter cache when we submit queries with composed conditions.

Query
?q=*:*&fq=genre:drama&fq=directed_by:russell

filterCache content

				
					"filterCacheEntries": {
    "directed_by:russell": 2,
    "genre:drama": 569
  }

When executing this query, Solr computes each fq clause independently and stores each result in the cache; then it computes the intersection. With the other syntax listed above, the entry stored in the cache is the result of the final intersection of all conditions.

Query
?q=*:*&fq=genre:drama AND directed_by:russell

filterCache content

				
					"filterCacheEntries": {
    "+genre:drama +directed_by:russell": 2
  }

With multiple fq clauses, the data stored in the filter cache has a finer granularity, so we have a higher chance of hitting the cache in future searches. The drawback is that, even when the user submits the same filters again, the cache holds no entry for the combined condition, so the intersection of all conditions must be computed for each query. On the other hand, with a single fq clause composed of multiple conditions joined by AND/OR, we hit the cache only if a future query contains exactly the same fq clause.
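The trade-off can be illustrated with a toy cache in Python. This is a sketch under stated assumptions: the document sets in `DOCS`, the `year:2010` filter and the helper names are invented for the example and do not come from the films collection.

```python
filter_cache = {}
computed = []  # records which clauses actually had to be executed


def doc_set(clause, run_query):
    """Return the cached set for one fq clause, executing it on a miss."""
    if clause not in filter_cache:
        computed.append(clause)
        filter_cache[clause] = frozenset(run_query(clause))
    return filter_cache[clause]


def apply_filters(fqs, run_query):
    """Intersect the sets of all fq clauses (needs at least one clause).

    The intersection itself is never cached, so it is redone on every call.
    """
    sets = [doc_set(fq, run_query) for fq in fqs]
    return frozenset.intersection(*sets)


# Made-up results for each clause
DOCS = {
    "genre:drama": {1, 2, 3},
    "directed_by:russell": {2, 3},
    "year:2010": {3, 4},
    "genre:drama AND directed_by:russell": {2, 3},
}
run_query = DOCS.__getitem__

a = apply_filters(["genre:drama", "directed_by:russell"], run_query)
b = apply_filters(["genre:drama", "year:2010"], run_query)       # genre:drama is reused
c = apply_filters(["genre:drama AND directed_by:russell"], run_query)  # no reuse possible
print(sorted(a), sorted(b), sorted(c))  # [2, 3] [3] [2, 3]
print(computed)
# ['genre:drama', 'directed_by:russell', 'year:2010', 'genre:drama AND directed_by:russell']
```

The second call executes only `year:2010`, because `genre:drama` is already cached. The combined clause in the third call is a new key, so it cannot benefit from the two entries cached earlier.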

Query
?q=*:*&fq=filter(genre:drama) OR filter(directed_by:russell)

filterCache content

				
					"filterCacheEntries": {
    "directed_by:russell": 2,
    "filter(genre:drama) filter(directed_by:russell)": 569,
    "genre:drama": 569
  }

The filter() keyword tells Solr to use the filterCache to store the result of each single condition inside a composed fq clause as well. This lets us hit the cache when a future query contains either the same composed fq condition or one of the single conditions.


2 Responses

Bejean says:

August 1, 2023 at 1:43 pm

Hi,
For QueryResultCache, you write “Each entry is associated with the query parameters q (query), fq (filterQuery), sort, and minExactCount”.
start and rows parameters are not discriminants too ?
Dominique

1. Daniele Antuzi says:
  
  September 1, 2023 at 9:15 am
  
  Hi Dominique,
  if you look at the source code of the cache key, the `start` and `rows` parameters are not discriminant.
  The values `start` and `rows` impact the number of elements stored in the cache in the following way.
  Given `queryResultWindowSize=20` we have:
  start=0, rows=10, the cache keeps the first 20 results
  start=10, rows=10, the cache keeps the first 20 results
  start=20, rows=10, the cache keeps the first 40 results
  That being said, there is another parameter `queryResultMaxDocsCached` so, if `start + rows > queryResultMaxDocsCached`, the cache won’t be used.
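The three cases listed in this reply can be reproduced with a short Python sketch. It assumes that the number of cached results is `start + rows` rounded up to the next multiple of `queryResultWindowSize`, which is consistent with the figures above. The function name is hypothetical.

```python
def cached_window(start, rows, window_size):
    """Number of leading results kept, rounded up to a multiple of window_size."""
    requested = start + rows
    if requested < window_size:
        return window_size
    return ((requested - 1) // window_size + 1) * window_size


for start, rows in [(0, 10), (10, 10), (20, 10)]:
    print(start, rows, cached_window(start, rows, 20))
# 0 10 20
# 10 10 20
# 20 10 40
```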
  

Daniele Antuzi
