Imagine you are setting up an Apache Solr index and need to handle a field representing an ID that will be used frequently in filter queries.
The key question is: how should you index this field for optimal performance? Should you use a string field type, or would an integer field type be more efficient?
In this “tips and tricks” blog post, we will explore the results of a local experiment, combined with insights from online research, to determine which option provides better performance.
We will summarize our findings to help you make an informed decision.
Online Research Findings
An online search across sources such as StackOverflow, the Solr mailing list, Slack channels, and blogs turned up only a limited number of discussions on the topic. The findings gathered from these sources can be summarized as follows:
- For simple filter queries, there is typically no noticeable difference in performance between using a string field or an integer field in Solr.
- For use cases involving range queries or sorting, numeric fields (like integers) may offer better performance.
- If an integer field isn’t used for numeric operations like calculations or range queries, storing it as a string provides greater flexibility for string manipulations, although this approach might use slightly more storage.
However, given the scarcity of information available online, we decided to run performance tests with different field types in Solr to either confirm or challenge these findings. Our objective was to provide more definitive guidance on whether one field type truly offers performance benefits over another.
Local Experiment on Field Type Selection for Solr Filter Queries
To evaluate the performance of filter queries on different field types in Apache Solr, a local test was conducted using Apache Solr version 9.1.
The goal was to determine whether using a string or integer field type would produce better performance for queries on a hypothetical field representing user IDs.
Four separate collections were created in Solr, each with a different field type for the user_id field:
- Integer field (pint -> solr.IntPointField)
- String field (string -> solr.StrField)
- Multivalued integer field (pints)
- Multivalued string field (strings)
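As a rough illustration of how the four variants might be declared, the sketch below builds Schema API `add-field` commands for the user_id field. The collection names, the `indexed`/`stored` settings, and the helper names are hypothetical, and the field type names are assumed to come from Solr's default configset; the post does not say how the schemas were actually set up.

```python
import json
import urllib.request

# Assumption: one collection per variant; these collection names are hypothetical.
FIELD_TYPES = {
    "ids_pint": "pint",        # single-valued integer (solr.IntPointField)
    "ids_string": "string",    # single-valued string (solr.StrField)
    "ids_pints": "pints",      # multivalued integer
    "ids_strings": "strings",  # multivalued string
}


def add_field_payload(field_type):
    """Build a Schema API 'add-field' command for the user_id field."""
    return {
        "add-field": {
            "name": "user_id",
            "type": field_type,
            "indexed": True,  # assumption: the field must be searchable
            "stored": True,   # assumption: not stated in the post
        }
    }


def add_user_id_field(base_url, collection, field_type):
    """POST the command to <base_url>/<collection>/schema."""
    body = json.dumps(add_field_payload(field_type)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{collection}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()
```

With a local Solr running, calling `add_user_id_field("http://localhost:8983/solr", name, field_type)` for each entry in `FIELD_TYPES` would register the field in each collection.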
For each collection, 50K documents were indexed, with the user_id field assigned random values between 0 and 100. The values were formatted according to the field type, e.g.:
- Integer: 5
- String: "5"
- Multivalued integer: [1, 2, 3]
- Multivalued string: ["1", "2", "3"]
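The formatting rules above could be reproduced with a small generator like the following. This is only a sketch: the function name, the seed, the inclusive 0 to 100 range, and the cap of three values per multivalued document are assumptions, since the post does not state them.

```python
import random


def make_docs(field_type, num_docs=50_000, max_values=3, seed=42):
    """Generate documents whose user_id matches the given field type.

    field_type is one of "pint", "string", "pints", "strings".
    """
    rng = random.Random(seed)
    as_string = field_type in ("string", "strings")
    multivalued = field_type in ("pints", "strings")
    docs = []
    for doc_id in range(num_docs):
        # Assumption: multivalued documents carry 1..max_values values.
        count = rng.randint(1, max_values) if multivalued else 1
        values = [rng.randint(0, 100) for _ in range(count)]
        if as_string:
            values = [str(value) for value in values]
        user_id = values if multivalued else values[0]
        docs.append({"id": str(doc_id), "user_id": user_id})
    return docs
```

The resulting list of dictionaries can be serialized to JSON and sent to a collection's update handler.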
To simulate real-world query scenarios, a CSV file was created containing 50 different combinations of user IDs, with each combination ranging in length from 1 to 20 values.
Examples of these combinations include:
- 56,20,59,11,70,66,54,32,21,4,12
- 0,58,1,64
- 54
- 10,31,13,16
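A CSV of this shape could be produced roughly as follows. The function names, the seed, and the choice to keep IDs distinct within a combination are assumptions made for illustration; the post does not describe how the file was built.

```python
import csv
import random


def make_combinations(num_combinations=50, max_length=20, seed=7):
    """Return lists of user IDs, each with 1 to max_length values."""
    rng = random.Random(seed)
    combinations = []
    for _ in range(num_combinations):
        length = rng.randint(1, max_length)
        # Assumption: IDs are distinct within a combination.
        combinations.append(rng.sample(range(0, 101), length))
    return combinations


def write_combinations(path, combinations):
    """Write one combination per CSV row."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(combinations)
```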
The test involved running Solr queries with both OR and AND conditions using these combinations. Examples of the queries executed include:
FILTER QUERY – OR:
q=*:*
&fq=user_id:(0 OR 58 OR 1 OR 64)
FILTER QUERY – AND:
q=*:*
&fq=user_id:(10 AND 31 AND 13 AND 16)
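The filter queries shown above can be assembled from a combination of IDs with a helper along these lines. The function names are hypothetical; only the query syntax is taken from the examples.

```python
from urllib.parse import urlencode


def build_filter_query(user_ids, operator):
    """Join user IDs into a filter such as user_id:(0 OR 58 OR 1 OR 64)."""
    if operator not in ("OR", "AND"):
        raise ValueError(f"unsupported operator: {operator}")
    joined = f" {operator} ".join(str(user_id) for user_id in user_ids)
    return f"user_id:({joined})"


def build_query_string(user_ids, operator):
    """URL-encode the q and fq parameters for a select request."""
    return urlencode({"q": "*:*", "fq": build_filter_query(user_ids, operator)})
```

For example, `build_filter_query([10, 31, 13, 16], "AND")` yields the AND filter from the example above.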
For the test, the 50 combinations listed in the CSV were used as filter queries across all collections to ensure consistency in the performance comparison.
A Python script was then developed to automatically run these 50 queries against each collection and calculate the average execution time for each query set.
Here are the results:
SINGLE VALUED FIELD
MULTI-VALUED FIELD
- Average Execution Time represents the average execution time for each query, measured in milliseconds (ms). This metric was captured using a Python script with the “time” library, recording the time elapsed from just before the query was sent until the response was received.
- Average Solr QTime is the average query time reported by Solr, measured in milliseconds (ms). It reflects the amount of time Solr spent processing the request.
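The two metrics could be collected with a loop similar to the sketch below. It is not the script used for the experiment: the function names and the injectable `fetch` parameter are assumptions, and `time.perf_counter` is one reasonable choice from the `time` module. The QTime value is read from the `responseHeader` of Solr's JSON response.

```python
import json
import time
import urllib.request
from statistics import mean
from urllib.parse import urlencode


def fetch_json(url):
    """Send a GET request and decode the JSON response."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def run_query_set(select_url, filter_queries, fetch=fetch_json):
    """Return (average elapsed ms, average Solr QTime ms) for a query set."""
    elapsed_ms = []
    qtimes_ms = []
    for fq in filter_queries:
        url = f"{select_url}?{urlencode({'q': '*:*', 'fq': fq})}"
        start = time.perf_counter()  # just before the query is sent
        payload = fetch(url)         # returns once the response is received
        elapsed_ms.append((time.perf_counter() - start) * 1000.0)
        qtimes_ms.append(payload["responseHeader"]["QTime"])
    return mean(elapsed_ms), mean(qtimes_ms)
```

Passing `fetch` as a parameter keeps the timing logic testable without a running Solr instance; in a real run, `select_url` would point at a collection's `/select` handler.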
RANGE QUERIES
In addition to these tests, another experiment was conducted using range queries to further assess performance differences between the field types. An example of a range query used is:
q=*:*
&fq=user_id:[1 TO 10]
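Range filters like the one above could be generated at random with a helper such as this one. The function names, the seed, and the default count of 30 are illustrative assumptions.

```python
import random


def build_range_filter(lower, upper):
    """Format an inclusive range filter such as user_id:[1 TO 10]."""
    return f"user_id:[{lower} TO {upper}]"


def make_range_filters(count=30, low=0, high=100, seed=11):
    """Return range filters with random bounds where lower < upper."""
    rng = random.Random(seed)
    filters = []
    for _ in range(count):
        lower, upper = sorted(rng.sample(range(low, high + 1), 2))
        filters.append(build_range_filter(lower, upper))
    return filters
```

Keep in mind that a string field compares range bounds lexicographically rather than numerically, so the same range may match a different set of documents than it does on an integer field.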
Approximately 30 different range combinations were tested, and the average timing was calculated for each. The results of these range queries are as follows:
Results and Conclusion
The local test results closely reflect the insights gathered from online discussions. As shown in the tables, for both single-valued and multi-valued fields, string fields generally performed slightly better in simple filter queries with OR and AND conditions, although the difference is not substantial.
However, for range queries, integer fields were more efficient in our tests and are therefore the recommended choice when range queries or other numeric operations are required. This result underscores the importance of selecting the field type for Solr filter queries based on the specific queries and operations you need to run.