Apache Solr, Tips And Tricks

Digging in the Solr code: 5 minutes how to

Let’s say you need to write a component, a request handler, or, in general, some piece of custom code that needs to be plugged into Solr. Or, you need to have a deeper understanding of some Lucene/Solr internals, following what happens within the code.

I know: unit tests and integration tests are there to make sure things behave as you would expect. Here, though, I’m talking about something different: while developing, I find it very useful to have a productive debug environment where, using short dev iterations, I can follow step by step what’s happening within the code and take a deep look at how things work behind the scenes.

In my experience, I found that useful in a couple of scenarios:

- I have to write some Solr add-ons: in this case, I want a development environment which allows me to write and debug code as fast as possible
- I have to study some Solr internals: let’s say, for example, I need to check what happens at retrieval time when a field is both docValues="true" and stored="true"; where does Solr get the field value from?

Let’s see how both of them can be accomplished in a few minutes!

Step #1: clone our template repository

Clone the following repository [1]

Once imported into your favourite IDE, the project layout will look like this:

As you can see, the template project provides:

- a custom TokenFilter that simply prints the output tokens to standard output during text analysis. Note this is just an example (useful if you want to debug an analyzer): I could have created a SearchComponent, a Tokenizer or whatever else I needed
- a sample Solr configuration, with a minimal set of things configured
- a test supertype layer (BaseIntegrationTest) and a sample test (Tests) which loads some data, executes a query and then prints out the results

Surprisingly, that’s all! There’s no second step!

Use Case #1: implement, debug and test an add-on

As previously said, the example repository already contains a simple add-on: a TokenFilter that prints to standard output each token produced in the analysis chain. The filter has been declared in the Solr configuration as part of the “text” field type analyzer:

<fieldType name="text" class="solr.TextField">
    <analyzer>
        ...
        <filter class="io.sease.labs.solr.SystemOutTokenFilterFactory"/>
    </analyzer>
</fieldType>

The test class triggers that analyzer because it indexes some documents, so if you run it as a plain JUnit test, you will see the following output:

startOffset=0,endOffset=6,positionIncrement=1,positionLength=1,type=word => Object
startOffset=7,endOffset=15,positionIncrement=1,positionLength=1,type=word => Oriented
startOffset=16,endOffset=24,positionIncrement=1,positionLength=1,type=word => Software
startOffset=25,endOffset=37,positionIncrement=1,positionLength=1,type=word => Construction
startOffset=0,endOffset=6,positionIncrement=1,positionLength=1,type=word => Design
startOffset=7,endOffset=16,positionIncrement=1,positionLength=1,type=word => Patterns:
startOffset=17,endOffset=25,positionIncrement=1,positionLength=1,type=word => Elements
startOffset=26,endOffset=28,positionIncrement=1,positionLength=1,type=word => of
startOffset=29,endOffset=37,positionIncrement=1,positionLength=1,type=word => Reusable
startOffset=38,endOffset=53,positionIncrement=1,positionLength=1,type=word => Object-Oriented
startOffset=54,endOffset=62,positionIncrement=1,positionLength=1,type=word => Software
 
DOC 1 
id = 1
title = Object Oriented Software Construction

DOC 2 
id = 2
title = Design Patterns: Elements of Reusable Object-Oriented Software
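
To make the idea behind the filter concrete, here is a minimal, dependency-free sketch of a pass-through "printing" stage. It is not the code from the template repository and it does not use the Lucene API: the Token class, the whitespace tokenize method and the printing decorator are all hypothetical stand-ins, assumed only for illustration. The point is the shape of the technique: the stage forwards every token unchanged and prints its attributes as a side effect.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SystemOutFilterSketch {

    /** Hypothetical stand-in for the per-token attributes shown in the output above. */
    public static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;
        final int positionIncrement;
        final int positionLength;
        final String type;

        Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = 1; // assumption: each token advances one position
            this.positionLength = 1;    // assumption: each token spans one position
            this.type = "word";
        }
    }

    /** Splits on whitespace while tracking character offsets (assumed tokenizer). */
    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<Token>();
        int i = 0;
        while (i < text.length()) {
            // Skip the whitespace that separates tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token characters.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(text.substring(start, i), start, i));
            }
        }
        return tokens;
    }

    /** Formats a token the same way as the output shown above. */
    public static String describe(Token t) {
        return "startOffset=" + t.startOffset
                + ",endOffset=" + t.endOffset
                + ",positionIncrement=" + t.positionIncrement
                + ",positionLength=" + t.positionLength
                + ",type=" + t.type
                + " => " + t.term;
    }

    /** Pass-through stage: prints each token, then hands it on unchanged. */
    public static Iterator<Token> printing(final Iterator<Token> upstream) {
        return new Iterator<Token>() {
            @Override
            public boolean hasNext() {
                return upstream.hasNext();
            }

            @Override
            public Token next() {
                Token t = upstream.next();
                System.out.println(describe(t));
                return t;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Token> chain =
                printing(tokenize("Object Oriented Software Construction").iterator());
        while (chain.hasNext()) {
            chain.next(); // a downstream consumer would use the token here
        }
    }
}
```

Running main on the first sample title prints the same four lines that open the output above. In a real add-on the same role would be played by a Lucene TokenFilter subclass, and a breakpoint inside the equivalent of next() is where the debugger would stop.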

If you put a breakpoint in the token filter and re-run the Tests class in debug mode, the debugger will stop at that line as expected:

Use Case #2: debugging Solr internals

In this case there’s no custom code because, remember, the goal is to investigate some Solr internals. Specifically, the question I have to answer in this example is: assuming we have a field

<field name="myfield" type="string" docValues="true" stored="true"/>

and a request

q=...&...&fl=myfield

Where does Solr get the field value from [1]?

The first thing I have to do is make a couple of changes to the project:

- schema.xml: add the field definition above
- Tests class: change the query parameters (adding fl=myfield) and add some value for the myfield field in the indexed documents.

Now, a premise: since the goal of this blog post is not to answer the question above, we will skip all the investigation phases needed to understand the overall query execution flow and to detect the right place where we will put the breakpoint.

After some investigation, we understand that the RetrieveFieldOptimizer class plays a fundamental role in that process (this class, like the optimisation itself, was introduced in Solr 7.), so let’s open it and put some breakpoints:

As you can see, the name and the intent of that class are quite clear, but I still want to see what happens at runtime: let’s start the Tests class in debug mode and, as expected, execution stops at the breakpoint.

I can see the field “myfield” has been collected in the “storedFields” set, while the dvFields (DocValues fields) set is empty, even if the field has the docValues flag enabled. So that probably suggests something…

Moving forward, we arrive at the optimize method, where we meet the optimisation described in SOLR-8344 [3]:

Again, this is just an example and the goal here is not to describe the findings; however, briefly, it says that if all requested fields

- have the docValues and stored flags enabled
- are not multivalued

then Solr retrieves the values only from docValues.
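
The rule just described can be written down as a small predicate. What follows is only a sketch of the rule as summarised in this post, not the code of the Solr class discussed above: FieldDef and canUseDocValuesOnly are hypothetical names, and the handling of an empty field list is an assumption of mine.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DocValuesOnlyRuleSketch {

    /** Hypothetical, simplified view of a schema field definition. */
    public static final class FieldDef {
        final String name;
        final boolean stored;
        final boolean docValues;
        final boolean multiValued;

        public FieldDef(String name, boolean stored, boolean docValues, boolean multiValued) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
            this.multiValued = multiValued;
        }
    }

    /**
     * Returns true when every requested field is stored, has docValues
     * and is single-valued, i.e. when values could be read from docValues only.
     */
    public static boolean canUseDocValuesOnly(List<FieldDef> requestedFields) {
        if (requestedFields.isEmpty()) {
            return false; // assumption: with nothing requested there is nothing to optimise
        }
        for (FieldDef field : requestedFields) {
            boolean storedAndDocValues = field.stored && field.docValues;
            if (!storedAndDocValues || field.multiValued) {
                return false; // one non-qualifying field disables the shortcut
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FieldDef myfield = new FieldDef("myfield", true, true, false);
        FieldDef tags = new FieldDef("tags", true, true, true); // hypothetical multivalued field

        System.out.println(canUseDocValuesOnly(Collections.singletonList(myfield))); // true
        System.out.println(canUseDocValuesOnly(Arrays.asList(myfield, tags)));       // false
    }
}
```

With the field definition used in this example (stored, docValues, single-valued) and fl=myfield, the predicate holds, which matches the behaviour described above.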

Need Help With This Topic?

If you’re struggling with the debug environments, don’t worry – we’re here to help! Our team offers expert services and training to help you optimize your Solr search engine and get the most out of your system. Contact us today to learn more!

Andrea Gazzarini
