Still Synonyms + Stopwords?? Mamma mia!

The Context

A brief recap of where the preceding article left off. We had the following synonym and stopword settings:

- synonyms = {“out of warranty”, “oow”}
- stopwords = {“of”}

Both of those filters were configured exclusively at query time; the synonym filter first and then the stopwords filter.

Using the built-in StopFilter, we had a synonym detection issue because of the removal of the “of” term in the query string (e.g. “my device ran out of warranty”). For that reason, we introduced a custom StopFilter subclass which was aware of stopwords in synonyms.

The scenario we describe here is slightly different. Suppose we have the following data:

- synonyms = {test code, tdd, testing}
- stopwords = {my, your, how, to, in}

Here too, we want to manage synonyms and stopwords only at query time.
We have this document indexed:

				
{
    "id": 1,
    "title": "Java programmer: do you want to test your code?"
}

And a query like this:

"how to test code in Java?"

The Problem: missing synonym match

The query parser matches the “test code” synonym in the query and produces a query like this:

				
(title:tdd title:testing PhraseQuery(title:"test code")) title:java

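Unfortunately there’s no match, because the document title contains an intruder: the term “your” sits between “test” and “code”. Stopwords are removed only at query time, so “your” is still in the index and the phrase query for “test code” cannot match.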
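The mismatch can be reproduced with a toy model of the index-time analysis chain. This is only a sketch: the `analyze` and `phrase_matches` helpers are hypothetical stand-ins for the StandardTokenizer + LowerCaseFilter chain and for an exact (slop 0) phrase query, not Lucene code.

```python
import re

def analyze(text):
    """Toy index-time chain: split on letters/digits and lowercase.
    No stop filter here, mirroring the index analyzer in this article."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_matches(tokens, phrase):
    """True when the phrase terms occupy consecutive positions (slop 0)."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

title_tokens = analyze("Java programmer: do you want to test your code?")

# "your" sits between "test" and "code", so the exact phrase is not found
print(phrase_matches(title_tokens, ["test", "code"]))          # False
print(phrase_matches(title_tokens, ["test", "your", "code"]))  # True
```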

A Solution: invisible queries with and without synonym phrases

In the preceding article, we underlined the role of the autoGeneratePhraseQueries flag. It is responsible for creating phrase clauses for all detected multi-term synonyms. When this flag is set to false (or is missing), the generated query won’t contain any phrase, even if a multi-term synonym is detected.

While this is usually not what you would expect, in this specific case it could be a valid alternative for dealing with the mismatch: a first request would require the “synonym phrasing” behaviour, but a second one wouldn’t. The first query would be:

				
(title:tdd title:testing PhraseQuery(title:"test code")) title:java

After receiving an empty response, a second query will be sent, targeting another (similar) field whose field type has the autoGeneratePhraseQueries parameter set to false. That would generate the following query:

				
(title:testing title:tdd (+title:test +title:code)) title:java

and here we would get a match!
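As a rough illustration of why the second form matches, the synonym clause can be modelled as a check on the set of indexed terms. The helper names below are hypothetical, and the model ignores scoring entirely.

```python
import re

def index_terms(text):
    """Toy index-time analysis: tokenize and lowercase, keeping every term."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def synonym_clause_without_phrases(terms):
    """Models (title:testing title:tdd (+title:test +title:code)):
    either single-term synonym, or both 'test' and 'code' anywhere."""
    return "testing" in terms or "tdd" in terms or {"test", "code"} <= terms

doc = index_terms("Java programmer: do you want to test your code?")
print(synonym_clause_without_phrases(doc))  # True: "your" no longer matters

# Order and distance are ignored, which is where the extra recall comes from
print(synonym_clause_without_phrases(index_terms("Code review before you test")))  # True
print(synonym_clause_without_phrases(index_terms("Write code, then relax")))       # False
```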

A couple of notes:

- On the second try, we require both terms (“test” and “code”) to be present, in any order and at any distance from each other, so the increased recall could produce some unexpected results. If we are using the edismax query parser, a “pf” parameter would help move up the results that better match the entered query in terms of proximity and term order.
- We could put the stop filter at index time, but that would violate the precondition: we want pure query-time management.
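To make the “pf” suggestion concrete, here is a small sketch that builds an edismax request. The choice of field for qf and pf, and the phrase slop value passed as ps, are assumptions made for illustration; they would need tuning against real data.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def edismax_params(user_query, field="title_without_synonyms_phrases", phrase_slop=2):
    """Request parameters for an edismax query with a phrase boost.
    The field name and the slop value are illustrative assumptions."""
    return {
        "q": user_query,
        "defType": "edismax",
        "sow": "false",          # keep multi-term synonyms detectable
        "qf": field,             # field used for matching
        "pf": field,             # field used for the proximity boost
        "ps": str(phrase_slop),  # slop applied to the pf phrase
    }

url = "/search?" + urlencode(edismax_params("how to test code in Java"))
print(url)
```

The pf clause only affects ranking: documents where the query terms appear close together and in order score higher, while the set of matching documents is still decided by the main query.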

How do we implement such a search workflow? In Solr, we need a couple of fields: the first uses exactly the field and field type we described in the preceding article; the second is similar, the only difference being the autoGeneratePhraseQueries parameter, which is set to false:

				
<fieldtype
    name="text_with_synonyms_phrases"
    class="solr.TextField" autoGeneratePhraseQueries="true">

    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymGraphFilterFactory"
                synonyms="synonyms.txt"
                ignoreCase="false"
                expand="true"/>
        <filter class="sc.SynonymAwareStopFilterFactory"
                words="stopwords.txt"
                ignoreCase="true"/>
    </analyzer>
</fieldtype>

				
<fieldtype
    name="text_without_synonyms_phrases"
    class="solr.TextField" autoGeneratePhraseQueries="false">

    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymGraphFilterFactory"
                synonyms="synonyms.txt"
                ignoreCase="false"
                expand="true"/>
        <filter class="sc.SynonymAwareStopFilterFactory"
                words="stopwords.txt"
                ignoreCase="true"/>
    </analyzer>
</fieldtype>

<field
    name="title_with_synonyms_phrases"
    type="text_with_synonyms_phrases" .../>
<field
    name="title_without_synonyms_phrases"
    type="text_without_synonyms_phrases" .../>

Then, here is a minimal request handler:

				
<requestHandler name="/search" class="solr.SearchHandler" default="true">
    <lst name="defaults">
        <bool name="sow">false</bool>
        <str name="df">title_with_synonyms_phrases</str>
        <str name="defType">lucene</str>
    </lst>
</requestHandler>

A client would first send a request like this:

				
/search?q=how to test code in Java

And, after receiving an empty response, it will send a second query:

				
/search?q=how to test code in Java&df=title_without_synonyms_phrases
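A client-side version of this two-step workflow might look like the sketch below. It assumes Solr's standard JSON response shape (`response.numFound`) and points df at the title_without_synonyms_phrases field on the retry; the base URL in the usage comment and the injectable `fetch` function are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FALLBACK_FIELD = "title_without_synonyms_phrases"

def http_fetch(url):
    """Default transport: GET the URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)

def search_with_fallback(base_url, user_query, fetch=http_fetch):
    """Query the handler's default field first; if nothing is found,
    retry once with df overridden to the field without synonym phrases."""
    response = None
    for field in (None, FALLBACK_FIELD):
        params = {"q": user_query, "wt": "json"}
        if field is not None:
            params["df"] = field  # df expects a field name, not a field type
        response = fetch(base_url + "?" + urlencode(params))
        if response["response"]["numFound"] > 0:
            break
    return response

# Usage (hypothetical core name):
# search_with_fallback("http://localhost:8983/solr/demo/search",
#                      "how to test code in Java")
```

Passing `fetch` as a parameter keeps the retry logic testable without a running Solr instance.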

Another option, which moves the search workflow to the Solr side, is our CompositeRequestHandler, a Solr component which invokes a chain of RequestHandler instances: a first request handler, targeting title_with_synonyms_phrases, is invoked and, in case of zero results, the same query is sent to another request handler, which targets title_without_synonyms_phrases.

Note for Elasticsearch users: you will find some differences in applying what is described above. Although the auto_generate_phrase_queries attribute is also present in Elasticsearch, it doesn’t have the same effect. What you’re looking for is not an attribute of the field type but a query attribute [2] [3], called auto_generate_synonyms_phrase_query.
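For instance, a match query body with the flag turned off could be built as in this sketch. The `title` field name is an assumption carried over from the Solr example, and the helper name is hypothetical.

```python
import json

def match_without_synonym_phrases(field, text):
    """Build a match query that does not turn multi-term synonyms
    into phrase clauses. The flag is set per query, not per field type."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "auto_generate_synonyms_phrase_query": False,
                }
            }
        }
    }

body = match_without_synonym_phrases("title", "how to test code in Java")
print(json.dumps(body, indent=2))
```

Because the flag lives in the request body, the same field can serve both steps of the workflow: a first query with the default behaviour and a retry with the flag set to false.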

Need Help With This Topic?

If you’re struggling with synonyms and stopwords, don’t worry – we’re here to help! Our team offers expert services and training to help you optimize your search engine and get the most out of your system. Contact us today to learn more!




Andrea Gazzarini
