Dense vector search was introduced in Apache Solr 9.0 in 2022, and since then it has seen substantial adoption from the community.
Text vectorisation had to happen outside Solr, as there was no support for transparently encoding text to vectors within the search engine.
Apache Solr 9.8 changes this, introducing a module that allows interaction with well-known large language model providers such as OpenAI, Cohere, Hugging Face, and Mistral AI via the open-source library LangChain4j.
Traditional (keyword) Search Problems
One of the biggest problems we’ve seen in many traditional search engines is the ‘vocabulary mismatch problem’:
incorrect or non-exhaustive search results are returned because the terms used at query time (the lexicon) don’t match the terms used in the documents of the corpus, i.e. queries and documents use different terms to describe the same (or closely related) concepts.
Vector Search
A solution to the problem is to use Large Language Models (specifically fine-tuned for sentence similarity) to encode text to a numerical vector, in a way that sentences that are semantically similar are encoded to vectors that are close to each other in the vector space.
In this way, searching for content that is semantically close to a query sentence maps to running a k-nearest-neighbor query on vectors.
Up to Solr 9.7, as you can see from the diagram, text vectorisation had to happen outside Solr: the search engine was only able to handle vectors, and didn’t support end-to-end semantic search transparently.
Semantic Search from Apache Solr 9.8
This changes with Apache Solr 9.8: with the introduction of the LLM module, you can configure Solr to talk with an external service to do the text vectorisation for you, offering a transparent semantic search experience end-to-end.
Once configured with a vectorisation model (and we’ll see shortly how to do it), Solr is able to encode text to vectors (both at query and indexing time) and run vector search to find results relevant to the user’s information need.
llm module (from Apache Solr 9.8, January 2025)
This module:
- stores the configuration to access text vectorisation APIs external to Solr (LangChain4j is used internally to interact with such APIs)
- implements a query parser (that encodes the query to a vector and then builds a knn query)
- implements an update request processor to vectorise the content of textual fields
To enable the module you can follow the standard Solr documentation:
- bin/solr start -e techproducts -Dsolr.modules=llm
- add a <str name="modules"> tag to your solr.xml (see the sketch after this list)
- environment variable SOLR_MODULES (e.g. in solr.in.sh or solr.in.cmd)
- system property solr.modules
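For the solr.xml option, a minimal sketch might look like this (only the modules entry is shown; the rest of your solr.xml stays as it is):

<solr>
  <str name="modules">llm</str>
</solr>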
Once enabled you can configure and use its internal components (the query parser and the update request processor).
Models
A text-to-vector model has the responsibility of encoding text to a vector.
At the time of writing, only external models are supported: the text encoding doesn’t happen in the Solr JVM nor locally, but only through external APIs.
A model (with the parameters to access it) is described via a JSON payload: the Solr vectorisation model specifies the parameters to access the APIs; the model doesn’t run internally in Solr.
A model is described by these parameters:
class
| Required | Default: none |
The model implementation. Accepted values:
- dev.langchain4j.model.huggingface.HuggingFaceEmbeddingModel
- dev.langchain4j.model.mistralai.MistralAiEmbeddingModel
- dev.langchain4j.model.openai.OpenAiEmbeddingModel
- dev.langchain4j.model.cohere.CohereEmbeddingModel
name
| Required | Default: none |
The identifier of your model, used by any component that intends to use the model (e.g. the knn_text_to_vector query parser).
params
| Optional | Default: none |
Each model class has potentially different parameters. Many are shared, but for the full set of parameters of the model you are interested in, please refer to the official documentation of the LangChain4j version included in Solr: Vectorisation Models in LangChain4j.
Currently four models are supported: Hugging Face, MistralAI, OpenAI and Cohere.
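As an illustration, a model definition for OpenAI might look like the JSON below. The params follow the LangChain4j OpenAiEmbeddingModel builder; the name, apiKey, and modelName values here are placeholders (the name ‘dummy-1’ is reused in the snippets later in this post), so adapt them to your setup:

{
  "class": "dev.langchain4j.model.openai.OpenAiEmbeddingModel",
  "name": "dummy-1",
  "params": {
    "apiKey": "sk-your-api-key",
    "modelName": "text-embedding-3-small"
  }
}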
To upload a model from a file (e.g. /path/myModel.json), please run:
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/text-to-vector-model-store' --data-binary "@/path/myModel.json" -H 'Content-type:application/json'
To view all models:
http://localhost:8983/solr/techproducts/schema/text-to-vector-model-store
To view a model (‘model1’):
http://localhost:8983/solr/techproducts/schema/text-to-vector-model-store/model1
To delete a model (‘model1’):
curl -XDELETE 'http://localhost:8983/solr/techproducts/schema/text-to-vector-model-store/model1'
Indexing Time
The ‘llm’ module introduces ‘solr.llm.textvectorisation.update.processor.TextToVectorUpdateProcessor’, a component that takes a Solr document in input and enriches it with a vector encoding of a textual field. It can be configured in an update request processor chain, for example like this (the chain name and surrounding processors are illustrative; the vectorisation processor is registered via its factory):
<updateRequestProcessorChain name="textToVector">
  <processor class="solr.llm.textvectorisation.update.processor.TextToVectorUpdateProcessorFactory">
    <str name="inputField">_text_</str>
    <str name="outputField">vector</str>
    <str name="model">dummy-1</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Adding this component to an update request processor chain means that all documents you index will be enriched with a vector, encoded from the ‘inputField’ using the model in the parameters.
The content of your document (‘inputField’) is sent to a remote hosted model. Be careful with your performance and privacy requirements!
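For the output field to be indexed and searchable, your schema needs a dense vector field. A minimal sketch, assuming a model that produces 384-dimensional embeddings (adjust vectorDimension to your model’s embedding size):

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="384" similarityFunction="cosine"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>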
Enrich Documents with Vectors on a Second Pass
Naive Approach
Vectorisation is slow (especially when network latency is added to the picture), so you may want to first index your documents and then, in the background, add the vectorised fields.
Unfortunately, right now Solr doesn’t offer the capability of building the vector data structures in the background after the traditional indexing is complete (i.e. making documents searchable first and vectorising them later).
There are still some workarounds that can mitigate the situation a bit.
A first approach is to define two update request processor chains, identical except that one adds the vectorisation.
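A minimal sketch of the first, ‘no-vectorisation’ chain (the processor list is illustrative; keep whatever processors you already use):

<updateRequestProcessorChain name="no-vectorisation">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>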
You first target the ‘no-vectorisation’ chain and index all your documents.
The second, ‘vectorisation’ chain is identical, plus the vectorisation processor:

<updateRequestProcessorChain name="vectorisation">
  <processor class="solr.llm.textvectorisation.update.processor.TextToVectorUpdateProcessorFactory">
    <str name="inputField">_text_</str>
    <str name="outputField">vector</str>
    <str name="model">dummy-1</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Once it’s finished, you re-index all your documents targeting the second chain.
The effect you’ll see is that vectors are added incrementally, while your documents become searchable lexically in a shorter amount of time.
Internally, Solr re-indexes everything, so there’s a lot of data traffic (data is sent again to Solr), a lot of CPU waste (text is processed again and data structures are rebuilt), a lot of deletions behind the scenes (each updated document is deleted and added again), and a lot of segment merges (as new segments are created).
Be careful with this approach, as it affects the number of segments, deleted docs, and merges that happen behind the scenes in Solr. For small-scale systems this may be negligible, but as you scale up, this downside can become extremely significant.
Partial Updates
A slightly better solution is to use partial updates: you avoid sending the full document again to Solr when you want to add vectors.
This time the chains look slightly different: the ‘no-vectorisation’ chain looks exactly the same, and you first index all your documents in a traditional way, executing all the update request processors you like.
Then, when you want to run the second pass to add the vectors, you target a new chain that only contains the text-to-vector processor (+ the mandatory ones).
<updateRequestProcessorChain name="vectorisation">
  <processor class="solr.llm.textvectorisation.update.processor.TextToVectorUpdateProcessorFactory">
    <str name="inputField">_text_</str>
    <str name="outputField">vector</str>
    <str name="model">dummy-1</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
At this point, you can define a boolean field in your schema that keeps track of which documents have been vectorised already, and just partially update that field, targeting the vectorisation chain. Re-indexing all your docs then becomes sending, for each document, a partial update such as: {"id":"mydoc","vectorised":{"set":true}}
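A sketch of what such a request might look like, assuming the chain above is named ‘vectorisation’ and a boolean ‘vectorised’ field exists in the schema (the update.chain parameter selects the chain per request):

curl -X POST 'http://localhost:8983/solr/techproducts/update?update.chain=vectorisation&commit=true' -H 'Content-type:application/json' -d '[{"id":"mydoc","vectorised":{"set":true}}]'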
Each partial update just sets the new field to ‘true’, with no networking waste. Solr sets the field and runs the vectorisation chain, which takes the field to be vectorised as input and encodes it for each document.
The benefits in comparison to the naive solution are still limited, as all the deletes and inserts happen anyway, but at least we avoid transferring the full documents over the network.
A big nice-to-have here would be an equivalent of in-place updates for vectors.
Interested in helping or sponsoring new Solr features like in place vectorisation? Reach out to us!
Query Time
Once you have your vectors indexed, running a natural language query is quite simple with the ‘org.apache.solr.llm.texttovector.search.TextToVectorQParserPlugin’.
First you define it in your solrconfig.xml:
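A minimal sketch of such a definition (the class name follows the plugin named above; the parser name must match the one you use in queries):

<queryParser name="knn_text_to_vector" class="org.apache.solr.llm.texttovector.search.TextToVectorQParserPlugin"/>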
Then, at query time, you just use it as you would for the knn query parser, but rather than passing the vector, you pass the natural language query and a reference to the model you want to use for vectorisation.
?q={!knn_text_to_vector model=modelName f=vectorField topK=10}a query
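For example, a full request might look like this (the model and field names refer to the illustrative ‘dummy-1’ model and ‘vector’ field used earlier; --data-urlencode takes care of encoding the local params):

curl 'http://localhost:8983/solr/techproducts/select' --data-urlencode 'q={!knn_text_to_vector model=dummy-1 f=vector topK=10}memory card for cameras'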
Behind the scenes, Solr will use the model to vectorise your query text and then run a vector search.
The text of your query is sent to a remote hosted model. Be careful with your performance and privacy requirements!
What's next?
There’s plenty of future work we want to contribute to enhance this new, exciting functionality:
- Local models (to run on your local machine or potentially in the Solr JVM)
- In-place vectorisation updates
- Vector search optimisations
- Retrieval augmented generation
- Conversational search (including short/long-term memory)
- Query/document expansion
- LLM highlighter/explainer
- From natural language to structured queries
- Better hybrid search