Bloomberg Sponsorship Spotlight: Our Latest Apache Solr Contributions

June 25, 2026
9 mins read

Over the past few months, we’ve been able to dedicate focused engineering time to upstreaming a substantial body of work to Apache Solr, especially in the areas of vector search, quantization, and LLM/embedding integration.

Acknowledgements

This work is sponsored by Bloomberg, and we want to thank them for their support and collaboration, especially Andrey Ukhanov and Ken Laporte, who made this initiative possible. Sponsorship like this also funds the less visible (but essential) parts of upstreaming: solid fixes, tests, review iterations, documentation, and follow-ups, so that changes land cleanly and remain maintainable for the community.

And thanks to the contributors who drove these improvements:

Ilaria Petreti
Anna Ruggero
Kevin Liang
Elia Porciani
Alessandro Benedetti

Finally, thanks to the community members who reviewed, discussed, and helped land these changes through the usual Apache process.

Below is a summary of the Bloomberg-sponsored contributions we’ve completed recently (plus one that is very close to landing), with credit to the people who drove each change.

More reliable Text-to-Vector updates (including partial/atomic updates)

Text To Vector Update Processor atomic update bug

[SOLR-17843]

If you use Solr’s Text-to-Vector capabilities to enrich documents with embeddings, partial updates (atomic updates) are a common pattern: index first, then compute vectors later, or recompute vectors when only one field changes.
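For readers unfamiliar with the pattern, an atomic update sends only the fields that changed, each wrapped in a modifier such as `set`. The sketch below builds such a payload. The document ID and the `body` field are illustrative assumptions, not details taken from the issue.

```python
import json

def atomic_update(doc_id, changes):
    """Build one atomic-update document: every changed field is wrapped in a "set" modifier."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return doc

# Hypothetical document and field names, for illustration only.
payload = [atomic_update("doc-1", {"body": "Revised text to re-embed"})]
print(json.dumps(payload, indent=2))
```

A payload of this shape is what a client would send as JSON to the collection's update handler. Only the listed fields change, so the update processor chain has to cope with a document that arrives incomplete.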

Kudos to Ilaria Petreti for driving this fix.

HuggingFace embeddings: restore compatibility with updated endpoints

Fix: LangChain4j + HuggingFace endpoint change

[SOLR-18000]

This issue updates the LangChain4j dependency to a version that includes the base URL change, restoring out-of-the-box compatibility for HuggingFace embeddings.

Kudos to Ilaria Petreti for driving this fix.

Vector search tuning and performance improvements

A large part of the recent work focuses on making Solr’s HNSW-based KNN search more controllable and more practical in real workloads (filters, hybrid retrieval, latency constraints).

ACORN-based filtered vector search: expose a tuning parameter

[SOLR-17815]

Filtered vector search is often challenging: pre-filtering and post-filtering approaches can become slow or unreliable depending on how selective the filter is.

ACORN is an algorithm designed to optimize filtered vector search by adapting both the HNSW graph and the search procedure to work efficiently with structured filters.

With this contribution, Solr supports passing Lucene’s filteredSearchThreshold directly as a KNN query parameter. The threshold controls when the ACORN optimization is applied: if the percentage of documents matching the filter falls below the configured threshold, Solr automatically uses ACORN for that query.
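As a rough illustration of the rule described above, the sketch below renders a `{!knn}` query carrying the threshold and mirrors the selectivity check in plain Python. The field name `vector`, the threshold value 60, and both helper functions are assumptions made for the example. The actual decision happens inside Lucene, not in client code.

```python
def knn_query(field, vector, top_k, **params):
    """Render a Solr {!knn} local-params query string."""
    parts = [f"f={field}", f"topK={top_k}"]
    parts += [f"{name}={value}" for name, value in params.items()]
    vector_text = "[" + ", ".join(str(x) for x in vector) + "]"
    return "{!knn " + " ".join(parts) + "}" + vector_text

def uses_acorn(matching_docs, total_docs, threshold_percent):
    """Rule as stated in the text: ACORN applies when the share of
    documents matching the filter is below the threshold."""
    return 100.0 * matching_docs / total_docs < threshold_percent

# Hypothetical field name and threshold value.
q = knn_query("vector", [0.1, 0.2, 0.3], 10, filteredSearchThreshold=60)
print(q)
print(uses_acorn(50, 1000, 60))   # selective filter: 5% of documents match
print(uses_acorn(900, 1000, 60))  # broad filter: 90% of documents match
```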

This contribution has been finalised, reviewed, and merged. Kudos to Anna Ruggero!

Seeded KNN: let a query guide the HNSW entry points

[SOLR-17813]

With this addition, Solr supports seeded k-nearest-neighbors (kNN) searches: you can provide a specific “seed” query that is used to narrow down the candidate set before running the vector similarity computation. In practice, this helps steer the search toward a more relevant region of the index and can improve both precision and performance, especially in hybrid setups where you already have a strong lexical or structured signal.
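To see why the choice of entry points matters, consider a toy example: a greedy walk over a tiny single-layer graph, started once from an arbitrary node and once from a node a seed query might have supplied. This is a deliberately simplified sketch, not Lucene's HNSW implementation. The chain-shaped graph, the 1-D positions, and the function name are all assumptions made for illustration.

```python
def greedy_search(graph, positions, query, entry_points):
    """Greedy walk: move to the closest unvisited neighbor until none improves.
    Returns (nearest node found, number of nodes whose distance was examined)."""
    def dist(node):
        return abs(positions[node] - query)

    current = min(entry_points, key=dist)
    visited = set(entry_points)
    while True:
        candidates = [n for n in graph[current] if n not in visited]
        visited.update(candidates)
        best = min(candidates, key=dist, default=current)
        if dist(best) >= dist(current):
            return current, len(visited)
        current = best

# Ten nodes on a line; each node is linked to its immediate neighbors.
positions = {i: float(i) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}

unseeded = greedy_search(graph, positions, 8.2, entry_points=[0])
seeded = greedy_search(graph, positions, 8.2, entry_points=[7])
print(unseeded, seeded)
```

Both runs find the same nearest node, but the seeded run examines far fewer nodes because it starts close to the answer.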

This contribution has been finalised, reviewed, and merged. Kudos to Ilaria Petreti!

Early termination for KNN: control tail latency

[SOLR-17814]

This contribution adds an early termination strategy that can improve kNN search performance by allowing searches to stop earlier under specific conditions. It introduces PatienceKnnVectorQuery, a new variant of the kNN vector query: unlike the standard approach, it can exit early when the HNSW queue remains saturated beyond a configurable saturation threshold. This helps reduce worst-case latency while maintaining useful result quality.
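The idea of stopping once the result queue stays saturated can be sketched in a few lines. The version below is a simplified model, not Lucene's actual collector: it treats "saturation" as the share of the top-k left unchanged by a search step, and stops after a number of consecutive saturated steps. The function name, the `patience` counter, and the sample numbers are assumptions for illustration.

```python
def steps_until_stop(replaced_per_step, k, saturation_threshold, patience):
    """Return the 1-based step at which the search stops early,
    or None if it never stops early.

    replaced_per_step[i] is how many of the top-k entries changed in step i.
    """
    saturated_streak = 0
    for step, replaced in enumerate(replaced_per_step, start=1):
        saturation = (k - replaced) / k  # share of the top-k left unchanged
        if saturation >= saturation_threshold:
            saturated_streak += 1
        else:
            saturated_streak = 0  # the queue changed too much; start over
        if saturated_streak >= patience:
            return step
    return None

# Hypothetical trace: large changes early on, then the top-k settles.
trace = [10, 6, 3, 1, 0, 2, 0, 1, 0, 0, 0]
stop_step = steps_until_stop(trace, k=10, saturation_threshold=0.9, patience=3)
print(stop_step)
```

In this trace the search stops before processing the final steps, which is where the tail-latency saving comes from. A higher threshold or a larger patience value trades that saving for more exploration.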

This contribution has been finalised, reviewed, and merged. Kudos to Ilaria Petreti!

Independent control of candidate exploration vs returned results

[SOLR-17928]

A common tuning issue in vector search is that improving recall often requires exploring more candidates (efSearch), but you don’t necessarily want to return more results (topK).

Solr 10 introduces efSearchScaleFactor for the KNN query parser:

- it lets you explore more candidates while still returning exactly topK
- efSearch is computed internally as efSearchScaleFactor * topK (default 1.0 → efSearch == topK)

This gives a clean “accuracy vs cost” knob without forcing downstream changes in result set size.
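A quick worked example of the formula above. The helper name is hypothetical, and truncating the product to an integer is an assumption of this sketch rather than documented Solr behavior.

```python
def ef_search(top_k, ef_search_scale_factor=1.0):
    """Candidates explored, following the formula efSearchScaleFactor * topK."""
    # Truncation to an integer is an assumption made for this sketch.
    return int(ef_search_scale_factor * top_k)

for factor in (1.0, 2.0, 4.0):
    print(f"topK=10, efSearchScaleFactor={factor} -> efSearch={ef_search(10, factor)}")
```

The number of returned results stays at topK in every case. Only the amount of graph exploration grows with the factor.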

Thanks to Elia Porciani and others who contributed and reviewed this work.

Quantization support: smaller, faster vector indexes (with clear trade-offs)

Quantization is one of the most impactful practical tools for vector search: it shrinks the memory footprint and can reduce query time, at the cost of some loss in precision (depending on method and configuration).

Scalar quantization support

[SOLR-17780]

This contribution introduces a new schema field type, ScalarQuantizedDenseVectorField, which enables scalar quantization for vector data. Scalar quantization stores each floating-point value with a smaller number of bits, either 4 or 7 (the default), which can reduce memory consumption and speed up vector search by making vector indexes more compact.
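To make the trade-off concrete, here is a minimal sketch of scalar quantization using a simple min/max linear mapping. It is not the scheme Lucene uses internally. The function names and sample vector are assumptions for illustration. For scale: a 32-bit float takes four bytes per dimension, while a 7-bit code fits in a single byte.

```python
def scalar_quantize(vector, bits=7):
    """Map each float to an integer code in [0, 2**bits - 1]
    using the vector's own min/max range (a simplifying assumption)."""
    levels = (1 << bits) - 1
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]

vector = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, lo, scale = scalar_quantize(vector, bits=7)
restored = dequantize(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(vector, restored))
print(codes, max_error)
```

The reconstruction error per component is bounded by half a quantization step, which is why fewer bits (a coarser step) means a larger loss in precision.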

This contribution has been finalised, reviewed, and merged. Kudos to Kevin Liang!

Binary quantization support

[SOLR-17812]

This contribution introduces a new schema field type, BinaryQuantizedDenseVectorField, which enables binary quantization for vector data. Binary quantization reduces each floating-point value to a single bit. The benefits are similar to those of scalar quantization (lower memory usage and faster vector search), with a more extreme accuracy/quality trade-off.
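The single-bit idea can be illustrated with a naive sign-based encoding compared by Hamming distance. This is a simplified sketch and should not be read as the algorithm Lucene implements. The function names and sample vectors are assumptions for illustration.

```python
def binarize(vector):
    """Pack one bit per dimension into an int: 1 if the component is positive, else 0."""
    bits = 0
    for x in vector:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")

a = binarize([0.3, -0.2, 0.8, -0.1])
b = binarize([0.25, -0.4, 0.6, 0.2])   # points in roughly the same direction as a
c = binarize([-0.5, 0.9, -0.7, 0.4])   # points in roughly the opposite direction
print(hamming(a, b), hamming(a, c))
```

Similar vectors differ in few bits and dissimilar ones in many, but all magnitude information is discarded. That is the source of the sharper quality trade-off.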

This contribution is close to being merged. Kudos to Kevin Liang!

Terminology and parameter renames

[SOLR-17927]

Naming matters a lot once features become widely used. This work improves clarity and aligns terminology as Solr evolves:

- HNSW parameters:
  - hnswMaxConnections → hnswM
  - hnswBeamWidth → hnswEfConstruction
- Terminology: “neural search” → “vector search”
- Module rename: llm → language-models
- Backward compatibility: legacy parameter names are still accepted, but Solr logs warnings when they’re used.
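If you maintain configuration outside Solr, a small helper can flag legacy names before they reach the server. The sketch below is a hypothetical client-side utility, not part of Solr. It relies only on the two parameter renames listed above.

```python
import warnings

# Renames taken from the list above: legacy name -> current name.
LEGACY_TO_CURRENT = {
    "hnswMaxConnections": "hnswM",
    "hnswBeamWidth": "hnswEfConstruction",
}

def normalize_params(params):
    """Return a copy of params with legacy HNSW names replaced, warning on each rename."""
    normalized = {}
    for name, value in params.items():
        current = LEGACY_TO_CURRENT.get(name, name)
        if current != name:
            warnings.warn(f"'{name}' is a legacy name; use '{current}' instead")
        normalized[current] = value
    return normalized

# Hypothetical field-type settings mixing an old and a new name.
print(normalize_params({"hnswMaxConnections": 16, "hnswEfConstruction": 100}))
```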

Kudos to Ilaria Petreti and the others who contributed additional work!

Documentation improvements: better examples for real usage

Tutorial updates: seeded KNN + early termination examples

[SOLR-3797]

Features are only useful if people can adopt them quickly and correctly. This documentation update improves the vector search tutorial by adding concrete examples for:

- seeded vector search
- early termination parameters

It also refines formatting to make copy/paste and incremental learning easier.

Lisa Biella
