Bloomberg Sponsorship Spotlight: Our Latest Apache Solr Contributions

Over the past few months, we’ve been able to dedicate focused engineering time to upstreaming a substantial amount of work to Apache Solr, especially in the areas of vector search, quantization, and LLM/embedding integration.

Acknowledgements

This work is sponsored by Bloomberg, and we want to sincerely thank Bloomberg for its support and collaboration, especially Andrey Ukhanov and Ken Laporte, who made this initiative possible. Sponsorship like this also makes it possible to do the less visible (but essential) parts of upstreaming: solid fixes, tests, review iterations, documentation, and follow-ups, so that changes land cleanly and remain maintainable for the community.

And thanks to the contributors who drove these improvements:

  • Ilaria Petreti

  • Anna Ruggero

  • Kevin Liang

  • Elia Porciani

  • Alessandro Benedetti

Finally, thanks to the community members who reviewed, discussed, and helped land these changes through the usual Apache process.

Below is a summary of the Bloomberg-sponsored contributions we’ve completed recently (plus one that is very close to landing), with credit to the people who drove each change.

More reliable Text-to-Vector updates (including partial/atomic updates)

Text To Vector Update Processor atomic update bug

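If you use Solr’s Text-to-Vector capabilities to enrich documents with embeddings, partial updates (atomic updates) are a common pattern: index first and compute vectors later, or recompute vectors when only one field changes. This fix addresses a bug in the Text To Vector Update Processor that affected atomic updates, so embeddings are now handled reliably for partially updated documents.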
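As an illustration, an atomic update that changes one text field and sends it through the Text-to-Vector processor could be built as below. This is a minimal sketch. The collection `books`, the field `title` and the processor chain name `textToVector` are hypothetical and must match your own solrconfig.xml and schema. The `set` modifier and the `update.chain` request parameter are standard Solr features.

```python
import json
from urllib.parse import urlencode


def build_atomic_update(doc_id: str, field: str, new_value: str) -> str:
    """Return a JSON body that atomically sets a single field on one document."""
    # The "set" modifier replaces this field's value; Solr rebuilds the rest of the document.
    return json.dumps([{"id": doc_id, field: {"set": new_value}}])


def build_update_url(base_url: str, collection: str, chain: str) -> str:
    """Return the update endpoint URL, routed through a named processor chain."""
    # update.chain selects an UpdateRequestProcessorChain defined in solrconfig.xml.
    params = urlencode({"update.chain": chain, "commit": "true"})
    return f"{base_url}/{collection}/update?{params}"


# Hypothetical collection, chain and field names: adapt them to your configuration.
url = build_update_url("http://localhost:8983/solr", "books", "textToVector")
body = build_atomic_update("42", "title", "A new title to embed")
print(url)
print(body)
```

Atomic updates still require the other fields to be stored or to have docValues, so that Solr can reconstruct the full document before re-indexing it.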

Kudos to Ilaria Petreti for driving this fix.

HuggingFace embeddings: restore compatibility with updated endpoints

Fix: LangChain4j + HuggingFace endpoint change

HuggingFace changed the base URL of its endpoints. This fix upgrades the LangChain4j dependency to a version that includes the base URL change, restoring out-of-the-box compatibility for HuggingFace embeddings.

Kudos to Ilaria Petreti for driving this fix.

Vector search tuning and performance improvements

A large part of the recent work focuses on making Solr’s HNSW-based KNN search more controllable and more practical in real workloads (filters, hybrid retrieval, latency constraints).

ACORN-based filtered vector search: expose a tuning parameter

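Filtered vector search is often challenging. Pre-filtering can become slow when the filter is very selective, because the graph traversal has to skip many non-matching documents. Post-filtering can return fewer results than requested, because most of the nearest neighbours may not match the filter.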

ACORN is an algorithm designed to optimize filtered vector search by adapting both the HNSW graph and the search procedure to work efficiently with structured filters.

With this contribution, Solr supports passing Lucene’s filteredSearchThreshold directly as a KNN query parameter. The threshold controls when the ACORN optimization is applied: if the percentage of documents matching the filter falls below the configured threshold, Solr automatically uses ACORN for that query.
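The sketch below builds a filtered KNN request and models the decision rule described above. It is illustrative only. Passing `filteredSearchThreshold` as a local parameter of the `{!knn}` query parser is our assumption about the syntax. The field name `vector`, the filter and the threshold value 60 are hypothetical examples. `uses_acorn` is a simplified model of the rule (filter match percentage below the threshold), not Lucene's code.

```python
from urllib.parse import urlencode


def knn_params(vector, top_k, threshold, filter_query):
    """Build request parameters for a filtered KNN query on a hypothetical 'vector' field."""
    vec = "[" + ", ".join(str(v) for v in vector) + "]"
    return {
        # Assumed syntax: the threshold is passed as a knn local parameter.
        "q": f"{{!knn f=vector topK={top_k} filteredSearchThreshold={threshold}}}{vec}",
        # The filter query restricts which documents the vector search can return.
        "fq": filter_query,
    }


def uses_acorn(matching_docs: int, total_docs: int, threshold: int) -> bool:
    """Simplified model: ACORN applies when the filter matches < threshold percent of docs."""
    match_percentage = 100.0 * matching_docs / total_docs
    return match_percentage < threshold


params = knn_params([0.1, 0.2, 0.3], top_k=10, threshold=60, filter_query="category:books")
print(urlencode(params))
print(uses_acorn(matching_docs=5_000, total_docs=100_000, threshold=60))   # True: 5% match
print(uses_acorn(matching_docs=80_000, total_docs=100_000, threshold=60))  # False: 80% match
```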

This contribution has been finalised, reviewed, and merged. Kudos to Anna Ruggero!

Seeded KNN: let a query guide the HNSW entry points

With this addition, Solr supports seeded k-nearest-neighbors (kNN) searches: you can provide a “seed” query whose top-matching documents are used as entry points into the HNSW graph, instead of starting the traversal from the graph’s default entry point. In practice, this helps steer the search toward a more relevant region of the index and can improve both precision and performance, especially in hybrid setups where you already have a strong lexical or structured signal.
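A seeded request could be built as in the sketch below, which combines a lexical seed with a query vector. The local parameter name `seedQuery`, the field `vector` and the seed `title:solr` are assumptions for illustration. Check the exact syntax in the Solr Reference Guide for your version.

```python
from urllib.parse import urlencode


def seeded_knn_query(vector, top_k, seed_query):
    """Return a KNN query string whose HNSW traversal is seeded by a lexical query."""
    vec = "[" + ", ".join(str(v) for v in vector) + "]"
    # Assumption: the seed is passed as the 'seedQuery' local parameter.
    # Single quotes let the value contain spaces; this sketch does not escape quotes.
    return f"{{!knn f=vector topK={top_k} seedQuery='{seed_query}'}}{vec}"


q = seeded_knn_query([0.1, 0.2, 0.3], top_k=10, seed_query="title:solr")
print(q)
print(urlencode({"q": q}))
```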

This contribution has been finalised, reviewed, and merged. Kudos to Ilaria Petreti!

Early termination for KNN: control tail latency

This contribution adds an early termination strategy that can improve kNN search performance by allowing searches to stop earlier under specific conditions. It introduces PatienceKnnVectorQuery, a new variant of the kNN vector query: unlike the standard approach, it can exit early when the HNSW queue remains saturated beyond a configurable saturation threshold. This helps reduce worst-case latency while maintaining useful result quality.
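The following is a simplified, self-contained model of patience-based early termination. After each visited candidate, it measures how much of the current top-k set survived unchanged. It stops once that ratio has stayed at or above a saturation threshold for a given number of consecutive steps. The `patience` counter and this way of measuring saturation are illustrative choices modelled loosely on the description above. This is not the code of PatienceKnnVectorQuery.

```python
def run_with_patience(topk_snapshots, saturation_threshold, patience):
    """Return how many steps are processed before early termination.

    topk_snapshots is a list of top-k result sets (sets of doc ids), one per
    visited candidate, standing in for the evolving HNSW result queue.
    """
    consecutive_saturated = 0
    previous = None
    for step, current in enumerate(topk_snapshots, start=1):
        # Skip the first step (nothing to compare yet) and empty queues.
        if previous:
            # Fraction of the previous top-k still present after this step.
            saturation = len(previous & current) / len(previous)
            if saturation >= saturation_threshold:
                consecutive_saturated += 1
            else:
                consecutive_saturated = 0
            if consecutive_saturated >= patience:
                return step  # queue stayed saturated long enough: stop early
        previous = current
    return len(topk_snapshots)  # explored every candidate without terminating


snapshots = [{1, 2, 3}, {1, 2, 4}, {1, 2, 4}, {1, 2, 4}, {1, 2, 4}, {1, 2, 5}]
print(run_with_patience(snapshots, saturation_threshold=1.0, patience=2))  # 4
```

A lower threshold or a smaller patience value stops earlier and trades some recall for latency. A larger patience value behaves more like a full search.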

This contribution has been finalised, reviewed, and merged. Kudos to Ilaria Petreti!

Independent control of candidate exploration vs returned results

A common tuning issue in vector search is that improving recall often requires exploring more candidates (efSearch), but you don’t necessarily want to return more results (topK).

Solr 10 introduces efSearchScaleFactor for the KNN query parser:

  • it lets you explore more candidates while still returning exactly topK

  • efSearch is computed internally as efSearchScaleFactor * topK (default 1.0 → efSearch == topK)

This gives a clean “accuracy vs cost” knob without forcing downstream changes in result set size.
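The relationship between the two knobs is simple arithmetic, sketched below. Rounding the product to the nearest integer and rejecting factors below 1.0 are our assumptions, not documented Solr behaviour. The query string uses the `efSearchScaleFactor` parameter named above with a hypothetical field `vector`.

```python
def ef_search(top_k: int, ef_search_scale_factor: float = 1.0) -> int:
    """Number of candidates explored for a given topK and scale factor."""
    if ef_search_scale_factor < 1.0:
        # Assumption: exploring fewer candidates than topK is rejected in this sketch.
        raise ValueError("efSearchScaleFactor must be >= 1.0 in this sketch")
    # Assumption: the product is rounded to the nearest integer.
    return round(ef_search_scale_factor * top_k)


def knn_query(vector, top_k: int, scale: float) -> str:
    """KNN query that explores more candidates but still returns topK results."""
    vec = "[" + ", ".join(str(v) for v in vector) + "]"
    return f"{{!knn f=vector topK={top_k} efSearchScaleFactor={scale}}}{vec}"


print(ef_search(10))       # 10: default factor 1.0, so efSearch == topK
print(ef_search(10, 2.0))  # 20 candidates explored, still 10 results returned
print(knn_query([0.5, 0.5], top_k=10, scale=2.0))
```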

Thanks to Elia Porciani and the others who contributed to and reviewed this work.

Quantization support: smaller, faster vector indexes (with clear trade-offs)

Quantization is one of the most impactful practical tools for vector search: it shrinks the memory footprint of the index and can reduce query time, at the cost of some loss in precision (depending on the method and configuration).

Scalar quantization support

This contribution introduces a new schema field type, ScalarQuantizedDenseVectorField, which enables scalar quantization for vector data. Scalar quantization reduces the precision of floating-point values to a smaller number of bits (either 4 or 7, the default), which can reduce memory consumption and speed up vector search by making vector indexes more compact.
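The core idea of scalar quantization can be shown with a generic min/max scheme that maps each float to an integer in [0, 2^bits - 1]. This is a textbook illustration under simplifying assumptions, not Lucene's implementation, which computes its quantization range differently. The example vector is made up.

```python
def scalar_quantize(vector, bits=7):
    """Map each float to an integer in [0, 2**bits - 1] using the vector's min/max range."""
    if bits not in (4, 7):
        raise ValueError("this sketch mirrors the 4- and 7-bit options only")
    levels = 2 ** bits - 1
    lo, hi = min(vector), max(vector)
    # Width of one quantization step; a constant vector maps entirely to code 0.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in vector]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximately reconstruct the original floats from the integer codes."""
    return [lo + c * scale for c in codes]


vec = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes7, lo7, s7 = scalar_quantize(vec, bits=7)
codes4, lo4, s4 = scalar_quantize(vec, bits=4)
max_err = max(abs(a - b) for a, b in zip(vec, dequantize(codes7, lo7, s7)))
print(codes7, codes4)
print(max_err <= s7 / 2 + 1e-9)  # True: error is at most half a quantization step
```

With fewer bits the step is wider (2/15 instead of 2/127 here), so 4-bit codes save more space but lose more precision.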

This contribution has been finalised, reviewed, and merged. Kudos to Kevin Liang!

Binary quantization support

This contribution introduces a new schema field type, BinaryQuantizedDenseVectorField, which enables binary quantization for vector data. Binary quantization reduces the precision of floating-point values down to a single bit. It provides benefits similar to scalar quantization (lower memory usage and faster vector search) with a more extreme accuracy/quality trade-off.
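In its simplest textbook form, binary quantization keeps only the sign of each component and compares vectors by the Hamming distance between their bit patterns. Lucene's binary quantization is considerably more elaborate than this, so treat the sketch as an illustration of the trade-off rather than of Solr's implementation. The 768-dimension size used for the storage arithmetic is a hypothetical example.

```python
def binarize(vector):
    """Keep one bit per dimension: 1 if the component is positive, else 0."""
    bits = 0
    for i, v in enumerate(vector):
        if v > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def bits_per_vector(dimensions: int, bits_per_dimension: int) -> int:
    """Raw packed payload size in bits, ignoring alignment and extra metadata."""
    return dimensions * bits_per_dimension


q = binarize([0.3, -0.2, 0.9, -0.7])
d1 = binarize([0.4, -0.1, 0.8, -0.9])  # same signs as q
d2 = binarize([-0.4, 0.1, 0.8, 0.9])   # three signs differ from q
print(hamming(q, d1), hamming(q, d2))  # 0 3

for label, b in [("float32", 32), ("7-bit", 7), ("4-bit", 4), ("1-bit", 1)]:
    print(label, bits_per_vector(768, b) // 8, "bytes per 768-dim vector (packed)")
```

The last loop prints 3072, 672, 384 and 96 bytes. Real indexes store codes with some alignment and extra per-vector data, so actual sizes are larger than these packed figures.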

Soon to be merged. Kudos to Kevin Liang!

Terminology and parameter renames

Naming matters a lot once features become widely used. This work improves clarity and aligns terminology as Solr evolves:

  • HNSW parameters:

    • hnswMaxConnections → hnswM

    • hnswBeamWidth → hnswEfConstruction

  • Terminology: “neural search” → “vector search”

  • Module rename: llm → language-models

  • Backward compatibility: legacy parameter names are still accepted, but Solr logs warnings when they’re used.
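The backward-compatibility behaviour can be pictured as a simple alias table: legacy names are translated to the new ones, and a warning is logged. The helper `normalize_params` below is hypothetical and only mirrors the HNSW renames listed above. It is not Solr's code, and the warning text is invented for the example.

```python
import logging

logger = logging.getLogger("vector-params")

# Legacy HNSW parameter names mapped to their new equivalents (from the renames above).
LEGACY_ALIASES = {
    "hnswMaxConnections": "hnswM",
    "hnswBeamWidth": "hnswEfConstruction",
}


def normalize_params(params: dict) -> dict:
    """Return a copy of params that uses the new names, warning about legacy ones."""
    normalized = {}
    for name, value in params.items():
        new_name = LEGACY_ALIASES.get(name)
        if new_name is None:
            # Non-legacy names are copied as-is (and overwrite a legacy value if present).
            normalized[name] = value
            continue
        logger.warning("Parameter '%s' is deprecated, use '%s' instead", name, new_name)
        # Design choice for this sketch: if both names are given, the new name wins.
        normalized.setdefault(new_name, value)
    return normalized


print(normalize_params({"hnswMaxConnections": 16, "hnswBeamWidth": 100, "similarityFunction": "cosine"}))
```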

Kudos to Ilaria Petreti, and to the others who contributed further work!

Documentation improvements: better examples for real usage

Tutorial updates: seeded KNN + early termination examples

Features are only useful if people can adopt them quickly and correctly. This documentation update improves the vector search tutorial by adding concrete examples for:

  • seeded vector search

  • early termination parameters

It also refines formatting to make copy/paste and incremental learning easier.

We are Sease, an Information Retrieval Company based in London, focused on providing R&D project guidance and implementation, Search consulting services, Training, and Search solutions using open source software like Apache Lucene/Solr, Elasticsearch, OpenSearch and Vespa.
