Open Source Contributions

Sease strongly believes in open source as a way to build a sustainable model for human progress.
If you are curious about Sease's open source projects, you can find them under our company GitHub account.
Our team actively supports the public mailing lists and continuously contributes code back to the community.

Below is a list of some of our largest contributions.

In progress

We are currently working on a series of new projects. As this is a volunteer effort, we welcome and appreciate donations. In gratitude, we will acknowledge each donor by including their name in our contributions.

Apache Solr Retrieval Augmented Generation

Once configured with a Large Language Model (inference can happen locally on a dedicated Language Model Handler component or remotely through external APIs), this component will take as input the query and the top-k results as context (coming from lexical, neural or hybrid search) and use the LLM to craft an answer with citations.
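To make the idea concrete, here is a minimal sketch of how a query and its top-k results could be assembled into a prompt that asks for cited answers. It is not the component's code: the `Passage` class, the `build_rag_prompt` function and the prompt wording are all hypothetical, and the actual call to a language model is left out.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """One retrieved result; both fields are illustrative assumptions."""
    doc_id: str
    text: str


def build_rag_prompt(query: str, passages: list[Passage]) -> str:
    """Number each passage so the model can cite it as [1], [2], ..."""
    context_lines = [
        f"[{n}] (id={p.doc_id}) {p.text}"
        for n, p in enumerate(passages, start=1)
    ]
    return (
        "Answer the question using only the sources below. "
        "Cite sources with their bracketed number.\n\n"
        "Sources:\n" + "\n".join(context_lines) + "\n\n"
        f"Question: {query}\nAnswer:"
    )


top_k = [
    Passage("a", "Solr is a search platform."),
    Passage("b", "Solr is built on Apache Lucene."),
]
prompt = build_rag_prompt("What is Solr?", top_k)
```

Numbering the passages in the prompt is what lets the generated answer point back to specific search results.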

Progress: 20%

Apache Solr LLM Query Rewriter

This component has the responsibility of parsing a natural language query and building a structured Solr Query, leveraging the interaction with a configured Large Language Model and the internal Solr index.
The result will be a new, easy-to-debug Solr query that combines the power of the Solr inverted index terms with the query expansion and understanding capabilities of LLMs.

Progress: 25%

Apache Solr Multi-Valued Vectors

Apache Solr Jira Issue – GitHub – Talk @ Community Over Code

The goal is to give Lucene the ability to index multiple vectors per field, which is particularly useful when working with long documents.
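As an illustration of why several vectors per document help with long texts, the sketch below scores a document by its best-matching chunk vector. The max aggregation and the function names are assumptions made for this example, not a description of how the contribution works internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_document(query_vector, chunk_vectors):
    """Assumed aggregation: a document is as relevant as its best chunk."""
    return max(cosine(query_vector, v) for v in chunk_vectors)


# A long document split into two chunks, each with its own vector.
chunks = [[0.0, 1.0], [1.0, 0.0]]
best = score_document([1.0, 0.0], chunks)
```

With a single vector per document, the two chunks would have to be averaged into one embedding, which would dilute the match on the second chunk.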

Progress: 60%

Apache Solr Hybrid Search

Apache Solr Jira Issue – GitHub – Talk @ Berlin Buzzwords

Implement various approaches to combine and re-score search results coming from both lexical and neural models.
This includes Reciprocal Rank Fusion and better support in Learning To Rank for vector similarity as a feature.
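Reciprocal Rank Fusion can be sketched in a few lines: each document receives 1 / (k + rank) from every ranked list it appears in, and the contributions are summed. The function name and the constant k = 60 (a commonly used value) are assumptions for this example, which is independent of the Solr implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id to keep the order stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


lexical = ["a", "b", "c"]
neural = ["b", "d", "a"]
fused = reciprocal_rank_fusion([lexical, neural])
```

Because only ranks are used, the lexical and neural scores never need to be normalised onto a common scale.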

Progress: 85%

Apache Solr LLM Highlighter

A highlighter that takes a language model as input and uses it at runtime to build a snippet for each document, containing the paragraph of text most relevant to the query.
It is based not on lexical keyword matching but on semantic matching of the information requested.

Progress: 65%

Projects contributed

We have successfully completed a series of contributions to the open source community. Here are the major ones.

Apache Solr Vector Search

We brought Vector Search to Apache Solr 9.0!
Through the implementation of the k-nearest neighbour search for vectors in Apache Solr, we have enabled the possibility of indexing and searching numerical vectors. You can generate the vectors using deep neural network models such as BERT (or through any other technique that encodes an information need/corpus in numerical format).
It leverages Lucene's internal Hierarchical Navigable Small World (HNSW) graph implementation.
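A k-nearest neighbour request in Solr uses the `knn` query parser with a field name, a `topK` value and the query vector in square brackets. The helper below only builds that query string; the function name and the field name `vector` are hypothetical, and sending the request to Solr is left out.

```python
def knn_query(field, vector, top_k=10):
    """Build a Solr knn query string for the given dense vector field."""
    components = ", ".join(str(float(x)) for x in vector)
    return f"{{!knn f={field} topK={top_k}}}[{components}]"


q = knn_query("vector", [1, 2.5, 3], top_k=3)
```

The resulting string can be passed as the `q` parameter of a search request, assuming the target field is configured as a dense vector field with a matching dimension.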

DONATIONS

OUR TALK @ BERLIN BUZZWORDS

➜ READ MORE

Apache Solr LLM module for text vectorisation

With the introduction of the LLM module in Apache Solr 9.8, you can configure Solr to talk with an external service to do the text vectorisation for you, offering a transparent semantic search experience end-to-end.

OUR TALK @ BERLIN BUZZWORDS

➜ READ MORE

APACHE SOLR LEARNING TO RANK PLUGIN

With the Learning To Rank (or LTR for short) module, you can configure and run machine-learned ranking models in Apache Solr.
We joined the original Bloomberg-led development and have since kept contributing code, talks, and posts.
Update 2025: this contribution introduces a new Learning To Rank feature vector cache, used in both the feature logging and reranking phases. It replaces the previous cache implementation with a more efficient one, improving performance and speeding up search.

Jira Issue: SOLR-16667 – GitHub PR: #3433

➜ READ MORE

APACHE SOLR VECTOR SEARCH ENHANCEMENTS

Early Termination Strategy (PatienceKnnVectorQuery)

We introduce PatienceKnnVectorQuery, a version of the knn vector query that stops exploring the HNSW graph early when the result queue stays saturated beyond a threshold for more than a given number of iterations (the patience).

Jira Issue: SOLR-17814 – GitHub PR: #3644
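The patience idea can be illustrated with a toy search loop. This is a simplified sketch, not the Lucene code: here saturation is assumed to be the fraction of the top-k queue left unchanged by one exploration step, and the function name, parameters and values are hypothetical.

```python
import heapq


def search_with_patience(steps, k, saturation_threshold, patience):
    """Collect top-k results, stopping once the queue stops improving.

    `steps` is an iterable of candidate batches, each a list of
    (doc_id, score) pairs, standing in for one graph exploration step.
    """
    top_k = []  # min-heap of (score, doc_id); worst kept result at index 0
    saturated_steps = 0
    visited_steps = 0
    for candidates in steps:
        visited_steps += 1
        admitted = 0
        for doc_id, score in candidates:
            if len(top_k) < k:
                heapq.heappush(top_k, (score, doc_id))
                admitted += 1
            elif score > top_k[0][0]:
                heapq.heapreplace(top_k, (score, doc_id))
                admitted += 1
        # Assumed definition: share of the queue this step did not change.
        saturation = 1.0 - min(admitted, k) / k
        if saturation >= saturation_threshold:
            saturated_steps += 1
        else:
            saturated_steps = 0
        if saturated_steps > patience:
            break  # patience exhausted: exit early
    ranked = sorted(top_k, reverse=True)
    return [doc_id for _, doc_id in ranked], visited_steps


steps = [
    [("a", 0.9), ("b", 0.8)],
    [("c", 0.1)],
    [("d", 0.1)],
    [("e", 0.1)],
    [("f", 0.95)],
]
result = search_with_patience(steps, k=2, saturation_threshold=1.0, patience=2)
```

In this run the loop stops after the fourth step, so candidate `f` is never examined. That is the trade-off of early termination: fewer visited nodes in exchange for a small risk of missing a late, better match.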

SeededKnnVectorQuery

We introduce SeededKnnVectorQuery, a version of the knn vector query that provides a query seed to initiate the vector search.

Jira Issue: SOLR-17813 – GitHub PR: #3705

ACORN-Based Filtering

This contribution lets users disable or tune ACORN, an algorithm optimized for filtered vector search. You can now run a vector search with a new parameter, filteredSearchThreshold, which controls when ACORN optimizations are applied. This allows the system to switch from the baseline algorithm to a more efficient strategy that scores and explores only vectors matching the filtering criteria.

Jira Issue: SOLR-17815 – GitHub PR: #3680

➜ READ MORE

Search Quality Evaluation in the Era of LLMs: Dataset Generator

This toolkit is designed to make offline search quality evaluation accessible to everyone. It provides a dataset generator that works directly from an indexed document collection, and includes tools to analyze the metrics extracted from the generated dataset using both exact and approximate vector search.

RATED RANKING EVALUATOR/RATED RANKING EVALUATOR ENTERPRISE

Rated Ranking Evaluator (RRE) is a search quality evaluation library which evaluates the quality of results coming from a search system.
Rated Ranking Evaluator Enterprise (RREE) is the evolution of RRE. It simplifies the process of search relevance testing by offering both explicit and implicit rating methods to assess the performance of a search engine, without requiring deep technical expertise.

OUR TALK @ ECIR

➜ READ MORE ON RRE

➜ READ MORE ON RREE

APACHE SOLR LEARNING TO RANK INTERLEAVING

The Learning To Rank interleaving capability in Apache Solr can be used to mix up the results of different rankers to leverage the users’ implicit feedback and estimate the best ranking function.
We designed and developed the functionality, available from Apache Solr 8.8.
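For readers new to interleaving, the sketch below shows team-draft interleaving, one widely used scheme: two rankers take turns contributing their best not-yet-shown document, and each shown document is credited to the ranker that picked it. It is written from the general description of the algorithm, not from the Solr code, and all names are hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng):
    """Merge two rankings and record which ranker contributed each document."""
    interleaved, teams, seen = [], {}, set()
    ia = ib = 0
    count_a = count_b = 0
    while True:
        # Skip documents that the other ranker has already placed.
        while ia < len(ranking_a) and ranking_a[ia] in seen:
            ia += 1
        while ib < len(ranking_b) and ranking_b[ib] in seen:
            ib += 1
        a_left, b_left = ia < len(ranking_a), ib < len(ranking_b)
        if not a_left and not b_left:
            break
        if a_left and b_left:
            # The ranker with fewer picks goes next; a coin flip breaks ties.
            pick_a = count_a < count_b or (
                count_a == count_b and rng.random() < 0.5
            )
        else:
            pick_a = a_left
        if pick_a:
            doc, team = ranking_a[ia], "A"
            count_a += 1
        else:
            doc, team = ranking_b[ib], "B"
            count_b += 1
        interleaved.append(doc)
        teams[doc] = team
        seen.add(doc)
    return interleaved, teams


merged, teams = team_draft_interleave(
    ["d1", "d2", "d3"], ["d2", "d4", "d1"], random.Random(0)
)
```

Counting user clicks per team over many queries then gives an estimate of which ranker users prefer.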

OUR TALK @ HAYSTACK

➜ READ MORE

APACHE LUCENE Word2Vec Model To Generate Synonyms

This contribution to Apache Lucene integrates a Word2Vec model with the text analysis pipeline to generate synonyms based on the values stored in the indexed document fields.

OUR TALK @ BERLIN BUZZWORDS

➜ READ MORE

APACHE LUCENE Weighted Synonyms

The weighted synonyms contribution makes it possible to assign a different weight to each synonym for a word and leverage the configuration to improve the search relevance of your search engine.
We designed and developed the functionality in Apache Lucene and Solr, available from 8.5.
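The effect of weighted synonyms can be pictured as a query expansion in which each synonym carries its own boost. The sketch below builds such a query string with the standard `^` boost syntax; the function name, the dictionary layout and the sample weights are assumptions for illustration and do not reflect the actual configuration format.

```python
def expand_with_weighted_synonyms(term, synonyms):
    """Expand a term into an OR query where each synonym has its own boost."""
    clauses = [term]
    for synonym, weight in synonyms.get(term, []):
        # Multi-word synonyms are quoted so they are matched as a phrase.
        quoted = f'"{synonym}"' if " " in synonym else synonym
        clauses.append(f"{quoted}^{weight}")
    return " OR ".join(clauses)


synonyms = {"leopard": [("big cat", 0.8), ("panther", 0.6)]}
expanded = expand_with_weighted_synonyms("leopard", synonyms)
```

A weight below 1 lets a loosely related synonym still match while contributing less to the score than the original term.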

➜ READ MORE

Apache Lucene/Solr Document Classification

Document classification in Apache Lucene and Solr leverages the internal implementation of text classification to assign tags and classes to entire documents, unsupervised.
We implemented it on top of Lucene text classification and integrated it in Apache Solr, available from 6.1.

➜ READ MORE

APACHE LUCENE/SOLR MORE LIKE THIS

The More Like This feature returns documents similar to an input document.
We have worked extensively on the feature for many years, from the Lucene and Solr sides, contributing many improvements and bug fixes.

OUR TALK @ OPEN SOURCE SUMMIT

➜ READ MORE

VARIOUS SMALL CONTRIBUTIONS

➜ RANKLIB LEARNING TO RANK BUGFIXING

➜ APACHE MANIFOLDCF TRANSFORMER PROCESSOR
