Open Source Contributions
Sease strongly believes in open source as a way to build a sustainable model for human progress.
If you are curious about Sease open source projects, you can find them under our company GitHub account.
Our team is actively supporting the public mailing lists and continuously contributing code back to the community.
Below is a list of some of our most significant contributions.
In progress
We are currently working on a series of new projects. As this is a volunteer effort, we welcome and appreciate donations. In gratitude, we will acknowledge each donor by including their name in our contributions.
Apache Solr Retrieval Augmented Generation
Once configured with a Large Language Model (inference can happen locally on a dedicated Language Model Handler component or remotely through external APIs), this component will take as input the query and the top-k results as context (coming from lexical, neural or hybrid search) and use the LLM to craft an answer with citations.
Apache Solr LLM Query Rewriter
This component has the responsibility of parsing a natural language query and building a structured Solr Query, leveraging the interaction with a configured Large Language Model and the internal Solr index.
The result will be a new, easy-to-debug Solr query that combines the terms in the Solr inverted index with the query expansion and query understanding capabilities of LLMs.
Apache Solr Multi-Valued Vectors
Apache Solr Jira Issue – GitHub – Talk @ Community Over Code
The goal is to give Lucene the ability to index multiple vectors per field (quite useful when working with long documents).
Apache Solr
Hybrid Search
Apache Solr Jira Issue – GitHub – Talk @ Berlin Buzzwords
Implement various approaches to combine and re-score search results coming from both lexical and neural models.
This includes Reciprocal Rank Fusion algorithms and better Learning To Rank support for vector similarity as a feature.
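As an illustration of the Reciprocal Rank Fusion idea mentioned above, here is a minimal Python sketch. It is not the Solr implementation: the function name, the document ids and the constant k=60 (a commonly used default in the literature) are assumptions made for this example. Each document receives 1 / (k + rank) from every ranked list it appears in, and the sums decide the fused order.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    rankings: list of ranked lists (best document first).
    k: smoothing constant (assumed value, not taken from Solr).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical and a neural retriever.
lexical_results = ["doc1", "doc2", "doc3"]
neural_results = ["doc3", "doc1", "doc4"]
fused = reciprocal_rank_fusion([lexical_results, neural_results])
```

Documents returned by both retrievers (doc1 and doc3 here) accumulate two contributions, so they tend to rise above documents found by only one of them.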
Apache Solr
LLM Highlighter
A highlighter that takes a language model as input and uses it at runtime to build a snippet for each document, containing the paragraph of text most relevant to the query.
Matching is semantic, driven by the information requested, rather than by lexical keyword matching.
Projects contributed
We have successfully completed a series of contributions to the open source community. The major ones are listed below.
Apache Solr Vector Search
We brought Vector Search to Apache Solr 9.0!
By implementing k-nearest neighbour search for vectors in Apache Solr, we made it possible to index and search numerical vectors. You can generate the vectors using deep neural network models such as BERT (or through any other technique that encodes an information need or corpus in numerical format).
It leverages Lucene's internal Hierarchical Navigable Small World (HNSW) graph implementation.
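To give a feel for how such a search is issued, the following Python sketch builds the request parameters for Solr's knn query parser. The field name "vector", the four-dimensional query vector and the helper name are assumptions for this example; the field must be defined as a dense vector field in your schema, and the vector must have the dimension that field expects.

```python
from urllib.parse import urlencode


def build_knn_params(field, vector, top_k=10):
    """Return Solr request parameters for a k-nearest neighbour query.

    field: name of a dense vector field (hypothetical in this example).
    vector: query vector as a sequence of numbers.
    top_k: number of nearest neighbours to retrieve.
    """
    vector_literal = "[" + ", ".join(str(float(v)) for v in vector) + "]"
    return {"q": f"{{!knn f={field} topK={top_k}}}{vector_literal}"}


# Hypothetical query vector; real ones come from an embedding model.
params = build_knn_params("vector", [1.0, 2.0, 3.0, 4.0], top_k=5)
query_string = urlencode(params)  # ready to append to a /select URL
```

The resulting q parameter reads {!knn f=vector topK=5}[1.0, 2.0, 3.0, 4.0], and can be combined with ordinary filter queries like any other Solr query.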
DONATIONS
Apache Solr LLM module for text vectorisation
With the introduction of the LLM module in Apache Solr 9.8, you can configure Solr to talk with an external service to do the text vectorisation for you, offering a transparent semantic search experience end-to-end.
APACHE SOLR LEARNING TO RANK PLUGIN
With the Learning To Rank (or LTR for short) module, you can configure and run machine-learned ranking models in Apache Solr.
We joined the original Bloomberg-led development and have since kept contributing code, talks, and posts.
Update 2025: This contribution introduces a new Learning To Rank feature vector cache, used in both the feature logging and reranking phases. It replaces the previous cache implementation with a more efficient one, improving search performance.
Jira Issue: SOLR-16667
GitHub PR: #3433
APACHE SOLR VECTOR SEARCH ENHANCEMENTS
Early Termination Strategy (PatienceKnnVectorQuery)
We introduce PatienceKnnVectorQuery, a variant of the kNN vector query that stops exploring the HNSW graph early when the result queue remains saturated beyond a threshold for more than a given number of iterations (the patience).
Jira Issue: SOLR-17814
GitHub PR: #3644
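The early termination idea can be sketched with a toy Python example. This is a simplified model and not the Lucene or Solr code: candidates arrive in hypothetical batches (one batch per iteration) instead of through HNSW graph traversal, and the function name and parameter names are assumptions. After each iteration the sketch measures how much of the top-k queue stayed the same; once that ratio has been at or above the saturation threshold for more than `patience` consecutive iterations, the search stops.

```python
import heapq


def patient_top_k(batches, k, saturation_threshold, patience):
    """Toy top-k search with patience-based early termination.

    batches: iterable of lists of (doc_id, similarity) pairs.
    Returns (ranked doc ids, number of batches actually visited).
    """
    heap = []  # min-heap of (similarity, doc_id), holds the current top k
    saturated_rounds = 0
    visited = 0
    for batch in batches:
        visited += 1
        before = {doc_id for _, doc_id in heap}
        for doc_id, similarity in batch:
            if len(heap) < k:
                heapq.heappush(heap, (similarity, doc_id))
            elif similarity > heap[0][0]:
                heapq.heapreplace(heap, (similarity, doc_id))
        after = {doc_id for _, doc_id in heap}
        unchanged_ratio = len(before & after) / k
        if unchanged_ratio >= saturation_threshold:
            saturated_rounds += 1
        else:
            saturated_rounds = 0
        if saturated_rounds > patience:
            break  # the queue has stopped improving: give up early
    ranked = [doc_id for _, doc_id in sorted(heap, reverse=True)]
    return ranked, visited


# Hypothetical candidate stream; "f" is a strong match that arrives late.
batches = [
    [("a", 0.9), ("b", 0.8)],
    [("c", 0.1)],
    [("d", 0.2)],
    [("e", 0.3)],
    [("f", 0.99)],
]
result, visited = patient_top_k(batches, k=2, saturation_threshold=1.0, patience=2)
```

In this run the search stops after four of the five batches and never sees "f", which shows the trade-off: less work per query in exchange for a possible loss of recall.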
SeededKnnVectorQuery
We introduce SeededKnnVectorQuery, a variant of the kNN vector query that takes a seed query to initiate the vector search.
Jira Issue: SOLR-17813
GitHub PR: #3705
ACORN-Based Filtering
This contribution lets users disable or tune ACORN, an algorithm optimized for filtered vector search. You can now run a vector search with a new parameter, filteredSearchThreshold, which controls when ACORN optimizations are applied. This allows the system to switch from the baseline algorithm to a more efficient strategy that scores and explores only vectors matching the filtering criteria.
Jira Issue: SOLR-17815
GitHub PR: #3680
Search Quality Evaluation in the Era of LLMs: Dataset Generator
This toolkit is designed to make offline search quality evaluation accessible to everyone. It provides a dataset generator that works directly from an indexed document collection, and includes tools to analyze the metrics extracted from the generated dataset using both exact and approximate vector search.
RATED RANKING EVALUATOR/RATED RANKING EVALUATOR ENTERPRISE
Rated Ranking Evaluator (RRE) is a search quality evaluation library which evaluates the quality of results coming from a search system.
Rated Ranking Evaluator Enterprise (RREE) is the evolution of RRE. It simplifies the process of search relevance testing by offering both explicit and implicit rating methods to assess the performance of a search engine, without requiring deep technical expertise.
APACHE SOLR LEARNING TO RANK INTERLEAVING
The Learning To Rank interleaving capability in Apache Solr can be used to mix up the results of different rankers to leverage the users’ implicit feedback and estimate the best ranking function.
We designed and developed the functionality, available from Apache Solr 8.8.
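One well-known way to mix two rankers is Team Draft Interleaving, sketched below in Python. This is an illustrative sketch, not the Solr code: the function name, the document ids and the seeded random generator are assumptions. The two rankers take turns picking their best document not yet shown, and each shown document remembers which ranker picked it, so later clicks can be credited to the right model.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Interleave two rankings; return (result list, doc -> picking team)."""
    rng = rng or random.Random()
    sources = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved, team_of = [], {}
    while len(interleaved) < length:
        # Documents each ranker could still contribute.
        remaining = {
            team: [doc for doc in docs if doc not in team_of]
            for team, docs in sources.items()
        }
        if not remaining["A"] and not remaining["B"]:
            break
        # The team with fewer picks goes next; ties are broken randomly.
        if picks["A"] < picks["B"]:
            team = "A"
        elif picks["B"] < picks["A"]:
            team = "B"
        else:
            team = rng.choice(["A", "B"])
        if not remaining[team]:
            team = "B" if team == "A" else "A"
        doc = remaining[team][0]
        interleaved.append(doc)
        team_of[doc] = team
        picks[team] += 1
    return interleaved, team_of


# Hypothetical rankings from two models; the seed makes the run repeatable.
model_a = ["d1", "d2", "d3"]
model_b = ["d3", "d1", "d4"]
mixed, team_of = team_draft_interleave(model_a, model_b, 4, random.Random(0))
```

Counting clicks per team over many queries then gives an estimate of which ranking function users prefer.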
APACHE LUCENE Word2Vec Model To Generate Synonyms
This contribution to Apache Lucene integrates a Word2Vec model with the text analysis pipeline to generate synonyms based on the values stored in the indexed document fields.
APACHE LUCENE
Weighted Synonyms
The weighted synonyms contribution makes it possible to assign a different weight to each synonym for a word and leverage the configuration to improve the search relevance of your search engine.
We designed and developed the functionality in Apache Lucene and Solr, available from 8.5.
Apache Lucene/Solr
Document Classification
Document classification in Apache Lucene and Solr leverages the internal implementation of text classification to assign tags and classes to entire documents, unsupervised.
We implemented it on top of Lucene text classification and integrated it in Apache Solr, available from 6.1.
APACHE LUCENE/SOLR
MORE LIKE THIS
The More Like This feature returns documents similar to an input document.
We have worked extensively on the feature for many years, from the Lucene and Solr sides, contributing many improvements and bug fixes.
VARIOUS SMALL CONTRIBUTIONS