Research & Development

Research into how people interact with search technologies, and how to improve that interaction, is a core value at Sease.
Let’s briefly explore our current areas of interest.

AI and Machine Learning Integration

Machine Learning is a branch of Computer Science that studies how to give computers the ability to learn without being explicitly programmed.
In essence, the system updates itself based on past experience so that it performs its task better.
A search engine is a natural fit for machine learning integration, since it can benefit from learning in many different areas:
– better understanding the semantics of the documents in the corpus
– improving the understanding of natural language to support advanced queries
– improving the understanding of what is relevant to a given request.

In particular, Sease currently focuses on:

LLMs and RAG

Large Language Models and Retrieval Augmented Generation solutions are becoming ubiquitous.

At Sease, we are exploring the cutting edge of such technologies and techniques to bring additional benefits to the Search community, implementing new features in open source search engines and researching new ideas that can bring advancement in the field.

Vector Search

Vector Search is a technique that encodes text (both documents and queries) as vectors and then runs an approximate K Nearest Neighbours search to find the candidates closest to the input query.

At Sease, we have strongly invested in the Lucene/Solr implementation and we are researching and implementing new ideas to improve it and make it more explainable.
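
As a rough illustration of the nearest-neighbour step, here is a minimal sketch in Python. It is an exact, brute-force search rather than the approximate search used in production engines, and it is not the Lucene/Solr implementation. The three-dimensional vectors and the names `cosine_similarity`, `knn_search` and `DOC_VECTORS` are assumptions made up for the example; real embeddings come from a trained model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(query_vector, doc_vectors, k):
    """Exact (brute-force) k nearest neighbours by cosine similarity."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, invented for this example.
DOC_VECTORS = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.3, 0.1],
}

TOP_HITS = knn_search([1.0, 0.0, 0.0], DOC_VECTORS, k=2)
```

An approximate index trades a little accuracy for speed by avoiding the full scan over every document that this sketch performs.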

Generative AI for Query/Document Expansion

Expanding documents at index time with additional information and potentially new terms that can better describe them is an interesting application of Machine Learning.

At Sease, we are researching new ways of enriching both documents and queries to reduce the gap between the user information need and the results.

Deep Learning for Search

Improving the effectiveness of your search results by implementing neural network-based techniques.

Deep Learning can help improve search in several ways, from enabling image search with convolutional neural networks to improving document ranking with contextualized language models. The use of Deep Learning is rapidly becoming widespread in search engines. While its progress is remarkable and can dramatically improve the quality of your search engine, it is critical to understand how to leverage it correctly for your specific use case.

Learning To Rank

Learning To Rank is the ability to learn how to rank the search results returned for a query, based on training data collected from past user interactions. Integrating Learning To Rank technologies into your search engine involves several steps:

1) model your relevance problem;
2) identify and engineer interesting features that describe your domain entities;
3) collect training samples (implicit/explicit);
4) select a model and related training algorithm;
5) train the model;
6) integrate the trained model with your search engine (indexing time and query time).
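
To make the training and integration steps more concrete, here is a minimal sketch in Python. It assumes the simplest possible setup: a pointwise linear model fitted with stochastic gradient descent on squared error, which is only one of many possible model and algorithm choices. The two features (text match, popularity), the training samples and all function names are hypothetical.

```python
def score(weights, features):
    """Linear scoring function: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pointwise(samples, epochs=200, learning_rate=0.1):
    """Fit linear weights with stochastic gradient descent on squared error.

    samples is a list of (feature_vector, relevance_label) pairs.
    """
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for features, label in samples:
            error = score(weights, features) - label
            weights = [
                w - learning_rate * error * x
                for w, x in zip(weights, features)
            ]
    return weights


def rerank(weights, candidates):
    """Sort (doc_id, feature_vector) pairs by descending model score."""
    return sorted(candidates, key=lambda c: score(weights, c[1]), reverse=True)


# Hypothetical training samples: [text_match, popularity] -> relevance label.
TRAINING_SAMPLES = [
    ([1.0, 0.2], 1.0),
    ([0.2, 0.9], 0.0),
    ([0.8, 0.1], 1.0),
    ([0.1, 0.5], 0.0),
]

WEIGHTS = train_pointwise(TRAINING_SAMPLES)
RERANKED = rerank(WEIGHTS, [("doc_y", [0.1, 0.9]), ("doc_x", [0.9, 0.1])])
```

In this toy data relevance follows the text match feature, so the model learns a larger weight for it and moves `doc_x` to the top. Pairwise and listwise approaches, or tree-based models, are common alternatives to this pointwise setup.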

Personalisation

Offering personalised Search Results can bring key value to your organization. Personalisation (in search results) is the ability to provide a different ranking, depending on the user who is interacting with the search engine. Different approaches can be used to drive personalisation, from simple strategies to much more complex ones. One possible example is content-based:
1) define the features of interest for your domain;
2) collect user interactions (clicks, add to cart, sales);
3) calculate a user profile;
4) boost the results that match the user profile.
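
The content-based steps above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the interaction weights, the item features, the multiplicative boost formula and the function names are invented, and a real system would tune each of them for its domain.

```python
from collections import Counter

# Assumed weights: stronger signals count more towards the profile.
INTERACTION_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "sale": 5.0}


def build_user_profile(interactions, item_features):
    """Aggregate weighted feature counts, normalised so they sum to 1."""
    profile = Counter()
    for item_id, kind in interactions:
        for feature in item_features[item_id]:
            profile[feature] += INTERACTION_WEIGHTS[kind]
    total = sum(profile.values())
    return {f: w / total for f, w in profile.items()} if total else {}


def personalise(results, profile, item_features, boost=1.0):
    """Re-rank (item_id, base_score) pairs by base_score * (1 + boost * affinity)."""
    def boosted(pair):
        item_id, base_score = pair
        affinity = sum(profile.get(f, 0.0) for f in item_features[item_id])
        return base_score * (1.0 + boost * affinity)
    return sorted(results, key=boosted, reverse=True)


# Hypothetical catalogue and interaction history.
ITEM_FEATURES = {
    "tent": {"camping", "outdoor"},
    "stove": {"camping", "cooking"},
    "blender": {"kitchen", "cooking"},
    "boots": {"outdoor", "footwear"},
}
INTERACTIONS = [("tent", "click"), ("tent", "sale"), ("stove", "add_to_cart")]

PROFILE = build_user_profile(INTERACTIONS, ITEM_FEATURES)
RANKED = personalise([("blender", 1.0), ("boots", 0.9)], PROFILE, ITEM_FEATURES)
```

For this user, who mostly interacted with camping and outdoor items, the boots overtake the blender even though their base score is lower.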

Search Quality Evaluation

Evaluation is fundamental in every scientific domain.
Scientists come up with hypotheses to model real-world phenomena, and validate them by comparing their output with observations in nature. Evaluation plays the same key role in the field of information retrieval where researchers and practitioners:

– develop ranking models to explain the relationship between an information need expressed by a user (query) and information (search result) contained in available resources (corpus).
– test these models by comparing their outcomes with a collection of observations (implicit/explicit user feedback).
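
As an example of comparing a ranking with collected judgments, here is a minimal Python sketch of one widely used metric, NDCG (Normalised Discounted Cumulative Gain), in its linear-gain variant. The judgments and rankings are invented for the example, and the function names are our own.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(gains, start=1)
    )


def ndcg_at_k(ranked_doc_ids, judgments, k):
    """DCG of the top k results divided by the DCG of an ideal ordering.

    judgments maps doc_id to a graded relevance label; unjudged
    documents are treated as not relevant (gain 0).
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical explicit judgments for one query.
JUDGMENTS = {"d1": 3, "d2": 2, "d3": 0}

NDCG_GOOD = ndcg_at_k(["d1", "d2", "d3"], JUDGMENTS, k=3)
NDCG_BAD = ndcg_at_k(["d3", "d2", "d1"], JUDGMENTS, k=3)
```

A ranking that places the most relevant documents first scores 1, while the reversed ranking scores lower, which gives a single number to track as the ranking model changes.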

We are actively working on many aspects of the topic (offline/online testing, A/B testing and interleaving, explicit/implicit feedback, click modelling and relevance estimation) and we have contributed important milestones to the community:

Online Search Quality Evaluation

Interleaving is an online evaluation approach for information retrieval systems that compares ranking functions by mixing their results and interpreting the users’ implicit feedback.
We are actively researching the topic and have contributed to the Interleaving module for Learning to Rank in Apache Solr.
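
To show what mixing results can look like, here is a minimal Python sketch of Team Draft, one well-known interleaving algorithm. It is written for illustration only and is not the Apache Solr code. The rankings, the function names and the choice to let the other team pick when one ranking is exhausted are assumptions of this sketch.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng):
    """Mix two rankings; return (interleaved list, {doc_id: team})."""
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved = []
    team_of = {}

    def next_unseen(team):
        # Highest-ranked document of this team not yet in the mixed list.
        return next((d for d in rankings[team] if d not in team_of), None)

    while True:
        # The team with fewer picks goes first; ties are broken at random.
        if picks["A"] < picks["B"]:
            order = ["A", "B"]
        elif picks["B"] < picks["A"]:
            order = ["B", "A"]
        else:
            order = rng.sample(["A", "B"], 2)
        for team in order:
            doc = next_unseen(team)
            if doc is not None:
                break
        else:
            break  # both rankings are exhausted
        interleaved.append(doc)
        team_of[doc] = team
        picks[team] += 1
    return interleaved, team_of


def credit_clicks(clicked_docs, team_of):
    """Count clicks per team; the team with more clicks is preferred."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team_of:
            wins[team_of[doc]] += 1
    return wins


# Hypothetical rankings produced by two ranking functions for one query.
INTERLEAVED, TEAM_OF = team_draft_interleave(
    ["d1", "d2", "d3"], ["d2", "d4", "d1"], random.Random(42)
)
```

Each document in the mixed list remembers which ranking function contributed it, so the users' clicks can be credited to one function or the other and aggregated over many queries.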

Offline Search Quality Evaluation

RRE is an open-source relevance testing library for Elasticsearch and Apache Solr. It simplifies and standardizes relevance testing as part of a Continuous Integration process, works on large numbers of queries and documents, and makes it easy to explore the results across various metrics. You can run the evaluation using command line interfaces, JSON configuration files, and the Java artefacts provided by the framework. This tool is mainly aimed at Software Engineers.

All-In-One Solution

RRE-Enterprise is a turnkey solution that helps companies address the problem of Search Quality Evaluation. It offers a complete application with a comprehensive User Interface that guides you from intuitively collecting user feedback and judgments to evaluating the quality of your system in a few clicks.

Semantic Search

More than 80% of all potentially usable business information may originate in unstructured form.
Meaningfully structuring content is critical in any domain, and Natural Language Processing and Semantic Enrichment are becoming increasingly important for improving the quality of information retrieval tasks. With the Semantic Web moving towards full realisation thanks to the Linked Data initiative, and with major search engines showing interest in structured data, the search world finds it increasingly attractive to make its information machine-readable and to exploit that information to improve search over its content.

Three trends are transforming the face of search:

Smart Autocomplete

Autocomplete and spellchecking are now common features, but making use of semantic data makes it possible to offer smarter features, guiding the users to what they want, in a natural way.

Entity Driven Search

Searching not by keyword, but by entities that represent specific concepts in a certain domain.

Knowledge Graphs

Leveraging relationships amongst entities: Linked Data datasets (Wikidata, DBpedia, …) or custom company knowledge bases.

Document Similarity

Identifying similar documents is an ancillary task for a search engine. Taking a seed document as input and returning a set of similar results is not a trivial task.
More Like This (as it is called in the Lucene world) currently leverages the frequencies of the seed document’s terms relative to the entire corpus.
This can be improved if we can model the document content by extracting the key concepts rather than mere keywords.
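
The keyword-based approach can be sketched in Python as follows. This is a simplified illustration of the idea (select the seed terms with the highest TF-IDF weight, then score other documents on those terms) and not the Lucene code. The tiny corpus, the unsmoothed IDF formula and the function names are assumptions made for the example.

```python
import math
from collections import Counter


def interesting_terms(seed_tokens, corpus, max_terms=3):
    """Return {term: tf * idf} for the highest-weighted seed terms.

    corpus maps doc_id to a list of tokens. Seed terms that appear in
    no corpus document are skipped because they cannot match anything.
    """
    n_docs = len(corpus)
    weights = {}
    for term, tf in Counter(seed_tokens).items():
        df = sum(1 for tokens in corpus.values() if term in tokens)
        if df > 0:
            weights[term] = tf * math.log(n_docs / df)
    best = sorted(weights, key=lambda t: (-weights[t], t))[:max_terms]
    return {term: weights[term] for term in best}


def more_like_this(seed_tokens, corpus, max_terms=3):
    """Rank corpus documents by the summed weight of matching terms."""
    terms = interesting_terms(seed_tokens, corpus, max_terms)
    scored = []
    for doc_id, tokens in corpus.items():
        doc_score = sum(w for term, w in terms.items() if term in tokens)
        if doc_score > 0:
            scored.append((doc_id, doc_score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical tokenised corpus and seed document.
CORPUS = {
    "d1": ["solr", "vector", "search", "the"],
    "d2": ["the", "cooking", "recipe"],
    "d3": ["vector", "search", "the", "lucene"],
    "d4": ["the", "weather", "report"],
}
SEED = ["vector", "search", "the", "the", "solr"]

TERMS = interesting_terms(SEED, CORPUS)
SIMILAR = more_like_this(SEED, CORPUS)
```

The term "the" is frequent in the seed but appears in every document, so its weight drops to zero. Because the selection works on surface keywords, a document that expresses the same concept with different words would be missed, which is the limitation that concept extraction aims to address.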

Recommender Engines

A recommender engine is a system that can recommend items even before a user starts a search.
Given just the user identifier as input, the system can provide interesting results.
Search engines are a natural place to build recommenders, as a lot of information is already in there, ready to be used.
We explore collaborative filtering, content-based and hybrid approaches to find the most suitable one for each domain.
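
As an illustration of the collaborative filtering idea, here is a minimal user-based sketch in Python. It assumes interactions are stored as plain sets of item identifiers and uses Jaccard similarity between users. The data, the similarity choice and the function name `recommend` are hypothetical.

```python
from collections import defaultdict


def recommend(user_id, interactions, top_n=2):
    """Recommend unseen items, weighted by similarity to other users.

    interactions maps user_id to the set of item ids they interacted
    with. Similarity between two users is the Jaccard index of their sets.
    """
    seen = interactions[user_id]
    scores = defaultdict(float)
    for other_id, other_items in interactions.items():
        if other_id == user_id:
            continue
        overlap = len(seen & other_items)
        if overlap == 0:
            continue  # nothing in common: this user contributes no signal
        similarity = overlap / len(seen | other_items)
        for item in other_items - seen:
            scores[item] += similarity
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical interaction history.
INTERACTIONS = {
    "alice": {"a", "b"},
    "bob": {"a", "b", "c"},
    "carol": {"b", "d"},
    "dave": {"e"},
}

RECOMMENDED = recommend("alice", INTERACTIONS)
```

Only the user identifier is needed at request time: item "c" comes first for "alice" because "bob", the most similar user, interacted with it.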