Research & Development
Research into improving how people interact with search technologies is a core value at Sease.
Let’s briefly explore our current areas of interest.
Areas of interest
Machine Learning Integration
Machine Learning is a branch of Computer Science that studies ways to give computers the ability to learn without being explicitly programmed.
In essence, such a system aims to update itself, based on past experience, so that it performs its task better.
A search engine is a natural fit for machine learning integration; it can benefit from learning in many different areas:
– understanding better the semantics of the documents in the corpus
– improving the understanding of natural language to support advanced queries
– improving the understanding of what is relevant given a certain request.
In particular, Sease currently focuses on:
Learning To Rank
Learning To Rank is the ability to learn how to rank the search results returned for a query, based on training data collected from past user interactions. Integrating Learning To Rank technologies into your search engine involves several steps:
1) model your relevance problem;
2) identify and engineer interesting features that describe your domain entities;
3) collect training samples (implicit/explicit);
4) select a model and related training algorithm;
5) train the model;
6) integrate the trained model with your search engine (indexing time and query time).
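As an illustration of steps 4 to 6, here is a minimal sketch of a pairwise linear ranking model in plain Python. It is only one possible approach, not a description of any specific library: the function names, the two features (a text score and a popularity score) and the training pairs are all hypothetical, and the pairs stand in for preferences derived from user interactions.

```python
import math

def train_pairwise(pairs, n_features, epochs=200, lr=0.1):
    """Learn linear weights so that preferred documents score higher.

    pairs: list of (better_features, worse_features) tuples, where the
    first document was preferred over the second for the same query.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Gradient of the logistic loss log(1 + exp(-margin)) w.r.t. margin.
            g = -1.0 / (1.0 + math.exp(margin))
            w = [wi - lr * g * di for wi, di in zip(w, diff)]
    return w

def rerank(w, candidates):
    """Sort (doc_id, features) candidates by descending model score."""
    def score(features):
        return sum(wi * fi for wi, fi in zip(w, features))
    ranked = sorted(candidates, key=lambda c: score(c[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked]

# Hypothetical features: [text_score, popularity].
# In every pair the more popular document was the preferred one.
training_pairs = [
    ([0.2, 0.9], [0.8, 0.1]),
    ([0.3, 0.8], [0.7, 0.2]),
    ([0.5, 0.7], [0.5, 0.3]),
]
weights = train_pairwise(training_pairs, n_features=2)
print(rerank(weights, [("a", [0.9, 0.1]), ("b", [0.1, 0.9])]))
```

In a real integration the features would be extracted by the search engine at query time, and the trained model would be used to re-rank the top results of the original query.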
Classification is the task of assigning a category to a document, given a set of categories and training data. The system learns how to assign a category to a document automatically, based on the human-labelled examples provided. Integrating classification algorithms with your search engine can help categorise incoming content and make it easier to search later.
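To make the idea concrete, the sketch below trains a small multinomial Naive Bayes text classifier with add-one smoothing. Naive Bayes is just one simple algorithm chosen for illustration; the function names, categories and example texts are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Build term statistics from a list of (text, category) examples."""
    cat_docs = Counter()              # number of training documents per category
    cat_terms = defaultdict(Counter)  # term counts per category
    vocab = set()
    for text, category in examples:
        tokens = text.lower().split()
        cat_docs[category] += 1
        cat_terms[category].update(tokens)
        vocab.update(tokens)
    return cat_docs, cat_terms, vocab

def classify(model, text):
    """Return the category with the highest log-probability for the text."""
    cat_docs, cat_terms, vocab = model
    total_docs = sum(cat_docs.values())
    best_category, best_score = None, float("-inf")
    for category in sorted(cat_docs):
        score = math.log(cat_docs[category] / total_docs)
        total_terms = sum(cat_terms[category].values())
        for token in text.lower().split():
            if token in vocab:  # terms never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (cat_terms[category][token] + 1) / (total_terms + len(vocab))
                )
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical human-labelled examples.
model = train_naive_bayes([
    ("apache solr query parser", "search"),
    ("lucene index segment merge", "search"),
    ("gradient descent neural network", "ml"),
    ("training neural model loss", "ml"),
])
print(classify(model, "solr index tuning"))
```

The predicted category could then be stored in a dedicated field at indexing time and used for filtering or faceting.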
Offering personalised search results can bring key value to your organization. Personalisation (in search results) is the ability to provide a different ranking depending on the user who is interacting with the search engine. Different approaches can be used to drive personalisation, from simple strategies to much more complex ones. One possible example is a content-based approach:
1) define the features of interest for your domain;
2) collect user interactions (clicks, add to cart, sales);
3) calculate a user profile;
4) boost the results that match the user profile.
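The four steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the interaction weights, the feature names and the multiplicative boost formula are all hypothetical choices, not a prescribed method.

```python
from collections import Counter

# Hypothetical weights: stronger signals contribute more to the profile.
INTERACTION_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "sale": 5.0}

def build_profile(interactions, item_features):
    """Aggregate (item_id, interaction_type) events into normalised feature weights."""
    profile = Counter()
    for item_id, kind in interactions:
        for feature in item_features[item_id]:
            profile[feature] += INTERACTION_WEIGHTS[kind]
    total = sum(profile.values())
    return {f: w / total for f, w in profile.items()} if total else {}

def personalise(results, profile, item_features, boost=1.0):
    """Re-rank (item_id, base_score) results, boosting items that match the profile."""
    def boosted(item_id, score):
        affinity = sum(profile.get(f, 0.0) for f in item_features[item_id])
        return score * (1.0 + boost * affinity)
    ranked = sorted(results, key=lambda r: boosted(*r), reverse=True)
    return [item_id for item_id, _ in ranked]

# Step 1: features of interest (hypothetical catalogue).
item_features = {
    "tent": {"category:outdoor", "brand:acme"},
    "boots": {"category:outdoor", "brand:peak"},
    "phone": {"category:electronics", "brand:zed"},
}
# Steps 2 and 3: collected interactions turned into a user profile.
profile = build_profile([("tent", "click"), ("boots", "sale")], item_features)
# Step 4: the outdoor item overtakes the phone despite a lower base score.
print(personalise([("phone", 1.0), ("boots", 0.9)], profile, item_features))
```

In a search engine, the same effect is usually obtained by adding boost clauses to the query rather than re-sorting results in application code.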
Search Quality Evaluation
Evaluation is fundamental in every scientific domain.
Scientists come up with hypotheses to model real-world phenomena, and validate them by comparing their output with observations in nature. Evaluation plays the exact same key role in the field of information retrieval where researchers and practitioners:
– develop ranking models to explain the relationship between an information need expressed by a user (query) and information (search result) contained in available resources (corpus).
– test these models by comparing their outcomes with a collection of observations (implicit/explicit user feedback).
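As an example of comparing a ranking with explicit judgments, the sketch below computes NDCG (Normalised Discounted Cumulative Gain), one widely used offline metric. The function names and the judgment values are hypothetical, and the gain is taken as the raw rating, which is only one of the common variants.

```python
import math

def dcg(gains, k):
    """Discounted cumulative gain of the first k gains (linear gain, log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg(ranked_ids, judgments, k=10):
    """NDCG@k of a ranked list, given a dict of doc_id -> relevance rating.

    Documents without a judgment are treated as not relevant (rating 0).
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_gains = sorted(judgments.values(), reverse=True)
    ideal = dcg(ideal_gains, k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical explicit ratings for one query.
judgments = {"d1": 3, "d2": 2, "d3": 0}
print(ndcg(["d1", "d2", "d3"], judgments))  # ideal ordering
print(ndcg(["d3", "d2", "d1"], judgments))  # worst ordering scores lower
```

Averaging such a metric over a set of rated queries gives a single number that can be tracked from one version of the ranking model to the next.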
We are actively working on many aspects of the topic (offline/online testing, A/B testing and interleaving, explicit/implicit feedback, click modelling and relevance estimation), and we have contributed important milestones to the community:
Interleaving for Learning To Rank
Interleaving is an online evaluation approach for information retrieval systems that compares ranking functions by mixing their results and interpreting the users’ implicit feedback. We are actively researching the topic and contributed the Interleaving module for Learning to Rank in Apache Solr.
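To show what "mixing results" can look like, here is a minimal sketch of team-draft interleaving, one well-known interleaving method. It is an illustration only and does not reproduce the Apache Solr module; the function names and document identifiers are hypothetical, and each ranking is assumed to contain no duplicates.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Mix two rankings; return the interleaved list and each document's team."""
    rng = rng or random.Random()
    interleaved, team_of = [], {}
    count_a = count_b = 0
    total = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        a_turn = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        source, team = (ranking_a, "A") if a_turn else (ranking_b, "B")
        pick = next((d for d in source if d not in team_of), None)
        if pick is None:
            # This ranking is exhausted, so the other one contributes instead.
            source, team = (ranking_b, "B") if a_turn else (ranking_a, "A")
            pick = next(d for d in source if d not in team_of)
        interleaved.append(pick)
        team_of[pick] = team
        if team == "A":
            count_a += 1
        else:
            count_b += 1
    return interleaved, team_of

def credit_clicks(team_of, clicked_ids):
    """Count clicks per team; the team with more clicks wins the comparison."""
    clicks = {"A": 0, "B": 0}
    for doc_id in clicked_ids:
        if doc_id in team_of:
            clicks[team_of[doc_id]] += 1
    return clicks

mixed, teams = team_draft_interleave(
    ["d1", "d2", "d3"], ["d2", "d4", "d1"], rng=random.Random(42)
)
print(mixed, credit_clicks(teams, ["d2"]))
```

Aggregating the per-query winners over many searches indicates which ranking function users implicitly prefer.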
Rated Ranking Evaluator (RRE)
RRE is an open-source relevance testing library for Elasticsearch and Apache Solr that simplifies and standardises relevance testing in a Continuous Integration approach. It works on large numbers of queries and documents and makes it easy to explore the results across various metrics. You can run the evaluation using command-line interfaces, configuring JSON files, and integrating with the various Java artifacts provided by the framework. This tool is mostly aimed at Software Engineers.
Rated Ranking Evaluator-Enterprise
RRE-Enterprise is a turnkey solution that helps companies address the problem of Search Quality Evaluation. It offers a complete application with a comprehensive User Interface that guides you from collecting users' feedback and judgments in an intuitive way to evaluating your system's quality in a few clicks.
More than 80% of all potentially usable business information may originate in unstructured form.
Meaningfully structuring content is critical for any domain, and Natural Language Processing and Semantic Enrichment are becoming increasingly important for improving the quality of information retrieval tasks. With the Semantic Web moving towards full realisation thanks to the Linked Data initiative, and with the interest of major search engines in structured data, the search world is finding it more attractive to make its information machine-readable and to exploit that information to improve search over its content.
Three trends are transforming the face of search:
– Autocomplete and spellchecking are now common features, but semantic data makes it possible to offer smarter features that guide users to what they want in a natural way.
– Searching not by keyword, but by entities that represent specific concepts in a certain domain.
– Leveraging relationships amongst entities: Linked Data datasets (Wikidata, DBpedia…) or custom company knowledge bases.
Identifying similar documents is an ancillary task for a search engine.
Taking a seed document as input and returning a set of similar results is not a trivial task.
More Like This (as it is called in the Lucene world) currently leverages the term frequencies in the seed document in relation to the entire corpus.
This can be improved if we model the document content by extracting key concepts rather than mere keywords.
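The term-frequency approach described above can be sketched as a TF-IDF selection of "interesting terms" from the seed document. This is a simplified illustration, not Lucene's actual implementation: the function name, the smoothed IDF formula and the toy corpus are assumptions.

```python
import math
from collections import Counter

def interesting_terms(seed_tokens, corpus, max_terms=3):
    """Pick the seed terms that are frequent in the seed but rare in the corpus.

    corpus: list of token lists, one per document.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term once per document
    term_freq = Counter(seed_tokens)
    scores = {
        # Smoothed IDF (one common variant): rare terms get a higher weight.
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
        for term, tf in term_freq.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:max_terms]

corpus = [
    ["solr", "search", "engine"],
    ["search", "ranking", "model"],
    ["search", "cooking", "recipe"],
]
seed = ["solr", "search", "solr", "ranking"]
print(interesting_terms(seed, corpus, max_terms=2))  # ['solr', 'ranking']
```

The selected terms would then be combined into a query against the index; a concept-based variant would replace the raw tokens with extracted entities or concepts.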
A recommender engine is a system that is able to recommend items even before a user starts a search.
Given only the user identifier as input, the system is able to provide interesting results.
Search engines are a natural place to build recommenders, as a lot of information is already there, ready to be used.
Collaborative filtering, content-based and hybrid approaches are explored to find the most suitable one for each domain.
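As a small illustration of the collaborative filtering idea, the sketch below recommends items from a user identifier alone, using the overlap between users' interaction histories as a similarity measure. The function name, the similarity choice and the data are hypothetical; production systems typically use more robust similarity measures or matrix factorisation.

```python
from collections import Counter

def recommend(user_id, interactions, top_n=3):
    """Recommend unseen items liked by users with overlapping histories.

    interactions: dict of user_id -> set of item ids the user interacted with.
    """
    seen = interactions.get(user_id, set())
    scores = Counter()
    for other_id, items in interactions.items():
        if other_id == user_id:
            continue
        overlap = len(seen & items)  # simple similarity: shared items
        if overlap == 0:
            continue
        for item in items - seen:
            scores[item] += overlap
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

interactions = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "d"},
    "u4": {"e"},
}
print(recommend("u1", interactions))  # ['c', 'd']
```

A user with no history gets no recommendations from this sketch, which is where content-based or hybrid approaches can help.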