Research into improving how people interact with search technologies is a key value at Sease.
Let’s briefly explore the current areas of interest; for the details, you can refer to the blog:
Meaningfully structuring content is critical for any domain, and Natural Language Processing and Semantic Enrichment are becoming increasingly important for improving the quality of information retrieval tasks.
With the Semantic Web moving towards full realisation thanks to the Linked Data initiative, and with major search engines showing interest in structured data, the search world finds it increasingly attractive to make its information machine-readable and to exploit that information to improve search over its content.
Three trends are transforming the face of search:
2. Knowledge graphs. Leveraging relationships amongst entities: Linked Data datasets (Wikidata, DBpedia, ...) or companies’ custom knowledge bases.
3. Search assistance. Autocomplete and spellchecking are now common features, but making use of semantic data makes it possible to offer smarter features, guiding the users to what they want, in a natural way.
Identifying similar documents is an ancillary task for a search engine.
Taking a seed document as input and returning a set of similar results is not a trivial task.
More Like This (as it is called in the Lucene world) currently leverages the term frequencies in the seed document in relation to the entire corpus.
This can be improved if we model the document content by extracting its key concepts rather than mere keywords.
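To make the term-frequency baseline concrete, here is a minimal sketch in plain Python. It is not Lucene's implementation: the function names, the smoothed TF-IDF formula, the overlap-based scoring and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def interesting_terms(seed_tokens, corpus, top_n=3):
    """Rank the seed document's terms by a TF-IDF style weight (assumed formula)."""
    n_docs = len(corpus)
    # Document frequency: how many corpus documents contain each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    scores = {}
    for term, count in Counter(seed_tokens).items():
        if doc_freq[term] == 0:
            continue  # a term absent from the corpus cannot match anything
        scores[term] = count * math.log((1 + n_docs) / doc_freq[term])
    # Highest weight first; ties broken alphabetically to stay deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


def more_like_this(seed_tokens, corpus, top_n=3):
    """Return corpus indices ordered by how many interesting terms they share."""
    terms = set(interesting_terms(seed_tokens, corpus, top_n))
    overlaps = [(len(terms & set(doc)), idx) for idx, doc in enumerate(corpus)]
    ordered = sorted(overlaps, key=lambda pair: (-pair[0], pair[1]))
    return [idx for overlap, idx in ordered if overlap > 0]


# Hypothetical, pre-tokenised toy corpus.
corpus = [
    "apache lucene is a search library".split(),
    "apache solr is a search server built on lucene".split(),
    "the cat sat on the mat".split(),
]
seed = "lucene search library for java".split()
similar = more_like_this(seed, corpus)
```

Because the sketch only counts shared keywords, two documents about the same concept that use different vocabulary would not match, which is the limitation that concept extraction aims to address.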
Machine Learning Integration
Machine Learning is a branch of Computer Science that studies how to give computers the ability to learn without being explicitly programmed.
Essentially, a system aims to update itself, based on past experience, so that it performs its task better.
A search engine is a natural fit for machine learning integration; it can benefit from learning in many different areas:
– better understanding the semantics of the documents in the corpus
– improving the understanding of natural language to support advanced queries
– improving the understanding of what is relevant given a certain request
In particular, Sease currently focuses on:
Classification is the task of assigning a document to one of a given set of categories, using training data.
The system will learn how to assign a category to a document automatically, based on the provided human examples.
Integrating classification algorithms with your search engine can help in categorising the input content, making it easier to search later.
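As an illustration of learning categories from human-labelled examples, the sketch below implements a small multinomial Naive Bayes classifier with add-one smoothing. Naive Bayes is only one possible algorithm, picked here as an assumption; the class name, the categories and the training documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_doc_counts = Counter()         # category -> number of documents
        self.term_counts = defaultdict(Counter)   # category -> term -> occurrences
        self.vocabulary = set()

    def train(self, labelled_docs):
        for tokens, category in labelled_docs:
            self.class_doc_counts[category] += 1
            self.term_counts[category].update(tokens)
            self.vocabulary.update(tokens)

    def classify(self, tokens):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, float("-inf")
        for category in sorted(self.class_doc_counts):
            counts = self.term_counts[category]
            total_terms = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods of each term.
            score = math.log(self.class_doc_counts[category] / total_docs)
            for term in tokens:
                score += math.log((counts[term] + 1) / (total_terms + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical human-labelled examples.
classifier = NaiveBayesClassifier()
classifier.train([
    ("solr index query relevance".split(), "search"),
    ("lucene query index analyzer".split(), "search"),
    ("gradient model training loss".split(), "ml"),
    ("model features training labels".split(), "ml"),
])
predicted = classifier.classify("query relevance tuning".split())
```

In a search setting, the predicted category could be stored in a dedicated field at indexing time so that it is available for filtering or faceting later.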
Learning To Rank
Learning To Rank is the ability to learn how to rank the search results coming from a query, based on training data collected from past user interactions.
Integrating Learning To Rank technologies with your search engine involves several steps:
1) model your relevancy problem
2) identify and engineer interesting features that describe your domain entities
3) collect training samples (implicit/explicit)
4) select a model and related training algorithm
5) train the model
6) integrate the trained model with your search engine (indexing time and query time)
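The steps above can be sketched end to end with a deliberately simple pairwise linear ranker trained with perceptron-style updates. This is an illustrative assumption, not the model or training algorithm used by any particular search engine plugin; the two features (a text match score and a click rate), the relevance labels and the document identifiers are hypothetical.

```python
def dot(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(samples_by_query, n_features, epochs=50, learning_rate=0.1):
    """Learn linear weights so that more relevant documents score higher.

    samples_by_query maps a query id to a list of (feature_vector, relevance_label).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for samples in samples_by_query.values():
            for feats_a, label_a in samples:
                for feats_b, label_b in samples:
                    if label_a <= label_b:
                        continue  # only pairs where a is more relevant than b
                    diff = [a - b for a, b in zip(feats_a, feats_b)]
                    # Perceptron update whenever the pair is ordered wrongly.
                    if dot(weights, diff) <= 0:
                        weights = [w + learning_rate * d for w, d in zip(weights, diff)]
    return weights


def rerank(weights, candidates):
    """candidates: list of (doc_id, feature_vector); best score first."""
    return sorted(candidates, key=lambda c: dot(weights, c[1]), reverse=True)


# Steps 2-3: hypothetical features [text_match_score, click_rate] with graded labels.
training = {
    "q1": [([0.2, 0.1], 0), ([0.8, 0.3], 1), ([0.7, 0.9], 2)],
    "q2": [([0.5, 0.2], 0), ([0.4, 0.8], 1)],
}
# Steps 4-5: select the (linear) model and train it.
weights = train_pairwise(training, n_features=2)
# Step 6: apply the trained model at query time to rerank candidates.
ranked = rerank(weights, [
    ("doc_a", [0.9, 0.2]),
    ("doc_b", [0.4, 0.9]),
    ("doc_c", [0.6, 0.5]),
])
ranked_ids = [doc_id for doc_id, _ in ranked]
```

In practice the trained model is usually applied only to the top results of a cheaper first-pass query, since computing features for every matching document is expensive.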
Offering personalised search results can bring key value to your organisation.
Personalisation (in search results) is the ability to provide a different ranking depending on the user who is interacting with the search engine.
Different approaches can be used to drive the personalisation, from simple strategies to much more complex ones.
One possible example is content-based:
1) define the features of interest for your domain
2) collect user interactions (clicks, add to cart, sales)
3) calculate a user profile
4) boost the results that match the user profile
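A minimal sketch of this content-based flow follows. The interaction weights, the feature naming scheme, the multiplicative boost formula and the catalogue are all assumptions made for illustration; a real system would tune each of them for its domain.

```python
from collections import Counter

# Assumed weights: stronger signals of interest count more.
INTERACTION_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "sale": 5.0}


def build_profile(interactions, item_features):
    """Aggregate the features of the items a user interacted with.

    interactions: list of (item_id, interaction_kind).
    Returns feature -> normalised affinity (values sum to 1).
    """
    profile = Counter()
    for item_id, kind in interactions:
        for feature in item_features[item_id]:
            profile[feature] += INTERACTION_WEIGHTS[kind]
    total = sum(profile.values())
    if total == 0:
        return {}
    return {feature: weight / total for feature, weight in profile.items()}


def personalise(results, profile, item_features, boost=1.0):
    """Reorder (item_id, base_score) results by a profile-boosted score."""
    def boosted(result):
        item_id, score = result
        affinity = sum(profile.get(f, 0.0) for f in item_features[item_id])
        return score * (1.0 + boost * affinity)
    return sorted(results, key=boosted, reverse=True)


# Step 1: hypothetical features of interest for the domain.
item_features = {
    "tent": {"category:camping", "brand:acme"},
    "stove": {"category:camping", "brand:zen"},
    "lantern": {"category:camping", "brand:acme"},
    "novel": {"category:books", "brand:penguin"},
}
# Steps 2-3: collected interactions and the resulting user profile.
profile = build_profile([("tent", "sale"), ("stove", "click")], item_features)
# Step 4: boost the results that match the profile.
results = [("novel", 2.0), ("lantern", 1.5)]
personalised = personalise(results, profile, item_features)
```

With an empty profile the original ranking is preserved, so anonymous users simply see the unpersonalised results.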
A recommender engine is a system that is able to recommend items, even before a user starts a search.
Given just the user identifier as input, the system is able to provide interesting results.
Search engines are the natural place to build recommenders, as a lot of information is already in there, ready to be used.
Collaborative filtering, content-based and hybrid approaches are explored to find the most suitable one for each domain.
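As one example of the collaborative filtering family, the sketch below recommends items from nothing but a user identifier, using item-to-item cosine similarity over the sets of users who interacted with each item. The function name, the similarity choice and the interaction data are assumptions for illustration.

```python
import math
from collections import defaultdict


def recommend(user_id, interactions, top_n=2):
    """Item-based collaborative filtering.

    interactions: dict mapping user id -> set of item ids the user interacted with.
    """
    # Invert the data: item -> set of users who interacted with it.
    users_by_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_by_item[item].add(user)

    seen = interactions.get(user_id, set())
    scores = defaultdict(float)
    for candidate, candidate_users in users_by_item.items():
        if candidate in seen:
            continue  # do not recommend what the user already has
        for item in seen:
            item_users = users_by_item[item]
            shared = len(candidate_users & item_users)
            if shared:
                # Cosine similarity between the two items' binary user vectors.
                scores[candidate] += shared / math.sqrt(
                    len(candidate_users) * len(item_users)
                )
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical interaction data.
interactions = {
    "alice": {"tent", "stove"},
    "bob": {"tent", "stove", "lantern"},
    "carol": {"tent", "lantern"},
    "dave": {"novel"},
}
recommendations = recommend("alice", interactions)
```

A user with no overlap with anyone else, such as the hypothetical "dave" here, gets no recommendations from this approach, which is one reason content-based and hybrid strategies are worth exploring alongside it.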