Artificial Intelligence Applied to Search: Introduction
The main objective of this survey is to explore the state of the art of Artificial Intelligence applied to Search in the open source world.
1 – We start with an introduction, explaining what AI means in Information Retrieval and how it can improve search systems.
2 – For the second episode we move to OpenSearch and its neural search plugin, giving a detailed description of it through our end-to-end experience.
3 – The third episode explores Apache Solr, showing in practice how to use it to index and search vectors and then run a full end-to-end neural search.
4 – The fourth episode is about Vespa, with a comprehensive tutorial on implementing Neural Search in this search technology, from document and model preparation to embedding creation and k-NN queries.
5 – In the fifth episode we show in practice how you can use Elasticsearch to run a full end-to-end neural search.
6 – The sixth and last episode explains, in a very simple way, all the steps required to implement Text Embedding and Vector Search directly in Elasticsearch.
So, without further ado, let’s start!
How does Artificial Intelligence impact search?
Artificial intelligence (AI) is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals, which involves consciousness and emotionality.
Thanks to the strong and steady growth of computing power in recent years, AI has seen a resurgence and is now used in many domains, including software engineering and Information Retrieval (the science behind search engines and similar systems).
Since AI is a broad and complex topic, it has many sub-fields, each dealing with different technical considerations, goals, and tools.
We’ll focus particularly on Machine Learning:
Machine learning (ML) is the study of computer algorithms that improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so.
So what are the problems AI and Machine Learning may help to solve in Search?
Many, actually. Let’s see some examples:
– natural language processing -> to better understand and model the user information need and the corpus of information, and to segment text so that specific passages of information can be targeted
– image/video recognition -> to extract features and search a multimedia corpus of information
– knowledge representation -> to build better data structures and search algorithms (e.g. vector-based), to identify meaning, synonyms, and relations between terms and concepts, and for spellchecking
– learning -> to learn relevance ranking functions, to classify query intent and documents, to offer personalized results
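To make the idea of learning a relevance ranking function concrete, here is a deliberately tiny pointwise sketch in Python. It assumes two hypothetical, pre-normalised features per query-document pair (a text-match score and a popularity signal) and graded relevance labels, fits a linear scoring function with stochastic gradient descent, and then sorts candidates by score. A neural ranker would replace the linear function with a network, but the train-then-sort loop is the same idea. All names, data, and hyperparameters here are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical judgments: (features, graded relevance label).
# Features (assumed): [text-match score, popularity], both normalised to [0, 1].
TRAINING: List[Tuple[List[float], float]] = [
    ([1.0, 0.0], 0.5),
    ([0.0, 1.0], 0.5),
    ([1.0, 1.0], 1.0),
    ([0.0, 0.0], 0.0),
]


def score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear relevance score w·x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_pointwise(data, epochs: int = 500, lr: float = 0.1):
    """Pointwise LTR: regress each label with squared loss via stochastic gradient descent."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            err = score(x, w, b) - y  # gradient of 0.5 * (prediction - label)^2
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def rank(candidates, w, b):
    """Sort (doc_id, features) candidates by descending model score."""
    return sorted(candidates, key=lambda c: score(c[1], w, b), reverse=True)


w, b = train_pointwise(TRAINING)
candidates = [("doc_a", [0.2, 0.1]), ("doc_b", [0.9, 0.7]), ("doc_c", [0.5, 0.4])]
print([doc_id for doc_id, _ in rank(candidates, w, b)])
```

In this toy data each label is exactly half the sum of the two features, so both weights converge to roughly 0.5; real judgments are noisy and a trained model should be checked on held-out queries before being trusted.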
To solve these problems and bring interesting new capabilities to your search engine, deep learning comes to the rescue:
Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief networks, graph neural networks, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.

Applying deep learning techniques to solve search problems is often called Neural Search (an industry derivation from the academic field of Neural Information Retrieval).
We won’t explore the details of how neural networks work, nor all the possibilities we can achieve with such a multi-faceted technology, so let’s keep our focus on what deep learning can contribute to search:
- a better text representation: moving away from the bag-of-words model (where terms are sequences of characters) to a multi-dimensional numerical (vectorized) approach, able to model terms as semantic units of information linked to each other, with meaning [1]
- text generation: language modeling techniques flourished and reached mainstream news thanks to outstanding results in generating text that is almost indistinguishable from human-made [2][3][4]
Generating text can be useful in many Information Retrieval areas: query auto-completion, query spellchecking, document summarization, search results explainability (summarizing the information that the document contributes to the user information need)…
Improvements in this field could lead to completely new types of information retrieval systems that behave like human experts: the system won’t just return a list of documents to satisfy your information need, but will synthesize a comprehensive natural language response backed by supporting evidence (documents).
- a better image/video representation: extracting semantic features from images and videos (such as the objects and entities involved, rather than just pixel and color-related information) [5]
Using large pre-trained models, fine-tuned for your use case (potentially with transfer learning techniques), helps to build the foundation for advanced multimedia retrieval, reducing the effort of continuous, supervised metadata tagging.
- learning to rank: currently, the vast majority of search engines identify a set of candidate documents from the corpus of information (matching) and order them by relevance to satisfy the user information need (ranking).
Providing the most useful results first in the ranked list is fundamental: with deep learning it is possible to train advanced relevance ranking models from past interactions/judgments to rank documents for a given query (both represented as numerical vectors) [6]
- a better machine translation: having a computer able to translate languages with the quality of human experts has always been a challenge. Deep learning has replaced earlier approaches such as rule-based systems and statistical phrase-based methods. [7] [8]
This brings huge benefits for multi-lingual search: you can query in one language and find documents in many different languages much more efficiently. [9]
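The difference between the bag-of-words model and a vectorized text representation can be shown with a minimal Python sketch. The 3-dimensional word vectors below are hypothetical, hand-picked numbers standing in for the output of a pre-trained model, and averaging word vectors (mean pooling) is just one simple way to build a text vector, assumed here for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bag_of_words(text: str, vocabulary: List[str]) -> List[float]:
    """Term-frequency vector: terms only match if the strings are identical."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


# Hypothetical 3-d word vectors, hand-picked for illustration only;
# a real system would take them from a pre-trained neural model.
EMBEDDINGS: Dict[str, List[float]] = {
    "cheap": [0.9, 0.1, 0.0],
    "affordable": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
    "automobile": [0.2, 0.8, 0.4],
}


def embed(text: str) -> List[float]:
    """Represent a text as the average of its word vectors (mean pooling)."""
    vectors = [EMBEDDINGS[term] for term in text.lower().split()]
    return [sum(dims) / len(vectors) for dims in zip(*vectors)]


query, doc = "cheap car", "affordable automobile"
vocab = sorted(set(query.split()) | set(doc.split()))
bow_sim = cosine(bag_of_words(query, vocab), bag_of_words(doc, vocab))
dense_sim = cosine(embed(query), embed(doc))
print(f"bag-of-words: {bow_sim:.2f}, dense: {dense_sim:.2f}")
```

The bag-of-words similarity is 0 because “cheap car” and “affordable automobile” share no term, while the dense vectors place the two texts close together.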
From this introduction, we notice that many of the deep learning contributions to Search require supporting multi-dimensional numerical vectors in our search engine.
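As a minimal sketch of what supporting vectors means at query time, the Python snippet below runs exact (brute-force) k-nearest-neighbour search with cosine similarity over a hypothetical in-memory index of document vectors. Production engines generally rely on approximate nearest-neighbour structures instead (Apache Lucene, for example, uses HNSW graphs), trading a little accuracy for much lower latency on large collections; the index contents and function names here are illustrative assumptions.

```python
import heapq
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_search(query_vec: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Exact k-NN: score every document against the query and keep the top k."""
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in index.items())
    return [(doc_id, sim) for sim, doc_id in heapq.nlargest(k, scored)]


# Hypothetical document vectors (normally produced by an embedding model at index time).
INDEX: Dict[str, List[float]] = {
    "doc1": [0.1, 0.9, 0.2],
    "doc2": [0.8, 0.1, 0.1],
    "doc3": [0.2, 0.8, 0.3],
}

print(knn_search([0.15, 0.85, 0.25], INDEX, k=2))
```

Scoring every document costs O(N·d) per query (N documents of dimension d), which is why exact search only suits small collections.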
So how can you implement such wonders with currently available open-source technologies?
What is officially supported? Where do we need third-party plugins?
The next episodes of this series explore these questions for the main open-source search technologies! Stay tuned!
[3] https://medium.com/dataseries/six-times-bigger-than-gpt-3-inside-googles-trillion-parameter-switch-transformer-model-6f7a93c6aae, https://arxiv.org/pdf/2101.03961.pdf
Shameless plug for our training and services!
Did I mention we do Learning To Rank and Artificial Intelligence in Search training?
We also provide consulting on these topics: get in touch if you want to bring your search engine to the next level with the power of AI!
Did you like this post about Artificial Intelligence Applied to Search? Don’t forget to subscribe to our Newsletter to always stay up to date with the Information Retrieval world!
Author
Alessandro Benedetti
Alessandro Benedetti is the founder of Sease Ltd. A Senior Search Software Engineer, he focuses on R&D in information retrieval, information extraction, natural language processing, and machine learning.