Sease at Berlin Buzzwords 2022

Berlin Buzzwords is Germany’s most exciting conference on storing, processing, streaming and searching large amounts of digital data, with a focus on open-source software projects.

Location: Kulturbrauerei, Berlin
Date: 12th-14th June 2022

The first integrations of machine learning techniques with search made it possible to improve the ranking of search results (Learning To Rank), but one limitation has always remained: documents had to contain the keywords the user typed in the search box in order to be retrieved. For example, the query “tiger” won’t retrieve documents containing only the terms “panthera tigris”. This is called the vocabulary mismatch problem and, over the years, it has been mitigated through query and document expansion approaches.
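
To make the vocabulary mismatch concrete, here is a minimal sketch of keyword retrieval over a toy inverted index, with and without query expansion. The three documents and the hand-written synonym list are made up for illustration; they do not come from any real collection or from Lucene's own synonym format.

```python
# Hypothetical toy collection: doc id -> text.
DOCS = {
    1: "the tiger is the largest living cat species",
    2: "panthera tigris is native to asia",
    3: "house cats sleep a lot",
}

# Hypothetical hand-written synonym list used for query expansion.
SYNONYMS = {"tiger": ["panthera", "tigris"]}


def build_index(docs):
    """Map every whitespace-separated term to the ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, synonyms=None):
    """Return sorted ids of documents matching any query term (OR semantics)."""
    terms = query.split()
    if synonyms:
        # Query expansion: append every listed synonym of every query term.
        terms += [s for t in terms for s in synonyms.get(t, [])]
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return sorted(hits)


index = build_index(DOCS)
print(search(index, "tiger"))            # [1]
print(search(index, "tiger", SYNONYMS))  # [1, 2]
```

Without expansion, document 2 is never reached because it shares no term with the query; the expansion only helps because someone wrote the synonym list by hand.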

Neural search is an Artificial Intelligence technique that allows a search engine to reach documents that are semantically similar to the user’s query without necessarily containing its terms. It avoids the need for long lists of synonyms by automatically learning the similarity of terms and sentences in your collection, using deep neural networks and numerical vector representations.
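
With vector representations, "semantically similar" becomes a geometric question: texts with related meanings should map to vectors pointing in similar directions, and cosine similarity is a common way to measure that. The sketch below uses hypothetical 3-dimensional vectors chosen by hand; a real encoder would learn vectors with hundreds of dimensions from data.

```python
import math

# Hypothetical embeddings, hand-picked for illustration only.
EMBEDDINGS = {
    "tiger": [0.90, 0.80, 0.10],
    "panthera tigris": [0.85, 0.82, 0.12],
    "spreadsheet": [0.05, 0.10, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = EMBEDDINGS["tiger"]
for text in ("panthera tigris", "spreadsheet"):
    # "panthera tigris" scores close to 1.0 despite sharing no term with "tiger".
    print(text, round(cosine_similarity(query, EMBEDDINGS[text]), 3))
```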

This talk explores the first official Apache Solr contribution on this topic, available since Apache Solr 9.0.

During the talk we will give an overview of neural search: we will describe vector representations for queries and documents, and how Approximate K-Nearest Neighbor (KNN) vector search works.
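
As a baseline for the approximate methods discussed in the talk, the sketch below shows exact (brute-force) KNN: it compares the query vector with every stored vector and keeps the k most similar. Approximate methods, such as the HNSW graphs behind Apache Lucene's vector search, aim to return nearly the same neighbours while visiting only a fraction of the vectors. The document ids and 2-dimensional vectors are made up for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn(query, vectors, k):
    """Exact k-nearest-neighbour search over a dict of doc_id -> vector.

    Returns up to k (doc_id, score) pairs, most similar first.
    """
    scores = {doc_id: cosine_similarity(query, vec) for doc_id, vec in vectors.items()}
    top_ids = heapq.nlargest(k, scores, key=scores.get)
    return [(doc_id, scores[doc_id]) for doc_id in top_ids]


# Hypothetical document vectors.
DOC_VECTORS = {
    "d1": [1.0, 0.0],
    "d2": [0.8, 0.6],
    "d3": [0.0, 1.0],
    "d4": [-1.0, 0.0],
}

for doc_id, score in knn([1.0, 0.1], DOC_VECTORS, k=2):
    print(doc_id, round(score, 3))
```

The cost of this search grows linearly with the number of documents, which is exactly what approximate indexes are designed to avoid.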

We will show how neural search can be used along with deep learning techniques (e.g., BERT) or directly on vector data, and how we implemented this feature in Apache Solr.
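
Solr 9.0 exposes this feature through a dense vector field type (`solr.DenseVectorField`) and a `knn` query parser. The Python sketch below only builds the request; it assumes a collection named `my_collection` with a dense vector field named `vector` (both hypothetical), and a query vector that would normally be produced by an encoder such as BERT, with the same dimension configured for the field.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to your own host and collection.
SOLR_SELECT_URL = "http://localhost:8983/solr/my_collection/select"


def knn_query(field, top_k, vector):
    """Build a local-params query string for Solr's knn query parser."""
    values = ", ".join(str(float(x)) for x in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


def knn_request_url(field, top_k, vector):
    """Full /select URL asking for the top_k nearest documents with their scores."""
    params = {"q": knn_query(field, top_k, vector), "fl": "id,score"}
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


print(knn_query("vector", 10, [1.0, 2.0, 3.0, 4.0]))
# {!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
print(knn_request_url("vector", 10, [1.0, 2.0, 3.0, 4.0]))
```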

If you want to expand your queries or documents with synonyms in Apache Lucene, you need a predefined file listing the terms that share the same meaning. It’s not always easy to find a list of basic synonyms for a language and, even if you find one, it doesn’t necessarily match your domain.

For example, in articles about operating systems the term “daemon” is not a synonym of “devil”; it is much closer to the term “process”.

Word2Vec is a two-layer neural network that takes a text corpus as input and outputs a vector representation for each word in its vocabulary. Words with similar meanings are mapped to vectors that are close to each other.
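
Once every word has a vector, candidate synonyms for a term can be found by looking up its nearest neighbours in the vector space and keeping only the sufficiently similar ones. A minimal sketch, assuming a hypothetical set of vectors standing in for a Word2Vec model trained on operating-system articles; `max_synonyms` and `min_similarity` are illustrative parameter names, not the Lucene API.

```python
import math

# Hypothetical word vectors standing in for a trained Word2Vec model.
WORD_VECTORS = {
    "daemon": [0.90, 0.10, 0.20],
    "process": [0.85, 0.15, 0.25],
    "service": [0.80, 0.20, 0.30],
    "devil": [0.10, 0.90, 0.10],
    "kernel": [0.60, 0.10, 0.70],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def synonyms_for(term, vectors, max_synonyms=2, min_similarity=0.9):
    """Return up to max_synonyms other terms whose similarity to term is >= min_similarity."""
    if term not in vectors:
        return []
    query = vectors[term]
    candidates = [
        (cosine_similarity(query, vec), other)
        for other, vec in vectors.items()
        if other != term
    ]
    candidates.sort(reverse=True)  # most similar first
    return [other for score, other in candidates[:max_synonyms] if score >= min_similarity]


print(synonyms_for("daemon", WORD_VECTORS))  # ['process', 'service']
```

In this toy domain "devil" scores far below the threshold, which is the behaviour the "daemon" example above calls for.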

The second talk explores our contribution to Apache Lucene, which integrates this technique into the text analysis pipeline.
We will show how you can automatically generate synonyms on the fly from an Apache Lucene index and how you can use this new feature with Apache Solr.
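
In a Lucene-style analysis chain, synonyms are typically injected as extra tokens at the same position as the original term, so positional and phrase queries still line up. The sketch below mimics that idea in plain Python; the `Token` type, the tokenizer and the synonym map are hypothetical simplifications, not Lucene classes.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    term: str
    position: int


def tokenize(text: str) -> List[Token]:
    """Lowercasing whitespace tokenizer that records each token's position."""
    return [Token(term, pos) for pos, term in enumerate(text.lower().split())]


def synonym_filter(tokens: List[Token], synonyms: Dict[str, List[str]]) -> List[Token]:
    """Emit each token followed by its synonyms, stacked at the same position."""
    out = []
    for token in tokens:
        out.append(token)
        for syn in synonyms.get(token.term, []):
            out.append(Token(syn, token.position))
    return out


# Hypothetical map; here it would be generated from word vectors instead of written by hand.
GENERATED_SYNONYMS = {"daemon": ["process", "service"]}

result = synonym_filter(tokenize("Restart the daemon"), GENERATED_SYNONYMS)
for t in result:
    print(t.position, t.term)
```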

We are Sease, an Information Retrieval Company based in London, focused on providing R&D project guidance and implementation, Search consulting services, Training, and Search solutions using open source software like Apache Lucene/Solr, Elasticsearch, OpenSearch and Vespa.

Lisa Biella
