Word2Vec model to generate synonyms on the fly in Apache Lucene [Berlin Buzzwords]
June 14 @ 2:50 pm - 3:30 pm
Berlin Buzzwords is a conference focused on open source software projects in the field of big data analysis, scalability, storage and searchability. It provides a platform for developers, engineers, IT architects, analysts and data scientists who are interested in information retrieval, the searchability of large amounts of data, NoSQL and big data processing.
Daniele Antuzi is a software engineer passionate about high-performance data structures and algorithms. He worked for 4 years in finance (List spa) and 2 years in cloud services (Amazon Web Services) before his curiosity about information retrieval led him to join Sease Ltd.
He enjoys studying and experimenting with new technologies, trying to narrow the gap between academia and industry.
Ilaria is an Information Retrieval/Machine Learning engineer at Sease. A strong believer in the power of Big Data and Digital Transformation, she earned a master's degree in Data Science.
She loves applying data mining and machine learning methods to information retrieval problems. She is currently involved in Learning to Rank projects.
Word2Vec model to generate synonyms on the fly in Apache Lucene
If you want to expand your queries or documents with synonyms in Apache Lucene, you need a predefined file listing the terms that share the same meaning.
It is not always easy to find a list of basic synonyms for a language and, even when you do, it may not match your domain.
For example, in articles about operating systems the term “daemon” is not a synonym of “devil”; it is much closer to the term “process”.
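A minimal sketch of this static approach shows the problem. It assumes a hypothetical general-purpose rule list written in the comma-separated equivalence style of Solr synonym files (the parser is simplified and ignores explicit `=>` mappings): every occurrence of a term is expanded the same way, whatever the domain of the text.

```python
def parse_synonym_rules(lines):
    """Parse comma-separated equivalence rules (e.g. "daemon, devil")
    into a map from each term to the other terms on the same line."""
    synonyms = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = [t.strip() for t in line.split(",") if t.strip()]
        for term in terms:
            synonyms.setdefault(term, set()).update(t for t in terms if t != term)
    return synonyms


def expand(tokens, synonyms):
    """Return each token followed by every synonym listed for it."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(sorted(synonyms.get(token, ())))
    return expanded


# Hypothetical general-purpose rules, not tuned to any domain.
rules = ["# general English", "daemon, devil, demon"]
syns = parse_synonym_rules(rules)
print(expand(["restart", "the", "daemon"], syns))
# → ['restart', 'the', 'daemon', 'demon', 'devil']
```

In an operating-systems index, this expansion adds noise ("devil") and misses the term that actually matters ("process").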
Word2Vec is a two-layer neural network that takes a text corpus as input and outputs a vector representation for each word in its vocabulary.
Words with similar meanings are mapped to vectors that are close to each other.
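Closeness between word vectors is commonly measured with cosine similarity. The sketch below uses hand-made 3-dimensional vectors (hypothetical toy values, not the output of a trained Word2Vec model, which would have hundreds of dimensions learned from a domain corpus) to show how the nearest neighbours of a term can serve as candidate synonyms. The `nearest` helper and its `k` and `min_similarity` parameters are illustrative names, not part of Lucene.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(term, vectors, k=2, min_similarity=0.0):
    """Return up to k (word, similarity) pairs most similar to term,
    keeping only those at or above min_similarity."""
    query = vectors[term]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(w, s) for w, s in scored[:k] if s >= min_similarity]


# Toy vectors standing in for a model trained on operating-system articles.
vectors = {
    "daemon":  [0.9, 0.1, 0.0],
    "process": [0.8, 0.2, 0.1],
    "thread":  [0.7, 0.3, 0.2],
    "devil":   [0.0, 0.1, 0.9],
}

print([w for w, _ in nearest("daemon", vectors, k=2, min_similarity=0.9)])
# → ['process', 'thread']
```

The similarity threshold matters as much as `k`: without it, the top-k list always returns something, even for terms whose neighbours are not real synonyms.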
This talk presents our contribution to Apache Lucene, which integrates this technique into the text analysis pipeline.
We will show how to generate synonyms automatically, on the fly, from an Apache Lucene index, and how to use this new feature with Apache Solr, with practical examples.
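Inside an analysis pipeline, a synonym filter typically emits each original token followed by its synonyms at the same position (in Lucene terms, a position increment of 0), so phrase and proximity queries still line up. The following sketch mimics that behaviour in plain Python; `expand_stream`, its parameters, and the neighbour lists with their similarity scores are hypothetical, and this is not the Lucene API or the contributed filter itself.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical precomputed neighbours: term -> [(candidate, cosine similarity)].
Neighbours = Dict[str, List[Tuple[str, float]]]


def expand_stream(tokens: Iterable[str], neighbours: Neighbours,
                  max_synonyms: int = 2, min_similarity: float = 0.9
                  ) -> Iterator[Tuple[str, int, bool]]:
    """Yield (term, position, is_synonym) triples for a token stream."""
    position = -1
    for token in tokens:
        position += 1  # each original token advances the position
        yield token, position, False
        # Out-of-vocabulary terms have no neighbours and pass through unchanged.
        candidates = sorted(neighbours.get(token, []),
                            key=lambda pair: pair[1], reverse=True)
        for word, similarity in candidates[:max_synonyms]:
            if similarity >= min_similarity:
                # Synonyms are stacked on the original token's position.
                yield word, position, True


neighbours = {"daemon": [("process", 0.98), ("thread", 0.93), ("kernel", 0.71)]}
for term, pos, is_syn in expand_stream(["restart", "the", "daemon"], neighbours):
    print(pos, term, "(synonym)" if is_syn else "")
```

Capping the number of synonyms per term and requiring a minimum similarity keeps the expanded stream small and limits the drift that distant neighbours would introduce.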