After the success of the 2019 and 2020 editions, we are happy to announce that we are hosting the sixth Information Retrieval Meetup in London, a free evening meetup aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field.
This time the event is fully remote, since the COVID-19 situation makes it impossible to host it in person.
The evening consists of two technical talks followed by a networking/Q&A session.
You are invited to register: Register here
After a short welcome and latest-news speech from our founder, Alessandro Benedetti, we will proceed to the first talk.
Our first speaker is Luke Gallagher, PhD candidate at RMIT University:
Luke Gallagher is a PhD candidate at RMIT University working under the supervision of J. Shane Culpepper and B. Barla Cambazoglu. His research examines efficiency in multi-stage search components such as feature extraction, LTR and cascade ranking models. Luke believes that access to information and education should be open and free. This has fueled his interest in the field of information retrieval and his dedication to conducting research in line with these core values.
Feature Extraction for Large-Scale Text Collections
Feature engineering is a fundamental but poorly documented component in LTR search applications.
As a result, there are still few open access software packages that allow researchers and practitioners to easily simulate a feature extraction pipeline and conduct experiments in a lab setting.
This talk introduces Fxt, an open-source framework to perform efficient and scalable feature extraction. Fxt may be integrated into complex, high-performance software applications to help solve a wide variety of text-based machine learning problems.
The talk details how we built and documented a reproducible feature extraction pipeline with LTR experiments using the ClueWeb09B collection.
This LTR dataset is publicly available.
We’ll also discuss some of the benefits (feature extraction efficiency, model interpretation) of having open access tooling in this area for researchers and practitioners alike.
Our second speakers are Ilaria Petreti and Anna Ruggero from Sease:
Ilaria is a Data Scientist passionate about the world of Artificial Intelligence. She holds a Master's degree in Data Science and strongly believes in the power of Big Data and Digital Transformation. While developing a practical Flight Delay Prediction application for her thesis, she implemented several Data Mining and Machine Learning techniques and became familiar with the programming language R. She is also involved in a research project, deepening her knowledge of Ensemble Learning, with a specific focus on the Super Learner algorithm.
Anna Ruggero is a software engineer passionate about Information Retrieval and Data Mining.
She loves to find new solutions to problems, suggesting and testing new ideas, especially those that concern the integration of machine learning techniques into information retrieval systems.
Anna came into contact with search engines during her studies and fell in love with this world. She decided to investigate the topic further by participating in the 12th European Summer School in Information Retrieval and by writing her master's degree dissertation on Entity Search.
Along the way, she has expanded and improved her knowledge of the Java and Python languages, information retrieval systems, clustering and word embeddings.
A Learning to Rank Project on a Daily Song Ranking Problem
Ranking data, i.e., ordered lists of items, naturally appears in a wide variety of situations; understanding how to adapt a specific dataset and design the best approach to solve a ranking problem in a real-world scenario is thus crucial. This talk aims to illustrate how to set up and build a Learning to Rank (LTR) project starting from the available data, in our case a Spotify dataset (available on Kaggle) on the Worldwide Daily Song Ranking, and ending with the implementation of a ranking model. A step-by-step (phased) approach to this task using open source libraries will be presented. We will examine in depth the most important part of the pipeline, which is data preprocessing, and in particular how to model and manipulate the features in order to create the proper input dataset, tailored to the requirements of the machine learning algorithm.
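To give a rough idea of what this preprocessing can look like, here is a minimal sketch in Python. It is not the speakers' pipeline: the column names (date, region, track, position), the sample rows, the two features and the helper name build_ltr_dataset are all assumptions made for illustration. It treats each (region, date) chart as one query, derives a relevance label from the chart position, and returns the per-query group sizes that LTR libraries typically ask for.

```python
from collections import defaultdict

# Hypothetical rows imitating a daily chart; column names are assumptions,
# not necessarily those of the Kaggle dataset.
rows = [
    {"date": "2017-01-01", "region": "gb", "track": "A", "position": 1},
    {"date": "2017-01-01", "region": "gb", "track": "B", "position": 2},
    {"date": "2017-01-01", "region": "gb", "track": "C", "position": 3},
    {"date": "2017-01-02", "region": "gb", "track": "B", "position": 1},
    {"date": "2017-01-02", "region": "gb", "track": "A", "position": 2},
]


def build_ltr_dataset(rows):
    """Turn chart rows into (features, labels, group_sizes) for an LTR model.

    Each (region, date) chart is one query. Features are illustrative only:
    the track's last seen position in the same region (0 if never seen) and
    a flag marking tracks that are new to the chart.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["region"], row["date"])].append(row)

    features, labels, group_sizes = [], [], []
    last_position = {}  # (region, track) -> position in the latest earlier chart

    # ISO dates sort chronologically, so sorted() walks each region in time order.
    for region, date in sorted(groups):
        chart = sorted(groups[(region, date)], key=lambda r: r["position"])
        for rank, row in enumerate(chart):
            prev = last_position.get((region, row["track"]))
            is_new = 1.0 if prev is None else 0.0
            features.append([float(prev) if prev is not None else 0.0, is_new])
            # Higher label means more relevant: the top song gets the largest label.
            labels.append(len(chart) - 1 - rank)
        # Update history only after the whole chart is processed,
        # so a chart never sees its own positions as features.
        for row in chart:
            last_position[(region, row["track"])] = row["position"]
        group_sizes.append(len(chart))

    return features, labels, group_sizes


features, labels, group_sizes = build_ltr_dataset(rows)
```

The three returned lists line up row by row, with group_sizes marking where one query ends and the next begins. Rankers in open source libraries such as LightGBM and XGBoost accept group information of this kind, although which library the talk uses is not stated here.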
Join us for a free evening online event!