London Information Retrieval Meetup [November 2020]

October 20, 2020
After the success of the 2019 and 2020 editions, we are happy to announce that we are hosting a sixth Information Retrieval Meetup in London, a free evening meetup aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field.

This time we are going fully remote, given the COVID-19 situation and the impossibility of hosting the event in person.

The evening will consist of two technical talks followed by a networking/Q&A session.

first talk

Feature Extraction for Large-Scale Text Collections

Feature engineering is a fundamental but poorly documented component of learning to rank (LTR) search applications.
As a result, there are still few open-access software packages that allow researchers and practitioners to easily simulate a feature extraction pipeline and conduct experiments in a lab setting.

This talk introduces Fxt, an open-source framework to perform efficient and scalable feature extraction. Fxt may be integrated into complex, high-performance software applications to help solve a wide variety of text-based machine learning problems.
The talk details how we built and documented a reproducible feature extraction pipeline with LTR experiments using the ClueWeb09B collection.
This LTR dataset is publicly available.
We’ll also discuss some of the benefits (feature extraction efficiency, model interpretation) of having open access tooling in this area for researchers and practitioners alike.
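
To give a rough idea of what a feature extraction step produces, here is a minimal Python sketch that turns one query-document pair into a small feature vector. It is not Fxt's API: the function name `extract_features`, its parameters, the whitespace tokenisation and the BM25 constants are all assumptions chosen for illustration only.

```python
import math
from collections import Counter


def extract_features(query, doc, doc_freq, num_docs, avg_doc_len, k1=0.9, b=0.4):
    """Return a few illustrative features for one query-document pair.

    doc_freq maps a term to the number of documents containing it.
    All names and default constants here are hypothetical.
    """
    q_terms = query.lower().split()
    d_terms = doc.lower().split()
    term_freq = Counter(d_terms)
    doc_len = len(d_terms)

    bm25 = 0.0
    matched = 0
    for term in q_terms:
        f = term_freq.get(term, 0)
        if f == 0:
            continue  # term absent from the document: contributes nothing
        matched += 1
        df = doc_freq.get(term, 0)
        # Non-negative IDF variant of BM25
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        # Term frequency saturation with document length normalisation
        norm = f + k1 * (1.0 - b + b * doc_len / avg_doc_len)
        bm25 += idf * f * (k1 + 1.0) / norm

    return {
        "doc_len": doc_len,
        "query_coverage": matched / len(q_terms) if q_terms else 0.0,
        "bm25": bm25,
    }


if __name__ == "__main__":
    stats = {"open": 1, "search": 2}  # toy document frequencies
    print(extract_features("open search", "open source search engine", stats, 3, 4.0))
```

In an LTR experiment, a vector like this would be computed for every candidate document of every query, which is why extraction efficiency matters at collection scale.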

the speaker

Luke Gallagher

PHD CANDIDATE @ RMIT University

Luke Gallagher is a PhD candidate at RMIT University working under the supervision of J. Shane Culpepper and B. Barla Cambazoglu. His research examines efficiency in multi-stage search components such as feature extraction, LTR and cascade ranking models. Luke believes that access to information and education should be open and free. This has fueled his interest in the field of information retrieval and his dedication to conducting research in line with these core values.

slides

Feature Extraction for Large-Scale Text Collections from Sease

second talk

A Learning to Rank Project on a Daily Song Ranking Problem

Ranking data, i.e., ordered lists of items, naturally appears in a wide variety of situations; understanding how to adapt a specific dataset and how to design the best approach to a ranking problem in a real-world scenario is thus crucial.
This talk aims to illustrate how to set up and build a Learning to Rank (LTR) project, starting from the available data, in our case a Spotify dataset (available on Kaggle) on the Worldwide Daily Song Ranking, and ending with the implementation of a ranking model. A step-by-step (phased) approach to this task using open source libraries will be presented.
We will examine in depth the most important part of the pipeline, the data preprocessing, and in particular how to model and manipulate the features in order to create the proper input dataset, tailored to the requirements of the machine learning algorithm.
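
As a rough illustration of this kind of preprocessing, the Python sketch below turns a few daily chart rows into the `label qid:N feature:value` text format accepted by many LTR libraries. It is not the speakers' pipeline: the column names, the inline sample data, the relevance thresholds and the two features (an integer-encoded artist and the track's previous chart position) are all assumptions made for the example.

```python
import csv
import io

# Hypothetical sample; the column names are assumed, not taken from the talk.
SAMPLE = """Position,Track Name,Artist,Streams,Date,Region
1,Song A,Artist X,5000,2017-01-01,global
2,Song B,Artist Y,3000,2017-01-01,global
3,Song C,Artist X,1000,2017-01-01,global
1,Song B,Artist Y,4000,2017-01-02,global
2,Song A,Artist X,3500,2017-01-02,global
"""


def position_to_label(position, top=10, mid=50):
    """Map a chart position to a graded relevance label (2, 1 or 0)."""
    if position <= top:
        return 2
    if position <= mid:
        return 1
    return 0


def build_ltr_rows(csv_text, top=10, mid=50):
    """Convert chart rows into 'label qid:N 1:artist 2:prev_position' lines."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Process charts chronologically so "previous position" is well defined.
    rows.sort(key=lambda r: (r["Date"], r["Region"], int(r["Position"])))

    query_ids = {}      # (date, region) -> query id; one query per daily chart
    artist_ids = {}     # artist name -> integer code (naive categorical encoding)
    last_position = {}  # (region, track) -> position in the most recent earlier chart
    lines = []
    for r in rows:
        position = int(r["Position"])
        qid = query_ids.setdefault((r["Date"], r["Region"]), len(query_ids) + 1)
        artist = artist_ids.setdefault(r["Artist"], len(artist_ids) + 1)
        track_key = (r["Region"], r["Track Name"])
        previous = last_position.get(track_key, 0)  # 0 means "not seen before"
        label = position_to_label(position, top, mid)
        lines.append(f"{label} qid:{qid} 1:{artist} 2:{previous}")
        last_position[track_key] = position
    return lines


if __name__ == "__main__":
    for line in build_ltr_rows(SAMPLE, top=1, mid=2):
        print(line)
```

Treating each (date, region) chart as one query is a modelling choice: it tells the learner to compare songs only within the same daily ranking. Integer codes for artists are the simplest encoding and impose an arbitrary order, so a real project would likely evaluate alternatives such as one-hot or hash encoding.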

the speaker

Ilaria Petreti

R&D SOFTWARE ENGINEER @ SEASE

Ilaria is a Data Scientist passionate about the world of Artificial Intelligence. She loves applying Data Mining and Machine Learning techniques, and strongly believes in the power of Big Data and Digital Transformation.

Anna Ruggero

R&D SOFTWARE ENGINEER @ SEASE

Anna Ruggero is a software engineer passionate about Information Retrieval and Data Mining.
She loves to find new solutions to problems, suggesting and testing new ideas, especially those that concern the integration of machine learning techniques into information retrieval systems.