We are delighted to announce the Twenty-second London Information Retrieval Meetup, a free evening event aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field.
This time the Meetup is hybrid: a live event in London, thanks to BCS and Search Solutions, that will also be streamed online on Zoom!
ATTENTION
Remember to fill out the form to confirm your registration.
in person
Location:
BCS London Headquarters,
Ground Floor, 25 Copthall Avenue, EC2R 7BP
Date: 25th November 2024
doors open from 6:00 PM (GMT)
online
Location:
Zoom [You will receive the link after the registration]
Date: 25th November 2024
doors open from 6:15 PM (GMT)
LONDON INFORMATION RETRIEVAL MEETUP
PROGRAM
The event will be structured around two technical talks, each followed by a Q&A session. The event will end with a networking session.
> 6:00 PM GMT Doors open (in person)
> 6:15 PM GMT Doors open for virtual attendees
- Welcome & Latest News – Alessandro Benedetti, Director @ Sease
> 6:30 PM GMT First talk – Towards Standardization of Privacy-Preserving IR: Decentralized Algorithms with a User-Centric Design (ESPRESSO) – Mohammad Bahrani | Research Associate @ University of Southampton
> 7:15 PM GMT Second talk – coming soon…
> 8:00 PM GMT Networking session + buffet
first talk
Towards Standardization of Privacy-Preserving IR: Decentralized Algorithms with a User-Centric Design (ESPRESSO)
The growing need for user privacy, data ownership, and digital sovereignty has given rise to research in decentralized systems, driving innovation in information retrieval (IR). While traditional IR techniques are well-established for centralized systems, the decentralized and federated IR domain still lacks a standardized framework for privacy-preserving systems that ensure full user access control and data ownership.
The ESPRESSO project seeks to address this gap by researching, developing, and evaluating decentralized algorithms, meta-information data structures, and indexing techniques to enable scalable IR across personal online data stores (PODS). One of the project's main contributions is achieving high retrieval quality while addressing challenges such as strict access control in PODS, noisy data, and varying user contributions. Through ESPRESSO, we aim to establish a robust, privacy-preserving, and user-controlled decentralized IR paradigm that data scientists, researchers, and stakeholders can benefit from in the future.
Mohammad Bahrani
Research Associate @ University of Southampton
Mohammad Bahrani completed his PhD at Queen Mary University of London, where his research focused on improving notification filtering in critical domains, such as healthcare, through IR and advanced probabilistic models. His work went beyond traditional relevance-based evaluations by incorporating urgency as a key measure. During his PhD, Mohammad developed retrieval models enriched with semantic features, including sentiments, entities, and terms.
He recently joined the Web and Internet Science (WAIS) group at the University of Southampton as a Research Associate. He is currently contributing to the Efficient Search over Personal Repositories – Secure and Sovereign (ESPRESSO) project, focusing on privacy-preserving and user-controlled decentralized IR.
second talk
Blazing-Fast Serverless MapReduce Indexer for Apache Solr
Indexing data from databases into Apache Solr has always been an open problem: for a while, the Data Import Handler was used even though it was not recommended for production environments. Traditional indexing processes often encounter scalability challenges, especially with large datasets.
In this talk, we explore the architecture and implementation of a serverless MapReduce indexer designed for Apache Solr but extendable to any search engine. By embracing a serverless approach, we can take advantage of the elasticity and scalability offered by cloud services like AWS Lambda, enabling efficient indexing without needing to manage infrastructure.
We dig into the principles of MapReduce, a programming model for processing large datasets, and discuss how it can be adapted for indexing documents into Apache Solr. Using AWS Step Functions to orchestrate multiple Lambdas, we demonstrate how to distribute indexing tasks across multiple resources, achieving parallel processing and significantly reducing indexing times.
Through practical examples, we address key considerations such as data partitioning, fault tolerance, concurrency, and cost.
We also cover integration points with other AWS services, such as Amazon S3 for data storage and retrieval and DynamoDB for distributed locking between the Lambda instances.
Daniele Antuzi
R&D SOFTWARE ENGINEER @ SEASE
Software engineer passionate about high-performance data structures and algorithms.
He likes studying and experimenting with new technologies to improve the state of the art.





