Sease, as a company, recognizes the value and importance of ongoing education and professional development for its employees. Therefore, Sease is delighted to sponsor its employees' attendance at conferences and similar events that promote learning and growth across various areas of expertise.

Dublin, April 2023: as usual, the week before Easter gave us a generous amount of Information Retrieval food for thought, in the shape of the 45th European Conference on Information Retrieval (ECIR 2023).
The conference, back to its standard in-person format, boasts five packed days of R&D from the scientific community.
It targets many different topics from search to recommendations, from lexical to neural approaches, and from evaluation to efficiency.
This blog post summarises my personal experience through the list of short papers (presented as posters), long/reproducibility papers (presented as 15-minute talks), and industrial presentations that impressed me the most.
This doesn’t mean there weren’t other amazing papers, but the time I got was limited and I had to focus on what looked more interesting to me.
Without further ado let’s start!


This work from the Glasgow team (Mitko Gospodinov, Sean MacAvaney, and Craig Macdonald) won the best short paper award and piqued my curiosity during the first-day poster presentations, on Monday.
The idea is simple and effective: expanding documents at indexing time with additional text coming from generated queries (Doc2Query) has proved quite effective, but it's prone to hallucination.
Can we filter out the hallucinated queries at indexing time, reducing the impact on the final index size and speeding up/improving the quality of first-stage retrieval at query time?
Using relevance models to filter out the less relevant generated queries per document proves to bring solid advantages. In particular, they considered three neural relevance models for filtering: ELECTRA, MonoT5, and TCT-ColBERT.
A big plus is also the environmental considerations (in the form of hours of computations) addressed in the paper’s last sections.
This is very interesting work that I recommend reading and integrating into your practical applications.
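To make the idea concrete, here is a minimal, hypothetical sketch of the filtering step. The toy term-overlap score below is just a stand-in for a real neural relevance model such as MonoT5, and the names and thresholds are my own invention:

```python
# A minimal, hypothetical sketch of filtering generated queries at indexing
# time: score each generated query against its source document with a
# relevance model and keep only the top fraction before expanding the index.

def filter_generated_queries(doc, queries, score_fn, keep_fraction=0.67):
    """Keep only the most relevant generated queries for a document."""
    scored = sorted(queries, key=lambda q: score_fn(q, doc), reverse=True)
    cutoff = max(1, int(len(scored) * keep_fraction))
    return scored[:cutoff]

def overlap_score(query, doc):
    """Toy relevance proxy: fraction of query terms appearing in the doc."""
    q_terms, d_terms = set(query.lower().split()), set(doc.lower().split())
    return len(q_terms & d_terms) / max(1, len(q_terms))

doc = "rust memory safety without garbage collection"
generated = [
    "how does rust guarantee memory safety",  # plausible generated query
    "best pizza toppings",                    # hallucinated query
    "rust vs garbage collected languages",    # plausible generated query
]
kept = filter_generated_queries(doc, generated, overlap_score)
```

In practice the score function would be one of the neural relevance models from the paper, and the hallucinated queries would never reach the index.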

This work from Guglielmo Faggioli, Thibault Formal, Stefano Marchesin, Stéphane Clinchant, Nicola Ferro, and Benjamin Piwowarski explores how popular Query Performance Predictors (QPP) behave in estimating the quality of neural information retrieval approaches.
What attracted my attention is the concept of Query Performance Predictors, which leverage statistical analysis and corpus-based information to estimate how well the system may perform (rather than using rated judgment lists, which is much more common in the industry).
The paper gives an excellent overview of the methods and assesses many of them on neural search systems. The list of Query Performance Predictors evaluated is impressive and the list of neural search approaches is comprehensive: definitely an interesting read!
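As a toy illustration of the corpus-based idea (not one of the predictors from the paper), a minimal pre-retrieval predictor can be sketched from the average IDF of the query terms, with invented corpus statistics:

```python
import math

# A toy pre-retrieval Query Performance Predictor: the average IDF of the
# query terms, computed purely from corpus statistics, serves as a proxy for
# how well the query is likely to perform, with no relevance judgments
# required. The corpus statistics below are invented for illustration.

def avg_idf_predictor(query_terms, doc_freqs, num_docs):
    """Higher average IDF suggests a more discriminative query."""
    idfs = [math.log((num_docs + 1) / (doc_freqs.get(t, 0) + 1))
            for t in query_terms]
    return sum(idfs) / len(idfs)

corpus_stats = {"neural": 40, "retrieval": 120, "the": 9900}
rare_query_score = avg_idf_predictor(["neural", "retrieval"], corpus_stats, 10000)
common_query_score = avg_idf_predictor(["the"], corpus_stats, 10000)
```

A query made of rare, discriminative terms scores higher than one full of common terms; the predictors evaluated in the paper are considerably more refined.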

I have recently studied multi-field and multi-query-term retrieval personally (combined_fields_query, cross_fields, and a prototype in Apache Solr implementing a new multi-field query parser).
My focus was on how to balance query terms Inverse Document Frequency and popularity (both for un-expanded queries and synonyms-expanded ones), so this paper from Tuomas Ketola and Thomas Roelleke naturally caught my attention.
Structured documents are everywhere and arguably they are even more common in industrial applications: I bet in your career you ended up working on multi-fielded documents most of the time.
This research work proposes an alternative model to BM25F: Information Content Field Weighting, grounded on analytical evidence and transparency.
A key aspect of structured document retrieval is weighting field scores to obtain the final scoring function that combines the score of each field match, an activity done most of the time via optimisation (through a Learning To Rank approach, for example).
The main method described in the paper explores field weighting based on the term distributions in the corpus (in relation to term frequencies and document frequencies) and it does an excellent job, even without training.
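A minimal, hypothetical sketch of the idea: each field's weight is derived from corpus-level term statistics, with no training involved. The statistics and the averaged-IDF formula below are simplified illustrations of mine, not the paper's actual model:

```python
import math

# A simplified sketch of corpus-based field weighting: a field's weight is
# derived from the term distribution in that field (here, the average IDF of
# the query terms), and the per-field match scores are combined linearly.

def field_weight(query_terms, field_doc_freqs, num_docs):
    """Average IDF of the query terms within one field's statistics."""
    idfs = [math.log((num_docs + 1) / (field_doc_freqs.get(t, 0) + 1))
            for t in query_terms]
    return sum(idfs) / len(idfs)

def combined_score(query_terms, per_field_scores, field_stats, num_docs):
    """Weighted sum of per-field match scores, with corpus-derived weights."""
    return sum(field_weight(query_terms, field_stats[f], num_docs) * score
               for f, score in per_field_scores.items())

# "rust" is rare in titles but common in bodies, so a title match counts more.
stats = {"title": {"rust": 10}, "body": {"rust": 500}}
total = combined_score(["rust"], {"title": 1.0, "body": 1.0}, stats, 1000)
```

The appeal is transparency: the weights can be explained analytically from the corpus, instead of emerging from an opaque optimisation.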
Would you like to see it in Apache Lucene and Solr? Let us know in the comments below!
I had the chance to talk with Tuomas, one of the authors, during the breaks; he's a very nice and knowledgeable person, a big plus for any research!

This paper from Jonas Wallat, Fabian Beringer, Abhijit Anand, and Avishek Anand explores how BERT models encode specific linguistic properties and factual information.
The objective of using the probing strategy is to better understand each neural network layer and its contribution to the final language understanding.
The concept of a probe is that of a specific classifier for specific linguistic traits (for example, Part Of Speech tagging), used to evaluate how easy it is to guess a class from layer embeddings.
The probes taken into account cover the tasks of lexical matching, semantic similarity, named entity recognition (NER), and coreference resolution.
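As a toy illustration of the probing strategy: a deliberately simple model is trained on frozen embeddings from a single layer; if it classifies the linguistic property well, that layer plausibly encodes it. The two-dimensional "embeddings" and nearest-centroid probe below are invented stand-ins, not the paper's setup:

```python
# A toy probe: a nearest-centroid classifier trained on frozen "layer
# embeddings". Real probes operate on actual BERT layer activations.

def centroid(vectors):
    return [sum(xs) / len(xs) for xs in zip(*vectors)]

def train_probe(layer_embeddings, labels):
    """One centroid per class over the frozen layer embeddings."""
    by_class = {}
    for vec, lab in zip(layer_embeddings, labels):
        by_class.setdefault(lab, []).append(vec)
    return {lab: centroid(vs) for lab, vs in by_class.items()}

def probe_predict(probe, vec):
    """Assign the class of the nearest centroid (squared distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(probe, key=lambda lab: dist(probe[lab], vec))

# Toy token embeddings from one layer, labelled with a Part Of Speech trait.
embeddings = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
labels = ["NOUN", "NOUN", "VERB", "VERB"]
probe = train_probe(embeddings, labels)
```

Comparing probe accuracy layer by layer is what reveals where in the network each linguistic property lives.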
It was a nice talk and the findings are interesting!


Mengxing Dong, Bowei Zou, Yanling Li, and Yu Hong presented interesting research work on multi-choice reading comprehension: the task of selecting the correct answer among multi-choice distractors, given a query and a ground-truth passage.
CoLISA (Contrastive Learning and In-Sample Attention) is a novel method that aims to prudently exclude confusing distractors and to learn option-aware representations via contrastive learning across multiple options.
One of the main reasons this paper interested me is that, to a certain extent, identifying the most relevant documents in a search results list may be seen as a task similar to identifying the correct answer among distractors: definitely an area I would like to study more in the next few months.

As soon as I read the title of this paper, I knew I wanted to know more: having unjudged documents among your ratings is a real and practical problem in the industry, where the corpus of information may change continuously and raters are unlikely to keep up with those changes.
Maik Fröbe, Lukas Gienapp, Martin Potthast, and Matthias Hagen explore this interesting problem and propose to use bootstrapping to generate a distribution of nDCG scores by sampling judgments for the unjudged documents using run-based and/or pool-based priors.
The results are encouraging, and the experimental setup verified the accuracy of the proposed method by simulating unjudged documents out of a known dataset.
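A hypothetical sketch of the bootstrapping idea: for each unjudged document a label is sampled from a prior, nDCG is computed on the completed ranking, and the process is repeated to obtain a distribution of scores rather than a single point estimate. The flat prior below is my own simplification; the paper uses run-based and pool-based priors:

```python
import math
import random

# Bootstrapping nDCG over unjudged documents (relevance None): sample a
# label for each gap from a prior, score the completed ranking, repeat.

def dcg(rels):
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

def bootstrap_ndcg(rels_with_gaps, p_nonrelevant=0.7, samples=1000, seed=42):
    """Distribution of nDCG scores with unjudged labels drawn from a prior."""
    rng = random.Random(seed)
    scores = []
    for _ in range(samples):
        filled = [r if r is not None
                  else (0 if rng.random() < p_nonrelevant else 1)
                  for r in rels_with_gaps]
        scores.append(ndcg(filled))
    return scores

run = [1, None, 0, None, 1]  # None marks an unjudged document
dist = bootstrap_ndcg(run)
```

The resulting distribution conveys both an estimate and the uncertainty introduced by the missing judgments, instead of silently treating unjudged documents as non-relevant.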
Definitely interesting research I would love to see implemented in RRE!

Explainability is another topic I have personally explored a lot, in collaboration with my colleagues (explaining Learning To Rank).
SHAP and LIME are very popular approaches, and this research from Lijun Lyu and Avishek Anand proposes an alternative, based on approximating a black-box ranker with an aggregation of simple rankers using a listwise explainability approach.
Neural retrievers have been proven to bring many advantages in terms of precision and recall, but they naturally present interpretability challenges.
The research builds on post-hoc approaches that approximate a neural ranker on a query-by-query basis, aiming for the highest fidelity possible.
The simpler rankers can then be used to highlight terms and phrases in the results and give a better intuition of why such documents were returned.

I really loved the Evaluation track! So many problems we face daily in our research/development activities and so many new intuitions and ideas!
Given that I contributed interleaving to Apache Solr and we had a reproducibility paper published on interleaving (so consider at least 6 months of focused work on it), it was natural to focus my attention on this research from Kojiro Iizuka, Hajime Morita, and Makoto P. Kato.
The focus of the contribution is to investigate the claims of high efficiency for interleaved methods and back them with a consistent theoretical explanation, especially in comparison with A/B testing (an alternative online evaluation approach to interleaving).
The main intuition is that interleaving enables a better ranker to implicitly 'steal' the click opportunities of the weaker one, effectively limiting the number of clicks necessary to discern the ranking models in comparison to other approaches.
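The mechanics can be sketched with a toy team-draft interleaving implementation. This is my own simplified version (it assumes both rankers rank the same document set), not the paper's formulation:

```python
import random

# Toy team-draft interleaving: the two rankers take turns (balanced, with a
# coin flip on ties) picking their next unpicked document; clicks are then
# credited to whichever ranker contributed the clicked document.

def team_draft_interleave(ranking_a, ranking_b, seed=7):
    rng = random.Random(seed)
    interleaved, team = [], {}
    count_a = count_b = 0
    while True:
        pick_a = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        source = ranking_a if pick_a else ranking_b
        doc = next((d for d in source if d not in team), None)
        if doc is None:  # the picking team has no unpicked documents left
            break
        team[doc] = "A" if pick_a else "B"
        interleaved.append(doc)
        if pick_a:
            count_a += 1
        else:
            count_b += 1
    return interleaved, team

def credit_clicks(team, clicked_docs):
    """Each click is an opportunity credited to the contributing team."""
    wins = {"A": 0, "B": 0}
    for d in clicked_docs:
        if d in team:
            wins[team[d]] += 1
    return wins
```

A user clicking mostly documents contributed by ranker A is implicit evidence that A is the better ranker, which is what makes the comparison so click-efficient relative to A/B testing.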
I also had the occasion to speak with one of the authors (Kojiro), a very kind person, open to discussing and brainstorming our approach that was presented the following day. Thanks, Kojiro!

I spent each poster session slowly passing by and assessing each short paper presented: it's relaxing, it feels personal, and you can gather an incredible amount of ideas from the authors themselves in a short frame of time. It's the part of the day I enjoy the most.
This short paper from Gregor Donabauer and Udo Kruschwitz was my favorite from the second day: I hate fake news and it’s always encouraging to see researchers working to address this annoying problem. The research adds the context from multiple social media platforms (represented with nodes and links in a graph) to ease the task of detecting a fake story.
The task shifts to a graph classification task, solvable with graph neural networks.
An interesting aspect of the research was also identifying which social media features prove most useful for the task.
It was good to discuss this personally with the author!


This is our publication! I am so proud of what we achieved with my colleague Anna Ruggero: being a small company, it’s hard to find time and funding for pure research projects, but we did it and entirely self-funded!
Our work investigates the long-tail scenario for query distributions in the context of online evaluation through interleaved methods.
We have seen this gap between the state-of-the-art literature and realistic query distributions many times with our clients, and we had the intuition of improving the outcome estimator using statistical hypothesis testing.
We wanted empirical evidence for our idea, so we started the reproducibility work and experimental analysis.
It’s been an amazing journey and I am genuinely extremely happy with the warm reception for our work, including various questions from the audience and discussions afterwards.
I’ll write a follow-up blog post on this entire experience in the next few days!

The third day was a bit of a mix of preparation for our talk and some extra-conference work, but I had the chance to enjoy a few other presentations, and my favorite was this work from Giovanni Gabbolini and Derek Bridge. We have written various blog posts on Music Information Retrieval, and at Sease we are quite passionate about the topic in general.
This paper describes four novel approaches for multi-class context classification for playlists: the task is to predict the listening context (workout, reading, bedtime, hiking, etc) from a sequence of songs (playlist).
The research work explores both a metadata-only approach, integrated with a knowledge graph, and two variants that include audio processing: definitely something we would like to try with our music-related clients!


Industry day is normally my favourite day, but I have to admit this year it was a bit hit-and-miss from my personal perspective:
some very interesting talks, followed by talks that were pretty much sales pitches with little technical value aside from a showcase of product X (don't get me wrong, all talks were carefully crafted, but I have to admit I couldn't gather much from some of them).
That said, these are the talks I enjoyed the most:

Towards Tenfold Productivity for Knowledge Workers: Combining Neural Search and Language Model Prompting

Jakub Zavrel (Zeta Alpha)

A lot of very interesting research work, done in the context of scientific/enterprise search.

Relational Search and its Application to Investigative Intelligence Scenarios

Stephane Campinas, Matteo Catena and Renaud Delbru (Siren)

I met the founding people of Siren a long time ago (back in 2014), so it was very cool to see that their fast Lucene join implementation laid the foundation for a very interesting graph/aggregation explorer software, solid and successful!

Building a Cultural Search Engine: A Dense Multimodal Retrieval Approach

Konstantinos Perifanos and Lily Davies (codec.ai)

Interesting multi-modal retrieval using hierarchical navigable small-world graphs.

Building a Business-Contextual Image Retrieval API

Paul-Louis Nech (Algolia)

A very nice talk, with the interesting idea of mixing locality-sensitive hashing and graph-based approximate nearest neighbor search.

Dedicated Search and Search-as-a-Service at Bloomberg

Ramsey Haddad, Andrey Ukhanov and Serkan Kirbas (Bloomberg)

We are long-time friends with Bloomberg, having collaborated many times and attended many of their talks.
As usual, top-notch talk and probably the only one fully on target with the original industry day 2023 theme!
(Differences and challenges in building dedicated information access systems versus building “Search as a Service”).

Opportunities and Challenges in Multilingual Semantic Search

Nils Reimers (cohere.ai)

This was the final talk, and I really liked the emphasis on neural search as opposed to lexical search.


And that's it! It's been an amazing week in Dublin: I came back with luggage full of ideas and enthusiasm, met a ton of brilliant people, and had the chance to catch up with many old and new friends!
See you in Glasgow next year!


Subscribe to our newsletter

Did you like this post about Alessandro's experience at ECIR 2023? Don't forget to subscribe to our Newsletter to stay up to date with the Information Retrieval world!


Alessandro Benedetti

Alessandro Benedetti is the founder of Sease Ltd. A Senior Search Software Engineer, his focus is on R&D in information retrieval, information extraction, natural language processing, and machine learning.
