The 45th European Conference on Information Retrieval (ECIR) is the annual premier European forum for the presentation of new research results in the broadly conceived area of Information Retrieval. ECIR provides an opportunity for both young and established researchers to present research papers reporting new, unpublished, and innovative research results.
Location: Dublin, Ireland
Date: 2-6 April 2023
Stat-weight: Improving the Estimator of Interleaved Methods Outcomes with Statistical Hypothesis Testing
Interleaving is an online evaluation approach for information retrieval systems that compares the effectiveness of ranking functions by interpreting users' implicit feedback. Previous work, such as Hofmann et al. (2011), evaluated the most promising interleaved methods of the time on uniform distributions of queries. In real-world settings, repeated queries usually follow an unbalanced distribution shaped by a long-tailed user search demand curve. This paper first aims to reproduce the Team Draft Interleaving accuracy evaluation on uniform query distributions and then assesses how the method generalises to long-tailed real-world scenarios. The replicability work raised interesting considerations about how the winning ranking function for each query should affect the overall winner of the entire evaluation. Based on these observations, we propose that not all queries should contribute to the final decision in equal proportion. Building on these insights, we designed two variations of the ∆AB score winner estimator that assign each query a credit based on statistical hypothesis testing. To reproduce, replicate and extend the original work, we developed from scratch a system that simulates a search engine and users' interactions using industry datasets. Our experiments confirm our intuition and show that our methods are promising in terms of accuracy, sensitivity, and robustness to noise.
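To make the idea of per-query credit concrete, the following is a minimal sketch, not the paper's implementation. It assumes that each query's clicks have already been reduced to counts of impressions won by ranker A and ranker B, that significance is measured with an exact two-sided binomial (sign) test, and that a query's credit is either `1 - p` or a hard cut at a significance level `alpha`. The function names, the two credit rules, and the example counts are all illustrative assumptions; the abstract only states that credit is based on statistical hypothesis testing.

```python
from math import comb


def two_sided_binomial_p(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign-test p-value under H0: A and B are equally likely to win."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no evidence at all for this query
    k = max(wins_a, wins_b)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def aggregate_delta_ab(per_query, credit_fn) -> float:
    """Delta_AB over queries, where each query votes with weight credit_fn(wins_a, wins_b).

    Positive values favour A, negative values favour B, 0.0 means no preference.
    """
    credit_a = credit_b = credit_tie = 0.0
    for wins_a, wins_b in per_query.values():
        credit = credit_fn(wins_a, wins_b)
        if wins_a > wins_b:
            credit_a += credit
        elif wins_b > wins_a:
            credit_b += credit
        else:
            credit_tie += credit
    total = credit_a + credit_b + credit_tie
    if total == 0:
        return 0.0  # every query received zero credit
    return (credit_a + 0.5 * credit_tie) / total - 0.5


# Three hypothetical credit rules (assumptions for illustration only).
def uniform_credit(wins_a, wins_b):
    return 1.0  # every query counts equally


def weighted_credit(wins_a, wins_b):
    return 1.0 - two_sided_binomial_p(wins_a, wins_b)


def pruned_credit(wins_a, wins_b, alpha=0.05):
    return 1.0 if two_sided_binomial_p(wins_a, wins_b) < alpha else 0.0


# Made-up long-tailed example: one frequent query, two queries seen once.
per_query = {"head query": (65, 35), "tail query 1": (0, 1), "tail query 2": (0, 1)}

uniform = aggregate_delta_ab(per_query, uniform_credit)
weighted = aggregate_delta_ab(per_query, weighted_credit)
pruned = aggregate_delta_ab(per_query, pruned_credit)
print(uniform, weighted, pruned)
```

In this made-up example, equal voting lets the two single-impression tail queries outvote the frequent head query, so B appears to win. Under either credit rule the tail queries carry no weight, because a single impression cannot reject the null hypothesis, and the outcome follows the head query instead.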