ECIR 2018 Experience

This blog post is a quick summary of my (subjective) experience at ECIR 2018, the 40th European Conference on Information Retrieval, hosted in Grenoble (France) from 26/03/2018 to 29/03/2018.

Deep Learning and Explainability

Eight of the accepted long papers were about Deep Learning.
“Neural Network” and “Word Embedding” were the most common topics among the full papers submitted to the conference (accepted and rejected alike). It is clear that deep learning technologies are establishing themselves as a de facto standard in Artificial Intelligence, and this is visible in Information Retrieval as well, where they can be used to better model user behaviour and to extract topics and semantics from documents.
But if deep learning and the advanced capabilities of complex models bring a gain in performance, on the other hand you lose the ability to explain and debug why a given input produces a certain output.
A recurring topic in the Deep Learning track (and in the Industry Day as well) was how to find a balance between the performance gain of the new techniques and the control we keep over them.
It is an interesting topic and, I believe, just the other face of the same coin; it is good, though, that Academia is recognising the importance of this aspect: in industry a technology requires control and maintenance much more than in an academic environment, and most of the time “debuggability” can drive the decision between one technology and another.

From Academic Papers to Production

The 2018 European Conference on Information Retrieval ended on 29/03 with a brilliant Industry Day focused on the perilous path from Research to Production.
This is the topic that most permeated the conference across different keynotes, sessions and informal discussions.
In Information Retrieval it is still difficult to turn successful research into successful live systems: most of the time it is not even clear which party should be interested in driving this process.
Okapi BM25 is a striking example: first published and implemented between the 1980s and the 1990s, it became the default similarity in Apache Lucene only in 2016, with version 6.0.
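As a concrete reference point (not part of the conference material), here is a minimal sketch, in plain Lucene Java, of how BM25 can be configured on a searcher today; the index path is a placeholder and, since Lucene 6.0, this similarity is applied by default even without setting it explicitly.

```java
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.store.FSDirectory;

public class Bm25SearcherSketch {
    public static void main(String[] args) throws Exception {
        // Open an existing index (the path is a placeholder).
        DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")));
        IndexSearcher searcher = new IndexSearcher(reader);

        // Since Lucene 6.0 this is already the default similarity;
        // setting it explicitly just makes the k1 and b parameters visible and tunable.
        searcher.setSimilarity(new BM25Similarity(1.2f, 0.75f));

        reader.close();
    }
}
```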
Academia is focused on finding new, interesting problems to solve with inventive techniques, while Industry is focused on finding the quickest solution that (usually) works.
This creates a gap: new solutions are not easily reproducible from academic papers, and they are far from being ready to use outside a controlled experimental environment.
Brilliant researchers crave new, interesting problems to reason about and solve: their focus is to build a rough implementation and validate it with a few metrics (ideally with an accepted publication attached).
After that their job is done, the problem is solved, the challenge is no longer interesting and they can move on to the next one.
Researchers get bored easily, but they risk never seeing their research fulfilled, applied and used in real life.
Often Academia creates its own problems just to solve them and get a publication: Publish or Perish, since fewer publications mean less funding.
This may or may not be a problem, depending on the individual.
Industry, on the other hand, is usually seen by academics as a boring place where you just apply consolidated techniques to get the best result with minimum effort.
Sometimes industry is just where you end up when you want to make some money and see your effort bring benefits to some population.
And this was (and still is) true most of the time, but with the IT explosion we are living through and the boom in competition, the situation is now open to change: a stronger connection between Academia and Industry can (and should!) happen, and conferences such as ECIR are the perfect ground to lay the foundations.
So, building on this introduction, let's look at a quick summary of the keynotes, topics and sessions that impressed me most at the conference!

From Academic Papers to Production: A Learning To Rank Story

Let’s start with Sease’s contribution to the conference: a story about the adoption of Learning To Rank in a real-world e-commerce scenario.
The session took place at the Industry Day (29/03) and focused on the challenges and pitfalls of moving from research papers to production.
The entire journey was made possible by Open Source software, with Apache Solr and RankLib as the main actors.
Learning To Rank is becoming extremely popular in the industry: it is a very promising technology for improving the relevance function by leveraging user behaviour.
But from an open source perspective it is still quite a young technology, and effort is required to get it right.
The main message I wanted to convey is: don't be scared to fail. If something doesn't work immediately out of the box, it doesn't mean it's not a valid technology; no pain, no gain. The open source Learning To Rank implementations are valid, but they require tuning and care to be brought to production successfully, and the improvement these technologies can bring is extremely valuable.
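To give a flavour of the last step of such a pipeline, here is a minimal, hypothetical SolrJ sketch of a rerank query against a collection where a Learning To Rank model (trained, for example, with RankLib) has already been uploaded to the Solr model store; the collection name, model name and efi parameter are placeholders, not the actual setup presented in the talk.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LtrRerankSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical collection and model names, for illustration only.
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/products").build();

        SolrQuery query = new SolrQuery("jacket");
        // First-pass retrieval uses the standard similarity;
        // the uploaded LTR model then reranks only the top 100 candidates.
        query.add("rq", "{!ltr model=myLambdaMARTModel reRankDocs=100 efi.user_query='jacket'}");
        query.add("fl", "id,score");

        QueryResponse response = solr.query(query);
        response.getResults().forEach(doc -> System.out.println(doc.get("id") + " " + doc.get("score")));
        solr.close();
    }
}
```

The design idea is that the cheap first-pass retrieval provides the candidates, and the learned model reranks only the top reRankDocs of them, keeping the cost of the model evaluation under control.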

The Harsh Reality of Production Information Access Systems

In this talk Fernando Diaz, recipient of the Karen Spärck Jones Award, focused on the problems arising from the adoption of Information Retrieval technologies in production environments, where a deep understanding of individuals, groups and society is required.
Building on the technical aspects involved in applying research to real-world systems, the focus switched to the ethical side: are current IR systems moving towards providing users with fair and equally accessible information, or is the monetisation process behind them just producing addicted users who see (and buy) tailor-made information?

Statistical Stemmers: A Reproducibility Study

This paper is something of a symbol of the conference trend: reproducibility is as important as the innovation that research brings.
The winner of the Best Paper Award may have caused some perplexity in the audience (why reward a paper that is not innovating but just reproducing past techniques?), but the message it transmits is clear: research needs to be easily reproducible, and effort is required in that direction for a healthy Research & Development flow that targets not just the publication but a real-world application.

Entity-centric Topic Extraction and Exploration: A Network-based Approach

An interesting talk: it explores topic modelling over time in a network-based fashion, where instead of modelling a topic as a ranked list of terms it uses a network (weighted graph) representation.
This may be interesting for an advanced More Like This implementation; it is worth investigating (a rough sketch of the data structure follows).
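As an illustration of the data-structure shift only (this is not the paper's algorithm), a topic represented as a weighted term graph, rather than a ranked term list, could be sketched like this:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only (not the paper's algorithm): a topic as a weighted term graph
// built from term co-occurrence, rather than a flat ranked list of terms.
public class TermGraphSketch {
    // adjacency map: term -> (co-occurring term -> accumulated edge weight)
    private final Map<String, Map<String, Double>> edges = new HashMap<>();

    public void addDocument(List<String> terms) {
        // every co-occurring pair of terms in the document strengthens an edge
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                addEdge(terms.get(i), terms.get(j), 1.0);
            }
        }
    }

    private void addEdge(String a, String b, double weight) {
        edges.computeIfAbsent(a, k -> new HashMap<>()).merge(b, weight, Double::sum);
        edges.computeIfAbsent(b, k -> new HashMap<>()).merge(a, weight, Double::sum);
    }

    public Map<String, Double> neighbours(String term) {
        return edges.getOrDefault(term, Map.of());
    }
}
```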

Information Scent, Searching and Stopping: Modelling SERP Level Stopping Behaviour

This talk focused on the entire Search Results Page as a possible signal affecting the user's stopping behaviour: when a results page is returned, the overall page quality affects the user's perception of relevance and may drive an immediate query reformulation or a good abandonment (when the information need is already satisfied).
This is something I have experienced myself: sometimes a quick look at the results page is enough to realise whether the search engine understood (or misunderstood) your information need.
Different factors contribute to what the author calls “Information Scent”, but perceived relevance (modelled through different User Experience approaches) is definitely an interesting topic that sits alongside real relevance.
Further studies in this area may affect the way search results pages are rendered, to maximise the consumption of information.

Employing Document Embeddings to Solve the "New Catalog" Problem in User Targeting, and provide Explanations to the Users

The new catalog problem is a practical problem for modern recommender systems and platforms: there are plenty of use cases where you have a collection of items that you would like to recommend as a whole, ranging from music streaming platforms (playlists, albums, etc.) to video streaming platforms (TV series genres, to-view lists, etc.) and many other domains.
This paper explores both the algorithm behind such recommendations and the explanations that go with them: explaining to the user why a catalog may be relevant to his/her taste is as important as providing a relevant catalog of items.
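For illustration only, and not the method proposed in the paper, a naive baseline for a brand-new catalog could be to average the embeddings of its items and compare the resulting vector with a user profile vector:

```java
import java.util.List;

// Illustration only (not the paper's method): a brand-new catalog gets an embedding
// by averaging the embeddings of its items, so it can be matched against a user
// profile vector even before any interaction with the catalog has been observed.
public class CatalogEmbeddingSketch {

    static float[] average(List<float[]> itemEmbeddings) {
        int dim = itemEmbeddings.get(0).length;
        float[] centroid = new float[dim];
        for (float[] item : itemEmbeddings) {
            for (int i = 0; i < dim; i++) {
                centroid[i] += item[i] / itemEmbeddings.size();
            }
        }
        return centroid;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] catalog = average(List.of(new float[]{1f, 0f}, new float[]{0f, 1f}));
        float[] userProfile = {1f, 1f};
        System.out.println("user/catalog similarity = " + cosine(catalog, userProfile));
    }
}
```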

Anatomy of an Idea: Mixing Open Source, Research and Business

This keynote summarises the cornerstone of Sease's culture: Open Source as a bridge between Academia and Industry.
If every research paper were implemented in a production-ready open source platform as part of the publication process, the community would get a direct and immense benefit from it.
Iterative improvement would gain great traction and, generally speaking, the entire scientific community would get a strong boost from better accessibility.
Implementing research in state-of-the-art, production-ready open source systems (where possible) would cut the adoption time at the industry level, triggering a healthy cycle of usage and bug fixing.

Industry Day

The Industry Day was the crowning moment of the trend permeating the whole conference: there is a strong need to build a better connection between the academic and industrial worlds.
The warm reception from the audience (the organisers had to move the track to the main venue) is proof that there is an increasingly strong need to see interesting research applied to the real world, whether it succeeds or fails, along with the related lessons learned.
Plenty of talks in this session were brilliant; my favourites:

  • Fabrizio Silvestri (Facebook)
    Query Embeddings: From Research to Production and Back!
  • Manos Tsagkias (904Labs)
    A.I. for Search: Lessons Learned
  • Marc Bron (Schibsted Media Group)
    Management of Industry Research: Experiences of a Research Scientist

In conclusion, the conference was perfectly organised in an excellent venue, the balance of topics and talks (both academic and industrial) was fairly good, and I enjoyed my time in Grenoble. See you next year in Cologne!

We are Sease, an Information Retrieval Company based in London, focused on providing R&D project guidance and implementation, Search consulting services, Training, and Search solutions using open source software like Apache Lucene/Solr, Elasticsearch, OpenSearch and Vespa.


Did you like this post? Don’t forget to subscribe to our Newsletter to stay always updated in the Information Retrieval world!
