This blog is a quick summary of my (subjective) experience at ECIR 2018 : the 40th European Conference on Information Retrieval, hosted in Grenoble (France) from 26/03/2018 to 29/03/2018.
Deep Learning and Explicability
Eight long papers accepted were about Deep Learning.
The topics “Neural Network” and “Word Embedding” were the most occurring in the accepted full papers (and rejected) at the conference. It is clear that deep learning technologies are strongly advancing as a de facto standard in Artificial Intelligence and this can be noticed also in Information Retrieval where these technologies can be used to better model the user behaviour and extract topics and semantic from documents.
But if with deep learning and the advanced capabilities of complex models you gain performance, on the other hand you lose the ability of explaining and debugging why such output corresponds to a given input.
A recurring topic in the Deep Learning track ( and the Industry Day actually) was to find a balance in the performance gain of new techniques and the control over them.
It is an interesting topic and I believe it is just the other face of the coin, it is good though that Academia is noting the importance of such aspect : in the industry a technology requires control and maintenance much more than in the academic environment and most of the times the “debuggability” can affect the decision between a technology and another.
From Academic Papers to Production
The 2018 European Conference on Information Retrieval ended the 29/03 with a brilliant Industry Day focused on the perilous path from Research to Production.
This is the topic that most permeated the conference across different keynotes, sessions and informal discussions.
In Information Retrieval it is still difficult to converge successful researches into successful live systems : most of the times it is not clear which party should be interested in this process.
Okapi BM25 was first published and implemented in the 1980s and 1990s; it became the default similarity in Apache Lucene 6.0 in 2016 .
Academia is focused on finding new interesting problems to solve with inventive techniques and Industry is focused on finding the quickest solution that works (usually).
This brings a gap : new solutions are not easily reproducible from academic papers and they are far from being practically ready to use outside the experimental controlled environment.
Brilliant researchers crave new interesting problems they can reason on and solve: their focus is to build a rough implementation and validate it with few metrics ( ideally with an accepted related publication ).
After that, their job is done, problem solved, the challenge is not interesting anymore and they can pass to the next problem.
Researchers get bored easily but they risk to never see their researches fulfilled, applied and used in real life.
Often Academia creates its own problems to solve them and get a publication : Publish or Perish -> no publications bring less funds.
This may be or may be not a personal problem, depending on the individual.
Industry is usually seen by academics as a boring place where you just apply consolidated techniques to just get the best result with a minimum effort.
Sometimes industry is just where you end up when you want to make some money and actually see your effort to bring benefits to some population.
And this was(is) true most of the times but with the IT explosion we are living and the boom of competition the situation nowadays is open to change, where a stronger connection between Academia and Industry can ( and should !) happen and conferences such as ECIR is a perfect ground to settle the basis.
So, building on the introduction let’s see a quick summary of the keynotes, topics and sessions that impressed me most from the conference!
From Academic Papers to Production : A Learning To Rank Story
Let’s start from Sease contribution to the conference : a story about the adoption of Learning To Rank in a real world e-commerce scenario.
The session took place at the Industry Day (29/03) and focused on the challenges and pitfalls of moving from the research papers to production.
The entire journey was possible through Open Source software : Apache Solr and RankLib as main actors.
Learning To Rank is becoming extremely popular in the industry and it is a very promising and useful technology to improve the relevancy function leveraging the user behaviour.
But from an open source perspective is still quite a young technology and effort is required to get it right.
The main message that I wanted to transmit is : don’t be scared to fail, if something doesn’t work immediately out of the box, it doesn’t mean it’s not a valid technology, No Pain No Gain, Learning To Rank open source implementations are valid but require tuning and care to bring them to production with success ( and the improvement these technologies can bring is extremely valuable).
The Harsh Reality of Production Information Access Systems
Recipient of the “Karen Spärck Jones Award” Fernando Diaz in this talk focused on the problems derived from the adoption of Information Retrieval technologies in production environments where a deep understanding of individuals, groups and society is required.
Building on the technical aspect involved in applying research in real world systems, the focus switched to the ethical side : are current IR systems moving in the direction of providing the user with a fair and equally accessible information or the monetisation process behind is producing just addicted users who see (and buy) ad hoc tailored information ?
Statistical Stemmers : A Reproducibility Study
This paper is a sort of symbol of the conference trend : reproducibility is as important as the innovation that a research brings.
Winner of the Best Paper Award, it may have caused some perplexity among the audience ( why to reward a paper that is not innovating but just reproducing past techniques ?), but the message that transmits is clear : a research needs to be easily reproducible and effort is required in that direction for an healthy Research & Development flow that doesn’t target just the publication but a real world application.
Entity-centric Topic Extraction and Exploration: A Network-based Approach
Interesting talk, it explores topic modelling over time in a network based fashion :
Instead of modelling a topic as ranked list of terms it uses a network (weighted graph) representation.
This may be interesting for an advanced More Like This implementation, it’s definitely worth an investigation.
Information Scent, Searching and Stopping : Modelling SERP Level Stopping Behaviour
This talk focused on the entire Search Result Page as a possible signal that affects user stopping behaviour i.e. when a search result page is returned, the overall page quality affects the user perception of relevancy and may drive an immediate query reformulation OR a good abandonment ( when the information need is satisfied).
This something that I personally experimented : sometimes from a quick look to the result page you may realise if the search engine understood ( or misunderstood) your information need.
Different factors are involved in what the author calls “Information Scent” but definitely the perceived relevance (modelled through different User Experience approaches) is definitely an interesting topic that sits along the real relevance.
Further studies in this area may affect the way Search Results pages are rendered, to maximise the fruition of information.
Employing Document Embeddings to Solve the “New Catalog” Problem in User Targeting, and provide Explanations to the Users
The new catalog problem is a practical problem for modern recommender systems and platforms. There are a lot of use cases where you have collection of items that you would like to recommend and this ranges over Music Streaming Platforms ( Playlists, Albums, ect), Video Streaming Platform ( Tv Series genres, To View lists ect) and many other domains.
This paper explores both the algorithm behind such recommendations and the explanation need : explaining to the user why a catalog may be relevant for his/her taste is as important as providing a relevant catalog of items.
Anatomy of an Idea: Mixing Open Source, research and Business
This keynote summarises the cornerstone of Sease culture :
Open Source as a bridge between Academia and Industry.
If every research paper should be implemented in a production ready open source platform as part of the publication process, the community would get a direct and immense benefit out of it.
Iterative improvement would get a great traction and generally speaking the entire scientific community will get a strong boost with a better accessibility .
Implementing researches in state of the art production ready open source systems ( where possible) would cut the adoption time at industry level, triggering an healthy process of utilisation and bug fixing.
The industry day was the coronation of the overall trend that was permeating the conference : there is a strong need to build a better connection between the Academic and Industrial world.
And the good audience reception shown ( the organisers had to move the track to the main venue) is a proof that there is an increasingly stronger need of seeing interesting researches applied ( with success or failure) to the real world with the related lessons learned.
Plenty of talks in this session were brilliant, my favourites :
- Fabrizio Silvestri (Facebook)
Query Embeddings: From Research to Production and Back!
- Manos Tsagkias (904Labs)
A.I. for Search: Lessons Learned
- Marc Bron (Schibsted Media Group)
Managment of Industry Research: Experiences of a Research Scientist
In conclusion, the conference was perfectly organised in an excellent venue, the balance in topics and talks was fairly good( both academic and industrial) and I really enjoyed my time in Grenoble, see you next year in Cologne !
Alessandro Benedetti is the founder of Sease Ltd.
Senior Search Software Engineer, his focus is on R&D in information retrieval, information extraction, natural language processing, and machine learning.