This blog is a quick summary of my (subjective) experience at Haystack 2019: the Search Relevance Conference, hosted in Charlottesville (Virginia, USA) on 24–25 April 2019.
References to the slides will be updated as soon as they become available.
First of all, my feedback on the Haystack Conference is extremely positive.
From my perspective the conference has been a success.
Charlottesville is a delightful small city in the heart of Virginia: clean, organised, spacious and definitely relaxing. It was a pleasure to spend my time there.
The venue chosen for the conference was a cinema; initially I was surprised, but it worked really well, kudos to OpenSource Connections for the idea.
The conference and talks were meticulously organised, on time and with a relaxed pace, which definitely helped both the audience and the speakers to enjoy it more: thanks to the whole organisation for such quality!
Let’s take a look at the conference itself now: it was two days of very interesting talks, exploring the latest industry trends in search relevance with a delightfully tech-agnostic approach.
That was one of my favourite aspects of the conference: no one was trying to sell their product; it was just a genuine discussion of interesting problems and practical solutions. No comparisons between Apache Solr and Elasticsearch, just pure reasoning on challenging problems. Brilliant!
Last but not least, the conference allowed amazing search people from all over the world and from many cultures to meet, interact and discuss search problems and technologies; it may sound obvious for a conference, but it’s a great achievement nonetheless!
Keynote: What is Search Relevance?
Max Irwin opened the conference with his keynote on the meaning of search relevance. The talk was a smooth, pleasant introduction to the topic, making sure everyone was on the same page and ready for the following talks.
A good part of the opening was dedicated to the problem of collecting ground-truth ratings (from explicit to implicit and hybrid approaches).
After the keynote it was our turn: it was an honour to open the track sessions in theatre 5 with our talk “Rated Ranking Evaluator: An Open Source Approach to Search Quality Evaluation”.
Our talk was a revised version of our introduction to RRE, with a focus on the whole picture and how our software fits industry requirements.
Building on the introduction, we explored what search quality evaluation means for a generic information retrieval system and how you can apply the fundamental concepts of the topic to the real world with a full journey of assessing your system quality in an open source ecosystem.
The last part of the session was reserved for a quick demo showing the key components of the RRE framework.
I’m really happy with the reception from the audience; I’ll take the occasion to say a big thank you to everyone present in the theatre that day. This really encourages us to continue our work and make RRE even better.
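To give a concrete flavour of what “measuring search quality” means in practice, here is a minimal sketch of the underlying idea (not RRE’s actual API; document IDs and judgements below are invented for illustration): compare the ranked list an engine returns against a set of rated judgements.

```python
# Sketch of the core idea behind offline search quality evaluation:
# score a ranked result list against ground-truth relevance judgements.

def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of the top-k results that are judged relevant."""
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc in top_k if doc in relevant_doc_ids)
    return hits / k

results = ["d3", "d1", "d7", "d2"]    # engine output for one query
judged_relevant = {"d1", "d2", "d5"}  # ground-truth ratings

print(precision_at_k(results, judged_relevant, 4))  # 0.5
```

A framework like RRE computes metrics of this kind across a whole corpus of rated queries, and tracks how they evolve between versions of your configuration.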
Making the Case for Human Judgement Relevance Testing
After our talk, it was the turn of LexisNexis, with an overview of judgement relevance testing by Tito Serra and Tara Diedrichsen: “Making the Case for Human Judgement Relevance Testing”.
The talk was quite interesting and explored how to practically set up a human relevance testing programme.
When dealing with humans, reaching or estimating consensus is not trivial, and it is also quite important to detail as much as possible why a document is rated the way it is (the reason is as important as the rating itself).
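Since rater consensus came up, one common way to estimate it is Cohen’s kappa, which corrects the raw agreement between two raters for agreement expected by chance. A minimal sketch (the ratings below are invented):

```python
def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement: chance of a match given each rater's label mix.
    labels = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(l) / n) * (ratings_b.count(l) / n) for l in labels
    )
    return (observed - expected) / (1 - expected)

rater_1 = [1, 0, 1, 1, 0, 1]  # binary relevant/not-relevant judgements
rater_2 = [1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.67
```

Values near 1 mean strong agreement; values near 0 mean the raters agree no more than chance would predict, a signal that the rating guidelines need work.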
After the lunch break we were back to business with “Query Relaxation – a Rewriting Technique between Searching and Recommendations” by Rene Kriegler.
This was personally one of my favourites: starting from a clear definition of the problem (reducing the occurrence of zero-results searches), the speaker illustrated various approaches, from naive techniques (based on random removal of terms, or term-frequency-based removal) to the final word2vec + neural network system, able to drop words so as to maximise the probability of presenting a query reformulation that appeared in past sessions.
The overview of the entire journey was detailed and direct, especially because all the iterations were described, not only the final successful steps.
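The naive frequency-based baseline is easy to picture: when a query returns zero results, drop the rarest term, since it is the most likely culprit for the empty result set. A minimal sketch (the document-frequency table is invented for illustration):

```python
# Naive query relaxation: remove the least frequent term to broaden
# the query. Real systems (as in the talk) learn which term to drop.

doc_freq = {"running": 5400, "shoes": 8100, "waterproof": 310}

def relax(query_terms, df):
    """Drop the term with the lowest document frequency."""
    rarest = min(query_terms, key=lambda t: df.get(t, 0))
    return [t for t in query_terms if t != rarest]

print(relax(["waterproof", "running", "shoes"], doc_freq))
# ['running', 'shoes']
```

The word2vec + neural network approach described in the talk replaces this blunt heuristic with a model that predicts which reformulation is most likely to match a query seen in past sessions.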
Beyond the Search Engine: Improving Relevancy through Query Expansion
And to conclude the first day I chose “Beyond the Search Engine: Improving Relevancy through Query Expansion”, a journey to improve relevance in an e-commerce domain, by Taylor Rose and David Mitchell from Ibotta.
The focus of the talk was a successful inter-team collaboration, in which a curated knowledge base used by the machine learning team proved quite useful for improving the mechanics of synonym matching and product categorisation.
After the sessions, the first day ended with lightning talks.
They were very quick and thought-provoking; some that caught my attention:
- Quaerite – from Tim Allison, a toolkit to optimise search parameters using genetic algorithms
- Hello LTR – from Doug Turnbull, a set of Jupyter notebooks to quickly spin up LTR experiments
- HathiTrust – finally had the chance to hear live about one of the earliest Solr adopters for “big data” (I remember theirs being among the first articles I read about scaling up Apache Solr, back in 2010)
- SMUI – a search management UI for synonyms
- Querqy – from Rene Kriegler, a framework for query preprocessing in Java-based search engines
Addressing Variance in AB Tests: Interleaved Evaluation of Rankers
The second day opened for me with “Addressing Variance in AB Tests: Interleaved Evaluation of Rankers”, where Erik Bernhardson went through how the Wikimedia Foundation faced the necessity of speeding up their A/B tests, reducing the data needed to validate their statistical significance.
The concept of interleaving results to assess rankers is well known in the academic community, but it was extremely useful to see a real-life application and a comparison of some of the available techniques.
Especially useful was the description of two of the approaches tried:
– Balanced Interleaving
– Team Draft Interleaving
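To make the idea concrete, here is a minimal sketch of Team Draft Interleaving (document IDs and rankings are invented; real implementations, such as the library linked below, handle credit assignment and statistics too):

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Team Draft Interleaving: in each round a coin toss decides which
    team picks first; each team then adds its highest-ranked document
    not already in the interleaved list. Clicks are later credited to
    the team that picked the clicked document."""
    total = len(set(ranking_a) | set(ranking_b))
    interleaved, team_a, team_b = [], set(), set()
    while len(interleaved) < total:
        order = [(ranking_a, team_a), (ranking_b, team_b)]
        if rng.random() < 0.5:
            order.reverse()
        for ranking, team in order:
            for doc in ranking:
                if doc not in interleaved:
                    interleaved.append(doc)
                    team.add(doc)
                    break
    return interleaved, team_a, team_b

mixed, picks_a, picks_b = team_draft_interleave(
    ["d1", "d2", "d3", "d4"], ["d3", "d1", "d5", "d2"],
    rng=random.Random(0))
print(mixed)  # all five documents; order depends on the coin tosses
```

Because every user sees a single blended list, a preference between the two rankers emerges from far fewer sessions than a classic A/B split needs.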
To learn more about the topic, Erik recommended this very interesting blog post by Netflix: Innovating Faster on Personalization Algorithms at Netflix Using Interleaving
In addition, for people curious to explore the topic further, I would recommend this GitHub project: https://github.com/mpkato/interleaving .
It offers Python implementations of various interleaving algorithms and presents a solid bibliography of publications on the matter.
Solving for Satisfaction: Introduction to Click Models
Then it was Elizabeth Haubert’s turn with “Solving for Satisfaction: Introduction to Click Models”, a very interesting talk, cursed by some technical issues that didn’t prevent Elizabeth from performing brilliantly and detailing to the audience various approaches to modelling the attractiveness and utility of search results from user interactions.
If you are curious to learn more about click models, I recommend the interesting survey Click Models for Web Search, which explores in detail some of the models introduced by Elizabeth.
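As a taste of the family of models discussed, here is a minimal sketch of a position-based model (PBM), where the probability of a click factorises into a rank-dependent examination probability and a query-document attractiveness; all the parameter values below are invented for illustration (in practice they are estimated from click logs):

```python
# Position-based click model: P(click at rank r on doc d)
#   = P(examine rank r) * P(d is attractive for the query).

examination = [0.95, 0.65, 0.40, 0.25]   # P(user examines rank r)
attractiveness = {"d1": 0.8, "d2": 0.3, "d3": 0.6, "d4": 0.1}

def click_probability(ranking):
    """Predicted click probability for each result in the ranking."""
    return [examination[r] * attractiveness[doc]
            for r, doc in enumerate(ranking)]

print(click_probability(["d1", "d2", "d3", "d4"]))
```

Models like this let you separate “the user never scrolled that far” from “the result was genuinely unattractive”, which is exactly what raw click-through rates conflate.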
Last in the morning was “Custom Solr Query Parser Design Option, and Pros & Cons” by Bertrand Rigaldies: a live manual on customising Apache Solr’s query parsing capabilities to your needs, including a bit of coding to show the key components involved in writing a custom query parser. The example illustrated was a slight customisation of proximity search behaviour (parsing the user query and building Lucene span queries to satisfy a specific requirement in distance tolerance) and capitalisation support.
The code and slides used in the presentation are available here: https://github.com/o19s/solr-query-parser-demo
After lunch, John Berryman (co-author of Relevant Search), with “Search Logs + Machine Learning = Auto-Tagging Inventory”, approached content tagging from a different perspective:
can we use query and clicks logs to guess tags for documents?
The idea makes sense: when, given a query, you interact with a document, you are effectively generating a correlation between the two entities, and this can definitely be used to help generate tags!
In the talk John went through a few iterative approaches (one based on a training set of just query–clicked-document pairs, and one based on queries grouped by session); you can find the Jupyter notebooks here for your reference, try them out!
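The simplest form of the idea can be sketched in a few lines: treat the queries that led to clicks on a document as votes for candidate tags on that document (the click log below is invented for illustration; John’s notebooks go well beyond this baseline):

```python
from collections import Counter, defaultdict

# Each entry is (query, clicked document): the raw material for tagging.
click_log = [
    ("leather sofa", "doc1"),
    ("sofa", "doc1"),
    ("leather sofa", "doc1"),
    ("office chair", "doc2"),
]

# Count how often each query term co-occurs with a click on each doc.
candidate_tags = defaultdict(Counter)
for query, doc in click_log:
    for term in query.split():
        candidate_tags[doc][term] += 1

print(candidate_tags["doc1"].most_common(2))
# [('sofa', 3), ('leather', 2)]
```

From here the iterations in the talk refine the signal, for example by grouping queries by session rather than treating each click independently.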
Learning To Rank Panel
Following the unfortunate absence of one of the speakers, a panel on industry applications of Learning to Rank took place, with interesting discussions about one of the hottest technologies right now, one that still presents a lot of challenges.
Various people were involved in the session and it was definitely pleasant to participate in the discussion.
The main takeaway from the panel was that even though LTR is an extremely promising technology, few adopters are really ready right now to proceed with the integration:
garbage in, garbage out still applies, and extra care is needed when starting an LTR project.
Search with Vectors
Before the conference wrap-up, the last session I attended was Simon Hughes’ “Search with Vectors”, a beautiful survey of vectorised similarity calculation strategies and how they can be used in search today, in combination with word2vec and similar approaches.
The focus of the talk was to describe how vector-based search can help with synonymy, polysemy, hyper-/hyponymy and related concepts.
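The core intuition is easy to demonstrate: with embeddings, synonyms like “sofa” and “couch” end up close in vector space even though they share no tokens, so a similarity measure such as cosine can match them where keyword search cannot. A toy sketch (the 3-d vectors are invented; real ones come from word2vec or similar models):

```python
import math

# Invented toy embeddings: "sofa" and "couch" point the same way,
# "banana" points elsewhere.
vectors = {
    "sofa":   [0.90, 0.10, 0.00],
    "couch":  [0.85, 0.15, 0.05],
    "banana": [0.00, 0.20, 0.95],
}

def cosine(u, v):
    """Cosine similarity: 1 for same direction, 0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(vectors["sofa"], vectors["couch"]))   # close to 1
print(cosine(vectors["sofa"], vectors["banana"]))  # close to 0
```

The hard part, which the talk surveyed, is doing this kind of similarity search efficiently inside an inverted-index-based engine.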
The related code and slides from previous talks are available in the Dice repo: https://github.com/DiceTechJobs/VectorsInSearch