Improve e-commerce search relevance with Learning to Rank

CASE STUDY

Improve e-commerce search relevance with Learning to Rank

Index

The Client
The Problem
The Solution
- Exploration & Discovery
- Learning to Rank integration
The Workflow
Conclusions

The Client

Adore Beauty is an Australian online cosmetics retailer. The company was founded in 1999 by Kate Morris and James Height and is based in the suburb of Northcote in Melbourne, Victoria.

The Problem

The client didn’t have enough expertise with Apache Solr and wanted support for the technology and how to integrate machine learning technologies to improve the ranking of the search results of their e-commerce production system.

The Solution

Sease has been involved in two main activities to help AdoreBeauty with their mission:

Exploration & Discovery
Implementing a Learning To Rank integration from scratch

Let’s see the details of how we approached these activities.

Exploration & Discovery

Over the years we have refined how to face these scenarios and the formula we have designed is a set of online and offline activities with the scope of understanding the client search system and giving immediate value through a systematic audit.

The main focus of the Exploration & Discovery phase is to give immediate value to the client, highlighting what they are doing well and what can be done better in terms of processes, architecture, configuration and usage.

Once that team dynamics, communication protocols, and software engineering practices are clarified it’s time to get technical, so we reviewed the system architecture and Apache Solr schema and solrconfig configurations.
Analysing these aspects is strictly tied to the client’s requirements and how they expect their users to interact with the search engine.
For the specific case of Adore Beauty, we did a deep analysis of their SolrCloud architecture to design a reliable infrastructure to be used both in production and in twin environment to be used for dev and staging.

We evaluated the amount of data available and the growth rate per day to estimate the memory/disk requirements, with extra care to the JVM and off-heap RAM allocation.

Finally we assessed the number of Apache Solr instances required and the sharding/replication strategy, ideal for the client situation, as we didn’t want to over-shard (that brings consequences in extra workload necessary for the coordination).

This ended up as a detailed report, used by the client as a guideline for the initial project (and many more to follow!)

Each chapter has a short summary with the main takeaways and a list of paragraphs, each paragraph has an additional summary with the key takeaways and the long explanation.

Learning To Rank

Learning To Rank is the application of Machine Learning to learn a ranking function for your search results that optimises the users’/business utility.
To implement a Learning To Rank project, you need a training set, where queries and documents are described by a numerical vector (features) and <query, document> pairs are rated based on how relevant a document is for a certain query.

When approaching a Learning To Rank project you need to deeply understand a core set of factors that impact relevance in your client’s domain. And then proceed iteratively

The first step in any Learning To Rank project is to discuss and work with the business to identify what factors impact the relevance, in their domain.
Then each factor must be encoded in a numerical form and become a feature for the model to be trained on.
After that, it was important to define how we could estimate the relevance of a document, given a query. There are normally two ways to do that:

explicit (where a human or team of humans assign the rating explicitly)
implicit (where the rating is estimated based on users’ interactions)

We performed a deep investigation with the client and decided to go with a mix of the two, capturing users’ interactions such as clicks and add-to carts and integrating them with experts’ opinions.
Once the design is settled the next step is the implementation of a user interactions collector and a training set builder.

Collecting user interactions and converting them to a training set is fundamental: that’s your first step! Especially because you can use the interactions to monitor your search engine online

The User Interaction Collector has the responsibility of receiving interactions and storing them in a usable format (such as JSON).
The Training Set Builder has the responsibility of reading the JSON interactions and build a numerical training/test set.

These two applications were designed and implemented from scratch and brought to production.
Sease curated the pipeline end to end, from the collection of interactions through dedicated APIS to the training of the model.

>> Integration with apache solr

After that it was time to integrate with Apache Solr: we designed and develop a solution to deploy the model and configured the instance to extract features following the ones used for training.

The road to production has been extremely perilous, a Learning To Rank project can fail at many points, to mitigate that:

we’ve built an automated system to monitor the collected interactions and assess their quality

we’ve covered the whole process with unit, integration and end-to-end tests.

>> EVALUATION

Evaluation is fundamental for a Learning To Rank project, starting from offline evaluation at training time to online evaluation to monitor the system in production.
We curated both the processes and experimented with both classic A/B test online and interleaving.
In regards to online monitoring we applied techniques we designed and implemented both for clients and internal projects such as A/B testing dashboards using Kibana from the Elastic stack.

The Workflow

During the entire project, Sease followed an organized approach, utilizing web calls and emails for clear communication and continuous improvement. Frequent updates and discussions facilitated smooth cooperation with Adore Beauty, ensuring the project stayed on track with its goals and deadlines

Conclusions

Sease successfully enhanced Adore Beauty’s e-commerce search relevance by integrating Learning to Rank with Apache Solr. Sease developed tools like a user interaction collector and a training set builder, ensuring seamless integration and robust performance through automated monitoring and extensive testing. Evaluation techniques, including A/B testing, ensured ongoing optimization, helping Adore Beauty deliver a superior online shopping experience.

Improve the search relevance of your e-commerce

Contact us today to learn how Sease’s consulting services can help your organization improve e-commerce search relevance. Whether you’re looking to evaluate the quality of your search results, or implement machine learning integration from scratch into your system, our expert team is here to guide you every step of the way.

About the company

about our work

Rated Ranking Evaluator
(RRE)

Rated Ranking Evaluator Enterprise (RREE)

Apache Solr LLM Highlighter plugin

News

Main Blog

TIPS AND TRICKS

LATEST BLOG POST

contact us

Don't miss all the news - subscribe to our newsletter!

CASE STUDY