Model Training For Apache Solr Learning To Rank
If you want to train a model for Apache Solr Learning To Rank, you are in the right place.
This blog post is about the model training phase for the Apache Solr Learning To Rank integration.
Ranking SVM
The Ranking SVM algorithm is a pair-wise ranking method that adaptively sorts results based on how ‘relevant’ they are for a specific query (we saw an example of relevancy rating for the document-query pair in Part 1).
The Ranking SVM function maps each search query to the features of each sample.
This mapping function projects each data pair onto a feature space.
Each sample (query-document) of the training data will be used by the Ranking SVM algorithm to refine the mapping.
Generally, Ranking SVM includes three steps at training time:
- It maps the similarities between queries and the clicked results onto a certain feature space.
- It calculates the distances between any two of the vectors obtained in step 1.
- It forms an optimization problem which is similar to a standard SVM classification and solves this problem with a regular SVM solver (a minimal sketch of the pair-wise idea follows below).
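To make the pair-wise idea more concrete, here is a minimal sketch, assuming scikit-learn is available and using a toy training set invented for the example. It is not the code path used by Solr or by the training libraries referenced below, just an illustration of the three steps:

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

# Toy training set: for each queryId, a list of (relevancy label, feature vector).
# Feature order: [userTextTitleMatch, originalScore, isBook]
training_set = {
    "q1": [(3, [1.0, 100.0, 1.0]), (1, [0.0, 80.0, 1.0]), (0, [0.0, 10.0, 0.0])],
    "q2": [(2, [1.0, 50.0, 0.0]), (0, [0.0, 60.0, 1.0])],
}

# Steps 1 and 2: project each document of a query onto the feature space and
# build the difference vector for every pair of documents with different relevancy.
X, y = [], []
for query, samples in training_set.items():
    for (rel_a, feat_a), (rel_b, feat_b) in combinations(samples, 2):
        if rel_a == rel_b:
            continue
        diff = np.array(feat_a) - np.array(feat_b)
        label = 1 if rel_a > rel_b else -1
        # add the pair in both directions so the classifier sees both classes
        X.extend([diff, -diff])
        y.extend([label, -label])

# Step 3: solve a standard SVM classification problem over the pairs.
svm = LinearSVC(C=1.0)
svm.fit(np.array(X), np.array(y))

# The learned per-feature weights form the linear ranking function.
print(dict(zip(["userTextTitleMatch", "originalScore", "isBook"], svm.coef_[0])))
```

The learned coefficients play the same role as the per-feature weights in the Solr model definition shown next.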
A trained Ranking SVM (linear) model is represented in Solr with a JSON definition like this:

{
    "class":"org.apache.solr.ltr.ranking.RankSVMModel",
    "name":"myModelName",
    "features":[
        { "name": "userTextTitleMatch"},
        { "name": "originalScore"},
        { "name": "isBook"}
    ],
    "params":{
        "weights": {
            "userTextTitleMatch": 1.0,
            "originalScore": 0.5,
            "isBook": 0.1
        }
    }
}
Given 3 example features:
- userTextTitleMatch – (query dependent feature) – (binary)
- originalScore – (query dependent feature) – (ordinal)
- isBook – (document dependent feature) – (binary)
The Ranking SVM model in the example builds a linear function of the features, assigning a different weight to each of them.
Given the documents (expressed with their feature vector):
D1 [1.0, 100, 1]
D2 [0.0, 80, 1]
Applying the model we get the scores :
Score(D1) = 1.0 * userTextTitleMatch(1.0) + 0.5 * originalScore(100) + 0.1 * isBook(1.0) = 51.1
Score(D2) = 1.0 * userTextTitleMatch(0.0) + 0.5 * originalScore(80) + 0.1 * isBook(1.0) = 40.1
The D1 document is more relevant according to the model.
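As a quick sanity check, the same computation can be reproduced with a few lines of Python (a hypothetical snippet that simply mirrors the arithmetic above):

```python
# Weights from the model definition above
weights = {"userTextTitleMatch": 1.0, "originalScore": 0.5, "isBook": 0.1}

# Documents expressed with their feature values
docs = {
    "D1": {"userTextTitleMatch": 1.0, "originalScore": 100.0, "isBook": 1.0},
    "D2": {"userTextTitleMatch": 0.0, "originalScore": 80.0, "isBook": 1.0},
}

for name, features in docs.items():
    # Linear model: weighted sum of the feature values
    score = sum(weights[f] * value for f, value in features.items())
    print(name, score)  # D1 -> 51.1, D2 -> 40.1
```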
Key points :
- easy to debug and understand
- linear function
Here you can find some good Open Source libraries to train SVM models [2], [2.1].
LambdaMART
LambdaMART is a tree ensemble based model.
Each tree of the ensemble is a weighted regression tree and the final predicted score is the weighted sum of the prediction of each regression tree.
A regression tree is a decision tree that takes a feature vector as input and returns a scalar numerical value as output.
At a high level, LambdaMART is an algorithm that uses gradient boosting to directly optimize Learning to Rank specific cost functions such as NDCG.
To understand LambdaMART let’s explore the main two aspects: Lambda and MART.
MART
LambdaMART is a specific instance of Gradient Boosted Regression Trees, also referred to as Multiple Additive Regression Trees (MART).
Gradient Boosting is a technique for forming a model that is a weighted combination of an ensemble of “weak learners”. In our case, each “weak learner” is a decision tree.
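To give an intuition of how such an ensemble of weak learners is built, here is a minimal gradient boosting sketch with a plain squared loss; scikit-learn's DecisionTreeRegressor, the toy data and the learning rate are assumptions for the example. LambdaMART replaces these squared-loss residuals with its own ranking gradients, the lambdas described next.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: feature vectors and target scores
X = np.array([[1.0, 100.0], [0.0, 80.0], [1.0, 10.0], [0.0, 5.0]])
y = np.array([3.0, 1.0, 2.0, 0.0])

learning_rate = 0.1
trees = []
prediction = np.zeros_like(y)

for _ in range(50):
    # With a squared loss the negative gradient is simply the residual of the
    # current ensemble; each weak learner is a small regression tree fitted to it.
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)
    trees.append(tree)
    prediction += learning_rate * tree.predict(X)

# The final model is the weighted sum of the predictions of all the trees.
print(prediction)
```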
Lambda
At each training point we need to provide a gradient that will allow us to minimize the cost function (whichever we selected, NDCG for example).
To solve the problem LambdaMART uses an idea coming from LambdaRank: at each training point we calculate a value (the lambda) that acts as the gradient we require; this component effectively modifies the ranking, pushing groups of documents up or down, and in turn shapes the split rules and leaf values of the tree leaves that cover the point used to calculate the lambda.
For additional details, these resources have been useful [3], [4].
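As a rough illustration of the lambda for a single pair of documents, here is a sketch following one common LambdaRank-style formulation; the sigma parameter, the toy scores and the delta-NDCG value are assumptions for the example:

```python
import math

def pair_lambda(score_i, score_j, delta_ndcg, sigma=1.0):
    """Lambda for a pair (i, j) where document i is MORE relevant than j.
    Its magnitude grows when the model currently scores the pair in the wrong
    order, and it is scaled by |delta NDCG|, i.e. how much swapping the two
    documents would change the metric."""
    return -sigma * abs(delta_ndcg) / (1.0 + math.exp(sigma * (score_i - score_j)))

# i is more relevant but currently scored lower than j, so the lambda is large
# in magnitude: the next regression tree is fitted to these values and will
# push i up and j down in the ranking.
print(pair_lambda(score_i=0.2, score_j=1.5, delta_ndcg=0.3))
```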
LambdaMART is currently considered one of the most effective models, and this has been proved by a number of contest wins.
Let’s see what a LambdaMART model looks like:
{
    "class":"org.apache.solr.ltr.ranking.LambdaMARTModel",
    "name":"lambdamartmodel",
    "features":[
        { "name": "userTextTitleMatch"},
        { "name": "originalScore"}
    ],
    "params":{
        "trees": [
            {
                "weight" : 1,
                "root": {
                    "feature": "userTextTitleMatch",
                    "threshold": 0.5,
                    "left" : {
                        "value" : -100
                    },
                    "right": {
                        "feature" : "originalScore",
                        "threshold": 10.0,
                        "left" : {
                            "value" : 50
                        },
                        "right" : {
                            "value" : 75
                        }
                    }
                }
            },
            {
                "weight" : 2,
                "root": {
                    "value" : -10
                }
            }
        ]
    }
}
This example model is composed of two trees. Each branch split evaluates a conditional expression on a feature value; if the feature value is:
- <= threshold, we visit the left branch
- > threshold, we visit the right branch
Reaching a leaf of a tree produces a score; this is weighted according to the tree weight and then summed with the scores produced by the other trees (the model is an ensemble of trees).
Given the documents :
D1 [1, 9]
D2 [0, 10]
Applying the model we get the scores :
Score(D1) = 1 * (userTextTitleMatch(1) > 0.5, go right; originalScore(9) <= 10.0, go left = 50) + 2 * (-10) = 30
Score(D2) = 1 * (userTextTitleMatch(0) <= 0.5, go left = -100) + 2 * (-10) = -120
The D1 document is more relevant according to the model.
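The same scores can be reproduced by walking the JSON trees with a small hypothetical helper (not part of the Solr plugin), which can also be handy when debugging a tree ensemble by hand:

```python
def score_node(node, features):
    # A leaf directly carries a value
    if "value" in node:
        return node["value"]
    # A split: <= threshold goes left, > threshold goes right
    if features[node["feature"]] <= node["threshold"]:
        return score_node(node["left"], features)
    return score_node(node["right"], features)

def score(trees, features):
    # Weighted sum of the scores produced by each tree of the ensemble
    return sum(tree["weight"] * score_node(tree["root"], features) for tree in trees)

trees = [
    {"weight": 1, "root": {
        "feature": "userTextTitleMatch", "threshold": 0.5,
        "left": {"value": -100},
        "right": {"feature": "originalScore", "threshold": 10.0,
                  "left": {"value": 50}, "right": {"value": 75}}}},
    {"weight": 2, "root": {"value": -10}},
]

print(score(trees, {"userTextTitleMatch": 1, "originalScore": 9}))   # D1 -> 30
print(score(trees, {"userTextTitleMatch": 0, "originalScore": 10}))  # D2 -> -120
```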
Key points :
- ensemble of trees are effective
- difficult to debug
- non linear function
There are good open source libraries to train LambdaMART models; the one that we are going to use in this case study is RankLib [5].
Evaluation Metric
NDCG@K
Normalised Discounted Cumulative Gain @ K is a natural fit when choosing LambdaMART.
This metric evaluates the performance of a ranking model; it varies from 0.0 to 1.0, with 1.0 representing the ideal ranking model.
It takes K as a parameter:
K: maximum number of documents that will be returned.
K specifies, for each query, after the re-ranking, the top K results to evaluate.
N.B. set K to be the number of top documents you want to optimise; generally it is the number of documents you show on the first page of results.
Given the test set as input, it is grouped by queryId, and for each query the ideal ranking (obtained by sorting the documents by descending relevancy) is compared with the ranking generated by the ranking model.
An average for all the queryIds is calculated.
A detailed explanation can be found on Wikipedia [6].
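For reference, here is a minimal NDCG@K sketch; it uses the 2^relevancy - 1 gain formulation, which is one common choice, and the toy relevancy lists are invented for the example:

```python
import math

def dcg_at_k(relevancies, k):
    # Gain 2^rel - 1, discounted by the log of the (1-based) position
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevancies[:k]))

def ndcg_at_k(relevancies, k):
    ideal = dcg_at_k(sorted(relevancies, reverse=True), k)
    return dcg_at_k(relevancies, k) / ideal if ideal > 0 else 0.0

# Relevancy labels of the documents, in the order returned by the model,
# grouped per queryId; the final metric is the average over the queries.
rankings_per_query = {
    "q1": [3, 2, 3, 0, 1],
    "q2": [1, 3, 2, 2, 0],
}
k = 5
print(sum(ndcg_at_k(r, k) for r in rankings_per_query.values()) / len(rankings_per_query))
```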
Knowing how it works, two factors immediately stand out as important when reasoning about NDCG:
1) distribution of relevancy scores in the test set, per queryId
2) number of samples per queryId
War Story 1 : too many highly relevant samples -> high NDCG
Given the test set A :
5 qid:1 1:1 …
5 qid:1 1:1 …
4 qid:1 1:1 …
5 qid:2 1:1 …
4 qid:2 1:1 …
4 qid:2 1:1 …
Even if the ranking model does not do a good job, the high proportion of highly relevant samples increases the probability of reaching a high score in the metric.
Having few samples (<K) per queryId means that we will not evaluate the performance of our ranking model accurately.
Model Training
These are the main RankLib parameters involved in the training process:

Parameter | Description |
---|---|
train | Path to the training set file |
feature | Feature description file, list feature Ids to be considered by the learner, each on a separate line. If not specified, all features will be used. |
ranker | Specify which ranking algorithm to use e.g. 6: LambdaMART |
metric2t | Metric to optimize on the training data. Supported: MAP, NDCG@k, DCG@k, P@k, RR@k, ERR@k (default=ERR@10) |
tvs | Set the train-validation split to be (x)(1.0-x): x * (size of the training set) will be used for training, (1.0-x) * (size of the training set) will be used for validation |
kcv | Stands for k-fold Cross Validation. Specify if you want to perform k-fold cross validation using ONLY the specified training data (default=NoCV). E.g. with 100 samples and -kcv 5 -tvs 0.8, each fold has: Test Set: 20 rows, Training Set: 64 rows (0.8 * 80), Validation Set: 16 rows (0.2 * 80) |
kcvmd | Directory where to save models trained via cross-validation (default=not-save). |
kcvmn | Name for model learned in each fold. It will be prefix-ed with the fold-number (default=empty). |
sparse | Allow sparse data processing for the training set (which means that you don’t need to specify all the features for each sample; features with no value can simply be omitted) |
tree | Number of trees in the ensemble (default=1000). The more complex the learning problem is, the more trees we need, but it is important to be careful and not overfit the training data |
leaf | Number of leaves for each tree (default=100). As with the number of trees, it is important to tune this value carefully to avoid overfitting |
shrinkage | This is the learning rate of the algorithm (default=0.1). If this is too aggressive the ranking model will quickly fit the training set, but will not react properly to the validation set evaluation, which means an overall poorer model |
tc | Number of threshold candidates for tree splitting. -1 to use all feature values (default=256). Increasing this value increases the complexity and possible overfitting of the model. It is suggested to start with a low value and then increase it according to the general cardinality of your features |
mls | Min leaf support – minimum #samples each leaf has to contain (default=1). This is quite an important parameter if we want to take care of outliers: we can tune it to accept only leaves with a high number of samples, discarding patterns validated by weak support |
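To show how these parameters fit together, here is a hypothetical invocation sketch; RankLib is a Java command line tool, simply wrapped in Python here, and the jar name, file names and parameter values are placeholders to adapt to your own setup:

```python
import subprocess

# Placeholder file names: adapt them to your own training set and output path
cmd = [
    "java", "-jar", "RankLib.jar",
    "-train", "trainingSet.txt",   # training set in LETOR format
    "-ranker", "6",                # 6 = LambdaMART
    "-metric2t", "NDCG@10",        # metric to optimise at training time
    "-tvs", "0.8",                 # 80% training / 20% validation split
    "-tree", "1000",
    "-leaf", "100",
    "-shrinkage", "0.1",
    "-save", "lambdamart_model.txt",
]
subprocess.run(cmd, check=True)
```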
Tuning the model training parameters is not an easy task.
Apart from the general suggestions, trial & error with a careful comparison of the evaluation metric scores is the way to go.
Comparing different models on a common Test Set can help when we have produced a number of models with different tuning configurations.
Converting Model To Json
Shameless plug for our training and services!
Did I mention we do Apache Solr Beginner and Learning to Rank training?
We also provide consulting on these topics, get in touch if you want to bring your search engine to the next level!
Author
Alessandro Benedetti
Alessandro Benedetti is the founder of Sease Ltd. Senior Search Software Engineer, his focus is on R&D in information retrieval, information extraction, natural language processing, and machine learning.
Comments (3)
Solr Is Learning To Rank Better – Part 4 – Solr Integration
February 3, 2018: […] modelled our dataset, we collected the data and refined it in Part 1. Trained the model in Part 2. Analysed and evaluated the model and training set in Part 3. We are ready to rock and deploy the […]
Michał Wieczorek
March 18, 2021: Could you please provide some examples of trainingSet_2016-07-20.txt and trainingSet_2016-07-20_features.txt files?
I tried to run RankLib library with the same params as above (without features option), but it ends up with the java.lang.ArrayIndexOutOfBoundsException.
Btw, I checked the source code of the library and I’m really surprised to see only a few unit tests there, most of which are ignored because they don’t pass.
Anna Ruggero
March 31, 2021: Hi Michał,
I don’t know if this could help you, but in the RankLib documentation (https://sourceforge.net/p/lemur/wiki/RankLib%20How%20to%20use/) you can find a link to some example files.
Specifically, RankLib uses the LETOR format, and on their website (https://www.microsoft.com/en-us/research/project/letor-learning-rank-information-retrieval/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fum%2Fbeijing%2Fprojects%2Fletor%2Fletor4dataset.aspx) you can download their datasets and have a look at the right format to use.
Cheers.