Let’s focus on the first three ranked suggestions we just saw:
Query : Mini Bar Fri
100 |Mini Bar Fridge something
100 |Mini Bar Fridge something else
100 |Mini Bar Fridge a a a a a a a a a a a a a a a a a a a a a a
Intuitively, this is the order we want, so we need a way to break the ties.
The closer the number of matched terms is to the total number of terms in the suggestion, the better.
Ideally, the top-scoring suggestion should contain only the matched terms, if possible.
We also don’t want to introduce strong inconsistencies for the other suggestions; ideally, we should affect only the ties.
This is achievable by calculating an additional coefficient, dependent on the term counts:
Token Count Coefficient = matched terms count / total terms count
Then we can scale this value accordingly :
90% of the final score will derive from the positional coefficient
10% of the final score will derive from the token count coefficient
Query : Mini Bar Fri
90 * 1.0 + 10*3/4 = 97|Mini Bar Fridge something
90 * 1.0 + 10*3/5 = 96|Mini Bar Fridge something else
90 * 1.0 + 10*3/25 = 91|Mini Bar Fridge a a a a a a a a a a a a a a a a a a a a a a
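The arithmetic above can be reproduced with a minimal sketch. This is not the code from the pull request: the class name `TokenCountScoring`, the method `blendedScore`, and the weight constants are hypothetical, the positional coefficient is assumed to be passed in as a value between 0 and 1, and the result is assumed to be truncated to a whole number (which is how 97.5 becomes 97 in the example).

```java
public final class TokenCountScoring {

    // Hypothetical weights, taken from the 90% / 10% split described above.
    static final int POSITIONAL_WEIGHT = 90;
    static final int TOKEN_COUNT_WEIGHT = 10;

    /**
     * Blends the positional coefficient with the token count coefficient
     * (matched terms count / total terms count).
     * Assumption: the final score is truncated to a whole number.
     */
    static long blendedScore(double positionalCoefficient, int matchedTerms, int totalTerms) {
        if (totalTerms <= 0 || matchedTerms < 0 || matchedTerms > totalTerms) {
            throw new IllegalArgumentException("expected 0 <= matchedTerms <= totalTerms, totalTerms > 0");
        }
        double positionalPart = POSITIONAL_WEIGHT * positionalCoefficient;
        // Multiply before dividing so the ratio is not rounded prematurely.
        double tokenCountPart = (TOKEN_COUNT_WEIGHT * matchedTerms) / (double) totalTerms;
        return (long) (positionalPart + tokenCountPart);
    }

    public static void main(String[] args) {
        // Query: Mini Bar Fri (3 matched terms, positional coefficient 1.0)
        System.out.println(blendedScore(1.0, 3, 4));  // 97: Mini Bar Fridge something
        System.out.println(blendedScore(1.0, 3, 5));  // 96: Mini Bar Fridge something else
        System.out.println(blendedScore(1.0, 3, 25)); // 91: Mini Bar Fridge a a a ...
    }
}
```

Because the token count part never exceeds 10 points, it can reorder suggestions whose positional scores are tied or very close, but it cannot override a clearly better positional match.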
This will require some additional tuning, but the overall idea should give the BlendedInfix a better ranking function when multiple term matches are involved!
If you have any suggestions, feel free to leave a comment below!
The code is available in the GitHub Pull Request attached to the Lucene Jira issue [4].
Bobby
December 10, 2019
Query : Mini
10|Minimal Bar Fridge
10|Mini
10|Mini Fridge
10|Mini Bar Fridge
Since the match is on the first word in every case, the scores are identical, and because of this the order is not accurate either. Can you suggest something for this?