Apache Solr Main Blog
Quantities detection in Apache Solr

Give the height the right weight: quantities detection in Apache Solr

Quantity detection? What is a quantity?
And why do we need to detect it?

A quantity, as described by Martin Fowler in his “Analysis Patterns” [1] is defined as a pair which combines an amount and unit (such as 30 litres, 0.25 cl, or 140 cm). In search-based applications, there are many cases where you may want to classify your searchable dataset using dimensioned attributes because such quantities have a special meaning within the business context you are working on. The first example that comes to my mind?

Apache Solr Quantity Detection Plugin

Beer is offered in several containers (e.g. cans, bottles); each of them is available in multiple sizes (e.g. 25 cl, 50 cl, 75 cl or 0.25 lt, 0.50 lt, 0.75 lt). A good catalogue would capture this information in dedicated fields, like “container” (bottle, can) and “capacity” (25cl, 50cl, 75cl in the example above): in this way the search logic can properly make use of them. Faceting (and subsequent filtering) is a good example of what the user can do after a first search has been executed: he can filter and refine results, hopefully finding what he was looking for.

But if we start from the beginning of user interaction, there’s no result at all: only the blank textfield where the user is going to type something. “Something” could be whatever, anything (in his mind) related to the product he wants to find: a brand, a container type, a model name, a quantity. In a few words: anything which represents one or more relevant features of the product he’s looking for.

So one of the main challenges, when implementing a search logic, is to get the point about the meaning of the entered terms. This is in general a very hard topic, often involving complicated stuff (e.g. machine learning), but sometimes things move on an easier side, especially when concepts, we want to detect, follow a common and regular pattern: like a quantity.

The main idea behind the quantity detection plugin [2] we developed at Sease is the following: starting from the user entered query, first, it detects the quantities (i.e. the amounts and the corresponding units); then, this information will be isolated from the main query and they will be used for boosting up all products relevant to those quantities. Relevancy here can be meant in different ways:

    • exact match: all bottles with a capacity of 25cl
    • range match: all bottles with a capacity between 50cl and 75cl.
    • equivalence exact match: all bottles with a capacity of 0.5 litre (1lt = 100cl)
    • equivalence range match: all bottles with a capacity between 0.5 and 1 litre (1lt = 100cl)

The following is a short list with a brief description of all supported features:

    • variants: a unit can have a preferred form and (optionally) several variants. This can include different forms of the same unit (e.g. mt, meter) or an equivalent unit in a different metric system (e.g. cl, once)
    • equivalences: it’s possible to define an equivalence table so units can be converted at runtime (“beer 0.25 lt” will have the same meaning of “beer 25cl”). An equivalence table maps a unit with a conversion factor.
    • boost: each unit can have a dedicated boost, especially useful for weighting multiple matching units.
    • ranges: each unit can have a configured gap, which triggers a range query where the detected amount can be in the middle (PIVOT), at the beginning (MIN) or at the end (MAX) of the generated range
    • multi-fields: in case we have more than one attribute using the same unit (e.g. height, width, depth)
    • assumptions: in case an “orphan” amount (i.e an amount without a unit) is detected, it’s possible to define an assumption table and let Solr guess the unit.

Feel free to have a try, and if you think it could be useful, please share with us your idea and/or your feedback.

// our service

Shameless plug for our training and services!

Did I mention we do Apache Solr Beginner and Elasticsearch Beginner training?
We also provide consulting on these topics, get in touch if you want to bring your search engine to the next level!

// STAY ALWAYS UP TO DATE

Subscribe to our newsletter

Did you like this post about the quantities detection in Apache Solr? Don’t forget to subscribe to our Newsletter to stay always updated from the Information Retrieval world!

Author

Andrea Gazzarini

Andrea Gazzarini is a curious software engineer, mainly focused on the Java language and Search technologies. With more than 15 years of experience in various software engineering areas, his adventure in the search world began in 2010, when he met Apache Solr and later Elasticsearch.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.