There is a logical gap between how a document is represented from the user's perspective and from the server's perspective.
For a user, a document is an item in the user interface: it is one of the results of a given search, and it is assigned a given rating. Behind the scenes, its technical representation consists of HTML code.
For a search engine, a document is an object, an instance of a class used to denote a search result. It usually consists of a Map-like (i.e., key-value pairs) structure where keys are attribute names and values are attribute values.
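As a minimal sketch of such a Map-like structure, the server-side document can be pictured as a plain mapping. The attribute names (`id`, `title`, `price`) and the `attribute` helper are illustrative assumptions, not part of RREE or of any specific search engine.

```python
# Hypothetical server-side document: keys are attribute names,
# values are attribute values. All names here are illustrative.
server_document = {
    "id": "SKU-0042",           # unique, system-scoped identifier
    "title": "Wireless Mouse",
    "price": 19.99,
}


def attribute(document, name, default=None):
    """Return the value stored under an attribute name, or a default."""
    return document.get(name, default)
```

Reading an attribute that the document does not carry simply yields the default, which mirrors the fact that not every document has every attribute.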
The “identity” of those two entities is different. The server side requires a unique, system-scoped identifier associated with every document.
On the client side, instead, the identifier:
- is optional
- when present, is usually page-scoped (i.e., unique only among the identifiers on the current page)
- most probably differs from the server identifier.
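The three properties above can be illustrated with a small sketch. The `ClientItem` class, its fields, and the sample pages are hypothetical, chosen only to show an optional identifier that is unique within a page but not across pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientItem:
    """A search result as it appears in the page (hypothetical shape)."""
    text: str
    element_id: Optional[str] = None  # optional and page-scoped


# Two illustrative result pages: the same identifier appears on both,
# and one item has no identifier at all.
page_1 = [ClientItem("Wireless Mouse", "result-1"), ClientItem("USB Keyboard")]
page_2 = [ClientItem("Laptop Stand", "result-1")]


def ids_unique_within(items):
    """True when no identifier is repeated among the given items."""
    ids = [item.element_id for item in items if item.element_id is not None]
    return len(ids) == len(set(ids))
```

Each page passes the uniqueness check on its own, while the two pages taken together do not, which is why a client-side identifier alone cannot serve as a system-scoped one.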
This is where the RREE ID Discovery component helps.
Identifier discovery is the first thing that happens when a set of explicit ratings is received.
The component can find a correlation between the document as it is received in the incoming payload and the corresponding server-side representation. The correlation is then persisted as part of the rating definition and used in the subsequent evaluation process.
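One way to picture the correlation step is an exact match on a configured set of fields. This is only a sketch under stated assumptions: the function `discover_id`, its parameters, and the matching strategy are hypothetical and do not describe how RREE actually implements discovery.

```python
def discover_id(payload_item, server_documents, match_fields, id_field="id"):
    """Return the server identifier of the single document that agrees with
    the payload on every usable match field, or None when the payload has
    nothing to match on or the match is missing or ambiguous."""
    # Only fields that actually arrived in the payload can be used.
    usable = [f for f in match_fields if payload_item.get(f) is not None]
    if not usable:
        return None
    candidates = [
        doc for doc in server_documents
        if all(doc.get(f) == payload_item[f] for f in usable)
    ]
    if len(candidates) != 1:
        return None  # no match, or more than one: correlation is not safe
    return candidates[0][id_field]


# Illustrative data: the payload carries a page-scoped id that the
# server does not know, so the correlation relies on the title instead.
index = [
    {"id": "SKU-0042", "title": "Wireless Mouse"},
    {"id": "SKU-0077", "title": "USB Keyboard"},
]
payload = {"element_id": "result-1", "title": "Wireless Mouse"}
correlated_id = discover_id(payload, index, match_fields=["title"])
```

Returning `None` on an ambiguous match is a deliberate choice in this sketch: persisting a wrong correlation in the rating definition would distort the later evaluation more than leaving the rating uncorrelated.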
The main challenge in the correlation phase is the potential lack of information in the payload. As already noted, the identifiers there are optional, can differ from the server-side ones, and usually follow a completely different logic.
The key factor for a good ID correlation in RREE cannot be determined in advance; the process relies on an accurate configuration of both the browser plugin and the discovery engine.