Ranked List (Recommender Systems)
Ranked List output data is typically used in recommender systems: machine learning models that generate suggestions for "relevant" ranked items based on some input data. An example of a recommender system using ranked list data is a model that recommends relevant movies to a viewer based on metadata derived from their watch history.
Formatted Data in Arthur
Ranked List output models require the following data formatting:
```json
[
  { // first recommended item
    "item_id": "item1",
    "score": 0.324,
    "label": "apple"
  },
  { // second recommended item
    "item_id": "item2",
    "score": 0.024,
    "label": "banana"
  }
]
```
In this formatting, score must be a float, while item_id and label must be strings. label is an optional, human-readable version of item_id, and score is an optional score/probability for the item. If an optional field is specified for one item in an inference, it must be specified for every item in that inference.
Arthur expects the list of ranked items to be sorted in rank order, with the highest-ranked item first. Each ranked list output model in Arthur can have at most 1,000 unique recommended items in its reference dataset, and at most 100 recommendations per inference or ground truth record.
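To make these rules concrete, here is a minimal sketch of a pre-submission check written for this guide (a hypothetical helper, not part of the Arthur SDK) that validates a ranked-list payload against the constraints above:

```python
from typing import Any

MAX_RECOMMENDATIONS = 100  # per-inference limit noted above


def validate_ranked_list(items: list[dict[str, Any]]) -> None:
    """Check a ranked-list payload against the formatting rules above."""
    if len(items) > MAX_RECOMMENDATIONS:
        raise ValueError(f"at most {MAX_RECOMMENDATIONS} recommendations per inference")

    has_score = ["score" in item for item in items]
    has_label = ["label" in item for item in items]
    # Optional fields are all-or-none across the items in one inference.
    if any(has_score) and not all(has_score):
        raise ValueError("'score' must be set for all items if set for any")
    if any(has_label) and not all(has_label):
        raise ValueError("'label' must be set for all items if set for any")

    for item in items:
        if not isinstance(item.get("item_id"), str):
            raise TypeError("'item_id' is required and must be a string")
        if "score" in item and not isinstance(item["score"], float):
            raise TypeError("'score' must be a float")
        if "label" in item and not isinstance(item["label"], str):
            raise TypeError("'label' must be a string")

    # Items must arrive sorted highest-ranked first; when scores are
    # present, they should therefore be non-increasing.
    if all(has_score):
        scores = [item["score"] for item in items]
        if scores != sorted(scores, reverse=True):
            raise ValueError("items must be sorted in rank order (highest first)")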
Recommender Systems Ground Truth
The ground truth for ranked list output models is an array of strings representing the items that have been determined “relevant” for a given inference.
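Continuing the payload example above, the ground truth for one inference might look like the following ("item7" is a made-up identifier for illustration; ground truth may include relevant items the model did not recommend):

```json
["item1", "item7"]
```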
Available Metrics
When onboarding recommender system models, you have a number of default metrics available to you within the UI. You can learn more about each specific metric in the metrics section of the documentation.
Out-of-Box Metrics
The following metrics are automatically available in the UI (out-of-the-box) when teams onboard a ranked list model. Find out more about these metrics in the Performance Metrics section.
Metric | Metric Type |
---|---|
Precision at k | Performance |
Recall at k | Performance |
nDCG at k | Performance |
Mean Reciprocal Rank | Performance |
Ranked List AUC | Performance |
Inference Count | Ingestion |
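These rank-aware performance metrics follow their standard definitions. The sketch below illustrates the formulas with binary relevance for a single inference (an illustration written for this guide, not Arthur's internal implementation):

```python
import math


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    return sum(item in relevant for item in ranked[:k]) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k."""
    return sum(item in relevant for item in ranked[:k]) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant recommendation (0 if none)."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1)
              if item in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


ranked = ["item1", "item2", "item3"]   # sorted highest-ranked first
relevant = {"item1", "item7"}          # ground truth for this inference
print(precision_at_k(ranked, relevant, k=3))  # 0.333...
print(recall_at_k(ranked, relevant, k=3))     # 0.5
print(reciprocal_rank(ranked, relevant))      # 1.0
print(ndcg_at_k(ranked, relevant, k=3))       # ~0.613
```

Mean Reciprocal Rank is then the average of the per-inference reciprocal ranks across a set of inferences.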
Drift Metrics
In the Arthur platform, drift metrics are calculated against a reference dataset, so once a reference dataset is onboarded for your model these metrics are available out of the box. Find out more about these metrics in the Drift and Anomaly section.
Note: Teams can also evaluate drift between inference data from different intervals with our Python SDK and query service (for example, data coming into the model now compared to a month ago).
Metric | Metric Type |
---|---|
PSI | Feature Drift |
Time Series Drift | Feature Drift |
Prediction Drift | Prediction Drift |
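As an illustration of what a drift score like PSI captures, the sketch below implements the standard PSI formula over histogram bins derived from the reference data; Arthur's exact binning strategy may differ:

```python
import numpy as np


def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference and a current sample.

    Bins come from the reference distribution; a small epsilon avoids
    division by zero for empty bins. Current values falling outside the
    reference range are not counted in any bin here.
    """
    eps = 1e-6
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference) + eps
    cur_pct = np.histogram(current, bins=edges)[0] / len(current) + eps
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


# Example: compare a reference feature distribution to a shifted one.
rng = np.random.default_rng(0)
print(psi(rng.normal(0, 1, 5_000), rng.normal(0.3, 1, 5_000)))
```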
User-Defined Metrics
Your team may use a different performance metric, want to track defined data segments, or need logical functions to create metrics for external stakeholders (like product or business metrics). Learn more about creating metrics from your data in Arthur in the User-Defined Metrics section.