Regression
Image Regression Models within Arthur
Regression models predict a numeric outcome. In Arthur, these models are listed under the regression model type.
Some common examples of Image regression are:
- Predict user age based on a photograph
- Predict house price from the image of the home
Formatted Data in Arthur
Image regression models require two columns: image input and numeric output. When onboarding a reference dataset (and setting a model schema), you need to specify a target column for each inference's ground truth. Many teams also choose to onboard metadata for the model (i.e., any information you want to track about your inferences) as non-input attributes.
Attribute (Image Input) | Prediction (numeric) | Ground Truth (numeric) | Non-Input Attribute (numeric or categorical) |
---|---|---|---|
image_1.jpg | 45.34 | 62.42 | High School Education |
image_2.jpg | 55.1 | 53.2 | Graduate Degree |
Predict Function and Mapping
These are some examples of common values teams need to onboard for their regression models.
The relationship between the prediction and ground truth column must be defined to help set up your Arthur environment to calculate default performance metrics. Additionally, if teams wish to enable explainability, they must provide a few Assets Required For Explainability. Below is an example of the runnable predict function, which outputs a single numeric prediction.
## Single Column Ground Truth
output_mapping = {
'prediction_column':'gt_column'}
# Build Arthur Model with this technique
arthur_model.build(reference_data,
pred_to_ground_truth_map=output_mapping
)
## Example prediction function for binary classification
def predict(x):
return model.predict(x)
Available Metrics
When onboarding regression models, several default metrics are available to you within the UI. You can learn more about each specific metric in the metrics section of the documentation.
Out-of-the-Box Metrics
The following metrics are automatically available in the UI (out-of-the-box) per class when teams onboard a regression model. Learn more about these metrics in the Performance Metrics section.
Metric | Metric Type |
---|---|
Root Mean Squared Error | Performance |
Mean Absolute Error | Performance |
R Squared | Performance |
Inference Count | Ingestion |
Average Prediction | Ingestion |
Drift Metrics
In the platform, drift metrics are calculated compared to a reference dataset. So, once a reference dataset is onboarded for your model, these metrics are available out of the box for comparison. Learn more about these metrics in the Drift and Anomaly section.
Of note, for unstructured data types (like text and image), feature drift is calculated for non-input attributes. The actual input to the model (in this case, text) drift is calculated with multivariate drift to accommodate the multivariate nature/relationships within the data type.
PSI | Feature Drift |
---|---|
KL Divergence | Feature Drift |
JS Divergence | Feature Drift |
Hellinger Distance | Feature Drift |
Hypothesis Test | Feature Drift |
Prediction Drift | Prediction Drift |
Multivariate Drift | Multivariate Drift |
Note: Teams can evaluate drift for inference data at different intervals with our Python SDK and query service (for example, data coming into the model now compared to a month ago).
User-Defined Metrics
Whether your team uses a different performance metric, wants to track defined data segments, or needs logical functions to create a metric for external stakeholders (like product or business metrics). Learn more about creating metrics with data in Arthur in the User-Defined Metrics section.
Available Enrichments
The following enrichments can be enabled for this model type:
Anomaly Detection | Hot Spots | Explainability | Bias Mitigation |
---|---|---|---|
X | X |
Updated about 1 year ago