Regression models predict a numeric outcome. In Arthur, these models are listed under the regression model type.

Some common examples of Text regression are:

What is the predicted review score for written restaurant reviews?
Predict house price from description text

Formatted Data in Arthur

Tabular regression models require

two columns: text input and numeric output. When onboarding a reference dataset (and setting a model schema), you need to specify a target column for each inference's ground truth. Many teams also choose to onboard metadata for the model (i.e. any information you want to track about your inferences) as non-input attributes.

Attribute (numeric or categorical)	Attribute (numeric or categorical)	Prediction (numeric)	Ground Truth (numeric)	Non-Input Attribute (numeric or categorical)
45	Graduate Degree	45.34	62.42	Female
22	Bachelor's Degree	55.1	53.2	Male

Predict Function and Mapping

These are some examples of common values teams need to onboard for their regression models.

The relationship between the prediction and ground truth column must be defined to help set up your Arthur environment to calculate default performance metrics. Additionally, if teams wish to enable explainability, they must provide a few Assets Required For Explainability. Below is an example of the runnable predict function, which outputs a single numeric prediction.

## Single Column Ground Truth 
output_mapping = {
  'prediction_column':'gt_column'} 

# Build Arthur Model with this technique
arthur_model.build(reference_data,
                   pred_to_ground_truth_map=output_mapping 
                   )

## Example prediction function for binary classification

def predict(x):
  return model.predict(x)

Available Metrics

When onboarding tabular regression models, you have a number of default metrics available to you within the UI. You can learn more about each specific metric in the metrics section of the documentation.

Out-of-the-Box Metrics

The following metrics are automatically available in the UI (out-of-the-box) per class when teams onboard a regression model. Find out more about these metrics in the Performance Metrics section.

Metric	Metric Type
Root Mean Squared Error	Performance
Mean Absolute Error	Performance
R Squared	Performance
Inference Count	Ingestion
Average Prediction	Ingestion

Drift Metrics

In the platform, drift metrics are calculated compared to a reference dataset. So, once a reference dataset is onboarded for your model, these metrics are available out of the box for comparison. Find out more about these metrics in the Drift and Anomaly section.

Note: Teams are able to evaluate drift for inference data at different intervals with our Python SDK and query service (for example data coming into the model now, compared to a month ago).

PSI	Feature Drift
KL Divergence	Feature Drift
JS Divergence	Feature Drift
Hellinger Distance	Feature Drift
Hypothesis Test	Feature Drift
Prediction Drift	Prediction Drift
Multivariate Drift	Multivariate Drift

User-Defined Metrics

Whether your team uses a different performance metric, wants to track defined segments of data, or needs logical functions to create a metric for external stakeholders (like product or business metrics). Learn more about creating metrics with data in Arthur in the User-Defined Metrics section.

Available Enrichments

The following enrichments can be enabled for this model type:

Anomaly Detection	Hot Spots	Explainability	Bias Mitigation
X		X