Product DocumentationAPI and Python SDK ReferenceRelease Notes
Schedule a Demo
Schedule a Demo

Binary Classification

Binary classification models predict a binary outcome (i.e., one of two potential classes). In Arthur, these models fall into the classification category and are represented by the Multiclass model type.

Some common examples of Tabular binary classification are:

  • Is this credit card transaction fraud or not?
  • Will a customer click on an ad or not?

Frequently, these models output both a yes/no answer and a probability for each (i.e., prob_yes and prob_no). These probabilities are then categorized into yes/no based on a threshold. In these cases, during onboarding, teams will supply their classification threshold and continuously track the class probabilities (i.e., prob_yes, prob_no).

Formatted Data in Arthur

Tabular binary classification models require three things to be specified in their schema: all predicting model attributes (or features), predicted probability of outputs, and a column for the inference's true label (or ground truth). Many teams also choose to onboard metadata for the model (i.e. any information you want to track about your inferences) as non-input attributes.

Attribute (numeric or categorical)Attribute (numeric or categorical)Probability of Prediction AProbability of Prediction BGround TruthNon-Input Attribute (numeric or categorical)
High School Education34.5.95.05AMale
Graduate Degree44.1.86.14BFemale

Predict Function and Mapping

These are some examples of common values teams need to onboard for their binary classification models.

The relationship between the prediction and ground truth column must be defined to help set up your Arthur environment to calculate default performance metrics. There are 3 options for formatting this, depending on your reference dataset. Additionally, if teams wish to enable explainability, they must provide a few Assets Required For Explainability. Below are common examples of the required runnable predict function (that outputs two values, the probability of each potential class).

## Option 1:  Single Prediction Column, Single Ground Truth Column
# Map PredictedValue Column to its corresponding GroundTruth value.
# This tells Arthur that the `pred_proba_credit_default` column represents 
# the probability that the ground truth column has the value 1
pred_to_ground_truth_map_1 = {'pred_proba_credit_default' : 1}

# Building the Model with this technique
arthur_model.build(reference_data,
                   ground_truth_column='ground_truth',
                   pred_to_ground_truth_map=pred_to_ground_truth_map_1,
                   )

## Option 2:  Multiple Prediction Columns, Single Ground Truth Column
# Map each PredictedValue attribute to its corresponding GroundTruth value.
pred_to_ground_truth_map_2 = {'pred_0' : 0,
                            'pred_1' : 1}

# Building the Model with this technique
arthur_model.build(reference_data,
                   ground_truth_column='ground_truth',
                   pred_to_ground_truth_map=pred_to_ground_truth_map_2,
                   positive_predicted_attr = 'pred_1'
                   )

## Option 3:  Multiple Prediction and Ground Truth Columns
# Map each PredictedValue attribute to its corresponding GroundTruth attribute.
pred_to_ground_truth_map_3 = {'pred_0' : 'gt_0',
                            'pred_1' : 'gt_1'}

# Building the Model with this technique
arthur_model.build(reference_data,
                   pred_to_ground_truth_map=pred_to_ground_truth_map_3,
                   positive_predicted_attr = 'pred_1'
                   )

# example_entrypoint.py

sk_model = joblib.load("./serialized_model.pkl")

def predict(x):
    return sk_model.predict_proba(x)
# example_entrypoint.py
from utils import pipeline_transformations

sk_model = joblib.load("./serialized_model.pkl")

def predict(x):
    return sk_model.predict_proba(pipeline_transformations(x))

Available Metrics

When onboarding tabular classification models, several default metrics are available to you within the UI. You can learn more about each specific metric in the metrics section of the documentation.

Out-of-the-Box Metrics

When teams onboard a binary classification model, the following metrics are automatically available in the UI (out-of-the-box). Find out more about these metrics in the Performance Metrics section.

MetricMetric Type
Accuracy RatePerformance
Balanced Accuracy RatePerformance
AUCPerformance
RecallPerformance
PrecisionPerformance
Specificity (TNR)Performance
F1Performance
False Positive RatePerformance
False Negative RatePerformance
Inference CountIngestion
Inference Count by ClassIngestion

Drift Metrics

In the platform, drift metrics are calculated compared to a reference dataset. So, once a reference dataset is onboarded for your model, these metrics are available out of the box for comparison. Find out more about these metrics in the Drift and Anomaly section.

Note: Teams are able to evaluate drift for inference data at different intervals with our Python SDK and query service (for example data coming into the model now, compared to a month ago).

PSIFeature Drift
KL DivergenceFeature Drift
JS DivergenceFeature Drift
Hellinger DistanceFeature Drift
Hypothesis TestFeature Drift
Prediction DriftPrediction Drift
Multivariate DriftMultivariate Drift

Fairness Metrics

As further described in the Fairness Metrics section of the documentation, fairness metrics are available for any tabular Arthur attributes manually selected to monitor for bias.

MetricMetric Type
Accuracy RateFairness
True Positive Rate (Equal Opportunity)Fairness
True Negative RateFairness
False Positive RateFairness
False Negative RateFairness

User-Defined Metrics

Whether your team uses a different performance metric, wants to track defined segments of data, or needs logical functions to create a metric for external stakeholders (like product or business metrics). Learn more about creating metrics with data in Arthur in the User-Defined Metrics section.

Available Enrichments

The following enrichments are able to be enabled for this model type:

Anomaly DetectionHot SpotsExplainabilityBias Mitigation
XXXX

What’s Next

Learn more about how to interact with models, including binary classification, in Arthur