Setting A Reference
For data drift and anomaly detection, you need to set your model's training data as the baseline (reference) set. All new inferences are compared against this baseline to quantify drift and the stability of incoming data streams. The reference set should include:
all model input columns
model predictions
ground truth [optional]
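# get all input columns (use the training inputs so each row lines up with Y_train and the predictions)
reference_set = X_train.copy()
# set ground truth labels
reference_set["consumer_credit_score_gt"] = Y_train
# get model predictions: predict_proba returns one column per class; keep the positive-class probability
preds = sklearn_model.predict_proba(X_train)
reference_set["consumer_credit_score_prediction"] = preds[:, 1]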
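The snippet above relies on `X_train`, `Y_train`, and `sklearn_model` from earlier in the walkthrough. The following is a self-contained sketch of the same construction, wrapped in a hypothetical helper `build_reference_set` that also checks row alignment. The synthetic feature names (`income_k`, `num_accounts`) and the logistic regression model are stand-ins for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def build_reference_set(model, X, y, gt_col, pred_col):
    """Hypothetical helper: inputs + ground truth + positive-class probability, row by row."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    # Start from the training inputs so every row lines up with its label and prediction
    reference = X.copy()
    # np.asarray drops any index on y, so labels are assigned by position
    reference[gt_col] = np.asarray(y)
    # For a binary model with labels {0, 1}, column 1 of predict_proba is class 1
    reference[pred_col] = model.predict_proba(X)[:, 1]
    return reference


# Synthetic stand-in for real training data (assumption, for illustration only)
rng = np.random.default_rng(0)
X_train = pd.DataFrame({
    "income_k": rng.normal(50, 15, size=200),
    "num_accounts": rng.integers(0, 10, size=200),
})
Y_train = (X_train["income_k"] > 50).astype(int)
sklearn_model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)

reference_set = build_reference_set(
    sklearn_model,
    X_train,
    Y_train,
    gt_col="consumer_credit_score_gt",
    pred_col="consumer_credit_score_prediction",
)
print(reference_set.columns.tolist())
```

Building from the same `X_train` that is passed to `predict_proba` is what keeps the prediction array the same length as the DataFrame; starting from a larger frame would make the assignment fail or misalign rows.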
Now we set reference_set as the model's baseline data.
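To make "new inferences are compared against this baseline" concrete, here is a minimal sketch of one common drift measure, the population stability index (PSI), computed for a single numeric feature. This is an illustrative choice: it is not necessarily the metric or binning scheme the platform uses, and the function name `population_stability_index` is hypothetical.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a baseline sample and a new sample of one numeric feature."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges from reference quantiles, so each bin holds ~equal baseline mass
    inner_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    # searchsorted maps every value to a bin index in [0, n_bins - 1];
    # values outside the baseline range fall into the first or last bin
    ref_idx = np.searchsorted(inner_edges, reference, side="right")
    cur_idx = np.searchsorted(inner_edges, current, side="right")
    ref_frac = np.bincount(ref_idx, minlength=n_bins) / len(reference)
    cur_frac = np.bincount(cur_idx, minlength=n_bins) / len(current)
    # Clip empty bins to eps to avoid log(0) and division by zero
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(42)
baseline = rng.normal(0.0, 1.0, size=5_000)   # plays the role of a reference_set column
same = rng.normal(0.0, 1.0, size=5_000)       # new inferences, same distribution
shifted = rng.normal(1.0, 1.0, size=5_000)    # new inferences, mean shifted by one std

psi_same = population_stability_index(baseline, same)
psi_shift = population_stability_index(baseline, shifted)
print(f"no drift: {psi_same:.4f}, shifted mean: {psi_shift:.4f}")
```

A stream drawn from the same distribution as the baseline yields a PSI near zero, while the shifted stream yields a much larger value; this is the sense in which the reference set anchors drift measurements.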
A Note About Large Batches
If your reference set is too large to fit in memory as a pd.DataFrame, you can instead specify a directory containing Parquet files and upload it as a batch.