Positive-Class Error Profile
Quantifies where and how your model makes mistakes on the positive class, highlighting false positives, false negatives, and error concentration across score ranges.
Overview
The Positive-Class Error Profile bucket describes how your classifier makes mistakes on the positive class across the score distribution. It focuses on:
- Where false positives and false negatives concentrate
- How error behavior changes as scores increase
- How much “bad” volume you get when you target a given positive-class segment
This bucket is most natural for binary classification but can be applied to multiclass by defining a one-vs-rest positive class (e.g., “fraud” vs “not fraud”).
Metrics
All metrics are defined in terms of the confusion-matrix counts within a segment (e.g., score bin, time bucket):
- TP – true positives
- FP – false positives
- FN – false negatives
- TN – true negatives
Total = TP + FP + FN + TN
adjusted_false_positive_rate
False positive rate on negative cases (optionally smoothed to avoid divide-by-zero):
adjusted_false_positive_rate = FP / (FP + TN)
bad_case_rate
Fraction of all cases that are classified as bad (prediction = 0):
bad_case_rate = (FN + TN) / (TP + FP + FN + TN)
false_positive_ratio
Share of predicted positive cases that are actually negative (how “dirty” your positive bucket is):
false_positive_ratio = FP / (TP + FP)
total_false_positive_rate
Fraction of all cases that are false positives:
total_false_positive_rate = FP / (TP + FP + FN + TN)
overprediction_rate
Rate at which the model over-predicts positives relative to the negative population (conceptually FPR):
overprediction_rate = FP / (FP + TN)
underprediction_rate
Rate at which the model under-predicts positives (missed positives) relative to actual positives (FNR):
underprediction_rate = FN / (TP + FN)
valid_detection_rate
Overall fraction of correctly classified cases (global accuracy):
valid_detection_rate = (TP + TN) / (TP + FP + FN + TN)
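As a concrete check on the formulas above, here is a small worked example that plugs one hypothetical set of confusion-matrix counts (TP = 80, FP = 20, FN = 10, TN = 890; invented for illustration) into each definition:
WITH counts AS (
    -- Hypothetical counts for illustration only
    SELECT 80 AS tp, 20 AS fp, 10 AS fn, 890 AS tn
)
SELECT
    fp::double precision / NULLIF(fp + tn, 0)         AS adjusted_false_positive_rate, -- 20 / 910   ≈ 0.022
    (fn + tn)::double precision / (tp + fp + fn + tn) AS bad_case_rate,                -- 900 / 1000 = 0.900
    fp::double precision / NULLIF(tp + fp, 0)         AS false_positive_ratio,         -- 20 / 100   = 0.200
    fp::double precision / (tp + fp + fn + tn)        AS total_false_positive_rate,    -- 20 / 1000  = 0.020
    fp::double precision / NULLIF(fp + tn, 0)         AS overprediction_rate,          -- 20 / 910   ≈ 0.022
    fn::double precision / NULLIF(tp + fn, 0)         AS underprediction_rate,         -- 10 / 90    ≈ 0.111
    (tp + tn)::double precision / (tp + fp + fn + tn) AS valid_detection_rate          -- 970 / 1000 = 0.970
FROM counts;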
Data Requirements
Your dataset must include:
- {{label_col}} – ground truth label (0/1 for binary; the specific class for multiclass one-vs-rest)
- {{score_col}} – predicted probability or score for the positive class
- {{timestamp_col}} – event or prediction timestamp
- Optional: {{weight_col}} – sample weight (if used)
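For reference, a minimal table shape that satisfies these requirements might look like the sketch below. The table and column names are illustrative stand-ins for the {{...}} template variables, not a required schema:
-- Illustrative only: one possible table backing the template variables above
CREATE TABLE scored_predictions (            -- stands in for {{dataset}}
    event_ts TIMESTAMPTZ NOT NULL,           -- {{timestamp_col}}
    label    SMALLINT NOT NULL,              -- {{label_col}}: 0/1 ground truth
    score    DOUBLE PRECISION NOT NULL,      -- {{score_col}}: score for the positive class
    weight   DOUBLE PRECISION                -- optional {{weight_col}}
);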
Base Metric SQL (Per-Score-Bin Confusion Matrix)
This SQL computes confusion-matrix counts and derived rates per score bin and 5-minute time bucket using a default threshold of 0.5. You can change the threshold if your application uses a different operating point (a templated variant is sketched after the query).
WITH scored AS (
SELECT
{{timestamp_col}} AS event_ts,
{{label_col}} AS label,
{{score_col}} AS score
FROM {{dataset}}
),
binned AS (
SELECT
time_bucket(INTERVAL '5 minutes', event_ts) AS ts,
width_bucket(score, 0.0, 1.0, 10) AS score_bin,
label,
score,
CASE WHEN score >= 0.5 THEN 1 ELSE 0 END AS pred_label
FROM scored
)
SELECT
ts,
score_bin,
COUNT(*) AS total,
SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END) AS positives,
SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END) AS negatives,
-- Confusion matrix
SUM(CASE WHEN pred_label = 1 AND label = 1 THEN 1 ELSE 0 END) AS tp,
SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END) AS fp,
SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END) AS fn,
SUM(CASE WHEN pred_label = 0 AND label = 0 THEN 1 ELSE 0 END) AS tn,
-- Derived rates
(SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END), 0) AS adjusted_false_positive_rate,
(SUM(CASE WHEN pred_label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(COUNT(*), 0) AS bad_case_rate,
(SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(SUM(CASE WHEN pred_label = 1 THEN 1 ELSE 0 END), 0) AS false_positive_ratio,
(SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(COUNT(*), 0) AS total_false_positive_rate,
(SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END), 0) AS overprediction_rate,
(SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END))::double precision
/ NULLIF(SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END), 0) AS underprediction_rate,
(SUM(CASE WHEN pred_label = label THEN 1 ELSE 0 END))::double precision
/ NULLIF(COUNT(*), 0) AS valid_detection_rate
FROM binned
GROUP BY ts, score_bin
ORDER BY ts, score_bin;
You can register any or all of these derived columns as reported metrics.
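If you already maintain a {{threshold}} template value (it is used by the alternative SQL at the end of this page), the hard-coded 0.5 can be swapped out. A minimal sketch of just the binning step with a templated operating point:
-- Sketch: the binning step with the 0.5 cutoff replaced by {{threshold}}
WITH scored AS (
    SELECT
        {{timestamp_col}} AS event_ts,
        {{label_col}} AS label,
        {{score_col}} AS score
    FROM {{dataset}}
)
SELECT
    time_bucket(INTERVAL '5 minutes', event_ts) AS ts,
    width_bucket(score, 0.0, 1.0, 10) AS score_bin,
    label,
    score,
    CASE WHEN score >= {{threshold}} THEN 1 ELSE 0 END AS pred_label
FROM scored;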
Plots (Daily Aggregated)
Plot 1 — FP & Bad Case Rates Over Time
Uses:
adjusted_false_positive_rate, false_positive_ratio, total_false_positive_rate, bad_case_rate
SELECT
time_bucket(INTERVAL '1 day', ts) AS day,
AVG(adjusted_false_positive_rate) AS adjusted_false_positive_rate,
AVG(false_positive_ratio) AS false_positive_ratio,
AVG(total_false_positive_rate) AS total_false_positive_rate,
AVG(bad_case_rate) AS bad_case_rate
FROM {{bucket_1_positive_class_error_profile_metrics}}
GROUP BY day
ORDER BY day;
What this shows
This plot trends multiple notions of “false positives” and “bad outcomes” over time. It lets you see whether the model is:
- Flagging too many negatives as positives (adjusted_false_positive_rate / total_false_positive_rate)
- Putting too many negatives into the positive bucket (false_positive_ratio)
- Over-classifying cases as bad overall (bad_case_rate)
How to interpret it
- Spikes in any FP-related line often correspond to data issues, model regressions, or policy changes.
- A rising bad_case_rate without business explanation may mean the model is over-declining / over-rejecting.
- If FP rates increase while business KPIs worsen, this is a strong signal that thresholds or retraining should be reviewed.
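Note that the daily query above takes an unweighted AVG of per-bin rates, so sparsely populated score bins count as much as dense ones. If the per-bin counts (tp, fp, fn, tn) are also registered, a count-weighted daily recomputation is an alternative; this is a sketch under that assumption, not the required aggregation:
-- Sketch: recompute daily rates from counts instead of averaging per-bin rates
-- (assumes tp/fp/fn/tn are stored alongside the derived rates)
SELECT
    time_bucket(INTERVAL '1 day', ts) AS day,
    SUM(fp)::double precision / NULLIF(SUM(fp + tn), 0)                AS adjusted_false_positive_rate,
    SUM(fp)::double precision / NULLIF(SUM(tp + fp), 0)                AS false_positive_ratio,
    SUM(fp)::double precision / NULLIF(SUM(tp + fp + fn + tn), 0)      AS total_false_positive_rate,
    SUM(fn + tn)::double precision / NULLIF(SUM(tp + fp + fn + tn), 0) AS bad_case_rate
FROM {{bucket_1_positive_class_error_profile_metrics}}
GROUP BY day
ORDER BY day;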
Plot 2 — Overprediction vs Underprediction
Uses:
overprediction_rate, underprediction_rate
SELECT
time_bucket(INTERVAL '1 day', ts) AS day,
AVG(overprediction_rate) AS overprediction_rate,
AVG(underprediction_rate) AS underprediction_rate
FROM {{bucket_1_positive_class_error_profile_metrics}}
GROUP BY day
ORDER BY day;
What this shows
This plot compares how often the model over-predicts positives (FPs) vs under-predicts positives (FNs) over time.
How to interpret it
- If overprediction_rate >> underprediction_rate, the model is aggressively calling positives, likely impacting cost/capacity.
- If underprediction_rate >> overprediction_rate, the model is missing many true positives, impacting risk detection.
- Ideally, the ratio between the two aligns with business preferences: in some risk domains, you prefer more FPs; in others, you strongly penalize FNs. A simple daily ratio query is sketched below.
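One way to track that balance over time is a single daily ratio; this is a sketch against the same registered metrics table used by the plots above:
-- Sketch: daily balance of overprediction vs underprediction.
-- A ratio well above 1 means FP-driven errors dominate; well below 1 means FNs dominate.
SELECT
    time_bucket(INTERVAL '1 day', ts) AS day,
    AVG(overprediction_rate) / NULLIF(AVG(underprediction_rate), 0) AS over_to_under_ratio
FROM {{bucket_1_positive_class_error_profile_metrics}}
GROUP BY day
ORDER BY day;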
Plot 3 — False Positive Ratio vs Valid Detection Rate
Uses:
false_positive_ratio, valid_detection_rate
SELECT
time_bucket(INTERVAL '1 day', ts) AS day,
AVG(false_positive_ratio) AS false_positive_ratio,
AVG(valid_detection_rate) AS valid_detection_rate
FROM {{bucket_1_positive_class_error_profile_metrics}}
GROUP BY day
ORDER BY day;
What this shows
This plot contrasts how dirty the positive bucket is (false_positive_ratio) with overall correctness (valid_detection_rate).
How to interpret it
- Days where false_positive_ratio is high but valid_detection_rate remains flat may mean errors are mostly concentrated in positives rather than negatives (see the sketch after this list).
- If both degrade together, the model is likely struggling broadly (not just in the positive segment).
- You can use this to explain to stakeholders why precision dropped: the model is trading global accuracy for more aggressive positive predictions.
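To surface the first pattern automatically, here is a hedged sketch that flags days where the positive bucket gets noticeably dirtier while overall accuracy barely moves; the 0.05 and 0.01 cutoffs are illustrative, not recommendations:
-- Sketch: flag days with a jump in false_positive_ratio but a flat valid_detection_rate
WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        AVG(false_positive_ratio) AS false_positive_ratio,
        AVG(valid_detection_rate) AS valid_detection_rate
    FROM {{bucket_1_positive_class_error_profile_metrics}}
    GROUP BY day
),
deltas AS (
    SELECT
        day,
        false_positive_ratio,
        valid_detection_rate,
        false_positive_ratio - LAG(false_positive_ratio) OVER (ORDER BY day) AS fpr_delta,
        valid_detection_rate - LAG(valid_detection_rate) OVER (ORDER BY day) AS vdr_delta
    FROM daily
)
SELECT day, false_positive_ratio, valid_detection_rate, fpr_delta, vdr_delta
FROM deltas
WHERE fpr_delta > 0.05          -- positive bucket got noticeably dirtier
  AND ABS(vdr_delta) < 0.01     -- while overall accuracy stayed roughly flat
ORDER BY day;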
Binary vs Multiclass
- Binary: use label ∈ {0,1} and score as the probability p(y=1 | x).
- Multiclass: choose a {{positive_class_value}} and convert to one-vs-rest (a fuller sketch follows this list):
  CASE WHEN {{label_col}} = '{{positive_class_value}}' THEN 1 ELSE 0 END AS label
  Use the probability for that class as score. Repeat the metric for each class of interest.
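A minimal sketch of the one-vs-rest conversion wired into the scored CTE from the base query; {{positive_class_score_col}} is a hypothetical placeholder for the predicted probability of the chosen class:
-- Sketch: one-vs-rest scoring for a single class of interest
WITH scored AS (
    SELECT
        {{timestamp_col}} AS event_ts,
        -- 1 when the row belongs to the class of interest, else 0
        CASE WHEN {{label_col}} = '{{positive_class_value}}' THEN 1 ELSE 0 END AS label,
        {{positive_class_score_col}} AS score  -- hypothetical: per-class probability column
    FROM {{dataset}}
)
SELECT * FROM scored LIMIT 10;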
Use Cases
- Risk scoring (fraud, credit, abuse detection)
- Triage models where analysts work the top-scoring cases
- Any binary decisioning system with high cost asymmetry between FP and FN
Alternative SQL example
This variant computes the same confusion-matrix metrics per 5-minute bucket (without score bins), using explicit {{ground_truth}}, {{prediction}}, and {{threshold}} placeholders:
SELECT
s.bucket AS bucket,
-- FPR: FP / (FP + TN)
COALESCE(
s.fp::float / NULLIF((s.fp + s.tn)::float, 0),
0
) AS adjusted_false_positive_rate,
-- Bad case rate: (FN + TN) / total (all cases classified as 0)
COALESCE(
(s.fn + s.tn)::float / NULLIF(s.total, 0),
0
) AS bad_case_rate,
  -- False positive ratio: FP / (TP + FP) (share of predicted positives that are negative)
  COALESCE(
    s.fp::float / NULLIF(s.predicted_pos, 0),
    0
  ) AS false_positive_ratio,
-- Valid detection rate: (TP + TN) / total
COALESCE(
(s.tp + s.tn)::float / NULLIF(s.total, 0),
0
) AS valid_detection_rate,
  -- Overprediction rate: FP / (FP + TN) (rate of over-predicting positives, i.e. FPR)
  COALESCE(
    s.fp::float / NULLIF((s.fp + s.tn)::float, 0),
    0
  ) AS overprediction_rate,
  -- Underprediction rate: FN / (TP + FN) (rate of missed positives, i.e. FNR)
  COALESCE(
    s.fn::float / NULLIF(s.actual_pos, 0),
    0
  ) AS underprediction_rate,
  -- Total false positive rate: FP / total
  COALESCE(
    s.fp::float / NULLIF(s.total, 0),
    0
  ) AS total_false_positive_rate
FROM
(
SELECT
c.bucket,
c.tp,
c.fp,
c.tn,
c.fn,
(c.tp + c.tn + c.fp + c.fn)::float AS total,
(c.tp + c.fp)::float AS predicted_pos,
(c.tp + c.fn)::float AS actual_pos
FROM
(
SELECT
time_bucket (INTERVAL '5 minutes', {{timestamp_col}}) AS bucket,
SUM(
CASE
WHEN {{ground_truth}} = 1
AND {{prediction}} >= {{threshold}} THEN 1
ELSE 0
END
) AS tp,
SUM(
CASE
WHEN {{ground_truth}} = 0
AND {{prediction}} >= {{threshold}} THEN 1
ELSE 0
END
) AS fp,
SUM(
CASE
WHEN {{ground_truth}} = 0
AND {{prediction}} < {{threshold}} THEN 1
ELSE 0
END
) AS tn,
SUM(
CASE
WHEN {{ground_truth}} = 1
AND {{prediction}} < {{threshold}} THEN 1
ELSE 0
END
) AS fn
FROM
{{dataset}}
GROUP BY
bucket
) AS c
) AS s
ORDER BY
s.bucket;