Positive-Class Error Profile
Quantifies where and how your model makes mistakes on the positive class, highlighting false positives, false negatives, and error concentration across score ranges.
Overview
The Positive-Class Error Profile bucket describes how your classifier makes mistakes on the positive class across the score distribution. It focuses on:
- Where false positives and false negatives concentrate
- How error behavior changes as scores increase
- How much “bad” volume you get when you target a given positive-class segment
This bucket is most natural for binary classification but can be applied to multiclass by defining a one-vs-rest positive class (e.g., “fraud” vs “not fraud”).
Metrics
All metrics are defined in terms of the confusion-matrix counts within a segment (e.g., score bin, time bucket):
- TP – true positives
- FP – false positives
- FN – false negatives
- TN – true negatives
Total = TP + FP + FN + TN
adjusted_false_positive_rate
False positive rate on negative cases (optionally smoothed to avoid divide-by-zero):
adjusted_false_positive_rate = FP / (FP + TN)
bad_case_rate
Fraction of all cases that are classified as bad (prediction = 0):
bad_case_rate = (FN + TN) / (TP + FP + FN + TN)
false_positive_ratio
Share of predicted positive cases that are actually negative (how “dirty” your positive bucket is):
false_positive_ratio = FP / (TP + FP)
total_false_positive_rate
Fraction of all cases that are false positives:
total_false_positive_rate = FP / (TP + FP + FN + TN)
overprediction_rate
Rate at which the model over-predicts positives relative to the negative population (conceptually FPR):
overprediction_rate = FP / (FP + TN)
underprediction_rate
Rate at which the model under-predicts positives (missed positives) relative to actual positives (FNR):
underprediction_rate = FN / (TP + FN)
valid_detection_rate
Overall fraction of correctly classified cases (global accuracy):
valid_detection_rate = (TP + TN) / (TP + FP + FN + TN)
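For example, with hypothetical counts TP = 40, FP = 10, FN = 20, TN = 130 (Total = 200), the metrics work out to:
adjusted_false_positive_rate = 10 / 140 ≈ 0.071
bad_case_rate = (20 + 130) / 200 = 0.75
false_positive_ratio = 10 / 50 = 0.20
total_false_positive_rate = 10 / 200 = 0.05
overprediction_rate = 10 / 140 ≈ 0.071
underprediction_rate = 20 / 60 ≈ 0.333
valid_detection_rate = (40 + 130) / 200 = 0.85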
Data Requirements
Your dataset must include:
- {{label_col}} – ground truth label (0/1 for binary; the specific class for multiclass one-vs-rest)
- {{score_col}} – predicted probability or score for the positive class
- {{timestamp_col}} – event or prediction timestamp
- Optional: {{weight_col}} – sample weight (if used)
Base Metric SQL (Per-Score-Bin Confusion Matrix)
This SQL computes confusion-matrix counts and derived rates per score bin and 5-minute time bucket using a default threshold of 0.5. You can change the threshold if your application uses a different operating point.
WITH scored AS (
SELECT
{{timestamp_col}} AS event_ts,
{{label_col}} AS label,
{{score_col}} AS score
FROM {{dataset}}
),
binned AS (
SELECT
time_bucket(INTERVAL '5 minutes', event_ts) AS ts,
width_bucket(score, 0.0, 1.0, 10) AS score_bin,
label,
score,
CASE WHEN score >= 0.5 THEN 1 ELSE 0 END AS pred_label
FROM scored
)
SELECT
ts,
score_bin,
COUNT(*) AS total,
SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END) AS positives,
SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END) AS negatives,
-- Confusion matrix
SUM(CASE WHEN pred_label = 1 AND label = 1 THEN 1 ELSE 0 END) AS tp,
SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END) AS fp,
SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END) AS fn,
SUM(CASE WHEN pred_label = 0 AND label = 0 THEN 1 ELSE 0 END) AS tn,
-- Derived rates
(SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END), 0) AS adjusted_false_positive_rate,
(SUM(CASE WHEN pred_label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(COUNT(*), 0) AS bad_case_rate,
(SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(SUM(CASE WHEN pred_label = 1 THEN 1 ELSE 0 END), 0) AS false_positive_ratio,
(SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(COUNT(*), 0) AS total_false_positive_rate,
(SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
/ NULLIF(SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END), 0) AS overprediction_rate,
(SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END))::double precision
/ NULLIF(SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END), 0) AS underprediction_rate,
(SUM(CASE WHEN pred_label = label THEN 1 ELSE 0 END))::double precision
/ NULLIF(COUNT(*), 0) AS valid_detection_rate
FROM binned
GROUP BY ts, score_bin
ORDER BY ts, score_bin;
You can register any or all of these derived columns as reported metrics.
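To see where false positives concentrate across the score range (the error-concentration view this bucket is named for), the same binned logic can be aggregated by score bin alone. This is a minimal sketch, reusing the {{dataset}}, {{label_col}}, and {{score_col}} placeholders and assuming the same 0.5 threshold:
WITH binned AS (
  SELECT
    width_bucket({{score_col}}, 0.0, 1.0, 10) AS score_bin,
    {{label_col}} AS label,
    CASE WHEN {{score_col}} >= 0.5 THEN 1 ELSE 0 END AS pred_label
  FROM {{dataset}}
)
SELECT
  score_bin,
  COUNT(*) AS total,
  SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END) AS fp,
  SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END) AS fn,
  -- Share of all false positives that land in this score bin
  (SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
    / NULLIF(SUM(SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END)) OVER (), 0) AS fp_share_of_all_fp
FROM binned
GROUP BY score_bin
ORDER BY score_bin;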
Plots (Daily Aggregated)
Preview Data
For preview data, use 2025-11-26T17:54:05.425Z as startDate and 2025-12-10T17:54:05.425Z as endDate.
Plot 1 — FP & Bad Case Rates Over Time
Uses:
adjusted_false_positive_rate, false_positive_ratio, total_false_positive_rate, bad_case_rate
SELECT
time_bucket_gapfill(
'1 day',
timestamp,
'{{dateStart}}'::timestamptz,
'{{dateEnd}}'::timestamptz
) AS time_bucket_1d,
metric_name,
CASE
WHEN metric_name = 'adjusted_false_positive_rate' THEN 'Adjusted False Positive Rate'
WHEN metric_name = 'false_positive_ratio' THEN 'False Positive Ratio'
WHEN metric_name = 'total_false_positive_rate' THEN 'Total False Positive Rate'
WHEN metric_name = 'bad_case_rate' THEN 'Bad Case Rate'
ELSE metric_name
END AS friendly_name,
COALESCE(AVG(value), 0) AS metric_value
FROM metrics_numeric_latest_version
WHERE metric_name IN (
'adjusted_false_positive_rate',
'false_positive_ratio',
'total_false_positive_rate',
'bad_case_rate'
)
[[AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}']]
GROUP BY time_bucket_1d, metric_name
ORDER BY time_bucket_1d, metric_name;
What this shows
This plot trends multiple notions of “false positives” and “bad outcomes” over time. It lets you see whether the model is:
- Flagging too many negatives as positives (adjusted_false_positive_rate / total_false_positive_rate)
- Putting too many negatives into the positive bucket (false_positive_ratio)
- Over-classifying cases as bad overall (bad_case_rate)
How to interpret it
- Spikes in any FP-related line often correspond to data issues, model regressions, or policy changes.
- A rising bad_case_rate without business explanation may mean the model is over-declining / over-rejecting.
- If FP rates increase while business KPIs worsen, this is a strong signal that thresholds or retraining should be reviewed.
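To make the spike check in the list above concrete, this is a minimal sketch (assuming the same metrics_numeric_latest_version table with timestamp, metric_name, and value columns used by the plot queries) that compares each day's adjusted_false_positive_rate to its trailing seven-day average:
WITH daily AS (
  SELECT
    time_bucket(INTERVAL '1 day', timestamp) AS day,
    AVG(value) AS fp_rate
  FROM metrics_numeric_latest_version
  WHERE metric_name = 'adjusted_false_positive_rate'
  GROUP BY day
)
SELECT
  day,
  fp_rate,
  -- Trailing 7-day baseline, excluding the current day
  AVG(fp_rate) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS trailing_7d_avg,
  fp_rate - AVG(fp_rate) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS delta_vs_baseline
FROM daily
ORDER BY day;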
Plot 2 — Overprediction vs Underprediction
Uses:
overprediction_rate, underprediction_rate
SELECT
time_bucket_gapfill(
'1 day',
timestamp,
'{{dateStart}}'::timestamptz,
'{{dateEnd}}'::timestamptz
) AS time_bucket_1d,
metric_name,
CASE
WHEN metric_name = 'overprediction_rate' THEN 'Overprediction Rate'
WHEN metric_name = 'underprediction_rate' THEN 'Underprediction Rate'
ELSE metric_name
END AS friendly_name,
COALESCE(AVG(value), 0) AS metric_value
FROM metrics_numeric_latest_version
WHERE metric_name IN (
'overprediction_rate',
'underprediction_rate'
)
[[AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}']]
GROUP BY time_bucket_1d, metric_name
ORDER BY time_bucket_1d, metric_name;
What this shows
This plot compares how often the model over-predicts positives (FPs) vs under-predicts positives (FNs) over time.
How to interpret it
- If overprediction_rate >> underprediction_rate, the model is aggressively calling positives, likely impacting cost/capacity.
- If underprediction_rate >> overprediction_rate, the model is missing many true positives, impacting risk detection.
- Ideally, the ratio between the two aligns with business preferences: in some risk domains, you prefer more FPs; in others, you strongly penalize FNs.
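To track that balance directly, a minimal sketch (again assuming the metrics_numeric_latest_version table used by the plot queries) computes the daily ratio of overprediction to underprediction:
WITH daily AS (
  SELECT
    time_bucket(INTERVAL '1 day', timestamp) AS day,
    AVG(value) FILTER (WHERE metric_name = 'overprediction_rate') AS over_rate,
    AVG(value) FILTER (WHERE metric_name = 'underprediction_rate') AS under_rate
  FROM metrics_numeric_latest_version
  WHERE metric_name IN ('overprediction_rate', 'underprediction_rate')
  GROUP BY day
)
SELECT
  day,
  over_rate,
  under_rate,
  -- > 1 means the model leans toward false positives; < 1 means it leans toward missed positives
  over_rate / NULLIF(under_rate, 0) AS over_to_under_ratio
FROM daily
ORDER BY day;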
Plot 3 — False Positive Ratio vs Valid Detection Rate
Uses:
false_positive_ratio, valid_detection_rate
SELECT
time_bucket_gapfill(
'1 day',
timestamp,
'{{dateStart}}'::timestamptz,
'{{dateEnd}}'::timestamptz
) AS time_bucket_1d,
metric_name,
CASE
WHEN metric_name = 'false_positive_ratio' THEN 'False Positive Ratio'
WHEN metric_name = 'valid_detection_rate' THEN 'Valid Detection Rate'
ELSE metric_name
END AS friendly_name,
COALESCE(AVG(value), 0) AS metric_value
FROM metrics_numeric_latest_version
WHERE metric_name IN (
'false_positive_ratio',
'valid_detection_rate'
)
[[AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}']]
GROUP BY time_bucket_1d, metric_name
ORDER BY time_bucket_1d, metric_name;
What this shows
This plot contrasts how dirty the positive bucket is (false_positive_ratio) with overall correctness (valid_detection_rate).
How to interpret it
- Days where false_positive_ratio is high but valid_detection_rate remains flat may mean errors are mostly concentrated in positives rather than negatives.
- If both degrade together, the model is likely struggling broadly (not just in the positive segment).
- You can use this to explain to stakeholders why precision dropped: the model is trading global accuracy for more aggressive positive predictions.
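To flag days like those described in the first bullet above, this is a minimal sketch (assuming the same metrics_numeric_latest_version table; the 0.05 and 0.01 thresholds are illustrative values to tune for your data):
WITH daily AS (
  SELECT
    time_bucket(INTERVAL '1 day', timestamp) AS day,
    AVG(value) FILTER (WHERE metric_name = 'false_positive_ratio') AS fp_ratio,
    AVG(value) FILTER (WHERE metric_name = 'valid_detection_rate') AS vdr
  FROM metrics_numeric_latest_version
  WHERE metric_name IN ('false_positive_ratio', 'valid_detection_rate')
  GROUP BY day
),
deltas AS (
  SELECT
    day,
    fp_ratio,
    vdr,
    fp_ratio - LAG(fp_ratio) OVER (ORDER BY day) AS fp_ratio_delta,
    vdr - LAG(vdr) OVER (ORDER BY day) AS vdr_delta
  FROM daily
)
SELECT day, fp_ratio, vdr, fp_ratio_delta, vdr_delta
FROM deltas
-- Flag days where the positive bucket got noticeably dirtier while overall accuracy barely moved
WHERE fp_ratio_delta > 0.05
  AND ABS(vdr_delta) < 0.01
ORDER BY day;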
Binary vs Multiclass
- Binary: use label ∈ {0, 1} and score as the probability p(y = 1 | x).
- Multiclass: choose a {{positive_class_value}} and convert to one-vs-rest:
  CASE WHEN {{label_col}} = '{{positive_class_value}}' THEN 1 ELSE 0 END AS label
  Use the probability for that class as score. Repeat the metric for each class of interest.
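As a sketch, the scored CTE from the base metric SQL above can be rewritten for one-vs-rest; the rest of that query is unchanged (the trailing SELECT here is only there to preview the converted rows):
WITH scored AS (
  SELECT
    {{timestamp_col}} AS event_ts,
    -- One-vs-rest label: 1 for the chosen positive class, 0 for everything else
    CASE WHEN {{label_col}} = '{{positive_class_value}}' THEN 1 ELSE 0 END AS label,
    -- {{score_col}} is assumed to hold the model's probability for {{positive_class_value}}
    {{score_col}} AS score
  FROM {{dataset}}
)
SELECT * FROM scored LIMIT 10;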
Use Cases
- Risk scoring (fraud, credit, abuse detection)
- Triage models where analysts work the top-scoring cases
- Any binary decisioning system with high cost asymmetry between FP and FN
Alternative SQL Example (Daily Aggregation)
WITH counts AS (
SELECT
time_bucket(INTERVAL '1 day', {{timestamp_col}}) AS bucket,
SUM(CASE WHEN {{ground_truth}} = 1 AND {{prediction}} >= {{threshold}} THEN 1 ELSE 0 END) AS tp,
SUM(CASE WHEN {{ground_truth}} = 0 AND {{prediction}} >= {{threshold}} THEN 1 ELSE 0 END) AS fp,
SUM(CASE WHEN {{ground_truth}} = 0 AND {{prediction}} < {{threshold}} THEN 1 ELSE 0 END) AS tn,
SUM(CASE WHEN {{ground_truth}} = 1 AND {{prediction}} < {{threshold}} THEN 1 ELSE 0 END) AS fn
FROM {{dataset}}
GROUP BY 1
),
prepared AS (
SELECT
bucket,
tp::float AS tp,
fp::float AS fp,
tn::float AS tn,
fn::float AS fn,
(tp + fp + tn + fn)::float AS total,
(tp + fp)::float AS predicted_pos,
(tp + fn)::float AS actual_pos,
(fp + tn)::float AS negatives
FROM counts
)
SELECT
bucket AS bucket,
  -- Adjusted False Positive Rate: FP / (FP + TN)
  CASE WHEN negatives > 0 THEN fp / negatives ELSE 0 END
    AS adjusted_false_positive_rate,
  -- Bad Case Rate: predicted-bad (prediction = 0) cases / total
  CASE WHEN total > 0 THEN (fn + tn) / total ELSE 0 END
    AS bad_case_rate,
  -- False Positive Ratio: FP / (TP + FP)
  CASE WHEN predicted_pos > 0 THEN fp / predicted_pos ELSE 0 END
    AS false_positive_ratio,
  -- Valid Detection Rate: (TP + TN) / total
  CASE WHEN total > 0 THEN (tp + tn) / total ELSE 0 END
    AS valid_detection_rate,
  -- Overprediction Rate: FP / (FP + TN)
  CASE WHEN negatives > 0 THEN fp / negatives ELSE 0 END
    AS overprediction_rate,
  -- Underprediction Rate: FN / (TP + FN)
  CASE WHEN actual_pos > 0 THEN fn / actual_pos ELSE 0 END
    AS underprediction_rate,
  -- Total False Positive Rate: FP / total
  CASE WHEN total > 0 THEN fp / total ELSE 0 END
    AS total_false_positive_rate
FROM prepared
ORDER BY bucket;
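The adjusted_false_positive_rate definition above mentions optional smoothing to avoid divide-by-zero on segments with no negatives. This is a minimal sketch of one smoothed variant using Laplace-style pseudo-counts (the 0.5 and 1.0 constants are illustrative, not prescribed):
WITH counts AS (
  SELECT
    time_bucket(INTERVAL '1 day', {{timestamp_col}}) AS bucket,
    SUM(CASE WHEN {{ground_truth}} = 0 AND {{prediction}} >= {{threshold}} THEN 1 ELSE 0 END)::float AS fp,
    SUM(CASE WHEN {{ground_truth}} = 0 AND {{prediction}} < {{threshold}} THEN 1 ELSE 0 END)::float AS tn
  FROM {{dataset}}
  GROUP BY 1
)
SELECT
  bucket,
  -- Pseudo-counts keep the rate defined even when a bucket has no negative cases
  (fp + 0.5) / (fp + tn + 1.0) AS adjusted_false_positive_rate_smoothed
FROM counts
ORDER BY bucket;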