Positive-Class Error Profile

Quantifies where and how your model makes mistakes on the positive class, highlighting false positives, false negatives, and error concentration across score ranges.


Overview

The Positive-Class Error Profile bucket describes how your classifier makes mistakes on the positive class across the score distribution. It focuses on:

  • Where false positives and false negatives concentrate
  • How error behavior changes as scores increase
  • How much “bad” volume you get when you target a given positive-class segment

This bucket is most natural for binary classification but can be applied to multiclass by defining a one-vs-rest positive class (e.g., “fraud” vs “not fraud”).

Metrics

All metrics are defined in terms of the confusion-matrix counts within a segment (e.g., score bin, time bucket):

  • TP – true positives
  • FP – false positives
  • FN – false negatives
  • TN – true negatives

Total = TP + FP + FN + TN

adjusted_false_positive_rate
False positive rate among actual negatives (the SQL below uses NULLIF to return NULL rather than divide by zero when a segment contains no negatives):

adjusted_false_positive_rate = FP / (FP + TN)

bad_case_rate
Fraction of all cases that are classified as bad (prediction = 0):

bad_case_rate = (FN + TN) / (TP + FP + FN + TN)

false_positive_ratio
Share of predicted positive cases that are actually negative (how “dirty” your positive bucket is):

false_positive_ratio = FP / (TP + FP)

total_false_positive_rate
Fraction of all cases that are false positives:

total_false_positive_rate = FP / (TP + FP + FN + TN)

overprediction_rate
Rate at which the model over-predicts positives relative to the actual negative population (the FPR; numerically identical to adjusted_false_positive_rate, but reported separately):

overprediction_rate = FP / (FP + TN)

underprediction_rate
Rate at which the model under-predicts positives (missed positives) relative to actual positives (FNR):

underprediction_rate = FN / (TP + FN)

valid_detection_rate
Overall fraction of correctly classified cases (global accuracy):

valid_detection_rate = (TP + TN) / (TP + FP + FN + TN)
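
As a quick sanity check on the formulas above, take hypothetical per-segment counts TP = 80, FP = 20, FN = 10, TN = 890 (Total = 1,000):

  adjusted_false_positive_rate = 20 / (20 + 890)   ≈ 0.022
  bad_case_rate                = (10 + 890) / 1000 = 0.900
  false_positive_ratio         = 20 / (80 + 20)    = 0.200
  total_false_positive_rate    = 20 / 1000         = 0.020
  overprediction_rate          = 20 / (20 + 890)   ≈ 0.022
  underprediction_rate         = 10 / (80 + 10)    ≈ 0.111
  valid_detection_rate         = (80 + 890) / 1000 = 0.970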

Data Requirements

Your dataset must include:

  • {{label_col}} – ground truth label (0/1 for binary; specific class for multiclass one-vs-rest)
  • {{score_col}} – predicted probability or score for the positive class
  • {{timestamp_col}} – event or prediction timestamp
  • Optional: {{weight_col}} – sample weight (if used)
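
A minimal sketch of a compatible table, assuming PostgreSQL/TimescaleDB; the table and column names here (scored_events, event_ts, label, score, weight) are placeholders for {{dataset}}, {{timestamp_col}}, {{label_col}}, {{score_col}}, and {{weight_col}}:

-- Hypothetical dataset layout; substitute your own table and column names.
CREATE TABLE scored_events (
    event_ts TIMESTAMPTZ      NOT NULL,    -- {{timestamp_col}}: event or prediction time
    label    SMALLINT         NOT NULL,    -- {{label_col}}: 1 = positive, 0 = negative
    score    DOUBLE PRECISION NOT NULL,    -- {{score_col}}: predicted P(y = 1 | x)
    weight   DOUBLE PRECISION DEFAULT 1.0  -- {{weight_col}}: optional; the unweighted SQL below counts rows
);

INSERT INTO scored_events (event_ts, label, score, weight) VALUES
    ('2024-01-01 00:01:00+00', 1, 0.91, 1.0),
    ('2024-01-01 00:02:00+00', 0, 0.07, 1.0);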

Base Metric SQL (Per-Score-Bin Confusion Matrix)

This SQL computes confusion-matrix counts and derived rates per score bin and 5-minute time bucket, using a default decision threshold of 0.5. Scores are split into 10 equal-width bins with width_bucket (a score of exactly 1.0 lands in bin 11). Change the threshold if your application uses a different operating point.

WITH scored AS (
    SELECT
        {{timestamp_col}} AS event_ts,
        {{label_col}}    AS label,
        {{score_col}}    AS score
    FROM {{dataset}}
),
binned AS (
    SELECT
        time_bucket(INTERVAL '5 minutes', event_ts) AS ts,
        width_bucket(score, 0.0, 1.0, 10) AS score_bin,
        label,
        score,
        CASE WHEN score >= 0.5 THEN 1 ELSE 0 END AS pred_label
    FROM scored
)
SELECT
    ts,
    score_bin,
    COUNT(*) AS total,
    SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END) AS positives,
    SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END) AS negatives,

    -- Confusion matrix
    SUM(CASE WHEN pred_label = 1 AND label = 1 THEN 1 ELSE 0 END) AS tp,
    SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END) AS fp,
    SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END) AS fn,
    SUM(CASE WHEN pred_label = 0 AND label = 0 THEN 1 ELSE 0 END) AS tn,

    -- Derived rates
    (SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END), 0)      AS adjusted_false_positive_rate,
    (SUM(CASE WHEN pred_label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(COUNT(*), 0)                                       AS bad_case_rate,
    (SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(SUM(CASE WHEN pred_label = 1 THEN 1 ELSE 0 END), 0) AS false_positive_ratio,
    (SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(COUNT(*), 0)                                       AS total_false_positive_rate,
    (SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END), 0)      AS overprediction_rate,
    (SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END))::double precision
        / NULLIF(SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END), 0)      AS underprediction_rate,
    (SUM(CASE WHEN pred_label = label THEN 1 ELSE 0 END))::double precision
        / NULLIF(COUNT(*), 0)                                       AS valid_detection_rate
FROM binned
GROUP BY ts, score_bin
ORDER BY ts, score_bin;

You can register any or all of these derived columns as reported metrics.
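
Because the base query keeps score_bin, any of these derived columns can also be sliced by score range. For example, a sketch (assuming the per-bin results are available under the {{bucket_1_positive_class_error_profile_metrics}} binding used by the plot queries below) that tracks how dirty the top score decile is over time:

SELECT
    time_bucket(INTERVAL '1 day', ts) AS day,
    AVG(false_positive_ratio) AS top_decile_false_positive_ratio
FROM {{bucket_1_positive_class_error_profile_metrics}}
WHERE score_bin >= 10   -- width_bucket puts [0.9, 1.0) in bin 10 and exactly 1.0 in bin 11
GROUP BY day
ORDER BY day;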

Plots (Daily Aggregated)

Plot 1 — FP & Bad Case Rates Over Time

Uses:

  • adjusted_false_positive_rate
  • false_positive_ratio
  • total_false_positive_rate
  • bad_case_rate

SELECT
    time_bucket(INTERVAL '1 day', ts) AS day,
    AVG(adjusted_false_positive_rate) AS adjusted_false_positive_rate,
    AVG(false_positive_ratio)         AS false_positive_ratio,
    AVG(total_false_positive_rate)    AS total_false_positive_rate,
    AVG(bad_case_rate)                AS bad_case_rate
FROM {{bucket_1_positive_class_error_profile_metrics}}
GROUP BY day
ORDER BY day;

What this shows
This plot trends multiple notions of “false positives” and “bad outcomes” over time. It lets you see whether the model is:

  • Flagging too many negatives as positives (adjusted_false_positive_rate / total_false_positive_rate)
  • Putting too many negatives into the positive bucket (false_positive_ratio)
  • Over-classifying cases as bad overall (bad_case_rate)

How to interpret it

  • Spikes in any FP-related line often correspond to data issues, model regressions, or policy changes.
  • A rising bad_case_rate without business explanation may mean the model is over-declining / over-rejecting.
  • If FP rates increase while business KPIs worsen, this is a strong signal that thresholds or retraining should be reviewed.
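
To make the spike check in the first bullet concrete, here is a sketch (assuming plain PostgreSQL window functions over the same daily aggregation) that flags days where adjusted_false_positive_rate exceeds its trailing 7-day average by more than 50%; the 1.5 multiplier is an arbitrary starting point, not a recommendation:

WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        AVG(adjusted_false_positive_rate) AS fpr
    FROM {{bucket_1_positive_class_error_profile_metrics}}
    GROUP BY day
)
SELECT
    day,
    fpr,
    AVG(fpr) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS trailing_7d_fpr,
    fpr > 1.5 * AVG(fpr) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS fpr_spike
FROM daily
ORDER BY day;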

Plot 2 — Overprediction vs Underprediction

Uses:

  • overprediction_rate
  • underprediction_rate

SELECT
    time_bucket(INTERVAL '1 day', ts) AS day,
    AVG(overprediction_rate) AS overprediction_rate,
    AVG(underprediction_rate) AS underprediction_rate
FROM {{bucket_1_positive_class_error_profile_metrics}}
GROUP BY day
ORDER BY day;

What this shows
This plot compares how often the model over-predicts positives (FPs) vs under-predicts positives (FNs) over time.

How to interpret it

  • If overprediction_rate >> underprediction_rate, the model is aggressively calling positives, likely impacting cost/capacity.
  • If underprediction_rate >> overprediction_rate, the model is missing many true positives, impacting risk detection.
  • Ideally, the ratio between the two aligns with business preferences: in some risk domains, you prefer more FPs; in others, you strongly penalize FNs.
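
As a starting point for the ratio mentioned in the last bullet, a sketch using the same metrics binding as the plot queries above (NULLIF guards days with no missed positives):

SELECT
    time_bucket(INTERVAL '1 day', ts) AS day,
    AVG(overprediction_rate)  AS overprediction_rate,
    AVG(underprediction_rate) AS underprediction_rate,
    AVG(overprediction_rate)
        / NULLIF(AVG(underprediction_rate), 0) AS over_to_under_ratio
FROM {{bucket_1_positive_class_error_profile_metrics}}
GROUP BY day
ORDER BY day;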

Plot 3 — False Positive Ratio vs Valid Detection Rate

Uses:

  • false_positive_ratio
  • valid_detection_rate

SELECT
    time_bucket(INTERVAL '1 day', ts) AS day,
    AVG(false_positive_ratio)  AS false_positive_ratio,
    AVG(valid_detection_rate)  AS valid_detection_rate
FROM {{bucket_1_positive_class_error_profile_metrics}}
GROUP BY day
ORDER BY day;

What this shows
This plot contrasts how dirty the positive bucket is (false_positive_ratio) with overall correctness (valid_detection_rate).

How to interpret it

  • Days where false_positive_ratio is high but valid_detection_rate stays flat may mean errors are concentrated in the predicted-positive segment, which is usually small relative to the full population, so overall accuracy barely moves.
  • If both degrade together, the model is likely struggling broadly (not just in the positive segment).
  • You can use this to explain to stakeholders why precision dropped even while headline accuracy looked stable: the model is making more aggressive positive predictions, and the resulting errors are absorbed into a comparatively small positive bucket.
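
To separate these two patterns in practice, a sketch (same assumptions as the other plot queries) that puts day-over-day deltas of both metrics side by side; a rising fp_ratio_delta with a near-zero accuracy_delta matches the first bullet, while both moving in the wrong direction matches the second:

WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        AVG(false_positive_ratio) AS fp_ratio,
        AVG(valid_detection_rate) AS accuracy
    FROM {{bucket_1_positive_class_error_profile_metrics}}
    GROUP BY day
)
SELECT
    day,
    fp_ratio,
    accuracy,
    fp_ratio - LAG(fp_ratio) OVER (ORDER BY day) AS fp_ratio_delta,
    accuracy - LAG(accuracy) OVER (ORDER BY day) AS accuracy_delta
FROM daily
ORDER BY day;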

Binary vs Multiclass

  • Binary: use label ∈ {0,1} and score as the probability p(y=1 | x).

  • Multiclass: choose a {{positive_class_value}} and convert to one-vs-rest:

    CASE WHEN {{label_col}} = '{{positive_class_value}}' THEN 1 ELSE 0 END AS label

    Use the probability for that class as score. Repeat the metric for each class of interest.
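
For example, only the scored CTE of the base query needs to change for one-vs-rest. A sketch, where {{positive_class_score_col}} is a hypothetical placeholder for the predicted probability of {{positive_class_value}}:

WITH scored AS (
    SELECT
        {{timestamp_col}} AS event_ts,
        CASE WHEN {{label_col}} = '{{positive_class_value}}' THEN 1 ELSE 0 END AS label,
        {{positive_class_score_col}} AS score  -- hypothetical per-class probability column
    FROM {{dataset}}
)
SELECT * FROM scored LIMIT 10;  -- replace this line with the binned CTE and aggregation from the base query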

Use Cases

  • Risk scoring (fraud, credit, abuse detection)
  • Triage models where analysts work the top-scoring cases
  • Any binary decisioning system with high cost asymmetry between FP and FN

Alternative SQL example

This variant computes the same derived rates per 5-minute bucket without score binning. The {{ground_truth}}, {{prediction}}, and {{threshold}} placeholders play the same roles as {{label_col}}, {{score_col}}, and the fixed 0.5 threshold in the base query.

SELECT
  s.bucket AS bucket,

  -- FPR: FP / (FP + TN)
  COALESCE(
    s.fp::float / NULLIF((s.fp + s.tn)::float, 0),
    0
  ) AS adjusted_false_positive_rate,

  -- Bad case rate: (FN + TN) / total  (all cases classified as 0)
  COALESCE(
    (s.fn + s.tn)::float / NULLIF(s.total, 0),
    0
  ) AS bad_case_rate,

  -- False positive ratio: FP / (TP + FP), share of predicted positives
  -- that are actually negative
  COALESCE(
    s.fp::float / NULLIF(s.predicted_pos, 0),
    0
  ) AS false_positive_ratio,

  -- Valid detection rate: (TP + TN) / total
  COALESCE(
    (s.tp + s.tn)::float / NULLIF(s.total, 0),
    0
  ) AS valid_detection_rate,

  -- Overprediction rate: FP / (FP + TN), i.e. the false positive rate
  COALESCE(
    s.fp::float / NULLIF((s.fp + s.tn)::float, 0),
    0
  ) AS overprediction_rate,

  -- Underprediction rate: FN / (TP + FN), i.e. the false negative rate
  COALESCE(
    s.fn::float / NULLIF(s.actual_pos, 0),
    0
  ) AS underprediction_rate,

  -- Total false positive rate: FP / total
  COALESCE(
    s.fp::float / NULLIF(s.total, 0),
    0
  ) AS total_false_positive_rate

FROM
  (
    SELECT
      c.bucket,
      c.tp,
      c.fp,
      c.tn,
      c.fn,
      (c.tp + c.tn + c.fp + c.fn)::float AS total,
      (c.tp + c.fp)::float AS predicted_pos,
      (c.tp + c.fn)::float AS actual_pos
    FROM
      (
        SELECT
          time_bucket(INTERVAL '5 minutes', {{timestamp_col}}) AS bucket,
          SUM(
            CASE
              WHEN {{ground_truth}} = 1
               AND {{prediction}} >= {{threshold}} THEN 1
              ELSE 0
            END
          ) AS tp,
          SUM(
            CASE
              WHEN {{ground_truth}} = 0
               AND {{prediction}} >= {{threshold}} THEN 1
              ELSE 0
            END
          ) AS fp,
          SUM(
            CASE
              WHEN {{ground_truth}} = 0
               AND {{prediction}} < {{threshold}} THEN 1
              ELSE 0
            END
          ) AS tn,
          SUM(
            CASE
              WHEN {{ground_truth}} = 1
               AND {{prediction}} < {{threshold}} THEN 1
              ELSE 0
            END
          ) AS fn
        FROM
          {{dataset}}
        GROUP BY
          bucket
      ) AS c
  ) AS s
ORDER BY
  s.bucket;