Positive-Class Error Profile

Quantifies where and how your model makes mistakes on the positive class, highlighting false positives, false negatives, and error concentration across score ranges.


Overview

The Positive-Class Error Profile bucket describes how your classifier makes mistakes on the positive class across the score distribution. It focuses on:

  • Where false positives and false negatives concentrate
  • How error behavior changes as scores increase
  • How much “bad” volume you get when you target a given positive-class segment

This bucket is most natural for binary classification but can be applied to multiclass by defining a one-vs-rest positive class (e.g., “fraud” vs “not fraud”).

Metrics

All metrics are defined in terms of the confusion-matrix counts within a segment (e.g., score bin, time bucket):

  • TP – true positives
  • FP – false positives
  • FN – false negatives
  • TN – true negatives

In each segment, Total = TP + FP + FN + TN.

adjusted_false_positive_rate
False positive rate among actual negatives (guard the denominator, e.g., with NULLIF, to avoid divide-by-zero):

adjusted_false_positive_rate = FP / (FP + TN)

bad_case_rate
Fraction of all cases that are classified as bad (prediction = 0):

bad_case_rate = (FN + TN) / (TP + FP + FN + TN)

false_positive_ratio
Share of predicted positive cases that are actually negative (how “dirty” your positive bucket is):

false_positive_ratio = FP / (TP + FP)

total_false_positive_rate
Fraction of all cases that are false positives:

total_false_positive_rate = FP / (TP + FP + FN + TN)

overprediction_rate
Rate at which the model over-predicts positives relative to the actual negative population; this is the FPR and is numerically identical to adjusted_false_positive_rate:

overprediction_rate = FP / (FP + TN)

underprediction_rate
Rate at which the model under-predicts positives (missed positives) relative to actual positives (FNR):

underprediction_rate = FN / (TP + FN)

valid_detection_rate
Overall fraction of correctly classified cases (global accuracy):

valid_detection_rate = (TP + TN) / (TP + FP + FN + TN)
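
As a quick sanity check, the query below evaluates each formula on one hypothetical segment (TP = 40, FP = 10, FN = 5, TN = 45; Total = 100). The counts are made up purely to illustrate the arithmetic.

-- Hypothetical confusion-matrix counts used only to illustrate the formulas above.
WITH cm AS (
    SELECT 40.0 AS tp, 10.0 AS fp, 5.0 AS fn, 45.0 AS tn
)
SELECT
    fp / (fp + tn)                  AS adjusted_false_positive_rate,  -- 10 / 55  ≈ 0.18
    (fn + tn) / (tp + fp + fn + tn) AS bad_case_rate,                 -- 50 / 100 = 0.50
    fp / (tp + fp)                  AS false_positive_ratio,          -- 10 / 50  = 0.20
    fp / (tp + fp + fn + tn)        AS total_false_positive_rate,     -- 10 / 100 = 0.10
    fp / (fp + tn)                  AS overprediction_rate,           -- 10 / 55  ≈ 0.18
    fn / (tp + fn)                  AS underprediction_rate,          -- 5 / 45   ≈ 0.11
    (tp + tn) / (tp + fp + fn + tn) AS valid_detection_rate           -- 85 / 100 = 0.85
FROM cm;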

Data Requirements

Your dataset must include the following columns (an example schema appears after this list):

  • {{label_col}} – ground truth label (0/1 for binary; specific class for multiclass one-vs-rest)
  • {{score_col}} – predicted probability or score for the positive class
  • {{timestamp_col}} – event or prediction timestamp
  • Optional: {{weight_col}} – sample weight (if used)
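
As a minimal sketch (assuming a PostgreSQL/TimescaleDB-style warehouse), a conforming table could look like the following; the table and column names are hypothetical stand-ins for whatever your {{...}} template variables resolve to.

-- Hypothetical example schema; adjust names and types to your warehouse.
CREATE TABLE scored_predictions (
    event_ts TIMESTAMPTZ      NOT NULL,  -- maps to {{timestamp_col}}
    label    SMALLINT         NOT NULL,  -- maps to {{label_col}} (0/1 for binary)
    score    DOUBLE PRECISION NOT NULL,  -- maps to {{score_col}}, expected in [0, 1]
    weight   DOUBLE PRECISION            -- optional, maps to {{weight_col}}
);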

Base Metric SQL (Per-Score-Bin Confusion Matrix)

This SQL computes confusion-matrix counts and derived rates per score bin and 5-minute time bucket using a default threshold of 0.5. You can change the threshold if your application uses a different operating point.

WITH scored AS (
    SELECT
        {{timestamp_col}} AS event_ts,
        {{label_col}}    AS label,
        {{score_col}}    AS score
    FROM {{dataset}}
),
binned AS (
    SELECT
        time_bucket(INTERVAL '5 minutes', event_ts) AS ts,
        width_bucket(score, 0.0, 1.0, 10) AS score_bin,
        label,
        score,
        CASE WHEN score >= 0.5 THEN 1 ELSE 0 END AS pred_label
    FROM scored
)
SELECT
    ts,
    score_bin,
    COUNT(*) AS total,
    SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END) AS positives,
    SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END) AS negatives,

    -- Confusion matrix
    SUM(CASE WHEN pred_label = 1 AND label = 1 THEN 1 ELSE 0 END) AS tp,
    SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END) AS fp,
    SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END) AS fn,
    SUM(CASE WHEN pred_label = 0 AND label = 0 THEN 1 ELSE 0 END) AS tn,

    -- Derived rates
    (SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END), 0)      AS adjusted_false_positive_rate,
    (SUM(CASE WHEN pred_label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(COUNT(*), 0)                                       AS bad_case_rate,
    (SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(SUM(CASE WHEN pred_label = 1 THEN 1 ELSE 0 END), 0) AS false_positive_ratio,
    (SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(COUNT(*), 0)                                       AS total_false_positive_rate,
    (SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END))::double precision
        / NULLIF(SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END), 0)      AS overprediction_rate,
    (SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END))::double precision
        / NULLIF(SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END), 0)      AS underprediction_rate,
    (SUM(CASE WHEN pred_label = label THEN 1 ELSE 0 END))::double precision
        / NULLIF(COUNT(*), 0)                                       AS valid_detection_rate
FROM binned
GROUP BY ts, score_bin
ORDER BY ts, score_bin;

You can register any or all of these derived columns as reported metrics.
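
If your application uses a different operating point, one option is to expose the cutoff as a template parameter rather than hard-coding 0.5. As a sketch, the binned CTE of the query above can be replaced with the following, reusing the same {{threshold}} placeholder as the alternative query at the end of this page:

-- Drop-in replacement for the binned CTE, with the cutoff parameterized.
binned AS (
    SELECT
        time_bucket(INTERVAL '5 minutes', event_ts) AS ts,
        width_bucket(score, 0.0, 1.0, 10) AS score_bin,
        label,
        score,
        CASE WHEN score >= {{threshold}} THEN 1 ELSE 0 END AS pred_label
    FROM scored
)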

Plots (Daily Aggregated)

Preview Data

For startDate, use 2025-11-26T17:54:05.425Z; for endDate, use 2025-12-10T17:54:05.425Z.

Plot 1 — FP & Bad Case Rates Over Time

Uses:

  • adjusted_false_positive_rate
  • false_positive_ratio
  • total_false_positive_rate
  • bad_case_rate
SELECT 
    time_bucket_gapfill(
        '1 day',
        timestamp,
        '{{dateStart}}'::timestamptz,
        '{{dateEnd}}'::timestamptz
    ) AS time_bucket_1d,
    
    metric_name,
    
    CASE 
        WHEN metric_name = 'adjusted_false_positive_rate' THEN 'Adjusted False Positive Rate'
        WHEN metric_name = 'false_positive_ratio'         THEN 'False Positive Ratio'
        WHEN metric_name = 'total_false_positive_rate'    THEN 'Total False Positive Rate'
        WHEN metric_name = 'bad_case_rate'                THEN 'Bad Case Rate'
        ELSE metric_name
    END AS friendly_name,
    
    COALESCE(AVG(value), 0) AS metric_value

FROM metrics_numeric_latest_version
WHERE metric_name IN (
    'adjusted_false_positive_rate',
    'false_positive_ratio',
    'total_false_positive_rate',
    'bad_case_rate'
)
[[AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}']]

GROUP BY time_bucket_1d, metric_name
ORDER BY time_bucket_1d, metric_name;

What this shows
This plot trends multiple notions of “false positives” and “bad outcomes” over time. It lets you see whether the model is:

  • Flagging too many actual negatives as positive (adjusted_false_positive_rate / total_false_positive_rate)
  • Letting negatives contaminate the predicted-positive bucket, i.e., eroding precision (false_positive_ratio)
  • Over-classifying cases as bad overall (bad_case_rate)

How to interpret it

  • Spikes in any FP-related line often correspond to data issues, model regressions, or policy changes (see the spike-detection sketch after this list).
  • A rising bad_case_rate without business explanation may mean the model is over-declining / over-rejecting.
  • If FP rates increase while business KPIs worsen, this is a strong signal that thresholds or retraining should be reviewed.
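
To make spike detection concrete, here is a sketch that compares each day's adjusted_false_positive_rate to its trailing 7-day average, reading from the same metrics_numeric_latest_version table the plots use. The 1.5x multiplier is an arbitrary example threshold, not a recommendation.

-- Sketch: flag days where adjusted_false_positive_rate jumps well above its trailing average.
WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', timestamp) AS day,
        AVG(value) AS fpr
    FROM metrics_numeric_latest_version
    WHERE metric_name = 'adjusted_false_positive_rate'
    GROUP BY 1
)
SELECT
    day,
    fpr,
    AVG(fpr) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS trailing_7d_avg,
    fpr > 1.5 * AVG(fpr) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS is_spike  -- 1.5x is illustrative
FROM daily
ORDER BY day;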

Plot 2 — Overprediction vs Underprediction

Uses:

  • overprediction_rate
  • underprediction_rate
SELECT 
    time_bucket_gapfill(
        '1 day',
        timestamp,
        '{{dateStart}}'::timestamptz,
        '{{dateEnd}}'::timestamptz
    ) AS time_bucket_1d,
    
    metric_name,
    
    CASE 
        WHEN metric_name = 'overprediction_rate' THEN 'Overprediction Rate'
        WHEN metric_name = 'underprediction_rate' THEN 'Underprediction Rate'
        ELSE metric_name
    END AS friendly_name,
    
    COALESCE(AVG(value), 0) AS metric_value

FROM metrics_numeric_latest_version
WHERE metric_name IN (
    'overprediction_rate',
    'underprediction_rate'
)
[[AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}']]

GROUP BY time_bucket_1d, metric_name
ORDER BY time_bucket_1d, metric_name;

What this shows
This plot compares how often the model over-predicts positives (FPs) vs under-predicts positives (FNs) over time.

How to interpret it

  • If overprediction_rate >> underprediction_rate, the model is aggressively calling positives, likely impacting cost/capacity.
  • If underprediction_rate >> overprediction_rate, the model is missing many true positives, impacting risk detection.
  • Ideally, the balance between the two reflects business preferences: in some risk domains you tolerate extra FPs to avoid missing positives, while in others FPs are costly and a higher FN rate is acceptable. The sketch below tracks that balance as an explicit ratio.
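
If you want that balance as an explicit series, a sketch like the following pivots the two metrics per day (again reading from metrics_numeric_latest_version and averaging whatever granularity was registered):

-- Sketch: daily overprediction-to-underprediction ratio.
SELECT
    time_bucket(INTERVAL '1 day', timestamp) AS day,
    AVG(value) FILTER (WHERE metric_name = 'overprediction_rate')  AS overprediction_rate,
    AVG(value) FILTER (WHERE metric_name = 'underprediction_rate') AS underprediction_rate,
    AVG(value) FILTER (WHERE metric_name = 'overprediction_rate')
        / NULLIF(AVG(value) FILTER (WHERE metric_name = 'underprediction_rate'), 0) AS over_to_under_ratio
FROM metrics_numeric_latest_version
WHERE metric_name IN ('overprediction_rate', 'underprediction_rate')
GROUP BY 1
ORDER BY 1;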

Plot 3 — False Positive Ratio vs Valid Detection Rate

Uses:

  • false_positive_ratio
  • valid_detection_rate
SELECT 
    time_bucket_gapfill(
        '1 day',
        timestamp,
        '{{dateStart}}'::timestamptz,
        '{{dateEnd}}'::timestamptz
    ) AS time_bucket_1d,
    
    metric_name,
    
    CASE 
        WHEN metric_name = 'false_positive_ratio' THEN 'False Positive Ratio'
        WHEN metric_name = 'valid_detection_rate' THEN 'Valid Detection Rate'
        ELSE metric_name
    END AS friendly_name,
    
    COALESCE(AVG(value), 0) AS metric_value

FROM metrics_numeric_latest_version
WHERE metric_name IN (
    'false_positive_ratio',
    'valid_detection_rate'
)
[[AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}']]

GROUP BY time_bucket_1d, metric_name
ORDER BY time_bucket_1d, metric_name;

What this shows
This plot contrasts how dirty the positive bucket is (false_positive_ratio) with overall correctness (valid_detection_rate).

How to interpret it

  • Days where false_positive_ratio rises but valid_detection_rate stays flat usually mean errors are concentrated in the (relatively small) predicted-positive bucket, so they barely move overall accuracy.
  • If both degrade together, the model is likely struggling broadly (not just in the positive segment).
  • You can use this to explain to stakeholders why precision dropped even though overall accuracy looks stable: the positive bucket has become dirtier. The sketch below derives precision directly from false_positive_ratio.
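
One identity worth remembering: since false_positive_ratio = FP / (TP + FP), precision is simply its complement. A sketch like the following reports a precision series without registering a new metric; it inherits the same unweighted daily averaging the plots use, so treat it as an approximation.

-- Sketch: precision derived from the already-registered false_positive_ratio.
SELECT
    time_bucket(INTERVAL '1 day', timestamp) AS day,
    1 - AVG(value) AS precision_estimate  -- precision = TP / (TP + FP) = 1 - false_positive_ratio
FROM metrics_numeric_latest_version
WHERE metric_name = 'false_positive_ratio'
GROUP BY 1
ORDER BY 1;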

Binary vs Multiclass

  • Binary: use label ∈ {0,1} and score as the probability p(y=1 | x).

  • Multiclass: choose a {{positive_class_value}} and convert to one-vs-rest:

    CASE WHEN {{label_col}} = '{{positive_class_value}}' THEN 1 ELSE 0 END AS label

    Use the probability for that class as score and repeat the metrics for each class of interest; see the sketch below.
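
As a sketch, the one-vs-rest conversion can be folded directly into the scored CTE of the base query; this assumes {{score_col}} already holds the predicted probability for {{positive_class_value}}:

-- Drop-in replacement for the scored CTE in the multiclass (one-vs-rest) case.
scored AS (
    SELECT
        {{timestamp_col}} AS event_ts,
        CASE WHEN {{label_col}} = '{{positive_class_value}}' THEN 1 ELSE 0 END AS label,
        {{score_col}} AS score  -- probability of {{positive_class_value}}
    FROM {{dataset}}
)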

Use Cases

  • Risk scoring (fraud, credit, abuse detection)
  • Triage models where analysts work the top-scoring cases
  • Any binary decisioning system with high cost asymmetry between FP and FN

Alternative SQL Example (Daily Confusion Matrix, Parameterized Threshold)

This variant computes the same metrics directly from the raw dataset per 1-day bucket, with the decision threshold exposed as a {{threshold}} parameter instead of being hard-coded at 0.5.

WITH counts AS (
  SELECT
    time_bucket(INTERVAL '1 day', {{timestamp_col}}) AS bucket,
    SUM(CASE WHEN {{label_col}} = 1 AND {{score_col}} >= {{threshold}} THEN 1 ELSE 0 END) AS tp,
    SUM(CASE WHEN {{label_col}} = 0 AND {{score_col}} >= {{threshold}} THEN 1 ELSE 0 END) AS fp,
    SUM(CASE WHEN {{label_col}} = 0 AND {{score_col}} <  {{threshold}} THEN 1 ELSE 0 END) AS tn,
    SUM(CASE WHEN {{label_col}} = 1 AND {{score_col}} <  {{threshold}} THEN 1 ELSE 0 END) AS fn
  FROM {{dataset}}
  GROUP BY 1
),
prepared AS (
  SELECT
    bucket,
    tp::float AS tp,
    fp::float AS fp,
    tn::float AS tn,
    fn::float AS fn,
    (tp + fp + tn + fn)::float AS total,
    (tp + fp)::float          AS predicted_pos,
    (tp + fn)::float          AS actual_pos,
    (fp + tn)::float          AS negatives
  FROM counts
)
SELECT
  bucket,

  -- Adjusted False Positive Rate: FP / (FP + TN)
  CASE WHEN negatives > 0 THEN fp / negatives ELSE 0 END
    AS adjusted_false_positive_rate,

  -- Bad Case Rate: predicted-negative cases / total
  CASE WHEN total > 0 THEN (fn + tn) / total ELSE 0 END
    AS bad_case_rate,

  -- False Positive Ratio: FP / (TP + FP)
  CASE WHEN predicted_pos > 0 THEN fp / predicted_pos ELSE 0 END
    AS false_positive_ratio,

  -- Valid Detection Rate: (TP + TN) / total
  CASE WHEN total > 0 THEN (tp + tn) / total ELSE 0 END
    AS valid_detection_rate,

  -- Overprediction Rate: FP / (FP + TN)
  CASE WHEN negatives > 0 THEN fp / negatives ELSE 0 END
    AS overprediction_rate,

  -- Underprediction Rate: FN / (TP + FN)
  CASE WHEN actual_pos > 0 THEN fn / actual_pos ELSE 0 END
    AS underprediction_rate,

  -- Total False Positive Rate: FP / total
  CASE WHEN total > 0 THEN fp / total ELSE 0 END
    AS total_false_positive_rate

FROM prepared
ORDER BY bucket;
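
Compared with the base query, this variant trades per-score-bin resolution for a single daily confusion matrix, which is usually enough for dashboard-level monitoring. Use the per-score-bin version when you need to see where in the score range errors concentrate, and this one when you only need daily trend lines at a configurable threshold.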