Detection & Acceptance Profile

Maps out how recall, precision, accuracy, and acceptance rate trade off as you move the decision threshold, so you can pick operating points aligned with business goals.

Overview

The Detection & Acceptance Profile bucket characterizes how your model’s detection power and acceptance behavior change as you move the decision threshold on the positive-class score.

It answers questions like:

  • “If I tighten my threshold to reduce volume, how much recall do I lose?”
  • “Where is the best operating point to balance business capacity and risk?”

This bucket supports:

  • Binary classification, directly on the positive-class score
  • Multiclass classification, via per-class one-vs-rest profiles

Metrics

Let TP, FP, FN, TN be computed at a given threshold, with Total = TP + FP + FN + TN.

capture_rate
Fraction of the population that the model “captures” as positive (acceptance volume):

capture_rate = (TP + FP) / Total

correct_detection_rate
Overall fraction of correct decisions (global accuracy):

correct_detection_rate = (TP + TN) / Total

true_detection_rate
Quality of the accepted positives, i.e., precision:

true_detection_rate = TP / (TP + FP)

true_positive_rate
Classic recall / TPR:

true_positive_rate = TP / (TP + FN)

correct_acceptance_rate
Fraction of all cases that are correctly accepted as positive:

correct_acceptance_rate = TP / Total

valid_detection_rate
Numerically identical to correct_detection_rate (global accuracy), but reported separately for the acceptance-focused plots:

valid_detection_rate = (TP + TN) / Total

You can compute all of these from a single confusion matrix per threshold and time bucket.
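
For example, with a hypothetical confusion matrix at one threshold of TP = 80, FP = 20, FN = 20, TN = 880 (Total = 1,000):

  • capture_rate = (80 + 20) / 1000 = 0.10
  • correct_detection_rate = (80 + 880) / 1000 = 0.96
  • true_detection_rate = 80 / (80 + 20) = 0.80
  • true_positive_rate = 80 / (80 + 20) = 0.80
  • correct_acceptance_rate = 80 / 1000 = 0.08
  • valid_detection_rate = (80 + 880) / 1000 = 0.96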

Data Requirements

  • {{label_col}} – ground truth binary label (or per-class label for multiclass)
  • {{score_col}} – predicted probability or score for the positive class
  • {{timestamp_col}} – event or prediction time

Base Metric SQL — Threshold Grid

WITH base AS (
    -- one row per prediction: event time, ground-truth label, positive-class score
    SELECT
        {{timestamp_col}} AS event_ts,
        {{label_col}}    AS label,
        {{score_col}}    AS score
    FROM {{dataset}}
),
grid AS (
    -- candidate decision thresholds from 0.00 to 1.00 in steps of 0.01
    SELECT
        generate_series(0.0, 1.0, 0.01) AS threshold
),
scored AS (
    -- evaluate every prediction against every candidate threshold
    SELECT
        time_bucket(INTERVAL '5 minutes', event_ts) AS ts,
        g.threshold,
        label,
        score,
        CASE WHEN score >= g.threshold THEN 1 ELSE 0 END AS pred_pos
    FROM base
    CROSS JOIN grid g
)
SELECT
    ts,
    threshold,
    COUNT(*)                                                   AS total,
    SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END)                AS actual_pos,
    SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END)                AS actual_neg,
    SUM(CASE WHEN pred_pos = 1 AND label = 1 THEN 1 ELSE 0 END) AS tp,
    SUM(CASE WHEN pred_pos = 1 AND label = 0 THEN 1 ELSE 0 END) AS fp,
    SUM(CASE WHEN pred_pos = 0 AND label = 1 THEN 1 ELSE 0 END) AS fn,
    SUM(CASE WHEN pred_pos = 0 AND label = 0 THEN 1 ELSE 0 END) AS tn
FROM scored
GROUP BY ts, threshold
ORDER BY ts, threshold;

You can register tp, fp, fn, tn, and total as reported metrics, and derive the named rates in queries or as additional reported metrics.
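
As a sketch of that derivation, the query below assumes the base query's output (including total) is registered as {{bucket_3_detection_acceptance_metrics}} and computes all six rates at the native 5-minute granularity:

SELECT
    ts,
    threshold,
    (tp + fp)::double precision / NULLIF(total, 0) AS capture_rate,
    (tp + tn)::double precision / NULLIF(total, 0) AS correct_detection_rate,
    tp::double precision / NULLIF(tp + fp, 0)      AS true_detection_rate,
    tp::double precision / NULLIF(tp + fn, 0)      AS true_positive_rate,
    tp::double precision / NULLIF(total, 0)        AS correct_acceptance_rate,
    (tp + tn)::double precision / NULLIF(total, 0) AS valid_detection_rate
FROM {{bucket_3_detection_acceptance_metrics}}
ORDER BY ts, threshold;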

Plots

Plot 4 — Recall Variants Over Time

Uses:

  • capture_rate
  • correct_detection_rate
  • true_detection_rate
  • true_positive_rate

WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        threshold,
        SUM(tp) AS tp,
        SUM(fp) AS fp,
        SUM(fn) AS fn,
        SUM(tn) AS tn
    FROM {{bucket_3_detection_acceptance_metrics}}
    GROUP BY day, threshold
)
SELECT
    day,
    threshold,
    (tp + fp)::double precision / NULLIF(tp + fp + fn + tn, 0) AS capture_rate,
    (tp + tn)::double precision / NULLIF(tp + fp + fn + tn, 0) AS correct_detection_rate,
    tp::double precision / NULLIF(tp + fp, 0)                  AS true_detection_rate,
    tp::double precision / NULLIF(tp + fn, 0)                  AS true_positive_rate
FROM daily
ORDER BY day, threshold;

What this shows
For each day and threshold, this plot shows how volume, accuracy, precision, and recall move together. It lets you see how different operating points behave over time.

How to interpret it

  • Use vertical slices (fixed day) to compare thresholds and choose an operating point.
  • Use horizontal slices (fixed threshold) to see whether recall or precision is drifting.
  • If capture_rate is stable but true_detection_rate drops, the model is accepting the same volume but with worse quality (precision regression); the sketch after this list shows one way to track this at a fixed threshold.
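
One way to watch for that precision regression is to pin the threshold at your chosen operating point and track capture_rate against true_detection_rate day over day. A minimal sketch, assuming an operating threshold of 0.5 (an illustrative value, not a recommendation):

WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        SUM(tp) AS tp,
        SUM(fp) AS fp,
        SUM(fn) AS fn,
        SUM(tn) AS tn
    FROM {{bucket_3_detection_acceptance_metrics}}
    WHERE threshold = 0.5  -- fixed operating point; use a value that exists in the grid
    GROUP BY day
)
SELECT
    day,
    (tp + fp)::double precision / NULLIF(tp + fp + fn + tn, 0) AS capture_rate,
    tp::double precision / NULLIF(tp + fp, 0)                  AS true_detection_rate
FROM daily
ORDER BY day;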

Plot 5 — Acceptance + Accuracy

Uses:

  • correct_acceptance_rate
  • valid_detection_rate

WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        threshold,
        SUM(tp) AS tp,
        SUM(fp) AS fp,
        SUM(fn) AS fn,
        SUM(tn) AS tn
    FROM {{bucket_3_detection_acceptance_metrics}}
    GROUP BY day, threshold
)
SELECT
    day,
    threshold,
    tp::double precision / NULLIF(tp + fp + fn + tn, 0)        AS correct_acceptance_rate,
    (tp + tn)::double precision / NULLIF(tp + fp + fn + tn, 0) AS valid_detection_rate
FROM daily
ORDER BY day, threshold;

What this shows
This plot contrasts the fraction of all cases that are correctly accepted as positive (correct_acceptance_rate) with how often the model is right overall (valid_detection_rate).

How to interpret it

  • Points with high valid_detection_rate but low correct_acceptance_rate mean the model is accurate but conservative—good at saying “no,” not at finding positives.
  • Points with high correct_acceptance_rate but modest valid_detection_rate indicate the model is catching many positives but also making more mistakes elsewhere.
  • This is a good “business-friendly” view when explaining model performance to non-ML stakeholders.

Plot 6 — Detection vs Acceptance Trade-Off

Uses:

  • true_positive_rate
  • correct_acceptance_rate

WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        threshold,
        SUM(tp) AS tp,
        SUM(fp) AS fp,
        SUM(fn) AS fn,
        SUM(tn) AS tn
    FROM {{bucket_3_detection_acceptance_metrics}}
    GROUP BY day, threshold
)
SELECT
    day,
    threshold,
    tp::double precision / NULLIF(tp + fn, 0)                  AS true_positive_rate,
    tp::double precision / NULLIF(tp + fp + fn + tn, 0)        AS correct_acceptance_rate
FROM daily
ORDER BY day, threshold;

What this shows
This plot is a trade-off curve between recall (true_positive_rate) and how much correctly-accepted positive volume you get (correct_acceptance_rate) as you move the threshold.

How to interpret it

  • Moving along the curve corresponds to adjusting the threshold.
  • Regions where small increases in acceptance yield big gains in recall are often attractive operating points; the sketch after this list estimates that marginal gain per threshold step.
  • If the curve is very flat, the model may lack discriminative power in the relevant region, and you may need feature or model improvements rather than threshold tweaks.
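
To locate those regions quantitatively, one option is a window function over the threshold grid that estimates how much recall is gained per unit of additional correct acceptance between adjacent thresholds. A sketch, reusing the daily aggregation from the plot above:

WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        threshold,
        SUM(tp) AS tp,
        SUM(fp) AS fp,
        SUM(fn) AS fn,
        SUM(tn) AS tn
    FROM {{bucket_3_detection_acceptance_metrics}}
    GROUP BY day, threshold
),
rates AS (
    SELECT
        day,
        threshold,
        tp::double precision / NULLIF(tp + fn, 0)           AS true_positive_rate,
        tp::double precision / NULLIF(tp + fp + fn + tn, 0) AS correct_acceptance_rate
    FROM daily
)
SELECT
    day,
    threshold,
    true_positive_rate,
    correct_acceptance_rate,
    -- recall gained per unit of additional correct acceptance when loosening
    -- the threshold by one grid step (lower threshold = more accepted volume)
    (true_positive_rate - LAG(true_positive_rate) OVER w)
        / NULLIF(correct_acceptance_rate - LAG(correct_acceptance_rate) OVER w, 0)
        AS marginal_recall_per_acceptance
FROM rates
WINDOW w AS (PARTITION BY day ORDER BY threshold DESC)
ORDER BY day, threshold;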

Binary vs Multiclass

  • Binary: use the natural positive class and its probability as score.
  • Multiclass: for each class c of interest:
    • Define label = 1 when the ground truth label is c, else 0.
    • Use the model’s predicted probability for class c as score.
    • Compute a Detection & Acceptance profile per class (see the one-vs-rest sketch below).
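
As a sketch, the one-vs-rest base CTE for a single class could look like the following; {{class_value}} and {{class_score_col}} are illustrative placeholders for the class of interest and its predicted-probability column, not variables defined elsewhere in this document. The rest of the pipeline (threshold grid, scoring, aggregation) is unchanged from the binary case:

WITH base AS (
    SELECT
        {{timestamp_col}} AS event_ts,
        -- one-vs-rest label: 1 when the ground truth is the class of interest, else 0
        CASE WHEN {{label_col}} = {{class_value}} THEN 1 ELSE 0 END AS label,
        -- predicted probability for the class of interest
        {{class_score_col}} AS score
    FROM {{dataset}}
)
SELECT * FROM base;  -- substitute this base CTE into the threshold-grid query above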

Alternative SQL Example

SELECT
  s.bucket AS bucket,

  -- Acceptance volume: (TP + FP) / Total
  COALESCE(
    (s.tp + s.fp) / NULLIF(s.total, 0),
    0
  ) AS capture_rate,

  -- Overall accuracy: (TP + TN) / Total
  COALESCE(
    (s.tp + s.tn) / NULLIF(s.total, 0),
    0
  ) AS correct_detection_rate,

  -- Precision: TP / (TP + FP)
  COALESCE(
    s.tp / NULLIF(s.tp + s.fp, 0),
    0
  ) AS true_detection_rate,

  -- Recall / TPR: TP / (TP + FN)
  COALESCE(
    s.tp / NULLIF(s.tp + s.fn, 0),
    0
  ) AS true_positive_rate,

  -- Correct acceptance rate: TP / (TP + TN + FP + FN)
  COALESCE(
    s.tp / NULLIF(s.total, 0),
    0
  ) AS correct_acceptance_rate,

  -- Overall correctness: (TP + TN) / (TP + TN + FP + FN)
  COALESCE(
    (s.tp + s.tn) / NULLIF(s.total, 0),
    0
  ) AS valid_detection_rate

FROM
  (
    SELECT
      c.bucket AS bucket,
      c.tp::float AS tp,
      c.fp::float AS fp,
      c.tn::float AS tn,
      c.fn::float AS fn,
      (c.tp + c.tn + c.fp + c.fn)::float AS total
    FROM
      (
        SELECT
          time_bucket (INTERVAL '5 minutes', {{timestamp_col}}) AS bucket,
          SUM(
            CASE
              WHEN {{ground_truth}} = 1
               AND {{prediction}} >= {{threshold}} THEN 1
              ELSE 0
            END
          ) AS tp,
          SUM(
            CASE
              WHEN {{ground_truth}} = 0
               AND {{prediction}} >= {{threshold}} THEN 1
              ELSE 0
            END
          ) AS fp,
          SUM(
            CASE
              WHEN {{ground_truth}} = 0
               AND {{prediction}} < {{threshold}} THEN 1
              ELSE 0
            END
          ) AS tn,
          SUM(
            CASE
              WHEN {{ground_truth}} = 1
               AND {{prediction}} < {{threshold}} THEN 1
              ELSE 0
            END
          ) AS fn
        FROM
          {{dataset}}
        GROUP BY
          bucket
      ) AS c
  ) AS s
ORDER BY
  s.bucket;