Detection & Acceptance Profile
Maps out how recall, precision, accuracy, and acceptance rate trade off as you move the decision threshold, so you can pick operating points aligned with business goals.
Overview
The Detection & Acceptance Profile bucket characterizes how your model’s detection power and acceptance behavior change as you move the decision threshold on the positive-class score.
It answers questions like:
- “If I tighten my threshold to reduce volume, how much recall do I lose?”
- “Where is the best operating point to balance business capacity and risk?”
This bucket supports:
- Binary classification, directly on the positive-class score
- Multiclass classification, via per-class one-vs-rest profiles
Metrics
Let TP, FP, FN, TN be computed at a given threshold, with Total = TP + FP + FN + TN.
capture_rate
Fraction of the population that the model “captures” as positive (acceptance volume):
capture_rate = (TP + FP) / Total
correct_detection_rate
Overall fraction of correct decisions (global accuracy):
correct_detection_rate = (TP + TN) / Total
true_detection_rate
Quality of the accepted positives, i.e., precision:
true_detection_rate = TP / (TP + FP)
true_positive_rate
Classic recall / TPR:
true_positive_rate = TP / (TP + FN)
correct_acceptance_rate
Fraction of all cases that are correctly accepted as positive:
correct_acceptance_rate = TP / Total
valid_detection_rate
Same quantity as accuracy but used explicitly in plots with “acceptance”:
valid_detection_rate = (TP + TN) / Total
You can compute all of these from a single confusion matrix per threshold and bucket.
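For a quick sanity check, here is a worked example with a hypothetical confusion matrix (TP = 80, FP = 20, FN = 20, TN = 880, so Total = 1000); the numbers are illustrative only:
-- Worked example with a hypothetical confusion matrix (illustrative numbers only)
WITH cm AS (
  SELECT 80.0 AS tp, 20.0 AS fp, 20.0 AS fn, 880.0 AS tn
)
SELECT
  (tp + fp) / (tp + fp + fn + tn) AS capture_rate,            -- 0.10
  (tp + tn) / (tp + fp + fn + tn) AS correct_detection_rate,  -- 0.96
  tp / (tp + fp) AS true_detection_rate,                      -- 0.80
  tp / (tp + fn) AS true_positive_rate,                       -- 0.80
  tp / (tp + fp + fn + tn) AS correct_acceptance_rate,        -- 0.08
  (tp + tn) / (tp + fp + fn + tn) AS valid_detection_rate     -- 0.96
FROM cm;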
Data Requirements
- {{label_col}} – ground truth binary label (or per-class label for multiclass)
- {{score_col}} – predicted probability or score for the positive class
- {{timestamp_col}} – event or prediction time
Base Metric SQL — Threshold Grid
WITH base AS (
SELECT
{{timestamp_col}} AS event_ts,
{{label_col}} AS label,
{{score_col}} AS score
FROM {{dataset}}
),
grid AS (
SELECT
generate_series(0.0, 1.0, 0.01) AS threshold
),
scored AS (
SELECT
time_bucket(INTERVAL '5 minutes', event_ts) AS ts,
g.threshold,
label,
score,
CASE WHEN score >= g.threshold THEN 1 ELSE 0 END AS pred_pos
FROM base
CROSS JOIN grid g
)
SELECT
ts,
threshold,
COUNT(*) AS total,
SUM(CASE WHEN label = 1 THEN 1 ELSE 0 END) AS actual_pos,
SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END) AS actual_neg,
SUM(CASE WHEN pred_pos = 1 AND label = 1 THEN 1 ELSE 0 END) AS tp,
SUM(CASE WHEN pred_pos = 1 AND label = 0 THEN 1 ELSE 0 END) AS fp,
SUM(CASE WHEN pred_pos = 0 AND label = 1 THEN 1 ELSE 0 END) AS fn,
SUM(CASE WHEN pred_pos = 0 AND label = 0 THEN 1 ELSE 0 END) AS tn
FROM scored
GROUP BY ts, threshold
ORDER BY ts, threshold;
You can register tp, fp, fn, tn, and total as reported metrics, and derive the named rates in queries or as additional reported metrics.
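As a sketch of that derivation, assuming the counts above are registered under {{bucket_3_detection_acceptance_metrics}} (the same placeholder used by the plot queries below), the named rates can be computed per bucket and threshold like this:
-- Derive the named rates from the registered counts, per bucket and threshold
SELECT
  ts,
  threshold,
  (tp + fp)::double precision / NULLIF(total, 0) AS capture_rate,
  (tp + tn)::double precision / NULLIF(total, 0) AS correct_detection_rate,
  tp::double precision / NULLIF(tp + fp, 0) AS true_detection_rate,
  tp::double precision / NULLIF(tp + fn, 0) AS true_positive_rate,
  tp::double precision / NULLIF(total, 0) AS correct_acceptance_rate,
  (tp + tn)::double precision / NULLIF(total, 0) AS valid_detection_rate
FROM {{bucket_3_detection_acceptance_metrics}}
ORDER BY ts, threshold;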
Plots
Plot 4 — Recall Variants Over Time
Uses:
- capture_rate
- correct_detection_rate
- true_detection_rate
- true_positive_rate
WITH daily AS (
SELECT
time_bucket(INTERVAL '1 day', ts) AS day,
threshold,
SUM(tp) AS tp,
SUM(fp) AS fp,
SUM(fn) AS fn,
SUM(tn) AS tn
FROM {{bucket_3_detection_acceptance_metrics}}
GROUP BY day, threshold
)
SELECT
day,
threshold,
(tp + fp)::double precision / NULLIF(tp + fp + fn + tn, 0) AS capture_rate,
(tp + tn)::double precision / NULLIF(tp + fp + fn + tn, 0) AS correct_detection_rate,
tp::double precision / NULLIF(tp + fp, 0) AS true_detection_rate,
tp::double precision / NULLIF(tp + fn, 0) AS true_positive_rate
FROM daily
ORDER BY day, threshold;
What this shows
For each day and threshold, this plot shows how volume, accuracy, precision, and recall move together. It lets you see how different operating points behave over time.
How to interpret it
- Use vertical slices (fixed day) to compare thresholds and choose an operating point.
- Use horizontal slices (fixed threshold) to see whether recall or precision is drifting.
- If capture_rate is stable but true_detection_rate drops, the model is accepting the same volume but with worse quality (precision regression).
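To monitor that kind of precision regression directly, the following sketch tracks true_detection_rate week over week at a single operating threshold (0.5 here is purely illustrative) and reports the change against the prior week:
-- Week-over-week precision at one fixed threshold (0.5 is illustrative only)
WITH weekly AS (
  SELECT
    time_bucket(INTERVAL '7 days', ts) AS week,
    SUM(tp) AS tp,
    SUM(fp) AS fp
  FROM {{bucket_3_detection_acceptance_metrics}}
  WHERE threshold = 0.5  -- assumes grid thresholds are stored exactly; otherwise filter on a small range
  GROUP BY week
)
SELECT
  week,
  tp::double precision / NULLIF(tp + fp, 0) AS true_detection_rate,
  tp::double precision / NULLIF(tp + fp, 0)
    - LAG(tp::double precision / NULLIF(tp + fp, 0)) OVER (ORDER BY week) AS wow_change
FROM weekly
ORDER BY week;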
Plot 5 — Acceptance + Accuracy
Uses:
- correct_acceptance_rate
- valid_detection_rate
WITH daily AS (
SELECT
time_bucket(INTERVAL '1 day', ts) AS day,
threshold,
SUM(tp) AS tp,
SUM(fp) AS fp,
SUM(fn) AS fn,
SUM(tn) AS tn
FROM {{bucket_3_detection_acceptance_metrics}}
GROUP BY day, threshold
)
SELECT
day,
threshold,
tp::double precision / NULLIF(tp + fp + fn + tn, 0) AS correct_acceptance_rate,
(tp + tn)::double precision / NULLIF(tp + fp + fn + tn, 0) AS valid_detection_rate
FROM daily
ORDER BY day, threshold;
What this shows
This plot contrasts how many cases are correctly picked up as positive (correct_acceptance_rate) with how often the model is right overall (valid_detection_rate).
How to interpret it
- Points with high valid_detection_rate but low correct_acceptance_rate mean the model is accurate but conservative—good at saying “no,” not at finding positives.
- Points with high correct_acceptance_rate but modest valid_detection_rate indicate the model is catching many positives but also making more mistakes elsewhere.
- This is a good “business-friendly” view when explaining model performance to non-ML stakeholders.
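As an illustration, a sketch like the one below flags the "accurate but conservative" operating points described above; the 0.90 and 0.05 cutoffs are hypothetical and should be tuned to your base rate:
-- Flag "accurate but conservative" operating points; 0.90 / 0.05 are hypothetical cutoffs
SELECT
  day,
  threshold,
  correct_acceptance_rate,
  valid_detection_rate
FROM (
  SELECT
    time_bucket(INTERVAL '1 day', ts) AS day,
    threshold,
    SUM(tp)::double precision / NULLIF(SUM(tp + fp + fn + tn), 0) AS correct_acceptance_rate,
    (SUM(tp) + SUM(tn))::double precision / NULLIF(SUM(tp + fp + fn + tn), 0) AS valid_detection_rate
  FROM {{bucket_3_detection_acceptance_metrics}}
  GROUP BY day, threshold
) d
WHERE valid_detection_rate > 0.90
  AND correct_acceptance_rate < 0.05
ORDER BY day, threshold;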
Plot 6 — Detection vs Acceptance Trade-Off
Uses:
- true_positive_rate
- correct_acceptance_rate
WITH daily AS (
SELECT
time_bucket(INTERVAL '1 day', ts) AS day,
threshold,
SUM(tp) AS tp,
SUM(fp) AS fp,
SUM(fn) AS fn,
SUM(tn) AS tn
FROM {{bucket_3_detection_acceptance_metrics}}
GROUP BY day, threshold
)
SELECT
day,
threshold,
tp::double precision / NULLIF(tp + fn, 0) AS true_positive_rate,
tp::double precision / NULLIF(tp + fp + fn + tn, 0) AS correct_acceptance_rate
FROM daily
ORDER BY day, threshold;
What this shows
This plot is a trade-off curve between recall (true_positive_rate) and how much correctly-accepted positive volume you get (correct_acceptance_rate) as you move the threshold.
How to interpret it
- Moving along the curve corresponds to adjusting the threshold.
- Regions where small increases in acceptance yield big gains in recall are often attractive operating points.
- If the curve is very flat, the model may lack discriminative power in the relevant region, and you may need feature or model improvements rather than threshold tweaks.
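One way to turn the curve into a concrete choice is to pick, per day, the lowest threshold whose precision clears a floor; the sketch below uses a hypothetical 0.80 floor on true_detection_rate:
-- Per day, the lowest threshold whose precision clears a hypothetical 0.80 floor
WITH daily AS (
  SELECT
    time_bucket(INTERVAL '1 day', ts) AS day,
    threshold,
    SUM(tp) AS tp,
    SUM(fp) AS fp,
    SUM(fn) AS fn,
    SUM(tn) AS tn
  FROM {{bucket_3_detection_acceptance_metrics}}
  GROUP BY day, threshold
)
SELECT DISTINCT ON (day)
  day,
  threshold,
  tp::double precision / NULLIF(tp + fp, 0) AS true_detection_rate,
  tp::double precision / NULLIF(tp + fn, 0) AS true_positive_rate,
  (tp + fp)::double precision / NULLIF(tp + fp + fn + tn, 0) AS capture_rate
FROM daily
WHERE tp::double precision / NULLIF(tp + fp, 0) >= 0.80
ORDER BY day, threshold;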
Binary vs Multiclass
- Binary: use the natural positive class and its probability as score.
- Multiclass: for each class c of interest (see the sketch after this list):
  - Define label = 1 when the ground truth label is c, else 0.
  - Use the model's predicted probability for class c as score.
  - Compute a Detection & Acceptance profile per class.
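A minimal sketch of that one-vs-rest derivation for a single class, assuming a hypothetical class value 'fraud' and a hypothetical per-class probability column {{score_col_fraud}}:
-- One-vs-rest view for a single class; 'fraud' and {{score_col_fraud}} are hypothetical
-- stand-ins for the class of interest and its predicted-probability column
SELECT
  {{timestamp_col}} AS event_ts,
  CASE WHEN {{label_col}} = 'fraud' THEN 1 ELSE 0 END AS label,
  {{score_col_fraud}} AS score
FROM {{dataset}};
Substituting this for the base CTE in the Base Metric SQL above and repeating it per class yields one Detection & Acceptance profile per class.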
Alternative SQL Example
SELECT
s.bucket AS bucket,
-- Acceptance volume: (TP + FP) / (TP + TN + FP + FN)
COALESCE(
(s.tp + s.fp) / NULLIF(s.total, 0),
0
) AS capture_rate,
-- Overall correctness: (TP + TN) / (TP + TN + FP + FN)
COALESCE(
(s.tp + s.tn) / NULLIF(s.total, 0),
0
) AS correct_detection_rate,
-- Precision of accepted positives: TP / (TP + FP)
COALESCE(
s.tp / NULLIF(s.tp + s.fp, 0),
0
) AS true_detection_rate,
-- Classic recall / TPR: TP / (TP + FN)
COALESCE(
s.tp / NULLIF(s.tp + s.fn, 0),
0
) AS true_positive_rate,
-- Correct acceptance rate: TP / (TP + TN + FP + FN)
COALESCE(
s.tp / NULLIF(s.total, 0),
0
) AS correct_acceptance_rate,
-- Overall correctness: (TP + TN) / (TP + TN + FP + FN)
COALESCE(
(s.tp + s.tn) / NULLIF(s.total, 0),
0
) AS valid_detection_rate
FROM
(
SELECT
c.bucket AS bucket,
c.tp::float AS tp,
c.fp::float AS fp,
c.tn::float AS tn,
c.fn::float AS fn,
(c.tp + c.tn + c.fp + c.fn)::float AS total
FROM
(
SELECT
time_bucket (INTERVAL '5 minutes', {{timestamp_col}}) AS bucket,
SUM(
CASE
WHEN {{ground_truth}} = 1
AND {{prediction}} >= {{threshold}} THEN 1
ELSE 0
END
) AS tp,
SUM(
CASE
WHEN {{ground_truth}} = 0
AND {{prediction}} >= {{threshold}} THEN 1
ELSE 0
END
) AS fp,
SUM(
CASE
WHEN {{ground_truth}} = 0
AND {{prediction}} < {{threshold}} THEN 1
ELSE 0
END
) AS tn,
SUM(
CASE
WHEN {{ground_truth}} = 1
AND {{prediction}} < {{threshold}} THEN 1
ELSE 0
END
) AS fn
FROM
{{dataset}}
GROUP BY
bucket
) AS c
) AS s
ORDER BY
s.bucket;