Subgroup Rate Comparison
Compares key metrics (e.g., acceptance, error, detection) across segments or cohorts to surface fairness issues and performance gaps between subgroups.
Overview
The Subgroup Rate Comparison bucket evaluates how key classification metrics differ across segments or subgroups (e.g., region, channel, demographic buckets). It is primarily used for fairness, bias, and performance parity analysis.
It helps answer:
- “Is my acceptance rate much higher for one group than another?”
- “Does recall degrade for certain segments?”
- “Are error rates uneven across subgroups?”
Metrics
Let bad_rate be the misclassification rate in a subgroup:
bad_rate_subgroup = (FP + FN) / (TP + FP + FN + TN)

Let rate_subgroup be some primary rate of interest (acceptance, recall, FPR, etc.), and rate_reference / bad_rate_reference be the corresponding rate for a reference group (e.g., the global population or a designated baseline subgroup).
rate_difference
Absolute difference between a subgroup’s rate and the reference rate:
rate_difference = | rate_subgroup − rate_reference |

relative_bad_rate_difference
Relative difference in bad rate between a subgroup and the reference:
relative_bad_rate_difference = (bad_rate_subgroup − bad_rate_reference) / bad_rate_reference

This is often expressed as a percentage in the UI (e.g., “Subgroup A has a 40% higher bad rate than the reference group”).
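As a quick worked example with hypothetical numbers: suppose a subgroup’s acceptance rate is 0.62 against a reference of 0.55, and its bad rate is 0.084 against a reference bad rate of 0.060. A minimal illustration of both formulas:

-- Hypothetical values for illustration only
SELECT
    ABS(0.62 - 0.55)        AS rate_difference,             -- 0.07, i.e., 7 percentage points
    (0.084 - 0.060) / 0.060 AS relative_bad_rate_difference; -- 0.40, i.e., 40% higher bad rate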
Data Requirements
- {{label_col}} – ground truth label
- {{pred_label_col}} – predicted label (or thresholded score)
- {{subgroup_col}} – subgroup identifier (e.g., channel, geography, age band)
- {{timestamp_col}} – event time
Subgroups should have reasonable sample sizes; very rare groups will yield noisy metrics.
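Before relying on the disparity metrics, it can help to check how many events each subgroup contributes. A minimal sketch, assuming the same {{dataset}} and {{subgroup_col}} placeholders used below:

-- Count events per subgroup to spot groups too small for stable rates
SELECT
    {{subgroup_col}} AS subgroup,
    COUNT(*)         AS n_events
FROM {{dataset}}
GROUP BY {{subgroup_col}}
ORDER BY n_events ASC;  -- smallest (noisiest) subgroups first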
Base Metric SQL — Per-Subgroup Confusion Matrix
WITH base AS (
    SELECT
        {{timestamp_col}} AS event_ts,
        {{label_col}} AS label,
        {{pred_label_col}} AS pred_label,
        {{subgroup_col}} AS subgroup
    FROM {{dataset}}
),
agg AS (
    -- Confusion-matrix counts per subgroup in 5-minute buckets
    SELECT
        time_bucket(INTERVAL '5 minutes', event_ts) AS ts,
        subgroup,
        COUNT(*) AS total,
        SUM(CASE WHEN pred_label = 1 AND label = 1 THEN 1 ELSE 0 END) AS tp,
        SUM(CASE WHEN pred_label = 1 AND label = 0 THEN 1 ELSE 0 END) AS fp,
        SUM(CASE WHEN pred_label = 0 AND label = 1 THEN 1 ELSE 0 END) AS fn,
        SUM(CASE WHEN pred_label = 0 AND label = 0 THEN 1 ELSE 0 END) AS tn
    FROM base
    GROUP BY ts, subgroup
)
SELECT
    ts,
    subgroup,
    total,
    tp,
    fp,
    fn,
    tn,
    (tp + fp)::double precision / NULLIF(total, 0) AS acceptance_rate,
    (fp + fn)::double precision / NULLIF(total, 0) AS bad_rate
FROM agg;

You can store acceptance_rate and bad_rate as reported metrics, then compute disparity metrics in downstream queries.
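The plot queries below use the global average as the reference. If a designated baseline subgroup is preferred instead (as mentioned under Metrics), a minimal sketch could read from the same {{bucket_4_subgroup_metrics}} source used by the plots; {{reference_subgroup}} here is a hypothetical placeholder for the baseline subgroup’s value, not a built-in variable:

-- Compare each subgroup's bad rate to a designated baseline subgroup
WITH rates AS (
    SELECT
        ts,
        subgroup,
        (fp + fn)::double precision / NULLIF(tp + fp + fn + tn, 0) AS bad_rate
    FROM {{bucket_4_subgroup_metrics}}
),
reference AS (
    SELECT ts, bad_rate AS bad_rate_reference
    FROM rates
    WHERE subgroup = '{{reference_subgroup}}'
)
SELECT
    r.ts,
    r.subgroup,
    (r.bad_rate - ref.bad_rate_reference) / NULLIF(ref.bad_rate_reference, 0) AS relative_bad_rate_difference
FROM rates r
JOIN reference ref USING (ts);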
Plots
Plot 7 — Absolute Rate Difference
Uses:
rate_difference
Example query computing acceptance-rate differences versus the global average:
WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        subgroup,
        SUM(tp) AS tp,
        SUM(fp) AS fp,
        SUM(fn) AS fn,
        SUM(tn) AS tn
    FROM {{bucket_4_subgroup_metrics}}
    GROUP BY day, subgroup
),
rates AS (
    SELECT
        day,
        subgroup,
        (tp + fp)::double precision / NULLIF(tp + fp + fn + tn, 0) AS acceptance_rate
    FROM daily
),
global_rates AS (
    -- Unweighted mean of subgroup rates (macro average); use summed counts
    -- instead if a volume-weighted global rate is preferred
    SELECT
        day,
        AVG(acceptance_rate) AS global_acceptance_rate
    FROM rates
    GROUP BY day
)
SELECT
    r.day,
    r.subgroup,
    r.acceptance_rate,
    ABS(r.acceptance_rate - g.global_acceptance_rate) AS rate_difference
FROM rates r
JOIN global_rates g USING (day);

What this shows
This plot highlights how far each subgroup’s acceptance rate deviates from the global average on an absolute scale.
How to interpret it
- Larger rate_difference values mean greater disparity in how frequently groups are accepted.
- If specific subgroups consistently show higher or lower acceptance, that may indicate potential bias or misalignment with policy.
- You can set alert thresholds (e.g., “> 5 percentage points difference”) to flag fairness concerns, as sketched after this list.
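A sketch of how such a threshold might be applied, assuming the Plot 7 output has been saved as a view named plot7_rate_difference (a hypothetical name):

-- Flag subgroups whose acceptance rate deviates from the global average
-- by more than 5 percentage points (plot7_rate_difference is a hypothetical
-- view holding the Plot 7 query output)
SELECT
    day,
    subgroup,
    acceptance_rate,
    rate_difference,
    rate_difference > 0.05 AS fairness_alert
FROM plot7_rate_difference
ORDER BY day, rate_difference DESC;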
Plot 8 — Relative Bad Rate Difference
Uses:
relative_bad_rate_difference
WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        subgroup,
        SUM(tp) AS tp,
        SUM(fp) AS fp,
        SUM(fn) AS fn,
        SUM(tn) AS tn
    FROM {{bucket_4_subgroup_metrics}}
    GROUP BY day, subgroup
),
rates AS (
    SELECT
        day,
        subgroup,
        (fp + fn)::double precision / NULLIF(tp + fp + fn + tn, 0) AS bad_rate
    FROM daily
),
global_rates AS (
    -- Unweighted mean of subgroup bad rates (macro average)
    SELECT
        day,
        AVG(bad_rate) AS global_bad_rate
    FROM rates
    GROUP BY day
)
SELECT
    r.day,
    r.subgroup,
    r.bad_rate,
    (r.bad_rate - g.global_bad_rate) / NULLIF(g.global_bad_rate, 0) AS relative_bad_rate_difference
FROM rates r
JOIN global_rates g USING (day);

What this shows
This plot measures how much higher or lower each subgroup’s error rate is than the global error rate, expressed as a proportional (relative) difference.
How to interpret it
- A value of 0.4 means “this subgroup’s bad rate is 40% higher than the global average.”
- Large positive values highlight groups bearing a disproportionate error burden.
- This is especially useful in fairness/compliance reviews where relative harm matters more than absolute percentage points; a percentage-conversion sketch follows this list.
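If the relative difference is surfaced as a percentage (as noted under Metrics), the conversion and a relative alert threshold might look like the following sketch; plot8_bad_rate_difference is a hypothetical view holding the Plot 8 output, and the 25% threshold is purely illustrative:

-- Express the relative difference as a percentage and flag subgroups whose
-- bad rate is at least 25% above the global average
SELECT
    day,
    subgroup,
    ROUND((100 * relative_bad_rate_difference)::numeric, 1) AS bad_rate_diff_pct,
    relative_bad_rate_difference >= 0.25 AS disparity_alert
FROM plot8_bad_rate_difference
ORDER BY day, relative_bad_rate_difference DESC;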
Plot 9 — Combined Disparity View
Uses:
rate_difference
relative_bad_rate_difference
WITH daily AS (
    SELECT
        time_bucket(INTERVAL '1 day', ts) AS day,
        subgroup,
        SUM(tp) AS tp,
        SUM(fp) AS fp,
        SUM(fn) AS fn,
        SUM(tn) AS tn
    FROM {{bucket_4_subgroup_metrics}}
    GROUP BY day, subgroup
),
rates AS (
    SELECT
        day,
        subgroup,
        (tp + fp)::double precision / NULLIF(tp + fp + fn + tn, 0) AS acceptance_rate,
        (fp + fn)::double precision / NULLIF(tp + fp + fn + tn, 0) AS bad_rate
    FROM daily
),
global_rates AS (
    -- Unweighted means of subgroup rates (macro averages)
    SELECT
        day,
        AVG(acceptance_rate) AS global_acceptance_rate,
        AVG(bad_rate) AS global_bad_rate
    FROM rates
    GROUP BY day
)
SELECT
    r.day,
    r.subgroup,
    r.acceptance_rate,
    r.bad_rate,
    ABS(r.acceptance_rate - g.global_acceptance_rate) AS rate_difference,
    (r.bad_rate - g.global_bad_rate) / NULLIF(g.global_bad_rate, 0) AS relative_bad_rate_difference
FROM rates r
JOIN global_rates g USING (day);

What this shows
This plot combines absolute acceptance-rate disparity with relative error-rate disparity for each subgroup.
How to interpret it
- Subgroups with both a high rate_difference and a high relative_bad_rate_difference face a double risk: they are treated differently and experience more errors.
- Subgroups with low acceptance disparity but high bad-rate disparity might be getting similar volumes, but with very different quality of decisions.
- This combined view is a strong candidate for a “fairness overview” chart for auditors and risk teams; see the ranking sketch after this list.
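One way to build such an overview is to rank subgroups by their disparity each day. A sketch, assuming the Plot 9 output is saved as a hypothetical plot9_combined_disparity view:

-- Rank subgroups by acceptance-rate disparity per day; the worst-ranked rows
-- are natural candidates to surface first in a fairness overview
SELECT
    day,
    subgroup,
    rate_difference,
    relative_bad_rate_difference,
    RANK() OVER (PARTITION BY day ORDER BY rate_difference DESC) AS disparity_rank
FROM plot9_combined_disparity
ORDER BY day, disparity_rank;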
Alternative SQL
If per-subgroup confusion-matrix metrics have not been stored, the disparity metrics can also be computed directly from raw scores. The query below thresholds {{score_col}}, computes the predicted-positive rate for two named groups ({{group_a}} and {{group_b}}) in 5-minute buckets, and compares them.
SELECT
    r.bucket AS bucket,
    -- Absolute difference, so it can never be negative
    COALESCE(ABS(r.rate_a - r.rate_b), 0) AS rate_difference,
    -- Relative difference of Group A vs Group B (kept signed unless you want
    -- the absolute value here too). Note: this query uses no ground-truth label,
    -- so the rates compared here are predicted-positive rates, not true bad rates.
    COALESCE((r.rate_a - r.rate_b) / NULLIF(r.rate_b, 0), 0) AS relative_bad_rate_difference
FROM (
    SELECT
        c.bucket,
        -- Predicted-positive rate for Group A
        COALESCE(
            SUM(
                CASE
                    WHEN c.group_val = '{{group_a}}'
                        AND c.pred_label = 1 THEN 1
                    ELSE 0
                END
            )::float
            / NULLIF(
                SUM(
                    CASE
                        WHEN c.group_val = '{{group_a}}' THEN 1
                        ELSE 0
                    END
                ),
                0
            ),
            0
        ) AS rate_a,
        -- Predicted-positive rate for Group B
        COALESCE(
            SUM(
                CASE
                    WHEN c.group_val = '{{group_b}}'
                        AND c.pred_label = 1 THEN 1
                    ELSE 0
                END
            )::float
            / NULLIF(
                SUM(
                    CASE
                        WHEN c.group_val = '{{group_b}}' THEN 1
                        ELSE 0
                    END
                ),
                0
            ),
            0
        ) AS rate_b
    FROM (
        SELECT
            time_bucket(INTERVAL '5 minutes', {{timestamp_col}}) AS bucket,
            {{group_col}} AS group_val,
            CASE
                WHEN {{score_col}} >= {{threshold}} THEN 1
                ELSE 0
            END AS pred_label
        FROM {{dataset}}
    ) AS c
    GROUP BY c.bucket
) AS r
ORDER BY r.bucket;