Gini Coefficient (Single-Threshold Metric)

This document walks through an example custom metric: the Gini Coefficient.

Overview

This custom metric computes a daily Gini value at a fixed decision threshold on your prediction column.
It treats rows as being above or below the threshold and measures how mixed those two groups are.

  • Lower values (closer to 0) → more pure (most predictions on one side of the threshold).
  • Higher values (toward the maximum of 0.5, reached when the rows split evenly) → more mixed, i.e., less separation at that threshold.

This is useful when you want a simple, threshold-specific impurity measure that you can track over time alongside other classification metrics.
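
To make the range concrete, here is a short worked form of the impurity. The symbol p is introduced here for illustration: it is the share of rows at or above the threshold. The derivation assumes every row has a non-null prediction, so the two shares sum to 1.

```latex
G = 1 - \left(p^2 + (1 - p)^2\right) = 2p(1 - p), \qquad 0 \le G \le 0.5
```

For example, p = 0.9 gives G = 0.18, while an even split (p = 0.5) gives the maximum, G = 0.5.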


Step 1: Write the SQL

This SQL:

  • Buckets rows into 1-day windows
  • Splits predictions using a configurable thresholdValue
  • Computes a Gini-style impurity based on the proportion above vs below that threshold

SELECT
  time_bucket (INTERVAL '1 day', {{timestampColumnName}}) AS ts,
  1 - (
    POWER(
      SUM(
        CASE
          WHEN {{predictionColumnName}} >= {{thresholdValue}} THEN 1
          ELSE 0
        END
      ) * 1.0 / COUNT(*),
      2
    ) + POWER(
      SUM(
        CASE
          WHEN {{predictionColumnName}} < {{thresholdValue}} THEN 1
          ELSE 0
        END
      ) * 1.0 / COUNT(*),
      2
    )
  ) AS gini_coefficient
FROM
  {{dataset}}
GROUP BY
  ts
ORDER BY
  ts;

What this query is doing

  • time_bucket(INTERVAL '1 day', {{timestampColumnName}}) groups events into daily time buckets and exposes the result as ts.
  • The two SUM(CASE ...) blocks count:
    • Rows with {{predictionColumnName}} >= {{thresholdValue}}
    • Rows with {{predictionColumnName}} < {{thresholdValue}}
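  • Each count is divided by COUNT(*) to get a proportion. Because COUNT(*) counts every row, rows with a NULL prediction enlarge the denominator but fall into neither group.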
  • The expression 1 - (p_high² + p_low²) is the Gini impurity of this two-group split, returned as gini_coefficient.
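
As a cross-check, the same calculation can be sketched outside the database. The Python below is a minimal illustration, not Arthur's implementation: the function name daily_gini and the sample rows are hypothetical, and it assumes timestamps are datetime objects that can be bucketed by calendar day.

```python
from collections import defaultdict
from datetime import datetime


def daily_gini(rows, threshold):
    """Return {date: gini} for an iterable of (timestamp, prediction) pairs.

    Mirrors the SQL: each share is divided by the day's total row count, so a
    missing (None) prediction counts in the denominator but in neither group,
    as a NULL does under COUNT(*).
    """
    counts = defaultdict(lambda: {"total": 0, "high": 0, "low": 0})
    for ts, pred in rows:
        day = counts[ts.date()]
        day["total"] += 1
        if pred is None:
            continue
        if pred >= threshold:
            day["high"] += 1
        else:
            day["low"] += 1

    result = {}
    for date, c in sorted(counts.items()):
        p_high = c["high"] / c["total"]
        p_low = c["low"] / c["total"]
        result[date] = 1 - (p_high ** 2 + p_low ** 2)
    return result


# Hypothetical sample: day 1 splits 2 vs 2, day 2 splits 3 vs 1.
sample = [
    (datetime(2024, 1, 1, 9), 0.9),
    (datetime(2024, 1, 1, 10), 0.8),
    (datetime(2024, 1, 1, 11), 0.2),
    (datetime(2024, 1, 1, 12), 0.1),
    (datetime(2024, 1, 2, 9), 0.9),
    (datetime(2024, 1, 2, 10), 0.7),
    (datetime(2024, 1, 2, 11), 0.6),
    (datetime(2024, 1, 2, 12), 0.3),
]
gini_by_day = daily_gini(sample, threshold=0.5)
```

With this sample, the first day (2 above, 2 below) evaluates to 0.5 and the second day (3 above, 1 below) to 0.375.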

Step 2: Fill Basic Information

When creating the custom metric in the Arthur UI:

  1. Name:
    Gini Coefficient

  2. Description (optional but recommended):
    Daily Gini-style impurity at a fixed prediction threshold, based on how many predictions fall above vs below the threshold.


Step 3: Configure the Aggregate Arguments

You will set up four aggregate arguments to parameterize the SQL.

Argument 1 — Timestamp Column

  1. Parameter Key: timestampColumnName
  2. Friendly Name: TimestampColumnName
  3. Description: Column parameter: timestampColumnName
  4. Parameter Type: Column
  5. Source Dataset Parameter Key: Dataset (dataset)
  6. Allow Any Column Type: No
  7. Tag hints (optional): primary_timestamp
  8. Allowed Column Types (optional): timestamp

This tells Arthur which timestamp column to use for the time_bucket function.


Argument 2 — Prediction Column

  1. Parameter Key: predictionColumnName
  2. Friendly Name: PredictionColumnName
  3. Description: Column parameter: predictionColumnName
  4. Parameter Type: Column
  5. Source Dataset Parameter Key: Dataset (dataset)
  6. Allow Any Column Type: No
  7. Tag hints (optional): prediction
  8. Allowed Column Types (optional): float

This should point to your model’s prediction or score column (typically a probability for the positive class).


Argument 3 — Threshold Value

  1. Parameter Key: thresholdValue
  2. Friendly Name: ThresholdValue
  3. Description: Literal threshold used to split predictions into high vs low groups.
  4. Parameter Type: Literal
  5. Data Type: Float

Use this to match your operating threshold (e.g., 0.5) or any other cutpoint you care about. You can later clone this metric with different thresholds if needed.


Argument 4 — Dataset

  1. Parameter Key: dataset
  2. Friendly Name: Dataset
  3. Description: Dataset for the aggregation.
  4. Parameter Type: Dataset

This links the metric definition to whichever Arthur dataset (inference or batch) you want to compute Gini on.
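
To see how the four arguments line up with the placeholders in the SQL, the sketch below substitutes example values into a shortened version of the query. It only illustrates the mapping: Arthur performs the substitution itself, and the column names, dataset name, and render helper here are hypothetical.

```python
import re

# Shortened version of the Step 1 query that keeps all four placeholders.
TEMPLATE = """
SELECT
  time_bucket(INTERVAL '1 day', {{timestampColumnName}}) AS ts,
  SUM(CASE WHEN {{predictionColumnName}} >= {{thresholdValue}} THEN 1 ELSE 0 END) AS n_high
FROM {{dataset}}
GROUP BY ts
"""

# Hypothetical values, one per Parameter Key configured in this step.
arguments = {
    "timestampColumnName": "event_time",
    "predictionColumnName": "score",
    "thresholdValue": "0.5",
    "dataset": "my_inferences",
}


def render(template, values):
    """Replace each {{key}} with its value; an unknown key raises KeyError."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], template)


rendered_sql = render(TEMPLATE, arguments)
print(rendered_sql)
```

Every Parameter Key must match its placeholder exactly, including case; a mismatch such as thresholdvalue would leave the key unresolved in this sketch.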


Step 4: Configure the Reported Metrics

Reported Metric 1 — Gini Coefficient

  1. Metric Name: Gini Coefficient
  2. Description: Daily Gini-style impurity of predictions at the configured threshold.
  3. Value Column: gini_coefficient
  4. Timestamp Column: ts
  5. Metric Kind: Numeric

This tells Arthur which column from the SQL result to store as the metric value and which column is the associated timestamp.


Interpreting the Gini Coefficient

  • Low values (close to 0)
    • Most predictions are concentrated on one side of the threshold.
    • The split is “pure”: the threshold is separating your dataset into a dominant group and a small minority.
  • Higher values (up to the maximum of 0.5)
    • Predictions are more evenly split above vs below the threshold.
    • The threshold is less discriminative in terms of how it partitions the population.

You can:

  • Plot this metric over time to see whether predictions are drifting toward or away from an even split at the threshold.
  • Use multiple versions (different thresholdValues) to compare how different thresholds behave operationally.
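
For the threshold comparison, a quick offline sweep can preview how the value responds to different cutpoints before cloning the metric. The sketch below is illustrative only: gini_at_threshold and the sample scores are hypothetical, and it assumes a non-empty list with no missing predictions.

```python
def gini_at_threshold(scores, threshold):
    """Gini impurity of the split 'score >= threshold' vs 'score < threshold'.

    Assumes scores is non-empty and contains no missing values.
    """
    n = len(scores)
    p_high = sum(1 for s in scores if s >= threshold) / n
    p_low = 1 - p_high
    return 1 - (p_high ** 2 + p_low ** 2)


# Hypothetical scores from a single day.
scores = [0.05, 0.10, 0.20, 0.35, 0.55, 0.60, 0.80, 0.95]

sweep = {t: gini_at_threshold(scores, t) for t in (0.25, 0.5, 0.75)}
for t, g in sweep.items():
    print(f"threshold={t:.2f}  gini={g:.3f}")
```

In this sample the 0.5 cutpoint splits the scores evenly and yields the maximum of 0.5, while 0.75 leaves two of eight scores above it and yields 0.375.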