Metrics Versioning

Overview

Arthur supports two related but distinct versioning concepts:

  1. Tracking metrics by application version — segment a metric by a version identifier (e.g., prompt_version_id) so your dashboard shows separate lines per version. This is the primary use case when you want to compare prompt versions, model versions, or any categorical dimension over time.

  2. Metric computation versions — each time Arthur recalculates metrics, it creates a new internal version snapshot. The _latest_version views always surface the most recent computation automatically.


Tracking Metrics by Application Version

How It Works

When you configure a metric, you specify one or more Segmentation Columns — fields in your inference data (e.g., prompt_version_id, model_version, environment) that Arthur stores in the dimensions JSONB column alongside each metric row.

Once stored, you can group or filter by those dimensions in any chart SQL query — producing separate series per version.

The result looks like this:

metrics_version1

The Inference Count by Prompt Version chart shows a separate line for each value of prompt_version_id (1.0-rc1, 1.0-rc2, 1.0-rc3, 1.0-rc4) — with no additional SQL complexity beyond GROUP BY dimensions->>'prompt_version_id'.


Step 1 — Configure a Segmentation Column

When setting up a metric's configuration, add the version identifier field as a Segmentation Column:

metrics_version2
FieldExample valueDescription
Aggregation TypeInference CountThe metric to compute
Timestamp Columninference_timeThe field used as the metric timestamp
Segmentation Columnsprompt_version_idThe field(s) Arthur stores in dimensions for grouping

You can add multiple segmentation columns. Each becomes a key in the dimensions JSONB object — queryable as dimensions->>'column_name'.


Step 2 — Query by Version Dimension

Once the segmentation column is configured, use the ->> operator to group or filter by it:

Inference count broken down by prompt version:

SELECT
  time_bucket('1 day', timestamp)     AS bucket,
  dimensions->>'prompt_version_id'    AS prompt_version,
  SUM(value)                           AS count
FROM metrics_numeric_latest_version
WHERE metric_name = 'inference_count'
  [[ AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}' ]]
GROUP BY bucket, prompt_version
ORDER BY bucket, prompt_version;

Filter to a specific version:

SELECT
  time_bucket('1 day', timestamp) AS bucket,
  SUM(value) AS count
FROM metrics_numeric_latest_version
WHERE metric_name  = 'inference_count'
  AND dimensions->>'prompt_version_id' = '1.0-rc2'
  [[ AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}' ]]
GROUP BY bucket
ORDER BY bucket;

Compare two versions side by side:

SELECT
  time_bucket('1 day', timestamp)                                                     AS bucket,
  SUM(value) FILTER (WHERE dimensions->>'prompt_version_id' = '1.0-rc1')             AS rc1_count,
  SUM(value) FILTER (WHERE dimensions->>'prompt_version_id' = '1.0-rc2')             AS rc2_count
FROM metrics_numeric_latest_version
WHERE metric_name = 'inference_count'
  AND dimensions->>'prompt_version_id' IN ('1.0-rc1', '1.0-rc2')
  [[ AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}' ]]
GROUP BY bucket
ORDER BY bucket;

Building the Chart

In the Create chart builder:

  1. Write a query that includes the version dimension column in your SELECT (e.g., dimensions->>'prompt_version_id' AS prompt_version).
  2. Set X Axis to your time bucket column.
  3. Set Y Axis to your value column.
  4. Click + Segment data under the Y Axis series and select the version column — this splits the single series into one line per version value.

Metric Computation Versions

Every time Arthur computes metrics — triggered by the dashboard Refresh button or a scheduled job — it creates a new computation version. This is independent of your application's version identifiers.

sequenceDiagram
    participant User
    participant Arthur
    participant TimescaleDB

    User->>Arthur: Refresh metrics (date range)
    Arthur->>Arthur: Queue MetricsCalculation job
    Arthur->>TimescaleDB: Write new rows with incremented version_num
    TimescaleDB->>TimescaleDB: Update max_version_for_timestamp
    Arthur->>User: Last Updated timestamp advances

Key points:

  • Each refresh run increments version_num monotonically.
  • All rows written by that run share the same version_num.
  • The _latest_version views filter automatically to rows where metric_version = max_version_for_timestamp — so you always get the freshest computation by default.
  • Older computation versions remain queryable directly.

The metric_version Column

Both metrics_numeric_latest_version and metrics_sketch_latest_version expose metric_version:

ColumnTypeDescription
metric_versionintegerMonotonically increasing version number. Higher = more recent computation.

You do not need to filter on metric_version for normal dashboard queries — the _latest_version views handle it automatically.

The max_version_for_timestamp Column

Visible in the Preview data pane when querying raw tables. This is the internal value the _latest_version views use to filter rows:

WHERE metric_version = max_version_for_timestamp

Do not reference this column directly in chart queries — use the _latest_version views instead.


Listing Computation Versions via API

import requests

response = requests.get(
    f"https://your-arthur-instance.example.com/api/v1/models/{MODEL_ID}/metrics/versions",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={"page_size": 10, "sort": "created_at", "order": "desc"}
)

for v in response.json()["records"]:
    print(f"v{v['version_num']}  created={v['created_at']}  range={v['range_start']} → {v['range_end']}")
const response = await fetch(
  `https://your-arthur-instance.example.com/api/v1/models/${MODEL_ID}/metrics/versions?page_size=10&sort=created_at&order=desc`,
  { headers: { "Authorization": `Bearer ${API_TOKEN}` } }
);
const { records } = await response.json();
records.forEach(v =>
  console.log(`v${v.version_num}  created=${v.created_at}  range=${v.range_start} → ${v.range_end}`)
);
curl "https://your-arthur-instance.example.com/api/v1/models/{model_id}/metrics/versions?page_size=10&sort=created_at&order=desc" \
  -H "Authorization: Bearer your-api-token"

Response structure:

{
  "records": [
    {
      "version_num": 5,
      "created_at": "2026-04-21T17:05:00Z",
      "updated_at": "2026-04-21T17:05:42Z",
      "scope_model_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "range_start": "2026-04-07T00:00:00Z",
      "range_end": "2026-04-21T17:00:00Z"
    }
  ],
  "pagination": {
    "total_records": 5,
    "total_pages": 1,
    "current_page": 1,
    "page_size": 10
  }
}

Sort parameters: created_at, updated_at, range_start, range_end — with order: asc or desc (default: desc).


Querying a Specific Computation Version

SELECT
    time_bucket('1 day', timestamp) AS bucket,
    SUM(value) AS count
FROM metrics_numeric_latest_version
WHERE model_id      = '{{your-model-uuid}}'
  AND metric_name   = 'inference_count'
  AND metric_version = 3
  AND timestamp BETWEEN '{{dateStart}}' AND '{{dateEnd}}'
GROUP BY bucket
ORDER BY bucket;
⚠️

Filtering on metric_version bypasses the "latest version" logic. Use this only when you intentionally want to compare a specific historical computation run.


When Computation Versions Are Created

TriggerResult
Dashboard Refresh button (manual)New version for the selected date range
Scheduled refresh (Repeat Every or cron expression)New version covering the configured lookback period on each run
Inference data backfillNew version covering the backfilled range

Each version's range_start and range_end indicate the inference time window it covers — not when the computation ran (created_at).


Next Steps

GoalWhere to go
Trigger a metrics refreshDashboard — Refresh button
Write chart SQL with segmentationCustom Graphs
Understand the full metrics schemaMetrics Overview & Data Model
Alert when a metric drops below thresholdAlerts & Alert Rules