Metrics Query Best Practices & Troubleshooting

Overview

This page covers best practices for writing efficient queries against Arthur's TimescaleDB metrics store and fixes for the most common errors. For query patterns, see Basic Metric Query Patterns and Advanced SQL Patterns & Sketch Functions.

Best Practices

Always Scope Queries to a Time Range

The metrics hypertables can contain months of data. Always include a timestamp >= predicate to allow TimescaleDB to use chunk exclusion — this can reduce query time by orders of magnitude.

-- ✅ Good: time range predicate enables chunk exclusion
WHERE model_id    = '...'
  AND timestamp >= NOW() - INTERVAL '7 days'

-- ❌ Bad: full table scan across all chunks
WHERE model_id = '...'

Use `time_bucket()` for Downsampling

Always use TimescaleDB's time_bucket() function (not date_trunc()) when grouping by time intervals. time_bucket() is optimized for hypertables and supports arbitrary intervals.

-- ✅ Preferred
time_bucket('1 hour', timestamp)

-- ⚠️ Works but not optimized for hypertables
date_trunc('hour', timestamp)

Merge Sketches Before Extracting Quantiles

You cannot apply kll_float_sketch_get_quantile to a column of sketch rows directly — you must aggregate them first with kll_float_sketch_merge. Use a CTE to merge once and reuse the result.

-- ✅ Correct: merge first, then extract quantile
kll_float_sketch_get_quantile(kll_float_sketch_merge(value), 0.95)

-- ❌ Wrong: cannot apply scalar function to unaggregated column
kll_float_sketch_get_quantile(value, 0.95)  -- will error if multiple rows

Handle NULL Dimensions Carefully

Metrics without dimension slices have dimensions = '{}' (an empty JSONB object), not NULL. Accessing a missing key returns NULL:

-- Safe: returns NULL for rows where the key doesn't exist
WHERE dimensions->>'environment' = 'production'
-- This correctly excludes rows where 'environment' key is absent

-- Explicit NULL check if needed
WHERE dimensions->>'environment' IS NOT NULL

Use `NULLIF` to Avoid Division by Zero

When computing ratios or z-scores, protect against zero denominators:

-- ✅ Safe
ABS(value - mean_val) / NULLIF(stddev_val, 0)

-- ❌ Will error if stddev_val = 0
ABS(value - mean_val) / stddev_val

Index Awareness

The metrics views are backed by TimescaleDB hypertables with indexes on (model_id, metric_name, timestamp). Your WHERE clauses should always include model_id and metric_name to use these indexes efficiently. Filtering on dimensions JSONB values is a secondary filter applied after the index scan.

Common Errors and Fixes

"Function kll_float_sketch_merge does not exist"

Cause: The DataSketches PostgreSQL extension is not installed in your database, or you are connected to the wrong database.

Fix: Confirm you are connected to the Arthur metrics database (not a local PostgreSQL instance or a DuckDB connection). Contact your Arthur platform administrator to verify the extension is installed:

SELECT * FROM pg_extension WHERE extname LIKE '%sketch%';

"Column does not exist" on a Sketch View

Cause: Using an incorrect column name. The sketch column in metrics_sketch_latest_version is named value, not sketch.

Fix: Use value in all sketch queries:

-- ✅ Correct
kll_float_sketch_merge(value)

-- ❌ Wrong
kll_float_sketch_merge(sketch)

Wrong View for a Metric Type

Cause: Querying metrics_numeric_latest_version for a sketch metric (or vice versa).

Fix: Use metrics_sketch_latest_version for distributional metrics and metrics_numeric_latest_version for scalar metrics. Run the discovery queries to confirm:

SELECT 'numeric' AS metric_type, metric_name
FROM metrics_numeric_latest_version
WHERE model_id = '{{your-model-uuid}}' AND metric_name = 'your_metric_name'
UNION ALL
SELECT 'sketch', metric_name
FROM metrics_sketch_latest_version
WHERE model_id = '{{your-model-uuid}}' AND metric_name = 'your_metric_name';

"DuckDB syntax not supported" / `approx_quantile` Error

Cause: You have copied a query written for Arthur's custom aggregation pipeline (which uses DuckDB) into the TimescaleDB metrics store.

Fix: Replace DuckDB-specific functions with their TimescaleDB equivalents:

DuckDB (custom aggregations)	TimescaleDB metrics store
`approx_quantile(col, 0.95)`	`kll_float_sketch_get_quantile(kll_float_sketch_merge(value), 0.95)`
`quantile_cont(col, [0.5, 0.95])`	`kll_float_sketch_get_quantiles(kll_float_sketch_merge(value), ARRAY[0.5, 0.95])`
`epoch_ms(ts)`	`EXTRACT(EPOCH FROM ts) * 1000`
`date_diff('day', ts1, ts2)`	`ts2 - ts1` (returns an `INTERVAL`)

Troubleshooting

Query Returns No Rows

Possible causes and fixes:

Wrong model_id: Verify the UUID matches your model exactly (UUIDs are case-sensitive).
Time range too narrow: The most recent data may be up to 5 minutes old (the aggregation interval). Try extending your window: NOW() - INTERVAL '30 minutes'.
Metric name typo: Use the discovery query to list exact metric names — they are case-sensitive strings.
Dimension filter too restrictive: Try removing dimension filters first to confirm the base metric has data, then add filters back.

Alert Rule Query Fires Unexpectedly

Cause: Your HAVING clause condition is inverted, or NOW() is evaluated at query-parse time rather than execution time in some client tools.

Fix: Test your query in a SQL client connected to the metrics database with the exact NOW() substitution Arthur will use. Ensure your HAVING clause returns zero rows when the system is healthy and at least one row when the condition is violated.

Slow Query Performance

Checklist:

✅ model_id is in the WHERE clause
✅ metric_name is in the WHERE clause
✅ timestamp >= predicate is present to enable chunk exclusion
✅ You are using time_bucket() (not date_trunc()) for grouping
✅ For sketch queries, you are merging in a CTE rather than calling kll_float_sketch_merge multiple times in the same SELECT

If queries remain slow after checking the above, run EXPLAIN ANALYZE on your query and share the output with your Arthur platform administrator.