Anomaly Detection

For an explanation of how to use Anomaly Detection with Arthur, please see the user guide.

Anomaly scores are computed by training a model on the reference set you provide to Arthur and using that model to assign an anomaly score to each inference you send to Arthur. A score of 0.5 is given to “typical” examples from your reference set. Higher scores are given to more anomalous inferences, and lower scores are given to instances that the model judges, with high confidence, to be similar to the reference data.

How it works

We calculate anomaly scores with an Isolation Forest algorithm. This algorithm works by building what is essentially a density model of the data by iteratively isolating data points from one another. Because anomalies tend to be farther away from other points and occur less frequently, they are easier to isolate from other points, so we can use a data point’s “ease of isolation” to measure how anomalous it is. The method is based on the paper linked here.

The Isolation Forest “method takes advantage of two [quantitative properties of anomalies]: i) they are the minority consisting of fewer instances and ii) they have attribute-values that are very different from those of normal instances. In other words, anomalies are ‘few and different’, which make them more susceptible to isolation than normal points.”

The idea is to build a binary tree that randomly segments the data until each instance can be uniquely selected (or a maximum height is reached). Anomalous instances take fewer steps to become isolated on the tree because of the properties mentioned above.


In the example above, we can see that the in-distribution \(x_i\) takes many more steps to reach than the out-of-distribution \(x_0\).
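The isolation idea can be sketched in a few lines of Python. This is a minimal illustration, not Arthur's implementation: the function names, the toy data, and the `max_height` value are all assumptions made for the example. It repeatedly picks a random feature and a random split value, keeps only the side of the split that contains the point of interest, and counts how many splits are needed before the point is alone.

```python
import random


def isolation_path_length(point, data, rng, max_height=64):
    """Count the random splits needed to isolate `point` within `data`.

    `data` is a list of equal-length tuples that contains `point`.
    Only the branch holding `point` is followed, which gives the same
    path length as building the whole random tree and walking down to it.
    """
    remaining = list(data)
    depth = 0
    while len(remaining) > 1 and depth < max_height:
        # Only features that still vary can separate the remaining rows.
        dims = [d for d in range(len(point))
                if min(r[d] for r in remaining) < max(r[d] for r in remaining)]
        if not dims:
            break  # all remaining rows are identical
        dim = rng.choice(dims)
        low = min(r[dim] for r in remaining)
        high = max(r[dim] for r in remaining)
        split = rng.uniform(low, high)
        # Keep only the side of the split that contains `point`.
        if point[dim] < split:
            remaining = [r for r in remaining if r[dim] < split]
        else:
            remaining = [r for r in remaining if r[dim] >= split]
        depth += 1
    return depth


def mean_path_length(point, data, n_trees=200, seed=0):
    """Average the path length over several independent random trees."""
    rng = random.Random(seed)
    total = sum(isolation_path_length(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Hypothetical toy data: a tight cluster plus one far-away point.
data_rng = random.Random(42)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(62)]
inlier, outlier = (0.0, 0.0), (8.0, 8.0)
data = cluster + [inlier, outlier]

print("inlier mean path length: ", mean_path_length(inlier, data))
print("outlier mean path length:", mean_path_length(outlier, data))
```

On data like this, the far-away point should need noticeably fewer splits on average than the point in the middle of the cluster, which is the property the anomaly score is built on.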

Of course, a single randomly generated tree would be noisy. So we train multiple trees to form an Isolation Forest and use the average path length across them, noting that average path lengths converge as the number of trees grows:


The Isolation Forest algorithm is highly efficient compared to other density estimation techniques because the individual trees can be built from samples of the data without losing performance.

When you add a reference set to your model in Arthur, we fit an Isolation Forest model to that data, so that we can compute an anomaly score for the inferences your model receives.
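To make the fit-then-score workflow concrete, here is a small, self-contained Isolation Forest in plain Python. It is a sketch under stated assumptions rather than a description of Arthur's internals: the function names, the dictionary-based tree layout, and the toy reference set are hypothetical, and the defaults of 100 trees, 256 samples per tree, and a height limit of \(\lceil \log_2(\text{sample size}) \rceil\) follow the original Isolation Forest paper. The normalization `c` and the score formula are the ones described under Generating Anomaly Scores.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, rng, height, max_height):
    """Recursively split rows on a random feature at a random threshold."""
    if len(rows) <= 1 or height >= max_height:
        return {"size": len(rows)}
    dims = [d for d in range(len(rows[0]))
            if min(r[d] for r in rows) < max(r[d] for r in rows)]
    if not dims:  # every remaining row is identical
        return {"size": len(rows)}
    dim = rng.choice(dims)
    split = rng.uniform(min(r[dim] for r in rows), max(r[dim] for r in rows))
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, rng, height + 1, max_height),
            "right": build_tree(right, rng, height + 1, max_height)}


def path_length(point, node, depth=0):
    """Steps to reach a leaf, plus c(size) for leaves that were not fully split."""
    if "size" in node:
        return depth + c(node["size"])
    side = "left" if point[node["dim"]] < node["split"] else "right"
    return path_length(point, node[side], depth + 1)


def fit_forest(reference, n_trees=100, sample_size=256, seed=0):
    """Fit trees on random subsamples of the reference set."""
    if len(reference) < 2:
        raise ValueError("need at least two reference rows")
    rng = random.Random(seed)
    size = min(sample_size, len(reference))
    max_height = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(reference, size), rng, 0, max_height)
             for _ in range(n_trees)]
    return trees, size


def anomaly_score(point, forest):
    """Score one inference against the fitted forest."""
    trees, size = forest
    mean_path = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(size))


# Hypothetical reference set: 500 two-dimensional, normally distributed rows.
data_rng = random.Random(7)
reference = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
forest = fit_forest(reference)

typical = anomaly_score((0.0, 0.0), forest)
unusual = anomaly_score((8.0, 8.0), forest)
print("typical inference:", typical)
print("unusual inference:", unusual)
```

Each tree sees only a subsample of the reference data, which is what keeps fitting cheap. The inference far from the reference data should receive the higher score.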

Generating Anomaly Scores

The path length, or number of steps taken to reach the partition that a data point belongs to, varies between \(0\) and \(n-1\), where \(n\) is the number of data points in the training set. Following our intuition above, the shorter the path length, the more anomalous a point is. To turn this into a measure of anomaly, we generate an anomaly score between \(0\) and \(1\) by dividing the path length by the average path length and applying a decreasing exponential, so that shorter paths map to higher scores.

In particular, the anomaly score is \(s(x,n) = 2^{-\frac{E(h(x))}{c(n)}}\), where \(E(h(x))\) is the average path length to data point \(x\) from a collection of isolation trees and \(c(n)\) is the average path length for a dataset of \(n\) data points, which serves as the normalizing constant.
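As a worked illustration of the formula, the snippet below evaluates \(s(x,n)\) for a few path lengths. It assumes the normalization from the original Isolation Forest paper, \(c(n) = 2H(n-1) - \frac{2(n-1)}{n}\) with \(H(i) \approx \ln(i) + 0.5772156649\). Whether Arthur uses exactly this form of \(c(n)\) is not stated here, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n))."""
    if n < 2:
        raise ValueError("n must be at least 2 so that c(n) is positive")
    return 2.0 ** (-mean_path_length / average_path_length(n))


n = 256
c_n = average_path_length(n)
for label, path in [("very short path", 1.0),
                    ("average path", c_n),
                    ("longest possible path", n - 1)]:
    print(f"{label:>22}: E(h(x)) = {path:8.3f}  ->  s = {anomaly_score(path, n):.3f}")
```

A path length equal to \(c(n)\) gives a score of exactly \(0.5\), shorter paths push the score toward \(1\), and longer paths push it toward \(0\).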

Every inference you send to Arthur will be evaluated against the trained Isolation Forest model and given an anomaly score.

This can be seen in the anomaly score contour of 64 normally distributed points:


At Arthur, we also visualize the distribution of anomaly scores among all of the inferences your model has received since training. When you select an inference in our UI, you’ll see where it falls on this distribution:


Interpreting Anomaly Scores

The resulting anomaly scores can be interpreted in the following way:

  • if instances return \(s\) very close to \(1\), then they are definitely anomalies

  • if instances have \(s\) much smaller than \(0.5\), then they can quite safely be regarded as normal instances

  • if all the instances return \(s \approx 0.5\), then the entire sample does not really have any distinct anomaly