Terminology
What are Connectors?
- Connectors are extensible resources that connect Arthur Engines to arbitrary Data Sources (data lakes, data warehouses, etc.) containing Datasets that may be stored and formatted in any manner (e.g., JSON, CSV, Parquet). For example:
- S3 buckets containing CSV files
- PostgreSQL database with data stored in tables
- Hive cluster containing Parquet-formatted partition files
- Google BigQuery instance with tables
- A connector connects to a data source specified by the connector configuration:
- Locator (e.g., a URL)
- Credentials
- Any other relevant configuration data
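To make the configuration shape concrete, here is a minimal sketch in Python. The class and field names are illustrative assumptions, not the Arthur API; the point is that a connector bundles a locator, credentials, and any other relevant options.

```python
from dataclasses import dataclass, field

# Hypothetical connector configuration; names are illustrative,
# not taken from the Arthur API.
@dataclass
class ConnectorConfig:
    locator: str                 # where the data source lives, e.g. an S3 URL
    credentials: dict[str, str]  # e.g. access keys or a token
    options: dict[str, str] = field(default_factory=dict)  # other config

# An S3 connector pointing at a bucket of CSV files
# (placeholder credential values):
s3_connector = ConnectorConfig(
    locator="s3://example-bucket/datasets/",
    credentials={
        "aws_access_key_id": "<ACCESS_KEY>",
        "aws_secret_access_key": "<SECRET_KEY>",
    },
    options={"format": "csv"},
)
```

A PostgreSQL or BigQuery connector would follow the same shape, swapping the locator for a connection string and the options for table-level settings.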
What are Metrics?
- Metrics are computations that are performed on a dataset that return time-series-formatted results to the Arthur Platform
- Metrics are defined by a pipeline:
- Transformations - mutate the schema of the input dataset by adding, removing, or changing columns relevant to the calculation
- Aggregations - bucket and aggregate the transformed data into the results that are shipped to the platform
- Metrics can have one of two output types:
- Numeric - the output of the metric is a time series where the values are numbers
- Sketch - the output of the metric is a Sketch (e.g., a distribution) defined by buckets (count, interval)
- Metric parameters (e.g., which columns a metric uses to compute results) are defined by Schema Tags that are set on a Dataset when it is onboarded to the Arthur Platform
- Some tags are automatically inferred - for example, all columns with a numeric type are tagged with the numerical tag, and every metric that consumes a numerical parameter will be run
- Users can provide tags manually - for example, a user can set the llm_prompt tag on a column so that metrics which consume LLM prompts will be run
- Metric results can be further enriched with dimensions, which are key-value pairs that allow for segmentation, filtering, and further analysis
- E.g., a user can define a segment on the inference count metric that buckets an “age” column into age ranges ([0, 20, 40, 60, 80])
- For example:
- The inference count metric reads from a dataset and performs an aggregation which counts the number of rows within a 5m interval
- The numeric sketch metric reads a numeric column from a dataset and computes a sketch on the numeric column within each 5m interval
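The pipeline above (transform, then aggregate) can be sketched with pandas. This is an illustrative example under assumed column names (“timestamp”, “age”), not the Arthur implementation: a transformation derives an age-bucket dimension, then an aggregation counts inferences per 5-minute interval, segmented by that dimension.

```python
import pandas as pd

# A toy input dataset; column names are assumptions for illustration.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:01", "2024-01-01 00:03",
        "2024-01-01 00:07", "2024-01-01 00:12",
    ]),
    "age": [18, 35, 52, 71],
})

# Transformation: add a derived column bucketing "age" into dimension values.
df["age_bucket"] = pd.cut(df["age"], bins=[0, 20, 40, 60, 80])

# Aggregation: inference count per 5-minute interval, per age bucket.
counts = (
    df.groupby(
        [pd.Grouper(key="timestamp", freq="5min"), "age_bucket"],
        observed=True,
    )
    .size()
    .rename("inference_count")
)
```

The result is a time series keyed by (interval, age bucket), which is the shape of a numeric metric enriched with a dimension. A sketch metric would replace the final count with a distribution summary of the numeric column per interval.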
What are Guardrails?
- Guardrails (rules) are Pass/Fail decisions on single inferences, computed from the output of a Metric using a configured threshold and boundary condition
- For example: a Prompt Injection Rule with a threshold of 0.8 will Pass if the metric computed on an inference is less than 0.8 and Fail if it is greater than or equal to 0.8
- Inferences are validated against rules by sending them to an API endpoint
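The threshold and boundary condition above reduce to a small comparison. A minimal sketch, assuming a strict less-than boundary as in the Prompt Injection example (the function name is illustrative):

```python
# Hypothetical rule evaluation: a rule passes when the metric score for a
# single inference is below the configured threshold, and fails on or above it.
def evaluate_rule(score: float, threshold: float = 0.8) -> str:
    """Return 'Pass' if score < threshold, else 'Fail' (boundary inclusive)."""
    return "Pass" if score < threshold else "Fail"

# A prompt-injection score of 0.75 passes; 0.8 sits on the boundary and fails.
evaluate_rule(0.75)  # 'Pass'
evaluate_rule(0.80)  # 'Fail'
```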
What are Alerts?
What are Webhooks?