Supported Connectors
The Arthur platform supports the following connectors:
Google Cloud: BigQuery, Google Cloud Storage (GCS)
AWS: S3
Arthur: Arthur Shield
Connector Details
This section describes how to configure each connector and which permissions it requires in the external system. It also documents each connector's dataset locator, which describes how datasets are configured within that connector.
BigQuery
The BigQuery connector allows users to monitor models whose data resides in BigQuery datasets.
Permissions
The connector requires the following roles in the GCP project:
BigQuery Data Viewer
- allows the connector to list the datasets and tables in the project
BigQuery Job User
- allows the connector to run the query jobs it uses to read data from the project's tables
Configuration
The connector supports the following configuration:
Project ID
(required) - the ID string of the GCP project
Credentials
(optional) - a JSON GCP service account credential for the connector to use. If none is provided, the connector looks up credentials from its runtime environment.
Location
(optional) - the GCP location to use when communicating with the BigQuery API
Dataset Locator
Dataset ID
(required) - ID of the dataset in BigQuery. Do not include the project ID, as it is already set in the connector configuration.
Table Name
(required) - Name of the table in the BigQuery dataset.
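As an illustration, the Project ID from the connector configuration and the Dataset ID and Table Name from the dataset locator together identify one table in BigQuery's project.dataset.table form. The helper below is hypothetical and not part of the Arthur platform; it is a minimal sketch of that composition, assuming that a Dataset ID containing "." or ":" was mistakenly written with its project prefix.

```python
# Hypothetical helper (not an Arthur API): builds the fully qualified
# BigQuery table ID from connector configuration plus dataset locator.
def qualified_table_id(project_id: str, dataset_id: str, table_name: str) -> str:
    # All three parts are required.
    for label, value in (
        ("Project ID", project_id),
        ("Dataset ID", dataset_id),
        ("Table Name", table_name),
    ):
        if not value:
            raise ValueError(f"{label} is required")
    # BigQuery dataset IDs may only contain letters, digits, and underscores,
    # so a "." or ":" suggests the project was included by mistake
    # (e.g. "my-project.model_logs" or the legacy "my-project:model_logs").
    if "." in dataset_id or ":" in dataset_id:
        raise ValueError(
            f"Dataset ID {dataset_id!r} appears to include a project; "
            "the project is already set in the connector configuration"
        )
    return f"{project_id}.{dataset_id}.{table_name}"


table_id = qualified_table_id("my-project", "model_logs", "inferences")
print(table_id)  # my-project.model_logs.inferences
```

In GoogleSQL, a project ID that contains hyphens must be quoted with backticks when used in a query, e.g. SELECT * FROM `my-project.model_logs.inferences`.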
Google Cloud Storage (GCS) Bucket
The GCS connector allows users to monitor models whose data resides in GCS buckets. It currently supports the Parquet and JSON file formats.
Permissions
The connector requires the following roles on the bucket:
Storage Object Viewer
- allows the connector to list and read the objects in the bucket
Configuration
The connector supports the following configuration:
Project ID
(required) - the ID string of the GCP project
Bucket
(required) - the name of the GCS bucket
Credentials
(optional) - a JSON GCP service account credential for the connector to use. If none is provided, the connector looks up credentials from its runtime environment.
Location
(optional) - the GCP location to use when communicating with the GCS API
Dataset Locator
File Prefix
(required) - The prefix of the dataset's files in the bucket. This prefix should include strftime placeholders for year, month, day, and optionally hour. The Arthur platform uses this format string to efficiently look up data for specific time ranges when calculating metrics. An example value is /year=%Y/month=%m/day=%d/hour=%H/. Do not include the bucket name in the prefix, as it is already set in the connector configuration.
File Type
(required) - One of json or parquet. Specifies the format of the data files in the dataset.
File Suffix
(optional) - If the prefix contains multiple kinds of files, this option filters them by a suffix regex. For example, to limit the results to files ending in .json, set this value to .*\.json. The regex syntax follows Python's re module. If not set, no filtering is applied.
Timestamp time zone
(optional) - The time zone to use when populating the file prefix time placeholders. Defaults to UTC.
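The sketch below illustrates, assuming hourly partitions, how a File Prefix format string expands into concrete prefixes for a time range, and how a File Suffix regex could filter object names. The function names are hypothetical, and whether the platform applies the suffix regex as a full match or a search is an assumption: this sketch uses re.fullmatch.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Example File Prefix from the dataset locator described above.
FILE_PREFIX = "/year=%Y/month=%m/day=%d/hour=%H/"


def prefixes_for_range(
    prefix_format: str,
    start: datetime,
    end: datetime,
    step: timedelta = timedelta(hours=1),
) -> List[str]:
    """Expand a strftime prefix into the distinct concrete prefixes in [start, end).

    Steps hourly by default; a day-level format yields one prefix per day
    because duplicate renderings are dropped (insertion order is kept).
    """
    seen = {}
    current = start
    while current < end:
        seen.setdefault(current.strftime(prefix_format), None)
        current += step
    return list(seen)


def filter_by_suffix(names: List[str], suffix_regex: Optional[str]) -> List[str]:
    """Keep object names matching the File Suffix regex.

    Assumption: the whole name must match (re.fullmatch).
    """
    if not suffix_regex:
        return list(names)  # no filtering when the option is unset
    pattern = re.compile(suffix_regex)
    return [name for name in names if pattern.fullmatch(name)]


start = datetime(2024, 5, 1, 22, tzinfo=timezone.utc)
for prefix in prefixes_for_range(FILE_PREFIX, start, start + timedelta(hours=3)):
    print(prefix)
# /year=2024/month=05/day=01/hour=22/
# /year=2024/month=05/day=01/hour=23/
# /year=2024/month=05/day=02/hour=00/

names = ["part-0.json", "part-0.json.crc", "_SUCCESS"]
print(filter_by_suffix(names, r".*\.json"))  # ['part-0.json']
```

Expanding only the prefixes that overlap the requested window is what lets a lookup skip listing the rest of the bucket, which is the efficiency the File Prefix format is meant to provide.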
S3 Bucket
The S3 connector allows users to monitor models whose data resides in S3 buckets. It currently supports the Parquet and JSON file formats.
Permissions
The connector requires the following permissions on the bucket, granted through an IAM permissions policy (replace <bucket> with the bucket name):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "SID",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::<bucket>/*",
"arn:aws:s3:::<bucket>"
]
}
]
}
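When the bucket name is known, the policy can be generated rather than edited by hand. The sketch below (bucket_read_policy and the bucket name are hypothetical) reproduces the policy above for a given bucket. It keeps two Resource entries because s3:ListBucket applies to the bucket ARN, while s3:GetObject and s3:GetObjectVersion apply to object ARNs.

```python
import json


def bucket_read_policy(bucket: str, sid: str = "SID") -> dict:
    """Return the connector's read-only S3 permissions policy for a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}/*",  # objects: GetObject, GetObjectVersion
                    f"arn:aws:s3:::{bucket}",  # the bucket itself: ListBucket
                ],
            }
        ],
    }


policy = bucket_read_policy("my-model-data")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```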
Configuration
The connector supports the following configuration:
Bucket
(required) - the name of the S3 bucket
Access Key ID
(optional) - the AWS Access Key ID. Only needed if using access key authentication with AWS. If not set, the connector will attempt to load credentials from its runtime environment.
Secret Access Key
(optional) - the AWS Secret Access Key. Only needed if using access key authentication with AWS. If not set, the connector will attempt to load credentials from its runtime environment.
Role ARN
(optional) - set if accessing the bucket requires assuming a role. If not set, the connector will use the access keys, or attempt to load credentials from its runtime environment.
External ID
(optional) - the External ID to send when assuming the role. If using the assume role option, it is recommended to require an External ID in the role's trust policy, and set the same value here, to prevent the confused deputy problem.
Role Duration Seconds
(optional) - if using the assume role option, how long the session is valid, in seconds. Defaults to 3600 (one hour), but some role policies require a smaller value.
AWS Region
(optional) - the AWS region where the bucket resides
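To make the assume role options concrete, the sketch below maps Role ARN, External ID, and Role Duration Seconds onto the parameters of an AWS STS AssumeRole call, and builds a matching trust policy that only allows the role to be assumed with that External ID. The function names, session name, ARNs, and External ID are hypothetical; how the connector itself calls STS is not documented here.

```python
from typing import Optional

# STS AssumeRole accepts DurationSeconds from 900 seconds up to the role's
# configured maximum session duration, which can be at most 43200 (12 hours).
MIN_DURATION_SECONDS = 900
MAX_DURATION_SECONDS = 43200


def assume_role_params(
    role_arn: str,
    external_id: Optional[str] = None,
    duration_seconds: int = 3600,
    session_name: str = "arthur-connector",  # hypothetical session name
) -> dict:
    """Keyword arguments for an STS AssumeRole call."""
    if not MIN_DURATION_SECONDS <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration_seconds must be between {MIN_DURATION_SECONDS} "
            f"and {MAX_DURATION_SECONDS}, got {duration_seconds}"
        )
    params = {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,
    }
    if external_id:
        params["ExternalId"] = external_id
    return params


def trust_policy(trusted_principal_arn: str, external_id: str) -> dict:
    """Role trust policy allowing AssumeRole only with the matching External ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Hypothetical ARNs and External ID, for illustration only.
params = assume_role_params(
    "arn:aws:iam::444455556666:role/arthur-s3-read",
    external_id="example-external-id",
)
policy = trust_policy(
    "arn:aws:iam::111122223333:role/connector-runtime", "example-external-id"
)
print(params)
```

With boto3 installed and credentials available, boto3.client("sts").assume_role(**params) would perform the call; it is left out so the sketch runs without AWS access.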
Dataset Locator
File Prefix
(required) - The prefix of the dataset's files in the bucket. This prefix should include strftime placeholders for year, month, day, and optionally hour. The Arthur platform uses this format string to efficiently look up data for specific time ranges when calculating metrics. An example value is /year=%Y/month=%m/day=%d/hour=%H/. Do not include the bucket name in the prefix, as it is already set in the connector configuration.
File Type
(required) - One of json or parquet. Specifies the format of the data files in the dataset.
File Suffix
(optional) - If the prefix contains multiple kinds of files, this option filters them by a suffix regex. For example, to limit the results to files ending in .json, set this value to .*\.json. The regex syntax follows Python's re module. If not set, no filtering is applied.
Timestamp time zone
(optional) - The time zone to use when populating the file prefix time placeholders. Defaults to UTC.
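The Timestamp time zone setting matters when files are partitioned by local time. As a minimal sketch, assuming the placeholders are rendered from an instant converted into the configured zone, the hypothetical helper below shows that the same UTC instant falls under different prefixes. A fixed UTC-4 offset stands in for New York daylight saving time so the example needs no time zone database; if the setting takes an IANA name, zoneinfo.ZoneInfo("America/New_York") would supply the zone and also handle daylight saving transitions.

```python
from datetime import datetime, timedelta, timezone, tzinfo

FILE_PREFIX = "/year=%Y/month=%m/day=%d/hour=%H/"

# Fixed offset standing in for America/New_York during daylight saving time.
EDT = timezone(timedelta(hours=-4), "EDT")


def prefix_at(instant: datetime, tz: tzinfo = timezone.utc) -> str:
    """Render FILE_PREFIX for an aware instant as seen in time zone tz."""
    if instant.tzinfo is None:
        raise ValueError("instant must be timezone-aware")
    return instant.astimezone(tz).strftime(FILE_PREFIX)


instant = datetime(2024, 5, 2, 2, 30, tzinfo=timezone.utc)
print(prefix_at(instant))       # /year=2024/month=05/day=02/hour=02/
print(prefix_at(instant, EDT))  # /year=2024/month=05/day=01/hour=22/
```

If the time zone is set incorrectly, lookups for a given time range land on prefixes offset by several hours, possibly on the wrong day.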
Arthur Shield Instance
The Arthur Shield connector allows users to monitor Generative AI models as tasks in an Arthur Shield instance.
Permissions
The Shield API key used by the connector requires the following role:
ORG-AUDITOR
- allows the connector to list the tasks and read task inferences
Configuration
The connector supports the following configuration:
Endpoint
(required) - the URL of the Arthur Shield instance, e.g. https://shield.arthur.ai
API Key
(required) - the Shield API key, which must have at least the ORG-AUDITOR role
Dataset Locator
Task ID
(required) - the UUID of the task to be monitored in Shield
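Before saving a Shield connector, it can help to check that the settings have the expected shape. The sketch below is a hypothetical pre-flight check, not an Arthur API: it verifies that the Endpoint is a full URL, that an API key is present, and that the Task ID parses as a UUID. It does not contact Shield or confirm that the key carries the ORG-AUDITOR role.

```python
import uuid
from urllib.parse import urlparse


def validate_shield_locator(endpoint: str, api_key: str, task_id: str) -> dict:
    """Hypothetical shape check for Shield connector settings and locator."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"Endpoint must be a full URL such as https://shield.arthur.ai, got {endpoint!r}"
        )
    if not api_key:
        raise ValueError("API Key is required")
    try:
        # uuid.UUID accepts upper- or lower-case hex; str() gives canonical form.
        canonical_task_id = str(uuid.UUID(task_id))
    except ValueError:
        raise ValueError(f"Task ID must be a UUID, got {task_id!r}") from None
    return {"endpoint": endpoint.rstrip("/"), "task_id": canonical_task_id}


settings = validate_shield_locator(
    "https://shield.arthur.ai/",
    "example-api-key",  # placeholder, not a real key
    "3F2504E0-4F89-11D3-9A0C-0305E82C3301",
)
print(settings)
```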