Batch Ingestion from S3

This guide outlines how to have Arthur ingest data from S3. This is an alternative to using the API or SDK to send files.

Obtain Credentials

First, Arthur must supply you with an IAM user that you can use to authenticate with S3. Reach out to your main point of contact at Arthur to obtain these credentials.

Create Prefix

Files must be placed under a particular prefix to be ingested properly. The format is slightly different for Inference files and Ground Truth files.

Inferences

prod-arthurai-inference-ingest/inference/batch/s3_ingestion/org={org_id}/model={model_id}/batch={batch_id}/file.parquet

Ground Truth

The Ground Truth prefix does not include a batch= segment, because Ground Truth files are not tied to a specific batch.

prod-arthurai-inference-ingest/ground_truth/batch/s3_ingestion/org={org_id}/model={model_id}/file.parquet

Prefix Parameters

You will need the following values to build the prefix under which your files will be saved in S3.

  • org_id

    • The ID for your organization

    • Returned in the response to the login API call

    • POST https://app.arthur.ai/api/v3/login {"login": "username", "password": "pw"}

  • model_id

    • The ID for the model you are uploading files for

    • Can be found in the URL when viewing a model in the UI

    • Also returned by GET https://app.arthur.ai/api/v3/models

  • batch_id

    • The ID for the batch you are sending files for

    • You choose this value

    • Ex: batch_20210112
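
The two prefix formats above can be assembled from these parameters with a pair of small helpers. This is a minimal sketch, not an official Arthur utility: the function names are hypothetical, and it assumes that the first path segment (prod-arthurai-inference-ingest) is the S3 bucket name and that everything after it is the object key.

```python
# Assumption: the leading segment of the documented path is the bucket name.
BUCKET = "prod-arthurai-inference-ingest"


def inference_key(org_id: str, model_id: str, batch_id: str, filename: str) -> str:
    """Build the object key for an Inference file (hypothetical helper)."""
    return (
        f"inference/batch/s3_ingestion/org={org_id}/model={model_id}/"
        f"batch={batch_id}/{filename}"
    )


def ground_truth_key(org_id: str, model_id: str, filename: str) -> str:
    """Build the object key for a Ground Truth file; there is no batch= segment."""
    return f"ground_truth/batch/s3_ingestion/org={org_id}/model={model_id}/{filename}"


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    print(f"{BUCKET}/" + inference_key("my-org", "my-model", "batch_20210112", "file.parquet"))
    print(f"{BUCKET}/" + ground_truth_key("my-org", "my-model", "file.parquet"))
```

Keeping the bucket separate from the key matches how most S3 clients expect the two values to be passed.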

Upload Inference Files

Once you know what prefix to send files to, the next step is to upload them.

After all files have been uploaded to the prefix, you must upload an empty file, named _SUCCESS, which marks the batch as complete.
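
The ordering described above (data files first, then the empty _SUCCESS marker) could be scripted roughly as follows. This is a sketch under stated assumptions: upload_inference_batch is a hypothetical helper, and s3_client is assumed to be a boto3 S3 client (or any object exposing the same upload_file and put_object methods) created with the credentials Arthur supplied.

```python
from pathlib import Path
from typing import Iterable, List


def upload_inference_batch(
    s3_client, bucket: str, batch_prefix: str, files: Iterable[str]
) -> List[str]:
    """Upload every file under batch_prefix, then write the _SUCCESS marker.

    Hypothetical helper. s3_client is assumed to behave like a boto3 S3
    client; batch_prefix is the key prefix ending in batch={batch_id}.
    Returns the keys written, in upload order.
    """
    prefix = batch_prefix.rstrip("/")
    written = []

    # Upload all data files first.
    for path in map(Path, files):
        key = f"{prefix}/{path.name}"
        s3_client.upload_file(str(path), bucket, key)
        written.append(key)

    # Only after every upload has returned, write the empty marker file
    # that marks the batch as complete.
    success_key = f"{prefix}/_SUCCESS"
    s3_client.put_object(Bucket=bucket, Key=success_key, Body=b"")
    written.append(success_key)
    return written
```

With boto3 installed, the client could be created with boto3.client("s3", aws_access_key_id=..., aws_secret_access_key=...) and passed in as s3_client. Because upload_file raises on failure, the marker is never written if any data file fails to upload.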

Upload Ground Truth Files

For Ground Truth files, you only need to upload them to the prefix. Ground Truth has no concept of a batch, so no _SUCCESS file is necessary.