Datasets (Engine)

Overview

To keep eval runs reproducible in the Arthur Engine, register a named dataset under a specific task, upload one or more versions of the data, and reference the dataset ID and version number in every evaluation run. Because datasets are scoped to a task and each version is immutable once created, two runs pointing at the same dataset version always operate on identical inputs, which gives you an apples-to-apples comparison across model changes, prompt updates, or scorer configurations.

This page covers engine-side datasets — task-scoped datasets used to power evaluations and benchmarks in the Arthur Engine. These are distinct from platform-side datasets used for model monitoring. For platform-side datasets, see the Datasets (Platform) page.

flowchart LR
    A["Raw test cases<br>or CSV"] --> B["Create Dataset<br>POST /datasets"]
    B --> C["Named Dataset<br>with dataset_id"]
    C --> D["Add a Version<br>POST /datasets/:id/versions"]
    D --> E["Versioned Dataset<br>v1, v2, ..."]
    E --> F["Reference in<br>Eval Run"]
    F --> G["Reproducible<br>Eval Results"]
📘

Engine base URL

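All Engine API calls on this page use http://localhost:3030 as the default base URL. The Python and JavaScript examples read ARTHUR_BASE_URL from your environment, so set it to override the default for staging or production deployments. The cURL examples hardcode the default, so edit the URL there.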
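The examples on this page repeat the same base URL and header setup. As a minimal sketch, that setup could be collected into one helper; the function name engine_settings is hypothetical and not part of any Arthur SDK.

```python
import os


def engine_settings(env=None):
    """Return (base_url, headers) for Engine calls from environment-style settings."""
    env = os.environ if env is None else env
    # Fall back to the documented local default when ARTHUR_BASE_URL is unset.
    base_url = env.get("ARTHUR_BASE_URL", "http://localhost:3030").rstrip("/")
    api_key = env.get("ARTHUR_API_KEY")
    if not api_key:
        raise RuntimeError("ARTHUR_API_KEY is not set")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return base_url, headers


# Passing a dict instead of os.environ keeps the example runnable anywhere.
base_url, headers = engine_settings({"ARTHUR_API_KEY": "example-key"})
print(base_url)
```

Failing fast on a missing key gives a clearer error than a rejected request later on.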


Prerequisites

Before you create your first dataset, make sure you have:

  • An Arthur Engine instance running and reachable (default: http://localhost:3030)
  • An API key with DATASET_WRITE permission — set as ARTHUR_API_KEY in your environment or passed directly
  • A task already created in the Engine — datasets are scoped to a task. See Tasks if you need to create one first
  • Your test cases ready as a list of column/value pairs, a CSV file, or a JSON array

Create a Dataset

The create-then-version workflow is the foundation of reproducible evals. You first register a named dataset under a task, then upload one or more versioned snapshots of the actual data. This separation lets you evolve your dataset over time without breaking references to earlier versions.

Step 1 — Register the dataset name

Call POST /api/v2/tasks/{task_id}/datasets to register a new dataset under your task. This creates the dataset record and returns a dataset_id you'll use in all subsequent calls.

import requests
import os

ARTHUR_BASE_URL = os.environ.get("ARTHUR_BASE_URL", "http://localhost:3030")
ARTHUR_API_KEY = os.environ["ARTHUR_API_KEY"]
TASK_ID = "your-task-id"

headers = {
    "Authorization": f"Bearer {ARTHUR_API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "name": "customer-support-qa-bench",
    "description": "Golden set of customer support Q&A pairs for regression testing",
}

response = requests.post(
    f"{ARTHUR_BASE_URL}/api/v2/tasks/{TASK_ID}/datasets",
    json=payload,
    headers=headers,
)
response.raise_for_status()
dataset = response.json()
dataset_id = dataset["id"]
print(f"Created dataset: {dataset_id}")
// JavaScript equivalent (run as an ES module so top-level await works)
const ARTHUR_BASE_URL = process.env.ARTHUR_BASE_URL ?? "http://localhost:3030";
const ARTHUR_API_KEY = process.env.ARTHUR_API_KEY;
const TASK_ID = "your-task-id";

const response = await fetch(
  `${ARTHUR_BASE_URL}/api/v2/tasks/${TASK_ID}/datasets`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ARTHUR_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "customer-support-qa-bench",
      description: "Golden set of customer support Q&A pairs for regression testing",
    }),
  }
);

if (!response.ok) throw new Error(`HTTP ${response.status}`);
const dataset = await response.json();
console.log("Created dataset:", dataset.id);
# cURL equivalent (replace {task_id} with your task ID)
curl -X POST http://localhost:3030/api/v2/tasks/{task_id}/datasets \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "customer-support-qa-bench",
    "description": "Golden set of customer support Q&A pairs for regression testing"
  }'

The response includes the dataset_id you'll need for the next step:

{
  "id": "ds_01hx9z3k2m4n5p6q7r8s9t0u",
  "task_id": "your-task-id",
  "name": "customer-support-qa-bench",
  "description": "Golden set of customer support Q&A pairs for regression testing",
  "created_at": 1717243200000,
  "updated_at": 1717243200000,
  "latest_version_number": null
}
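The created_at and updated_at values in the sample response look like Unix epoch timestamps in milliseconds. That reading is an assumption based on the sample value, not something this page states. Under that assumption, a small sketch for turning them into readable UTC datetimes (the helper name is hypothetical):

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms):
    """Convert an epoch-milliseconds integer to a timezone-aware UTC datetime."""
    # Assumption: timestamps are milliseconds since the Unix epoch.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


created = epoch_ms_to_utc(1717243200000)
print(created.isoformat())
```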

Step 2 — Upload your first version

With the dataset_id in hand, upload your actual test cases as the first version. Each version is an immutable snapshot of the data at that point in time.

Rows use a column/value format: each row is an object with a data array of {column_name, column_value} pairs. You define your own column names — there are no reserved field names.

import requests
import os

ARTHUR_BASE_URL = os.environ.get("ARTHUR_BASE_URL", "http://localhost:3030")
ARTHUR_API_KEY = os.environ["ARTHUR_API_KEY"]
DATASET_ID = "ds_01hx9z3k2m4n5p6q7r8s9t0u"

headers = {
    "Authorization": f"Bearer {ARTHUR_API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "rows_to_add": [
        {
            "data": [
                {"column_name": "input", "column_value": "How do I reset my password?"},
                {"column_name": "expected_output", "column_value": "Visit account settings and click 'Forgot Password'."},
                {"column_name": "category", "column_value": "account"},
            ]
        },
        {
            "data": [
                {"column_name": "input", "column_value": "What is your refund policy?"},
                {"column_name": "expected_output", "column_value": "We offer a 30-day money-back guarantee on all plans."},
                {"column_name": "category", "column_value": "billing"},
            ]
        },
    ]
}

response = requests.post(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}/versions",
    json=payload,
    headers=headers,
)
response.raise_for_status()
version = response.json()
print(f"Created version: {version['version_number']}")
// JavaScript equivalent
const ARTHUR_BASE_URL = process.env.ARTHUR_BASE_URL ?? "http://localhost:3030";
const ARTHUR_API_KEY = process.env.ARTHUR_API_KEY;
const DATASET_ID = "ds_01hx9z3k2m4n5p6q7r8s9t0u";

const response = await fetch(
  `${ARTHUR_BASE_URL}/api/v2/datasets/${DATASET_ID}/versions`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ARTHUR_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      rows_to_add: [
        {
          data: [
            { column_name: "input", column_value: "How do I reset my password?" },
            { column_name: "expected_output", column_value: "Visit account settings and click 'Forgot Password'." },
            { column_name: "category", column_value: "account" },
          ],
        },
        {
          data: [
            { column_name: "input", column_value: "What is your refund policy?" },
            { column_name: "expected_output", column_value: "We offer a 30-day money-back guarantee on all plans." },
            { column_name: "category", column_value: "billing" },
          ],
        },
      ],
    }),
  }
);

if (!response.ok) throw new Error(`HTTP ${response.status}`);
const version = await response.json();
console.log("Created version:", version.version_number);
# cURL equivalent (one row shown for brevity)
curl -X POST http://localhost:3030/api/v2/datasets/{dataset_id}/versions \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rows_to_add": [
      {
        "data": [
          {"column_name": "input", "column_value": "How do I reset my password?"},
          {"column_name": "expected_output", "column_value": "Visit account settings and click Forgot Password."},
          {"column_name": "category", "column_value": "account"}
        ]
      }
    ]
  }'
📘

Row schema

Each row's data array is a list of {column_name, column_value} pairs. You define the column names, so use whatever makes sense for your eval (e.g. input, expected_output, context, category). All values are strings. Column names are inferred from the first version and carried forward to later versions.
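If your test cases live in a CSV file, they need to be reshaped into this column/value format before you post them. The following is a sketch of one way to do that with the standard library; csv_to_rows is a hypothetical helper, and it assumes the first CSV line is a header row.

```python
import csv
import io


def csv_to_rows(csv_text):
    """Turn CSV text with a header row into a list of {"data": [...]} row objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        cells = [
            # DictReader yields strings; short rows yield None, mapped to "" here.
            {"column_name": name, "column_value": value if value is not None else ""}
            for name, value in record.items()
            if name is not None  # skip overflow cells that have no header
        ]
        rows.append({"data": cells})
    return rows


sample = (
    "input,expected_output,category\n"
    "How do I reset my password?,Visit account settings.,account\n"
)
payload = {"rows_to_add": csv_to_rows(sample)}
print(len(payload["rows_to_add"]))
```

The resulting payload dict has the same shape as the one passed to the versions endpoint in Step 2.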

UI: Create a dataset

In the Arthur dashboard, open your task and navigate to Datasets in the left navigation. Click + New Dataset, enter a name and optional description, then click Create.

Once created, open the dataset and either:

  • Import CSV — upload a CSV file; Arthur auto-detects the delimiter and maps columns
  • Add rows manually — enter values cell by cell in the table editor
  • Generate synthetic data — use the AI generation flow (see Generate Synthetic Data)

When you're done editing, click Save as New Version to commit the snapshot.


Version Your Datasets

Versioning is what makes eval runs reproducible. Every time you save a new snapshot of your test cases, the Engine assigns it an incrementing integer version number (1, 2, 3, …). Older versions are never modified or deleted when you add a new one.

How versioning works

flowchart TD
    DS["Dataset: customer-support-qa-bench<br>dataset_id: ds_01hx..."]
    DS --> V1["Version 1<br>50 rows — initial golden set<br>created: 2024-06-01"]
    DS --> V2["Version 2<br>75 rows — added billing cases<br>created: 2024-07-15"]
    DS --> V3["Version 3<br>75 rows — corrected 3 expected outputs<br>created: 2024-08-02"]
    V1 --> R1["Eval Run A<br>model: gpt-4o-mini"]
    V2 --> R2["Eval Run B<br>model: gpt-4o-mini"]
    V3 --> R3["Eval Run C<br>model: gpt-4-turbo"]

Because each eval run records the exact dataset_id + version_number it used, you can always re-run any historical configuration and get the same inputs.
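To make the versioning model concrete, here is a toy in-memory illustration of append-only, integer-numbered snapshots. It is not the Engine's implementation: the ToyDataset class and the "id" key on each row are hypothetical, chosen only to show that adding a version never changes an earlier one.

```python
class ToyDataset:
    """In-memory illustration of append-only, integer-numbered versions."""

    def __init__(self):
        self._versions = {}  # version_number -> tuple of row dicts

    @property
    def latest_version_number(self):
        return max(self._versions) if self._versions else None

    def add_version(self, rows_to_add=(), rows_to_delete=()):
        # Start from the latest snapshot (empty for the first version).
        base = self._versions.get(self.latest_version_number, ())
        doomed = set(rows_to_delete)
        kept = tuple(row for row in base if row["id"] not in doomed)
        # Copy incoming rows so later caller-side edits cannot leak in.
        snapshot = kept + tuple(dict(row) for row in rows_to_add)
        number = (self.latest_version_number or 0) + 1
        self._versions[number] = snapshot
        return number

    def rows(self, version_number):
        # Return copies so readers cannot mutate the stored snapshot.
        return [dict(row) for row in self._versions[version_number]]


dataset = ToyDataset()
v1 = dataset.add_version([{"id": "a"}, {"id": "b"}])
v2 = dataset.add_version([{"id": "c"}], rows_to_delete=["a"])
print(v1, v2, [row["id"] for row in dataset.rows(1)])
```

Version 1 still lists rows a and b after version 2 is created, which is the property that lets a historical eval run be repeated on the same inputs.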

UI: Browse version history

From the dataset detail view (Evals → Dataset → click a dataset), click the Versions button in the top-right header. This opens a side drawer listing all versions with timestamps. Click any version to switch to it — the row table updates to show that version's contents. The current version number is reflected in the URL as ?version=<number>.

Add a new version

New versions support incremental updates: add rows, delete rows by ID or by filter, and update existing rows — all in a single call. The Engine creates a new immutable snapshot reflecting the result.

import requests, os, json

ARTHUR_BASE_URL = os.environ.get("ARTHUR_BASE_URL", "http://localhost:3030")
ARTHUR_API_KEY = os.environ["ARTHUR_API_KEY"]
DATASET_ID = "ds_01hx9z3k2m4n5p6q7r8s9t0u"

with open("new_cases.json") as f:
    new_rows = json.load(f)  # list of {"data": [{column_name, column_value}, ...]}

response = requests.post(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}/versions",
    json={
        "rows_to_add": new_rows,
        "rows_to_delete": ["row-id-to-remove"],         # optional
        "rows_to_delete_filter": [                       # optional: delete by column value
            {"column_name": "category", "column_value": "deprecated"}
        ],
    },
    headers={
        "Authorization": f"Bearer {ARTHUR_API_KEY}",
        "Content-Type": "application/json",
    },
)
response.raise_for_status()
version = response.json()
print(f"New version: {version['version_number']}")
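// JavaScript equivalent (run as an ES module so import and top-level await work)
import { readFile } from "node:fs/promises";

const ARTHUR_BASE_URL = process.env.ARTHUR_BASE_URL ?? "http://localhost:3030";
const ARTHUR_API_KEY = process.env.ARTHUR_API_KEY;
const DATASET_ID = "ds_01hx9z3k2m4n5p6q7r8s9t0u";

// Same input file as the Python example: an array of { data: [{ column_name, column_value }, ...] }
const newRows = JSON.parse(await readFile("new_cases.json", "utf8"));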

const response = await fetch(
  `${ARTHUR_BASE_URL}/api/v2/datasets/${DATASET_ID}/versions`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ARTHUR_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      rows_to_add: newRows,
      rows_to_delete: ["row-id-to-remove"],
    }),
  }
);

if (!response.ok) throw new Error(`HTTP ${response.status}`);
const version = await response.json();
console.log("New version:", version.version_number);
# cURL equivalent
curl -X POST http://localhost:3030/api/v2/datasets/{dataset_id}/versions \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rows_to_add": [...],
    "rows_to_delete": ["row-id-to-remove"]
  }'

List all versions of a dataset

import requests, os

ARTHUR_BASE_URL = os.environ.get("ARTHUR_BASE_URL", "http://localhost:3030")
ARTHUR_API_KEY = os.environ["ARTHUR_API_KEY"]
DATASET_ID = "ds_01hx9z3k2m4n5p6q7r8s9t0u"

response = requests.get(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}/versions",
    headers={"Authorization": f"Bearer {ARTHUR_API_KEY}"},
)
response.raise_for_status()
for v in response.json()["versions"]:
    print(f"v{v['version_number']} — {v['column_names']} — created {v['created_at']}")
// JavaScript equivalent (reuses the constants from the earlier JavaScript examples)
const response = await fetch(
  `${ARTHUR_BASE_URL}/api/v2/datasets/${DATASET_ID}/versions`,
  { headers: { Authorization: `Bearer ${ARTHUR_API_KEY}` } }
);

if (!response.ok) throw new Error(`HTTP ${response.status}`);
const { versions } = await response.json();
versions.forEach((v) =>
  console.log(`v${v.version_number} — created ${v.created_at}`)
);
# cURL equivalent
curl "http://localhost:3030/api/v2/datasets/{dataset_id}/versions" \
  -H "Authorization: Bearer $ARTHUR_API_KEY"
⚠️

Versions are immutable

Once a version is created, its rows cannot be edited. To correct a mistake, create a new version with the corrected rows and update your eval runs to reference the new version number.
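Because a correction means creating a new version, one option is to delete the faulty row and add its corrected copy in the same request. The sketch below only builds the request body, using the rows_to_delete and rows_to_add fields shown earlier. build_correction_payload is a hypothetical helper, and you need to look up the row ID yourself because this page does not show which response field carries it.

```python
def build_correction_payload(row_id, corrected_values):
    """Build a versions payload that swaps one row for a corrected copy."""
    return {
        # Remove the faulty row from the new snapshot...
        "rows_to_delete": [row_id],
        # ...and add the corrected row in the same request.
        "rows_to_add": [
            {
                "data": [
                    {"column_name": name, "column_value": str(value)}
                    for name, value in corrected_values.items()
                ]
            }
        ],
    }


payload = build_correction_payload(
    "row-id-to-remove",
    {"input": "What is your refund policy?", "expected_output": "30-day money-back guarantee."},
)
print(payload["rows_to_delete"])
```

After posting this body to the versions endpoint, point your eval runs at the version number returned in the response.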


Browse and Inspect

Once you have datasets and versions, you'll want to list them, inspect their contents, and confirm the right data is in place before kicking off an eval run.

UI: Browse datasets

Navigate to Evals → Dataset in the left sidebar. The list view shows all datasets for the current task with name, latest version number, and last updated time. Use the search bar to filter by name.

Search datasets for a task

import requests, os

ARTHUR_BASE_URL = os.environ.get("ARTHUR_BASE_URL", "http://localhost:3030")
ARTHUR_API_KEY = os.environ["ARTHUR_API_KEY"]
TASK_ID = "your-task-id"

response = requests.get(
    f"{ARTHUR_BASE_URL}/api/v2/tasks/{TASK_ID}/datasets/search",
    params={"page": 0, "page_size": 20},
    headers={"Authorization": f"Bearer {ARTHUR_API_KEY}"},
)
response.raise_for_status()
for ds in response.json()["datasets"]:
    print(f"{ds['name']} — id: {ds['id']} — latest version: {ds['latest_version_number']}")
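// JavaScript equivalent (reuses ARTHUR_BASE_URL, ARTHUR_API_KEY, and TASK_ID from the earlier JavaScript examples)
const response = await fetch(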
  `${ARTHUR_BASE_URL}/api/v2/tasks/${TASK_ID}/datasets/search?page=0&page_size=20`,
  { headers: { Authorization: `Bearer ${ARTHUR_API_KEY}` } }
);

if (!response.ok) throw new Error(`HTTP ${response.status}`);
const { datasets } = await response.json();
datasets.forEach((ds) =>
  console.log(`${ds.name} — id: ${ds.id} — latest version: ${ds.latest_version_number}`)
);
# cURL equivalent
curl "http://localhost:3030/api/v2/tasks/{task_id}/datasets/search?page=0&page_size=20" \
  -H "Authorization: Bearer $ARTHUR_API_KEY"

You can filter by name substring or specific IDs:

# Filter by name
response = requests.get(
    f"{ARTHUR_BASE_URL}/api/v2/tasks/{TASK_ID}/datasets/search",
    params={"name": "qa-bench", "sort_by": "updated_at", "sort_order": "desc"},
    headers={"Authorization": f"Bearer {ARTHUR_API_KEY}"},
)
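The search endpoint is paginated through the page and page_size parameters, so listing every dataset means walking the pages. Here is a sketch of that loop, kept independent of the HTTP call so it runs without an Engine. iter_datasets and fetch_page are hypothetical names, and the stop condition (a page shorter than page_size is the last page) is an assumption, since this page does not describe how the API signals the final page.

```python
def iter_datasets(fetch_page, page_size=20):
    """Yield datasets across pages using a caller-supplied fetch_page(page, page_size)."""
    page = 0
    while True:
        batch = fetch_page(page, page_size)
        yield from batch
        # Assumption: a short (or empty) page means nothing is left to fetch.
        if len(batch) < page_size:
            return
        page += 1


# Stand-in for a function that calls the search endpoint and returns
# response.json()["datasets"] for the requested page.
fake_datasets = [{"name": f"bench-{i}"} for i in range(5)]


def fake_fetch_page(page, page_size):
    start = page * page_size
    return fake_datasets[start:start + page_size]


names = [ds["name"] for ds in iter_datasets(fake_fetch_page, page_size=2)]
print(names)
```

In real use, fetch_page would wrap the requests.get call shown above and pass page and page_size as query parameters.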

Fetch a specific version's rows

Use GET /api/v2/datasets/{dataset_id}/versions/{version_number} to retrieve the contents of a specific version. Pass the integer version number, or use the latest_version_only=true query parameter on the versions list to get the most recent snapshot.

import requests, os

ARTHUR_BASE_URL = os.environ.get("ARTHUR_BASE_URL", "http://localhost:3030")
ARTHUR_API_KEY = os.environ["ARTHUR_API_KEY"]
DATASET_ID = "ds_01hx9z3k2m4n5p6q7r8s9t0u"
VERSION = 1  # integer version number

response = requests.get(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}/versions/{VERSION}",
    headers={"Authorization": f"Bearer {ARTHUR_API_KEY}"},
)
response.raise_for_status()
data = response.json()
print(f"Version {data['version_number']} — columns: {data['column_names']}")
for row in data["rows"][:3]:
    print(row["data"])
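// JavaScript equivalent (reuses the constants from the earlier JavaScript examples)
const VERSION = 1; // integer version number

const response = await fetch(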
  `${ARTHUR_BASE_URL}/api/v2/datasets/${DATASET_ID}/versions/${VERSION}`,
  { headers: { Authorization: `Bearer ${ARTHUR_API_KEY}` } }
);

if (!response.ok) throw new Error(`HTTP ${response.status}`);
const data = await response.json();
console.log(`Version ${data.version_number} — columns: ${data.column_names}`);
data.rows.slice(0, 3).forEach((row) => console.log(row.data));
# cURL equivalent
curl "http://localhost:3030/api/v2/datasets/{dataset_id}/versions/{version_number}" \
  -H "Authorization: Bearer $ARTHUR_API_KEY"
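Rows in the column/value format are awkward to read or compare directly. Here is a sketch for flattening a fetched row into a plain dictionary. It assumes that rows in the response use the same data list of column_name/column_value pairs as the upload payload, and row_to_dict is a hypothetical helper name.

```python
def row_to_dict(row):
    """Flatten one {"data": [...]} row object into a {column_name: column_value} dict."""
    return {cell["column_name"]: cell["column_value"] for cell in row["data"]}


example_row = {
    "data": [
        {"column_name": "input", "column_value": "How do I reset my password?"},
        {"column_name": "category", "column_value": "account"},
    ]
}
flat = row_to_dict(example_row)
print(flat["category"])
```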

To get the latest version without knowing its number:

response = requests.get(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}/versions",
    params={"latest_version_only": "true"},
    headers={"Authorization": f"Bearer {ARTHUR_API_KEY}"},
)
response.raise_for_status()  # fail loudly before reading the body
latest = response.json()["versions"][0]
print(f"Latest is v{latest['version_number']}")
📘

Tip: Pin versions in production

During active development, fetching with latest_version_only=true means you always test against the most current dataset. Once a benchmark is stable, pin to a specific version number so results remain comparable across runs.
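The pin-or-latest choice can be made explicit in your own tooling. The sketch below assumes you have already fetched the versions list shown earlier, where each entry has a version_number. resolve_version is a hypothetical helper, not an Engine API.

```python
def resolve_version(versions, pinned=None):
    """Return the version number an eval should use: the pin if set, else the newest."""
    numbers = [v["version_number"] for v in versions]
    if pinned is not None:
        # A pin that does not exist should stop the run rather than fall back silently.
        if pinned not in numbers:
            raise ValueError(f"Pinned version {pinned} does not exist")
        return pinned
    if not numbers:
        raise ValueError("Dataset has no versions yet")
    return max(numbers)


versions = [{"version_number": 1}, {"version_number": 3}, {"version_number": 2}]
print(resolve_version(versions), resolve_version(versions, pinned=2))
```

Raising on a missing pin is a deliberate choice here: quietly substituting the latest version would defeat the purpose of pinning.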


Generate Synthetic Data

If your team doesn't have labeled examples yet, Arthur can generate synthetic test cases using an LLM.

UI: Generate synthetic data

From the dataset detail view (Evals → Dataset → click a dataset), click the Generate button in the header. This opens a two-phase modal:

  1. Configure — describe the dataset's purpose, define each column, set the number of rows (max 25), and choose a model
  2. Canvas — a chat interface where you review the generated rows and send follow-up messages to refine them (add more rows, adjust outputs, change categories, etc.)

When satisfied, confirm to add the rows to the current dataset, then save as a new version to commit them.

Synthetic generation is tied to an existing dataset version. The same two-phase flow is available through the API: start a session against a version, then send follow-up messages to refine, add, or remove rows before saving.

Start a generation session

import requests, os

ARTHUR_BASE_URL = os.environ.get("ARTHUR_BASE_URL", "http://localhost:3030")
ARTHUR_API_KEY = os.environ["ARTHUR_API_KEY"]
DATASET_ID = "ds_01hx9z3k2m4n5p6q7r8s9t0u"
VERSION = 1  # version to base generation on

payload = {
    "dataset_purpose": "Customer support Q&A pairs for testing an AI assistant's ability to handle account and billing questions",
    "column_descriptions": [
        {"column_name": "input", "description": "A customer question about account management or billing"},
        {"column_name": "expected_output", "description": "The correct, concise answer an agent should give"},
        {"column_name": "category", "description": "One of: account, billing, cancellation"},
    ],
    "num_rows": 10,           # max 25 per request
    "model_provider": "openai",
    "model_name": "gpt-4o",
}

response = requests.post(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}/versions/{VERSION}/generate-synthetic",
    json=payload,
    headers={
        "Authorization": f"Bearer {ARTHUR_API_KEY}",
        "Content-Type": "application/json",
    },
)
response.raise_for_status()
result = response.json()
print(f"Generated {len(result['rows'])} rows")
print(result["assistant_message"]["content"])
# cURL equivalent
curl -X POST \
  "http://localhost:3030/api/v2/datasets/{dataset_id}/versions/{version_number}/generate-synthetic" \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_purpose": "Customer support Q&A pairs for testing an AI assistant",
    "column_descriptions": [
      {"column_name": "input", "description": "A customer question"},
      {"column_name": "expected_output", "description": "The correct answer"}
    ],
    "num_rows": 10,
    "model_provider": "openai",
    "model_name": "gpt-4o"
  }'
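Since num_rows is capped at 25 per request, a larger target has to be split across several requests. Here is a sketch of the arithmetic only. batch_sizes is a hypothetical helper, and merging the rows from separate requests before committing them is left to you.

```python
def batch_sizes(total_rows, max_per_request=25):
    """Split a target row count into per-request num_rows values."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    full_batches, remainder = divmod(total_rows, max_per_request)
    # Full batches first, then one smaller batch for whatever is left over.
    return [max_per_request] * full_batches + ([remainder] if remainder else [])


print(batch_sizes(60))
```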

Refine with follow-up messages

# Continue the conversation to refine the generated rows
refine_response = requests.post(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}/versions/{VERSION}/generate-synthetic/message",
    json={
        "message": "Add 5 more rows focused on password reset edge cases, and make the expected outputs more concise",
        "current_rows": result["rows"],
        "conversation_history": [result["assistant_message"]],
        "model_provider": "openai",
        "model_name": "gpt-4o",
    },
    headers={
        "Authorization": f"Bearer {ARTHUR_API_KEY}",
        "Content-Type": "application/json",
    },
)
refine_response.raise_for_status()  # surface HTTP errors before reading the body
refined = refine_response.json()
print(f"Rows added: {len(refined['rows_added'])}, modified: {len(refined['rows_modified'])}")

Promote synthetic rows into a version

The generation endpoints return rows but do not automatically create a dataset version. Review the output, then commit it:

curated_rows = refined["rows"]  # inspect and filter as needed

version_response = requests.post(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}/versions",
    json={"rows_to_add": curated_rows},
    headers={
        "Authorization": f"Bearer {ARTHUR_API_KEY}",
        "Content-Type": "application/json",
    },
)
version_response.raise_for_status()
print(f"Saved as version {version_response.json()['version_number']}")
🚧

Review synthetic data before using it in benchmarks

Synthetic examples can contain factual errors or unrealistic edge cases. Always review a sample before promoting synthetic rows into a dataset version you'll use for official benchmarks.
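Some of that review can be automated before a human reads the sample. The sketch below flags rows with missing columns, blank values, or exact duplicates. find_row_problems is a hypothetical helper, it assumes generated rows use the same data list of column/value pairs as uploads, and it cannot catch factual errors.

```python
def find_row_problems(rows, expected_columns):
    """Return (row_index, problem) tuples for rows that need a human look."""
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        values = {c["column_name"]: c["column_value"] for c in row["data"]}
        missing = [name for name in expected_columns if name not in values]
        if missing:
            problems.append((index, f"missing columns: {missing}"))
        empty = [name for name, value in values.items() if not str(value).strip()]
        if empty:
            problems.append((index, f"empty values: {empty}"))
        # Column names are unique per row, so sorting by name is well defined.
        fingerprint = tuple(sorted(values.items()))
        if fingerprint in seen:
            problems.append((index, "duplicate of an earlier row"))
        seen.add(fingerprint)
    return problems


good = {"data": [
    {"column_name": "input", "column_value": "How do I cancel?"},
    {"column_name": "expected_output", "column_value": "Go to Billing and click Cancel."},
]}
blank = {"data": [{"column_name": "input", "column_value": " "}]}
problems = find_row_problems([good, good, blank], ["input", "expected_output"])
for index, problem in problems:
    print(index, problem)
```

Rows that pass these checks still need a manual read for accuracy and realism.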


Update and Delete

UI: Update or delete a dataset

From the Evals → Dataset list view, each row has action buttons:

  • Edit icon — opens the edit modal to update the dataset name or description
  • Delete icon — shows a confirmation dialog, then permanently deletes the dataset and all its versions

Update dataset metadata

You can update a dataset's name, description, or metadata at any time without affecting its versions or any eval runs that reference it by ID.

import requests, os

ARTHUR_BASE_URL = os.environ.get("ARTHUR_BASE_URL", "http://localhost:3030")
ARTHUR_API_KEY = os.environ["ARTHUR_API_KEY"]
DATASET_ID = "ds_01hx9z3k2m4n5p6q7r8s9t0u"

response = requests.patch(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}",
    json={"description": "Golden set v2 — expanded to 75 cases including billing edge cases"},
    headers={
        "Authorization": f"Bearer {ARTHUR_API_KEY}",
        "Content-Type": "application/json",
    },
)
response.raise_for_status()
print("Dataset updated:", response.json())
// JavaScript equivalent (reuses the constants from the earlier JavaScript examples)
const response = await fetch(
  `${ARTHUR_BASE_URL}/api/v2/datasets/${DATASET_ID}`,
  {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${ARTHUR_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      description: "Golden set v2 — expanded to 75 cases including billing edge cases",
    }),
  }
);

if (!response.ok) throw new Error(`HTTP ${response.status}`);
console.log("Dataset updated:", await response.json());
# cURL equivalent
curl -X PATCH http://localhost:3030/api/v2/datasets/{dataset_id} \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"description": "Golden set v2 — expanded to 75 cases including billing edge cases"}'

Delete a dataset

Deleting a dataset removes it and all its versions permanently. Eval runs that previously referenced this dataset will retain their recorded results, but you will no longer be able to re-run them against the original data.

response = requests.delete(
    f"{ARTHUR_BASE_URL}/api/v2/datasets/{DATASET_ID}",
    headers={"Authorization": f"Bearer {ARTHUR_API_KEY}"},
)
response.raise_for_status()
print("Dataset deleted.")
// JavaScript equivalent (reuses the constants from the earlier JavaScript examples)
const response = await fetch(
  `${ARTHUR_BASE_URL}/api/v2/datasets/${DATASET_ID}`,
  {
    method: "DELETE",
    headers: { Authorization: `Bearer ${ARTHUR_API_KEY}` },
  }
);

if (!response.ok) throw new Error(`HTTP ${response.status}`);
console.log("Dataset deleted.");
# cURL equivalent
curl -X DELETE "http://localhost:3030/api/v2/datasets/{dataset_id}" \
  -H "Authorization: Bearer $ARTHUR_API_KEY"

Deletion is permanent

There is no soft-delete or recycle bin. If you need to retire a dataset without losing access to its data, consider updating its name to include a [DEPRECATED] prefix instead of deleting it.
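The rename-instead-of-delete approach can be scripted. The sketch below builds a body for the PATCH call shown under Update dataset metadata. deprecation_patch is a hypothetical helper, and it assumes the PATCH body accepts a name field in the same way the create payload does.

```python
DEPRECATED_PREFIX = "[DEPRECATED] "


def deprecation_patch(current_name):
    """Build a PATCH body that marks a dataset name as deprecated, idempotently."""
    # Leave already-marked names alone so repeated runs do not stack prefixes.
    if current_name.startswith(DEPRECATED_PREFIX.strip()):
        return {"name": current_name}
    return {"name": DEPRECATED_PREFIX + current_name}


print(deprecation_patch("customer-support-qa-bench"))
```

Eval runs that reference the dataset by ID are unaffected by the rename.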


Next Steps

Now that you have a named, versioned dataset, you're ready to put it to work:

| What to do next | Where to go |
| --- | --- |
| Run an evaluation against your dataset | Evaluation Runs |
| Configure scorers to grade your model's outputs | Scorers |
| Set up automated eval pipelines in CI | CI/CD Integration |
| Manage tasks that datasets are scoped to | Tasks |
| Learn about platform-side datasets for model monitoring | Datasets (Platform) |

Summary of what you did on this page:

  1. Registered a named dataset under a task with POST /api/v2/tasks/{task_id}/datasets
  2. Uploaded test cases as version 1 with POST /api/v2/datasets/{dataset_id}/versions
  3. Learned how to add new versions with incremental row updates
  4. Searched for datasets via GET /api/v2/tasks/{task_id}/datasets/search and inspected version contents via GET /api/v2/datasets/{dataset_id}/versions/{version_number}
  5. Generated synthetic test cases using the conversational generation API
  6. Updated dataset metadata and deleted datasets when needed