Prompt Experiments Workflow

This section describes how to use Prompt Experiments to test prompt versions against datasets and score their outputs with evaluators.

Common Workflows

Workflow 1: Create and Iterate Experiment from Scratch

This workflow is ideal when you're starting fresh and want to test new prompts.

Step 1: Create a New Experiment

  1. Navigate to the Prompt Experiments page for your task

  2. Click "Create Experiment". It will open the configuration form.

  3. Enter a name and optional description

Step 2: Select Prompts

  1. Choose a saved prompt from the dropdown

  2. Select one or more versions of that prompt to test (e.g., Prompt 1 v1, Prompt 1 v2, Prompt 1 v3)

Step 3: Select Dataset

  1. Choose a dataset from the dropdown

  2. Select a dataset version

  3. (Optional) Add row filters to test on a subset (see the sketch below)
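
A row filter conceptually keeps only the dataset rows that match a condition. As a rough illustration in Python terms (the column names and the filter syntax the UI accepts are made-up examples, not the platform's own):

  rows = [
      {"language": "en", "difficulty": "easy"},
      {"language": "de", "difficulty": "hard"},
      {"language": "en", "difficulty": "hard"},
  ]

  # Keep only English, hard rows for this experiment run.
  subset = [r for r in rows if r["language"] == "en" and r["difficulty"] == "hard"]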

Step 4: Select Evaluators

  1. Add one or more evaluators to score prompt outputs

  2. Select the version of each evaluator you want to use, then click "+ ADD"

Step 5: Map Prompt Variables

  1. For each variable required by your prompts, select the corresponding dataset column

  2. The system validates that all required variables are mapped (sketched below)
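
Conceptually, the mapping ties each variable placeholder in the prompt to a dataset column, and the experiment cannot be created until every placeholder is covered. A minimal sketch of that check, assuming {{name}}-style placeholders (the platform's actual template syntax may differ):

  import re

  # Hypothetical prompt template with two variables.
  prompt_template = "Answer the question: {{question}}\nContext: {{context}}"

  # The mapping chosen in this step: prompt variable -> dataset column.
  variable_mapping = {
      "question": "question_text",
      "context": "source_document",
  }

  required = set(re.findall(r"\{\{(\w+)\}\}", prompt_template))
  missing = required - variable_mapping.keys()
  if missing:
      raise ValueError(f"Unmapped prompt variables: {sorted(missing)}")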

Step 6: Configure Evaluators

  1. For each evaluator, map its required variables:
    • Map dataset columns to evaluator variables
    • Or map prompt outputs to evaluator variables using JSON paths (see the sketch below)
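
For example, if the prompt returns a JSON object, a JSON path picks out the piece to feed into an evaluator variable. A small sketch of how such a path resolves, assuming simple dot notation (the exact path syntax the platform accepts may differ):

  # Hypothetical prompt output for one test case.
  prompt_output = {
      "content": "Paris is the capital of France.",
      "tool_calls": [],
      "usage": {"cost_usd": 0.0004},
  }

  def resolve(path, obj):
      """Resolve a dot-delimited path like 'usage.cost_usd'."""
      for key in path.split("."):
          obj = obj[key]
      return obj

  # Feed the answer text (not the whole output object) into the
  # evaluator's "answer" variable.
  evaluator_inputs = {"answer": resolve("content", prompt_output)}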

Step 7: Create Experiment

  1. Review your configuration summary

  2. Click "Create Experiment" to start execution

  3. The experiment runs asynchronously - results will populate automatically, and you can navigate away and return later
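
Under the hood, asynchronous execution means the UI is effectively polling for status on your behalf. As a purely hypothetical illustration of that loop (the platform exposes this through the UI only; get_status here is a stand-in, not a documented API):

  import time

  TERMINAL = {"Completed", "Failed"}

  def wait_for_experiment(experiment_id, get_status, interval_s=10):
      # Poll until the experiment reaches a terminal status.
      while True:
          status = get_status(experiment_id)
          print(f"experiment {experiment_id}: {status}")
          if status in TERMINAL:
              return status
          time.sleep(interval_s)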

Step 8: Analyze Results

  1. View the experiment detail page to see results as they populate

  2. Review summary statistics and per-prompt performance once complete

  3. Click on individual prompts to see detailed results

  4. Click on test cases to see inputs, outputs, and evaluation scores

Step 9: Iterate

  1. Clone the experiment by clicking "Copy to new experiment" to create a new version with modifications

  2. Or create a new experiment based on what you learned

Workflow 2: Load Existing Experiment Configuration with New Prompts

This workflow is useful when you want to test new prompts using a proven experiment setup.

Step 1: Start from Existing Experiment

  1. Navigate to the Prompt Experiments page.

  2. Click the "Create from Existing" button.

  3. Select the existing experiment you want to base the new experiment on.

  4. The experiment configuration will be pre-filled.

Step 2: Modify Prompts

  1. Add new prompts (saved or unsaved) to test alongside existing ones
  2. Remove prompts you no longer want to test
  3. Keep the same dataset, evaluators, and mappings

Step 3: Adjust as Needed

  1. Review variable mappings (may need updates if new prompts have different variables)
  2. Verify evaluator configurations still make sense
  3. Update experiment name and description

Step 4: Run and Compare

  1. Create the new experiment
  2. Compare results with the original experiment to see how new prompts perform

Workflow 3: Deep Dive into a Single Prompt

This workflow helps you understand how a specific prompt performs across all test cases.

Step 1: Open Experiment Results

  1. Navigate to a completed experiment
  2. Review the summary statistics

Step 2: Select a Prompt

  1. Click on a prompt card in the summary view to see detailed results for all test cases
  2. Or click "Open in Notebook" on the prompt card to open the prompt in a notebook with the experiment configuration (dataset, evaluators, mappings) pre-loaded, allowing you to iterate directly

Step 3: Analyze Performance

  1. Review evaluation performance metrics (pass rates, scores)
  2. Browse test case results in the table
  3. Use pagination to navigate through all test cases

Step 4: Inspect Individual Test Cases

  1. Click on a test case row to see full details
  2. Review:
    • Input variables used
    • Rendered prompt (with variables filled in)
    • Prompt output (content, tool calls, cost)
    • Evaluation results (scores, explanations)
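
Taken together, those details amount to a per-test-case record. A sketch of its shape (field names are illustrative, not the platform's schema):

  from dataclasses import dataclass, field

  @dataclass
  class EvaluationResult:
      evaluator: str     # e.g. an evaluator name and version
      score: float       # numeric score, or pass/fail as 1.0/0.0
      explanation: str   # the evaluator's reasoning

  @dataclass
  class TestCaseResult:
      input_variables: dict   # dataset values substituted into the prompt
      rendered_prompt: str    # prompt text with variables filled in
      output_content: str     # model response text
      tool_calls: list = field(default_factory=list)
      cost_usd: float = 0.0
      evaluations: list = field(default_factory=list)  # EvaluationResult items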

Step 5: Identify Patterns

  1. Filter or search for specific patterns
  2. Look for common failure modes
  3. Identify inputs where the prompt excels or struggles

Step 6: Take Action

  1. Open in Notebook: Click "Open in Notebook" on the prompt card (as in Step 2) to iterate on the prompt directly in a notebook, with the dataset, evaluators, and variable mappings from the experiment preserved
  2. Use insights to refine prompts
  3. Create new experiments to test improvements
  4. Document findings for your team

Understanding Experiment Status

Experiments progress through several statuses:

  • Queued: Experiment is waiting to start execution
  • Running: Experiment is actively executing prompts and evaluations
  • Completed: All test cases finished successfully
  • Failed: Experiment encountered an error and stopped

Individual test cases also have statuses:

  • Queued: Waiting to be processed
  • Running: Prompt execution in progress
  • Evaluating: Evaluations running on prompt output
  • Completed: Test case finished successfully
  • Failed: Test case encountered an error

You can monitor progress in real time - the UI auto-refreshes while experiments are running.
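
Put together, the two lifecycles form small state machines. A sketch (the enums are illustrative, not a published API; the happy-path order follows the descriptions above, and Failed can be entered from any non-terminal state):

  from enum import Enum

  class ExperimentStatus(Enum):
      QUEUED = "Queued"
      RUNNING = "Running"
      COMPLETED = "Completed"
      FAILED = "Failed"

  class TestCaseStatus(Enum):
      QUEUED = "Queued"
      RUNNING = "Running"        # prompt execution in progress
      EVALUATING = "Evaluating"  # evaluators scoring the prompt output
      COMPLETED = "Completed"
      FAILED = "Failed"

  # Happy path for a single test case:
  TEST_CASE_ORDER = [
      TestCaseStatus.QUEUED,
      TestCaseStatus.RUNNING,
      TestCaseStatus.EVALUATING,
      TestCaseStatus.COMPLETED,
  ]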

Best Practices

  • Iterate Incrementally: Make small changes and test them systematically rather than large overhauls
  • Compare Systematically: Test multiple prompt versions in the same experiment for fair comparison
  • Review Explanations: Don't just look at pass/fail - read evaluator explanations to understand why prompts succeed or fail
  • Document Findings: Use experiment descriptions to note what changed and what you learned

Relationship to Notebooks

Prompt Experiments can be created from and linked to Prompt Notebooks. Notebooks provide a workspace for iterating on prompt configurations before running experiments. Key points:

  • Notebooks are draft workspaces where you can develop and test prompts
  • Experiments are formal runs that test prompts against datasets
  • You can create experiments from notebook configurations
  • Experiments can be linked back to notebooks for organization