Evals Management

This section describes how to create and manage evaluators on the Evals Management page.

Overview

An evaluator is an automated test that scores prompt outputs. Evaluators can check for:

  • Quality metrics (e.g., correctness, relevance, completeness)
  • Safety checks (e.g., toxicity, bias)
  • Custom criteria defined by your team

Each evaluator requires specific input variables, which can come from:

  • Dataset columns: Static values from your test data
  • Experiment output: Values extracted from prompt outputs
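The mapping between input variables and their two sources can be sketched as follows. This is an illustrative sketch, not the product's actual API; the function and source names (`resolve_inputs`, `"dataset"`, `"experiment"`) are assumptions.

```python
# Hypothetical sketch: resolving an evaluator's input variables from the
# two sources described above. Names are illustrative, not a real API.

def resolve_inputs(variable_sources, dataset_row, experiment_output):
    """Build the evaluator's input dict from its configured sources."""
    inputs = {}
    for name, source in variable_sources.items():
        if source == "dataset":
            # static value taken from a column of the test data
            inputs[name] = dataset_row[name]
        elif source == "experiment":
            # value extracted from the prompt's output for this test case
            inputs[name] = experiment_output[name]
        else:
            raise ValueError(f"unknown source: {source}")
    return inputs

sources = {"expected_answer": "dataset", "response": "experiment"}
row = {"expected_answer": "Paris"}
output = {"response": "Paris"}
print(resolve_inputs(sources, row, output))
```

Each variable is looked up in exactly one place, so a test case fails fast if a configured column or output key is missing.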

Example:

An evaluator called "Answer Correctness" checks whether the prompt's answer matches the expected answer. It requires two input variables:

  • response: The prompt's output
  • expected_answer: The correct answer

For each test case, it compares the response to the expected answer and returns a pass/fail score.
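The comparison described above can be sketched as a small function. This is a minimal illustration, not the product's implementation; the case-insensitive, whitespace-trimmed comparison and the score shape are assumptions.

```python
# Hypothetical sketch of an "Answer Correctness" evaluator: compares the
# prompt's response to the expected answer and returns a pass/fail score.
# Normalization (strip + casefold) is an assumption for illustration.

def answer_correctness(response: str, expected_answer: str) -> dict:
    passed = response.strip().casefold() == expected_answer.strip().casefold()
    # score is 1.0 on a match, 0.0 otherwise
    return {"score": 1.0 if passed else 0.0, "passed": passed}

print(answer_correctness("Paris", " paris "))
print(answer_correctness("Lyon", "Paris"))
```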

Create Evaluator

  1. Navigate to the Evals Management page.

  2. Click the "+ New Evaluator" button. The configuration form will open.

  3. Enter a name for the evaluator, or select a template or an existing evaluator.

  4. Enter the instructions. Note: the instructions are filled in automatically if you select a template or an existing evaluator.

  5. Select the model provider and model name.

  6. Click the "Save Eval" button.

  7. Once the evaluator is created, you will be redirected to its details page.

  8. From the details page you can edit the evaluator, add tags to it, or delete it.
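The fields collected by the form in the steps above can be summarized as a simple record. This is a hypothetical sketch: the field names and example values (provider, model) are assumptions, not the product's actual schema.

```python
# Hypothetical summary of the evaluator configuration gathered by the form.
# Field names and example values are illustrative assumptions.

evaluator_config = {
    "name": "Answer Correctness",
    "instructions": "Compare the response to the expected answer.",
    "model_provider": "openai",   # assumption: example value
    "model_name": "gpt-4o",       # assumption: example value
    "tags": ["qa", "accuracy"],   # tags can be added from the details page
}

# The form requires name, instructions, provider, and model name before saving.
required = {"name", "instructions", "model_provider", "model_name"}
missing = required - set(evaluator_config)
assert not missing, f"missing fields: {missing}"
print("config valid")
```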