Arthur + CrewAI

How does a developer instrument a CrewAI application with Arthur in under 10 minutes? You install the SDK with the crewai extra, initialize an Arthur instance, and call arthur.instrument_crewai() before creating any agents or crews. This single method call auto-instruments all CrewAI agent actions, tool calls, and LLM invocations — no manual span creation required. Traces flow automatically to the Arthur platform where you can inspect every step of your multi-agent workflows.


Overview

CrewAI is a framework for orchestrating autonomous AI agents that collaborate on complex tasks. Arthur's CrewAI integration uses OpenInference auto-instrumentation to capture the full execution tree — from the top-level crew kickoff down to individual LLM completions — as structured OpenTelemetry traces. Once instrumented, you get full visibility into:

  • Agent actions: every agent invocation with role, goal, and backstory
  • Tool calls: input arguments and return values for each tool
  • LLM completions: prompts, responses, model parameters, and token counts
  • Crew orchestration: sequential or hierarchical agent execution flow
  • Session and user context: group traces by conversation or end user

sequenceDiagram
    participant App as Your Application
    participant SDK as Arthur SDK
    participant Crew as CrewAI
    participant Engine as Arthur GenAI Engine

    App->>SDK: arthur.instrument_crewai()
    Note over SDK: Auto-instrumentation enabled
    App->>Crew: crew.kickoff()
    Crew-->>App: Result
    SDK->>Engine: Trace (spans, attributes)
    Note over Engine: Traces visible in dashboard

Prerequisites:

  • Python 3.10+
  • An Arthur GenAI Engine instance (cloud or local)
  • An Arthur API key — see API Keys to create one

Installation

Install the Arthur Observability SDK with the crewai extra:

pip install "arthur-observability-sdk[crewai]"

This pulls in openinference-instrumentation-crewai and its dependencies automatically.
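To confirm the install worked before wiring anything up, you can check that both distributions are visible to the interpreter. This is a minimal sketch using only the standard library; the helper name `missing_distributions` is illustrative and is not part of the Arthur SDK.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Distribution names taken from the pip command and the note above.
    required = [
        "arthur-observability-sdk",
        "openinference-instrumentation-crewai",
    ]
    absent = missing_distributions(required)
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All required packages are installed.")
```

If either name is reported as missing, rerun the pip command in the same virtual environment that runs your application.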


Initialize Arthur

Create a single Arthur instance at application startup.

from arthur_observability_sdk import Arthur

arthur = Arthur(
    api_key="your-api-key",        # or set ARTHUR_API_KEY env var
    base_url="https://your-arthur-engine-instance",  # or set ARTHUR_BASE_URL env var
    task_id="<your-task-uuid>",    # Arthur task UUID
    service_name="my-crewai-app",
)
| Parameter | Description |
| --- | --- |
| api_key | Your Arthur Engine API key. Falls back to the ARTHUR_API_KEY env var. |
| base_url | Base URL of your Arthur GenAI Engine. Falls back to the ARTHUR_BASE_URL env var, then http://localhost:3030. |
| task_id | Arthur task UUID for associating traces with a specific task. |
| service_name | OpenTelemetry service.name resource attribute. Used to identify your application in the Arthur dashboard. If task_id isn't specified, a new task is created based on service_name. |

Note: At least one of task_id or service_name must be provided. If task_id is not specified, a new task named after service_name is created.

Warning: Use environment variables for secrets. Set ARTHUR_API_KEY and ARTHUR_BASE_URL as environment variables (e.g., in a .env file) rather than hardcoding them in your application.
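One way to follow this advice is to collect the constructor arguments from the environment and fail early when something is missing. The sketch below is an assumption-laden helper, not SDK code: `arthur_settings` is a hypothetical name, and the only facts it relies on are the environment variable names, the localhost fallback, and the task_id/service_name rule described above.

```python
import os

DEFAULT_BASE_URL = "http://localhost:3030"  # fallback documented above


def arthur_settings(env=None, task_id=None, service_name=None):
    """Build keyword arguments for Arthur(...) from environment variables.

    Hypothetical helper: it validates early instead of relying on the
    SDK's own fallbacks.
    """
    env = os.environ if env is None else env
    api_key = env.get("ARTHUR_API_KEY")
    if not api_key:
        raise RuntimeError("ARTHUR_API_KEY is not set")
    if not (task_id or service_name):
        raise ValueError("Provide at least one of task_id or service_name")

    settings = {
        "api_key": api_key,
        "base_url": env.get("ARTHUR_BASE_URL", DEFAULT_BASE_URL),
    }
    # Only pass the identifiers that were actually supplied.
    if task_id:
        settings["task_id"] = task_id
    if service_name:
        settings["service_name"] = service_name
    return settings
```

With this in place, initialization could look like `arthur = Arthur(**arthur_settings(service_name="my-crewai-app"))`, and no secret appears in source control.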


Instrument CrewAI

Call arthur.instrument_crewai() before you create any Agent, Task, or Crew objects. The instrumentor patches CrewAI classes when this method runs, so any objects created after the call are traced automatically.

from crewai import Agent, Task, Crew, Process
from arthur_observability_sdk import Arthur

# 1. Initialize Arthur
arthur = Arthur(
    api_key="your-api-key",
    base_url="https://your-arthur-engine-instance",
    task_id="<your-task-uuid>",
    service_name="my-crewai-app",
)

# 2. Instrument CrewAI — MUST come before creating agents/crews
arthur.instrument_crewai()

# 3. Now define your agents and tasks
researcher = Agent(
    role="Senior Researcher",
    goal="Uncover insights on a given topic",
    backstory="You're a meticulous analyst.",
)

research_task = Task(
    description="Research the latest in AI observability.",
    expected_output="A short report.",
    agent=researcher,
)

crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    process=Process.sequential,
)

# 4. Run the crew — traces are captured automatically
result = crew.kickoff()
print(result)

# 5. Shut down cleanly to flush pending spans
arthur.shutdown()

Key points:

  • instrument_crewai() must be called before creating any Agent, Task, or Crew objects.
  • All CrewAI agents, tools, and LLM calls are automatically traced — no decorator or wrapper needed.
  • Call arthur.shutdown() when your application exits to flush any remaining traces.

Warning: Order matters. If you create agents or crews before calling instrument_crewai(), those objects will not be traced. Always instrument first.
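If the crew raises an exception, a trailing arthur.shutdown() call is never reached and pending spans may be lost. A small context manager can guarantee the flush. This is a sketch: `flushed_on_exit` is a hypothetical helper name, and it assumes only that the client exposes the shutdown() method shown above.

```python
from contextlib import contextmanager


@contextmanager
def flushed_on_exit(client):
    """Yield `client`, then call client.shutdown() even if the body raises."""
    try:
        yield client
    finally:
        client.shutdown()


# Intended usage with the objects from the example above:
#
# with flushed_on_exit(arthur):
#     arthur.instrument_crewai()
#     crew = Crew(agents=[researcher], tasks=[research_task])
#     result = crew.kickoff()
```

The try/finally form keeps the flush next to the code that produces spans, which is easier to audit than a shutdown call at the bottom of a long script.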


Trace Multi-Agent Runs

For production applications, attach session and user context to your traces. This lets you filter and group traces in the Arthur dashboard by conversation, user, or custom metadata.

from arthur_observability_sdk import Arthur
from crewai import Agent, Task, Crew, Process

arthur = Arthur(
    api_key="your-api-key",
    base_url="https://your-arthur-engine-instance",
    task_id="<your-task-uuid>",
    service_name="my-crewai-app",
)
arthur.instrument_crewai()

# Define agents
researcher = Agent(
    role="Senior Researcher",
    goal="Uncover insights on a given topic",
    backstory="You're a meticulous analyst.",
)

writer = Agent(
    role="Technical Writer",
    goal="Write clear, concise reports",
    backstory="You turn complex research into readable content.",
)

# Define tasks
research_task = Task(
    description="Research the latest in AI observability.",
    expected_output="A bullet-point summary of key findings.",
    agent=researcher,
)

writing_task = Task(
    description="Write a short report based on the research findings.",
    expected_output="A 200-word report.",
    agent=writer,
)

# Multi-agent crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
)

# Tag the entire run with session and user context
with arthur.attributes(session_id="sess-1", user_id="user-42"):
    result = crew.kickoff()

print(result)
arthur.shutdown()

A multi-agent sequential crew produces a trace tree like this:

flowchart TD
    ROOT["Crew Kickoff (CHAIN)"]
    ROOT --> A1["Agent: Senior Researcher"]
    A1 --> T1["Task: Research AI observability"]
    T1 --> LLM1["LLM Call (OpenAI)"]
    T1 --> TOOL1["Tool: Search"]
    TOOL1 --> LLM2["LLM Call (OpenAI)"]
    ROOT --> A2["Agent: Technical Writer"]
    A2 --> T2["Task: Write report"]
    T2 --> LLM3["LLM Call (OpenAI)"]

Each node is an OpenTelemetry span with OpenInference semantic attributes — including input/output content, token counts, model names, and latency.
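To make the parent/child structure concrete, the sketch below rebuilds an indented tree from a flat list of spans. OpenTelemetry spans carry a span ID and a parent span ID, but the dictionary keys used here ('id', 'parent_id', 'name') are hypothetical and do not reflect the Arthur API response format.

```python
from collections import defaultdict


def span_tree_lines(spans):
    """Render flat spans as indented lines, one per span.

    Each span is a dict with hypothetical keys 'id', 'parent_id', and
    'name'. A span whose parent_id is None is treated as a root.
    """
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Visit children in the order they appeared in the input.
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span["name"])
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    example = [
        {"id": "1", "parent_id": None, "name": "Crew Kickoff (CHAIN)"},
        {"id": "2", "parent_id": "1", "name": "Agent: Senior Researcher"},
        {"id": "3", "parent_id": "2", "name": "LLM Call"},
        {"id": "4", "parent_id": "1", "name": "Agent: Technical Writer"},
    ]
    print("\n".join(span_tree_lines(example)))
```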


Add Session and User Context

Use arthur.attributes() as a context manager to tag all spans created within its scope:

with arthur.attributes(session_id="sess-1", user_id="user-42"):
    result = crew.kickoff()

This is especially useful for:

  • Multi-turn conversations — trace an entire chat session end-to-end
  • Per-user analytics — understand how individual users interact with your application
  • Debugging — filter traces in the Arthur dashboard by session or user
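For a multi-turn conversation, the key is to reuse one session_id across every turn. The sketch below wraps that pattern in two hypothetical helpers, `new_session_id` and `run_turn`. It assumes `arthur.attributes(...)` behaves as shown above and that the crew accepts CrewAI's `inputs` keyword on kickoff(); the "sess-" prefix is only a naming choice.

```python
import uuid


def new_session_id():
    """Return a fresh opaque session identifier (the format is an assumption)."""
    return f"sess-{uuid.uuid4()}"


def run_turn(arthur, crew, session_id, user_id, inputs):
    """Run one conversation turn with session and user context attached.

    `arthur` must provide an attributes(...) context manager and `crew`
    a kickoff(inputs=...) method, as in the examples above.
    """
    with arthur.attributes(session_id=session_id, user_id=user_id):
        return crew.kickoff(inputs=inputs)


# Intended usage: create the ID once per conversation, then reuse it.
#
# session_id = new_session_id()
# for question in questions:
#     answer = run_turn(arthur, crew, session_id, "user-42", {"topic": question})
```

Because every turn shares the same session_id, the resulting traces can be grouped into a single session in the dashboard.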

Verify in Arthur

After running your crew, traces appear in the Arthur GenAI Engine within seconds.

[Figure: Traces viewed on the Arthur Engine UI]

What to look for in the dashboard:

  • Trace list — each crew.kickoff() call appears as a trace with the full agent/task/tool span tree
  • Session grouping — if you used arthur.attributes(session_id=...), traces are grouped by session
  • User filtering — filter by user_id to see a specific user's interactions
  • Token usage — prompt and completion token counts are captured automatically

You can also query traces programmatically:

curl -X GET "${ARTHUR_BASE_URL}/api/v1/traces?task_ids=${ARTHUR_TASK_ID}" \
  -H "Authorization: Bearer ${ARTHUR_API_KEY}"
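The same request can be issued from Python with the standard library. This sketch mirrors the curl command exactly (same path, query parameter, and bearer header); the function names are illustrative, and decoding the body as JSON is an assumption, since the response format is not described here.

```python
import json
import urllib.parse
import urllib.request


def build_traces_request(base_url, task_id, api_key):
    """Build (but do not send) the GET request shown in the curl example."""
    query = urllib.parse.urlencode({"task_ids": task_id})
    url = f"{base_url.rstrip('/')}/api/v1/traces?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_traces(base_url, task_id, api_key, timeout=10):
    """Send the request and decode the body as JSON (assumed format)."""
    request = build_traces_request(base_url, task_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending makes the URL and headers easy to inspect or unit-test without a running Engine.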

Troubleshooting

| Symptom | Fix |
| --- | --- |
| No traces appear | Ensure instrument_crewai() is called before any Agent, Task, or Crew objects are created. |
| No traces appear | Verify ARTHUR_API_KEY and ARTHUR_BASE_URL are set correctly; check network connectivity to your Arthur Engine. |
| ImportError when calling instrument_crewai() | Run pip install "arthur-observability-sdk[crewai]" to install the required extra. |
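For the connectivity check, a plain TCP probe against the Engine host is often enough to separate network problems from configuration problems. This is a rough sketch with hypothetical helper names; it only tests that the port accepts connections and says nothing about whether the API key is valid.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(base_url):
    """Extract (host, port) from a base URL, defaulting the port by scheme."""
    parts = urlsplit(base_url)
    if not parts.hostname:
        raise ValueError(f"Not an absolute URL: {base_url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(base_url, timeout=3.0):
    """Return True if a TCP connection to the Engine host can be opened."""
    host, port = host_and_port(base_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect(os.environ["ARTHUR_BASE_URL"])` from the machine that runs your crew shows whether the Engine is reachable from there at all.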

Next Steps

Now that your CrewAI application is instrumented, explore these capabilities:

  • Continuous Evaluations: automatically score agent outputs for quality, safety, and relevance on every run
  • Agentic Experiments: compare different crew configurations (agent roles, tools, process types) to find the best-performing setup
  • Prompt Management: store and version your agent system prompts in Arthur with arthur.get_prompt() so you can iterate without redeploying code
  • Best Practices for Building Agents blog series: observability and tracing fundamentals for building production agents
  • Other Integrations: if your agents call LangChain chains or other frameworks internally, layer additional instrumentors alongside CrewAI instrumentation

flowchart LR
    A[Instrument CrewAI] --> B[View Traces]
    B --> C[Add Evaluations]
    B --> D[Manage Prompts]
    C --> E[Set Up Continuous Evals]
    D --> F[A/B Test Prompts]
    E --> G[Production Monitoring]
    F --> G