Arthur + CrewAI
How does a developer instrument a CrewAI application with Arthur in under 10 minutes? You install the SDK with the crewai extra, initialize an Arthur instance, and call arthur.instrument_crewai() before creating any agents or crews. This single method call auto-instruments all CrewAI agent actions, tool calls, and LLM invocations — no manual span creation required. Traces flow automatically to the Arthur platform where you can inspect every step of your multi-agent workflows.
Overview
CrewAI is a framework for orchestrating autonomous AI agents that collaborate on complex tasks. Arthur's CrewAI integration uses OpenInference auto-instrumentation to capture the full execution tree — from the top-level crew kickoff down to individual LLM completions — as structured OpenTelemetry traces. Once instrumented, you get full visibility into:
- Agent actions — every agent invocation with role, goal, and backstory
- Tool calls — input arguments and return values for each tool
- LLM completions — prompts, responses, model parameters, and token counts
- Crew orchestration — sequential or hierarchical agent execution flow
- Session and user context — group traces by conversation or end-user
sequenceDiagram
participant App as Your Application
participant SDK as Arthur SDK
participant Crew as CrewAI
participant Engine as Arthur GenAI Engine
App->>SDK: arthur.instrument_crewai()
Note over SDK: Auto-instrumentation enabled
App->>Crew: crew.kickoff()
Crew-->>App: Result
SDK->>Engine: Trace (spans, attributes)
Note over Engine: Traces visible in dashboard
Prerequisites:
- Python 3.10+
- An Arthur GenAI Engine instance (cloud or local)
- An Arthur API key — see API Keys to create one
Installation
Install the Arthur Observability SDK with the crewai extra:
pip install "arthur-observability-sdk[crewai]"
This pulls in openinference-instrumentation-crewai and its dependencies automatically.
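Before moving on, it can help to confirm that the environment meets the prerequisites. The following is a minimal sketch, not part of the Arthur SDK: `module_available` and `check_environment` are hypothetical helpers, and the sketch assumes the instrumentor is importable as `openinference.instrumentation.crewai` and the SDK as `arthur_observability_sdk` (the module name used in the imports in this guide).

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the import system can locate `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


def check_environment() -> dict[str, bool]:
    """Hypothetical helper: report which prerequisites look satisfied."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        # Module names below are assumptions based on the package names.
        "arthur_observability_sdk": module_available("arthur_observability_sdk"),
        "openinference.instrumentation.crewai": module_available(
            "openinference.instrumentation.crewai"
        ),
        "crewai": module_available("crewai"),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If any entry is reported as missing, reinstalling with the crewai extra is the first thing to try.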
Initialize Arthur
Create a single Arthur instance at application startup.
from arthur_observability_sdk import Arthur
arthur = Arthur(
    api_key="your-api-key",  # or set ARTHUR_API_KEY env var
    base_url="https://your-arthur-engine-instance",  # or set ARTHUR_BASE_URL env var
    task_id="<your-task-uuid>",  # Arthur task UUID
    service_name="my-crewai-app",
)
| Parameter | Description |
|---|---|
| api_key | Your Arthur Engine API key. Falls back to ARTHUR_API_KEY env var. |
| base_url | Base URL of your Arthur GenAI Engine. Falls back to ARTHUR_BASE_URL env var, then http://localhost:3030. |
| task_id | Arthur task UUID for associating traces with a specific task. |
| service_name | OpenTelemetry service.name resource attribute. Used to identify your application in the Arthur dashboard. Creates a new task based on service_name if task_id isn't specified. |
At least one of task_id or service_name must be provided. A new task with the service_name will be created if task_id is not specified.
Use environment variables for secrets. Set ARTHUR_API_KEY and ARTHUR_BASE_URL as environment variables (e.g., in a .env file) rather than hardcoding them in your application.
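The two notes above can be combined into a small configuration helper. This is a sketch under stated assumptions: `arthur_kwargs_from_env` is a hypothetical function, not an SDK API. It only assembles the keyword arguments listed in the parameter table and fails early when the API key or both identifiers are missing.

```python
import os
from typing import Mapping, Optional

# Fallback documented in the parameter table for base_url.
DEFAULT_BASE_URL = "http://localhost:3030"


def arthur_kwargs_from_env(
    env: Mapping[str, str] = os.environ,
    task_id: Optional[str] = None,
    service_name: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build keyword arguments for Arthur(...)."""
    api_key = env.get("ARTHUR_API_KEY")
    if not api_key:
        raise RuntimeError("ARTHUR_API_KEY is not set")
    if not (task_id or service_name):
        raise ValueError("Provide at least one of task_id or service_name")

    kwargs = {
        "api_key": api_key,
        "base_url": env.get("ARTHUR_BASE_URL", DEFAULT_BASE_URL),
    }
    # Only pass identifiers that were actually supplied.
    if task_id:
        kwargs["task_id"] = task_id
    if service_name:
        kwargs["service_name"] = service_name
    return kwargs
```

With the SDK installed, the result could be passed as `Arthur(**arthur_kwargs_from_env(service_name="my-crewai-app"))`. The SDK already falls back to the same environment variables on its own; the helper just makes a missing key an explicit startup error.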
Instrument CrewAI
Call arthur.instrument_crewai() before you create any Agent, Task, or Crew objects. The instrumentor patches CrewAI classes when it is called, so any objects created after this call are automatically traced.
from crewai import Agent, Task, Crew, Process
from arthur_observability_sdk import Arthur
# 1. Initialize Arthur
arthur = Arthur(
    api_key="your-api-key",
    base_url="https://your-arthur-engine-instance",
    task_id="<your-task-uuid>",
    service_name="my-crewai-app",
)
# 2. Instrument CrewAI — MUST come before creating agents/crews
arthur.instrument_crewai()
# 3. Now define your agents and tasks
researcher = Agent(
    role="Senior Researcher",
    goal="Uncover insights on a given topic",
    backstory="You're a meticulous analyst.",
)
research_task = Task(
    description="Research the latest in AI observability.",
    expected_output="A short report.",
    agent=researcher,
)
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    process=Process.sequential,
)
# 4. Run the crew — traces are captured automatically
result = crew.kickoff()
print(result)
# 5. Shut down cleanly to flush pending spans
arthur.shutdown()
Key points:
- instrument_crewai() must be called before creating any Agent, Task, or Crew objects.
- All CrewAI agents, tools, and LLM calls are automatically traced; no decorator or wrapper is needed.
- Call arthur.shutdown() when your application exits to flush any remaining traces.
Order matters. If you create agents or crews before calling instrument_crewai(), those objects will not be traced. Always instrument first.
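If the crew raises an exception, a plain call to arthur.shutdown() at the end of the script is never reached. One way to guard against that is a try/finally wrapper. The sketch below is an illustration rather than an SDK feature: `run_and_flush` is a hypothetical helper that relies only on the `shutdown()` method shown in this guide.

```python
from typing import Any, Callable


def run_and_flush(arthur: Any, run: Callable[[], Any]) -> Any:
    """Run `run()` and always call `arthur.shutdown()` afterwards.

    `arthur` is expected to be the Arthur instance from this guide. Any
    object with a `shutdown()` method works, which keeps the sketch testable.
    """
    try:
        return run()
    finally:
        # Flush pending spans even if the crew raised an exception.
        arthur.shutdown()
```

In the example above, the last steps would become `result = run_and_flush(arthur, crew.kickoff)`.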
Trace Multi-Agent Runs
For production applications, attach session and user context to your traces. This lets you filter and group traces in the Arthur dashboard by conversation, user, or custom metadata.
from arthur_observability_sdk import Arthur
from crewai import Agent, Task, Crew, Process
arthur = Arthur(
    api_key="your-api-key",
    base_url="https://your-arthur-engine-instance",
    task_id="<your-task-uuid>",
    service_name="my-crewai-app",
)
arthur.instrument_crewai()
# Define agents
researcher = Agent(
    role="Senior Researcher",
    goal="Uncover insights on a given topic",
    backstory="You're a meticulous analyst.",
)
writer = Agent(
    role="Technical Writer",
    goal="Write clear, concise reports",
    backstory="You turn complex research into readable content.",
)
# Define tasks
research_task = Task(
    description="Research the latest in AI observability.",
    expected_output="A bullet-point summary of key findings.",
    agent=researcher,
)
writing_task = Task(
    description="Write a short report based on the research findings.",
    expected_output="A 200-word report.",
    agent=writer,
)
# Multi-agent crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
)
# Tag the entire run with session and user context
with arthur.attributes(session_id="sess-1", user_id="user-42"):
    result = crew.kickoff()
print(result)
arthur.shutdown()
A multi-agent sequential crew produces a trace tree like this:
flowchart TD
ROOT["Crew Kickoff (CHAIN)"]
ROOT --> A1["Agent: Senior Researcher"]
A1 --> T1["Task: Research AI observability"]
T1 --> LLM1["LLM Call (OpenAI)"]
T1 --> TOOL1["Tool: Search"]
TOOL1 --> LLM2["LLM Call (OpenAI)"]
ROOT --> A2["Agent: Technical Writer"]
A2 --> T2["Task: Write report"]
T2 --> LLM3["LLM Call (OpenAI)"]
Each node is an OpenTelemetry span with OpenInference semantic attributes — including input/output content, token counts, model names, and latency.
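To make the parent/child structure concrete, the tree in the flowchart above can be modeled as nested nodes and walked depth-first. This is purely illustrative: `SpanNode` is a stand-in defined here, not an Arthur or OpenTelemetry type, and the span kinds other than the root's CHAIN are assumptions chosen from the OpenInference span-kind vocabulary.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SpanNode:
    """Illustrative stand-in for a span in the trace tree."""
    name: str
    kind: str  # assumed OpenInference span kind, e.g. CHAIN, AGENT, TOOL, LLM
    children: list["SpanNode"] = field(default_factory=list)


def walk(node: SpanNode, depth: int = 0):
    """Yield (depth, node) pairs in depth-first order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# The same tree as the flowchart, written as nested nodes.
trace = SpanNode("Crew Kickoff", "CHAIN", [
    SpanNode("Agent: Senior Researcher", "AGENT", [
        SpanNode("Task: Research AI observability", "CHAIN", [
            SpanNode("LLM Call (OpenAI)", "LLM"),
            SpanNode("Tool: Search", "TOOL", [
                SpanNode("LLM Call (OpenAI)", "LLM"),
            ]),
        ]),
    ]),
    SpanNode("Agent: Technical Writer", "AGENT", [
        SpanNode("Task: Write report", "CHAIN", [
            SpanNode("LLM Call (OpenAI)", "LLM"),
        ]),
    ]),
])

kind_counts = Counter(node.kind for _, node in walk(trace))

if __name__ == "__main__":
    for depth, node in walk(trace):
        print("  " * depth + f"{node.name} [{node.kind}]")
```

Counting nodes by kind, as `kind_counts` does, mirrors the sort of per-trace summary you might look for in the dashboard: how many LLM calls a single kickoff triggered.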
Add Session and User Context
Use arthur.attributes() as a context manager to tag all spans created within its scope:
with arthur.attributes(session_id="sess-1", user_id="user-42"):
    result = crew.kickoff()
This is especially useful for:
- Multi-turn conversations — trace an entire chat session end-to-end
- Per-user analytics — understand how individual users interact with your application
- Debugging — filter traces in the Arthur dashboard by session or user
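For the multi-turn case, one possible pattern is to generate a session ID once per conversation and run every turn inside the same arthur.attributes() scope. The sketch below assumes that arthur.attributes() behaves as shown in this guide and that crew.kickoff() accepts an `inputs` dictionary. `run_conversation` and the `sess-<uuid>` ID scheme are hypothetical.

```python
import uuid
from typing import Any, Iterable


def run_conversation(
    arthur: Any, crew: Any, user_id: str, turns: Iterable[dict]
) -> tuple[str, list]:
    """Run several kickoffs under one session ID so they group together."""
    session_id = f"sess-{uuid.uuid4()}"  # hypothetical ID scheme
    results = []
    # Every span created inside this block carries the same session and user.
    with arthur.attributes(session_id=session_id, user_id=user_id):
        for inputs in turns:
            results.append(crew.kickoff(inputs=inputs))
    return session_id, results
```

Returning the session ID alongside the results makes it easy to log it and later filter the dashboard by that value.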
Verify in Arthur
After running your crew, traces appear in the Arthur GenAI Engine within seconds.

Traces viewed on the Arthur Engine UI
What to look for in the dashboard:
- Trace list: each crew.kickoff() call appears as a trace with the full agent/task/tool span tree
- Session grouping: if you used arthur.attributes(session_id=...), traces are grouped by session
- User filtering: filter by user_id to see a specific user's interactions
- Token usage: prompt and completion token counts are captured automatically
You can also query traces programmatically:
curl -X GET "${ARTHUR_BASE_URL}/api/v1/traces?task_ids=${ARTHUR_TASK_ID}" \
  -H "Authorization: Bearer ${ARTHUR_API_KEY}"
Troubleshooting
| Symptom | Fix |
|---|---|
| No traces appear | Ensure instrument_crewai() is called before any Agent, Task, or Crew objects are created. |
| No traces appear | Verify ARTHUR_API_KEY and ARTHUR_BASE_URL are set correctly; check network connectivity to your Arthur Engine. |
| ImportError on instrument | Run pip install "arthur-observability-sdk[crewai]" to install the required extra. |
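To check the credentials and connectivity row from Python, the traces query from the curl command in Verify in Arthur can be rebuilt with the standard library. This is a sketch, not an official client: `build_traces_request` is a hypothetical helper, and it assumes the same three environment variables the curl command uses. It builds the request without sending it, so missing variables surface before any network call.

```python
import os
import urllib.parse
import urllib.request
from typing import Mapping

REQUIRED_VARS = ("ARTHUR_API_KEY", "ARTHUR_BASE_URL", "ARTHUR_TASK_ID")


def build_traces_request(
    env: Mapping[str, str] = os.environ,
) -> urllib.request.Request:
    """Build (but do not send) the GET request from the curl example."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    query = urllib.parse.urlencode({"task_ids": env["ARTHUR_TASK_ID"]})
    url = f"{env['ARTHUR_BASE_URL'].rstrip('/')}/api/v1/traces?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {env['ARTHUR_API_KEY']}"},
        method="GET",
    )
```

Sending it with `urllib.request.urlopen(build_traces_request(), timeout=10)` and inspecting the status code separates an authentication failure from an unreachable engine. The response body format is not described here, so the sketch does not parse it.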
Next Steps
Now that your CrewAI application is instrumented, explore these capabilities:
- Continuous Evaluations: automatically score agent outputs for quality, safety, and relevance on every run
- Agentic Experiments: compare different crew configurations (agent roles, tools, process types) to find the best-performing setup
- Prompt Management: store and version your agent system prompts in Arthur with arthur.get_prompt() so you can iterate without redeploying code
- Read our Best Practices for Building Agents Blog Series: observability and tracing fundamentals for building production agents
- Other Integrations: if your agents call LangChain chains or other frameworks internally, layer additional instrumentors alongside CrewAI instrumentation
flowchart LR
A[Instrument CrewAI] --> B[View Traces]
B --> C[Add Evaluations]
B --> D[Manage Prompts]
C --> E[Set Up Continuous Evals]
D --> F[A/B Test Prompts]
E --> G[Production Monitoring]
F --> G