Integrations Overview
Arthur AI supports a broad set of frameworks and tools for building, observing, and evaluating AI applications — including OpenAI, Anthropic, LangChain, LiteLLM, CrewAI, LlamaIndex, OpenAI Agents, Google ADK, AWS Bedrock, and Mastra (JavaScript/TypeScript). To pick the right integration, identify the language you develop in (Python or JavaScript/TypeScript), the LLM framework you already use, and whether you need tracing, evaluation, or both. This page gives you a single at-a-glance reference for every supported integration so you can confirm compatibility with your stack and jump straight to the framework-specific guide.
How Arthur Integrations Work
Every Arthur integration follows the same core pattern: your application sends OpenTelemetry-compatible traces to the Arthur GenAI Engine, where they are stored, scored, and surfaced in the Arthur platform. The integration layer handles span creation, context propagation, and attribute mapping automatically — you just initialize Arthur and instrument your framework.
flowchart LR
A[Your Application] -->|instrument| B[Arthur SDK / Integration]
B -->|OTLP traces| C[Arthur GenAI Engine]
C --> D[Observability Dashboard]
C --> E[Continuous Evals]
C --> F[Agentic Experiments]
The table below summarizes every supported integration, the language it targets, and what it provides.
| Integration | Language | Install Extra | Capabilities |
|---|---|---|---|
| OpenAI | Python | arthur-observability-sdk[openai] | Auto-instrumented traces for Chat Completions, Embeddings, and Assistants API calls |
| Anthropic | Python | arthur-observability-sdk[anthropic] | Auto-instrumented traces for messages.create() and other Anthropic SDK calls |
| LangChain | Python | arthur-observability-sdk[langchain] | Auto-instrumented traces for chains, agents, tools, retrievers, and LLM calls |
| LiteLLM | Python | arthur-observability-sdk[litellm] | Auto-instrumented traces across 100+ LLM providers via LiteLLM's unified interface |
| CrewAI | Python | arthur-observability-sdk[crewai] | Multi-agent crew tracing including agent actions, tool calls, and LLM completions |
| LlamaIndex | Python | arthur-observability-sdk[llama-index] | RAG pipeline tracing — retrieval, embeddings, and LLM synthesis |
| OpenAI Agents | Python | arthur-observability-sdk[openai-agents] | Agent handoff and tool-call tracing for the OpenAI Agents SDK |
| Google ADK | Python | arthur-observability-sdk[google-adk] | Gemini-powered agent tracing including tool calls and conversation turns |
| AWS Bedrock | Python | arthur-observability-sdk[bedrock] | Auto-instrumented traces for Bedrock invoke_model and converse calls across all hosted models |
| Mastra | JavaScript / TypeScript | @mastra/arthur | Telemetry exporter for Mastra agents, tools, and workflows, plus remote prompt management |
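As a quick way to turn a row of the table into an install command, the lookup below maps each integration name to its install target. The package names are copied from the table. The helper `install_command` is hypothetical, and the use of `pip` for the Python extras and `npm` for the Mastra package is an assumption, since this page does not name a package manager.

```python
# Install targets copied from the table above; the helper itself is hypothetical.
INSTALL_TARGETS = {
    "OpenAI": ("python", "arthur-observability-sdk[openai]"),
    "Anthropic": ("python", "arthur-observability-sdk[anthropic]"),
    "LangChain": ("python", "arthur-observability-sdk[langchain]"),
    "LiteLLM": ("python", "arthur-observability-sdk[litellm]"),
    "CrewAI": ("python", "arthur-observability-sdk[crewai]"),
    "LlamaIndex": ("python", "arthur-observability-sdk[llama-index]"),
    "OpenAI Agents": ("python", "arthur-observability-sdk[openai-agents]"),
    "Google ADK": ("python", "arthur-observability-sdk[google-adk]"),
    "AWS Bedrock": ("python", "arthur-observability-sdk[bedrock]"),
    "Mastra": ("javascript", "@mastra/arthur"),
}


def install_command(integration: str) -> str:
    """Return a shell command that installs the given integration.

    Assumes pip for Python packages and npm for JavaScript packages.
    """
    try:
        language, target = INSTALL_TARGETS[integration]
    except KeyError:
        known = ", ".join(sorted(INSTALL_TARGETS))
        raise ValueError(
            f"Unsupported integration {integration!r}; choose one of: {known}"
        ) from None
    if language == "python":
        # Quote the requirement so the shell does not expand the brackets.
        return f'pip install "{target}"'
    return f"npm install {target}"
```

For example, `install_command("LangChain")` returns `pip install "arthur-observability-sdk[langchain]"`. Quoting the requirement matters in shells such as zsh, which otherwise treat the square brackets as a glob pattern.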
Next Steps
Once tracing is set up through the relevant framework guide, you can explore these capabilities:
- Prompt Management: version and manage prompts in Arthur, then fetch them at runtime with arthur.get_prompt()
- Continuous Evaluations: set up automated quality checks that run against your traced inferences
- Agentic Experiments: run structured experiments against your AI tasks to evaluate performance at scale
- Best Practices for Building Agents blog series: observability and tracing fundamentals for building production agents
flowchart LR
A[Instrument Your Framework] --> B[View Traces]
B --> C[Add Evaluations]
B --> D[Manage Prompts]
C --> E[Set Up Continuous Evals]
D --> F[A/B Test Prompts]
E --> G[Production Monitoring]
F --> G