Integrations Overview

Arthur AI supports a broad set of frameworks and tools for building, observing, and evaluating AI applications — including OpenAI, Anthropic, LangChain, LiteLLM, CrewAI, LlamaIndex, OpenAI Agents, Google ADK, AWS Bedrock, and Mastra (JavaScript/TypeScript). To pick the right integration, identify the language you develop in (Python or JavaScript/TypeScript), the LLM framework you already use, and whether you need tracing, evaluation, or both. This page gives you a single at-a-glance reference for every supported integration so you can confirm compatibility with your stack and jump straight to the framework-specific guide.

How Arthur Integrations Work

Every Arthur integration follows the same core pattern: your application sends OpenTelemetry-compatible traces to the Arthur GenAI Engine, where they are stored, scored, and surfaced in the Arthur platform. The integration layer handles span creation, context propagation, and attribute mapping automatically — you just initialize Arthur and instrument your framework.

flowchart LR
    A[Your Application] -->|instrument| B[Arthur SDK / Integration]
    B -->|OTLP traces| C[Arthur GenAI Engine]
    C --> D[Observability Dashboard]
    C --> E[Continuous Evals]
    C --> F[Agentic Experiments]

The table below summarizes every supported integration, the language it targets, and what it provides.

Integration	Language	Package / Install Extra	Capabilities
OpenAI	Python	`arthur-observability-sdk[openai]`	Auto-instrumented traces for Chat Completions, Embeddings, and Assistants API calls
Anthropic	Python	`arthur-observability-sdk[anthropic]`	Auto-instrumented traces for `messages.create()` and other Anthropic SDK calls
LangChain	Python	`arthur-observability-sdk[langchain]`	Auto-instrumented traces for chains, agents, tools, retrievers, and LLM calls
LiteLLM	Python	`arthur-observability-sdk[litellm]`	Auto-instrumented traces across 100+ LLM providers via LiteLLM's unified interface
CrewAI	Python	`arthur-observability-sdk[crewai]`	Multi-agent crew tracing including agent actions, tool calls, and LLM completions
LlamaIndex	Python	`arthur-observability-sdk[llama-index]`	RAG pipeline tracing — retrieval, embeddings, and LLM synthesis
OpenAI Agents	Python	`arthur-observability-sdk[openai-agents]`	Agent handoff and tool-call tracing for the OpenAI Agents SDK
Google ADK	Python	`arthur-observability-sdk[google-adk]`	Gemini-powered agent tracing including tool calls and conversation turns
AWS Bedrock	Python	`arthur-observability-sdk[bedrock]`	Auto-instrumented traces for Bedrock `invoke_model` and `converse` calls across all hosted models
Mastra	JavaScript / TypeScript	`@mastra/arthur`	Telemetry exporter for Mastra agents, tools, and workflows, plus remote prompt management
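
As a quick way to go from the table to an install command, the sketch below encodes the package names listed above in a lookup. The package and extra names come from the table; the use of `pip` and `npm` as installers, and the helper name `install_command`, are assumptions for illustration, so check the framework-specific guide for the exact steps.

```python
# Extras copied from the integrations table above.
PYTHON_EXTRAS = {
    "OpenAI": "openai",
    "Anthropic": "anthropic",
    "LangChain": "langchain",
    "LiteLLM": "litellm",
    "CrewAI": "crewai",
    "LlamaIndex": "llama-index",
    "OpenAI Agents": "openai-agents",
    "Google ADK": "google-adk",
    "AWS Bedrock": "bedrock",
}

# JavaScript / TypeScript integrations ship as separate packages.
NPM_PACKAGES = {"Mastra": "@mastra/arthur"}


def install_command(*integrations: str) -> str:
    """Build an install command for one or more integrations in one language.

    Assumes pip for Python extras and npm for JavaScript packages.
    """
    if not integrations:
        raise ValueError("name at least one integration")

    unknown = [
        name for name in integrations
        if name not in PYTHON_EXTRAS and name not in NPM_PACKAGES
    ]
    if unknown:
        raise KeyError(f"unsupported integration(s): {', '.join(unknown)}")

    if all(name in PYTHON_EXTRAS for name in integrations):
        # pip accepts several extras in one bracketed, comma-separated list.
        extras = ",".join(PYTHON_EXTRAS[name] for name in integrations)
        return f'pip install "arthur-observability-sdk[{extras}]"'

    if all(name in NPM_PACKAGES for name in integrations):
        packages = " ".join(NPM_PACKAGES[name] for name in integrations)
        return f"npm install {packages}"

    raise ValueError("Python and JavaScript integrations install separately")


print(install_command("OpenAI", "LangChain"))
print(install_command("Mastra"))
```

Combining extras in a single bracketed list is standard pip syntax, which is convenient when an application uses more than one of the Python frameworks.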

Next Steps

Once tracing is working through one of the framework-specific guides linked above, explore these capabilities:

Prompt Management: version and manage prompts in Arthur, then fetch them at runtime with `arthur.get_prompt()`
Continuous Evaluations: set up automated quality checks that run against your traced inferences
Agentic Experiments: run structured experiments against your AI tasks to evaluate performance at scale
Best Practices for Building Agents blog series: observability and tracing fundamentals for building production agents

flowchart LR
    A[Instrument Your Framework] --> B[View Traces]
    B --> C[Add Evaluations]
    B --> D[Manage Prompts]
    C --> E[Set Up Continuous Evals]
    D --> F[A/B Test Prompts]
    E --> G[Production Monitoring]
    F --> G
