Arthur + LiteLLM
How does a developer instrument a LiteLLM application with Arthur in under 10 minutes? By installing the SDK with the litellm extra, initializing an Arthur instance, and calling arthur.instrument_litellm(). That single call automatically traces every LiteLLM completion — regardless of which underlying provider (OpenAI, Anthropic, Bedrock, etc.) handles the request — and exports those traces to Arthur for observability, evaluation, and debugging.
Overview
LiteLLM provides a unified completion() interface that abstracts across 100+ LLM providers. Arthur's integration hooks into LiteLLM at the framework level so that every call — whether it routes to GPT, Claude, Bedrock, or any other supported model — is captured as an OpenInference trace and sent to Arthur Engine. Once instrumented, you get full visibility into:
- Prompts and completions — every message sent and received
- Model identity — which model and provider handled each request
- Latency and errors — per-call timing and failure tracking
- Token usage — input and output token counts
- Provider-agnostic observability — switch models without touching your instrumentation code
- Session and user context — group traces by conversation or end-user
sequenceDiagram
participant App as Your Application
participant SDK as Arthur SDK
participant LL as LiteLLM
participant Engine as Arthur GenAI Engine
App->>SDK: arthur.instrument_litellm()
Note over SDK: Auto-instrumentation enabled
App->>LL: litellm.completion(...)
LL-->>App: Response
SDK->>Engine: Trace (spans, attributes)
Note over Engine: Traces visible in dashboard
Prerequisites:
- Python 3.10+
- An Arthur GenAI Engine instance (cloud or local)
- An Arthur API key — see API Keys to create one
Installation
Install the Arthur Observability SDK with the litellm extra:
pip install "arthur-observability-sdk[litellm]"
This installs arthur-observability-sdk, litellm, and the OpenInference LiteLLM instrumentor in one step.
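To confirm the install worked before wiring anything up, a quick import check can help. The sketch below is a convenience for this guide, not part of the SDK: `missing_modules` is a hypothetical helper, and the module name `openinference.instrumentation.litellm` is assumed to be what the OpenInference LiteLLM instrumentor exposes.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises ModuleNotFoundError when a parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Module names are assumptions based on the imports used in this guide.
REQUIRED = [
    "arthur_observability_sdk",
    "litellm",
    "openinference.instrumentation.litellm",
]

if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    if absent:
        print("Missing modules:", ", ".join(absent))
        print('Try: pip install "arthur-observability-sdk[litellm]"')
    else:
        print("All required modules are importable.")
```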
Initialize Arthur
Create a single Arthur instance at application startup.
import litellm
from arthur_observability_sdk import Arthur
arthur = Arthur(
    api_key="your-api-key",  # or set ARTHUR_API_KEY env var
    base_url="https://your-arthur-engine-instance",  # or set ARTHUR_BASE_URL env var
    task_id="<your-task-uuid>",  # Arthur task UUID
    service_name="my-litellm-app",
)

| Parameter | Description |
|---|---|
| api_key | Your Arthur Engine API key. Falls back to ARTHUR_API_KEY env var. |
| base_url | Base URL of your Arthur GenAI Engine. Falls back to ARTHUR_BASE_URL env var, then http://localhost:3030. |
| task_id | Arthur task UUID for associating traces with a specific task. |
| service_name | OpenTelemetry service.name resource attribute. Used to identify your application in the Arthur dashboard. Creates a new task based on service_name if task_id isn't specified. |
At least one of task_id or service_name must be provided. A new task with the service_name will be created if task_id is not specified.
Use environment variables for secrets. Set ARTHUR_API_KEY and ARTHUR_BASE_URL as environment variables (e.g., in a .env file) rather than hardcoding them in your application.
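One way to follow the environment-variable advice is to resolve settings in a small helper and fail fast when something is missing. The sketch below is illustrative only: `arthur_settings_from_env` is a hypothetical helper, and reading `ARTHUR_TASK_ID` from the environment is an assumption of this example rather than documented SDK behavior. It builds keyword arguments matching the `Arthur(...)` parameters in the table above.

```python
import os


def arthur_settings_from_env(env=None, service_name=None):
    """Build keyword arguments for Arthur(...) from environment variables."""
    env = os.environ if env is None else env

    api_key = env.get("ARTHUR_API_KEY")
    if not api_key:
        raise RuntimeError("ARTHUR_API_KEY is not set")

    settings = {
        "api_key": api_key,
        # Same default the parameter table lists for base_url.
        "base_url": env.get("ARTHUR_BASE_URL", "http://localhost:3030"),
    }

    # ARTHUR_TASK_ID is an assumed variable name for this sketch.
    task_id = env.get("ARTHUR_TASK_ID")
    if task_id:
        settings["task_id"] = task_id
    if service_name:
        settings["service_name"] = service_name

    # The guide requires at least one of task_id or service_name.
    if "task_id" not in settings and "service_name" not in settings:
        raise ValueError("Provide task_id (ARTHUR_TASK_ID) or service_name")
    return settings
```

With the SDK installed, the result could be passed along as `Arthur(**arthur_settings_from_env(service_name="my-litellm-app"))`.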
Instrument LiteLLM
Call instrument_litellm() once, immediately after creating the Arthur instance. Then use litellm.completion() as you normally would — every call is automatically traced:
import litellm
from arthur_observability_sdk import Arthur
arthur = Arthur(
    api_key="your-api-key",  # or set ARTHUR_API_KEY env var
    base_url="https://your-arthur-engine-instance",  # or set ARTHUR_BASE_URL env var
    task_id="<your-task-uuid>",  # Arthur task UUID
    service_name="my-litellm-app",
)
arthur.instrument_litellm()
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
)
print(response.choices[0].message.content)
arthur.shutdown()

Key points:
- instrument_litellm() patches LiteLLM globally, so you do not need to wrap individual calls.
- The instrumentor captures input messages, output content, model name, token counts, and latency automatically.
- Call arthur.shutdown() when your application exits to flush any remaining traces.
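Because `arthur.shutdown()` should run even when the application exits on an error, one option is to tie instrumentation and flushing to a context manager. The sketch below is a minimal pattern, not an SDK feature: `instrumented` is a hypothetical helper that calls only the `instrument_litellm()` and `shutdown()` methods shown above.

```python
from contextlib import contextmanager


@contextmanager
def instrumented(arthur):
    """Enable LiteLLM tracing, then always flush traces on the way out."""
    arthur.instrument_litellm()
    try:
        yield arthur
    finally:
        # Runs on normal exit and when an exception propagates.
        arthur.shutdown()


# Usage (assumes `arthur` was created as shown above and litellm is imported):
# with instrumented(arthur):
#     litellm.completion(
#         model="gpt-4o-mini",
#         messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
#     )
```

The `try`/`finally` is the design choice that matters here: a plain call to `shutdown()` at the end of a script would be skipped if an earlier line raised.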
Trace Multiple Providers
The power of combining LiteLLM with Arthur is that your instrumentation stays the same no matter which provider you call. Switch models by changing the model string — Arthur captures traces identically:
# Anthropic via LiteLLM
response = litellm.completion(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)
# Bedrock via LiteLLM
response = litellm.completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "Hello!"}],
)

arthur.instrument_litellm() traces every call regardless of which provider LiteLLM routes to.
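To see provider-agnostic tracing in action, the same prompt can be sent to several models in a loop. The sketch below is illustrative: `ask_each_model` is a hypothetical helper, and its `complete` argument stands in for `litellm.completion` so the function can be exercised without network access. It assumes `arthur.instrument_litellm()` has already been called and that credentials for each provider are configured.

```python
def ask_each_model(models, prompt, complete):
    """Send one prompt to each model; return {model: reply text or error}."""
    results = {}
    messages = [{"role": "user", "content": prompt}]
    for model in models:
        try:
            response = complete(model=model, messages=messages)
            results[model] = response.choices[0].message.content
        except Exception as exc:  # keep going if one provider fails
            results[model] = f"error: {exc}"
    return results


# Usage (requires litellm and provider credentials):
# import litellm
# replies = ask_each_model(
#     ["gpt-4o-mini", "claude-sonnet-4-5"],
#     "Hello!",
#     litellm.completion,
# )
```

Catching the exception per model keeps one misconfigured provider from hiding the results of the others.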
Add Session and User Context
Add session and user context to group related traces together:
with arthur.attributes(session_id="sess-1", user_id="user-42"):
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )

This is especially useful for:
- Multi-turn conversations — trace an entire chat session end-to-end
- Per-user analytics — understand how individual users interact with your application
- Debugging — filter traces in the Arthur dashboard by session or user
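For a multi-turn conversation, every turn can run inside one `arthur.attributes(...)` block so that the whole exchange shares a session. The sketch below is one possible arrangement: `run_conversation` is a hypothetical helper, `complete` stands in for `litellm.completion`, and only the `session_id` and `user_id` keywords shown above are used.

```python
def run_conversation(arthur, complete, user_turns, session_id, user_id,
                     model="gpt-4o-mini"):
    """Run a chat where every completion is tagged with one session and user."""
    history = []
    with arthur.attributes(session_id=session_id, user_id=user_id):
        for text in user_turns:
            history.append({"role": "user", "content": text})
            # Pass a copy so later appends do not alter what was sent.
            response = complete(model=model, messages=list(history))
            reply = response.choices[0].message.content
            # Feed the reply back so the next turn has full context.
            history.append({"role": "assistant", "content": reply})
    return history


# Usage (assumes an instrumented `arthur` and `import litellm`):
# run_conversation(arthur, litellm.completion,
#                  ["Hi!", "Tell me more."], "sess-1", "user-42")
```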
Verify in Arthur
After running your instrumented application, traces appear in the Arthur GenAI Engine within seconds.

Traces viewed on the Arthur Engine UI
What to look for in the dashboard:
- Trace list: each litellm.completion() call appears as a trace with the model name, input/output messages, token usage, and latency
- Session grouping: if you used arthur.attributes(session_id=...), traces are grouped by session
- User filtering: filter by user_id to see a specific user's interactions
- Token usage: prompt and completion token counts are captured automatically
You can also query traces programmatically:
curl -X GET "${ARTHUR_BASE_URL}/api/v1/traces?task_ids=${ARTHUR_TASK_ID}" \
  -H "Authorization: Bearer ${ARTHUR_API_KEY}"
Troubleshooting
| Symptom | Fix |
|---|---|
| No traces appearing | Verify ARTHUR_API_KEY and ARTHUR_BASE_URL are correct and your Arthur Engine is reachable from your application. |
| Traces delayed | Traces are exported asynchronously via BatchSpanProcessor; allow a few seconds, or call arthur.shutdown() to flush. |
| Wrong provider or model called | Confirm the model string matches LiteLLM's expected format (e.g. bedrock/anthropic.claude-...). |
| ImportError on instrument | Run pip install "arthur-observability-sdk[litellm]" to install the required extra. |
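When checking whether traces are arriving, the curl query shown earlier can also be issued from Python. The sketch below uses only the standard library and mirrors that request's URL and `Authorization` header. `build_traces_request` and `fetch_traces` are hypothetical helpers, and treating the response body as JSON is an assumption; nothing is assumed about its schema.

```python
import json
import urllib.parse
import urllib.request


def build_traces_request(base_url, api_key, task_id):
    """Build the GET request used to list traces for one task."""
    query = urllib.parse.urlencode({"task_ids": task_id})
    url = f"{base_url.rstrip('/')}/api/v1/traces?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_traces(base_url, api_key, task_id, timeout=10):
    """Send the request and return the decoded body (assumed to be JSON)."""
    request = build_traces_request(base_url, api_key, task_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

An HTTP error raised by `fetch_traces` (for example a 401) points at the API key or base URL, which matches the first row of the table above.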
Next Steps
Now that you have LiteLLM instrumented with Arthur, explore these capabilities:
- Prompt Management: use arthur.get_prompt() to fetch versioned prompts from Arthur Engine and combine them with your LiteLLM calls
- Continuous Evaluations: set up automated evaluation rules that score every traced LiteLLM response for quality, safety, or custom criteria
- Agentic Experiments: run structured experiments to compare model performance across providers
- Read our Best Practices for Building Agents Blog Series: observability and tracing fundamentals for building production agents
- Other Integrations: if you also use LangChain or call OpenAI directly in parts of your application, add arthur.instrument_langchain() or arthur.instrument_openai() alongside your LiteLLM instrumentation
flowchart LR
A[Instrument LiteLLM] --> B[View Traces]
B --> C[Add Evaluations]
B --> D[Manage Prompts]
C --> E[Set Up Continuous Evals]
D --> F[A/B Test Prompts]
E --> G[Production Monitoring]
F --> G