Arthur + LiteLLM

How does a developer instrument a LiteLLM application with Arthur in under 10 minutes? By installing the SDK with the litellm extra, initializing an Arthur instance, and calling arthur.instrument_litellm(). That single call automatically traces every LiteLLM completion — regardless of which underlying provider (OpenAI, Anthropic, Bedrock, etc.) handles the request — and exports those traces to Arthur for observability, evaluation, and debugging.


Overview

LiteLLM provides a unified completion() interface that abstracts across 100+ LLM providers. Arthur's integration hooks into LiteLLM at the framework level so that every call — whether it routes to GPT, Claude, Bedrock, or any other supported model — is captured as an OpenInference trace and sent to Arthur Engine. Once instrumented, you get full visibility into:

  • Prompts and completions — every message sent and received
  • Model identity — which model and provider handled each request
  • Latency and errors — per-call timing and failure tracking
  • Token usage — input and output token counts
  • Provider-agnostic observability — switch models without touching your instrumentation code
  • Session and user context — group traces by conversation or end-user
sequenceDiagram
    participant App as Your Application
    participant SDK as Arthur SDK
    participant LL as LiteLLM
    participant Engine as Arthur GenAI Engine

    App->>SDK: arthur.instrument_litellm()
    Note over SDK: Auto-instrumentation enabled
    App->>LL: litellm.completion(...)
    LL-->>App: Response
    SDK->>Engine: Trace (spans, attributes)
    Note over Engine: Traces visible in dashboard

Prerequisites:

  • Python 3.10+
  • An Arthur GenAI Engine instance (cloud or local)
  • An Arthur API key — see API Keys to create one

Installation

Install the Arthur Observability SDK with the litellm extra:

pip install "arthur-observability-sdk[litellm]"

This installs arthur-observability-sdk, litellm, and the OpenInference LiteLLM instrumentor in one step.


Initialize Arthur

Create a single Arthur instance at application startup.

import litellm
from arthur_observability_sdk import Arthur

arthur = Arthur(
    api_key="your-api-key",        # or set ARTHUR_API_KEY env var
    base_url="https://your-arthur-engine-instance",  # or set ARTHUR_BASE_URL env var
    task_id="<your-task-uuid>",    # Arthur task UUID
    service_name="my-litellm-app",
)

| Parameter | Description |
| --- | --- |
| api_key | Your Arthur Engine API key. Falls back to the ARTHUR_API_KEY env var. |
| base_url | Base URL of your Arthur GenAI Engine. Falls back to the ARTHUR_BASE_URL env var, then to http://localhost:3030. |
| task_id | Arthur task UUID for associating traces with a specific task. |
| service_name | OpenTelemetry service.name resource attribute. Used to identify your application in the Arthur dashboard. If task_id isn't specified, a new task is created based on service_name. |

Note: At least one of task_id or service_name must be provided. If task_id is not specified, a new task named after service_name is created.

Warning: Use environment variables for secrets. Set ARTHUR_API_KEY and ARTHUR_BASE_URL as environment variables (e.g., in a .env file) rather than hardcoding them in your application.
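As a rough illustration of the advice above, the sketch below reads both variables from the environment and fails fast when the key is missing. The helper name load_arthur_settings and the ArthurSettings tuple are hypothetical and not part of the Arthur SDK; the fallback URL mirrors the one listed in the parameter table.

```python
import os
from typing import Mapping, NamedTuple


class ArthurSettings(NamedTuple):
    """Hypothetical container for the two connection settings."""
    api_key: str
    base_url: str


def load_arthur_settings(env: Mapping[str, str] = os.environ) -> ArthurSettings:
    """Read Arthur connection settings from environment variables.

    Hypothetical helper (not an SDK function): raises early if the API key
    is missing instead of failing later when traces are exported.
    """
    api_key = env.get("ARTHUR_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ARTHUR_API_KEY is not set")
    # Same fallback the parameter table documents for base_url.
    base_url = env.get("ARTHUR_BASE_URL", "").strip() or "http://localhost:3030"
    return ArthurSettings(api_key=api_key, base_url=base_url.rstrip("/"))
```

The returned values can be passed to Arthur(api_key=..., base_url=...). Since the SDK already falls back to these variables on its own, a helper like this is only worth having when you want an explicit startup check.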


Instrument LiteLLM

Call instrument_litellm() once, immediately after creating the Arthur instance. Then use litellm.completion() as you normally would — every call is automatically traced:

import litellm
from arthur_observability_sdk import Arthur

arthur = Arthur(
    api_key="your-api-key",        # or set ARTHUR_API_KEY env var
    base_url="https://your-arthur-engine-instance",  # or set ARTHUR_BASE_URL env var
    task_id="<your-task-uuid>",    # Arthur task UUID
    service_name="my-litellm-app",
)
arthur.instrument_litellm()

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
)
print(response.choices[0].message.content)

arthur.shutdown()

Key points:

  • instrument_litellm() patches LiteLLM globally — you do not need to wrap individual calls.
  • The instrumentor captures input messages, output content, model name, token counts, and latency automatically.
  • Call arthur.shutdown() when your application exits to flush any remaining traces.
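One way to make sure shutdown() runs even when the application raises is to wrap the instance in a small context manager. This is a minimal sketch: flushed_on_exit is a hypothetical helper, and the only assumption about the client is that it exposes the shutdown() method shown above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def flushed_on_exit(client: Any) -> Iterator[Any]:
    """Yield the client and always call client.shutdown() afterwards.

    Hypothetical helper: works with any object that has a shutdown() method.
    """
    try:
        yield client
    finally:
        # Runs on normal exit and when an exception propagates.
        client.shutdown()
```

With this in place, the application body can sit inside `with flushed_on_exit(arthur):` so that remaining traces are flushed on both success and failure paths.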

Trace Multiple Providers

The power of combining LiteLLM with Arthur is that your instrumentation stays the same no matter which provider you call. Switch models by changing the model string — Arthur captures traces identically:

# Anthropic via LiteLLM
response = litellm.completion(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)

# Bedrock via LiteLLM
response = litellm.completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "Hello!"}],
)

arthur.instrument_litellm() traces every call regardless of which provider LiteLLM routes to.
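To illustrate the point, the sketch below sends one prompt to several model strings through the same call path and collects either the reply or the error for each. The helper ask_each_model is hypothetical; it assumes instrument_litellm() has already been called and that provider credentials are configured. The complete parameter exists only so the function can be exercised without network access.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def ask_each_model(
    models: Iterable[str],
    prompt: str,
    complete: Optional[Callable[..., Any]] = None,
) -> Dict[str, str]:
    """Send the same prompt to each model string and collect the replies.

    Hypothetical helper: `complete` defaults to litellm.completion.
    """
    if complete is None:
        import litellm  # imported lazily so the helper is testable without it
        complete = litellm.completion

    messages = [{"role": "user", "content": prompt}]
    results: Dict[str, str] = {}
    for model in models:
        try:
            response = complete(model=model, messages=messages)
            results[model] = response.choices[0].message.content
        except Exception as exc:
            # One failing provider should not stop the comparison.
            results[model] = f"error: {type(exc).__name__}: {exc}"
    return results
```

Only the model strings change between calls; nothing in the helper refers to the instrumentation.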


Add Session and User Context

Add session and user context to group related traces together:

with arthur.attributes(session_id="sess-1", user_id="user-42"):
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )

This is especially useful for:

  • Multi-turn conversations — trace an entire chat session end-to-end
  • Per-user analytics — understand how individual users interact with your application
  • Debugging — filter traces in the Arthur dashboard by session or user
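For the multi-turn case, a conversation loop might look like the sketch below. It assumes that arthur.attributes(...) applies to every call made inside the with block, which the single-call example above suggests but does not state. run_conversation is a hypothetical helper, and complete would normally be litellm.completion.

```python
import uuid
from typing import Any, Callable, Dict, List, Sequence, Tuple


def run_conversation(
    arthur: Any,
    complete: Callable[..., Any],
    user_id: str,
    user_turns: Sequence[str],
    model: str = "gpt-4o-mini",
) -> Tuple[str, List[str]]:
    """Run several turns under one generated session_id.

    Hypothetical helper: returns the session id and the assistant replies.
    """
    session_id = f"sess-{uuid.uuid4()}"
    history: List[Dict[str, str]] = []
    replies: List[str] = []
    with arthur.attributes(session_id=session_id, user_id=user_id):
        for text in user_turns:
            history.append({"role": "user", "content": text})
            # Pass a copy so later appends do not mutate what was sent.
            response = complete(model=model, messages=list(history))
            reply = response.choices[0].message.content
            history.append({"role": "assistant", "content": reply})
            replies.append(reply)
    return session_id, replies
```

Generating the session id once per conversation, rather than per call, is what lets the turns be grouped together.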

Verify in Arthur

After running your instrumented application, traces appear in the Arthur GenAI Engine within seconds.

[Figure: Traces viewed on the Arthur Engine UI]

What to look for in the dashboard:

  • Trace list — each litellm.completion() call appears as a trace with the model name, input/output messages, token usage, and latency
  • Session grouping — if you used arthur.attributes(session_id=...), traces are grouped by session
  • User filtering — filter by user_id to see a specific user's interactions
  • Token usage — prompt and completion token counts are captured automatically

You can also query traces programmatically:

curl -X GET "${ARTHUR_BASE_URL}/api/v1/traces?task_ids=${ARTHUR_TASK_ID}" \
  -H "Authorization: Bearer ${ARTHUR_API_KEY}"
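The same request can be built from Python with the standard library. The sketch below only constructs the request: the endpoint, query parameter, and header are taken from the curl command above, and build_traces_request is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def build_traces_request(base_url: str, task_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request equivalent to the curl command (hypothetical helper)."""
    query = urllib.parse.urlencode({"task_ids": task_id})
    url = f"{base_url.rstrip('/')}/api/v1/traces?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

Sending it is a matter of `urllib.request.urlopen(request, timeout=10)` and reading the body. The response schema is not described here, so the sketch leaves parsing out.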

Troubleshooting

| Symptom | Fix |
| --- | --- |
| No traces appearing | Verify ARTHUR_API_KEY and ARTHUR_BASE_URL are correct and your Arthur Engine is reachable from your application. |
| Traces delayed | Traces are exported asynchronously via BatchSpanProcessor; allow a few seconds, or call arthur.shutdown() to flush. |
| Wrong provider model called | Confirm the model string matches LiteLLM's expected format (e.g. bedrock/anthropic.claude-...). |
| ImportError on instrument | Run pip install "arthur-observability-sdk[litellm]" to install the required extra. |
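Two of the checks above can be run locally before starting the application. This is a minimal sketch with hypothetical helper names: it confirms that the modules imported in this guide can be found, and that the base URL is at least a well-formed http(s) URL. It does not contact Arthur Engine.

```python
import importlib.util
from typing import List, Sequence
from urllib.parse import urlparse


def missing_modules(
    names: Sequence[str] = ("arthur_observability_sdk", "litellm"),
) -> List[str]:
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def looks_like_http_url(url: str) -> bool:
    """True if the URL has an http or https scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A non-empty result from missing_modules() points at the ImportError row. A False from looks_like_http_url() usually means the scheme was left off ARTHUR_BASE_URL.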

Next Steps

Now that you have LiteLLM instrumented with Arthur, explore these capabilities:

  • Prompt Management — use arthur.get_prompt() to fetch versioned prompts from Arthur Engine and combine them with your LiteLLM calls
  • Continuous Evaluations — set up automated evaluation rules that score every traced LiteLLM response for quality, safety, or custom criteria
  • Agentic Experiments — run structured experiments to compare model performance across providers
  • Read our Best Practices for Building Agents Blog Series — observability and tracing fundamentals for building production agents
  • Other Integrations — if you also use LangChain or call OpenAI directly in parts of your application, add arthur.instrument_langchain() or arthur.instrument_openai() alongside your LiteLLM instrumentation
flowchart LR
    A[Instrument LiteLLM] --> B[View Traces]
    B --> C[Add Evaluations]
    B --> D[Manage Prompts]
    C --> E[Set Up Continuous Evals]
    D --> F[A/B Test Prompts]
    E --> G[Production Monitoring]
    F --> G