Arthur + LlamaIndex

How does a developer instrument a LlamaIndex RAG pipeline with Arthur in under 10 minutes? You install the Arthur Observability SDK with the llama-index extra, initialize the Arthur client, call instrument_llama_index(), and run your pipeline as usual. Arthur automatically captures every retrieval, embedding, and LLM span using the OpenInference standard — giving you full visibility into which chunks were retrieved, how the query engine scored them, and what the LLM generated. No manual span creation required.

Overview

LlamaIndex is one of the most popular RAG frameworks — teams building retrieval pipelines need traces that show which chunks were retrieved, what the query engine scored, and what the LLM generated. Arthur's OpenInference support captures all of this automatically.

When you call arthur.instrument_llama_index(), the SDK registers an OpenInference instrumentor that hooks into LlamaIndex's internal callback system. Once instrumented, you get full visibility into:

Retrieval — query text, retrieved chunks, relevance scores
Embeddings — input text, model name, vector dimensions
LLM completions — prompt messages, completion text, token usage, model parameters
Query engine orchestration — the full chain from query to final answer
Session and user context — group traces by conversation or end-user

sequenceDiagram
    participant App as Your Application
    participant SDK as Arthur SDK
    participant LI as LlamaIndex
    participant Engine as Arthur GenAI Engine

    App->>SDK: arthur.instrument_llama_index()
    Note over SDK: Auto-instrumentation enabled
    App->>LI: query_engine.query(...)
    LI-->>App: Response
    SDK->>Engine: Trace (spans, attributes)
    Note over Engine: Traces visible in dashboard

Prerequisites:

Python 3.10+
An Arthur GenAI Engine instance (cloud or local)
An Arthur API key — see API Keys to create one

Installation

Install the Arthur Observability SDK with the llama-index extra:

pip install "arthur-observability-sdk[llama-index]"

This pulls in openinference-instrumentation-llama-index and its dependencies automatically.

Initialize Arthur

Create a single Arthur instance at application startup.

from arthur_observability_sdk import Arthur

arthur = Arthur(
    api_key="your-api-key",        # or set ARTHUR_API_KEY env var
    base_url="https://your-arthur-engine-instance",  # or set ARTHUR_BASE_URL env var
    task_id="<your-task-uuid>",    # Arthur task UUID
    service_name="my-llamaindex-app",
)

Parameter	Description
`api_key`	Your Arthur Engine API key. Falls back to `ARTHUR_API_KEY` env var.
`base_url`	Base URL of your Arthur GenAI Engine. Falls back to `ARTHUR_BASE_URL` env var, then `http://localhost:3030`.
`task_id`	Arthur task UUID for associating traces with a specific task.
`service_name`	OpenTelemetry `service.name` resource attribute. Used to identify your application in the Arthur dashboard. Creates a new task based on `service_name` if `task_id` isn't specified.

📘
At least one of task_id or service_name must be provided. A new task with the service_name will be created if task_id is not specified.

⚠️
Use environment variables for secrets. Set ARTHUR_API_KEY and ARTHUR_BASE_URL as environment variables (e.g., in a .env file) rather than hardcoding them in your application.

Instrument LlamaIndex

A single method call enables automatic tracing for your entire LlamaIndex pipeline. Call this before you build any indexes or run any queries.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from arthur_observability_sdk import Arthur

# 1. Initialize Arthur
arthur = Arthur(
    api_key="your-api-key",
    base_url="https://your-arthur-engine-instance",
    task_id="<your-task-uuid>",
    service_name="my-llamaindex-app",
)

# 2. Instrument LlamaIndex
arthur.instrument_llama_index()

# 3. Build your RAG pipeline as usual
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# 4. Query — traces are captured automatically
query_engine = index.as_query_engine()
response = query_engine.query("What does Arthur do?")
print(response)

# 5. Shut down cleanly (flushes pending spans)
arthur.shutdown()

Key points:

instrument_llama_index() patches LlamaIndex globally — you do not need to wrap individual queries.
Every retrieval, embedding, and LLM call is automatically traced with the full parent-child span hierarchy.
Call arthur.shutdown() when your application exits to flush any remaining traces.

Trace Index and Query Operations

When you run query_engine.query(...), Arthur captures a trace with the following span hierarchy:

flowchart TD
    A["CHAIN: query"]
    A --> B["RETRIEVER: retrieve"]
    B --> C["EMBEDDING: embed query"]
    A --> D["LLM: synthesize"]

Each span includes rich attributes:

RETRIEVER spans — the query text, each retrieved document chunk, and relevance scores
EMBEDDING spans — the input text, embedding model name, and vector dimensions
LLM spans — the full prompt messages, completion text, token usage, and model name

Add Session and User Context

Use arthur.attributes() as a context manager to tag traces with session IDs, user IDs, and custom metadata. This enables filtering and grouping in the Arthur platform.

with arthur.attributes(session_id="sess-1", user_id="user-42"):
    response = query_engine.query("What does Arthur do?")

This is especially useful for:

Multi-turn conversations — trace an entire chat session end-to-end
Per-user analytics — understand how individual users interact with your application
Debugging — filter traces in the Arthur dashboard by session or user

You can also tag traces with custom metadata for A/B testing retrieval strategies:

with arthur.attributes(session_id="experiment-1", metadata={"strategy": "hybrid-search"}):
    response = query_engine.query("What does Arthur do?")

Verify in Arthur

After running your pipeline, traces appear in the Arthur GenAI Engine within seconds.

What to look for in the dashboard:

Trace list — each query() call appears as a trace with the full retrieval/embedding/LLM span tree
Span hierarchy — each query shows a CHAIN root span with RETRIEVER and LLM child spans
Retrieved documents — click into a RETRIEVER span to see the chunks that were returned and their scores
LLM inputs/outputs — click into an LLM span to see the full prompt and completion text

You can also query traces programmatically:

curl -X GET "${ARTHUR_BASE_URL}/api/v1/traces?task_ids=${ARTHUR_TASK_ID}" \
  -H "Authorization: Bearer ${ARTHUR_API_KEY}"

Troubleshooting

Symptom	Fix
No traces appearing	Verify `ARTHUR_API_KEY` and `ARTHUR_BASE_URL` are correct and your Arthur Engine is reachable.
Missing retrieval/embedding spans	Call `arthur.instrument_llama_index()` before building any indexes or running queries.
Traces delayed	Traces are exported asynchronously via `BatchSpanProcessor`; allow a few seconds, or call `arthur.shutdown()` to flush.
`ImportError` on instrument	Run `pip install "arthur-observability-sdk[llama-index]"` to install the required extra.

Next Steps

Now that your LlamaIndex pipeline is instrumented, explore these capabilities:

Continuous Evaluations — automatically score retrieval quality, faithfulness, and answer relevance on every trace
Prompt Management — version and manage your LlamaIndex prompt templates with arthur.get_prompt()
Agentic Experiments — run structured experiments to compare retrieval strategies or prompt variations
Read our Best Practices for Building Agents Blog Series — observability and tracing fundamentals for building production agents
Other Integrations — if your pipeline also uses LangChain or direct OpenAI calls, layer additional instrumentors alongside LlamaIndex

flowchart LR
    A[Instrument LlamaIndex] --> B[View Traces]
    B --> C[Add Evaluations]
    B --> D[Manage Prompts]
    C --> E[Set Up Continuous Evals]
    D --> F[A/B Test Prompts]
    E --> G[Production Monitoring]
    F --> G

Arthur + LlamaIndex

Overview

Installation

Initialize Arthur

At least one of `task_id` or `service_name` must be provided. A new task with the `service_name` will be created if `task_id` is not specified.

Use environment variables for secrets. Set `ARTHUR_API_KEY` and `ARTHUR_BASE_URL` as environment variables (e.g., in a `.env` file) rather than hardcoding them in your application.

Instrument LlamaIndex

Trace Index and Query Operations

Add Session and User Context

Verify in Arthur

Troubleshooting

Next Steps

Overview

Installation

Initialize Arthur

At least one of task_id or service_name must be provided. A new task with the service_name will be created if task_id is not specified.

Use environment variables for secrets. Set ARTHUR_API_KEY and ARTHUR_BASE_URL as environment variables (e.g., in a .env file) rather than hardcoding them in your application.

Instrument LlamaIndex

Trace Index and Query Operations

Add Session and User Context

Verify in Arthur

Troubleshooting

Next Steps

At least one of `task_id` or `service_name` must be provided. A new task with the `service_name` will be created if `task_id` is not specified.

Use environment variables for secrets. Set `ARTHUR_API_KEY` and `ARTHUR_BASE_URL` as environment variables (e.g., in a `.env` file) rather than hardcoding them in your application.