Arthur + LlamaIndex

How does a developer instrument a LlamaIndex RAG pipeline with Arthur in under 10 minutes? You install the Arthur Observability SDK with the llama-index extra, initialize the Arthur client, call instrument_llama_index(), and run your pipeline as usual. Arthur automatically captures every retrieval, embedding, and LLM span using the OpenInference standard — giving you full visibility into which chunks were retrieved, how the query engine scored them, and what the LLM generated. No manual span creation required.


Overview

LlamaIndex is a widely used RAG framework. Teams building retrieval pipelines with it need traces that show which chunks were retrieved, how the query engine scored them, and what the LLM generated. Arthur's OpenInference support captures all of this automatically.

When you call arthur.instrument_llama_index(), the SDK registers an OpenInference instrumentor that hooks into LlamaIndex's internal callback system. Once instrumented, you get full visibility into:

  • Retrieval — query text, retrieved chunks, relevance scores
  • Embeddings — input text, model name, vector dimensions
  • LLM completions — prompt messages, completion text, token usage, model parameters
  • Query engine orchestration — the full chain from query to final answer
  • Session and user context — group traces by conversation or end-user

sequenceDiagram
    participant App as Your Application
    participant SDK as Arthur SDK
    participant LI as LlamaIndex
    participant Engine as Arthur GenAI Engine

    App->>SDK: arthur.instrument_llama_index()
    Note over SDK: Auto-instrumentation enabled
    App->>LI: query_engine.query(...)
    LI-->>App: Response
    SDK->>Engine: Trace (spans, attributes)
    Note over Engine: Traces visible in dashboard

Prerequisites:

  • Python 3.10+
  • An Arthur GenAI Engine instance (cloud or local)
  • An Arthur API key — see API Keys to create one

Installation

Install the Arthur Observability SDK with the llama-index extra:

pip install "arthur-observability-sdk[llama-index]"

This pulls in openinference-instrumentation-llama-index and its dependencies automatically.


Initialize Arthur

Create a single Arthur instance at application startup.

from arthur_observability_sdk import Arthur

arthur = Arthur(
    api_key="your-api-key",        # or set ARTHUR_API_KEY env var
    base_url="https://your-arthur-engine-instance",  # or set ARTHUR_BASE_URL env var
    task_id="<your-task-uuid>",    # Arthur task UUID
    service_name="my-llamaindex-app",
)

| Parameter | Description |
| --- | --- |
| api_key | Your Arthur Engine API key. Falls back to the ARTHUR_API_KEY env var. |
| base_url | Base URL of your Arthur GenAI Engine. Falls back to the ARTHUR_BASE_URL env var, then http://localhost:3030. |
| task_id | Arthur task UUID for associating traces with a specific task. |
| service_name | OpenTelemetry service.name resource attribute. Used to identify your application in the Arthur dashboard. If task_id isn't specified, a new task is created based on service_name. |

Note: At least one of task_id or service_name must be provided. If task_id is not specified, a new task named after service_name is created.

Warning: Use environment variables for secrets. Set ARTHUR_API_KEY and ARTHUR_BASE_URL as environment variables (e.g., in a .env file) rather than hardcoding them in your application.
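
As a minimal sketch of that advice, the helper below reads both variables from the environment and fails fast when the API key is missing. The helper name arthur_kwargs_from_env is hypothetical and not part of the SDK; the only things taken from this page are the parameter names and the http://localhost:3030 fallback.

```python
import os

# Fallback taken from the parameter table above.
DEFAULT_BASE_URL = "http://localhost:3030"


def arthur_kwargs_from_env(service_name, task_id=None):
    """Build keyword arguments for Arthur(...) from environment variables.

    Hypothetical helper: it only assembles a dict and never imports the SDK.
    """
    api_key = os.environ.get("ARTHUR_API_KEY")
    if not api_key:
        raise RuntimeError("ARTHUR_API_KEY is not set")
    kwargs = {
        "api_key": api_key,
        "base_url": os.environ.get("ARTHUR_BASE_URL", DEFAULT_BASE_URL),
        "service_name": service_name,
    }
    if task_id:
        # task_id is optional; without it a task is created from service_name.
        kwargs["task_id"] = task_id
    return kwargs
```

With the SDK installed, this would be used as Arthur(**arthur_kwargs_from_env("my-llamaindex-app")). Because the SDK already falls back to the same variables, the helper mainly adds an explicit error when the key is absent.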


Instrument LlamaIndex

A single method call enables automatic tracing for your entire LlamaIndex pipeline. Call this before you build any indexes or run any queries.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from arthur_observability_sdk import Arthur

# 1. Initialize Arthur
arthur = Arthur(
    api_key="your-api-key",
    base_url="https://your-arthur-engine-instance",
    task_id="<your-task-uuid>",
    service_name="my-llamaindex-app",
)

# 2. Instrument LlamaIndex
arthur.instrument_llama_index()

# 3. Build your RAG pipeline as usual
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# 4. Query — traces are captured automatically
query_engine = index.as_query_engine()
response = query_engine.query("What does Arthur do?")
print(response)

# 5. Shut down cleanly (flushes pending spans)
arthur.shutdown()

Key points:

  • instrument_llama_index() patches LlamaIndex globally — you do not need to wrap individual queries.
  • Every retrieval, embedding, and LLM call is automatically traced with the full parent-child span hierarchy.
  • Call arthur.shutdown() when your application exits to flush any remaining traces.
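
One way to make sure the final flush happens even when a query raises is to wrap the client in a small context manager. This is a sketch, not an SDK feature: flushed_on_exit is a hypothetical name, and the only SDK call it relies on is the shutdown() method shown above.

```python
from contextlib import contextmanager


@contextmanager
def flushed_on_exit(client):
    """Yield the client and always call client.shutdown() afterwards."""
    try:
        yield client
    finally:
        # Runs on normal exit and on exceptions, so pending spans are flushed.
        client.shutdown()


# Usage (assuming `arthur` and `query_engine` were created as shown above):
#
#     with flushed_on_exit(arthur):
#         response = query_engine.query("What does Arthur do?")
```

For long-running services, registering arthur.shutdown with the standard library's atexit module is a reasonable alternative.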

Trace Index and Query Operations

When you run query_engine.query(...), Arthur captures a trace with the following span hierarchy:

flowchart TD
    A["CHAIN: query"]
    A --> B["RETRIEVER: retrieve"]
    B --> C["EMBEDDING: embed query"]
    A --> D["LLM: synthesize"]

Each span includes rich attributes:

  • RETRIEVER spans — the query text, each retrieved document chunk, and relevance scores
  • EMBEDDING spans — the input text, embedding model name, and vector dimensions
  • LLM spans — the full prompt messages, completion text, token usage, and model name
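
If you export or download span attributes, they typically arrive as a flat key/value mapping. The sketch below regroups retrieval attributes into one record per chunk. It assumes the flattened key layout used by the OpenInference semantic conventions (retrieval.documents.<i>.document.content and retrieval.documents.<i>.document.score); check the keys against a real RETRIEVER span in your Arthur instance before relying on it. The function name is hypothetical.

```python
import re

# Assumed key layout, following the OpenInference semantic conventions.
_DOC_KEY = re.compile(r"^retrieval\.documents\.(\d+)\.document\.(content|score)$")


def retrieved_chunks(attributes):
    """Group flattened retrieval attributes into a list of per-chunk dicts."""
    chunks = {}
    for key, value in attributes.items():
        match = _DOC_KEY.match(key)
        if match:
            index, field = int(match.group(1)), match.group(2)
            chunks.setdefault(index, {})[field] = value
    # Return chunks in retrieval order (by document index).
    return [chunks[i] for i in sorted(chunks)]
```

Keys that do not match the pattern, such as the span kind or the query text, are ignored.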

Add Session and User Context

Use arthur.attributes() as a context manager to tag traces with session IDs, user IDs, and custom metadata. This enables filtering and grouping in the Arthur platform.

with arthur.attributes(session_id="sess-1", user_id="user-42"):
    response = query_engine.query("What does Arthur do?")

This is especially useful for:

  • Multi-turn conversations — trace an entire chat session end-to-end
  • Per-user analytics — understand how individual users interact with your application
  • Debugging — filter traces in the Arthur dashboard by session or user
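
For the multi-turn case, a small helper can run several questions inside one attributes block so they share a session ID. This is a sketch: ask_in_session is a hypothetical name, and it uses only the arthur.attributes(...) and query_engine.query(...) calls shown on this page. Whether a single block spanning several queries groups them exactly as you expect in the dashboard is an assumption worth verifying.

```python
def ask_in_session(arthur, query_engine, session_id, user_id, questions):
    """Run each question under the same session and user tags.

    `arthur` and `query_engine` are passed in, so nothing is imported here.
    """
    responses = []
    with arthur.attributes(session_id=session_id, user_id=user_id):
        for question in questions:
            responses.append(query_engine.query(question))
    return responses
```

Called as ask_in_session(arthur, query_engine, "sess-1", "user-42", ["What does Arthur do?", "How do I install it?"]), it returns the responses in question order.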

You can also tag traces with custom metadata for A/B testing retrieval strategies:

with arthur.attributes(session_id="experiment-1", metadata={"strategy": "hybrid-search"}):
    response = query_engine.query("What does Arthur do?")
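
For an A/B test, each user should land in the same arm on every request. The sketch below hashes the user ID to pick a strategy and builds the keyword arguments for arthur.attributes(...). The function names and the "vector-only" label are illustrative, and passing session_id, user_id, and metadata together in one call is an assumption based on the separate examples above.

```python
import hashlib

# Illustrative arm labels; "hybrid-search" comes from the example above.
STRATEGIES = ("vector-only", "hybrid-search")


def assign_strategy(user_id, strategies=STRATEGIES):
    """Deterministically map a user ID to one strategy."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return strategies[digest[0] % len(strategies)]


def experiment_attributes(experiment, user_id):
    """Keyword arguments for arthur.attributes(...) for this user's arm."""
    return {
        "session_id": experiment,
        "user_id": user_id,
        "metadata": {"strategy": assign_strategy(user_id)},
    }


# Usage (assuming `arthur` and `query_engine` exist):
#
#     with arthur.attributes(**experiment_attributes("experiment-1", "user-42")):
#         response = query_engine.query("What does Arthur do?")
```

The strategy tag only labels the trace. Your application still has to route the query to the matching retriever.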

Verify in Arthur

After running your pipeline, traces appear in the Arthur GenAI Engine within seconds.

Figure: Traces viewed in the Arthur Engine UI.

What to look for in the dashboard:

  • Trace list — each query() call appears as a trace with the full retrieval/embedding/LLM span tree
  • Span hierarchy — each query shows a CHAIN root span with RETRIEVER and LLM child spans
  • Retrieved documents — click into a RETRIEVER span to see the chunks that were returned and their scores
  • LLM inputs/outputs — click into an LLM span to see the full prompt and completion text

You can also query traces programmatically:

curl -X GET "${ARTHUR_BASE_URL}/api/v1/traces?task_ids=${ARTHUR_TASK_ID}" \
  -H "Authorization: Bearer ${ARTHUR_API_KEY}"
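
The same request can be made from Python with only the standard library. This sketch mirrors the curl command above: same path, same task_ids query parameter, same bearer header. The function names are hypothetical, and treating the response body as JSON is an assumption, since this page does not document the response format.

```python
import json
import urllib.parse
import urllib.request


def build_traces_request(base_url, task_id, api_key):
    """Build (but do not send) the GET request from the curl example."""
    query = urllib.parse.urlencode({"task_ids": task_id})
    url = f"{base_url.rstrip('/')}/api/v1/traces?{query}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_traces(base_url, task_id, api_key, timeout=10):
    """Send the request and decode the body, assuming it is JSON."""
    request = build_traces_request(base_url, task_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the URL and header logic easy to check without network access.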

Troubleshooting

| Symptom | Fix |
| --- | --- |
| No traces appearing | Verify ARTHUR_API_KEY and ARTHUR_BASE_URL are correct and your Arthur Engine is reachable. |
| Missing retrieval/embedding spans | Call arthur.instrument_llama_index() before building any indexes or running queries. |
| Traces delayed | Traces are exported asynchronously via BatchSpanProcessor; allow a few seconds, or call arthur.shutdown() to flush. |
| ImportError on instrument | Run pip install "arthur-observability-sdk[llama-index]" to install the required extra. |

Next Steps

Now that your LlamaIndex pipeline is instrumented, explore these capabilities:

flowchart LR
    A[Instrument LlamaIndex] --> B[View Traces]
    B --> C[Add Evaluations]
    B --> D[Manage Prompts]
    C --> E[Set Up Continuous Evals]
    D --> F[A/B Test Prompts]
    E --> G[Production Monitoring]
    F --> G