Traces Overview

New to Arthur? The Quickstart shows you how to send your first trace in under 5 minutes using the observability SDK.

What Is a Trace?

A trace is a directed tree of spans. Each span represents one meaningful unit of work: an LLM call, a retrieval step, a tool invocation, or an agent decision loop. Spans share a common trace_id and are linked by parent-child relationships that reconstruct the full execution path.

graph TD
    A["Trace: answer_question (root span)"]
    A --> B["RETRIEVER: fetch_documents"]
    A --> C["LLM: generate_answer"]
    B --> D["EMBEDDING: embed_query"]

Arthur ingests traces over OpenTelemetry (OTEL) using the OpenInference semantic conventions — an open standard for LLM observability. This means:

You instrument once using standard OTEL APIs.
Arthur understands LLM-specific attributes like input.value, output.value, llm.token_count.total, and span kinds like LLM, RETRIEVER, TOOL, and AGENT.
Any OTEL-compatible framework instrumentation (LangChain, LlamaIndex, OpenAI, etc.) works automatically.

Key Concepts

Concept	Definition
Trace	The complete record of one request. Identified by a `trace_id`.
Span	One unit of work within a trace. Has a name, start/end time, attributes, and an optional parent span.
Root span	The top-level span in a trace. Has no parent. Represents the entire request.
Span kind	An OpenInference attribute (`openinference.span.kind`) that classifies what a span does: `LLM`, `RETRIEVER`, `TOOL`, `AGENT`, `CHAIN`, `EMBEDDING`, `RERANKER`, or `GUARDRAIL`.
Session	A group of traces that belong to the same user conversation or interaction thread. Set via the `session.id` attribute.
User	The end-user who triggered the request. Set via the `user.id` attribute.
Task	An Arthur-level concept that groups traces from the same deployed application or model. You associate your SDK client with a task at initialization.

Rule of thumb for granularity: Create a span for each logical step that you want to observe, debug, or evaluate independently. You do not need a span for every Python function — only for steps where latency, inputs, outputs, or quality matter to you.

How Tracing Works in Arthur

sequenceDiagram
    participant App as Your Application
    participant SDK as Arthur SDK (OTEL)
    participant Engine as Arthur Engine
    participant Platform as Arthur Platform

    App->>SDK: Start span (e.g., LLM call)
    App->>SDK: Set attributes (input, output, tokens)
    App->>SDK: End span
    SDK->>Engine: Export spans via OTLP/HTTP (POST /api/v1/traces)
    Engine->>Platform: Evaluations, Full Agentic Tool Kit
    Platform-->>App: Dashboards, alerts, policies & governance available

The Arthur SDK wraps the OpenTelemetry SDK and configures an OTLP/HTTP exporter that sends spans to your Arthur deployment at {base_url}/api/v1/traces. Spans are batched and exported asynchronously — your application latency is not affected.

Instrument Your First Trace

The simplest case is a single LLM call wrapped in one trace with one span. Start here before adding complexity.

Step 1: Install the SDK

pip install "arthur-observability-sdk"

# To add framework auto-instrumentation (e.g., OpenAI)
pip install "arthur-observability-sdk[openai]"

# Mastra users: install the Arthur exporter
npm install @mastra/arthur

See the SDK Reference for the full API, available extras, and configuration options.

Step 2: Initialize Arthur and get a tracer

from arthur_observability_sdk.arthur import Arthur
from opentelemetry import trace
from openinference.semconv.trace import SpanAttributes, OpenInferenceSpanKindValues

arthur = Arthur(
    task_name="my-llm-app",
    api_key="your-api-key",
    base_url="https://your-arthur-engine-url",
)

tracer = trace.get_tracer("my-llm-app")

// Mastra handles tracer setup automatically once ArthurExporter is wired in.
// No manual tracer initialization needed.
const agent = mastra.getAgent('my-agent');

Step 3: Create a span for your LLM call

import openai

client = openai.OpenAI()

with tracer.start_as_current_span("llm_call") as span:
    # Tag the span as an LLM span using OpenInference conventions
    span.set_attribute(
        SpanAttributes.OPENINFERENCE_SPAN_KIND,
        OpenInferenceSpanKindValues.LLM.value
    )
    span.set_attribute(SpanAttributes.INPUT_VALUE, "What is the capital of France?")

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
    )

    output = response.choices[0].message.content
    span.set_attribute(SpanAttributes.OUTPUT_VALUE, output)
    span.set_attribute(SpanAttributes.LLM_MODEL_NAME, "gpt-4o")
    span.set_attribute(
        SpanAttributes.LLM_TOKEN_COUNT_TOTAL,
        response.usage.total_tokens
    )

print(output)
# Don't forget to flush on shutdown
arthur.shutdown()

// Mastra auto-instruments every agent/LLM call once ArthurExporter is wired in.
// No manual span creation needed — just invoke your agent.
const agent = mastra.getAgent('my-agent');
const response = await agent.generate('What is the capital of France?');
console.log(response.text);

Step 4: Use auto-instrumentation (recommended)

If you use a supported framework, Arthur can instrument it automatically — no manual span creation needed:

from arthur_observability_sdk.arthur import Arthur

arthur = Arthur(
    task_name="my-llm-app",
    api_key="your-api-key",
    base_url="https://your-arthur-engine-url",
)

# Auto-instrument the OpenAI client — all calls are traced automatically
arthur.instrument_openai()

# Now use OpenAI normally
import openai
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

// Mastra auto-instruments all supported frameworks via ArthurExporter.
// See the Quickstart (doc:quickstart-eval) for the full setup.
const agent = mastra.getAgent('my-agent');
const response = await agent.generate('What is the capital of France?');
console.log(response.text);

Multi-Span Traces

Most real applications involve more than one step. A RAG pipeline, for example, embeds the query, retrieves documents, and then generates an answer. Each step should be its own span, nested under a root span that represents the full request.

RAG Pipeline Example

graph TD
    Root["CHAIN: answer_question (root)"]
    Root --> Embed["EMBEDDING: embed_query"]
    Root --> Retrieve["RETRIEVER: fetch_documents"]
    Root --> Generate["LLM: generate_answer"]

from arthur_observability_sdk.arthur import Arthur
from opentelemetry import trace
from openinference.semconv.trace import SpanAttributes, OpenInferenceSpanKindValues

arthur = Arthur(task_name="rag-app", api_key="your-api-key", base_url="https://your-arthur-engine-url")
tracer = trace.get_tracer("rag-app")

def answer_question(user_query: str) -> str:
    # Root span — represents the full request
    with tracer.start_as_current_span("answer_question") as root:
        root.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.CHAIN.value)
        root.set_attribute(SpanAttributes.INPUT_VALUE, user_query)
        # Optional: attach session and user context
        root.set_attribute("session.id", "session-abc-123")
        root.set_attribute("user.id", "user-42")

        # Step 1: Embed the query
        with tracer.start_as_current_span("embed_query") as embed_span:
            embed_span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.EMBEDDING.value)
            embedding = embed_query(user_query)  # your embedding function
            embed_span.set_attribute(SpanAttributes.EMBEDDING_MODEL_NAME, "text-embedding-3-small")

        # Step 2: Retrieve documents
        with tracer.start_as_current_span("fetch_documents") as retriever_span:
            retriever_span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.RETRIEVER.value)
            docs = retrieve_documents(embedding)  # your retrieval function
            retriever_span.set_attribute(SpanAttributes.INPUT_VALUE, user_query)
            retriever_span.set_attribute("retrieval.document_count", len(docs))

        # Step 3: Generate answer
        with tracer.start_as_current_span("generate_answer") as llm_span:
            llm_span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.LLM.value)
            answer = generate_with_context(user_query, docs)  # your LLM call
            llm_span.set_attribute(SpanAttributes.OUTPUT_VALUE, answer)
            llm_span.set_attribute(SpanAttributes.LLM_MODEL_NAME, "gpt-4o")

        root.set_attribute(SpanAttributes.OUTPUT_VALUE, answer)
        return answer

// Mastra traces RAG pipelines automatically — each step (embed, retrieve,
// generate) is captured as a separate span with the correct OpenInference
// span kind. No manual span creation needed.
const agent = mastra.getAgent('rag-agent');
const response = await agent.generate(userQuery);
console.log(response.text);

# Submit all spans in a single OTLP payload.
# Each span references the root span's spanId as its parentSpanId.
# See the OTLP/HTTP spec for the full payload structure.

Key rule: Child spans are created inside the parent span's context block. The OTEL context propagation system automatically sets the parentSpanId — you do not need to set it manually.

Supported Span Types

Arthur uses OpenInference span kinds to classify and evaluate spans differently. Use the right kind for each step so Arthur can apply the correct evaluators and display the right metrics.

Span Kind	Use For	Key Attributes
`LLM`	Direct calls to a language model	`llm.model_name`, `llm.token_count.total`, `llm.token_count.prompt`, `llm.token_count.completion`, `input.value`, `output.value`
`CHAIN`	Orchestration logic, pipelines, or any multi-step sequence	`input.value`, `output.value`
`RETRIEVER`	Document retrieval from a vector store or search index	`input.value`, `retrieval.documents`
`EMBEDDING`	Generating vector embeddings	`embedding.model_name`, `embedding.embeddings`
`TOOL`	Tool or function calls made by an agent	`tool.name`, `tool.description`, `input.value`, `output.value`
`AGENT`	An autonomous agent making decisions and invoking tools	`input.value`, `output.value`
`RERANKER`	Reranking retrieved documents	`reranker.model_name`, `reranker.input_documents`, `reranker.output_documents`
`GUARDRAIL`	Safety or policy checks on inputs/outputs	`input.value`, `output.value`

Setting span kind in code

from openinference.semconv.trace import SpanAttributes, OpenInferenceSpanKindValues

with tracer.start_as_current_span("my_tool_call") as span:
    span.set_attribute(
        SpanAttributes.OPENINFERENCE_SPAN_KIND,
        OpenInferenceSpanKindValues.TOOL.value  # "TOOL"
    )
    span.set_attribute("tool.name", "search_web")
    span.set_attribute(SpanAttributes.INPUT_VALUE, '{"query": "Arthur AI pricing"}')
    result = search_web(query="Arthur AI pricing")
    span.set_attribute(SpanAttributes.OUTPUT_VALUE, str(result))

span.setAttribute("openinference.span.kind", "TOOL");
span.setAttribute("tool.name", "search_web");
span.setAttribute("input.value", JSON.stringify({ query: "Arthur AI pricing" }));

Next Steps

SDK Reference — Full API, available extras, and configuration options.
Dataset Transforms — Automatically extract span attributes into evaluation datasets.
Run Evaluations — Score your traces with LLM-as-judge or custom evaluators.
Alerts & Alert Rules — Get notified when metrics cross a threshold.
Observe & Dashboard — Monitor your application's health across traces.