May 2026 Release Notes

by Ashley Nader

Arthur Platform

Governance & Policy Management

  • Comprehensive policy lifecycle management. Organizations can create, update, and delete policies with inline alert and attestation rules, full API support, and SpiceDB authorization.
  • Bulk workspace compliance checks. Trigger compliance evaluations across all policy-model pairs in a workspace with a single API call and per-assignment job tracking.
  • Policy assignment enforcement. Automatically materialize alert rules on target models with dual permission checks and cascade deletion protection.
  • Configurable compliance time windows. Users can select from six relative presets or define custom start dates when running policy checks, replacing hardcoded 30-day windows.
  • Workspace compliance overview. Flat, queryable table showing per-rule and per-model compliance status with filtering, sorting, and summary rollup counts.
  • Three new governance roles. Organization Governance Admin, Organization Policy Reader, and Workspace Policy Manager enable fine-grained delegation of governance responsibilities.
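As a rough illustration of the configurable compliance time windows, the sketch below resolves either a relative preset or a custom start date into a concrete window. The preset names and lengths are hypothetical (the notes state only that there are six), and `resolve_window` is an illustrative helper, not part of the Arthur API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical presets: the release notes say there are six but do not name them.
RELATIVE_PRESETS = {
    "last_24_hours": timedelta(hours=24),
    "last_7_days": timedelta(days=7),
    "last_14_days": timedelta(days=14),
    "last_30_days": timedelta(days=30),
    "last_90_days": timedelta(days=90),
    "last_365_days": timedelta(days=365),
}


def resolve_window(
    now: datetime,
    preset: Optional[str] = None,
    custom_start: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return the (start, end) window for a compliance check."""
    # Exactly one of the two selection modes must be used.
    if (preset is None) == (custom_start is None):
        raise ValueError("pass exactly one of preset or custom_start")
    if preset is not None:
        if preset not in RELATIVE_PRESETS:
            raise ValueError(f"unknown preset: {preset}")
        return now - RELATIVE_PRESETS[preset], now
    if custom_start >= now:
        raise ValueError("custom_start must be earlier than now")
    return custom_start, now


now = datetime(2026, 5, 1, tzinfo=timezone.utc)
print(resolve_window(now, preset="last_30_days"))
```

Passing `now` explicitly keeps the helper deterministic, which makes it easier to test than reading the clock inside the function.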

Agent Discovery & Management

  • Agent scanning and registration. Automatically discover agents within workspaces with GCP agent metadata tracking and system project creation.
  • Mute and unmute unregistered agents. Configure durations from 7-365 days with dedicated tabs for discovered and muted agents.
  • Agent observability metrics. Track agent span counts, tool count metrics, and tool call latencies for comprehensive agentic workflow monitoring.
  • Agent registration workflow. Register discovered agents with complete GCP provider metadata through the governance interface.

Dashboard & Analytics

  • Workspace and project-level dashboards. Complement existing model-level analytics with date range selectors and custom filtering capabilities.
  • Custom metrics management. Create, read, update, and soft delete custom aggregations with workspace-scoped management and testing capabilities.
  • Policy dashboard integration. Analytics views scoped to individual policies with exportable data and visualization.
  • Chart library access. Browse and select from curated published charts when creating custom dashboards.

Data Connectors & Datasets

  • Azure Blob Storage connector. Full support for Azure Blob Storage containers as data sources with comprehensive authentication options.
  • Expanded connector support. New Databricks, ODBC, Snowflake, BigQuery, S3, and GCS connectors with native authentication and schema inference.
  • Multiple datasets per model. Associate models with multiple data sources and perform cross-dataset aggregation matching.
  • Static dataset support. Handle datasets without time columns for non-temporal data analysis.

Security & Access Control

  • OIDC authentication. Replace access key-based authentication across all deployment environments with modern identity provider integration.
  • IDP-managed group membership. Synchronize group access via identity provider token claims instead of manual provisioning.
  • Audit logging system. Comprehensive API request tracking with CloudWatch forwarding and 365-day retention.
  • User invitation workflow. Email-based account creation with multi-group assignment and pending invitation management.

Infrastructure & Deployment

  • High availability database clusters. Patroni-based PostgreSQL and TimescaleDB HA with automatic failover and streaming replication.
  • OpenShift deployment support. Standardized Helm charts with proper fsGroup configuration and audit log PVC support.
  • LTS build pipelines. Independent publishing workflows for long-term support versions including arthur-client PyPI and Helm charts.
  • CloudWatch monitoring dashboards. Track platform health metrics including response codes, latency, and running task counts.

Bug Fixes

  • Fixed user authentication queries running unnecessarily on every request, reducing database load.
  • Resolved timezone handling issues in upsolve exports that were breaking timestamp-based filtering.
  • Reduced job dequeue polling time from 225 ms to 0.008 ms by adding a partial index on active jobs.
  • Fixed header validation behavior in FastAPI to properly reject underscore headers by default.
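The dequeue optimization above relies on a partial index, which covers only the rows a poller cares about. The sketch below shows the general technique using SQLite as a stand-in database. The `jobs` table, its columns, and the `'queued'` status value are assumptions for illustration, not the platform's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs ("
    " id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " created_at INTEGER NOT NULL)"
)

# Partial index: only rows still waiting to run are indexed, so the index
# stays small even when the table is dominated by finished jobs.
conn.execute(
    "CREATE INDEX idx_jobs_queued ON jobs (created_at) WHERE status = 'queued'"
)

# Mostly finished jobs, with one queued job per thousand rows.
rows = [("queued" if i % 1000 == 0 else "done", i) for i in range(10_000)]
conn.executemany("INSERT INTO jobs (status, created_at) VALUES (?, ?)", rows)

# The WHERE clause matches the index predicate, which lets the planner
# consider the partial index for this query.
DEQUEUE_SQL = (
    "SELECT id FROM jobs WHERE status = 'queued' ORDER BY created_at LIMIT 1"
)

# Print the query plan to check which index the planner chose.
for row in conn.execute("EXPLAIN QUERY PLAN " + DEQUEUE_SQL):
    print(row[3])

next_job = conn.execute(DEQUEUE_SQL).fetchone()
print(next_job)
```

PostgreSQL supports the same `CREATE INDEX ... WHERE` form. The key design point is that the polling query's filter must match the index predicate.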

Arthur Engine & Toolkit

Deployment & Infrastructure

  • Airgapped deployment support. Pre-cached tiktoken encodings eliminate external network calls during container initialization in network-restricted environments.
  • Non-root container execution. Docker configurations support running as non-root users across ECS, GCP, and Kubernetes deployments.
  • Unified model upload pipeline. Consolidated codebase supporting S3, filesystem/PVC, and GCS backends with consistent configuration.
  • Azure OpenAI provider support. Full LiteLLM integration with API key, endpoint URL, and optional API version configuration.

Onboarding & User Experience

  • Interactive onboarding CLI agent. Automates setup of observability, model configuration, and Python instrumentation with guided validation paths.
  • Arthur-onboard Claude Code skill. Streamlined engine installation and code instrumentation through guided prompts on macOS and Windows.
  • Multistep onboarding tour. Walks first-time users through essential platform features with contextual tooltips and progress tracking.

Observability & Instrumentation

  • Expanded framework instrumentors. Added support for agentminds, baml, and codex instrumentors as optional extras in the Arthur SDK.
  • Stateless validation endpoint. Run Arthur's built-in guardrail checks against arbitrary prompt/response pairs without persistent setup.
  • Improved PII detection accuracy. Reduced PERSON entity false-positive rate from 17.9% to 0% by filtering digit-containing detections.
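The PII accuracy fix described above filters out PERSON detections that contain digits. A minimal sketch of that kind of post-filter follows; the `Detection` type and function name are hypothetical and do not reflect the engine's internal data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """Hypothetical shape of one PII detection."""
    entity_type: str
    text: str


def drop_digit_person_detections(detections: List[Detection]) -> List[Detection]:
    """Discard PERSON detections whose text contains any digit."""
    return [
        d
        for d in detections
        if not (d.entity_type == "PERSON" and any(ch.isdigit() for ch in d.text))
    ]


sample = [
    Detection("PERSON", "Ada Lovelace"),
    Detection("PERSON", "user1234"),        # likely an identifier, not a name
    Detection("PHONE_NUMBER", "555-0100"),  # other entity types are untouched
]
print(drop_digit_person_detections(sample))
```

The filter only applies to the PERSON entity type, so digit-bearing entities such as phone numbers still pass through.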

Transform Management

  • Transform version history. Complete version tracking with author and timestamp information for each configuration change.
  • One-click restore capability. Revert transforms to any earlier version without data loss through a confirmation dialog.
  • Edit history panel. Read-only snapshot viewer for previewing previous transform configurations.

Multi-Tenancy & Access Control

  • Organization-scoped data isolation. Database-level tenant isolation with automatic migration of existing data to default organization.
  • API key organization scoping. Distinguish admin cross-org keys from tenant-scoped keys for proper access control.
  • Route-level enforcement. Comprehensive org scope filtering across 193 OpenAPI operations preventing cross-organization resource access.

Bug Fixes

  • Fixed subagent span hierarchy to correctly nest under parent AGENT spans in distributed traces.
  • Resolved race conditions in UserPromptSubmit that could overwrite trace context.
  • Restored missing definition field on TraceTransformResponse for backward compatibility.
  • Fixed model image bootstrap jobs to succeed on first run without latest-dev tag dependencies.

Arthur Engine & Toolkit

Evaluation & Continuous Monitoring

  • Unified Evaluators interface. Consolidated two-tab layout combines evaluator and continuous eval management with inline controls and staleness warnings for better workflow efficiency.
  • Bulk evaluation testing. Run continuous evaluations against multiple trace IDs simultaneously with dedicated test run history tracking.
  • Automated compliance scheduling. Models with compliance policies now receive automatic 24-hour periodic checks with independent on-demand testing.
  • Policy violation metrics. New policy_alert_rule_check_count metric tracks individual rule violations with detailed dimensions for historic trend analysis.
  • Direct trace evaluation. Select specific traces from the traces table and launch targeted evaluation test runs through intuitive picker dialogs.

Trace Management & Observability

  • Trace retention policies. Configure organization-wide automatic deletion of expired traces and spans with background batch processing and circuit breaker protection.
  • Enhanced trace filtering. Substring matching for user IDs and improved token counting accuracy for multi-entry API responses.
  • GenAI Engine task ID propagation. OTLP spans now include task ID attributes for better correlation in external observability platforms.
  • Improved agent instrumentation. Better visibility into agents, skills, and subagent context propagation across tool spawns.
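To make the trace retention behavior more concrete, here is one way batch deletion with a simple circuit breaker could look. This is a sketch under assumptions: the function name, batch size, and failure threshold are illustrative, and the engine's actual retention job may work differently.

```python
from typing import Callable, List, Sequence


def purge_in_batches(
    expired_ids: Sequence[str],
    delete_batch: Callable[[List[str]], None],
    batch_size: int = 100,
    max_consecutive_failures: int = 3,
) -> int:
    """Delete expired trace IDs in batches and return how many were deleted.

    The loop stops early (the circuit "opens") after too many consecutive
    batch failures, so a struggling database is not hit repeatedly.
    """
    deleted = 0
    consecutive_failures = 0
    for start in range(0, len(expired_ids), batch_size):
        batch = list(expired_ids[start:start + batch_size])
        try:
            delete_batch(batch)
        except Exception:
            consecutive_failures += 1
            if consecutive_failures >= max_consecutive_failures:
                break  # circuit opens: leave the rest for a later run
            continue
        consecutive_failures = 0  # any success resets the breaker
        deleted += len(batch)
    return deleted


# Usage with an in-memory stand-in for the trace store.
store = {f"trace-{i}" for i in range(5)}
count = purge_in_batches(sorted(store), store.difference_update, batch_size=2)
print(count, store)
```

Small batches keep each delete short, and the breaker limits the damage when deletes start failing.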

User Experience & Interface

  • Interactive AI assistant. Engine Chatbot with intelligent query capabilities, automatic model provider detection, and natural language resource management commands.
  • Enhanced navigation. Infinite scroll on All Tasks page removes 50-task display limits and improves task browsing experience.
  • Tag-based prompt filtering. Platform Management displays tags on prompts with multi-select server-side filtering for large prompt libraries.
  • Form protection dialogs. Confirmation prompts prevent accidental loss of transform builder configurations and dataset mappings.

Developer Experience & SDK

  • Configurable deployment options. Support for multiple genai-engine stacks on single machines with customizable ports and CORS policies.
  • Graceful model compatibility. API calls succeed with unknown model pricing by defaulting to $0.00 instead of throwing exceptions.
  • Apple Silicon support. Resolved MPS device compatibility issues for SentenceTransformer inference on Apple Silicon Macs.
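The graceful model compatibility change can be pictured as a pricing lookup that falls back to zero instead of raising. The pricing table, model name, and `estimate_cost` helper below are hypothetical and only illustrate the fallback pattern.

```python
from decimal import Decimal

# Hypothetical pricing table in USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = {
    "example-model": Decimal("0.002"),
}


def estimate_cost(model: str, tokens: int) -> Decimal:
    """Estimate cost in USD, treating unknown models as $0.00."""
    # dict.get with a default avoids a KeyError for models with no pricing.
    price = PRICE_PER_1K_TOKENS.get(model, Decimal("0.00"))
    return (price * tokens / 1000).quantize(Decimal("0.0001"))


print(estimate_cost("example-model", 1500))
print(estimate_cost("brand-new-model", 1500))
```

One trade-off of this design is that a $0.00 cost cannot be told apart from missing pricing data without an extra flag or log entry.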

Security & Infrastructure

  • Enhanced dependency security. Locked LiteLLM to version 1.80.0 with 3-day minimum release age for automated upgrades.
  • Standards-compliant HTTP. Chunked transfer encoding by default with proper header handling for 1xx and 204 status codes per RFC specifications.
  • Helm chart distribution. Enabled publishing to Docker Hub for open-source self-hosted deployments.

Bug Fixes

  • Fixed continuous evaluation results pagination displaying only single pages instead of full record counts.
  • Resolved system task bootstrap race conditions in multi-worker environments.
  • Corrected URL parameter handling with proper FastAPI path validation to prevent silent parameter swapping.
  • Fixed policy alert rule check metric to report accurate violation counts instead of always showing 1.0.

February 2026 Release Notes

by Pranav Shikarpur

Platform Updates

Advanced Analytics Enhancements

  • Introduced flexible analytics capabilities to slice and filter AI performance metrics across models, agents, datasets, and time intervals
  • Added a new Agent Span Count metric, enabling aggregation on agent-level spans similar to existing tool span count metrics
  • Improved persona-specific visibility for Product, Engineering, Compliance, and Leadership stakeholders
  • Enhanced metric exploration to support faster root cause analysis and performance monitoring

Navigation & Workflow Optimization

  • Refined global navigation structure to improve discoverability of core workflows
  • Maintained RBAC enforcement across updated navigation experiences
  • Reduced friction between experimentation, tracing, analytics, and governance workflows

Trace Experience Upgrades

  • Launched new Trace Overview dashboard with aggregated KPIs
  • Added Trace KPI summary card for faster performance visibility
  • Released redesigned Trace Viewer for improved debugging workflows
  • Enhanced trace table with span status badges and token counts
  • Surfaced cost metrics directly within trace workflows
  • Introduced enhanced filtering mechanisms for:
    • Trace Viewer
    • CE management and results pages
  • Introduced platform-wide Dark Mode

Agent Discovery & Governance Enhancements

  • Expanded Agent Discovery foundations with structured metadata support
  • Added annotation analytics for improved agent oversight
  • Strengthened governance workflows for increased transparency across teams

Arthur Engine & Toolkit

Agent Experiments & RAG Workflows

  • Improved Agent Experiments UI for clearer experiment management
  • Added reproducible session IDs for deterministic evaluation
  • Enabled dataset overwrite support
  • Introduced bulk editing capabilities
  • Strengthened JSON validation for structured outputs
  • Enhanced RAG notebooks and retrieval experiment workflows
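One common way to get reproducible session IDs is to derive them from stable inputs rather than generating random values. The sketch below uses name-based UUIDs (UUIDv5). The namespace URL, the `experiment_id` and `row_index` inputs, and the helper name are assumptions, not the scheme Arthur actually uses.

```python
import uuid

# Hypothetical namespace for experiment sessions.
SESSION_NAMESPACE = uuid.uuid5(
    uuid.NAMESPACE_URL, "https://example.invalid/agent-experiments"
)


def session_id(experiment_id: str, row_index: int) -> str:
    """Derive a stable session ID from the experiment and dataset row."""
    return str(uuid.uuid5(SESSION_NAMESPACE, f"{experiment_id}:{row_index}"))


first = session_id("exp-42", 0)
again = session_id("exp-42", 0)
other = session_id("exp-42", 1)
print(first == again, first == other)
```

Because the ID depends only on its inputs, re-running the same experiment row maps to the same session, which makes runs easier to compare.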

Real Time Trace Ingestion Enhancements

  • Introduced Agent Polling Mechanism to continuously poll GCP Cloud Run traces
  • Enabled automatic population of Cloud Run traces directly into the Engine
  • Reduced manual ingestion overhead for cloud-native agent deployments

Expanded Model Provider Support

  • Added support for Google Vertex AI
  • Added support for AWS Bedrock
  • Added support for vLLM
  • Improved provider handling logic
  • Implemented fixes and enhancements for Gemini integrations

Synthetic Data Generation

  • Introduced built-in synthetic dataset generators to support safe experimentation and evaluation workflows
  • Added dataset generators for:
    • Binary classification: card fraud detection
    • Binary classification: credit application approval
    • Regression: loan amount prediction
    • Regression: housing price prediction

Deployment & Infrastructure Enhancements

  • Added GCP model upload workflows with CI/CD integration
  • Introduced OpenShift compatibility
  • Enabled airgapped model loading support
  • Improved model management controls and deployment flexibility

Data & Connector Improvements

  • Expanded bucket-based connectors with CSV file support
  • Improved parquet handling performance
  • Added Databricks integration
  • Enhanced transform scalability
  • Improved evaluation mapping workflows

Security, Stability & Performance

  • Applied security patches and dependency upgrades
  • Improved evaluation stability
  • Enhanced reliability across experiments and trace workflows
  • Delivered UX refinements to improve overall platform responsiveness and consistency

January 2026 Release Notes

by Pranav Shikarpur

Whether you're shipping your first agent or scaling an entire AI ecosystem, this release gives you even more tools to go from prototype to production — with confidence and control.

  • [New] Agent Development Toolkit: An end-to-end toolkit for building, debugging, evaluating, and shipping AI agents—designed to move seamlessly from prototype to production.
  • Getting Started & Observability
    • Configure your model providers with full control over sourcing and access
    • Create and manage tasks that mirror real-world agent behavior
    • Capture OpenTelemetry-based traces across agent runs
    • Inspect executions in the Trace Viewer, including step-by-step agent actions
    • Search and filter traces to quickly identify errors, failures, and regressions
    • View sessions and chat threads, with deep linking from external applications
    • Track token usage and cost by agent, user, session, or conversation
  • Advanced Agent & RAG Workflows
    • Configure connections to Weaviate vector stores
    • Run RAG notebooks and RAG experiments with supervised evals
    • Execute end-to-end agent experiments and notebooks with evaluation built in
  • Prompt-Centric Workflows
    • Manage prompts with versioning, tagging, promotion, and audit history with full traceability
    • Quickly test ideas with the prompt playground
    • Run structured comparisons with prompt experiments
    • Iterate collaboratively with interactive prompt notebooks
    • Promote prompts into production with a single step
    • Run completions through the Arthur Engine using streaming and batch APIs
    • Compare prompt changes using Prompt Experiments for regression testing and bulk assessment
  • Unified Evaluation for Online + Offline
    • Run online evals continuously on live traces in production
    • Upload datasets for offline evaluation before deployment
    • Seamlessly explore evaluation results in Trace Viewer and dashboards
    • Add and manage datasets directly in-platform
    • Collect traces directly into datasets for test case generation
    • Create and manage custom evaluators for supervised and automated testing
    • Provide human feedback on traces to enrich evaluation signals
  • Arthur x Google Cloud
  • Arthur Engine OSS Enhancements
    • Model Source Control: Configure GenAI models to be pulled from secure, customer-managed repositories instead of public sources like Hugging Face.
    • Advanced Metric Segmentation: Segment metrics by user ID, conversation ID, and more for deeper analysis.
    • Improved ODBC Connector Support: Better database view handling, more reliable primary key detection, and configurable connection/login timeouts.
    • Bootstrapping Reliability: Improved performance and resilience for GenAI model setup and execution.

December 2025 Release Notes

by Pranav Shikarpur

Arthur Platform

  • Arthur is solving the agent visibility gap with the launch of the industry’s first Agentic Discovery & Governance (ADG) Platform. Arthur’s ADG platform was built to turn agent chaos into a structured, scalable operation.
  • New agentic features are arriving in January 2026 that will provide powerful tools for testing, tracing, and deploying agent-based workflows.

Arthur Evals Engine OSS

New Features

  • Test & Preview Custom Metrics Before Saving: Users can now validate their custom metrics directly within the creation and editing workflow.
  • Users can run the metric against available datasets to preview results and confirm the logic behaves as expected before saving.

Bug Fixes

  • Sketch metrics can now be created and calculated without specifying any dimension columns.
  • The frontend no longer overwrites user-defined metadata for reported metrics.

Community

Venture Backed Startup?

  • Join Arthur’s Start Up Partner Program: If you’re building a venture-backed startup that uses AI Agents and are trying to figure out how to reliably ship them to production, this program is perfect for you. Apply Today

November 2025 Release Notes

by Pranav Shikarpur

Arthur Platform

  • Improved support for custom metrics: You can now test a custom metric before creating it
  • New agentic features are arriving in January 2026 that will provide powerful tools for testing, tracing, and deploying agent-based workflows.
    • Interested in trying it first? Email [email protected] to join the early access group.

Arthur Evals Engine OSS

Enhancements

  • Enhancements to PII detection model to improve date/time identification.
  • Docker configuration has been updated to use Postgres version 15, ensuring compatibility & preventing initialization errors during new engine setup.

Bug Fixes

  • Fixed an issue where some metrics were missing from the selection list for custom datasets.
  • Increased the ML engine aggregation timeout to support segmentation of larger and more complex datasets.

Community

  • Venture Backed Startup?
    • Join Arthur’s Start Up Partner Program: If you’re building a venture-backed startup that uses AI Agents and are trying to figure out how to reliably ship them to production, this program is perfect for you. Apply Today

October 2025 Release Notes

by Pranav Shikarpur

New from Arthur

  • Start Up Partner Program: If you’re building a venture-backed startup that uses AI Agents and are trying to figure out how to reliably ship them to production, this program is perfect for you. Apply Today

New Platform Features:

  • Custom Metrics: You can now define and manage custom metrics using SQL. Custom metrics can be reused across models and projects, and integrate seamlessly with dashboards, alerts, and queries in the Arthur platform. Versioning ensures you can update metric logic while preserving historical data accuracy. Learn more
  • Snowflake Connector: Added support for selecting Snowflake as a data source in the connector workflow.
  • Agent Trace Viewer: Improved filters — users can now filter by metric evaluation results, span type, and more.

Engine Enhancements:

  • Added support for creating custom metrics on data with nested columns.
  • GenAI Engine now runs as a non-root user.
  • Updated telemetry ORM models and migrations to enforce non-null timestamps.
  • Improved pagination handling for MSSQL.
  • Added status_code and session_id to spans.

September 2025 Release Notes

by Pranav Shikarpur

New Platform Features

  • Custom Metrics (Evals)
  • Dashboard Versioning feature for easier rollbacks
  • New Workspace home

Engine Release Notes

Enhancements

  • Span Query Improvements:
    • New GET endpoint /v1/spans/query: allows filtering spans by type.
    • Added support for span name column: improves query flexibility and performance.
    • Optimized span queries: added indexes to frequently queried columns.
    • Improved ingestion stability: fixed batch ingestion when root spans are present.
  • Improved developer experience by unifying our API schema and client libraries across the GenAI & ML Engines as well as the Arthur platform.
  • ML Engine now runs as a non-root user.
  • Pushed ML Engine Artifacts to Nexus.
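For orientation, a request URL for the new `/v1/spans/query` endpoint might be assembled as below. Only the path comes from these notes. The query parameter name `span_types`, the example span type values, and the base URL are assumptions, so check the engine's API reference for the actual parameter names.

```python
from typing import Iterable
from urllib.parse import urlencode


def build_span_query_url(base_url: str, span_types: Iterable[str]) -> str:
    """Build a GET URL for /v1/spans/query filtered by span type.

    The `span_types` parameter name is hypothetical.
    """
    params = [("span_types", span_type) for span_type in span_types]
    return f"{base_url.rstrip('/')}/v1/spans/query?{urlencode(params)}"


# Hypothetical local engine address and span type values.
url = build_span_query_url("http://localhost:3030/", ["LLM", "TOOL"])
print(url)
```

The sketch repeats the parameter once per value, which is a common way to pass a list in a query string.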

August 2025 Release Notes

by Pranav Shikarpur

New Platform Features

  • Sneak Peek: Support for Agentic AI is now available in the Arthur Platform

Engine Release Notes

New Features

  • Agentic monitoring is now supported in the GenAI Engine: Building on the recently added /traces/ API, this release introduces support for monitoring agentic behavior:
    • Tasks now include an is_agentic flag to enable targeted analysis and evaluation.
    • Metrics and traces APIs have been upgraded to support structured outputs, trace reconstruction, and intelligent defaults.
    • The engine selectively computes metrics for agentic tasks, improving the precision of evaluations.
  • Added support for new Database connector: We’ve introduced a new ODBC-based Database connector with support for MSSQL, PostgreSQL, Oracle, and MySQL. This includes enhanced configuration options (e.g., table name, dialect) and standardized field naming for easier integration and future extensibility.

Enhancements/Bug Fixes

  • Added CloudFormation launch button with pre-populated client ID

  • Addressed API key validation latencies for users with large numbers of API keys.

  • Converted hallucination LLM call to structured output to improve accuracy

  • Added possible_segmentation tag to improve model segmentation diagnostics.

  • Addressed a bug related to incorrect function renaming after a refactor.

  • Guardrails Enterprise

    • Vulnerability Fix: Patched Pillow vulnerability CVE-2025-48379
    • Enhancement: Introduced a feature flag for the PII rule that lets administrators toggle between the standard and "strict" modes
    • Bug Fix: Fixed an issue where email address PII detection missed addresses in certain formats

July 2025 Release Notes

by Pranav Shikarpur

New Features:

  • Added support for multimodal CV evals, with metrics and inference visualization in the Arthur Platform.
  • Users can now optionally configure attributes to segment over when defining metrics.
  • The Engine installation flow now supports non-Docker installation methods.
  • Support for consuming OTEL traces emitted from LLM + Agentic Applications.
  • Support for segmenting metrics on values (including model version ID, prompt version ID, etc.).
  • Support for experiment tracking in the Arthur Dashboard, with the ability to segment metrics (e.g., by prompt version or model version).
  • New navigation bar to improve usability and discoverability of platform functionality.

Enhancements:

  • Made significant performance improvements to the PII detection model, resulting in fewer false positives.
  • The inference deep dive table now returns up to 50 rows per page.
  • Improved hallucination detection for numbered lists and other structured formats.
  • Introduced configurable max-token limit for hallucination checks, helping users fine-tune thresholds for context.
  • Metrics task ID is now exposed in the GenAI model UX.
  • Filters UI fixes for Inference Deep Dive.