October 2025 Release Notes

by Pranav Shikarpur

New from Arthur

  • Start Up Partner Program: If you’re building a venture-backed startup that uses AI Agents and are trying to figure out how to reliably ship them to production, this program is perfect for you. Apply Today

New Platform Features:

  • Custom Metrics: You can now define and manage custom metrics using SQL. Custom metrics can be reused across models and projects, and integrate seamlessly with dashboards, alerts, and queries in the Arthur platform. Versioning ensures you can update metric logic while preserving historical data accuracy. Learn more
  • Snowflake Connector: Added support for selecting Snowflake as a data source in the connector workflow.
  • Agent Trace Viewer: Improved filters — users can now filter by metric evaluation results, span type, and more.
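
To make the idea of a SQL-defined custom metric concrete, here is a minimal sketch that runs an aggregate query with Python's built-in sqlite3 module. The `inferences` table, its columns, and the `flagged_rate` metric are hypothetical; these notes do not show the Arthur schema or SQL dialect, so treat this as an illustration of the pattern rather than the platform's syntax.

```python
import sqlite3

# Hypothetical inference table; the real Arthur schema is not shown in these notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inferences (day TEXT, latency_ms REAL, flagged INTEGER)")
conn.executemany(
    "INSERT INTO inferences VALUES (?, ?, ?)",
    [
        ("2025-10-01", 120.0, 0),
        ("2025-10-01", 340.0, 1),
        ("2025-10-02", 95.0, 0),
        ("2025-10-02", 110.0, 0),
    ],
)

# Illustrative custom metric: share of flagged inferences per day.
FLAGGED_RATE_SQL = """
SELECT day, AVG(flagged) AS flagged_rate
FROM inferences
GROUP BY day
ORDER BY day
"""


def flagged_rate(connection):
    """Run the metric query and return a {day: rate} mapping."""
    return {day: rate for day, rate in connection.execute(FLAGGED_RATE_SQL)}


rates = flagged_rate(conn)
print(rates)
```

Keeping the metric as a single named query is what makes it easy to reuse and to version: a new version is a new query string, and earlier results remain tied to the old one.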

Engine Enhancements:

  • Added support for creating custom metrics on data with nested columns.
  • GenAI Engine now runs as a non-root user.
  • Updated telemetry ORM models and migrations to enforce non-null timestamps.
  • Improved pagination handling for MSSQL.
  • Added status_code and session_id to spans.

September 2025 Release Notes

by Pranav Shikarpur

New Platform Features

  • Custom Metrics (Evals)
  • Dashboard Versioning for easier rollbacks
  • New Workspace home

Engine Release Notes

Enhancements

  • Span Query Improvements:
    • New GET endpoint /v1/spans/query: allows filtering spans by type.
    • Added support for span name column: improves query flexibility and performance.
    • Optimized span queries: added indexes to frequently queried columns.
    • Improved ingestion stability: fixed batch ingestion when root spans are present.
  • Improved developer experience by unifying our API schema and client libraries across the GenAI and ML Engines as well as the Arthur platform.
  • ML Engine now runs as a non-root user.
  • Pushed ML Engine Artifacts to Nexus.
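
As a rough sketch of how a client might call the new GET /v1/spans/query endpoint to filter spans by type, the helper below only builds the request URL and makes no network call. The query parameter names `span_types` and `page_size`, the span type values, and the base URL are assumptions made for illustration; check the engine's API reference for the actual names.

```python
from urllib.parse import urlencode


def build_span_query_url(base_url, span_types, page_size=50):
    """Build a URL for GET /v1/spans/query.

    `span_types` and `page_size` are hypothetical parameter names,
    not confirmed by the release notes.
    """
    # Repeat the parameter once per requested span type.
    params = [("span_types", span_type) for span_type in span_types]
    params.append(("page_size", str(page_size)))
    return f"{base_url.rstrip('/')}/v1/spans/query?{urlencode(params)}"


# Hypothetical local engine address and span type values.
url = build_span_query_url("http://localhost:3030", ["LLM", "TOOL"])
print(url)
```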

August 2025 Release Notes

by Pranav Shikarpur

New Platform Features

  • Sneak Peek: Support for Agentic AI is now available in the Arthur Platform

Engine Release Notes

New Features

  • Agentic monitoring is now supported in the GenAI Engine: Building on the recently added /traces/ API, this release introduces support for monitoring agentic behavior:
    • Tasks now include an is_agentic flag to enable targeted analysis and evaluation.
    • Metrics and traces APIs have been upgraded to support structured outputs, trace reconstruction, and intelligent defaults.
    • The engine selectively computes metrics for agentic tasks, improving the precision of evaluations.
  • Added support for new Database connector: We’ve introduced a new ODBC-based Database connector with support for MSSQL, PostgreSQL, Oracle, and MySQL. This includes enhanced configuration options (e.g., table name, dialect) and standardized field naming for easier integration and future extensibility.

Enhancements/Bug Fixes

  • Added CloudFormation launch button with pre-populated client ID

  • Addressed API key validation latencies for users with large numbers of API keys.

  • Converted hallucination LLM call to structured output to improve accuracy

  • Added possible_segmentation tag to improve model segmentation diagnostics.

  • Addressed a bug related to incorrect function renaming after a refactor.

  • Guardrails Enterprise

    • Vulnerability fix: Patched Pillow vulnerability CVE-2025-48379.
    • Enhancement: Introduced a feature flag for the PII rule that lets administrators toggle between standard and "strict" mode.
    • Bug fix: Fixed an issue where PII detection missed email addresses in a certain format.

July 2025 Release Notes

by Pranav Shikarpur

New Features:

  • Added support for multimodal CV evals, including metrics and inference visualization, in the Arthur Platform.
  • Users can now optionally configure attributes to segment over when defining metrics.
  • Engine installation flow now supports non-Docker installation methods.
  • Support for consuming OTEL traces emitted from LLM and agentic applications.
  • Support for segmenting metrics on values (including model version ID, prompt version ID, etc.).
  • Support for experiment tracking in the Arthur Dashboard, with the ability to segment metrics (e.g., by prompt version or model version).
  • New navigation bar to improve usability and discoverability of platform functionality.
  • Support for non-docker installation methods for the Arthur Engine.
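
Segmenting a metric means computing it separately for each value of an attribute such as prompt version or model version. The sketch below shows the idea in plain Python; the record fields (`prompt_version`, `model_version`, `score`) and the sample values are hypothetical and are not taken from the Arthur platform.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation records; field names are illustrative only.
records = [
    {"prompt_version": "v1", "model_version": "a", "score": 0.75},
    {"prompt_version": "v1", "model_version": "a", "score": 0.25},
    {"prompt_version": "v2", "model_version": "a", "score": 1.0},
    {"prompt_version": "v2", "model_version": "b", "score": 0.5},
]


def segment_mean(rows, segment_key, value_key="score"):
    """Average `value_key` separately for each distinct `segment_key` value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[segment_key]].append(row[value_key])
    return {segment: mean(values) for segment, values in groups.items()}


by_prompt = segment_mean(records, "prompt_version")
by_model = segment_mean(records, "model_version")
print(by_prompt)
print(by_model)
```

Comparing `by_prompt` across versions is the basic move behind experiment tracking: the same metric, split by the attribute that changed between runs.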

Enhancements:

  • Made significant performance improvements to the PII detection model, resulting in fewer false positives.
  • The inference deep dive table now returns up to 50 rows per page.
  • Improved hallucination detection for numbered lists and other structured formats.
  • Introduced configurable max-token limit for hallucination checks, helping users fine-tune thresholds for context.
  • Metrics task ID now exposed in the GenAI model UX
  • Filters UI fixes for Inference Deep Dive.