April 2026 Release Notes

Arthur Engine & Toolkit

Unified Evaluators interface. A consolidated two-tab layout combines evaluator and continuous eval management, with inline controls and staleness warnings.
Bulk evaluation testing. Run continuous evaluations against multiple trace IDs simultaneously with dedicated test run history tracking.
Automated compliance scheduling. Models with compliance policies are now checked automatically every 24 hours; on-demand tests can be run independently of the schedule.
Policy violation metrics. The new policy_alert_rule_check_count metric tracks individual rule violations with detailed dimensions for historical trend analysis.
Direct trace evaluation. Select specific traces from the traces table and launch targeted evaluation test runs through picker dialogs.

Trace retention policies. Configure organization-wide automatic deletion of expired traces and spans with background batch processing and circuit breaker protection.
Enhanced trace filtering. Substring matching for user IDs and improved token counting accuracy for multi-entry API responses.
GenAI Engine task ID propagation. OTLP spans now include task ID attributes for better correlation in external observability platforms.
Improved agent instrumentation. Better visibility into agents, skills, and subagent context propagation across tool spawns.
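
As an illustration of the trace retention behavior described above, the following is a minimal sketch of batched deletion guarded by a circuit breaker. It is not the Engine's implementation: the function name purge_expired, the delete_batch callback, and the batch_size and max_failures parameters are all hypothetical, and the breaker here simply stops the run after a number of consecutive failed batches.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(traces, retention_days, delete_batch, now,
                  batch_size=100, max_failures=3):
    """Delete expired trace IDs in batches (hypothetical sketch).

    traces maps trace ID -> creation time. delete_batch is a caller-supplied
    function that deletes a list of trace IDs and raises on failure.
    Returns the number of trace IDs successfully deleted.
    """
    cutoff = now - timedelta(days=retention_days)
    expired = [tid for tid, created in traces.items() if created < cutoff]

    deleted = 0
    consecutive_failures = 0
    for start in range(0, len(expired), batch_size):
        # Circuit breaker: stop issuing deletes after repeated failures.
        if consecutive_failures >= max_failures:
            break
        batch = expired[start:start + batch_size]
        try:
            delete_batch(batch)
            deleted += len(batch)
            consecutive_failures = 0
        except Exception:
            consecutive_failures += 1
    return deleted
```

A background job could call this on a schedule, passing the organization's retention period and a delete_batch function backed by the trace store.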

Interactive AI assistant. Engine Chatbot with intelligent query capabilities, automatic model provider detection, and natural language resource management commands.
Enhanced navigation. Infinite scroll on the All Tasks page removes the previous 50-task display limit.
Tag-based prompt filtering. Platform Management displays tags on prompts with multi-select server-side filtering for large prompt libraries.
Form protection dialogs. Confirmation prompts prevent accidental loss of transform builder configurations and dataset mappings.

Configurable deployment options. Support for multiple genai-engine stacks on a single machine with customizable ports and CORS policies.
Graceful model compatibility. API calls for models with unknown pricing now succeed, with cost defaulting to $0.00 instead of raising an exception.
Apple Silicon support. Resolved MPS device compatibility issues for SentenceTransformer inference on Apple Silicon Macs.
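
The pricing fallback described under "Graceful model compatibility" can be pictured with a short sketch. The price table, model name, and estimate_cost function below are hypothetical and only illustrate the idea of returning $0.00 for an unknown model rather than raising.

```python
from decimal import Decimal

# Hypothetical price table: USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = {"example-model": Decimal("0.50")}


def estimate_cost(model, tokens, prices=PRICE_PER_1K_TOKENS):
    """Return the estimated cost in USD, or 0.00 when pricing is unknown."""
    rate = prices.get(model)
    if rate is None:
        # Unknown model: fall back to zero cost instead of raising.
        return Decimal("0.00")
    return (rate * tokens / 1000).quantize(Decimal("0.01"))
```

With this fallback, a request to a model missing from the table still completes; its cost is simply recorded as zero.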

Enhanced dependency security. Locked LiteLLM to version 1.80.0, with a 3-day minimum release age for automated upgrades.
Standards-compliant HTTP. Chunked transfer encoding is now the default, with proper header handling for 1xx and 204 status codes per the HTTP RFCs.
Helm chart distribution. Enabled publishing to Docker Hub for open-source self-hosted deployments.
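
For context on the HTTP change above: the HTTP specifications state that a server must not send Content-Length or Transfer-Encoding in 1xx or 204 responses. The sketch below shows one way to choose framing headers under that rule. The function framing_headers and its behavior of preferring Content-Length when a body length is known are assumptions for illustration, not the Engine's actual code.

```python
def framing_headers(status, body_length=None):
    """Pick message-framing headers for a response (hypothetical sketch)."""
    # 1xx and 204 responses carry no body, so no framing headers are sent.
    if 100 <= status < 200 or status == 204:
        return {}
    # Assumption: use an explicit length when the caller knows it.
    if body_length is not None:
        return {"Content-Length": str(body_length)}
    # Otherwise default to chunked transfer encoding.
    return {"Transfer-Encoding": "chunked"}
```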

Fixed pagination of continuous evaluation results, which displayed only a single page instead of the full record count.
Resolved system task bootstrap race conditions in multi-worker environments.
Corrected URL parameter handling with proper FastAPI path validation to prevent silent parameter swapping.
Fixed the policy alert rule check metric to report accurate violation counts instead of always showing 1.0.