January 2026 Release Notes
5 days ago by Pranav Shikarpur
Whether you're shipping your first agent or scaling an entire AI ecosystem, this release gives you even more tools to go from prototype to production — with confidence and control.
- [New] Agent Development Toolkit: An end-to-end toolkit for building, debugging, evaluating, and shipping AI agents, designed to move seamlessly from prototype to production.
- Getting Started & Observability
- Configure your model providers with full control over sourcing and access
- Create and manage tasks that mirror real-world agent behavior
- Capture OpenTelemetry-based traces across agent runs (see the sketch after this list)
- Inspect executions in the Trace Viewer, including step-by-step agent actions
- Search and filter traces to quickly identify errors, failures, and regressions
- View sessions and chat threads, with deep linking from external applications
- Track token usage and cost by agent, user, session, or conversation
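A minimal sketch of capturing OpenTelemetry traces from an agent run. The collector endpoint and auth header below are placeholders, not confirmed Arthur settings; substitute the values from your deployment's tracing configuration:

```python
# Emit OpenTelemetry spans for an agent run over OTLP/HTTP.
# Endpoint and auth header are assumptions -- use your deployment's values.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "my-agent"}))
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="https://<your-arthur-host>/v1/traces",   # assumed collector URL
            headers={"Authorization": "Bearer <api-key>"},     # assumed auth scheme
        )
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-agent")

# One span per agent step; nested spans show up step-by-step in the Trace Viewer.
with tracer.start_as_current_span("agent.run") as run:
    run.set_attribute("session.id", "session-123")
    with tracer.start_as_current_span("agent.tool_call") as step:
        step.set_attribute("tool.name", "search")
        # ... invoke the tool here ...
```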
- Advanced Agent & RAG Workflows
- Configure connections to Weaviate vector stores (see the sketch after this list)
- Run RAG notebooks and RAG experiments with supervised evals
- Execute end-to-end agent experiments and notebooks with evaluation built in
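A minimal sketch of connecting to a Weaviate vector store with the v4 Python client; the cluster URL, API key, and `Docs` collection name are placeholders:

```python
# Connect to a Weaviate cluster and run a semantic query (v4 client).
import weaviate
from weaviate.classes.init import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://<your-cluster>.weaviate.network",  # placeholder
    auth_credentials=Auth.api_key("<weaviate-api-key>"),    # placeholder
)
try:
    docs = client.collections.get("Docs")
    # near_text assumes the collection was created with a text vectorizer.
    results = docs.query.near_text(query="refund policy", limit=3)
    for obj in results.objects:
        print(obj.properties)
finally:
    client.close()
```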
- Prompt-Centric Workflows
- Manage prompts with versioning, tagging, promotion, and a fully traceable audit history
- Quickly test ideas with the prompt playground
- Run structured comparisons with prompt experiments
- Iterate collaboratively with interactive prompt notebooks
- Promote prompts into production with a single step
- Run completions through the Arthur Engine using streaming and batch APIs (see the sketch after this list)
- Compare prompt changes using Prompt Experiments for regression testing and bulk assessment
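A hedged sketch of a streaming completion call. The endpoint path, payload fields, and server-sent-events framing are assumptions for illustration; check the Arthur Engine API reference for the actual contract:

```python
# Stream a completion and print tokens as they arrive.
# Endpoint, payload, and SSE framing are assumed, not the confirmed API.
import json
import requests

resp = requests.post(
    "https://<your-arthur-host>/api/v1/completions",  # assumed path
    headers={"Authorization": "Bearer <api-key>"},
    json={"prompt_id": "support-triage@v3", "inputs": {"ticket": "..."}, "stream": True},
    stream=True,
    timeout=60,
)
resp.raise_for_status()
for line in resp.iter_lines():
    if line.startswith(b"data: "):  # typical server-sent-events framing
        chunk = json.loads(line[len(b"data: "):])
        print(chunk.get("text", ""), end="", flush=True)
```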
- Unified Evaluation for Online + Offline
- Run online evals continuously on live traces in production
- Upload datasets for offline evaluation before deployment
- Add and manage datasets directly in-platform
- Collect traces directly into datasets for test case generation
- Create and manage custom evaluators for supervised and automated testing (see the sketch after this list)
- Provide human feedback on traces to enrich evaluation signals
- Explore eval results seamlessly in Trace Viewer and dashboards
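A minimal sketch of a custom evaluator as a plain Python callable, run offline over a small dataset. How evaluators are registered in-platform is not shown here; follow the product's custom-evaluator workflow for that step:

```python
# A supervised evaluator: score an output by expected-keyword coverage.
from dataclasses import dataclass

@dataclass
class EvalResult:
    score: float  # 0.0 (fail) to 1.0 (pass)
    reason: str

def keyword_coverage(output: str, expected: list[str]) -> EvalResult:
    """Fraction of expected keywords that appear in the output."""
    hits = [kw for kw in expected if kw.lower() in output.lower()]
    score = len(hits) / len(expected) if expected else 1.0
    return EvalResult(score=score, reason=f"matched {hits} of {expected}")

# Offline run over an uploaded dataset of (output, expected) rows.
dataset = [
    {"output": "Refunds are issued within 5 business days.", "expected": ["refund", "days"]},
    {"output": "Please contact support.", "expected": ["refund"]},
]
for row in dataset:
    print(keyword_coverage(row["output"], row["expected"]))
```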
- Arthur x Google Cloud
- Arthur’s ADG platform is now live on the Google Cloud Marketplace, making it easier than ever to discover, govern, monitor, and evaluate AI systems, all within your existing GCP environment.
- Arthur Engine OSS Enhancements
- Model Source Control: Configure GenAI models to be pulled from secure, customer-managed repositories instead of public sources like Hugging Face (see the first sketch below).
- Advanced Metric Segmentation: Segment metrics by user ID, conversation ID, and more for deeper analysis.
- Improved ODBC Connector Support: Better database view handling, more reliable primary key detection, and configurable connection/login timeouts (see the second sketch below).
- Bootstrapping Reliability: Improved performance and resilience for GenAI model setup and execution.
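First, on model source control, a hedged illustration of the underlying mechanism: the standard `HF_ENDPOINT` variable redirects named-model downloads to a customer-managed mirror, and a local filesystem path avoids network pulls entirely. Arthur Engine's own configuration keys are not shown; the mirror URL and model path are placeholders:

```python
# Pull models from a private mirror or local path instead of the public hub.
# HF_ENDPOINT must be set before importing transformers/huggingface_hub.
import os

os.environ["HF_ENDPOINT"] = "https://models.internal.example.com"  # assumed mirror URL

from transformers import AutoModel, AutoTokenizer

# A local path sidesteps network access entirely.
tokenizer = AutoTokenizer.from_pretrained("/opt/models/guardrail-classifier")
model = AutoModel.from_pretrained("/opt/models/guardrail-classifier")
```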
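Second, on ODBC timeouts, the connection-level knobs at the driver layer, shown with `pyodbc` (the connection string is a placeholder; Arthur Engine surfaces these through its connector settings rather than raw driver calls):

```python
# Login and per-query timeouts at the ODBC driver level.
import pyodbc

conn = pyodbc.connect("DSN=warehouse;UID=user;PWD=secret", timeout=10)  # login timeout (s)
conn.timeout = 30  # per-query timeout in seconds; 0 disables
```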