July 2025 Release Notes

by Pranav Shikarpur

New Features:

  • Added support for multimodal computer vision (CV) evals, including metrics and inference visualization, in the Arthur Platform.
  • Users can now optionally configure attributes to segment over when defining metrics.
  • The Arthur Engine installation flow now supports non-Docker installation methods.
  • Support for consuming OpenTelemetry (OTel) traces emitted from LLM and agentic applications.
  • Support for segmenting metrics on values (including model version ID, prompt version ID, etc.).
  • Support for experiment tracking in the Arthur Dashboard, which now offers the ability to segment metrics (e.g., by prompt version or model version).
  • New navigation bar to improve usability and discoverability of platform functionality.

Enhancements:

  • Made significant performance improvements to the PII detection model, resulting in fewer false positives.
  • The Inference Deep Dive table now returns up to 50 rows per page.
  • Improved hallucination detection for numbered lists and other structured formats.
  • Introduced a configurable max-token limit for hallucination checks, helping users fine-tune thresholds for context.
  • The metrics task ID is now exposed in the GenAI model UX.
  • Fixed filter UI issues in the Inference Deep Dive.