Releases: arthur-ai/arthur-engine
2025 August_A (2.1.71)
New Features:
- Agentic monitoring is now supported in the GenAI Engine: building on the recently added /traces/ API, this release introduces support for monitoring agentic behavior (see the sketch after this list):
  - Tasks now include an is_agentic flag to enable targeted analysis and evaluation.
  - The metrics and traces APIs have been upgraded to support structured outputs, trace reconstruction, and intelligent defaults.
  - The engine selectively computes metrics for agentic tasks, improving the precision of evaluations.
- Added a new Database connector: we’ve introduced an ODBC-based connector with support for MSSQL, PostgreSQL, Oracle, and MySQL. It includes enhanced configuration options (e.g., table name, dialect) and standardized field naming for easier integration and future extensibility (a configuration sketch follows below).
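Below is a minimal sketch of how an application might flag a task as agentic so the engine applies its agentic-specific metric computation. Only the is_agentic field comes from this release note; the base URL, endpoint path, auth header, and other payload fields are assumptions for illustration.

```python
# Minimal sketch of flagging a task as agentic so the engine applies
# agentic-specific metrics. The endpoint path, auth header, and payload
# fields other than `is_agentic` are assumptions for illustration only.
import requests

ENGINE_URL = "http://localhost:3030"   # assumed local GenAI Engine address
API_KEY = "your-genai-engine-api-key"  # placeholder credential

response = requests.post(
    f"{ENGINE_URL}/api/v2/tasks",                      # hypothetical tasks endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "name": "travel-booking-agent",
        "is_agentic": True,  # flag from this release: enables targeted agentic evaluation
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```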
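For the Database connector, here is a hedged sketch of the kind of ODBC connection such a connector wraps, together with a hypothetical configuration dict. Only the "table name" and "dialect" options are documented above; every other field name, the driver string, and the query are assumptions.

```python
# Illustrative sketch only: the connector's real configuration schema is not
# shown in the release notes, so the field names below (other than the
# documented "table name" and "dialect" options) are assumptions.
import pyodbc  # standard ODBC bridge for Python

# Hypothetical connector configuration mirroring the documented options.
connector_config = {
    "dialect": "postgresql",          # one of: mssql, postgresql, oracle, mysql
    "table_name": "inference_logs",   # table to read inferences from
    "host": "db.internal.example.com",
    "port": 5432,
    "database": "analytics",
    "username": "arthur_reader",
    "password": "***",
}

# The kind of ODBC connection such a connector would open under the hood.
# The driver name depends on the ODBC driver installed on the host.
conn = pyodbc.connect(
    "DRIVER={PostgreSQL Unicode};"
    f"SERVER={connector_config['host']};"
    f"PORT={connector_config['port']};"
    f"DATABASE={connector_config['database']};"
    f"UID={connector_config['username']};"
    f"PWD={connector_config['password']};"
)
cursor = conn.cursor()
cursor.execute(f"SELECT * FROM {connector_config['table_name']} LIMIT 10")
print(cursor.fetchall())
```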
Enhancements/Bug Fixes:
- We’ve made several updates to reduce latency, support longer inputs, and improve the performance of our toxicity model:
  - Introduced a new chunking strategy for the harmful request classifier.
  - Increased the toxicity classifier’s token limit from 1,200 to 8,000 tokens.
  - Upgraded our profanity model, improving profanity-detection performance.
- Improved PII Detection Accuracy with New Post-processing Enhancements (a post-processing sketch follows this list):
  - Cleaner Entity Recognition: Common non-name terms like “me,” “you,” and “doctor” are now filtered out to reduce false positives in name detection.
  - Smarter Crypto Wallet & Bank Account Validation: Wallet and bank account entities are now validated for structure, expected length, and the absence of unexpected characters or common words like “wallet.”
  - Enhanced URL Detection: URLs are now correctly identified even when missing standard prefixes like `http://`.
- Improved the prompt injection classifier model, boosting its accuracy and efficiency. The updated model provides enhanced detection capabilities for identifying potential prompt injection attempts. It now prioritizes precision over recall, effectively reducing false positives where legitimate user inputs are mistakenly flagged, while maintaining robust security against genuine attacks.
- Added CloudFormation launch button with pre-populated client ID.
- Addressed API key validation latencies for users with large numbers of API keys.
- Converted the hallucination LLM call to structured output to improve accuracy (a general sketch of the pattern follows this list).
- Added possible_segmentation tag to improve model segmentation diagnostics.
- Addressed a bug related to incorrect function renaming after a refactor.
- Addressed a bug that caused incorrect redirection for organization members.
- Resolved an issue where links were not functioning correctly for keyword rules.
- The display of the 'last updated' timestamp has been corrected.
- Added more specific error messages in metrics creation workflows.
- In the chat playground, users can now input the URL of an OpenAI-compatible endpoint.
- "Permission Type" now renders correctly in the UI.
2025 June_A (2.1.46)
New Features:
- Added image support for metrics and for visualizing inferences in the Arthur Platform.
- Users can now optionally configure attributes to segment over when defining metrics.
Enhancements:
- Improved hallucination detection for numbered lists and other structured formats.
- Introduced a configurable max-token limit for hallucination checks, helping users fine-tune thresholds for their context.
2025 May_C (2.1.44)
New Features
- Added a '/traces/' API to support ingesting OpenTelemetry traces that meet the OpenInference (https://github.com/arize-ai/openinference/) specification. This feature is in preparation for adding agentic evaluations; more details coming soon. A minimal export sketch follows below.
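Here is a minimal sketch of exporting an OpenInference-style span with the standard OpenTelemetry OTLP/HTTP exporter. The ingest URL, port, and auth header are assumptions; the span attributes follow the OpenInference semantic conventions.

```python
# Minimal sketch, assuming the '/traces/' endpoint accepts OTLP over HTTP and a
# bearer token; the exact ingest URL and auth header are assumptions.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="http://localhost:3030/traces",            # hypothetical ingest URL
    headers={"Authorization": "Bearer <api-key>"},     # placeholder credential
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-agent")
with tracer.start_as_current_span("llm-call") as span:
    # OpenInference semantic-convention attributes for an LLM span.
    span.set_attribute("openinference.span.kind", "LLM")
    span.set_attribute("llm.model_name", "gpt-4o-mini")
    span.set_attribute("input.value", "What is the refund policy?")
    span.set_attribute("output.value", "Refunds are available within 30 days.")
```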
Enhancements:
- Added Docker Compose health checks to improve service startup reliability.
- Introduced a single script to install both the GenAI and ML engines.
- Initialized the Arthur Common module with CI, linting, and unit testing.
- Deprecated default validation endpoints to simplify integration and encourage explicit configuration.
- Made OpenAI-related configs optional for easier local development.
- Increased the GenAI engine startup timeout for stability.
Bug fix:
- Patched a critical vulnerability in the `h11` library.
2025 May_B (2.1.40)
Enhancements:
- Patched a PyTorch vulnerability.
- Configured Renovate on the Arthur Engine GitHub repository for automated dependency updates.
- The `FETCH_RAW_DATA_ENABLED` configuration is now exposed in the Helm chart.
- Docker Compose now always pulls the container images for users on the `latest` tag.
- Postgres now uses a volume to persist data in Docker Compose.
Bug fix:
- The ml-engine was not able to communicate with the genai-engine in the arthur-engine Docker Compose deployment. All services are now on the default network and able to communicate.
2025 May_A (2.1.39)
Deprecation:
- Deprecated the endpoints that validate prompts and responses against default rules without any task association.
Enhancement:
- Reduced the number of configurations exposed in the first-deploy experience with Docker Compose.
2025 April_B (2.1.37)
Enhancements:
- Open sourced the full Arthur Engine deployment scripts, comprising both the `genai-engine` and `ml-engine` components! You can now see how the `ml-engine` is deployed on Docker Compose, AWS ECS, and Kubernetes. All deployment scripts can now be found in the `/deployment` folder.
- The GenAI Engine server can now start with no LLM service connected. This allows users without access to an LLM service to still use the non-LLM-based evaluations.
- Improved the configuration parser for the LLM service connection string.
- Various enhancements have been made to the Docker Compose scripts.
2025 April_A (2.1.23)
Enhancements:
- Optimized the profanity detection function in the toxicity rule to improve latency for inferences with a large number of consecutive repeating characters (one illustrative approach is sketched after this list).
- Increased the overall concurrency of GPU deployments by using 5 Gunicorn workers by default and ensuring that the models load without encountering any race condition issues.
- Improved quick deployment by adding start scripts for Docker Compose, Helm Chart, and AWS CloudFormation.
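The release note does not describe how the latency optimization works; one plausible approach, shown purely as an assumption, is to collapse long runs of repeated characters before running profanity matching so pathological inputs stay short.

```python
# Assumption for illustration only: collapsing long runs of repeated characters
# is one plausible way to bound profanity-matching latency; the release note
# does not describe the actual optimization.
import re

def collapse_repeats(text: str, max_run: int = 2) -> str:
    """Shrink runs of the same character (e.g. 'noooooo') to at most `max_run`."""
    return re.sub(r"(.)\1{%d,}" % max_run, r"\1" * max_run, text)

print(collapse_repeats("heyyyyyyyyyy!!!!!!!!"))  # 'heyy!!'
```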
Bug fix:
- Disabled rules can now be archived. Previously, the archive endpoint responded that deactivated rules were not associated with any task.
2025 February_A (2.1.18)
We are thrilled to announce the very first release of the Arthur Engine, now available as an open source project!
The Arthur Engine is a tool designed for evaluating and benchmarking machine learning models and enforcing guardrails in your LLM applications and generative AI workflows.
This initial release debuts the GenAI Engine submodule and its capability to add guardrails to your LLM applications and generative AI workflows.
We value your feedback and contributions. Whether you encounter issues, have suggestions, or want to contribute to the project, please feel free to reach out via GitHub Issues or join our community discussions on Discord.