Releases: arthur-ai/arthur-engine
2025 August_A (2.1.71)
New Features:
- Agentic monitoring is now supported in the GenAI Engine: building on the recently added /traces/ API, this release introduces support for monitoring agentic behavior (see the sketch after this list):
  - Tasks now include an is_agentic flag to enable targeted analysis and evaluation.
  - The metrics and traces APIs have been upgraded to support structured outputs, trace reconstruction, and intelligent defaults.
  - The engine selectively computes metrics for agentic tasks, improving the precision of evaluations.
- Added a new Database connector: we’ve introduced an ODBC-based connector with support for MSSQL, PostgreSQL, Oracle, and MySQL. It includes enhanced configuration options (e.g., table name, dialect) and standardized field naming for easier integration and future extensibility (a configuration sketch follows below).
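Below is a minimal sketch of how an application might flag a task as agentic so the engine applies its agentic-specific metric computation. Only the is_agentic field comes from this release note; the base URL, endpoint path, auth header, and other payload fields are assumptions for illustration.

```python
# Minimal sketch of flagging a task as agentic so the engine applies
# agentic-specific metrics. The endpoint path, auth header, and payload
# fields other than `is_agentic` are assumptions for illustration only.
import requests

ENGINE_URL = "http://localhost:3030"   # assumed local GenAI Engine address
API_KEY = "your-genai-engine-api-key"  # placeholder credential

response = requests.post(
    f"{ENGINE_URL}/api/v2/tasks",                      # hypothetical tasks endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "name": "travel-booking-agent",
        "is_agentic": True,  # flag from this release: enables targeted agentic evaluation
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```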
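For the Database connector, here is a hedged sketch of the kind of ODBC connection such a connector wraps, together with a hypothetical configuration dict. Only the "table name" and "dialect" options are documented above; every other field name, the driver string, and the query are assumptions.

```python
# Illustrative sketch only: the connector's real configuration schema is not
# shown in the release notes, so the field names below (other than the
# documented "table name" and "dialect" options) are assumptions.
import pyodbc  # standard ODBC bridge for Python

# Hypothetical connector configuration mirroring the documented options.
connector_config = {
    "dialect": "postgresql",          # one of: mssql, postgresql, oracle, mysql
    "table_name": "inference_logs",   # table to read inferences from
    "host": "db.internal.example.com",
    "port": 5432,
    "database": "analytics",
    "username": "arthur_reader",
    "password": "***",
}

# The kind of ODBC connection such a connector would open under the hood.
# The driver name depends on the ODBC driver installed on the host.
conn = pyodbc.connect(
    "DRIVER={PostgreSQL Unicode};"
    f"SERVER={connector_config['host']};"
    f"PORT={connector_config['port']};"
    f"DATABASE={connector_config['database']};"
    f"UID={connector_config['username']};"
    f"PWD={connector_config['password']};"
)
cursor = conn.cursor()
cursor.execute(f"SELECT * FROM {connector_config['table_name']} LIMIT 10")
print(cursor.fetchall())
```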
Enhancements/Bug Fixes:
- We’ve made several updates to reduce latency, support longer inputs, and improve the performance of our toxicity model:
  - Introduced a new chunking strategy for the harmful request classifier.
  - Increased the toxicity classifier’s token limit from 1,200 to 8,000 tokens.
  - Upgraded our profanity model, improving profanity-detection performance.
- Improved PII Detection Accuracy with New Post-processing Enhancements (a post-processing sketch follows this list):
  - Cleaner Entity Recognition: Common non-name terms like “me,” “you,” and “doctor” are now filtered out to reduce false positives in name detection.
  - Smarter Crypto Wallet & Bank Account Validation: Wallet and bank account entities are now validated for structure, expected length, and the absence of unexpected characters or common words like “wallet.”
  - Enhanced URL Detection: URLs are now correctly identified even when missing standard prefixes like `http://`.
- Improved the prompt injection classifier model, boosting its accuracy and efficiency. The updated model provides enhanced detection capabilities for identifying potential prompt injection attempts. It now prioritizes precision over recall, effectively reducing false positives where legitimate user inputs are mistakenly flagged, while maintaining robust security against genuine attacks.
- Added CloudFormation launch button with pre-populated client ID.
- Addressed API key validation latencies for users with large numbers of API keys.
- Converted the hallucination LLM call to structured output to improve accuracy (a general sketch of the pattern follows this list).
- Added possible_segmentation tag to improve model segmentation diagnostics.
- Addressed a bug related to incorrect function renaming after a refactor.
- Addressed a bug that caused incorrect redirection for organization members.
- Resolved an issue where links were not functioning correctly for keyword rules.
- The display of the 'last updated' timestamp has been corrected.
- Added more specific error messages in metrics creation workflows.
- In the chat playground, users can now input the URL of an OpenAI-compatible endpoint.
- "Permission Type" now renders correctly in the UI.
2025 June_A (2.1.46)
New Features:
- Added image support for metrics and for visualizing inferences in the Arthur Platform.
- Users can now optionally configure attributes to segment over when defining metrics.
Enhancements:
- Improved hallucination detection for numbered lists and other structured formats.
- Introduced a configurable max-token limit for hallucination checks, helping users fine-tune thresholds for their context.
2025 May_C (2.1.44)
New Features
- Added a '/traces/' API to support ingesting OpenTelemetry traces that meet the OpenInference (https://github.com/arize-ai/openinference/) specification. This feature is in preparation for adding agentic evaluations; more details coming soon. A minimal export sketch follows below.
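Here is a minimal sketch of exporting an OpenInference-style span with the standard OpenTelemetry OTLP/HTTP exporter. The ingest URL, port, and auth header are assumptions; the span attributes follow the OpenInference semantic conventions.

```python
# Minimal sketch, assuming the '/traces/' endpoint accepts OTLP over HTTP and a
# bearer token; the exact ingest URL and auth header are assumptions.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="http://localhost:3030/traces",            # hypothetical ingest URL
    headers={"Authorization": "Bearer <api-key>"},     # placeholder credential
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-agent")
with tracer.start_as_current_span("llm-call") as span:
    # OpenInference semantic-convention attributes for an LLM span.
    span.set_attribute("openinference.span.kind", "LLM")
    span.set_attribute("llm.model_name", "gpt-4o-mini")
    span.set_attribute("input.value", "What is the refund policy?")
    span.set_attribute("output.value", "Refunds are available within 30 days.")
```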
Enhancements:
- Added Docker Compose health checks to improve service startup reliability.
- Introduced a single script to install both the GenAI and ML engines.
- Initialized the Arthur Common module with CI, linting, and unit testing.
- Deprecated default validation endpoints to simplify integration and encourage explicit configuration.
- Made OpenAI-related configs optional for easier local development.
- Increased the GenAI engine startup timeout for stability.
Bug fix:
- Patched a critical vulnerability in the `h11` library.
2025 May_B (2.1.40)
Enhancements:
- Patched a PyTorch vulnerability.
- Configured Renovate on the Arthur Engine GitHub repository for automated dependency updates.
- The `FETCH_RAW_DATA_ENABLED` configuration is now exposed in the Helm chart.
- Docker Compose now always pulls the container images for users on the `latest` tag.
- Postgres now uses a volume to persist data in Docker Compose.
Bug fix:
- The ml-engine was not able to communicate with the genai-engine in the arthur-engine Docker Compose deployment. All services are now on the default network and able to communicate.
2025 May_A (2.1.39)
Deprecation:
- Deprecated the endpoints that validate prompts and responses against default rules without any task association.
Enhancement:
- Reduced the number of configurations exposed in the first-deploy experience with Docker Compose.
2025 April_B (2.1.37)
Enhancements:
- Open sourced the full Arthur Engine deployment scripts, comprising both the `genai-engine` and `ml-engine` components! You can now see how the `ml-engine` is deployed on Docker Compose, AWS ECS, and Kubernetes. All deployment scripts can now be found in the `/deployment` folder.
- The GenAI Engine server can now start with no LLM service connected. This allows users without access to an LLM service to still use the non-LLM-based evaluations.
- Improved the configuration parser for the LLM service connection string.
- Various enhancements have been made to the Docker Compose scripts.
2025 April_A (2.1.23)
Enhancements:
- Optimized the profanity detection function in the toxicity rule to improve latency for inferences with a large number of consecutive repeating characters (one illustrative approach is sketched after this list).
- Increased the overall concurrency of GPU deployments by using 5 Gunicorn workers by default and ensuring that the models load without encountering any race condition issues.
- Improved quick deployment by adding start scripts for Docker Compose, Helm Chart, and AWS CloudFormation.
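The release note does not describe how the latency optimization works; one plausible approach, shown purely as an assumption, is to collapse long runs of repeated characters before running profanity matching so pathological inputs stay short.

```python
# Assumption for illustration only: collapsing long runs of repeated characters
# is one plausible way to bound profanity-matching latency; the release note
# does not describe the actual optimization.
import re

def collapse_repeats(text: str, max_run: int = 2) -> str:
    """Shrink runs of the same character (e.g. 'noooooo') to at most `max_run`."""
    return re.sub(r"(.)\1{%d,}" % max_run, r"\1" * max_run, text)

print(collapse_repeats("heyyyyyyyyyy!!!!!!!!"))  # 'heyy!!'
```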
Bug fix:
- Disabled rules can now be archived. Previously, the archive endpoint responded that deactivated rules were not associated with any task.
2025 February_A (2.1.18)
We are thrilled to announce the very first release of the Arthur Engine, now available as an open source project!
The Arthur Engine is a tool designed for evaluating and benchmarking machine learning models and enforcing guardrails in your LLM applications and generative AI workflows.
This initial release debuts the GenAI Engine submodule and its capability to add guardrails to your LLM applications and generative AI workflows.
We value your feedback and contributions. Whether you encounter issues, have suggestions, or want to contribute to the project, please feel free to reach out via GitHub Issues or join our community discussions on Discord.