Alluxio Confidential
AI 3.7 Launch
Universal S3 + POSIX cache for end-to-end AI workloads.
AI at Full Throughput
and Ultra Low Latency
Jingwen Ouyang
Senior Product Manager
AI Data Life Cycle
Data
Collection
Data
Preprocessing
Model
Training
Model
Verification
Model
Loading
Inference
Data
Archiving
Data is everywhere in every stage of the journey.
Data needs to be accessed fast, friction-free, and at low cost.
Alluxio makes it easy to share and
manage data from
any storage
to any compute engine
in any environment
with high performance and low cost.
Alluxio as the Universal Cache
for S3 and POSIX Workloads
AI 3.7 Highlight
Alluxio as the Universal Cache for S3 and POSIX Workloads
THROUGHPUT LATENCY
T_total ≈ T_setup (≈ TTFB) + data_size / application_throughput
Simplified example: application throughput = 100 MB/s, T_setup = 200 ms
● Large dataset: 1 GB → ~10.2 s (throughput dominates)
● Small dataset: 128 KB → ~0.201 s (latency dominates)
Training & rollout love throughput; inference loves latency!
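The slide's numbers are easy to sanity-check with a few lines of Python; the constants mirror the simplified example above:

```python
# T_total ≈ T_setup (≈ TTFB) + data_size / throughput, per the slide.

def total_time_s(data_size_mb: float,
                 throughput_mb_s: float = 100.0,    # 100 MB/s
                 t_setup_s: float = 0.2) -> float:  # 200 ms setup/TTFB
    """Estimated end-to-end time for a single read."""
    return t_setup_s + data_size_mb / throughput_mb_s

# 1 GB: the transfer term dominates
print(round(total_time_s(1000.0), 3))   # → 10.2
# 128 KB: the setup/latency term dominates
print(round(total_time_s(0.128), 3))    # → 0.201
```

This is why large sequential training reads care about bandwidth while small-object inference traffic is dominated by time to first byte.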
THROUGHPUT
Alluxio has always been a leader in high throughput.
Enables customers to rapidly load massive quantities
of data into GPU memory for AI training and model
deployment/cold starts.
NEW in AI 3.7… Alluxio also delivers
Ultra Low Latency Caching for data
stored on cloud storage (e.g. AWS S3).
LATENCY
Alluxio as the Universal Cache for S3 and POSIX Workloads
● A single Alluxio worker achieves throughput comparable to HPC storage solutions:
○ Up to 81.6 Gbps (9.5 GiB/s) over a 100 Gbps network, scaling from 2.5 GiB/s (1 thread) to 9.5 GiB/s (32 threads)
○ Up to 352.2 Gbps (41 GiB/s) over a 400 Gbps network
Setup
● Alluxio:
1 Alluxio worker (i3en.metal)
● FIO Benchmark:
Sequential Read
bs = 256KB
Note: an Alluxio FUSE client (c5n.metal), co-located with the training servers, provides POSIX API access to the Alluxio workers, which cache the data.
Throughput Microbenchmark: Reads from A Single Worker
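The setup above maps naturally onto an fio job file. A hypothetical example follows; only the access pattern (sequential read) and block size (256 KiB) come from the slide, while the mount path, runtime, file size, and thread count sweep are assumptions:

```ini
; Hypothetical fio job approximating the microbenchmark:
; sequential reads in 256 KiB blocks against the Alluxio FUSE mount.
[global]
rw=read
bs=256k
direct=1
ioengine=psync
time_based=1
runtime=60
directory=/mnt/alluxio-fuse

[seq-read]
; vary numjobs from 1 to 32 to reproduce the thread-scaling sweep
numjobs=32
size=10g
```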
Alluxio is the industry-leading
sub-ms time to first byte (TTFB) solution on S3-class storage
How much better is Alluxio? (Details next slide)
➔ 45x Lower Latency than S3 Standard
➔ 5x Lower Latency than S3 Express One Zone
➔ Unlimited, linear scalability
Alluxio as the Universal Cache for S3 and POSIX Workloads
Test environment references
Alluxio EE
● Version/Spec: Alluxio Enterprise AI 3.6 (50TB
cache)
● Test env: 1 FUSE (C5n.metal, 100Gbps
network) and 1 Worker (i3en.metal)
AWS S3
● Version/Spec: AWS S3 bucket (Standard Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps
network)
AWS S3 Express One Zone
● Version/Spec: AWS bucket (S3 Express One
Zone Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps
network)
Alluxio as the Universal Cache for S3 and POSIX Workloads
➔ 45x Lower Latency than S3 Standard
➔ 5x Lower Latency than S3 Express One Zone
But that’s not all…
➔ Alluxio is 100% transparent to AI workloads
➔ S3 API & POSIX API
◆ Broad application support: PyTorch, Python, AWS SDK, Boto3, …
◆ NO code changes
◆ NO workflow changes
➔ NO data imports or migrations
Drop in Alluxio as the transparent S3 caching layer
for faster and more scalable AI!
Alluxio as the Universal Cache for S3 and POSIX Workloads
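A minimal sketch of the redirection mechanism behind the "no code changes" claim: S3 SDKs accept an endpoint override, so pointing existing clients at an Alluxio S3 endpoint needs only an environment change. The env-var name follows the demo later in this deck; the endpoint address and helper function are hypothetical:

```python
import os

def resolve_s3_endpoint():
    """Endpoint override, if configured; None means 'talk to AWS S3 directly'."""
    return os.environ.get("S3_ENDPOINT_URL")

# With boto3, for example, the override is passed as endpoint_url
# (endpoint_url=None falls back to the default AWS endpoint):
#   s3 = boto3.client("s3", endpoint_url=resolve_s3_endpoint())

os.environ["S3_ENDPOINT_URL"] = "http://alluxio-s3:29998"  # hypothetical address
print(resolve_s3_endpoint())  # → http://alluxio-s3:29998
```

Because only the endpoint changes, the same application code path is served either from AWS S3 or from the Alluxio cache.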
What the Public Results Show — MLPerf Storage v2.0
Alluxio Achieves Exceptional GPU Utilization
Alluxio
DDN
Nutanix
Hammerspace
HPE
Source: MLCommons MLPerf Storage v2.0 (retrieved 08/12/2025).
MLPerf®/MLCommons® are trademarks of MLCommons.
What the Public Results Show — MLPerf Storage v2.0
Linear scale-out: clients ↑, workers ↑, throughput ↑
Source: MLCommons MLPerf Storage v2.0 (retrieved 08/12/2025).
MLPerf®/MLCommons® are trademarks of MLCommons.
AI 3.7 Key Features: Operations & Management
Deploy & Configure Alluxio in WebUI
After installing the Alluxio K8s Operator, the intuitive WebUI can be used to configure cluster
parameters, allocate resources, and customize deployments, making Alluxio deployment faster,
simpler, and more accurate than ever before.
AI 3.7 Key Features: Operations & Management
FUSE Non-Disruptive Upgrade: Rolling, K8s-native FUSE upgrades; mounts stay live.
Alluxio's FUSE Online Upgrade feature lets admins upgrade FUSE services without interrupting active AI workloads. It maintains data accessibility throughout the upgrade process by:
● Preserving active file handles and connections
● Queuing operations during the brief transition period
● Automatically resuming operations within tens of seconds
In this release, read operations (read, stat) continue to work across the upgrade; write-operation support will follow in a future update. This ensures your critical AI workloads keep running even during necessary system maintenance.
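The queue-and-resume behavior described above can be illustrated with a toy sketch. This is not Alluxio's implementation, just the general pattern: reads that arrive during the brief upgrade window are queued rather than failed, then replayed once the new FUSE process is live:

```python
# Purely illustrative queue-and-resume pattern; NOT Alluxio's actual code.
class UpgradableService:
    def __init__(self):
        self.upgrading = False
        self.pending = []          # reads queued during the upgrade window

    def read(self, path):
        if self.upgrading:
            self.pending.append(path)   # queued, not failed
            return None                 # caller blocks briefly (simplified)
        return f"data:{path}"

    def finish_upgrade(self):
        self.upgrading = False
        # Replay everything queued during the transition window.
        served = [f"data:{p}" for p in self.pending]
        self.pending.clear()
        return served

svc = UpgradableService()
svc.upgrading = True
svc.read("/models/ckpt-1")      # arrives mid-upgrade: queued
print(svc.finish_upgrade())     # → ['data:/models/ckpt-1']
```

The real feature additionally preserves open file handles and connections, so the queuing is invisible to the application beyond a short stall.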
AI 3.7 Key Features: Security & Compliance
Role Based Access Controls for S3 Data
● Define who can access which data
● Define what they can do with that data
● Integrations with Authentication &
Authorization providers
○ OIDC based Providers
(Okta, Cognito, Microsoft AD)
○ Apache Ranger, Open Policy Agent (OPA)
Audit & Analyze User Data Access & Operations
Now, every interaction is automatically recorded
with:
● User identities and authentication details
● Operations performed (read, write, delete, etc.)
● Precise timestamps
● Accessed resources and paths
This enables security teams to detect anomalies, investigate incidents, and demonstrate compliance with regulatory requirements, capabilities that are essential for enterprise AI deployments handling sensitive data.
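An audit record carrying the fields listed above might look like the following JSON; the schema and field names here are hypothetical, not Alluxio's actual audit-log format:

```python
import json
from datetime import datetime, timezone

# Hypothetical audit record covering the fields listed above;
# field names are illustrative, not Alluxio's on-disk format.
record = {
    "user": "alice@example.com",
    "auth_provider": "okta-oidc",
    "operation": "READ",
    "timestamp": datetime(2025, 8, 12, 9, 30, tzinfo=timezone.utc).isoformat(),
    "resource": "s3://training-data/images/batch-001.parquet",
}
print(json.dumps(record, indent=2))
```

Structured records like this are what make downstream anomaly detection and compliance reporting straightforward to automate.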
Demo
Accelerate AI.
Demo: Transparent S3 Cache
Link 4:23
This demo showcases how to seamlessly accelerate a PyTorch data loading script using the Alluxio S3 API without any code modifications.
The presenter first runs a Python script that uses the s3torchconnector library to read data directly from an S3 bucket, establishing a
baseline performance of about 25 seconds. Then, by simply setting the S3_ENDPOINT_URL environment variable to point to the Alluxio S3
API endpoint, the exact same script is run again. This time, Alluxio serves the data from its cache, reducing the read time to about 15
seconds. The performance improvement is visually confirmed using a Grafana dashboard, which shows the data is now cached in Alluxio
and the cache hit rate is 100%.
● Objective (0:01): Accelerate PyTorch data pipelines that use S3 by leveraging the Alluxio S3 API, without requiring any changes to
the existing application code.
● The Mechanism (0:26): The integration works by redirecting S3 traffic to the Alluxio cluster. This is achieved by setting the
S3_ENDPOINT_URL environment variable to the Alluxio S3 endpoint address.
● Baseline Performance - Cold Read (1:03): The first run of the script with the endpoint URL unset reads data directly from AWS S3,
establishing a "cold read" benchmark of approximately 25 seconds.
● Accelerated Performance - Hot Read (1:52): For the second run, the S3_ENDPOINT_URL is set to the Alluxio endpoint. The exact
same script now reads the data from the Alluxio cache in about 15 seconds, demonstrating a significant performance improvement.
● Verification with Metrics (3:25): The performance gain is visually confirmed using a Grafana dashboard, which shows that 3 GiB of
data is now cached in Alluxio and the cache hit rate for the second run is 100%.
Demo: WebUI - Cluster Deployment
Link 4:17
This demo introduces the new features of the Alluxio Management Console in version 3.7. The presenter walks through how to access
the console and use its new graphical user interface to deploy and manage Alluxio clusters. He demonstrates the two methods for
cluster creation: a manual, form-based configuration and a streamlined YAML upload option. The demo highlights how the console
provides a detailed, real-time status view of cluster resources, including workloads and individual pods, and shows how to easily
access pod logs for quick troubleshooting.
● Accessing the Console (0:11): The Alluxio Operator now deploys a dedicated alluxio-console pod and service, which can
be accessed locally by port-forwarding its service to your machine.
● Cluster Creation (0:55): The console offers two ways to deploy a new Alluxio cluster: a step-by-step "Manual Configuration"
form or by simply uploading an existing YAML configuration file.
● Live Status Monitoring (2:56): The "View Status" page provides a real-time dashboard showing the creation progress and
health of all cluster workloads and their associated pods.
● Simplified Troubleshooting (3:29): Users can now view the logs of any pod directly from the console's UI with a single click,
making it easy to diagnose issues like a pod in an error state.
● Integrated Cluster Management (3:52): For existing clusters, the "View Console" button provides a direct link to that specific
cluster's traditional Alluxio UI, allowing for more detailed management operations like preloading data and managing storage.
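The console-access step above amounts to a standard kubectl port-forward. A hypothetical command sketch follows; the service name, namespace, and ports are assumptions based on the demo, so check your own deployment's values:

```shell
# Hypothetical: forward the management console service to localhost.
CONSOLE_SVC="alluxio-console"   # service name, per the demo (assumed)
LOCAL_PORT=8080                 # local port of your choosing
echo "kubectl -n alluxio port-forward svc/${CONSOLE_SVC} ${LOCAL_PORT}:8080"
# Run the printed command, then open http://localhost:${LOCAL_PORT}
```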
Thanks.
Schedule Demo at alluxio.io/demo
