Alluxio + Spark: Accelerating Auto Data Tagging in WeRide
Feifei Cai, Hao Zhu
12/14/2021
• Introduction
• Alluxio Overview
• WeRide Use Case
• Future Work
• Feifei Cai
• Hao Zhu
• @WeRide
Introduction
• Fast I/O
• Simple
• Easy
Alluxio Overview
Image Source: alluxio.io
• Data Driven
• Storage
• Computing
Autonomous Driving Development
• Data Collection
• Analytics & Selection
• Data Labeling
• Model Training
• Simulation & Validation
• Deploy
• Tags vs Labels
• Scenarios
• Data Selection
Data Tagging
Image Source: CARLA
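A minimal sketch of tag-based data selection, assuming tags are coarse scene-level attributes attached to a whole recorded clip and that selection means filtering clips by those tags. The `Clip` type, the tag names, and `select_clips` are hypothetical illustrations, not WeRide's actual schema or pipeline.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clip:
    """A recorded driving segment with scene-level tags (hypothetical schema)."""
    clip_id: str
    tags: frozenset = field(default_factory=frozenset)


def select_clips(clips, required_tags):
    """Return the clips that carry every tag in required_tags, in input order."""
    required = frozenset(required_tags)
    return [c for c in clips if required <= c.tags]


# Illustrative data only; the tag names are assumptions.
clips = [
    Clip("clip-001", frozenset({"rain", "night", "intersection"})),
    Clip("clip-002", frozenset({"sunny", "highway"})),
    Clip("clip-003", frozenset({"rain", "highway"})),
]

rainy = select_clips(clips, {"rain"})
print([c.clip_id for c in rainy])
```

Storing tags as a set keeps a scenario query a simple subset check, so a scenario can be expressed as the combination of tags it requires.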
• Hybrid Cloud
• Kubernetes
• Data Locality
Alluxio + Spark in Kubernetes
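A rough sketch of how a Spark job on Kubernetes might be pointed at Alluxio, assuming the Alluxio client jar is baked into the container image and the input is addressed with an `alluxio://` URI. The helper `build_spark_submit_args` and every host name, image name, and path below are hypothetical placeholders; the slides do not show the actual deployment.

```python
def build_spark_submit_args(k8s_api_url, image, alluxio_client_jar, app_path, input_uri):
    """Assemble a spark-submit command line for a Spark-on-Kubernetes job.

    The Alluxio client jar is added to the driver and executor classpaths
    so that alluxio:// URIs can be resolved by the job.
    """
    return [
        "spark-submit",
        "--master", f"k8s://{k8s_api_url}",
        "--deploy-mode", "cluster",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.driver.extraClassPath={alluxio_client_jar}",
        "--conf", f"spark.executor.extraClassPath={alluxio_client_jar}",
        app_path,
        input_uri,  # passed to the application as its first argument
    ]


# All values below are placeholders for illustration.
args = build_spark_submit_args(
    k8s_api_url="https://k8s.example.internal:6443",
    image="registry.example.internal/spark-alluxio:latest",
    alluxio_client_jar="/opt/alluxio/client/alluxio-client.jar",
    app_path="local:///opt/app/tagging_job.py",
    input_uri="alluxio://alluxio-master.example.internal:19998/data/clips",
)
print(" ".join(args))
```

Data locality additionally depends on how Alluxio workers and Spark executors are scheduled onto nodes, which this sketch does not cover.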
• MEM / SSD: 2TB / 8TB
• Tasks: 7x faster
Test Results
• Integrate with more: Presto...
Future Work
• Spark + GPU + Alluxio (working on it)
Future Work
Image Source: NVIDIA
