Rusty Waters
Elevating Lakehouses Beyond Spark
Satya Mandavilli
Solution Architect
S&P Global Commodity Insights
Madhukar Nathala
Associate Director
S&P Global Commodity Insights
Agenda
Challenges
Problem Statement
Context
Solution Benefits
About Us
• At S&P Global Commodity Insights,
our complete view of global energy and
commodities markets enables our
customers to make decisions with
conviction and create long-term, sustainable
value.
• Our team is at the forefront, driving innovation
and leveraging data to deliver valuable
insights in the commodity market.
• We combine cutting-edge technology with
robust data strategies to empower our clients
and enable informed decision-making.
Data Analyst
Data Engineer
Data Scientist
AI/ML Engineer
Data Governance Lead
Data Steward
Our Platform
• 1k+ Active users
• 37k+ Tables Onboarded
• 75k+ Jobs daily
• 3k+ TBs of Processed Data
The Context
01 FOCUS ON BUILDING & ENHANCING OUR UNIFIED, ENTERPRISE-GRADE LAKEHOUSE PLATFORM
02 A DYNAMIC COLLABORATION, DRIVING A TRANSFORMATIVE JOURNEY TO TACKLE DATA CHALLENGES.
03 FLEXIBILITY FOR USERS TO CHOOSE THEIR TOOLS AND TECHNOLOGIES
The Problem
Spark may not be the optimal choice for every workflow
Wait time for resources
Smaller workflows become expensive
Knowledge Gap
The Requirement
90% of the workloads are 100MB or less
Source: Big Data is Dead - MotherDuck Blog
The Solution
• Native Rust library for Delta Lake,
providing efficient and reliable data
processing.
• Enables data engineers to interact with
Delta Lake without Apache Spark,
reducing infrastructure costs and
simplifying data pipelines through a
standalone Rust API.
Delta-rs in Action
Example
The Effect
Workflow: Processing a 100MB dataset on an hourly schedule
Before (Spark + Notebook)
• Cost per Month: $52
• Resource Spin up Time: 6 min
• Job Run Time: 10 min
After (Delta-rs + Lambda)
• Cost per Month: $7.3
• Resource Spin up Time: 1.4 sec
• Job Run Time: 4 min
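To put the before and after figures in relative terms, here is a small worked calculation. The numbers are the ones quoted above. The `Setup` class and `percent_reduction` helper are illustrative names, and the 30-day month used for the per-run cost is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    name: str
    cost_per_month_usd: float
    spin_up_seconds: float
    run_minutes: float


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


before = Setup("Spark + Notebook", 52.0, 6 * 60, 10)
after = Setup("Delta-rs + Lambda", 7.3, 1.4, 4)

# Assumption: hourly schedule over a 30-day month.
RUNS_PER_MONTH = 24 * 30

cost_cut = percent_reduction(before.cost_per_month_usd, after.cost_per_month_usd)
run_cut = percent_reduction(before.run_minutes, after.run_minutes)
spin_up_speedup = before.spin_up_seconds / after.spin_up_seconds

print(f"Cost reduction:     {cost_cut:.1f}%")
print(f"Run time reduction: {run_cut:.1f}%")
print(f"Spin-up speedup:    {spin_up_speedup:.0f}x")
print(f"Cost per run:       ${before.cost_per_month_usd / RUNS_PER_MONTH:.4f}"
      f" -> ${after.cost_per_month_usd / RUNS_PER_MONTH:.4f}")
```

On these figures the monthly cost falls by about 86%, the job run time by 60%, and resource spin-up is roughly 257 times faster.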
The Benefits
• Easy to onboard
• Cost Savings
• High Performance
• Open Source
Change Data Feed
• Fixed with 0.19.0
Concurrent Writes
• Multi-Cluster concurrent writes into Delta Lake tables in S3
• Fixed with 0.23.0*
* Conditional put support is only safe if you don’t have Spark/DBR writers on the table
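The idea behind a conditional put can be sketched on a local filesystem. A Delta commit is a new, sequentially numbered JSON file in the table's log directory, and concurrent writers stay safe only if creating that file fails when it already exists. The sketch below imitates that with Python's exclusive-create file mode. It is an analogy under stated assumptions, not how delta-rs or S3 implement it: the function names are hypothetical, and a real writer would also check whether the commit it lost to conflicts with its own changes before retrying.

```python
import json
import os


def try_commit(log_dir: str, version: int, actions: list[dict]) -> bool:
    """Create the commit file for `version` only if it does not exist yet."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # Mode "x" is an exclusive create: it raises if the file is already
        # there, which plays the role of a put-if-absent request.
        with open(path, "x", encoding="utf-8") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        return True
    except FileExistsError:
        return False


def commit_with_retry(log_dir: str, actions: list[dict], max_attempts: int = 5) -> int:
    """Commit at the next free version, retrying if another writer wins the race."""
    for _ in range(max_attempts):
        versions = [int(name.split(".")[0])
                    for name in os.listdir(log_dir) if name.endswith(".json")]
        version = max(versions, default=-1) + 1
        if try_commit(log_dir, version, actions):
            return version
    raise RuntimeError("gave up after repeated commit conflicts")
```

The guarantee holds only when every writer goes through the same check. A writer that creates log files without it can silently overwrite a commit, which is presumably why the footnote restricts this to tables with no Spark/DBR writers.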
Closing Thoughts
• No single engine is perfect for every data workload!
• Avoid getting trapped in a Spark-only mindset; not every challenge is a Big Data
problem.
• With Rust, we gain advantages such as memory safety, efficient concurrency, and
high performance, all of which enhance the Lakehouse capabilities.
• Consider the potential savings in both costs and complexity for your Data Platform.
• Focus on understanding your data and selecting the right engines for your
workloads.
Thank you!
Questions?
https://go.delta.io/slack




