Jakub Háva, Michal Malohlava; H2O.ai
Productionizing H2O Models with Apache Spark
#ML4SAIS
Who are we?
• Michal
  • Chief Architect of Platforms at H2O.ai
  • Creator of Sparkling Water
  • Ph.D. at Charles University (CZ), postdoc at Purdue University (US)
• Kuba
  • Senior Software Engineer at H2O.ai, core Sparkling Water developer
  • Master’s degree from Charles University (CZ)
  • Implemented a high-performance cluster monitoring tool for JVM-based languages (JNI, JVMTI, instrumentation)
Machine Learning (ML) Lifecycle
Basic ML Lifecycle
[Diagram: Model Pipeline Building covers Data Engineering → Feature Engineering → Model Training Algorithm → Training Predictions]
Basic ML Lifecycle
[Diagram: Model Pipeline Building (Data Engineering → Feature Engineering → Model Training Algorithm → Training Predictions) produces a Featurization Pipeline and a Model; Model Pipeline Deployment packages the two to produce Deployment Predictions]
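The split between model pipeline building and model pipeline deployment can be illustrated with a small, dependency-free Python sketch. It is not Sparkling Water code: the standardization featurizer, the one-feature linear model and the JSON artifact format are hypothetical stand-ins for Spark transformers, an H2O model and an exported scoring artifact such as a MOJO. The point it shows is that the featurization fitted during building ships together with the model, so deployment predictions are computed the same way training predictions were.

```python
import json
from statistics import mean, pstdev

# --- Model pipeline building (training side) ---

def fit_featurizer(rows, column):
    """Learn standardization parameters (mean, population std) for one numeric column."""
    values = [r[column] for r in rows]
    return {"column": column, "mean": mean(values), "std": pstdev(values) or 1.0}

def featurize(featurizer, row):
    """Apply the learned standardization to a single row."""
    return (row[featurizer["column"]] - featurizer["mean"]) / featurizer["std"]

def fit_model(rows, featurizer, label):
    """Fit a one-feature least-squares line y = w * x + b on featurized inputs."""
    xs = [featurize(featurizer, r) for r in rows]
    ys = [r[label] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs) or 1.0  # guard against a constant feature
    w = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return {"w": w, "b": y_bar - w * x_bar}

def export_pipeline(featurizer, model):
    """Bundle featurizer and model into one portable artifact (hypothetical JSON format)."""
    return json.dumps({"featurizer": featurizer, "model": model})

# --- Model pipeline deployment (scoring side) ---

def score(artifact, rows):
    """Load the artifact and score new rows with the training-time featurization."""
    pipeline = json.loads(artifact)
    f, m = pipeline["featurizer"], pipeline["model"]
    return [m["w"] * featurize(f, r) + m["b"] for r in rows]

# Toy data following y = 2x + 1.
train = [{"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 5.0}, {"x": 3.0, "y": 7.0}]
featurizer = fit_featurizer(train, "x")
artifact = export_pipeline(featurizer, fit_model(train, featurizer, "y"))
predictions = score(artifact, [{"x": 4.0}])
```

Here `predictions` is approximately `[9.0]`. The scoring side never recomputes the mean or standard deviation; it only reads them from the artifact, which is what keeps training and deployment consistent.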
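Example Implementations
• Spark + H2O
  • Model building: Spark for data and feature engineering, H2O as the training algorithm
  • Model deployment: Spark pipeline with an H2O MOJO model
• Spark + H2O Driverless AI
  • Model building: Spark for data engineering, H2O Driverless AI for feature engineering and training
  • Model deployment: Spark pipeline with an H2O Driverless AI MOJO model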
H2O + Spark = Sparkling Water
H2O + Spark
• H2O
  • Machine Learning Library
  • Distributed Algorithms
  • For ML experts
• Sparkling Water
  • Integrates H2O & Spark Ecosystems
  • Transparent for Spark users
  • Based on Spark pipelines & H2O
Basic ML Lifecycle: Sparkling Water
[Diagram: Feature Engineering with Spark Transformers; Model Training Algorithm with H2O AutoML; the deployment Pipeline combines the Spark Transformers with an H2O MOJO Model to produce Deployment Predictions]
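Sparkling Water plugs H2O into Spark's pipeline abstraction, where an estimator is fitted into a transformer and a pipeline of stages is fitted into a single pipeline model. The plain-Python sketch below only mimics that pattern to show how the pieces compose. The class names echo Spark ML concepts (`Pipeline`, `PipelineModel`, `fit`, `transform`), but `MeanImputer` and `ThresholdClassifier` are hypothetical stand-ins for a Spark feature transformer and an H2O algorithm, not real Sparkling Water classes; in the real API the stages operate on Spark DataFrames rather than lists of dicts.

```python
from statistics import mean

class Transformer:
    """A fitted stage: maps a list of row dicts to a new list of row dicts."""
    def transform(self, rows):
        raise NotImplementedError

class Estimator:
    """An unfitted stage: learns from rows and returns a fitted Transformer."""
    def fit(self, rows):
        raise NotImplementedError

class PipelineModel(Transformer):
    """The fitted pipeline: applies every fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

class Pipeline(Estimator):
    """Fits estimators in order; each stage sees the previous stage's output."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)

class FillMissing(Transformer):
    """Fitted imputer: replaces None in `column` with a learned value."""
    def __init__(self, column, value):
        self.column, self.value = column, value

    def transform(self, rows):
        return [{**r, self.column: self.value if r[self.column] is None else r[self.column]}
                for r in rows]

class MeanImputer(Estimator):
    """Hypothetical feature-engineering stage (stand-in for a Spark transformer)."""
    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        present = [r[self.column] for r in rows if r[self.column] is not None]
        return FillMissing(self.column, mean(present))

class ThresholdModel(Transformer):
    """Fitted scorer (stand-in for an exported H2O MOJO model)."""
    def __init__(self, column, cut):
        self.column, self.cut = column, cut

    def transform(self, rows):
        return [{**r, "prediction": int(r[self.column] > self.cut)} for r in rows]

class ThresholdClassifier(Estimator):
    """Hypothetical training algorithm (stand-in for an H2O algorithm)."""
    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        return ThresholdModel(self.column, mean(r[self.column] for r in rows))

# Fit once on training data, then reuse the fitted pipeline to score new rows.
train = [{"x": 1.0}, {"x": None}, {"x": 3.0}]
model = Pipeline([MeanImputer("x"), ThresholdClassifier("x")]).fit(train)
scored = model.transform([{"x": None}, {"x": 5.0}])
```

The missing value in the scoring data is filled with the training mean (2.0) before the threshold model runs, so the first row gets prediction 0 and the second gets 1. Because the whole fitted `PipelineModel` is itself a transformer, it can be handed to the deployment side as a single unit.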
Demo: Spark Pipeline
H2O Driverless AI
• What if I’m not an expert?
• H2O Driverless AI
  • No expert knowledge required
  • Automatic Feature Engineering & ML
Basic ML Lifecycle: Driverless AI
[Diagram: Driverless AI Feature Transformations handle Feature Engineering and a Driverless AI Model handles training; for deployment, the Driverless AI MOJO is used as a Pipeline that produces Deployment Predictions]
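One way to picture "Driverless AI MOJO as a pipeline" is a single stage that encapsulates both the automatically engineered feature transformations and the model, so the surrounding pipeline only supplies raw columns and receives predictions. The sketch below is a hypothetical illustration of that packaging idea; `BundledScoringStage`, the `ratio` feature and the scoring rule are invented for the example and do not reflect the actual Driverless AI MOJO runtime API.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Feature = Callable[[Row], Row]

class BundledScoringStage:
    """Hypothetical single pipeline stage that carries its own feature
    transformations and model, so callers only supply raw columns."""

    def __init__(self, features: List[Feature], score: Callable[[Row], float]):
        self.features = features
        self.score = score

    def transform(self, rows: List[Row]) -> List[Row]:
        scored = []
        for row in rows:
            enriched = dict(row)
            for feature in self.features:
                enriched = feature(enriched)  # each transformation sees the previous output
            # Only the prediction is added; engineered columns stay internal to the stage.
            scored.append({**row, "prediction": self.score(enriched)})
        return scored

stage = BundledScoringStage(
    features=[lambda r: {**r, "ratio": r["income"] / r["debt"]}],
    score=lambda r: 1.0 if r["ratio"] > 2 else 0.0,
)
result = stage.transform([{"income": 10.0, "debt": 2.0}, {"income": 3.0, "debt": 3.0}])
```

The design choice being illustrated is encapsulation: the deployment pipeline does not need to know which features were engineered, so they cannot drift out of sync with the model.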
Demo: Driverless AI as Spark Pipeline
Driverless AI Pipeline
Governed ML Lifecycle
[Diagram: the Basic ML Lifecycle (Model Pipeline Building: Data Engineering → Feature Engineering → Model Training Algorithm → Training Predictions; Model Pipeline Deployment: Featurization Pipeline + Model → Deployment Predictions), extended with Model Management, Model Monitoring, and Auto Documentation]
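The governed lifecycle adds model monitoring, but the slides do not say how monitoring is performed. As one common technique, swapped in here purely for illustration, the sketch below computes the Population Stability Index (PSI) between a reference distribution (for example, training predictions) and a live one (for example, deployment predictions). The bin count, the epsilon for empty bins and the example data are assumptions.

```python
import math
from typing import Sequence

def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample,
    using equal-width bins over the reference range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for a constant reference

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        eps = 1e-6  # keep empty bins from producing log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference = [i / 100 for i in range(100)]  # e.g. predictions at training time
live = [0.95] * 100                        # e.g. deployment predictions piled into one bin
drift = psi(reference, live)
```

Identical distributions give a PSI of 0; a frequently quoted rule of thumb treats values above roughly 0.25 as a major shift worth investigating, though thresholds should be tuned per model.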
Materials
https://bit.ly/2sxowxD
Sparkling Water enables deployment of H2O ML models with Spark Pipelines
Thank you!

Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Malohlava