Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Automatic Machine Learning (AutoML)
Erin LeDell, Machine Learning Scientist
@ledell / erin@h2o.ai
MEET THE MAKERS
ERIN LEDELL: Machine Learning Scientist
NAVDEEP GILL: Software Engineer & Data Scientist
RAY PECK: Director of Product Engineering
Agenda
• Intro to Automatic Machine Learning (AutoML)
• Random Grid Search & Stacked Ensembles
• H2O’s AutoML (R, Python, GUI)
• H2O-3 Roadmap
• Hands-on Tutorial
Intro to AutoML
Automatic Machine Learning
Aspects of Automatic ML
• Data Prep
• Model Generation
• Ensembles
Data Prep
• Imputation of missing data
• Standardization of numeric features
• One-hot encoding of categorical features
• Count/Label/Target encoding of categorical features
• Feature selection and/or feature extraction (e.g. PCA)
• Feature engineering
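
As a rough illustration of the first three items, here is a minimal sketch in plain Python. The helper names (impute_mean, standardize, one_hot) and the toy columns are assumptions made for this example and are not part of H2O; a real pipeline would also need to handle constant columns and unseen categories.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation.
    Assumes the column is not constant, so the deviation is non-zero."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """Map each category to a 0/1 indicator vector, one slot per level."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels] for v in values]

# Toy columns (illustrative only).
ages = impute_mean([20.0, None, 40.0])     # missing value becomes 30.0
scaled = standardize(ages)                 # centred on 0, unit spread
colors = one_hot(["red", "blue", "red"])   # levels in sorted order: blue, red
```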
Model Generation
• Cartesian grid search
• Random grid search
• Tune individual models via Early Stopping
• Bayesian Hyperparameter Optimization
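
The difference between the first two search strategies can be sketched in a few lines of plain Python. The search space below is hypothetical (the parameter names merely resemble typical GBM settings), and max_models is an assumed budget parameter for this example.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
space = {
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
}

def cartesian_grid(space):
    """Every combination of hyperparameter values."""
    names = list(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*space.values())]

def random_grid(space, max_models, seed=1):
    """A random subset of the Cartesian grid, drawn without replacement."""
    rng = random.Random(seed)
    full = cartesian_grid(space)
    return rng.sample(full, min(max_models, len(full)))

full = cartesian_grid(space)                  # 4 * 3 * 3 = 36 candidates
subset = random_grid(space, max_models=10)    # only 10 of them get trained
```

The Cartesian grid grows multiplicatively with every added hyperparameter, while the random grid keeps the training budget fixed regardless of how large the space is.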
Ensembles
• Bagging / Averaging
• Stacking / Super Learning
• Ensemble Selection
Random Stacking
Random Grids + Stacked Ensembles
Stacked Ensembles
• Specify L base learners (with model params).
• Specify a metalearner (just another algo).
• Perform k-fold cross-validation on the base learners.
Stacked Ensembles
• Collect cross-validated predicted values from base
learners.
• Train a second-level metalearning algorithm to find
the optimal combination of base learners.
• Metalearner requires only a small amount of compute
on top of the cross-validation process (it’s cheap).
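
The stacking steps above can be sketched in plain Python on a toy regression problem. Everything here is an assumption made for illustration: two deliberately simple base learners, a round-robin fold assignment, and a metalearner reduced to a single blending weight in [0, 1]. It is not how H2O implements Stacked Ensembles.

```python
def fit_mean(xs, ys):
    """Base learner 1: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base learner 2: one-feature ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def cv_predictions(fit, xs, ys, k):
    """Out-of-fold predictions: each row is scored by a model that never saw it."""
    preds = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds

def fit_metalearner(a, b, ys):
    """Weight w in [0, 1] minimising the squared error of w*a + (1-w)*b."""
    num = sum((ai - bi) * (y - bi) for ai, bi, y in zip(a, b, ys))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(1.0, max(0.0, num / den))

xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]                 # toy, noise-free linear data
cv_mean = cv_predictions(fit_mean, xs, ys, k=3)
cv_line = cv_predictions(fit_line, xs, ys, k=3)
w_line = fit_metalearner(cv_line, cv_mean, ys)   # weight on the linear learner
```

The metalearner only sees the two columns of cross-validated predictions, which is why it adds little compute beyond the cross-validation itself.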
Random Grid Search + Stacking
• Random Grid Search combined with Stacked
Ensembles is a powerful combination.
• Stacked Ensembles perform particularly well if the
models they are based on (1) are individually
strong, and (2) make uncorrelated errors.
• Random Grid Search is an excellent way to create
a diverse set of models for the ensemble.
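
One way to see why uncorrelated errors matter is the textbook variance of a plain average of equally noisy models. This is a simplification: a trained metalearner is not a plain average, and the function name and numbers below are assumptions chosen for illustration.

```python
def averaged_error_variance(n_models, sigma2, rho):
    """Variance of the mean of n_models errors, each with variance sigma2
    and pairwise correlation rho: sigma2 * (rho + (1 - rho) / n_models)."""
    return sigma2 * (rho + (1.0 - rho) / n_models)

# Ten models, unit error variance each.
uncorrelated = averaged_error_variance(10, 1.0, 0.0)   # shrinks to 1/10
correlated = averaged_error_variance(10, 1.0, 0.9)     # stays close to 0.91
```

With highly correlated errors, adding more models barely helps, which is the motivation for generating a diverse set of base models.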
H2O AutoML
Automatic Machine Learning in H2O
H2O Machine Learning Platform
• Distributed (multi-core + multi-node) implementations of
cutting edge ML algorithms.
• Core algorithms written in high performance Java.
• APIs available in R, Python, Scala & web GUI.
• Works on Hadoop, Spark, EC2, your laptop, etc.
• Easily deploy models to production as pure Java code.
H2O AutoML (first cut)
• Imputation, one-hot encoding, standardization.
• Random Grid Search over a custom hyperparameter
space, defined by expert data scientists.
• Early stopping of individual models and random grids.
• GBMs, Random Forests, Deep Neural Nets, GLMs
• Multiple Stacked Ensembles of models.
• Leaderboard for ranking.
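
Early stopping can be illustrated with a simple patience rule: stop once the validation metric has failed to improve for a fixed number of consecutive rounds. This is a simplified sketch, not H2O's exact stopping criterion; the function name, the tolerance argument, and the scores are assumptions for this example.

```python
def early_stopping_round(scores, stopping_rounds, tolerance=0.0):
    """Return the iteration index at which training would stop, or None.
    `scores` is a per-iteration validation metric where larger is better."""
    best = float("-inf")
    stale = 0
    for i, score in enumerate(scores):
        if score > best + tolerance:
            best, stale = score, 0      # improvement: reset the counter
        else:
            stale += 1                  # no improvement this round
            if stale >= stopping_rounds:
                return i
    return None

# Hypothetical validation AUC per scoring round.
auc = [0.70, 0.75, 0.78, 0.78, 0.77, 0.78, 0.79]
stop_at = early_stopping_round(auc, stopping_rounds=3)      # stops at index 5
never = early_stopping_round(auc, stopping_rounds=4)        # runs to the end
```

The same rule can be applied one level up, to a whole random grid, by feeding it the best score seen after each newly trained model.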
H2O AutoML in R
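library(h2o)
h2o.init()

# Load the training data into the H2O cluster
train <- h2o.importFile("train.csv")

# Run AutoML for up to 10 minutes
aml <- h2o.automl(y = "response_colname",
                  training_frame = train,
                  max_runtime_secs = 600)

# Leaderboard of the models trained during the run
lb <- aml@leaderboard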

H2O AutoML in Python
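import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Load the training data into the H2O cluster
train = h2o.import_file("train.csv")

# Run AutoML for up to 10 minutes
aml = H2OAutoML(max_runtime_secs=600)
aml.train(y="response_colname",
          training_frame=train)

# Leaderboard of the models trained during the run
lb = aml.leaderboard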
H2O AutoML in Flow
Example Leaderboard for binary classification
H2O AutoML Leaderboard
H2O-3 Roadmap
Coming Soon to H2O
H2O-3 Roadmap (features planned for Q1 and Q2)
• New Algorithm: Cox-Proportional Hazards
• GLM: Ordinal Regression
• GBM: Quasibinomial
• NLP Improvements, TF-IDF
• Stacked Ensemble: Custom Metalearner
• AutoML: New Ensembles
• AutoML: Add XGBoost
• Distributed XGBoost
• New Algorithm: Factorization Machines
https://tinyurl.com/h2o-automl-jira
• Documentation: http://docs.h2o.ai
• Tutorials: https://github.com/h2oai/h2o-tutorials
• Slidedecks: https://github.com/h2oai/h2o-meetups
• Videos: https://www.youtube.com/user/0xdata
• Events & Meetups: http://h2o.ai/events
• Stack Overflow: https://stackoverflow.com/tags/h2o
• Google Group: https://tinyurl.com/h2ostream
• Gitter: http://gitter.im/h2oai/h2o-3
Hands-on Tutorial
DEMO
First-time Qwiklab Account Setup
• Go to http://h2oai.qwiklab.com
• Click on “JOIN”
• Create a new account with a valid email address
• You will receive a confirmation email
• Click on the link in the confirmation email
• Go back to http://h2oai.qwiklab.com and log in
• Go to the Catalog on the left bar
• Choose “Introduction to AutoML in H2O”
• Wait for instructions
H2O AutoML Tutorial
Code and data available here: https://tinyurl.com/automl-h2oworld17
