Scalable Machine Learning
For Smarter Applications
Who am I?
Hank Roark (hank@h2oai.com, @hankroark)
Data Scientist & Hacker @ H2O.ai
Lecturer in Systems Thinking, UIUC
13 years at John Deere doing Research, New Product
Development, New High Tech Ventures
Previously at startups and consulting
Physics Georgia Tech

Systems Design & Management MIT
• Founded: 2011 venture-backed, debuted in 2012
• Product: H2O open source in-memory prediction engine
• Team: 37 - Distributed Systems Engineers doing ML
• HQ: Mountain View, CA
H2O.ai Overview
H2O.ai

Machine Intelligence
25,000 commits / 3yrs
H2O World Conference 2014
Team Work @ H2O.ai
28
Join H2O World Nov 9-11 2015!
What is H2O?
Math Platform
• Open source in-memory prediction engine
• Parallelized and distributed algorithms making the most use
API
• Easy to use and adopt
• Written in Java – perfect for Java Programmers
• REST API (JSON) –
Big Data
• More data? Or better models? BOTH
• Use all of your data – model without down sampling
• Run a
Accuracy with Speed and Scale
H2O Software Stack
JavaScript R Python Excel/Tableau
Network
Rapids Expression Evaluation
Engine
Scala
Customer
Algorithm
Customer
Algorithm
Parse
GLM
GBM
RF
Deep Learning
K-Means
PCA
In-H2O Prediction
Engine
Fluid Vector Frame
Distributed K/V Store
Non-blocking Hash Map
Job
MRTask
Fork/Join
Flow
Customer Algorithm
Spark Hadoop Standalone H2O
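The stack above lists MRTask and Fork/Join under the algorithms. As a rough illustration of that map/reduce style, here is a minimal Python sketch. It is not H2O's Java API: the function names, the chunking scheme, and the use of a thread pool are all assumptions made for this example. Each worker summarizes only its own chunk, and the partial results are then combined with an associative reduce.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(values, n_chunks):
    """Split a list into contiguous pieces (hypothetical stand-in for a distributed column)."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_chunk(chunk):
    """Map step: summarize one chunk as a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def reduce_pair(a, b):
    """Reduce step: combine two partial results; the operation is associative."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(values, n_chunks=4):
    """Compute a mean by mapping over chunks in parallel, then reducing the partials."""
    if not values:
        raise ValueError("values must not be empty")
    chunks = split_into_chunks(values, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_pair, partials)
    return total / count
```

Because the reduce step is associative, the partial results can be combined in any grouping, which is what lets this pattern spread across cores or nodes.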
Reading Data from HDFS into H2O with R
STEP 1
R user
h2o_df = h2o.importFile("hdfs://path/to/data.csv")
Reading Data from HDFS into H2O with R
STEP 2
Components: R, H2O Cluster (three H2O nodes), HDFS (data.csv)
2.1 R function call: h2o.importFile()
2.2 HTTP REST API request to H2O has HDFS path
2.3 Initiate distributed ingest
2.4 Request data from HDFS
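To make the HTTP REST request that carries the HDFS path more concrete, here is a minimal Python sketch that only builds such a URL and does not send it. The `/3/ImportFiles` endpoint name, its `path` query parameter, and the `localhost:54321` address are assumptions that do not appear in the slides; check them against the REST API documentation for your H2O version.

```python
from urllib.parse import urlencode, urlunsplit


def build_import_url(host, port, hdfs_path):
    """Build a URL for a file-import request that carries the HDFS path.

    Assumption: the endpoint is /3/ImportFiles and takes a 'path' parameter.
    """
    query = urlencode({"path": hdfs_path})  # percent-encodes ':' and '/'
    return urlunsplit(("http", f"{host}:{port}", "/3/ImportFiles", query, ""))


# Hypothetical cluster address; the path is the one used on the slide.
url = build_import_url("localhost", 54321, "hdfs://path/to/data.csv")
```

The R client never reads the file itself. It only tells the cluster where the data lives, and the H2O nodes pull it from HDFS.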
Reading Data from HDFS into H2O with R
STEP 3
Components: R, H2O Cluster (three H2O nodes), HDFS (data.csv)
3.1 HDFS provides data
3.2 Distributed H2O Frame in DKV
3.3 Return pointer to data in REST API JSON Response
3.4 h2o_df object created in R
h2o_df holds: Cluster IP, Cluster Port, Pointer to Data
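The point of Step 3 is that the client-side object is only a handle (cluster IP, cluster port, pointer to data), while the frame itself stays in the cluster. A minimal Python sketch of such a handle follows. The class name, field names, and the `/3/Frames/<id>` URL layout are assumptions for illustration, not taken from the slides.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class FrameHandle:
    """Client-side pointer to a frame that lives in the cluster (hypothetical class)."""
    cluster_ip: str
    cluster_port: int
    frame_id: str  # key of the distributed frame in the K/V store

    def frames_url(self):
        """URL at which the cluster could be asked about this frame (assumed layout)."""
        key = quote(self.frame_id, safe="")
        return f"http://{self.cluster_ip}:{self.cluster_port}/3/Frames/{key}"


# Hypothetical values: a local cluster and a frame key named data.hex.
handle = FrameHandle("127.0.0.1", 54321, "data.hex")
```

Copying or passing this handle around is cheap because no data moves until an operation is requested from the cluster.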
R Script Starting H2O GLM
User process (standard R process)
• R script: h2o.glm()
• .h2o.startModelJob(): POST /3/ModelBuilders/glm
• HTTP REST/JSON over TCP/IP
H2O process
• Network layer: HTTP REST/JSON
• REST layer: /3/ModelBuilders/glm endpoint
• H2O - algos: Job, GLM algorithm, GLM tasks
• H2O - core: Fork/Join framework, K/V store framework
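As a rough illustration of the POST to /3/ModelBuilders/glm, here is a minimal Python sketch that builds the request object without sending it. The endpoint path comes from the slide. The parameter names (`training_frame`, `response_column`, `family`), the form encoding, and the base URL are assumptions and should be checked against the H2O REST documentation.

```python
import urllib.request
from urllib.parse import urlencode


def build_glm_request(base_url, training_frame, response_column, family="gaussian"):
    """Build (but do not send) a POST request that would start a GLM job.

    Assumption: parameters are sent form-encoded under these names.
    """
    params = {
        "training_frame": training_frame,
        "response_column": response_column,
        "family": family,
    }
    return urllib.request.Request(
        url=f"{base_url}/3/ModelBuilders/glm",
        data=urlencode(params).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical cluster address, frame key, and response column.
req = build_glm_request("http://localhost:54321", "data.hex", "y")
```

Passing `req` to `urllib.request.urlopen` would send it to a running cluster. The model is then built inside the H2O process, not in the client.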
R Script Retrieving H2O GLM Result
User process (standard R process)
• R script: h2o.glm()
• h2o.getModel(): GET /3/Models/glm_model_id
• HTTP REST/JSON over TCP/IP
H2O process
• Network layer: HTTP REST/JSON
• REST layer: /3/Models endpoint
• H2O - algos
• H2O - core: Fork/Join framework, K/V store framework
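Retrieving the result is a plain GET against /3/Models with the model's id. A minimal Python sketch follows. The base URL is an assumption, `fetch_json` is a hypothetical helper that is defined but not called here, and the shape of the JSON response is deliberately left uninterpreted.

```python
import json
import urllib.request
from urllib.parse import quote


def build_get_model_request(base_url, model_id):
    """Build (but do not send) a GET request for a finished model."""
    key = quote(model_id, safe="")  # keep unusual ids URL-safe
    return urllib.request.Request(url=f"{base_url}/3/Models/{key}", method="GET")


def fetch_json(request):
    """Send the request and decode the JSON body (requires a running cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# The id below is the placeholder used on the slide; the address is hypothetical.
model_req = build_get_model_request("http://localhost:54321", "glm_model_id")
```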
Step 1
• Download and install H2O: go to h2o.ai and hit the download button
• SL Guest, password SerendipityOSW
• Only requirement is JDK 1.7+
• plus required packages if using R or Python
• Pick R, Python (2.7.x), or Standalone for tonight
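To check the JDK 1.7+ requirement, the version string reported by `java -version` can be compared against the minimum. The helper names in this minimal Python sketch are hypothetical. It assumes the usual convention that older JDKs report versions such as 1.7.0_80, while newer ones report 11.0.2 or 17.

```python
import re


def java_major_version(version):
    """Return the major Java version from strings like '1.7.0_80' or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: '1.7' means Java 7, '1.8' means Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first


def meets_requirement(version, minimum=7):
    """True if the given version string satisfies the JDK 1.7+ requirement."""
    return java_major_version(version) >= minimum
```

For example, `meets_requirement("1.6.0_45")` is false, while `meets_requirement("1.8.0_292")` is true.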
Step 2
• http://bit.ly/1SNkmFa
• Contains training and validation data, starter R and Python scripts
• Unzip chicago.zip
Thank You
(final holdout test set for Chicago Meetup is at https://s3.amazonaws.com/0xdata-public/hank/mikeditka.csv)
Thank you Chicago for a great time!

Introduction to data science with H2O-Chicago
