H2O.ai
Open Source Machine Learning for Intelligent Applications
H2O.ai Machine Intelligence
Time is the only non-renewable resource 
Speed Matters! 
Sampling 
Law of Large Numbers
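A minimal sketch of the idea behind this slide, assuming a synthetic population (the distribution, sizes, and seed are illustrative choices, not from the talk): by the law of large numbers, the mean of a random sample approaches the population mean as the sample grows.

```python
import random

def sample_mean(population, n, rng):
    """Mean of n values drawn uniformly at random, with replacement."""
    return sum(rng.choice(population) for _ in range(n)) / n

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical population: 100,000 values around 100 with std dev 15.
population = [rng.gauss(100.0, 15.0) for _ in range(100_000)]
true_mean = sum(population) / len(population)

# Absolute error of the sample mean for growing sample sizes.
errors = {n: abs(sample_mean(population, n, rng) - true_mean)
          for n in (10, 1_000, 50_000)}
for n, err in errors.items():
    print(f"n={n:>6}  |sample mean - true mean| = {err:.3f}")
```

The error tends to shrink roughly with the square root of the sample size, which is why sampling can stand in for a full scan when a small error is acceptable.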
Data scientists & analysts will not write Java MapReduce
On Premise, On / Off Hadoop, On EC2
Per Node: 2M Row ingest/sec, 50M Row Regression/sec, 750M Row Aggregates/sec
Tableau, R, JSON, Scala, Java, Python, Excel
H2O Prediction Engine: SDK / API, Nano Fast Scoring Engine
Deep learning, Regression, Trees, Boosting, Forests, Solvers, Gradients, Ensembles, Cluster, Classify
Query Processor R-engine, In-Mem Map Reduce, Distributed fork/join, Memory Manager, Columnar Compression
HDFS, S3, SQL, NoSQL
Infrastructure 
Parallelism 
Data Parallel 
Chunking Express! 
Algorithm Parallel 
Parallel Code blocks 
Math Parallelism 
ADMM, HogWild 
Distribution 
Zero-Serialization – 
endian wars have ended
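The data-parallel, chunked style named on this slide can be sketched as follows. This is an illustration under stated assumptions, not H2O's implementation: the helper names (chunked, map_chunk, parallel_mean) are hypothetical, and Python threads stand in for the worker nodes of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(values, chunk_size):
    """Split the data into fixed-size chunks."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]

def map_chunk(chunk):
    """Map step: each chunk independently produces a partial (sum, count)."""
    return sum(chunk), len(chunk)

def parallel_mean(values, chunk_size=1000, workers=4):
    """Reduce step: combine the per-chunk partials into one global mean."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunked(values, chunk_size)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

data = list(range(1, 10_001))
result = parallel_mean(data)
print(result)  # 5000.5
```

Because each chunk only needs its own slice of the data, the map step can run wherever the chunk lives, and only the small partial results travel to the reducer.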
Scalable Machine Learning 
For Smarter Applications 
Programmable Internet 
Programmable Devices 
AdSense Sense 
Correlation Causality 
Data 
Sensors 
Devices 
Events. Signals. TimeSeries 
Semi-structured data. json. 
High velocity. 
High dimensions. 
Streaming Data 
Historical Data 
Scoring from prediction 
Anomaly and Outliers Detection 
Unsupervised Learning 
Streaming Data 
Historical Data 
Anomaly and Outliers Detection 
model 
Scoring from prediction 
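The flow on this slide (train a model on historical data, then score streaming data for anomalies) can be sketched as below. The z-score detector, the threshold of 3, and the sample readings are assumptions chosen for illustration; the talk does not say which detector is used.

```python
import statistics

def fit(historical):
    """Train on historical data: the 'model' is just the mean and std dev."""
    return statistics.fmean(historical), statistics.stdev(historical)

def score(model, value):
    """Score one event: distance from the mean in standard deviations."""
    mean, std = model
    return abs(value - mean) / std

def detect(model, stream, threshold=3.0):
    """Yield the streaming values whose score exceeds the threshold."""
    for value in stream:
        if score(model, value) > threshold:
            yield value

# Hypothetical sensor readings.
historical = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3, 19.7]
model = fit(historical)
anomalies = list(detect(model, [20.0, 20.5, 35.0, 19.6]))
print(anomalies)  # [35.0]
```

Training happens once on the historical batch; scoring is a cheap per-event computation, which is what makes it practical on a stream.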
Streaming Data 
Historical Data 
Clustering / Unsupervised Learning 
model 
Scoring from prediction 
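For the clustering variant of the same flow, a minimal sketch assuming one-dimensional data and plain k-means (the function names, starting centroids, and data are hypothetical): cluster the historical data once, then assign each streaming value to its nearest centroid.

```python
def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))

def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means: alternate between assignment and centroid update."""
    centroids = list(centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for x in points:
            groups[nearest(centroids, x)].append(x)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

historical = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
model = kmeans_1d(historical, centroids=[0.0, 5.0])

# Scoring: label each incoming value with its cluster index.
labels = [nearest(model, x) for x in [1.1, 8.9]]
print(labels)  # [0, 1]
```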
https://developer.nest.com/documentation/api-reference/devices
Take Models to Production in Java 
Onset of Rita 
Common ensemble techniques 
Bayesian Classifiers 
Ensembles of all hypotheses in hypothesis-space. 
Bagging 
Each model votes with equal weight. 
Bagging trains each model on a randomly drawn subset of the training data. 
Boosting 
Incrementally builds an ensemble, training each new model to emphasize the instances earlier models got wrong. 
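Bagging as described above can be sketched in a few lines. The one-split "stump" classifier, the toy data, and all function names are assumptions made for illustration: each model is trained on a bootstrap sample and every model casts one equally weighted vote.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-split classifier: predict 1 when x >= threshold."""
    best_threshold, best_correct = None, -1
    for threshold, _ in sample:
        correct = sum((x >= threshold) == bool(y) for x, y in sample)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def bagging(data, n_models, rng):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    return [train_stump([rng.choice(data) for _ in data])
            for _ in range(n_models)]

def predict(stumps, x):
    """Majority vote: every model has equal weight."""
    votes = Counter(int(x >= t) for t in stumps)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
data = [(x, int(x >= 5)) for x in range(10)]  # toy labels: 1 when x >= 5
ensemble = bagging(data, n_models=25, rng=rng)
predictions = [predict(ensemble, 1), predict(ensemble, 8)]
print(predictions)
```

An odd number of models avoids tied votes in the two-class case.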
Gradient Boosting Machine 
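A minimal sketch of gradient boosting, assuming squared-error loss and one-split regression stumps (these choices, the learning rate, and the toy data are illustrative, not taken from the talk). With squared error the negative gradient is simply the residual, so each new stump is fitted to what the current ensemble still gets wrong.

```python
def fit_stump(xs, residuals):
    """Regression stump: one split, predicting the mean residual per side."""
    best = None
    for split in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x < split else rmean

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then add shrunken stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict

xs = list(range(10))
ys = [float(x * x) for x in xs]
model = gradient_boost(xs, ys)
mse = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
baseline = sum((y - sum(ys) / len(ys)) ** 2 for y in ys) / len(ys)
print(f"training MSE {mse:.1f} vs. mean-only baseline {baseline:.1f}")
```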
Variable Importance Comparison 
Gradient Boosting Machine, 50 trees 
Random Forest, 50 trees 
Generalized Linear Modeling – Variable Importance 
GLM, Elastic Net (Binomial) 
GLM, Elastic Net (Binomial) 
Categorical expansion on Age 
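Two ingredients named on this slide can be sketched briefly. The penalty below uses the common glmnet-style parameterization, lambda * (alpha * L1 + (1 - alpha) / 2 * L2 squared), which is an assumption here since the slide gives no formula; the age bands are hypothetical, and "categorical expansion" is read as one-hot encoding.

```python
def elastic_net_penalty(coefficients, lam, alpha):
    """Blend of L1 (lasso) and squared L2 (ridge) penalties.

    alpha=1 gives pure lasso, alpha=0 gives pure ridge.
    """
    l1 = sum(abs(b) for b in coefficients)
    l2_squared = sum(b * b for b in coefficients)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)

def expand_categorical(values):
    """One-hot encode: one 0/1 indicator column per distinct level."""
    levels = sorted(set(values))
    return levels, [[int(v == level) for level in levels] for v in values]

penalty = elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=0.5)
levels, encoded = expand_categorical(["30-39", "20-29", "30-39"])
print(penalty)   # 1.375
print(levels)    # ['20-29', '30-39']
print(encoded)   # [[0, 1], [1, 0], [0, 1]]
```

After expansion each age band gets its own coefficient, which is why a single categorical predictor can show up as several entries in a GLM variable-importance chart.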
Variable Importance Comparison 
Deep Learning (Tanh / 4-layer) 
Deep Learning (Tanh / 3-layer) 
Every generation needs to invent its math. 
Our data, our tools! 
Power-Law
Code is incomplete without Community! 
Open Source Matters! 
H2O 0xdata MLconf
Community 
Committers: 30 
Meetups: 90 in 12 months 
Coverage: Conference Speakers 
Curriculum: Stanford, MIT, CSU, SUNY, SJSU, Purdue
Data Driven Decision Making is hard! 
Courage Matters! 
Winning customer trust not just quarters! 
Mindset matters! 
Thanks 
Courtney, Nick & MLConf 
for bringing us to ATL
Sparkling Water Application Life Cycle 
[Diagram: a Sparkling App jar file is passed through spark-submit to the Spark Master JVM and on to three Spark Worker JVMs; the resulting Sparkling Water Cluster is three Spark Executor JVMs, each running H2O.] 
(1) User submits App to Spark cluster Master node 
(2) App distributed to Spark cluster Worker nodes 
(3) Spark Executor JVMs start for App 
(4) H2O instance starts within each Executor JVM 
(5) App’s Scala main program runs 
Sparkling Water Data Distribution 
[Diagram: a Data Source (e.g. HDFS) feeds a Sparkling Water Cluster of three Spark Executor JVMs, each running H2O; data moves between a Spark RDD and an H2O RDD.] 
(1) Use Spark SQL to read data into a Spark RDD 
(2) Convert Spark RDD to H2O RDD; H2O RDD is column-based and highly compressed 
(Not shown) Run modeling and prediction workflows with H2O 
(3) Convert H2O RDD (e.g. predictions) back to Spark RDD 
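To make "column-based and highly compressed" concrete, here is a small sketch of the round trip in steps (2) and (3). It is an illustration only: run-length encoding is an assumed stand-in for whatever compression H2O actually applies, and none of these functions are Spark or H2O APIs.

```python
def to_columns(rows):
    """Transpose row records into one list per column."""
    return [list(column) for column in zip(*rows)]

def run_length_encode(column):
    """Compress a column into [value, run_length] pairs."""
    encoded = []
    for value in column:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded

def run_length_decode(encoded):
    """Expand [value, run_length] pairs back into a full column."""
    return [value for value, count in encoded for _ in range(count)]

def to_rows(columns):
    """Transpose columns back into row tuples."""
    return [tuple(values) for values in zip(*columns)]

rows = [("US", 1), ("US", 1), ("US", 0), ("DE", 0)]  # row-oriented input
columns = to_columns(rows)                           # step (2): columnar
compressed = [run_length_encode(c) for c in columns]
restored = to_rows([run_length_decode(c) for c in compressed])  # step (3)
print(compressed)  # [[['US', 3], ['DE', 1]], [[1, 2], [0, 2]]]
```

Storing values of one column together is what makes such compression effective, since neighboring values share a type and often repeat.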
HortonWorks, Cloudera, MapR, Intel 
Standalone: H2O, HDFS 
YARN: H2O, YARN, HDFS 
H2O in MR: H2O, Hadoop MR, HDFS 
H2O – The Killer-App for Spark 
MLlib H2O SQL 
H2ORDD 
HDFS=DATA 
Sparkling Water 
In-Memory: Big Data, Columnar 
ML: 100x faster Algos 
R: CRAN, API, fast engine 
API: Spark API, Java MM 
Community: Devs, Data Science
examples 
Fraud / No-fraud 
1/1000 unbalanced 
Click-Stream 
Browse / Click / Buy 
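With fraud at roughly 1 in 1000, a model can reach 99.9% accuracy by never predicting fraud. One common remedy, sketched here as an assumption rather than the approach used in the talk, is to downsample the majority class before training. The function name, the label convention (1 = fraud), and the synthetic records are hypothetical.

```python
import random

def downsample_majority(records, label_of, rng, ratio=1.0):
    """Keep every minority (label 1) record and a random subset of the
    majority (label 0) records, at `ratio` majority rows per minority row."""
    positives = [r for r in records if label_of(r) == 1]
    negatives = [r for r in records if label_of(r) == 0]
    keep = min(len(negatives), int(len(positives) * ratio))
    return positives + rng.sample(negatives, keep)

rng = random.Random(7)
# Synthetic transactions: (id, is_fraud), with 1 fraud per 1000 rows.
records = [(i, int(i % 1000 == 0)) for i in range(10_000)]
balanced = downsample_majority(records, label_of=lambda r: r[1], rng=rng)
fraud_count = sum(label for _, label in balanced)
print(len(balanced), fraud_count)  # 20 10
```

Predicted probabilities from a model trained on rebalanced data are shifted toward the minority class, so they usually need recalibrating before they are used as real-world fraud rates.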
Propensity Models 
Merchants –to- Users 
Lifetime Value of Customer 
Pricing Engines 