Kaggle 1st Place in 30 Minutes: Putting
AutoML to Work with Enterprise Data
HILA LAMM
Chief Strategy Officer
at Firefly.AI
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work with Enterprise Data
AI is disruptive
https://courses.lumenlearning.com/suny-hccc-worldhistory/chapter/the-printing-revolution/
The Printing
Revolution
• The printing press was a factor in the
establishment of a community of scientists who
could easily communicate their discoveries through
widely disseminated scholarly journals, helping to
bring on the scientific revolution.
• Because the printing process ensured that the
same information fell on the same pages, page
numbering, tables of contents, and indices became
common.
• The arrival of mechanical movable type printing
introduced the era of mass communication, which
permanently altered the structure of society. The
relatively unrestricted circulation of information and
revolutionary ideas transcended borders.
https://courses.lumenlearning.com/suny-hccc-worldhistory/chapter/the-printing-revolution/
Two Types of Applied ML
Mimic human ability
Automation =
Faster, cheaper, more consistent
Improve on human
computation ability
Better decision making
• To lend or not to lend?
• When should I service the equipment?
• Is she about to leave me? What can I do to
make her stay?
• Is this a cyberattack or are you just happy
to see me?
• Is there someone on the train tracks or is it
just a cloud?
Which Decision Can I Superpower?
Great! Let’s do ML!
Kaggle 2017 The State of Data Science & Machine Learning
What barriers are faced at work?
7,376 responses, showing top 15 responses
Lack of data science talent
Lack of management support
Lack of clear questions to ask
Reality Check
1. Find the questions behind key daily
decisions
2. Evaluate the impact of making those
decisions faster or more accurately
3. Go fetch the data
Lack of clear questions to ask
Lack of management support
What do executives want?
Business value (ROI)
Predictable time to delivery
Easy to scale up
Machine learning techniques
Research project
A new research project
FOCUS ON THE
BUSINESS
SHORT TIME TO
DELIVERY
SCALABLE AND
AUTOMATED
FAST, COST EFFECTIVE
EXPERIMENTS
Lack of data science talent
• Liberate talent from
routine work
• Use tools to make
machine learning
accessible to more people
AutoML features to look for:
• Algorithm availability
• Preprocessing capabilities
• Search methods
• Ensembles
• Explainability
• Enterprise readiness
Automatically build in parallel multiple
models to select the best
Open-Source/Paid AutoML tools
• AutoWEKA
• Auto-sklearn
• TPOT
• Google Cloud AutoML
• H2O AutoML
Apply data
preprocessing
Research to pinpoint
the right ML algorithm
Optimize
hyperparameters for
selected algorithms
Golden ML ensemble
Automate the design of
machine learning models:
AutoML
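The four steps on this slide (preprocessing, algorithm selection, hyperparameter optimization, ensembling) can be sketched as a small search loop. This is a toy illustration in plain Python, not a description of how Firefly or any of the listed tools works internally. The dataset, the two model families (`mean_model`, `knn_model`), and the candidate grid are all assumptions made for the example, and candidates are evaluated one after another rather than in parallel.

```python
import statistics

# Toy 1-D regression data (invented for illustration): y is roughly 2 * x.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8), (6.0, 12.1)]
valid = [(1.2, 2.4), (3.4, 6.8), (5.7, 11.4)]

def standardize(rows):
    """Preprocessing: scale x to zero mean and unit variance."""
    xs = [x for x, _ in rows]
    mu, sigma = statistics.mean(xs), statistics.pstdev(xs)
    scale = lambda x: (x - mu) / sigma
    return scale, [(scale(x), y) for x, y in rows]

def mean_model(rows):
    """Baseline algorithm: always predict the training mean."""
    avg = statistics.mean(y for _, y in rows)
    return lambda x: avg

def knn_model(rows, k):
    """k-nearest-neighbours regressor; k is the hyperparameter being searched."""
    def predict(x):
        nearest = sorted(rows, key=lambda r: abs(r[0] - x))[:k]
        return statistics.mean(y for _, y in nearest)
    return predict

def mse(model, rows):
    """Mean squared error of a model on (x, y) rows."""
    return statistics.mean((model(x) - y) ** 2 for x, y in rows)

scale, train_scaled = standardize(train)
valid_scaled = [(scale(x), y) for x, y in valid]

# Algorithm selection and hyperparameter search: build every candidate, score each.
candidates = {"mean": mean_model(train_scaled)}
for k in (1, 2, 3):
    candidates[f"knn(k={k})"] = knn_model(train_scaled, k)

ranked = sorted(candidates, key=lambda name: mse(candidates[name], valid_scaled))

# Ensemble: average the predictions of the two best candidates.
top = [candidates[name] for name in ranked[:2]]

def ensemble(x):
    return statistics.mean(model(x) for model in top)

for name in ranked:
    print(f"{name}: validation MSE = {mse(candidates[name], valid_scaled):.3f}")
print(f"ensemble of {ranked[:2]}: validation MSE = {mse(ensemble, valid_scaled):.3f}")
```

On this toy data the averaged pair happens to score better than either member alone, which is the intuition behind the ensemble step; real tools search far larger spaces and use cross-validation rather than a single holdout split.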
Data
Import &
Analysis
Regression
Classification
Recommendation
Time Series
Generate solution:
Ensemble of best
algorithms and
models
Meta learning
Preprocessing
Firefly API
Firefly Lab - Model building
Model exports
AutoML Platform
Anomaly Detection
Report results
and model insights
Algorithm selection,
Hyperparameter
optimization
Firefly Predict
Deploy on premises in
operational system
Firefly user interface
Upload dataset for
batch predictions
Real-time
predict requests
Batch predict
requests
Target: Reduce false alarm rate of
existing video analytics system
Data: Feature extraction from
moving objects in the videos - a
series of ellipses indicating areas of
change
Solution: Model per camera/sensor
location
Results: False alarms reduced
by 90%
Case Study:
Homeland Security
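The "model per camera/sensor location" idea can be sketched as grouping the data by camera and fitting one model per group. The sketch below is a hedged illustration only: the record layout (`camera_id`, `ellipse_area`, `is_real_event`), the sample values, and the midpoint-threshold "model" are all hypothetical stand-ins, not the features or algorithm used in the case study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical labelled detections: (camera_id, ellipse_area, is_real_event).
# Field names and values are invented for illustration only.
detections = [
    ("cam_a", 12.0, False), ("cam_a", 15.0, False),
    ("cam_a", 40.0, True), ("cam_a", 45.0, True),
    ("cam_b", 110.0, False), ("cam_b", 130.0, False),
    ("cam_b", 300.0, True), ("cam_b", 320.0, True),
]

def fit_threshold(rows):
    """Per-camera 'model': midpoint between mean false-alarm and mean real-event area."""
    false_mean = mean(area for area, real in rows if not real)
    real_mean = mean(area for area, real in rows if real)
    return (false_mean + real_mean) / 2

# Group detections by camera, then fit one model per group.
by_camera = defaultdict(list)
for camera_id, area, real in detections:
    by_camera[camera_id].append((area, real))

models = {camera_id: fit_threshold(rows) for camera_id, rows in by_camera.items()}

def raise_alarm(camera_id, area):
    """Route each detection to the model trained for its own camera."""
    return area > models[camera_id]

print(models)
print(raise_alarm("cam_a", 42.0), raise_alarm("cam_b", 42.0))
```

The same ellipse area triggers an alarm on one camera and not on the other, which is why a single global threshold would either miss events on the first camera or flood the second with false alarms.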
Target: Identify cyberattacks based
on behavioral indicators
Data: Hundreds of features of
network IoT data
Need: Fast experiments to identify
relevant features per environment
Solution: Highly accurate, dedicated
models per environment
Case Study:
Cybersecurity
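One simple way to run a "fast experiment" that identifies relevant features per environment is to rank features by how strongly each one correlates with the attack label, separately for each environment. The sketch below assumes this correlation-based ranking purely for illustration; the slides do not say which method was used, and the environment names, feature names, and values are hypothetical.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical network data for two environments (values invented for illustration).
environments = {
    "factory": {
        "features": {
            "packets_per_sec": [10, 12, 11, 90, 95],
            "night_logins": [0, 1, 0, 1, 0],
        },
        "is_attack": [0, 0, 0, 1, 1],
    },
    "office": {
        "features": {
            "packets_per_sec": [50, 20, 40, 30, 45],
            "night_logins": [0, 0, 0, 5, 6],
        },
        "is_attack": [0, 0, 0, 1, 1],
    },
}

def rank_features(env):
    """Order feature names by absolute correlation with the attack label."""
    scores = {
        name: abs(pearson(values, env["is_attack"]))
        for name, values in env["features"].items()
    }
    return sorted(scores, key=scores.get, reverse=True)

for env_name, env in environments.items():
    print(env_name, rank_features(env))
```

In this toy data the most informative feature differs between the two environments, which is the motivation for dedicated models per environment.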
Predict the time it takes to pass
testing for different permutations
of Mercedes-Benz car features
Mercedes-Benz reliability
prediction
1st place
Of 3835 teams
377
Features
30 min
Data Scientist
time
Predicting customer satisfaction
using customer features
Santander Bank Kaggle
challenge
1st place
Of 5123 teams
370
Features
20 min
Data Scientist
time
Thank You!
hila@firefly.ai

