Prasad Chandravihar, Lennox International
Janath Manohararaj, Lennox International
How Azure Databricks helped make IoT Analytics a reality
#Ent7SAIS
Agenda
• About Lennox
• Problem Statement
• How did we start
• Challenges
• Machine Learning using Databricks
• Wrap-up
• Q&A
About Lennox
• Lennox International Inc. is an intercontinental
provider of climate control products for the heating,
ventilation, air conditioning, and refrigeration markets.
• Founded in 1895, in Marshalltown, Iowa, by Dave
Lennox.
• Three core businesses: Residential Heating and
Cooling, Commercial Heating and Cooling, and
Refrigeration.
• The company headquarters are in Richardson, Texas,
near Dallas.
Problem Statement
• A particular type of mechanical switch trips frequently and causes discomfort to users
• Use the data sets captured from the IoT devices to analyze the trips
• Build ML models to predict the trips and to detect patterns and influencing variables
How did we start
• The initial model was developed in desktop tools with data from about 15 devices
• Accuracy levels: recall 65%, specificity 80%
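For reference, recall and specificity can be computed from confusion-matrix counts as in the minimal sketch below. The counts are hypothetical, chosen only so that the result matches the percentages on the slide; they are not the team's actual numbers.

```python
def recall_and_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (recall, specificity) from confusion-matrix counts.

    recall      = TP / (TP + FN): share of real trips that were predicted
    specificity = TN / (TN + FP): share of non-trips correctly left alone
    """
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return recall, specificity


# Hypothetical counts, picked only to reproduce the slide's percentages.
recall, specificity = recall_and_specificity(tp=65, fn=35, tn=80, fp=20)
print(f"recall={recall:.0%}, specificity={specificity:.0%}")
```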
Challenges
• Consuming the entire data set
• Data orchestration – 10 Billion rows
• Assembling different skill sets
• Team structure, working model
• Choosing the right computing platform: cloud?
• Collaboration among data scientists and engineers
Small file problem
• Had to traverse 7 million directory paths to gather the data
Solution:
Developed a batch program that gathers the data from each file in each destination and appends it to a table
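The idea of the batch program can be sketched as below. This is not the team's program: the CSV file format, the column names (`device_id`, `ts`, `value`), the table name, and the use of SQLite are all assumptions made to keep the example self-contained.

```python
import csv
import os
import sqlite3


def append_files_to_table(root_dir: str, db_path: str, table: str = "readings") -> int:
    """Walk root_dir, read every CSV file found, and append its rows to one table.

    Returns the number of rows appended. The schema is hypothetical.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (device_id TEXT, ts TEXT, value REAL)"
    )
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.endswith(".csv"):
                continue  # skip anything that is not a data file
            with open(os.path.join(dirpath, name), newline="") as f:
                rows = [
                    (r["device_id"], r["ts"], float(r["value"]))
                    for r in csv.DictReader(f)
                ]
            conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
            total += len(rows)
        conn.commit()  # commit once per directory rather than once per file
    conn.close()
    return total
```

Collecting many small files into one table means later jobs read a single large data set instead of opening millions of paths.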
Machine Learning Core team
Data Engineering
• Helps identify the right data sets
• Helps with data orchestration
Big Data Cloud Architecture
• Helps the team make the best use of the tools and resources
• Works constantly with Data Science and Engineering members during data orchestration
Data Science
• Helps get the data ready for the model (parts of ETL)
• Explores different data science models
Platform Eng.
• Analyzes the feasibility of productizing the model
Eng. SME
• Gives feedback on the outputs
• Helps make sure the analysis is progressing in the right direction
Why Azure Databricks?
• Unified platform for machine learning and data engineering
• Enables collaboration between team members
• Spark with minimal tuning
• Average time to start a cluster: 4 mins
• Automatic scaling, which is very important because different jobs have different sizes
• Ability to integrate Sparkling Water, H2O Driverless AI and TensorFlow through the GUI
• Ability to integrate Power BI
Machine Learning using Azure Databricks
• Equipment information
• Weather information
• Demographic information
• ELT pipeline
• Training the ML model
• Validation and scoring
• Sends data every minute
70% of the time went into structuring and preparing the data, creating labels, etc.
Machine Learning using Azure Databricks
Class Imbalance:
The labels had far more 0's than 1's: for example, for every 590M 0's we had 5K 1's.
We tried 3 different approaches:
• Over-sample the 1's (SMOTE, duplicating rows, etc.)
• Under-sample the 0's (a relatively small set of 0's that represents the population)
• Multiple samples of 0's and the same set of 1's
– each set of 0's and 1's has its own model
– the models are combined into an ensemble
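The third approach can be sketched as below. It is a minimal illustration rather than the team's code: drawing as many 0's as there are 1's for each set, and representing rows as `(features, label)` tuples, are assumptions; the slides do not give the sample sizes.

```python
import random

# Imbalance quoted on the slide: 590M zeros for every 5K ones.
zeros_per_one = 590_000_000 // 5_000
print(zeros_per_one)  # 118000


def balanced_training_sets(negatives, positives, n_sets, seed=0):
    """Build n_sets training sets for n_sets separate models.

    Every set reuses ALL positives (label 1) and pairs them with a fresh
    random sample of negatives (label 0). Sampling as many negatives as
    there are positives is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        sampled_negatives = rng.sample(negatives, k=len(positives))
        sets.append(sampled_negatives + positives)
    return sets


# Toy data: rows are (features, label) tuples.
negatives = [((i,), 0) for i in range(1000)]
positives = [((i,), 1) for i in range(10)]
training_sets = balanced_training_sets(negatives, positives, n_sets=3)
print([len(s) for s in training_sets])  # [20, 20, 20]
```

Each training set would then be used to fit its own model, and the models are combined afterwards.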
Machine Learning using Azure Databricks
• First sample of 0's and 1's, with the assembler of features
• Number of bins, number of trees, depth and threshold tuned using an exhaustive grid search
• Model was tuned for the best area under the precision-recall curve
Figure: sample output from the grid search
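An exhaustive grid search simply scores every combination of candidate values and keeps the best one. The sketch below shows the mechanics in plain Python; the grid values are hypothetical, and `toy_score` is a stand-in for "train a model with these parameters and return the area under the precision-recall curve on validation data", not a real model.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # on ties, the first combination seen is kept
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for the four tuned settings.
param_grid = {
    "max_bins": [16, 32],
    "num_trees": [50, 100],
    "max_depth": [4, 8],
    "threshold": [0.3, 0.5],
}


def toy_score(params):
    """Placeholder score: peaks at max_depth=8, num_trees=100, threshold=0.3."""
    return (
        -abs(params["max_depth"] - 8)
        - abs(params["num_trees"] - 100) / 100
        - abs(params["threshold"] - 0.3)
    )


best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the list lengths (here 2 × 2 × 2 × 2 = 16 evaluations), which is one reason automatic cluster scaling matters for this step.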
Machine Learning using Azure Databricks
Figure: models trained on Train 1, Train 2, ..., Train n are combined into an ensemble, which is applied to the test data
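One common way to combine such models is a vote, sketched below. The slides do not say how the ensemble combined its members, so the majority-vote rule, the `min_votes` parameter, and the toy threshold models are all assumptions.

```python
def ensemble_predict(models, row, min_votes=None):
    """Predict 1 only if at least min_votes models predict 1.

    models is a list of callables that each return 0 or 1 for a row.
    By default a strict majority is required.
    """
    if min_votes is None:
        min_votes = len(models) // 2 + 1
    votes = sum(model(row) for model in models)
    return 1 if votes >= min_votes else 0


# Toy stand-ins for the n trained models: each flags a trip above its own cutoff.
models = [
    lambda x: int(x > 1.0),
    lambda x: int(x > 2.0),
    lambda x: int(x > 3.0),
]

print(ensemble_predict(models, 2.5))  # two of three models vote 1
print(ensemble_predict(models, 1.5))  # only one model votes 1
```

Requiring agreement between several models means a single model's spurious alarm does not become a positive prediction, which is how voting can cut down false positives.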
Machine Learning using Azure Databricks
• Everything from data engineering to the ML models was done in a single notebook
• Coding language: PySpark
• Each ML model was tuned for the best hyperparameters using an automated grid search
• The ensemble model helped us eliminate many false positives
Wrap-up
Journey to ML           | 15 devices    | 500 devices | 25K devices
Accuracy (TP)           | 65%           | 80%         | 84.6%
Accuracy (TN)           | 80%           | 85%         | 99.5%
Data volume             | 2 Million     | 100 Million | 10 Billion
Tools                   | Desktop tools | Spark ML, H2O | Azure Databricks
Cloud services provider | n/a           | Microsoft   | Microsoft
ML models used          | Decision Tree | Multiple (Gradient Boosted Trees, Random Forest, etc.) | Multiple (Gradient Boosted Trees, Random Forest, etc.)
Time to run ML model    | 6 hours       | 30 mins     | 50 mins
Wrap-up
• Build a cross-functional team to execute machine learning projects
• In most projects, 70% of the time is spent on cleansing and transforming the data set
• Put a lot of focus on feature engineering
• Explore Sparkling Water (H2O on Databricks), which offers many AutoML options
• Use a platform that lets team members collaborate and develop the project end to end