DECEMBER 14
GLOBAL AI BOOTCAMP IS POWERED BY:
The Power of Auto ML
How Does the AutoML “Magic” Happen?
Thanks to our Sponsors:
• Software Architect @
o 17+ years professional experience
• Microsoft Azure MVP
• External Expert Horizon 2020, Eurostars-Eureka
• External Expert InnoFund Denmark, RIF Cyprus
• Business Interests
o Web Development, SOA, Integration
o IoT, Machine Learning, Computer Intelligence
o Security & Performance Optimization
• Contact
ivelin.andreev@icb.bg
www.linkedin.com/in/ivelin
www.slideshare.net/ivoandreev
About me
Contents
1. Machine Learning Workflow
2. Visual Interface for Azure ML Service
3. Automated ML
4. Advanced ML with Azure Monitor
5. Deep Learning with TensorFlow
6. AI Ops
7. Cognitive Vision Services
8. Insights with Text Analytics and Vision
9. Cognitive Decision Service
10. Cognitive Search Service
11. Version Control for ML
12. VS Code for Python ML
13. Bot Framework
14. Search Bots with Cognitive Services
15. Bot Architecture Best Practices
16. AI and Cognitive Services in Power BI
17. Form Processing with AI Builder
AGENDA
Auto ML
Pipelines
Auto ML Under the Hood
Azure ML Designer
Demo (AutoML Python SDK)
ML is a Process
• Iterative data science process:
o Business problem understanding
o Data collection, cleaning, exploration
o Model building
o Performance evaluation
o Deployment
• Auto ML: Automate environment, data preparation, experimentation, deployment
AutoML is not Auto Data Science
• Any ML Task = {data} + {problem type} + {loss function}
• ML project effort and budget
o 80% data preparation, 15% modeling and evaluation
o Repetitive effort (react to changes in objectives and data)
• AutoML as a tool
o A recommender system for ML pipelines that achieves accuracy in less time
• Objective
o Offload repetitive tasks from data scientists
o Automatically find a solution with minimal loss on the data
AutoML fills the gap between “supply” and “demand” on the ML market
AutoML outperforms an average data scientist
Auto ML Builds ML Pipelines
User Input: Dataset, Performance goals, Constraints (CPU, RAM, time)
Auto ML Magic
Results: Automatically determine a pipeline structure with minimal loss on the validation set within CPU/memory constraints
Auto ML Steps
1. Determine pipeline structure
2. Select algorithm for each step
3. Tune hyper-parameters
Performance Evaluation
• All 3 steps must be completed
• Iterate until performance goals reached
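The iterate-until-goal loop above can be illustrated with a minimal sketch. It assumes plain random sampling over a tiny, hypothetical candidate space (`preprocessing`, `algorithm`, `n_trees`), a caller-supplied `evaluate` function returning a validation loss, and a trial budget standing in for the CPU/RAM/time constraints. It is not the search strategy Azure AutoML actually uses.

```python
import random

def automl_search(candidates, evaluate, goal, max_trials, seed=0):
    """Hypothetical AutoML loop: sample a pipeline structure, an algorithm and a
    hyperparameter, evaluate the configuration, keep the best one, and stop as
    soon as the performance goal is met or the trial budget runs out."""
    rng = random.Random(seed)
    best_loss, best_config = float("inf"), None
    trials_used = 0
    for _ in range(max_trials):
        trials_used += 1
        config = {
            "preprocessing": rng.choice(candidates["preprocessing"]),  # 1. pipeline structure
            "algorithm": rng.choice(candidates["algorithm"]),          # 2. algorithm per step
            "n_trees": rng.choice(candidates["n_trees"]),              # 3. hyperparameter
        }
        loss = evaluate(config)  # e.g. validation loss of the trained pipeline
        if loss < best_loss:
            best_loss, best_config = loss, config
        if best_loss <= goal:  # performance goal reached, stop early
            break
    return best_loss, best_config, trials_used
```

The trial budget is the simplest stand-in for a resource constraint; a real system would also stop on elapsed time or memory limits.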
ML Pipeline Steps
An ML pipeline is a technical solution that stitches ML phases together and automates workflows
• Data
o Select preprocessing strategy (imbalanced and missing data, normalization, outliers)
o Features (feature extraction, engineering, selection)
• Modeling
o Select algorithm
o Tune hyperparameters (e.g. number of trees)
o Train multiple models, create an ensemble
o Score, evaluate, select the best model
• Training & Deployment
o Parallel training on a cluster, maintain versioning
ML Pipeline Benefits
• Advantages of ML Pipelines
o Parallel and unattended execution
o Reusability through pipeline templates for specific scenarios
o Versioning of data and results using the pipeline SDK
o Modularity through separation of concerns
o Collaboration among data scientists across the ML design process
o Scalability – a single ML pipeline can be trained on multiple machines; different ML pipelines can be tested in parallel on many nodes
• Open Issue
How do pipelines “learn” what to do?
“No free lunch” theorem simplified
(David Wolpert, 1996)
1. A model is a simplification of reality
2. The simplification is based on bias (assumptions)
3. Any bias fails in some situations
Conclusion 1: No algorithm or parameter set is always the best.
Conclusion 2: Use knowledge about the data and context.
Automated Data Preparation
Step 1: Data Ingestion
• Requires data storage (Azure Blob mounted by default)
• Data quality issues are common (missing data, mixed units and formats)
• Evaluate quality, select initial features (statistical analysis and visualization)
Rule of Thumb: No algorithm can achieve good results with bad input data
Step 2: Data profiling and cleansing
• AutoML provides a variety of statistics to verify that the dataset is ready for modelling
o Non-numeric (Min, Max, Count)
o Numeric (Mean, StdDev, Variance, Distribution histogram)
• Cleansing cannot be done in the GUI
o Python SDK: azureml.dataprep
o Azure ML: turn on the “Automatic preprocessing” option
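To make the profiling statistics listed above concrete, here is a minimal stdlib sketch of what a per-column profile contains. It is not the azureml.dataprep API. The function name `profile_column`, the use of `None` for missing values, and the 5-bin equal-width histogram are all assumptions made for illustration.

```python
import statistics
from collections import Counter

def profile_column(values, n_bins=5):
    """Summary statistics for one column; None is treated as a missing value."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    if present and all(isinstance(v, (int, float)) for v in present):
        lo, hi = min(present), max(present)
        width = (hi - lo) / n_bins or 1  # avoid zero width for constant columns
        histogram = [0] * n_bins
        for v in present:
            histogram[min(int((v - lo) / width), n_bins - 1)] += 1
        return {
            "type": "numeric", "count": len(present), "missing": missing,
            "min": lo, "max": hi,
            "mean": statistics.mean(present),
            "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
            "variance": statistics.variance(present) if len(present) > 1 else 0.0,
            "histogram": histogram,
        }
    # Non-numeric columns: min/max use the values' natural (e.g. lexicographic) order
    return {
        "type": "non-numeric", "count": len(present), "missing": missing,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "distinct": len(Counter(present)),
    }
```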
Auto ML Guardrails
What it is: safeguards that protect users against common data issues and apply corrections
Missing Values
• Strategies: Drop rows; intelligently replace missing values based on other data
Class Imbalances
• Most ML algorithms assume a balanced class distribution; majority classes bias the model towards them
• Strategies: Oversampling (add instances to minority class); Undersampling (majority)
Data Leakage
• The dataset includes information that would not be available at prediction time
• The actual outcome is effectively already known, so model performance looks (misleadingly) perfect
• Strategies: Remove leaky features; Add noise; Hold back unseen test data
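The oversampling strategy for class imbalance can be sketched in a few lines. This assumes plain random oversampling (duplicating randomly chosen minority rows until every class matches the majority size); the function name is hypothetical, and AutoML may use a different balancing method internally.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each minority class until every
    class has as many rows as the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels
```

Oversampling should be applied to the training split only; duplicating rows before splitting would leak copies of training rows into the validation set.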
Automated Data Preparation
Step 3: Feature Engineering
• Impute missing values (mode for categorical, mean for numerical)
• Create categorical features from numeric features with few distinct values
• Extract date parts from dates: year, month, day, hour, minute, second, day of week, day of year, quarter, week number
• One-hot encode low-cardinality categorical variables (e.g. Gender -> IsMale, IsFemale)
• K-means clustering on each numeric column to create a distance-to-centroid feature
• Term frequency for text variables
• Outlier treatment
Note: These general-purpose steps are not domain specific (e.g. they will not create an income/debt ratio)
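Three of the listed featurization steps (imputation, date-part extraction, one-hot encoding) are sketched below with the standard library. The function names and the feature key names are hypothetical; they show what the transformations produce, not how AutoML implements them.

```python
import statistics
from datetime import datetime

def impute(values):
    """Fill None with the mean (numeric column) or the mode (categorical column)."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = statistics.mean(present)
    else:
        fill = statistics.mode(present)
    return [fill if v is None else v for v in values]

def date_features(d):
    """Expand one datetime into separate numeric features."""
    return {
        "year": d.year, "month": d.month, "day": d.day,
        "hour": d.hour, "minute": d.minute, "second": d.second,
        "day_of_week": d.weekday(),           # Monday = 0
        "day_of_year": d.timetuple().tm_yday,
        "quarter": (d.month - 1) // 3 + 1,
        "week": d.isocalendar()[1],           # ISO week number
    }

def one_hot(values, prefix):
    """One indicator column per distinct category (sorted for a stable order)."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]
```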
Automated Data Preparation
Step 3 just got you into a problem
• Feature engineering can generate too many features
• The solution needs to avoid overfitting and reduce model training time
• No domain knowledge was applied
Step 4: Feature Selection (limited in AutoML)
• Drop high cardinality variables (noise)
• Drop zero-variance variables (non-informative)
Possible future improvements
• Drop highly correlated fields
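The selection rules above (including the proposed correlation filter) can be sketched as follows. The thresholds `max_unique_ratio=0.95` and `corr_threshold=0.95` are arbitrary assumptions, and applying the cardinality rule only to non-numeric columns (such as IDs) is a design choice made for this illustration.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def is_numeric(values):
    return all(isinstance(v, (int, float)) for v in values)

def select_features(columns, max_unique_ratio=0.95, corr_threshold=0.95):
    """columns: dict of column name -> list of values. Returns kept column names."""
    kept = {}
    for name, values in columns.items():
        distinct = len(set(values))
        if distinct <= 1:
            continue  # zero variance: carries no information
        if not is_numeric(values) and distinct / len(values) > max_unique_ratio:
            continue  # high-cardinality categorical (e.g. an ID): mostly noise
        kept[name] = values
    selected = []
    for name, values in kept.items():
        # Drop a numeric column that is highly correlated with one already selected
        if is_numeric(values) and any(
            is_numeric(kept[other]) and abs(pearson(values, kept[other])) > corr_threshold
            for other in selected
        ):
            continue
        selected.append(name)
    return selected
```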
Algorithm Selection and Hyperparametrization
Challenges of Configuration Space
• High-dimensionality (multiple continuous, categorical, binary variables)
• Conditionality (some parameters are only relevant in combination with specific values of others)
• No gradient (the loss has no gradient with respect to the configuration, and each evaluation is expensive)
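Opt1: Grid Search / Brute Force
• Cartesian product of all hyperparameter combinations
• The simplest method; suffers from the curse of dimensionality
Opt2: Random Search
• Random configurations within a given budget
• Good baseline, no assumptions, easy parallelization
Opt3: Bayesian Optimization
• Builds a probabilistic model of the loss from past evaluations and uses it to choose the next promising configuration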
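Grid search and random search, the first two options, differ only in how configurations are generated. The sketch below compares them on a hypothetical `toy_loss` whose minimum sits at `learning_rate=0.1, depth=6`. In practice the loss would be a cross-validated error, which is exactly what makes each evaluation expensive.

```python
import itertools
import random

def toy_loss(params):
    """Hypothetical validation loss with its minimum at learning_rate=0.1, depth=6."""
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["depth"] - 6) ** 2

def grid_search(space, loss):
    """Evaluate the Cartesian product of all listed values (cost grows exponentially)."""
    names = list(space)
    best = None
    for combo in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, combo))
        score = loss(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best

def random_search(sampler, loss, budget, seed=0):
    """Evaluate `budget` randomly sampled configurations (trivially parallelizable)."""
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        params = sampler(rng)
        score = loss(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best

# Log-uniform learning rate in [0.001, 1], integer depth in [2, 10]
sampler = lambda rng: {"learning_rate": 10 ** rng.uniform(-3, 0), "depth": rng.randint(2, 10)}
```

Random search samples continuous ranges directly, whereas the grid can only hit the values listed in advance.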
Meta Learning in AutoML
Challenges
• Avoid starting from scratch on new ML tasks
• Learn from experience efficiently, in a systematic, data-driven way
Prerequisite
• Collect meta-data to describe previous tasks (parameters, pipeline structure, evaluations)
Result
• Meta-learner that recommends promising configurations without exhaustive search
Notes
• If datasets have similar results on a few pipelines, they likely have similar results on the remaining pipelines
• Operates similarly to recommender systems
• Privacy: AML does not need access to customer data, only to pipeline results
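The recommender-style idea in the notes can be shown with a nearest-neighbour sketch. It assumes a hypothetical `history` of past datasets mapped to pipeline scores (higher is better). It also assumes the new dataset has been scored on a few probe pipelines, so that only scores, never raw data, are compared. Azure AutoML's actual meta-learner is more sophisticated than this.

```python
def recommend(history, new_scores, top_n=1):
    """history: {dataset: {pipeline: score}}; new_scores: {pipeline: score} for the
    new dataset on a few probe pipelines. Finds the most similar past dataset and
    ranks its untried pipelines by the score they achieved there."""
    def distance(past):
        shared = [p for p in new_scores if p in past]
        return sum((past[p] - new_scores[p]) ** 2 for p in shared) / len(shared)

    # Only datasets that share at least one probe pipeline can be compared
    candidates = [d for d in history if any(p in history[d] for p in new_scores)]
    nearest = min(candidates, key=lambda d: distance(history[d]))
    past = history[nearest]
    untried = [p for p in past if p not in new_scores]
    return nearest, sorted(untried, key=lambda p: past[p], reverse=True)[:top_n]
```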
Cross-Validation and Ensembling
Cross Validation
• Divide training data in k-subsets
• Repeat k times: hold out subset ki for validation and train on the remaining k-1 subsets
• Average the k error estimates
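A minimal k-fold cross-validation sketch follows. It assumes contiguous folds without shuffling, for simplicity, plus caller-supplied `fit` and `error` functions. The usage in the test fits a trivial mean predictor.

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists; fold sizes differ by at most one."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size

def cross_validate(xs, ys, k, fit, error):
    """Train on k-1 folds, measure error on the held-out fold, average over k folds."""
    errors = []
    for train, validation in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(error(model, [xs[i] for i in validation], [ys[i] for i in validation]))
    return sum(errors) / len(errors)
```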
Ensembling (bagging, boosting, stacking)
• Combine a few of the best ML models for improved accuracy at no extra training cost (the models are already trained)
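The two simplest ways to combine already-trained models are hard voting on labels and averaging predicted probabilities (soft voting). The sketch below shows both. Bagging, boosting and stacking build on the same idea but change how the member models are trained.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per model, all the same length.
    Returns the most common label per sample (ties go to the label seen first)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]

def average_probabilities(prob_lists):
    """Soft voting: average each sample's predicted probability across models."""
    return [sum(column) / len(column) for column in zip(*prob_lists)]
```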
Building Azure ML Pipelines
Azure ML Designer vs Azure ML Studio
• ML Studio – collaborative drag-drop workspace to build, test and deploy ML
• Azure ML – designer, SDK and CLI for data preparation, training and deploying ML at scale
Feature | Azure ML Designer | ML Studio (Classic)
Availability | Preview (2019) | Generally available (GA) (2015)
Drag-drop interface | Yes | Yes
Scalability | Scales with the compute target | Up to 10 GB training data limit
Modules | Important ones only | Many
Compute | AML Compute, CPU/GPU | Proprietary compute, CPU only
ML Pipelines | Authoring, publishing | N/A
ML Ops | Flexible deployment and versioning | Basic management and deployment
Model portability | Portable | Proprietary, non-portable
Auto ML | Through SDK | N/A
Azure ML
What it is: a cloud-based environment to rapidly build and deploy machine learning models on auto-scaling CPU or GPU clusters
How to:
1. Four development environments for AML – cloud-based notebook VM (easiest), local machine (with an Azure subscription), Data Science VM and Azure Databricks
2. Create workspace (Python SDK or Azure Portal)
3. Use the azureml.dataprep Python package to explore, cleanse and transform data
4. Choose a training target (local PC, Azure Linux VM, HDInsight for Spark)
5. azureml.train recommends a pipeline based on the target metric
6. Register models to tag, search and deploy them (even models trained outside AML)
7. Deploy to Azure Container Instances (serverless containers)
Interpreting Learning Results (Classification)
• Confusion Matrix
o Rows – true class, Columns – predicted class
o Good model = most values along the diagonal
• Precision-Recall Chart
o Precision = TP / (TP + FP), the share of predicted positives that are correct
o Recall = TP / (TP + FN), the share of actual positives that are found
o Macro-average PR – unweighted average of per-class precision and recall
o Micro-average PR – computed from TP, FP and FN pooled over all classes, so larger classes weigh more (imbalanced data)
o The PR chart is drawn at different threshold values
• ROC Chart – TP rate vs. FP rate over different thresholds
FPR = FP / (FP + TN) (best is close to 0), TPR = TP / (TP + FN) (best is close to 1)
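The formulas above can be checked with a short sketch. It computes one-vs-rest counts per class and then the macro and micro averages. For single-label multiclass data, micro precision and micro recall both equal accuracy, which is a useful sanity check.

```python
def confusion_counts(y_true, y_pred, positive):
    """One-vs-rest (TP, FP, FN, TN) counts for the class `positive`."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def safe_div(a, b):
    return a / b if b else 0.0

def rates(tp, fp, fn, tn):
    return {"precision": safe_div(tp, tp + fp),
            "recall_tpr": safe_div(tp, tp + fn),
            "fpr": safe_div(fp, fp + tn)}

def macro_micro(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    counts = [confusion_counts(y_true, y_pred, c) for c in classes]
    # Macro: average the per-class metrics, every class weighs the same
    macro_p = sum(safe_div(tp, tp + fp) for tp, fp, _, _ in counts) / len(classes)
    macro_r = sum(safe_div(tp, tp + fn) for tp, _, fn, _ in counts) / len(classes)
    # Micro: pool the counts first, so frequent classes dominate
    tp_all = sum(c[0] for c in counts)
    fp_all = sum(c[1] for c in counts)
    fn_all = sum(c[2] for c in counts)
    return {"macro_precision": macro_p, "macro_recall": macro_r,
            "micro_precision": safe_div(tp_all, tp_all + fp_all),
            "micro_recall": safe_div(tp_all, tp_all + fn_all)}
```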
Lift, Gain and Calibration Charts
• Lift Chart – how many times better the model is than random
o Ratio of gain%/random expectation% at a given decile level
o Green line – baseline random guess
• Gain Chart – how much of the population to sample to reach a target sensitivity (TPR)
o X – percentile addressed, Y – portion of positive responses captured
o Green line - baseline random guess
• Calibration Chart
o Confidence of a predictive model
o Predicted vs actual probability
o Good model: y=x
o Overly confident: y=0 and y=1
Note: a perfectly calibrated classifier is not necessarily a perfect classifier
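The numbers behind the gain, lift and calibration charts can be reproduced with a small sketch. It assumes binary labels (1 = positive), rows ranked by descending score for gain and lift, and equal-width probability bins for calibration. The function names and bin counts are illustrative choices.

```python
def gain_and_lift(scores, labels, n_bins=10):
    """Rank by descending score; for each decile return
    (fraction addressed, gain = share of positives captured, lift = gain / fraction)."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: -t[0])]
    n, total_pos = len(ranked), sum(ranked)
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = max(1, round(n * b / n_bins))
        fraction = cutoff / n
        gain = sum(ranked[:cutoff]) / total_pos
        rows.append((fraction, gain, gain / fraction))  # random baseline: lift = 1
    return rows

def calibration_bins(probs, labels, n_bins=5):
    """(mean predicted probability, observed positive rate) per non-empty bin.
    A well-calibrated model gives points close to y = x."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    return [(sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
            for b in bins if b]
```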
Containers meet Machine Learning
• Steps: (from Portal or AML SDK management API)
o Add the model (from the local workspace or by uploading a model file)
o Add a driver (scoring) script
o Add a package dependency file (YAML)
o The system creates a Docker image and registers it in the workspace
• Deployment
o Azure Container Instance (ACI) - test, Azure Kubernetes Service (AKS) - prod
o Azure ML Compute, Azure IoT Edge
• Operationalization
o A REST API is created automatically
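Azure ML driver (entry) scripts follow an `init()` / `run(raw_data)` convention: `init()` runs once when the container starts, and `run()` handles each request. The sketch below follows that shape but is stdlib-only. The threshold "model" and the `{"data": [...]}` request format are hypothetical stand-ins. A real script would deserialize the registered model file (for example from the path in the `AZUREML_MODEL_DIR` environment variable) instead of building a dict.

```python
import json

model = None

def init():
    """Runs once at container start: load the model into memory."""
    global model
    # Hypothetical stand-in model: predict 1 when the feature sum exceeds a threshold
    model = {"threshold": 1.0}

def run(raw_data):
    """Runs per request: parse the JSON body, score each row, return a JSON-serializable result."""
    try:
        rows = json.loads(raw_data)["data"]
        predictions = [int(sum(row) > model["threshold"]) for row in rows]
        return {"predictions": predictions}
    except (KeyError, TypeError, ValueError) as exc:  # malformed input
        return {"error": str(exc)}
```

Returning an error object rather than raising keeps the web service responsive to malformed requests.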
Operationalization
• REST APIs
o Deploying an AML model as a web service creates single and batch REST APIs
o The APIs can be consumed via azureml.core.webservice
• Performance Degradation
o Real-life performance may differ from performance during training
o Data drift - change in characteristics of input data over time
• Monitoring and Drift Analysis
o Input data changes over time and leads to performance degradation
o Inference data is configured to be snapshotted and profiled against a baseline
o An ML model is trained to detect the differences
o The model's performance is converted to a drift coefficient
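The "train a model to detect differences" idea can be illustrated with a domain classifier. Baseline rows are labelled 0 and new rows 1. If a classifier can tell them apart better than chance, the data has drifted. This sketch uses a single numeric feature, a one-threshold classifier, a 50/50 train/test split, and the mapping `max(0, 2 * accuracy - 1)`; all of these are assumptions for illustration, not the metric Azure ML computes.

```python
import random

def drift_coefficient(baseline, target, seed=0):
    """~0: baseline and target are indistinguishable; ~1: they are easily separated."""
    rng = random.Random(seed)
    data = [(x, 0) for x in baseline] + [(x, 1) for x in target]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]

    def accuracy(rows, thr, direction):
        # Predict "target" (1) when the value lies on the chosen side of the threshold
        return sum((direction * (x - thr) > 0) == bool(y) for x, y in rows) / len(rows)

    # Train: choose the threshold and direction with the best training accuracy
    best_acc, best_thr, best_dir = 0.0, 0.0, 1
    for thr, _ in train:
        for direction in (1, -1):
            acc = accuracy(train, thr, direction)
            if acc > best_acc:
                best_acc, best_thr, best_dir = acc, thr, direction
    test_acc = accuracy(test, best_thr, best_dir)
    return max(0.0, 2 * test_acc - 1)  # chance-level accuracy (0.5) maps to 0
```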
Takeaways
• Books
o AI MVP Book: Automated Machine Learning
https://www.amazon.com/gp/aw/d/B082P5MK8Y
o Practical Automated ML on Azure
• The No Free Lunch Theorem
https://www.kdnuggets.com/2019/09/no-free-lunch-data-science.html
• Azure ML Studio vs Azure ML Services designer
https://www.codit.eu/blog/azure-machine-learning-studio-vs-services/
https://docs.microsoft.com/en-us/azure/machine-learning/compare-azure-ml-to-studio-classic
• Bayes Theorem
https://towardsdatascience.com/understanding-bayes-theorem-7e31b8434d4b
Azure ML Studio / Azure ML Service