© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Machine Learning 101
Michael Brückner, Manager Machine Learning
25/02/2016
Agenda
• What is Machine Learning and why do we need it?
• Model Building
• Model Evaluation & Tuning
What is Machine Learning?
Methods and systems that …
• Adapt based on recorded data
• Predict new data based on recorded data
• Optimize an action given a utility function
• Extract hidden structure from the data
• Summarize data into concise descriptions
What is Machine Learning NOT?
Methods and systems that …
• can turn garbage in into knowledge out
• perform well without data modeling and feature engineering
• avoid the curse of dimensionality
• are a replacement for business rules
Infer-Predict-Decide Cycle
• Inference: build and evaluate a Predictor
• Prediction: apply the learned Predictor
• Decision making: adjust the business loss and get new/more data
What for?
Automate tasks that typically require humans, in order to
• scale
• improve on human (non-expert) performance
• preserve privacy
or solve tasks that are impossible for humans.
Examples: Personalized Recommendation
• Input: [figure]
• Output: [figure]
Examples: Face Detection & Recognition
Face detection
• Input: image
• Output: face position
Face recognition
• Input: face (image & face position)
• Output: person’s name
Examples: Full-Text Translation
• Input: text in one language
• Output: text in another language
Examples: Spam Filtering
• Input: email (text, images, …)
• Output: spam/non-spam flag
• Challenges:
• extremely high precision required, so that legitimate emails are almost never flagged as spam
• spam changes constantly
• noisy ground truth
Supervised Machine Learning
1. Model the problem in terms of input data and output data
2. Collect a sample of input-output pairs
3. Learn a mapping that produces the output given the input
4. Apply this mapping to new inputs to make predictions
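The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the talk: it assumes a single numeric input attribute, a straight-line mapping y ≈ w1·x + w0, and made-up toy data, with ordinary least squares standing in for "learn a mapping".

```python
# Steps 1 & 2: the problem is modeled as input x -> output y, and a sample of
# (x, y) pairs is collected (hypothetical toy data).
training_pairs = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

def fit_line(pairs):
    """Step 3: learn (w0, w1) of y = w0 + w1 * x by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1

def predict(model, x):
    """Step 4: apply the learned mapping to a new input."""
    w0, w1 = model
    return w0 + w1 * x

model = fit_line(training_pairs)
print(model)                # (1.0, 2.0)
print(predict(model, 5.0))  # 11.0
```

Any other learner (a decision tree, a neural network, …) could replace `fit_line`; steps 1, 2 and 4 stay the same.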
A Programmer’s Perspective
• Traditional programming (predicting): Input Data + Mapping → Computer → Output Data
• Supervised machine learning: Input Data + Output Data → Computer → Mapping
Advantages
• Use data instead of intuition to derive the mapping
• Can solve very complex tasks
• Can adapt to new situations (collect more data)
• Does not require much expert knowledge
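Input Data

| Description | Type | Cost | Actual Cost | Diff | In Catalogue |
|---|---|---|---|---|---|
| Movies | Entertainment | $50 | $28 | $22 | Yes |
| Music (CDs, MP3s, etc.) | | $500 | $30 | $470 | No |
| Sporting Events | Entertainment | $0 | $40 | ($40) | No |
| Dining Out | Food | $1,000 | $1,200 | ($200) | Yes |
| Groceries | | $100 | $0 | $100 | Yes |
| Charity 1 | Gifts and Charity | $200 | $200 | $0 | No |
| Charity 2 | | $500 | $500 | $0 | No |
| Cable/Satellite | Housing | $100 | $100 | $0 | Yes |
| Electric | Housing | $45 | $40 | $5 | Yes |
| Mortgage or Rent | | $700 | $700 | $0 | Yes |
| Health | Insurance | $400 | $400 | $0 | Yes |
| Home | Insurance | $400 | $400 | $0 | No |
| Credit Card 1 | | | | $0 | Yes |

The whole table is the dataset. Each column is an attribute: the header is the attribute name and each cell an attribute value. The columns show different data types: text data (Description), categorical data (Type), numerical data (Cost, Actual Cost, Diff) and binary data (In Catalogue); empty cells are missing data.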
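Output Data

| Description | Type | Cost | Actual Cost | Diff | In Catalogue |
|---|---|---|---|---|---|
| Movies | Entertainment | $50 | $28 | $22 | Yes |
| Music (CDs, MP3s, etc.) | ? | $500 | $30 | $470 | No |
| Sporting Events | Entertainment | $0 | $40 | ($40) | No |
| Dining Out | Food | $1,000 | $1,200 | ($200) | Yes |
| Groceries | ? | $100 | $0 | $100 | Yes |
| Charity 1 | Gifts and Charity | $200 | $200 | $0 | No |
| Charity 2 | ? | $500 | $500 | $0 | No |
| Cable/Satellite | Housing | $100 | $100 | $0 | Yes |
| Electric | Housing | $45 | $40 | $5 | Yes |
| Mortgage or Rent | ? | $700 | $700 | $0 | Yes |
| Health | Insurance | $400 | $400 | $0 | Yes |
| Home | Insurance | $400 | $400 | $0 | No |
| Credit Card 1 | ? | | | $0 | Yes |

Type is the target attribute: its known entries are the target attribute values, and the rows marked "?" are those whose type is to be predicted.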
Agenda
• What is Machine Learning and why do we need it?
• Model Building
• Model Evaluation & Tuning
Problem Setting
• Input: vector of observable attributes, $x$
• Output: target attribute value, $y$
• Training data: pairs of inputs and corresponding outputs, $D = \{(x_1, y_1), \dots, (x_N, y_N)\}$
• Application data: inputs only
• Goal: learn a mapping $f_w: x \mapsto y$ (the Predictor)
Challenges in Model Building
• Which function class for Predictor (data modeling)?
• How to pre-process the data (feature engineering)?
• How to learn this Predictor from our training data?
• How to generalize to new data?
Which function class for Predictor?
Types of prediction tasks (output type):
• Binary Classification ⇒ binary target $y \in \{-1, +1\}$
• Multinomial Classification ⇒ categorical target $y \in \{1, \dots, K\}$
• Regression ⇒ numeric target $y \in [l, u] \subseteq \mathbb{R}$
Which function class for Binary Classification?
• Decision Tree
[Figure: a decision tree with yes/no splits "x2 > 7?", "x1 < 3?", "x2 < 5?" and "x1 < 1?", next to the matching partition of the (x1, x2) plane into axis-parallel regions labelled + and -]
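Each split in the tree above is an axis-parallel test of the form "x_j < t?". A minimal sketch of how one such split can be learned: try every feature and every midpoint threshold, and keep the split with the fewest training errors (0/1 loss). The function names and toy points are hypothetical; full tree learners apply a split search like this recursively, node by node, and may use other split criteria.

```python
def fit_stump(X, y):
    """Return (errors, feature, threshold, label_if_yes) minimizing the 0/1 loss.

    Labels are in {-1, +1}; a point goes to the "yes" branch if row[feature] < threshold.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for label_yes in (-1, +1):
                errors = sum(
                    1 for row, target in zip(X, y)
                    if (label_yes if row[j] < t else -label_yes) != target
                )
                # Strict "<" keeps the first split found among equally good ones.
                if best is None or errors < best[0]:
                    best = (errors, j, t, label_yes)
    return best

def predict_stump(stump, row):
    _, j, t, label_yes = stump
    return label_yes if row[j] < t else -label_yes

# Hypothetical toy points (x1, x2): "+" on the left, "-" on the right.
X = [(1.0, 2.0), (2.0, 1.0), (2.0, 8.0), (4.0, 6.0), (5.0, 9.0), (6.0, 7.0)]
y = [1, 1, 1, -1, -1, -1]
stump = fit_stump(X, y)
print(stump)  # (0, 0, 3.0, 1): "x1 < 3?" -> yes: +, no: -
```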
Which function class for Binary Classification?
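• Linear function
• binary target attribute values $y \in \{-1, +1\}$
• prediction: $\hat{y}(x) = \operatorname{sign}(f_w(x))$
• decision boundary (hyperplane): $H_w = \{x \mid f_w(x) = x^\top w + w_0 = 0\}$
[Figure: + and - points in the (x1, x2) plane, separated by the hyperplane H_w]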
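The linear decision rule can be written down directly. In this sketch the weights are hand-picked for illustration rather than learned, and mapping a score of exactly 0 (a point on $H_w$) to +1 is an assumed convention, since the mathematical sign of 0 is 0.

```python
def f_w(x, w, w0):
    """Linear score f_w(x) = x^T w + w0."""
    return sum(xi * wi for xi, wi in zip(x, w)) + w0

def predict_linear(x, w, w0):
    """y_hat(x) = sign(f_w(x)), with points on the hyperplane assigned +1 (assumption)."""
    return 1 if f_w(x, w, w0) >= 0 else -1

# Hypothetical hyperplane H_w: x1 + x2 - 5 = 0.
w, w0 = (1.0, 1.0), -5.0
print(predict_linear((4.0, 3.0), w, w0))  # 1
print(predict_linear((1.0, 1.0), w, w0))  # -1
```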
Which function class for Binary Classification?
• Generalized linear function (Kernel methods)
• Layered generalized linear function (Neural Networks)
• Ensemble of functions
• …
[Figure: + and - points in the (x1, x2) plane]
How to pre-process the data?
• A Predictor’s function class is defined for a limited input domain
⇒ transform/extract attributes first (pre-processing)
• Number to (normalized) Number:
• z-standardization, min-max normalization
• Number to Category:
• Binning (quantile, equidistant)
• Category to (numeric) Vector:
• One-hot encoding
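The numeric and categorical transformations above fit in a few lines of standard-library Python. This is a minimal sketch using the first five values of the Cost column and a few Type values from the example table; quantile binning is not shown, and constant columns (which would divide by zero) are not handled.

```python
import statistics

def z_standardize(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values):
    """Rescale linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def equidistant_bins(values, k):
    """Map each number to one of k equally wide bins, numbered 0 .. k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value in the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]

def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted vocabulary."""
    vocab = sorted(set(categories))
    return vocab, [[1 if c == v else 0 for v in vocab] for c in categories]

costs = [50, 500, 0, 1000, 100]
types = ["Entertainment", "Food", "Housing", "Food"]

print(min_max_normalize(costs))    # [0.05, 0.5, 0.0, 1.0, 0.1]
print(equidistant_bins(costs, 4))  # [0, 2, 0, 3, 0]
print(one_hot(types))
```

Note that statistics in the transforms (mean, standard deviation, minimum, maximum, vocabulary) should be computed on the training data only and then reused unchanged on application data.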
How to pre-process the data?
• A Predictor’s function class is defined for a limited input domain
⇒ transform/extract attributes first (pre-processing)
• Text to (numeric) Vector:
• Normalization, tokenization, stemming
• Bag-of-Words, Bag-of-NGrams, TF-IDF ⇒ sparse vector
• Latent word embedding (LSI, word2vec, LDA) ⇒ dense vector
• Image to (numeric) Vector:
• HoG, DAISY, color histogram
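A minimal bag-of-words plus TF-IDF sketch for the text case, assuming lower-casing and a simple regex tokenizer (no stemming) and one common TF-IDF variant: term frequency = count / document length, idf = log(N / document frequency). Libraries often use smoothed variants, so their numbers will differ. The three example documents are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Normalization (lower-casing) plus a simple regex tokenizer; no stemming."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(docs):
    """Return one sparse {term: weight} dict per (non-empty) document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag-of-words counts
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["Cheap pills, cheap offer!", "Meeting agenda for Monday", "Offer for the meeting"]
vectors = tf_idf(docs)
print(vectors[0])  # only terms that occur in the first document are stored
```

Storing only the terms present in each document is what makes the vector sparse; a term occurring in every document gets weight 0.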
How to learn a Predictor?
• Loss of Predictor $f_w: x \mapsto y$ for a given input-output pair: $L(y, f_w(x))$, where $L$ is the loss function, $y$ the ground truth and $f_w(x)$ the prediction
How to learn a Predictor?
Loss functions for binary classification (target $y \in \{-1, +1\}$):
[Figure: loss function plots]
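The four losses used in the table on the next slide can be written as functions of the ground truth $y \in \{-1, +1\}$ and the score $f = f_w(x)$. For such labels the quadratic loss equals $(1 - y f)^2$, so all four depend only on the margin $y f$. This is a sketch; counting a score of exactly 0 as a 0/1 error and using the natural logarithm are assumed conventions.

```python
import math

# Each loss takes the ground truth y in {-1, +1} and the real-valued score f = f_w(x).

def zero_one_loss(y, f):
    """1 if the sign of the score disagrees with y (a score of 0 counts as an error here)."""
    return 1.0 if y * f <= 0 else 0.0

def quadratic_loss(y, f):
    return (y - f) ** 2

def logistic_loss(y, f):
    """log(1 + exp(-y*f)), rewritten to avoid overflow for large negative margins."""
    m = y * f
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))

def hinge_loss(y, f):
    return max(0.0, 1.0 - y * f)

for f in (-2.0, 0.0, 0.5, 2.0):
    losses = (zero_one_loss, quadratic_loss, logistic_loss, hinge_loss)
    print(f, [round(loss(1, f), 3) for loss in losses])
```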
How to learn a Predictor?
| Function Class | Loss Function | Learning Algorithm |
|---|---|---|
| Decision Trees | 0/1 loss | ID3 |
| Decision Trees | Quadratic loss | CART |
| Linear function | Quadratic loss | Least-squares regression |
| Linear function | Logistic loss | Logistic regression |
| Linear function | Hinge loss | Support Vector Machines |
| Layered generalized linear function | Logistic loss | Neural Networks (binary classification) |
| Layered generalized linear function | Quadratic loss | Neural Networks (regression) |
How to learn a Predictor?
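• Theoretical risk (average over all possible data): $R(w) = \mathbb{E}_{(x, y)}\left[L(y, f_w(x))\right]$
• Empirical risk (average over the training data): $\hat{R}(w) = \frac{1}{N}\sum_{i=1}^{N} L(y_i, f_w(x_i))$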
How to learn a Predictor?
• The prediction $\hat{y}(x)$ depends on the Predictor $f_w$ with model parameters $w$
• Minimize the risk with respect to those model parameters $w$
⇒ mathematical optimisation problem, solved e.g. by
• gradient-based first- or second-order methods
• coordinate-descent methods
• (greedy) search
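As one concrete instance of "minimize the empirical risk with a gradient-based first-order method", here is a minimal sketch of logistic regression trained by plain batch gradient descent. The learning rate, number of steps and the toy data are arbitrary illustrative choices, not tuned values.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    """f_w(x) = w0 + x^T w, with the parameters laid out as [w0, w1, ..., wd]."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def logistic_loss(y, f):
    m = y * f
    return math.log1p(math.exp(-m)) if m >= 0 else -m + math.log1p(math.exp(m))

def empirical_risk(w, data):
    """Average logistic loss over the training pairs (x, y), y in {-1, +1}."""
    return sum(logistic_loss(y, score(w, x)) for x, y in data) / len(data)

def gradient(w, data):
    """Gradient of the empirical risk; d/dw log(1 + exp(-y f)) = -y * sigmoid(-y f) * df/dw."""
    grad = [0.0] * len(w)
    for x, y in data:
        coeff = -y * sigmoid(-y * score(w, x))
        grad[0] += coeff                    # df/dw0 = 1
        for j, xj in enumerate(x, start=1):
            grad[j] += coeff * xj           # df/dwj = xj
    return [g / len(data) for g in grad]

def fit_logistic_regression(data, n_features, learning_rate=0.5, steps=200):
    """Plain batch gradient descent starting from w = 0."""
    w = [0.0] * (n_features + 1)
    for _ in range(steps):
        g = gradient(w, data)
        w = [wj - learning_rate * gj for wj, gj in zip(w, g)]
    return w

# Hypothetical 1-D toy data.
data = [((-2.0,), -1), ((-1.0,), -1), ((1.0,), 1), ((2.0,), 1)]
w = fit_logistic_regression(data, n_features=1)
print(w, empirical_risk(w, data))  # risk starts at log(2) for w = 0 and decreases
```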
How to generalize to new data?
[Figure: error as a function of model complexity]
How to generalize to new data?
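• Empirical risk: $\hat{R}(w) = \frac{1}{N}\sum_{i=1}^{N} L(y_i, f_w(x_i))$
• Structural risk: $\hat{R}(w) + \lambda\,\Omega(w)$, where the regularizer $\Omega(w)$ penalizes model complexity and $\lambda \ge 0$ sets its weight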
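A minimal sketch of what the regularizer does, assuming quadratic loss, the common choice $\Omega(w) = w^2$ and a single weight without intercept (ridge regression in one dimension). Setting the derivative of $\frac{1}{N}\sum_i (y_i - w x_i)^2 + \lambda w^2$ to zero gives the closed form used below.

```python
def ridge_1d(pairs, lam):
    """Minimize (1/N) * sum_i (y_i - w * x_i)^2 + lam * w^2 over a single weight w.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + N * lam).
    """
    n = len(pairs)
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / (sxx + n * lam)

pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical toy data, y = 2x
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_1d(pairs, lam))  # the weight shrinks towards 0 as lam grows
```

Larger $\lambda$ trades a higher empirical risk for a simpler (smaller-weight) model; $\lambda$ itself is a hyper-parameter to be tuned.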
Agenda
• What is Machine Learning and why do we need it?
• Model Building
• Model Evaluation & Tuning
Performance for Binary Classification
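Confusion matrix over all N data points:

| | True target: positive | True target: negative |
|---|---|---|
| Predicted target: positive | True Positive (TP) | False Positive (FP) |
| Predicted target: negative | False Negative (FN) | True Negative (TN) |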
Performance for Binary Classification
• Accuracy: $\frac{TP + TN}{N}$
• Recall (true positive rate): $\frac{TP}{TP + FN}$
• Precision: $\frac{TP}{TP + FP}$
• Fall-out (false positive rate): $\frac{FP}{FP + TN}$
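The confusion-matrix counts and the four metrics translate directly into code. A minimal sketch with made-up labels; it does not guard against empty denominators (for example, precision when nothing is predicted positive).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for one designated positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and fall-out; raises ZeroDivisionError on empty denominators."""
    n = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / n,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "fall_out": fp / (fp + tn),
    }

y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1, -1, -1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (2, 1, 1, 4)
print(binary_metrics(*counts))
```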
Performance for Binary Classification
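• Decision function: $\hat{y}(x) = \operatorname{sign}(f_w(x) + b)$, i.e. the Predictor $f_w$ combined with a decision threshold $b$
• AUC (Area Under the ROC Curve): summarizes performance over all decision thresholds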
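Moving the threshold $b$ trades recall against fall-out and traces out the ROC curve. The area under that curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counted as one half), which gives a simple, if quadratic-time, way to compute it. A sketch with made-up scores:

```python
def auc(scores, labels, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    total = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]  # f_w(x) for six hypothetical examples
labels = [1, 1, 1, -1, -1, -1]
print(auc(scores, labels))  # 8 of 9 pairs ranked correctly
```

Because it only depends on the ranking of the scores, the AUC does not change when $b$ is shifted.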
Training vs. Test Performance
How do we know that a Predictor works well on new data?
Small error on training data ≠ small error on new data (test data)!
Hold-out Evaluation
• Put some data aside before training (the test data)
• Use this hold-out data only for evaluation
• Disadvantages:
• What if we were (un)lucky when choosing the hold-out data?
• We do NOT use all the data for model training!
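A minimal hold-out split, assuming the data points are independent so that a random shuffle is appropriate (time-ordered data would need a chronological split instead). The function name and the 30% test fraction are illustrative choices.

```python
import random

def holdout_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and set the first test_fraction of it aside as test data."""
    rng = random.Random(seed)  # fixed seed: reproducible split
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

data = list(range(10))
train, test = holdout_split(data, test_fraction=0.3)
print(len(train), len(test))  # 7 3
```

Re-running with a different `seed` gives a different split, and possibly a noticeably different test score: this is exactly the "(un)lucky hold-out data" problem.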
K-Fold Cross Validation-based Evaluation
• Split data into K partitions (folds)
• Take all but one partition to train a Predictor
• Evaluate Predictor on the left-out partition
• Repeat this for all partitions
• Average the performance over all K evaluations
• Finally train a Predictor on all data
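The K-fold procedure above as a minimal sketch. The folds here are contiguous (shuffle first if the order of the data carries information), and the "Predictor" is a hypothetical toy that always predicts the mean of its training values, scored by mean absolute error.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for K contiguous folds whose sizes differ by at most 1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

def cross_validate(data, k, train_fn, score_fn):
    """Train on K-1 folds, score on the left-out fold, average over all K folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        model = train_fn([data[i] for i in train_idx])
        scores.append(score_fn(model, [data[i] for i in test_idx]))
    return sum(scores) / k

# Hypothetical toy Predictor: always predict the mean of the training values.
def train_mean(values):
    return sum(values) / len(values)

def mean_absolute_error(model, values):
    return sum(abs(v - model) for v in values) / len(values)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(cross_validate(data, 3, train_mean, mean_absolute_error))
final_model = train_mean(data)  # finally, train a Predictor on all data
```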
Model Tuning
Learning methods and Predictors have hyper-parameters
• Amount of regularization
• Choice of loss function
• Decision threshold score
• Learning rate
• …
Example: Decision threshold
[Figure: decision threshold]
How to choose hyper-parameters?
Grid Search:
• Evaluate the Predictor at all grid points (hyper-parameter combinations)
• Take the best grid point
Very expensive: the number of grid points grows exponentially with the number of hyper-parameters!
[Figure: grid of hyper-parameter combinations on log-scaled axes]
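A minimal grid-search sketch over two hypothetical hyper-parameters on log-scaled axes (10^-2 to 10^2). The scoring function is a made-up stand-in; in practice it would be a cross-validated score computed on training data only.

```python
import itertools
import math

def grid_search(evaluate, grid):
    """Evaluate every combination in `grid` (name -> list of values); keep the best."""
    names = list(grid)
    best_params, best_score, n_evals = None, -math.inf, 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        n_evals += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evals

# Log-scaled axes from 10^-2 to 10^2 (hypothetical hyper-parameter names).
grid = {
    "regularization": [10.0 ** e for e in range(-2, 3)],
    "learning_rate": [10.0 ** e for e in range(-2, 3)],
}

def toy_validation_score(regularization, learning_rate):
    """Hypothetical stand-in for a cross-validated score; peaks at (1, 0.1)."""
    return -math.log10(regularization) ** 2 - (math.log10(learning_rate) + 1) ** 2

best, best_score, n_evals = grid_search(toy_validation_score, grid)
print(best, n_evals)  # 25 evaluations for a 5 x 5 grid
```

With a third hyper-parameter on the same 5-point axis the count would rise to 125, which is why grid search becomes expensive so quickly.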
How to choose hyper-parameters?
Bayesian Optimisation:
• Learn a model that predicts evaluation outcomes
• Evaluate the Predictor only at promising grid points
• Take the best grid point after a fixed number of evaluations
[Figure: grid of hyper-parameter combinations on log-scaled axes]
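A compact sketch of Bayesian optimisation over a one-dimensional grid, assuming a Gaussian-process surrogate with a squared-exponential kernel (length scale 1) and an upper-confidence-bound acquisition rule; these are common choices, not necessarily what any particular service uses. The objective is a hypothetical stand-in for a validation score as a function of log10(λ).

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_new, jitter=1e-6):
    """GP posterior mean and variance at x_new, assuming a zero prior mean."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))           # K^-1 y
    k_star = [rbf(x_new, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward(L, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # rbf(x, x) = 1
    return mean, var

def bayes_opt(evaluate, candidates, initial, n_iter, kappa=2.0):
    """Maximize `evaluate` over `candidates`, starting from the `initial` points."""
    xs = list(initial)
    ys = [evaluate(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        offset = sum(ys) / len(ys)  # centre the targets for the zero-mean prior
        centred = [y - offset for y in ys]

        def ucb(c):
            mean, var = gp_predict(xs, centred, c)
            return mean + kappa * math.sqrt(var)  # upper confidence bound

        nxt = max(remaining, key=ucb)  # most promising unevaluated grid point
        xs.append(nxt)
        ys.append(evaluate(nxt))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)

def toy_objective(log_lambda):
    """Hypothetical validation score as a function of log10(lambda)."""
    return -(log_lambda - 0.3) ** 2

candidates = [i / 2 for i in range(-4, 5)]  # log10(lambda) from -2 to 2
best_x, best_y, n_evals = bayes_opt(toy_objective, candidates, initial=[-2.0, 0.0, 2.0], n_iter=3)
print(best_x, best_y, n_evals)  # 6 evaluations instead of all 9 grid points
```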
Common Pitfalls
• Model tuning is part of training
⇒ Do NOT use test data or test CV partitions!
• Use proper grid resolution and axis scaling
• Use the same metric for tuning as for evaluation
Thank you!

Machine Learning 101 - AWS Machine Learning Web Day
