Machine Learning Essentials
Part 1: Basic algorithms
Lior King
Lior.King@gmail.com
1
Agenda
• Introduction to Machine Learning (ML)
• What is Machine Learning?
• The problems we can solve using ML.
• The learning process
• Basic ML Algorithms using Python and Scikit-learn library
• Linear Regression
• Naïve Bayes
• K-Means
• Artificial Neural Networks (ANN) and Deep Learning (DL)
using TensorFlow library
• Single layered ANN (using MNIST demo).
• Deep Learning (DL) with Multi-layered Neural Networks
• DL example: Convolutional Neural Network (CNN).
2
An astronaut lands on an alien planet
3
The astronaut’s dilemma
Male or Female?
4
An alien lands on earth
5
The alien’s dilemma
Male or Female?
6
Gender recognition algorithm for the alien
If the height is > 180 cm
and/or
weight > 75 kg
or
has a beard
or
has short hair
or
has a deep voice
or
is bald
or…
7
There might be exceptions…
The rule-based approach
The learning approach
• We show the alien 500 humans and tell it who the males are and who the
females are.
• The alien will find the characteristics that differentiate males
and females – by identifying repeated patterns.
• The alien needs to be exposed to a lot of humans to identify
repeated patterns. That is how it gains EXPERIENCE.
8
How can a computer learn?
Experience = Data
9
What is the machine learning approach?
• An alternative to the rule-based approach
• Based on a lot of data.
• Implements a pre-determined model that uses a standard algorithm to
find DATA CORRELATIONS.
10
Some ML use cases
• Self driving cars and auto pilots
• Cortana, Siri, Google Assistant (NLP – Natural Language Processing)
• Recommendations - Netflix and Amazon know what you like
• Data security
• Healthcare – Computer Assisted Diagnosis
• Spam detection
• Fraud detection
• Algo-trading
• IoT
11
AI vs. ML vs. DL
12
When to use machine learning?
•When it is difficult for humans to express rules
•Too many variables
•Difficult to understand relationships
•When there is a large amount of available historical data
•When the relationships and patterns between data items are dynamic and
keep changing
13
ML is not new - why is it so hot these days?
Problem 1: ML usually requires a lot of data
Solution: We are in the “big data” era.
Problem 2: ML requires a lot of computations.
Solutions:
• CPUs have become very fast
• GPUs can be harnessed to multiply the speed.
• The cloud enables you to build a computing grid quickly
and cheaply.
Problem 3: ML is complex and difficult
Solution: Available free open source libraries and tools
14
How to use machine learning?
1. Define the problem you wish to solve – ask the right question.
2. Prepare the data - make sure you use relevant data which is represented with
meaningful numbers
3. Choose the right algorithm.
4. Use the algorithm to train a model with training data.
5. Test the model to see if it is correct enough.
15
[Pipeline diagram: Define the problem → Represent data with numbers → Choose the algorithm → Train the model → Test the model → ongoing learning]
16
[Diagram labels: Rules, Historical Data, New Data, Retraining, Deploying, Use cases]
17
Machine learning - problem categories
18
Supervised
Unsupervised
Reinforcement
Machine Learning Categories
•Supervised machine learning:
• The program is “trained” on a pre-defined set of “training examples”
• Can reach a pretty accurate conclusion when given new data.
•Unsupervised machine learning:
• The program is given a bunch of data and must find patterns and relationships
therein.
•Reinforcement machine learning:
• The program is given just an “environment” and a “reward” function for successful
actions – without exact instructions on what to do.
• The program finds a set of actions that will grant it maximum total “rewards”.
19
Machine learning - problem categories
20
Classification
Regression
(Prediction)
Clustering
(grouping)
Classification
• “Yes or No” choices:
• Does the patient have cancer?
• Is this email SPAM?
• Is this credit card transaction a fraud?
• Is the stock market going up or down?
• A discrete number of choices:
• Determine age group: 0-18, 18-35, 35-60, 60+
• Recognize handwritten characters – a, b, c, d …
• Customer sentiment analysis – very positive/slightly positive/neutral/slightly negative/strongly negative.
Classification requires training data
21
Classification
(discrete number)
Regression
•Regression – for Predictions or forecasts
• What will be the value of MSFT stock tomorrow?
• How much will we sell in the next quarter?
• How many bugs will we need to fix?
• How long will it take to commute from A to B?
• Outputs a continuous value – a float
• Requires training data
22
Regression
(Prediction)
Clustering
• Clustering is grouping data items into groups
• Customer segmentation
• Pattern recognition and image analysis
• Bioinformatics
• Training data is not required (unsupervised).
23
Clustering
The set of rules known as a MODEL
MODEL = A quantitative representation of relationships between variables.
• Can be a mathematical equation
Or
• A set of if-then-else statements created dynamically.
Example:
A spam filtering model represents the relationship between the text in the email and
whether it is a spam or not.
26
Model = Function
[Diagram: data attributes (numbers) → Model, f(X1, X2, …, Xn) → outcome (a number)]
27
The Goal
To find the best model (function) that
produces the desired result on any set
of inputs
28
Supervised Learning
29
Supervised learning – the training process
[Diagram: prepared data is split into training data and test data; an algorithm trains a model on the training data; the model is tested on the test data and judged good or bad]
30
31
Basic ML Algorithms
and how to use them with Python
Most common ML algorithms
• Prediction:
• Linear Regression
• Polynomial Regression
• Decision Tree Regression
• Random Forest Regression
• Support Vector Regression (SVR)
• Classification:
• Naïve Bayes
• Logistic Regression
• Decision Tree Classification
• Random Forest Classification
• Support Vector Machines (SVM)
• K-Nearest Neighbors classification (K-NN)
32
• Clustering:
• K-Means
• Hierarchical clustering
• Artificial Neural Networks:
• Convolutional Neural Network (CNN)
• Recurrent Neural Network (RNN)
Some Other Algorithms
• Enhanced algorithms:
• Variations of basic algorithms
• Enhanced to perform better and/or add more functionality.
• Complex to understand and use properly
• Ensemble algorithms:
• Special algorithms that contain/combine multiple algorithms under one interface
• Used when you need to tune the model to increase performance
• Can be complex and difficult to debug and troubleshoot.
33
Regression problems
34
Regression
(Prediction)
Common ML algorithms for regression
• Linear Regression
• Polynomial Regression
• Decision Trees
• Random Forest
35
Regression
(Prediction)
Regression examples
• What will be the stock’s returns?
• What will be the sales of a product next week?
• If a flight is delayed, how does this affect customer satisfaction?
• If I change my investment portfolio, how would it affect my risk?
• How much will I get for my house?
36
Linear regression
Finding the relation between age and salary.
Predicting the salary for any given age
38
[Scatter plot: historical data points; Experience on the x-axis, Salary on the y-axis]
Minimize the error
The Error (or Residual) is the offset of
the observed dependent value from the
value predicted by the model.
The goal of any regression is to minimize
the error for the training data and to
FIND THE OPTIMAL LINE (or curve, in the
case of non-linear regression).
[Plot: fitted line with error offsets; Experience (independent) on the x-axis, Salary (dependent) on the y-axis]
39
Minimize the error – sum of squared diffs
The error $= \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$, where $y_i$ is the observed value and $\hat{y}_i$ is the value predicted by the line.
[Plot: fitted line showing the error between an observed point $y$ and its prediction $\hat{y}$; Experience on the x-axis, Salary (dependent) on the y-axis]
40
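As a quick illustration (not from the original slides), this sum of squared errors could be computed with NumPy for a candidate slope and intercept; the data and variable names below are made up:

```python
import numpy as np

# Illustrative historical data: years of experience vs. salary (in thousands)
experience = np.array([1, 2, 3, 5, 8, 10], dtype=float)
salary = np.array([40, 48, 55, 70, 95, 110], dtype=float)

def sum_squared_errors(slope, intercept):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    predicted = intercept + slope * experience   # y-hat for every data point
    residuals = salary - predicted               # y_i - y-hat_i
    return np.sum(residuals ** 2)

print(sum_squared_errors(slope=7.5, intercept=32.0))
```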
Minimize the error with Stochastic Gradient
Descent (SGD)
$\text{Error} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
N = number of historical data points
1. Initialize some value for the slope
and intercept.
2. Find the current value of the error
function.
3. Find the slope at the current point (partial derivative) and move slightly
downwards in that direction.
4. Repeat until you reach a minimum OR stop after a certain number of iterations.
[Plot: error surface as a function of slope and intercept]
41
Minimize the error
The iterative SGD process will slowly
change the slope and the intercept until
the error is minimal.
[Plot: the fitted line adjusting over iterations; Experience on the x-axis, Salary (dependent) on the y-axis]
42
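To make this loop concrete, here is a minimal, self-contained gradient-descent sketch for simple linear regression in NumPy. It is a full-batch variant for clarity (true SGD would update on one random point at a time), and the data and learning rate are illustrative:

```python
import numpy as np

x = np.array([1, 2, 3, 5, 8, 10], dtype=float)        # experience (illustrative)
y = np.array([40, 48, 55, 70, 95, 110], dtype=float)  # salary (illustrative)

slope, intercept = 0.0, 0.0      # step 1: initialize
learning_rate = 0.01

for _ in range(10_000):          # step 4: repeat for a fixed number of iterations
    predicted = intercept + slope * x
    error = predicted - y        # step 2: current residuals
    # step 3: partial derivatives of the mean squared error
    grad_slope = 2 * np.mean(error * x)
    grad_intercept = 2 * np.mean(error)
    slope -= learning_rate * grad_slope          # move slightly downhill
    intercept -= learning_rate * grad_intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```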
Multiple Linear Regression
• Simple linear regression:
$Y = b_0 + b_1 x_1$
• Multiple linear regression:
$Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
Important note:
You need to exclude variables that will “mess up” the prediction and keep the ones
that actually help predict the desired result.
43
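A minimal scikit-learn sketch of multiple linear regression; the feature columns and values are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative features: [years of experience, years of education]
X = np.array([[1, 12], [3, 16], [5, 16], [8, 18], [10, 20]], dtype=float)
y = np.array([40, 60, 72, 95, 120], dtype=float)  # salary in thousands

model = LinearRegression().fit(X, y)   # learns b0 (intercept) and b1..bn (coefficients)
print(model.intercept_, model.coef_)
print(model.predict([[6, 16]]))        # predicted salary for a new person
```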
Polynomial Linear Regression
Simple linear regression:
$Y = b_0 + b_1 x_1$
Polynomial linear regression:
$Y = b_0 + b_1 x_1 + b_2 x_1^{2} + \dots + b_n x_1^{n}$
Quadratic: degree = 2
Cubic: degree = 3
44
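In scikit-learn, the polynomial terms are usually generated with PolynomialFeatures and fed into an ordinary linear regression. A sketch with illustrative data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
y = np.array([2.1, 4.9, 10.2, 17.1, 26.3, 37.0])  # roughly quadratic, illustrative

poly = PolynomialFeatures(degree=2)   # quadratic: adds x^2 (and a bias column)
X_poly = poly.fit_transform(X)        # columns: [1, x, x^2]

model = LinearRegression().fit(X_poly, y)
print(model.predict(poly.transform([[7.0]])))   # predict for a new x
```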
Why Python?
• Fast learning curve
• Combines the power of a general-purpose language with ease of use.
• Everything you need for ML: Libraries for data loading, visualization, statistics, natural
language processing, image processing, and more:
• numpy, scipy
• scikit-learn
• matplotlib
• pandas
• tensorflow, pytorch, GraphLab
• A lot of free IDEs and interactive tools (like Spyder, PyCharm, VS Code and more)
• Allows for the creation of complex graphical user interfaces (GUIs)
• Easy integration into existing systems and web services.
45
Python becomes the leader for ML
46
* KDnuggets is a leading news site
on Business Analytics, Big Data, Data Mining,
Data Science, and Machine Learning
Python’s Scikit-learn library
• Makes it easier to perform training and evaluation tasks:
• Splitting the data into training and test sets.
• Preprocessing the data before we train with it.
• Selecting the important features.
• Model training
• Model tuning for better performance
• Provides a common interface for accessing algorithms
• Based on commonly used mathematical libraries such as NumPy, SciPy, Matplotlib
• Supports Pandas dataframes.
47
Regression Demo
48
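The live demo itself is not part of the slides; the following is a minimal sketch of what such a regression demo typically looks like with scikit-learn, using synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic experience/salary data (illustrative)
rng = np.random.default_rng(0)
experience = rng.uniform(0, 10, size=(200, 1))
salary = 35 + 7.5 * experience[:, 0] + rng.normal(0, 3, size=200)

# Split, train, and evaluate – the common scikit-learn workflow
X_train, X_test, y_train, y_test = train_test_split(
    experience, salary, test_size=0.25, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on test data:", model.score(X_test, y_test))
print("Predicted salary for 6 years:", model.predict([[6.0]]))
```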
Classification problems
49
Classification
(Yes/No or a
discrete number)
Common ML algorithms for classification
• Naïve Bayes
• Logistic Regression
• Support Vector Machines
• Decision Trees
• K-Nearest Neighbors
50
Classification
(Yes/No or a
discrete number)
Classification examples
• Gender detection:
• Using the first name, length, prefix/suffix, ends with a vowel?
• Age group detection:
• Using the user’s selections and preferences
• Sentiment Analysis – Positive or Negative (polarity identification)
• Using a large bank of tweets and posts – unstructured and complicated.
• Trading stocks/derivatives – Up day or Down day?
• Using week day, month, prices in previous days, prices of related stocks.
51
Classification examples
• Detecting Ads – Is the image an Ad or not an Ad?
• Using the Image URL, page URL, Page text, Image caption and so on…
• Customer Churn – Is the customer about to quit?
• Using: purchases, days since the last purchase, geo location etc…
• Fraud detection – a Fraud or not a Fraud
• Using: payment type, location, failed attempts history, frequency of use
• Credit risk – Will the customer default on a loan?
• Using: Income, employment sector, education, history of defaults
52
Sentiment Analysis Classification
The goal is to classify an unknown review as positive or negative.
53

Movie review | Classification
“The movie was pretty good” | Positive
“It was boring. I almost fell asleep” | Negative
“We had a great evening” | Positive
“The leading actor really sucked” | Negative
… | …
“It is the worst film ever” | Negative
Naïve Bayes
$P(c \mid x) = \frac{P(x_1 \mid c) \cdot P(x_2 \mid c) \cdots P(x_n \mid c) \cdot P(c)}{P(x)}$
“A great movie” – is it a positive review?
$P(\text{Positive} \mid \text{“Great Movie”}) = \frac{P(\text{“Great”} \mid \text{Positive}) \cdot P(\text{“Movie”} \mid \text{Positive}) \cdot P(\text{Positive})}{P(\text{“Great”}) \cdot P(\text{“Movie”})}$
Posterior probability – What is the probability that the word X indicates a positive review
Prior probability – What is the probability of a positive review
Likelihood – What is the probability of finding the word X in a positive review
Marginal likelihood – What is the probability of the word in the whole set (positive & negative)
54
Naïve Bayes
$P(\text{Positive} \mid \text{“A great movie”}) = \frac{P(\text{“Great”} \mid \text{Positive}) \cdot P(\text{“Movie”} \mid \text{Positive}) \cdot P(\text{Positive})}{P(X)}$
$P(\text{Negative} \mid \text{“A great movie”}) = \frac{P(\text{“Great”} \mid \text{Negative}) \cdot P(\text{“Movie”} \mid \text{Negative}) \cdot P(\text{Negative})}{P(X)}$
Compare: is $P(\text{Positive} \mid \text{“A great movie”})$ greater or smaller than $P(\text{Negative} \mid \text{“A great movie”})$?
55
Naïve Bayes algorithm
1. Extract every word (get rid of words like the/is/a etc.).
2. Calculate the probability of each word in positive comments:
$P_{Pos}(\text{“word”}) = \frac{\text{sum(freq. of “word” in positive comments)}}{\text{sum(freq. of “word” in the entire set)}}$
3. For every sentence, calculate $P_{Pos}$ and $P_{Neg}$:
$P_{Pos}(\text{sentence}) = P_{Pos}(\text{word}_1) \cdot P_{Pos}(\text{word}_2) \cdots P_{Pos}(\text{All})$
$P_{Neg}(\text{sentence}) = (1 - P_{Pos}(\text{word}_1)) \cdot (1 - P_{Pos}(\text{word}_2)) \cdots (1 - P_{Pos}(\text{All}))$
4. Compare $P_{Pos}(\text{sentence})$ and $P_{Neg}(\text{sentence})$.
56

Movie review | Classification
“The movie was pretty good” | Positive
“It was boring. I almost fell asleep” | Negative
“We had a great evening” | Positive
“The leading actor really sucked” | Negative
… | …
“It is the worst film ever” | Negative

Word | PPos(word)
Great | 95%
Boring | 10%
Movie | 50%
Worst | 10%
… | …

For the entire set: 55% positive → PPos(All)
PPos(“The movie is great”) = 0.5 × 0.95 × 0.55 = 0.261
PNeg(“The movie is great”) = (1 − 0.5) × (1 − 0.95) × (1 − 0.55) = 0.011
0.261 > 0.011 → Positive
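A minimal Python sketch of steps 2–4, plugging in the illustrative word probabilities from the table above (a production implementation would also smooth zero counts and work in log space to avoid underflow):

```python
# Illustrative per-word positive probabilities (from the slide's table)
p_pos_word = {"great": 0.95, "boring": 0.10, "movie": 0.50, "worst": 0.10}
P_POS_ALL = 0.55            # share of positive reviews in the whole set
STOP_WORDS = {"the", "is", "a", "it", "was"}

def classify(sentence: str) -> str:
    words = [w for w in sentence.lower().split() if w not in STOP_WORDS]
    p_pos, p_neg = P_POS_ALL, 1 - P_POS_ALL
    for w in words:
        if w in p_pos_word:             # step 3: multiply per-word probabilities
            p_pos *= p_pos_word[w]
            p_neg *= 1 - p_pos_word[w]
    return "Positive" if p_pos > p_neg else "Negative"   # step 4: compare

print(classify("The movie is great"))   # Positive (0.261 > 0.011)
```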
Naïve Bayes – Continuous values (Gaussian)
Features: Age, Salary
[Scatter plot: Age vs. Salary; blue circle = did not purchase (40 customers), red cross = purchased (30 customers); around the new point X: did not purchase = 15, purchased = 10]
The chance that X will purchase =
the chance that the customers around X purchased ×
the chance of purchasing in general ÷
the chance for a customer to be around X
= (# of purchases around X / total purchases) × (# of purchases / total customers) ÷ (total customers around X / total customers)
$P(\text{purchase} \mid x) = \frac{P(x \mid \text{purchase}) \cdot P(\text{purchase})}{P(x)} = \frac{\frac{10}{30} \cdot \frac{30}{70}}{\frac{25}{70}} = 0.4 = 40\%$
(assuming a normal distribution around X)
57
Naïve Bayes Algorithm
• Each attribute (in our case – word) is used independently (hence the term “naïve”).
• Phrases like “far out” are not taken into consideration.
• Simple to understand
• Fast training
• Stable – insensitive to small changes in the training data
• Can be very robust for solving many classification problems – especially when:
• There is a small amount of training data
• You don’t have a lot of knowledge about the problem itself
58
Naïve Bayes Demo
(Gaussian)
59
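The demo code is not included in the slides; a minimal Gaussian Naïve Bayes sketch with scikit-learn, on made-up age/salary data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Illustrative features: [age, salary in thousands]; label 1 = purchased
X = np.array([[22, 30], [25, 45], [47, 60], [52, 110],
              [46, 95], [56, 120], [30, 35], [60, 130]], dtype=float)
y = np.array([0, 0, 1, 1, 1, 1, 0, 1])

model = GaussianNB().fit(X, y)          # fits a Gaussian per feature per class
print(model.predict([[40, 70]]))        # predicted class for a new customer
print(model.predict_proba([[40, 70]]))  # posterior probabilities
```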
Logistic Regression
60
K-NN (K Nearest Neighbors)
61
Clustering problems
62
Clustering
Common ML algorithms for clustering
• K-Means
• Fuzzy clustering
• Hierarchical clustering
• Density based clustering
• Distribution based clustering
63
Clustering
Clustering use case examples
• A cellular company needs to place antennas in a region so that its users receive
optimal signal coverage
• Locating police stations so officers can arrive quickly at areas with a high crime rate.
• Identify important product features from customer feedback, emails, etc.
• Perform efficient data compression
64
Reviews theme clustering
• We need to represent each review with numeric attributes.
• In this example we’ll use a technique called “Term Frequency Representation” (TFR).
Sample: “With tears in my eyes”
All words: (movie, good, bad, with, boring, tears, yesterday, my, eyes)
(0, 0, 0, 1, 0, 1, 0, 1, 1)
We represent each review using frequencies of words.
Some words characterize a document more than others: “With tears in my eyes”.
These words usually occur more rarely and differentiate the review from the others.
65
Reviews theme clustering
Some words characterize a document more than others: “With tears in my eyes”.
These words usually occur more rarely and differentiate the review from the others.
“With tears in my eyes” → common, rare, common, common, rare
We now weight the word frequencies to make the rare words stand out and the
common words have minimal weight.
New weight = 1 / frequency of the word
This representation is called Term Frequency – Inverse Document Frequency (TF-IDF)
66
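A short sketch of this representation using scikit-learn's TfidfVectorizer, which implements the same idea with a somewhat more refined weighting than the simple 1/frequency rule above; the sample reviews are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "With tears in my eyes",          # illustrative reviews
    "The movie was pretty good",
    "It was boring, I almost fell asleep",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(reviews)   # sparse matrix: reviews x words

print(vectorizer.get_feature_names_out())   # the vocabulary
print(tfidf.toarray().round(2))             # weighted frequencies per review
```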
K-Means algorithm
• Every review is a tuple with N numbers:
(0, 3, 0, 4, 0, …)
• So every review is a point in an
N-dimensional space (a hypercube).
• With the K-means algorithm you define K –
how many clusters you want the data to
converge into.
1. Initialize the mean points (also called
“centroids”).
67
K-Means iterations 1–6
2. Assign each review (point) to the
nearest centroid.
3. Look at each cluster and find a
new centroid for the cluster.
4. Repeat 2 and 3 until the means stop
changing.
[Slides 68–73 show the cluster assignments and centroids updating over six iterations]
K-Means algorithm
1. Initialize the mean points
(also called “centroids”).
(point) to the nearest
centroid.
3. Look at each cluster and
find a new centroid for
the cluster.
4. Repeat 2,3 until the
means stop changing.
74
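A minimal scikit-learn sketch of this loop, clustering TF-IDF vectors of a few illustrative reviews (the number of clusters is an arbitrary choice here):

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "With tears in my eyes", "A sad and moving story",
    "The movie was pretty good", "A great evening, great film",
    "It was boring, I almost fell asleep", "The worst film ever",
]

tfidf = TfidfVectorizer().fit_transform(reviews)   # each review -> a point in N dims

# K-means: initialize K centroids, assign points, recompute, repeat
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(tfidf)
for review, label in zip(reviews, kmeans.labels_):
    print(label, review)
```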
Clustering vs. Classification
• Classification – when you want to classify data into pre-defined categories.
• Clustering – grouping data into a set of categories that is NOT known beforehand.
• We can mix them both (see the sketch below):
• Start with clustering the data
• Then train on the data to recognize each cluster and create a model.
• Use the classification model to classify new data.
78
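A minimal sketch of this mix on made-up numeric data: cluster first, then treat the cluster labels as classes for a supervised model:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2))      # three blobs of points
               for loc in ([0, 0], [5, 5], [0, 5])])

# 1. Cluster the data (unsupervised) – categories not known beforehand
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# 2. Train a classifier to recognize each cluster
clf = KNeighborsClassifier(n_neighbors=5).fit(X, labels)

# 3. Classify new data with the trained model
print(clf.predict([[4.8, 5.1], [0.2, -0.3]]))
```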
79
Thank you !
Lior.King@gmail.com