Support Vector Machine (SVM)
Dr. Prasenjit Dey
Linear Separators
• What could be the optimal line to separate the blue dots from the red dots?
Support Vector Machine (SVM)
• An SVM is a classifier that finds an optimal hyperplane in the feature space using the training data.
Optimal hyperplane
Classification margin
• The distance from an example xi to the hyperplane is r = (wTxi + b) / ||w||.
• The closest points to the hyperplane, at which this distance is smallest, are called the support vectors.
• The margin ρ of the hyperplane is the distance between the support vectors on opposite sides, i.e. twice the distance r from the hyperplane to the closest point.
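As a quick numeric check, the distance formula r = (wTx + b)/||w|| can be evaluated directly. This is a minimal sketch; the hyperplane coefficients and test point are made up for illustration:

```python
import numpy as np

# Hypothetical hyperplane w^T x + b = 0 (coefficients chosen for illustration)
w = np.array([3.0, 4.0])
b = -5.0

def distance_to_hyperplane(x, w, b):
    """Signed perpendicular distance r = (w^T x + b) / ||w||."""
    return (w @ x + b) / np.linalg.norm(w)

x_i = np.array([3.0, 4.0])
r = distance_to_hyperplane(x_i, w, b)  # (9 + 16 - 5) / 5 = 4.0
```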
Maximum classification margin
• The optimal hyperplane with the largest margin is called the maximum-margin hyperplane.
• An important observation: for maximizing the margin, only the support vectors matter; the remaining examples can be ignored.
Mathematical representation of the linear SVM (contd.)
• For every support vector xs, the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, the distance between each xs and the hyperplane is r = ys(wTxs + b) / ||w|| = 1 / ||w||.
• The margin can then be expressed through the (rescaled) w and b as ρ = 2r = 2 / ||w||.
• The objective is to find w and b such that ρ = 2 / ||w|| is maximized and, for all (xi, yi), i = 1..n: yi(wTxi + b) ≥ 1.
• The objective can be reformulated as: find w and b such that Φ(w) = ||w||^2 = wTw is minimized and, for all (xi, yi), i = 1..n: yi(wTxi + b) ≥ 1.
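The rescaled constraint and margin formula can be verified on a toy example. This is a minimal sketch: the four data points are assumed for illustration, and the separator (w, b) is derived by hand for them rather than by running an optimizer:

```python
import numpy as np

# Toy separable data (assumed for illustration)
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

# Hand-derived maximum-margin separator for this data:
# the support vectors are [1, 1] and [3, 3], giving w = [0.5, 0.5], b = -2
w = np.array([0.5, 0.5])
b = -2.0

# Every example satisfies the rescaled constraint y_i (w^T x_i + b) >= 1,
# with equality exactly at the support vectors
constraints = y * (X @ w + b)

# The margin expressed through the rescaled w: rho = 2 / ||w||
rho = 2.0 / np.linalg.norm(w)
```

Here rho equals 2√2, which matches the geometric distance between the two support vectors along the normal direction.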
Mathematical representation of the linear SVM
• Let the training set be {(xi, yi)}i=1..n, xi ∈ Rd, yi ∈ {-1, 1}, and let the hyperplane with margin ρ be the separator for the training set.
• Then each training sample (xi, yi) satisfies:
wTxi + b ≤ -ρ/2 if yi = -1
wTxi + b ≥ ρ/2 if yi = 1
or, equivalently, yi(wTxi + b) ≥ ρ/2
Linear and non-linear data
• For linearly separable data, SVM performs well even when the data contain noise.
• However, if the data are non-linear, it is hard for SVM to draw a separating line.
• Solution: map the data into a higher-dimensional space.
Non-linear data: example 1
Hyperplane in the higher
dimension
• The original feature space can always be mapped to some higher dimensional feature space
where the training set is separable.
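A minimal sketch of this idea (the 1-D points, labels, and the mapping φ(x) = (x, x²) are assumed for illustration): the two classes interleave on the line, so no single threshold separates them, but after lifting into 2-D a horizontal line does:

```python
import numpy as np

# 1-D points: the classes interleave, so no threshold on x separates them
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1, -1, -1, 1])   # outer points vs inner points

# Map each point into 2-D via phi(x) = (x, x^2)
phi = np.stack([x, x**2], axis=1)

# In the lifted space the second coordinate alone separates the classes,
# e.g. with the horizontal line x2 = 2.5
separable = bool(np.all((phi[:, 1] > 2.5) == (y == 1)))
```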
Non-linear data: example 2
• For this hyperplane, three red dots fall into the blue category (misclassification).
• Here, the classification is not perfect.
• This separator removes the misclassification; however, it is difficult to train a model like this.
• For this reason, a regularization parameter is required.
The Kernel Functions
• The linear classifier relies on the inner product between vectors: K(xi, xj) = xiTxj.
• If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes K(xi, xj) = φ(xi)Tφ(xj).
• A kernel function is a function that is equivalent to an inner product in some feature space.
• Example: for 2-dimensional vectors x = [x1 x2], let K(xi, xj) = (1 + xiTxj)^2. We need to show that K(xi, xj) = φ(xi)Tφ(xj):
K(xi, xj) = (1 + xiTxj)^2
= 1 + xi1^2 xj1^2 + 2 xi1xj1 xi2xj2 + xi2^2 xj2^2 + 2 xi1xj1 + 2 xi2xj2
= [1  xi1^2  √2 xi1xi2  xi2^2  √2 xi1  √2 xi2]T [1  xj1^2  √2 xj1xj2  xj2^2  √2 xj1  √2 xj2]
= φ(xi)Tφ(xj), where φ(x) = [1  x1^2  √2 x1x2  x2^2  √2 x1  √2 x2]
• Thus, a kernel function implicitly maps data to a high-dimensional space, without the need to compute each φ(x) explicitly.
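The kernel identity above can be checked numerically. This is a minimal sketch with arbitrary test vectors: the explicit feature map φ is implemented and compared against evaluating the kernel directly:

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(a, b) = (1 + a^T b)^2 in 2-D."""
    x1, x2 = x
    return np.array([1.0, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

k_direct = (1 + xi @ xj) ** 2   # kernel evaluated directly in the input space
k_mapped = phi(xi) @ phi(xj)    # inner product in the 6-D feature space
```

Both evaluations give the same value, while the direct form never constructs the 6-dimensional vectors — this is the computational point of the kernel trick.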
The various kernel functions
• Linear: K(xi, xj) = xiTxj. Mapping Φ: x → φ(x), where φ(x) is x itself.
• Polynomial of power p: K(xi, xj) = (1 + xiTxj)^p. Mapping Φ: x → φ(x), where φ(x) has (d+p choose p) dimensions.
• Gaussian (radial-basis function): K(xi, xj) = exp(−||xi − xj||^2 / (2σ^2)). Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional:
 every point is mapped to a function (a Gaussian)
 a combination of the functions for the support vectors is the separator.
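The Gaussian kernel is simple to evaluate directly. A minimal sketch (test points and σ chosen for illustration): identical points have similarity 1, and similarity decays with squared distance:

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    """RBF kernel K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    d2 = np.sum((xi - xj) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

a = np.array([0.0, 0.0])
b = np.array([1.0, 1.0])

k_same = gaussian_kernel(a, a)  # identical points -> exactly 1.0
k_far = gaussian_kernel(a, b)   # ||a - b||^2 = 2, so exp(-1)
```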
The main idea of SVM is summarized below:
• Define an optimal hyperplane: maximize the margin.
• Generalize to non-linearly separable problems: use penalty-based regularization to deal with misclassification.
• Map the data into a higher-dimensional space where it is easier to classify with a linear decision surface: use a kernel function to transform the data from one feature space to another.
Tunable parameters of SVM: margin, regularization, gamma, kernel.
Regularization
• For non-linearly separable problems, slack variables ξi can be added to allow misclassification of difficult or noisy examples. The resulting margin is called a soft margin.
• For soft-margin classification, the old formulation of the objective is modified: find w and b such that Φ(w) = wTw + CΣξi is minimized and, for all (xi, yi), i = 1..n: yi(wTxi + b) ≥ 1 − ξi, ξi ≥ 0.
• The parameter C can be viewed as a way to control overfitting: it “trades off” the relative importance of maximizing the margin and fitting the training data.
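The effect of C can be seen with scikit-learn's soft-margin implementation. This is a minimal sketch: the overlapping 1-D dataset and the two C values are assumed for illustration, and the margin width is recovered as 1/||w||:

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping 1-D data (assumed for illustration): the classes interleave,
# so some slack is unavoidable
X = np.array([[0.0], [1.0], [2.0], [3.0], [1.8], [1.2]])
y = np.array([-1, -1, 1, 1, -1, 1])

small_c = SVC(kernel="linear", C=0.01).fit(X, y)
large_c = SVC(kernel="linear", C=100.0).fit(X, y)

# Smaller C tolerates more slack, giving a smaller ||w|| and a wider margin
margin_small = 1.0 / np.linalg.norm(small_c.coef_)
margin_large = 1.0 / np.linalg.norm(large_c.coef_)
```

On this data, margin_small comes out wider than margin_large, matching the small-C/large-margin behavior described on the next slide.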
The effect of the regularization parameter C:
• A small value of C  large margin (possibly with some misclassification)  underfitting
• A large value of C  small margin  overfitting
For small C
For large C
Gamma
• The gamma parameter is associated with the RBF kernel function. It controls the distance of influence of a single training point.
• Low values of gamma indicate a large similarity radius, which results in more points being grouped together.
• For high values of gamma, points need to be very close to each other to be considered in the same group (or class).
Low gamma
High gamma
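The similarity-radius behavior can be sketched directly with the RBF kernel in its gamma form, K(xi, xj) = exp(−γ ||xi − xj||²). The test points and gamma values below are assumed for illustration:

```python
import numpy as np

def rbf(xi, xj, gamma):
    """RBF kernel in gamma form: K = exp(-gamma * ||xi - xj||^2)."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

a = np.array([0.0, 0.0])
b = np.array([2.0, 0.0])   # a moderately distant point

k_low = rbf(a, b, gamma=0.1)    # low gamma: distant points still look similar
k_high = rbf(a, b, gamma=10.0)  # high gamma: similarity dies off quickly
```

With low gamma the two points keep a high similarity (exp(−0.4) ≈ 0.67), while with high gamma the similarity is essentially zero — which is why low gamma tends to group points together and high gamma risks overfitting.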
The effect of Gamma:
Gamma = 0.001 (everything considered as one class), Gamma = 0.01, Gamma = 0.1, Gamma = 1 (chance of overfitting)
Thank you
