Support Vector Machine (SVM)
Dr. Prasenjit Dey
Linear Separators
• What could be the optimal line to separate the blue dots from the red dots?
Support Vector Machine (SVM)
• An SVM is a classifier that finds an optimal separating hyperplane in the feature space using the training data.
Optimal hyperplane
Classification margin
• The margin is the perpendicular distance between the closest data points and the hyperplane.
• The closest points, at which the margin distance is measured, are called support vectors.
• The margin ρ of the hyperplane is the distance between the support vectors on the two sides.
• The distance from an example xi to the hyperplane is (a numeric sketch follows below):
  r = (wTxi + b) / ||w||
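To make the distance formula concrete, here is a minimal NumPy sketch (not from the slides; w, b, and the point are invented purely for illustration):

```python
import numpy as np

# Hypothetical hyperplane parameters w (normal vector) and b (bias),
# chosen only to illustrate the formula.
w = np.array([2.0, 1.0])
b = -1.0

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance r = (w^T x + b) / ||w|| from x to the hyperplane."""
    return (w @ x + b) / np.linalg.norm(w)

x_i = np.array([1.5, 0.5])
print(distance_to_hyperplane(x_i, w, b))  # signed: positive on one side, negative on the other
```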
Maximum classification margin
• The optimal hyperplane with the maximum margin is called the maximum-margin hyperplane.
• An important observation is that for maximizing the margin, only the support vectors matter; the remaining examples can be ignored.
Mathematical representation of the linear SVM
• Let the training set be {(xi, yi)}i=1..n, xi ∈ Rd, yi ∈ {-1, 1}, and let the hyperplane with margin ρ be the separator for the training set.
• Then for each training sample (xi, yi):
  wTxi + b ≤ −ρ/2 if yi = −1
  wTxi + b ≥ ρ/2 if yi = 1
  or, equivalently: yi(wTxi + b) ≥ ρ/2
Mathematical representation of the linear SVM (contd.)
• For every support vector xs, the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each xs and the hyperplane is:
  r = ys(wTxs + b) / ||w|| = 1 / ||w||
• Then the margin can be expressed through the (rescaled) w and b as:
  ρ = 2r = 2 / ||w||
• Objective: find w and b such that ρ = 2 / ||w|| is maximized and for all (xi, yi), i = 1..n: yi(wTxi + b) ≥ 1.
• The objective can be reformulated as: find w and b such that Φ(w) = ||w||2 = wTw is minimized and for all (xi, yi), i = 1..n: yi(wTxi + b) ≥ 1 (a code sketch of this optimum follows below).
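As a hedged illustration of this optimization (assuming scikit-learn is available; the toy data are invented, and a very large C is used to approximate the hard-margin objective above):

```python
import numpy as np
from sklearn.svm import SVC

# Invented, linearly separable toy data.
X = np.array([[1, 1], [2, 2], [2, 0],
              [4, 4], [5, 5], [5, 3]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin formulation.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # learned weight vector w
b = clf.intercept_[0]   # learned bias b
print("margin 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)
# Each support vector satisfies yi(w^T xi + b) ≈ 1:
print(y[clf.support_] * (X[clf.support_] @ w + b))
```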
Linear and non-linear data
• For linearly separable data, SVM performs well even when the data contain noise.
• However, if the data are non-linear, it is hard for SVM to draw a separating line.
• Solution: map the data into a higher-dimensional space.
Non-linear data: example 1
(Figure: hyperplane in the higher dimension.)
• The original feature space can always be mapped to some higher-dimensional feature space where the training set is separable (a sketch of this idea follows below).
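As a hedged illustration of such a mapping (the data and the mapping below are invented for illustration, not taken from the slides): points on concentric circles are not linearly separable in 2-D, but adding the feature x1² + x2² makes them separable by a plane:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented example: inner-circle points (class -1) vs. outer-ring points (class +1).
theta = rng.uniform(0, 2 * np.pi, 100)
r = np.concatenate([rng.uniform(0, 1, 50), rng.uniform(2, 3, 50)])
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.concatenate([-np.ones(50), np.ones(50)])

# Map (x1, x2) -> (x1, x2, x1^2 + x2^2): the third coordinate separates the classes.
phi = np.column_stack([X, (X ** 2).sum(axis=1)])

# Any threshold between 1 and 4 on the new feature is a separating hyperplane.
print(np.all((phi[:, 2] > 1.5) == (y == 1)))  # True
```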
Non-linear data: example 2
• For this hyperplane, three red dots fall into the blue category (misclassification).
• Here, the classification is not perfect.
• This separator removes the misclassification. However, it is difficult to train a model like this.
• For this reason, a regularization parameter is required.
The Kernel Functions
• The linear classifier relies on the inner product between vectors: K(xi, xj) = xiTxj.
• If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes: K(xi, xj) = φ(xi)Tφ(xj).
• A kernel function is a function that is equivalent to an inner product in some feature space.
• Example: 2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiTxj)².
  Need to show that K(xi, xj) = φ(xi)Tφ(xj):
  K(xi, xj) = (1 + xiTxj)² = 1 + xi1²xj1² + 2xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
  = [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]T [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
  = φ(xi)Tφ(xj), where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
• Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each φ(x) explicitly); a numerical check follows below.
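A quick numerical check of this identity, as a minimal NumPy sketch (the test vectors are arbitrary):

```python
import numpy as np

def poly_kernel(xi, xj):
    """K(xi, xj) = (1 + xi^T xj)^2 for 2-D inputs."""
    return (1 + xi @ xj) ** 2

def phi(x):
    """Explicit feature map matching the expansion above."""
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

xi = np.array([0.5, -1.0])
xj = np.array([2.0, 3.0])

# Both routes give the same inner product; the kernel route never forms phi.
print(poly_kernel(xi, xj), phi(xi) @ phi(xj))  # equal up to floating-point error
```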
The various kernel functions
• Linear: K(xi, xj) = xiTxj
  Mapping Φ: x → φ(x), where φ(x) is x itself
• Polynomial of power p: K(xi, xj) = (1 + xiTxj)p
  Mapping Φ: x → φ(x), where φ(x) has (d + p choose p) dimensions
• Gaussian (radial-basis function): K(xi, xj) = exp(−||xi − xj||² / 2σ²)
  Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional:
  – every point is mapped to a function (a Gaussian)
  – a combination of these functions for the support vectors is the separator
(Minimal implementations of these kernels follow below.)
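The three kernels written out as a minimal NumPy sketch (the parameter values p and σ and the test vectors are illustrative assumptions, not from the slides):

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def polynomial_kernel(xi, xj, p=3):
    return (1 + xi @ xj) ** p

def rbf_kernel(xi, xj, sigma=1.0):
    # exp(-||xi - xj||^2 / (2 sigma^2))
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

xi, xj = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(xi, xj), polynomial_kernel(xi, xj), rbf_kernel(xi, xj))
```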
The main idea of SVM is summarized below:
• Define an optimal hyperplane: maximize the margin.
• Generalize to non-linearly separable problems: use penalty-based regularization to deal with misclassification.
• Map the data into a higher-dimensional space where it is easier to classify with a linear decision surface: use a kernel function to transform the data from one feature space to another.
Tunable parameters of SVM: margin, regularization (C), gamma, and the kernel.
Regularization
• For non-linearly separable problems, slack variables ξi can be added to allow misclassification of difficult or noisy examples. The margin is then called a soft margin.
• For soft-margin classification, the old formulation of the objective is modified:
  Find w and b such that
  Φ(w) = wTw + CΣξi is minimized
  and for all (xi, yi), i = 1..n: yi(wTxi + b) ≥ 1 − ξi, ξi ≥ 0
• Parameter C can be viewed as a way to control overfitting: it "trades off" the relative importance of maximizing the margin and fitting the training data.
The effect of the regularization parameter ‘C’:
• A small value of C → large margin (possible misclassification) → underfitting.
• A large value of C → small margin → overfitting.
(Figure: decision boundaries for small C vs. large C; a code sketch follows below.)
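A hedged scikit-learn sketch of this trade-off (the moon-shaped toy data and the particular C values are illustrative choices, not from the slides):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: train accuracy={clf.score(X, y):.2f}, "
          f"support vectors={len(clf.support_vectors_)}")
# Small C -> many support vectors (wide, soft margin);
# large C -> fewer support vectors and a tighter fit to the training data.
```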
Gamma
• The gamma parameter is associated with the RBF kernel function. It controls how far the influence of a single training point reaches.
• A low value of gamma indicates a large similarity radius, which results in more points being grouped together.
• For a high value of gamma, points need to be very close to each other in order to be considered in the same group (or class).
(Figure: decision boundaries for low gamma vs. high gamma; a code sketch follows below.)
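A minimal scikit-learn sketch of the gamma effect (same illustrative data as above; the gamma values mirror the next slide):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for gamma in (0.001, 0.01, 0.1, 1.0):
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    print(f"gamma={gamma}: train accuracy={clf.score(X, y):.2f}")
# Very low gamma underfits (nearly everything grouped as one class);
# high gamma risks overfitting to individual training points.
```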
The effect of gamma:
(Figures: Gamma = 0.001 treats all points as one class; Gamma = 0.01; Gamma = 0.1; Gamma = 1 risks overfitting.)
Thank you