Support Vector Machines (SVM) and Support Vector Regression (SVR)
Dr. Marwa M. Emam
Faculty of Computers and Information
Minia University
Agenda
• Introduction
• Key Concepts
• Linear SVM
• Non-linear SVM
• Kernel Trick
• SVR
Introduction
• SVM is a supervised learning algorithm used for classification and regression tasks.
• It is particularly effective in high-dimensional spaces and is well-suited for both linear and non-linear data.
• The SVM aims to find the linear boundary that is located as far as possible from the points in the dataset.
Introduction …
• We learned about linear classifiers. With two-dimensional data, these are defined by a line that separates a dataset consisting of points with two labels.
• However, we may have noticed that many different lines can separate a dataset, and this raises the following question: how do we know which is the best line?
Introduction …
• In this figure, we can see three different linear classifiers that separate this dataset. Which one do you prefer, classifier 1, 2, or 3?
Introduction …
• If you said classifier 2, we agree. All three lines separate the dataset well, but the second line is better placed.
• The first and third lines are very close to some of the points, whereas the second line is far from all the points. If we were to wiggle the three lines around a little bit, the first and the third might go over some of the points, misclassifying some of them in the process, whereas the second one would still classify them all correctly.
• Thus, classifier 2 is more robust than classifiers 1 and 3.
Introduction …
• The main goal of the SVM is to design a hyperplane that classifies all training vectors into two classes.
• An SVM classifier uses two parallel lines instead of one line. It tries to classify the data correctly and also tries to space the lines as far apart as possible.
SVM
Introduction …
• Support Vectors: Support vectors are the data points that are closest to the hyperplane. These are crucial in defining the optimal hyperplane, as they contribute to maximizing the margin.
• Margin: The margin is the distance between the hyperplane and the nearest data point from either class. SVM aims to maximize this margin, leading to better generalization on unseen data.
The Objective Function of SVM
• Support Vector Machines (SVMs) aim to find a hyperplane that separates data points of different classes while maximizing the margin between them.
• The objective is typically formulated as a margin-maximization problem, and the error function is associated with minimizing the classification error or maximizing the margin.
The Objective Function
• The objective function can be formulated using the concepts of margin and regularization. For linearly separable data, a common formulation is:
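A common way to write this, assuming the standard hard-margin setting with labels y_i in {+1, -1}, is:

maximize    2 / ||w||
subject to  y_i (w · x_i + b) ≥ 1   for every training point (x_i, y_i)

That is, the hyperplane w · x + b = 0 must classify every point correctly while keeping all points at least a distance of 1/||w|| away from it on each side.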
The Objective Function
• The SVM uses two parallel lines. Parallel lines have similar equations: they have the same weights but a different bias. Thus, in our SVM, we use the central line as a frame of reference L with equation w1x1 + w2x2 + b = 0, and construct two lines, one above it and one below it, with the respective equations:
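Assuming the conventional scaling in which the two boundary lines pass through the closest points of each class, the two equations are:

w1x1 + w2x2 + b = 1    (the line above L)
w1x1 + w2x2 + b = -1   (the line below L)

The distance between these two parallel lines is 2 / ||w||, with ||w|| = sqrt(w1² + w2²), so spacing the lines as far apart as possible amounts to making ||w|| as small as possible.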
The Objective Function: From Maximization to Minimization
• The objective function can be formulated using the concepts of margin and regularization. Maximizing the margin 2/||w|| is equivalent to minimizing ||w||, so for linearly separable data a common formulation is:
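Under the same hard-margin assumptions as before, this gives the standard quadratic form:

minimize    (1/2) ||w||²
subject to  y_i (w · x_i + b) ≥ 1   for all i

The squared norm (with the factor 1/2) is preferred because it is differentiable everywhere and turns the problem into a quadratic program.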
SVM as a Minimization Problem
• Optimization problem: The goal is to find the optimal values of w and b that minimize the objective function while satisfying the constraints. This is a quadratic optimization problem subject to linear constraints.
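In practice this quadratic program is solved by an off-the-shelf optimizer. A minimal sketch, assuming scikit-learn is available (the dataset below is an illustrative toy example, not from the slides):

# Fit a linear SVM and inspect the learned hyperplane and support vectors.
import numpy as np
from sklearn.svm import SVC

# Toy 2-D dataset with labels in {+1, -1}
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, +1, +1, +1])

# A large C approximates the hard-margin formulation above
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("w =", clf.coef_[0])           # weights w1, w2 of the hyperplane
print("b =", clf.intercept_[0])      # bias b
print("support vectors:", clf.support_vectors_)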
We wish to find the w and b which minimize, and the α which maximizes, L_P (whilst keeping α_i ≥ 0 ∀ i). We can do this by differentiating L_P with respect to w and b and setting the derivatives to zero:
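Here L_P denotes the primal Lagrangian from the preceding slides; assuming its standard form, the conditions obtained by setting the derivatives to zero are:

L_P = (1/2)||w||² - Σ_i α_i [ y_i (w · x_i + b) - 1 ]

∂L_P/∂w = 0  ⇒  w = Σ_i α_i y_i x_i
∂L_P/∂b = 0  ⇒  Σ_i α_i y_i = 0

Substituting these back into L_P yields the dual problem, which depends on the training points only through dot products x_i · x_j; this is the property the kernel trick exploits below.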
Non-Linearity
• In some cases, the relationship between features and the target variable may not be linear. The kernel trick enables SVMs to capture non-linear patterns by projecting the data into a higher-dimensional space.
Non-linear SVMs: Feature Spaces
• General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:
  Φ: x → φ(x)
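As a concrete toy illustration of such a mapping (a hypothetical example, not from the slides), one-dimensional data that is not linearly separable becomes separable after mapping each point x to φ(x) = (x, x²):

# Make 1-D data linearly separable with the explicit feature map phi(x) = (x, x^2).
import numpy as np
from sklearn.svm import SVC

X = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]).reshape(-1, 1)
y = np.array([+1, +1, -1, -1, -1, +1, +1])   # outer points vs. inner points

phi_X = np.hstack([X, X**2])                 # map x -> (x, x^2)

clf = SVC(kernel="linear")
clf.fit(phi_X, y)                            # a straight line now separates the classes
print(clf.predict(phi_X))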
Kernel Trick
• The kernel trick is a technique used in Support Vector Machines (SVMs) to handle non-linear decision boundaries by implicitly mapping the input features into a higher-dimensional space without explicitly calculating the transformation. This allows SVMs to efficiently classify data that is not linearly separable in the original feature space.
• The kernel trick provides flexibility in choosing different kernel functions to capture various types of non-linear relationships. Common kernels include polynomial, radial basis function (RBF), and sigmoid kernels.
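A minimal sketch of the same idea, assuming scikit-learn's SVC with the RBF kernel (the circular toy dataset is an assumption for illustration):

# Non-linear classification with the RBF kernel, no explicit feature mapping.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, -1, +1)   # inside vs. outside a circle

clf = SVC(kernel="rbf", gamma=1.0, C=1.0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))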
Kernel Function
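For reference, the standard forms of the kernels named on the previous slide are (γ, r, and d are kernel hyperparameters):

Linear:       K(x, z) = x · z
Polynomial:   K(x, z) = (γ x · z + r)^d
RBF:          K(x, z) = exp(-γ ||x - z||²)
Sigmoid:      K(x, z) = tanh(γ x · z + r)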
Thanks