Machine Learning
Unit-II
Mrs. B. Ujwala,
Asst. Professor
Linear regression
• Regression is essentially finding a relationship (or association) between the dependent variable (Y) and the independent variable(s) (X), i.e. finding the function ‘f’ for the association Y = f(X).
• Linear regression is a statistical model that is used to
predict a continuous dependent variable from one or more
independent variables
• It is called "linear" because the model is based on the idea
that the relationship between the dependent and independent
variables is linear.
• In a linear regression model, the independent variables are
referred to as the predictors and the dependent variable is
the response/target.
Linear regression
• The goal is to find the "best" line that fits the data. The
"best" line is the one that minimizes the sum of the
squared differences between the observed responses in the
dataset and the responses predicted by the line.
• For example, if you were using linear regression to model
the relationship between the temperature outside and the
number of ice cream cones sold at an ice cream shop, you
could use the model to predict how many ice cream cones
you would sell on a hot day given the temperature outside.
Linear regression Cont…
• Linear Regression is Supervised Learning
The most common regression algorithms are
1. Simple linear regression
2. Multiple linear regression
3. Polynomial regression
4. Multivariate adaptive regression splines
5. Logistic regression
6. Maximum likelihood estimation (least squares)
Simple linear regression
• Simple linear regression is the simplest regression model
which involves only one predictor. This model assumes a
linear relationship between the dependent variable and the
predictor variable.
• For example, you might use simple linear regression to
model the relationship between the temperature outside and
the number of ice cream cones sold at an ice cream shop.
• The temperature would be the independent variable and
the number of ice cream cones sold would be the
dependent variable.
Simple linear regression
• The value of intercept indicates the value of Y when X = 0. It is known
as ‘the intercept or Y intercept’ because it specifies where the straight
line crosses the vertical or Y-axis.
• Slope of a straight line represents how much the line in a graph changes
in the vertical direction (Y-axis) over a change in the horizontal
direction (X-axis)
Slope = Change in Y/Change in X
To fit a line to this data, we can use the following equation:
y = ax + b
Where:
 y is the dependent variable (the number of ice cream cones sold)
 x is the independent variable (the temperature)
 a is the slope of the line
 b is the y-intercept (the point at which the line crosses the y-axis)
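A minimal Python sketch of fitting this line by least squares, assuming a small made-up set of temperature/sales pairs (the data and variable names are illustrative, not from the slides):

import numpy as np

# Hypothetical data: outside temperature (°C) and ice cream cones sold
x = np.array([20, 22, 25, 28, 30, 33, 35], dtype=float)
y = np.array([40, 47, 60, 72, 80, 95, 104], dtype=float)

# Least-squares estimates of the slope 'a' and intercept 'b' in y = ax + b
a, b = np.polyfit(x, y, deg=1)
print(f"y = {a:.2f}x + {b:.2f}")

# Predict cones sold on a hot day, e.g. at 38 °C
print("Predicted cones at 38 °C:", a * 38 + b)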
Simple linear regression Cont…
• Example: If we take Price of a Property as the dependent
variable and the Area of the Property (in sq. m.) as the
predictor variable, we can build a model using simple
linear regression.
PriceProperty = f(AreaProperty)
• Assuming a linear association, we can reformulate the model as
PriceProperty = a + b · AreaProperty
• where ‘a’ and ‘b’ are the intercept and slope of the straight line, respectively.
Slope of the simple linear regression model
• Slope of a straight line represents how much the line in a
graph changes in the vertical direction (Y-axis) over a change
in the horizontal direction (X-axis) as shown in Figure 8.2.
Slope = Change in Y/Change in X
• Rise is the change in the Y-axis (Y2 − Y1) and Run is the change in the X-axis (X2 − X1). So, the slope is represented as given below:
Slope = Rise / Run = (Y2 − Y1) / (X2 − X1)
Loss functions
• Suppose the model is trained and gives a predicted output; the loss is then the difference between the predicted values and the actual data values.
Type of loss in a linear model
MAE - This is the average of the absolute differences between the predicted and actual values. It is also called the mean absolute error (MAE).
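The standard MAE formula (with y_i the actual value, ŷ_i the predicted value and n the number of data points) is:

\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|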
Loss functions
Type of loss in a linear model
MSE - This is the average of the squared differences between the predicted and actual values. It is also known as the Mean Squared Error (MSE). The formula of the MSE loss is shown below.
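The standard MSE formula, with the same notation as above, is:

\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2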
Loss functions
Type of loss in a linear model
RMSE - This is the square root of the L2 loss, i.e. of the MSE. The formula of RMSE is shown below.
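The standard RMSE formula is simply the square root of the MSE:

\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}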
Loss functions
Type of loss in a linear model
• R-squared: It measures how well the model-predicted line fits the actual data values. The coefficient ranges from 0 to 1; a value close to 1 indicates a well-fitted line. The formula is shown below.
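The standard R-squared formula, with ȳ the mean of the actual values, is:

R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}

A short Python sketch computing all four losses for the same predictions (the function name is illustrative):

import numpy as np

def regression_losses(y_true, y_pred):
    # MAE, MSE, RMSE and R-squared for one set of predictions
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.mean(np.abs(y_true - y_pred))
    mse = np.mean((y_true - y_pred) ** 2)
    rmse = np.sqrt(mse)
    r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, mse, rmse, r2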
Slope Equation
• Least squares regression is a method which fits the line by minimizing the sum of all the squared errors. Here are the steps used to calculate the least squares regression.
• First, the formula for calculating m = slope is
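The standard least-squares slope formula (with x̄ and ȳ the means of X and Y) is:

m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \text{intercept} = \bar{y} - m\bar{x}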
• The lower the error, the smaller the overall deviation from the original points.
Ordinary Least Squares (OLS) algorithm
Step 1: Calculate the mean of X and Y
Step 2: Calculate the errors of X and Y
Step 3: Get the product
Step 4: Get the summation of the products
Step 5: Square the difference of X
Step 6: Get the sum of the squared difference
Step 7: Divide output of step 4 by output of step 6 to
calculate ‘b’
Step 8: Calculate ‘a’ using the value of ‘b’
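A short Python sketch of these eight steps, with ‘b’ the slope and ‘a’ the intercept as in Step 7 and Step 8 (the data values below are placeholders):

import numpy as np

def ols_fit(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_mean, y_mean = x.mean(), y.mean()      # Step 1: means of X and Y
    x_err, y_err = x - x_mean, y - y_mean    # Step 2: errors (deviations) of X and Y
    products = x_err * y_err                 # Step 3: product of the errors
    sum_products = products.sum()            # Step 4: summation of the products
    x_err_sq = x_err ** 2                    # Step 5: squared differences of X
    sum_x_err_sq = x_err_sq.sum()            # Step 6: sum of the squared differences
    b = sum_products / sum_x_err_sq          # Step 7: slope 'b'
    a = y_mean - b * x_mean                  # Step 8: intercept 'a' from 'b'
    return a, b

# Placeholder internal/external marks, not the data from the exercise below
a, b = ols_fit([45, 60, 72, 80], [48, 65, 70, 85])
print(f"y = {a:.2f} + {b:.2f}x")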
Exercise Problem
• A college professor believes that if the grade for
internal examination is high in a class, the grade for
external examination will also be high. A random
sample of 15 students in that class was selected, and
the data is given below:
• Solution
Maximum and minimum point of curves
• Maximum and minimum points on a graph are
found at points where the slope of the curve is zero.
• The maximum point is the point on the curve of the
graph with the highest y-coordinate and a slope of
zero.
• The minimum point is the point on the curve of the
graph with the lowest y-coordinate and a slope of
zero.
Maximum point
Minimum point
Multiple Linear Regression
• In a multiple regression model, two or more independent
variables, i.e. predictors are involved.
• Example: A model which can predict the correct value of a real
estate if it has certain standard inputs such as area (sq. m.) of the
property, location, floor, number of years since purchase,
amenities available etc as independent variables.
• We can form a multiple regression equation as shown below:
PriceProperty = f (AreaProperty , location, floor, Ageing, Amenities)
• The following expression describes the equation involving the
relationship with two predictor variables, namely X1 and X2 .
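In standard form (using the same symbols a, b1 and b2 as the next slide), that expression is:

\hat{Y} = a + b_1 X_1 + b_2 X_2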
Multiple Linear Regression
• The model describes a plane in the three-dimensional space of
Ŷ, X1 , and X2 . Parameter ‘a’ is the intercept of this plane.
Parameters ‘b1’ and ‘b2’ are referred to as partial regression
coefficients.
• Parameter b1 represents the change in the mean response
corresponding to a unit change in X1 when X2 is held constant.
• Parameter b2 represents the change in the mean response
corresponding to a unit change in X2 when X1 is held constant.
Multiple Linear Regression
• Consider the following example of a multiple linear
regression model with two predictor variables, namely
X1 and X2
Multiple regression plane
Multiple Linear Regression
• The multiple regression estimating equation when there are ‘n’ predictor variables is as follows:
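In standard form, with predictors X1 … Xn and partial regression coefficients b1 … bn, the equation is:

\hat{Y} = a + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n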
• While finding the best-fit line, we can also fit a polynomial or a curvilinear function instead of a straight line. These are known as polynomial regression and curvilinear regression, respectively.
Use the following steps to fit a multiple
linear regression model
Step 1: Calculate X1^2, X2^2, X1y, X2y and X1X2
Step 2: Calculate Regression Sums.
Step 3: Calculate b0, b1, and b2.
Step 4: Place b0, b1, and b2 in the estimated linear
regression equation.
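A compact Python sketch of Steps 1-4 for two predictors, using the usual deviation-based regression sums (the data handling and function name are illustrative):

import numpy as np

def fit_two_predictor(x1, x2, y):
    x1, x2, y = (np.asarray(v, float) for v in (x1, x2, y))
    n = len(y)
    # Steps 1-2: column products and regression sums
    s_x1x1 = np.sum(x1 ** 2) - np.sum(x1) ** 2 / n
    s_x2x2 = np.sum(x2 ** 2) - np.sum(x2) ** 2 / n
    s_x1y = np.sum(x1 * y) - np.sum(x1) * np.sum(y) / n
    s_x2y = np.sum(x2 * y) - np.sum(x2) * np.sum(y) / n
    s_x1x2 = np.sum(x1 * x2) - np.sum(x1) * np.sum(x2) / n
    # Step 3: partial regression coefficients b1, b2 and then the intercept b0
    denom = s_x1x1 * s_x2x2 - s_x1x2 ** 2
    b1 = (s_x2x2 * s_x1y - s_x1x2 * s_x2y) / denom
    b2 = (s_x1x1 * s_x2y - s_x1x2 * s_x1y) / denom
    b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()
    # Step 4: estimated equation is y_hat = b0 + b1*x1 + b2*x2
    return b0, b1, b2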
Assumptions in Regression Analysis
1. Linear relationship between the features and target
2. Little or no multicollinearity between the features
3. Normal Distribution of error terms
4. Little or no autocorrelation among residuals
5. Homoscedasticity of the errors i.e., the variance of
the residuals must be constant across the predicted
values
Improving Accuracy of the Linear Regression Model
• Accuracy refers to how close the estimate is to the actual value.
• Prediction refers to the continuous estimation of the value.
Bias and variance are analogous to accuracy and prediction:
• High bias = low accuracy (not close to real value)
• High variance = low prediction (values are scattered)
• Low bias = high accuracy (close to real value)
• Low variance = high prediction (values are close to each
other)
Improving Accuracy of the Linear Regression Model
• For a regression model that is highly accurate and highly predictive, the overall error of the model will be low, implying a low bias (high accuracy) and low variance (high prediction) - this is highly preferable.
• Similarly, if the variance increases (low prediction),
the spread of our data points increases, which results
in less accurate prediction. As the bias increases (low
accuracy), the error between our predicted value and
the observed values increases.
• Balancing out bias and variance is essential in a regression model.
Improving Accuracy of the Linear Regression Model
• In the linear regression model, it is assumed that the
number of observations (n) is greater than the number
of parameters (k) to be estimated, i.e. n > k, and in that
case, the least squares estimates tend to have low
variance and hence will perform well on test
observations.
• However, if the number of observations (n) is not much larger than the number of parameters (k), then there can be high variability in the least squares fit, resulting in overfitting and leading to poor predictions.
• If k > n, then linear regression is not usable.
Improving Accuracy of the Linear Regression Model
• Accuracy of linear regression can be improved using
the following three methods:
1. Shrinkage Approach
2. Subset Selection
3. Dimensionality (Variable) Reduction
Shrinkage (Regularization) approach
• This approach involves fitting a model involving all
predictors. However, the estimated coefficients are
shrunken towards zero relative to the least squares
estimates.
• This shrinkage (also known as regularization) has the
effect of reducing the overall variance. Some of the
coefficients may also be estimated to be exactly zero,
thereby indirectly performing variable selection.
• The two best-known techniques for shrinking the
regression coefficients towards zero are
1. ridge regression
2. lasso (Least Absolute Shrinkage Selector Operator)
Shrinkage (Regularization) approach
1. Ridge Regression: It modifies the over-fitted or under-fitted models by adding a penalty equivalent to the sum of the squares of the magnitude of the coefficients.
Ridge Regression performs regularization by shrinking
the coefficients present.
Ridge Regression
• Ridge regression decreases the complexity of a model but does not reduce the number of variables, since it never shrinks a coefficient to exactly zero; it only minimizes it.
• As the regularization parameter increases, the coefficient values tend towards zero. This reduces the variance (coefficients with a negligible effect contribute little to the prediction), at the cost of some increase in bias (shrinking the coefficients reduces the dependency of the prediction on any particular variable).
• Ridge regression is not good for feature reduction
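A minimal scikit-learn sketch of ridge regression; alpha is the regularization parameter, and the small data arrays here are placeholders:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2, 3], [2, 1, 0], [3, 4, 5], [4, 3, 2], [5, 6, 7], [6, 5, 4]], float)
y = np.array([10, 8, 16, 14, 22, 20], float)

# Larger alpha => stronger shrinkage of the coefficients towards (but never exactly) zero
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(model.named_steps["ridge"].coef_)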
Shrinkage (Regularization) approach
2. lasso (Least Absolute Shrinkage Selector Operator): It
modifies the over-fitted or under-fitted models by
adding the penalty equivalent to the sum of the
absolute values of coefficients.
Lasso Regression
• In Lasso regression, if the regularization parameter is sufficiently high, the coefficients of less important features are shrunk to exactly 0; Lasso can therefore be used to select the important features of a dataset.
• If the number of features (p) is greater than the number of observations (n), Lasso will pick at most n features as non-zero, even if all features are relevant.
• The difference between ridge and lasso regression is that lasso tends to shrink coefficients to absolute zero, whereas ridge never sets the value of a coefficient to absolute zero.
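The same sketch with Lasso; with a large enough alpha some coefficients become exactly zero (the alpha value here is only illustrative):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2, 3], [2, 1, 0], [3, 4, 5], [4, 3, 2], [5, 6, 7], [6, 5, 4]], float)
y = np.array([10, 8, 16, 14, 22, 20], float)

model = make_pipeline(StandardScaler(), Lasso(alpha=0.5))
model.fit(X, y)
print(model.named_steps["lasso"].coef_)  # coefficients of less important features may be exactly 0.0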
Subset selection
• Identify a subset of the predictors that is assumed to
be related to the response and then fit a model
using OLS on the selected reduced subset of
variables.
• There are two methods in which subset of the
regression can be selected:
1. Best subset selection (considers all 2^k possible subsets)
2. Stepwise subset selection
i. Forward stepwise selection (0 to k) - see the sketch after this list
ii. Backward stepwise selection (k to 0)
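A hedged sketch of forward stepwise selection using scikit-learn's SequentialFeatureSelector (one of several ways to implement it; the synthetic data and the choice of 3 features are illustrative):

from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data with 8 candidate predictors, only a few of them informative
X, y = make_regression(n_samples=100, n_features=8, n_informative=3, random_state=0)

# Forward stepwise: start from 0 predictors and add the best one at each step
selector = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                     direction="forward")
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected predictors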
Dimensionality reduction (Variable reduction)
• In dimensionality reduction, predictors (X) are
transformed, and the model is set up using the
transformed variables after dimensionality reduction.
• The number of variables is reduced using the
dimensionality reduction method.
• Principal component analysis is one of the most
important dimensionality (variable) reduction
techniques.
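A brief sketch of dimensionality reduction before regression, using PCA from scikit-learn (the number of components and the synthetic data are illustrative):

from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, random_state=0)

# Transform the predictors into 3 principal components, then fit linear regression on them
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pcr.fit(X, y)
print(pcr.score(X, y))  # R-squared on the training data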
Thank You
