Machine Learning
Unit-II
Mrs. B. Ujwala,
Asst. Professor
Linear regression
• Regression is essentially finding a relationship (or association) between the dependent variable (Y) and the independent variable(s) (X), i.e. finding the function ‘f’ for the association Y = f(X).
• Linear regression is a statistical model that is used to
predict a continuous dependent variable from one or more
independent variables
• It is called "linear" because the model is based on the idea
that the relationship between the dependent and independent
variables is linear.
• In a linear regression model, the independent variables are
referred to as the predictors and the dependent variable is
the response/target.
Linear regression
• The goal is to find the "best" line that fits the data. The
"best" line is the one that minimizes the sum of the
squared differences between the observed responses in the
dataset and the responses predicted by the line.
• For example, if you were using linear regression to model
the relationship between the temperature outside and the
number of ice cream cones sold at an ice cream shop, you
could use the model to predict how many ice cream cones
you would sell on a hot day given the temperature outside.
Linear regression Cont…
• Linear Regression is Supervised Learning
The most common regression algorithms are
1. Simple linear regression
2. Multiple linear regression
3. Polynomial regression
4. Multivariate adaptive regression splines
5. Logistic regression
6. Maximum likelihood estimation (least squares)
Simple linear regression
• Simple linear regression is the simplest regression model
which involves only one predictor. This model assumes a
linear relationship between the dependent variable and the
predictor variable.
• For example, you might use simple linear regression to
model the relationship between the temperature outside and
the number of ice cream cones sold at an ice cream shop.
• The temperature would be the independent variable and
the number of ice cream cones sold would be the
dependent variable.
Simple linear regression
• The value of intercept indicates the value of Y when X = 0. It is known
as ‘the intercept or Y intercept’ because it specifies where the straight
line crosses the vertical or Y-axis.
• Slope of a straight line represents how much the line in a graph changes
in the vertical direction (Y-axis) over a change in the horizontal
direction (X-axis)
Slope = Change in Y/Change in X
To fit a line to this data, we can use the following equation:
y = ax + b
Where:
 y is the dependent variable (the number of ice cream cones sold)
 x is the independent variable (the temperature)
 a is the slope of the line
 b is the y-intercept (the point at which the line crosses the y-axis)
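A minimal Python sketch of fitting this line by least squares, assuming a small made-up set of temperature/sales pairs (the data and variable names are illustrative, not from the slides):

import numpy as np

# Hypothetical data: outside temperature (°C) and ice cream cones sold
x = np.array([20, 22, 25, 28, 30, 33, 35], dtype=float)
y = np.array([40, 47, 60, 72, 80, 95, 104], dtype=float)

# Least-squares estimates of the slope 'a' and intercept 'b' in y = ax + b
a, b = np.polyfit(x, y, deg=1)
print(f"y = {a:.2f}x + {b:.2f}")

# Predict cones sold on a hot day, e.g. at 38 °C
print("Predicted cones at 38 °C:", a * 38 + b)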
Simple linear regression Cont…
• Example: If we take Price of a Property as the dependent
variable and the Area of the Property (in sq. m.) as the
predictor variable, we can build a model using simple
linear regression.
PriceProperty = f(AreaProperty)
• Assuming a linear association, we can reformulate the model as
PriceProperty = a + b · AreaProperty
• where ‘a’ and ‘b’ are the intercept and slope of the straight line, respectively.
Slope of the simple linear regression model
• Slope of a straight line represents how much the line in a
graph changes in the vertical direction (Y-axis) over a change
in the horizontal direction (X-axis) as shown in Figure 8.2.
Slope = Change in Y/Change in X
• Rise is the change in the Y-axis (Y2 − Y1) and Run is the change in the X-axis (X2 − X1). So, the slope is represented as given below:
Slope = Rise / Run = (Y2 − Y1) / (X2 − X1)
Loss functions
• Suppose the model is trained and gives a predicted output; the loss is then the difference between the predicted values and the actual data values.
Type of loss in a linear model
MAE - This is the average of the absolute differences between the predicted and actual values. It is also called the mean absolute error (MAE).
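The standard MAE formula (with y_i the actual value, ŷ_i the predicted value and n the number of data points) is:

\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|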
Loss functions
Type of loss in a linear model
MSE - This is the average of the squared differences between the predicted and actual values. It is also known as the Mean Squared Error (MSE). The formula of the MSE loss is shown below.
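The standard MSE formula, with the same notation as above, is:

\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2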
Loss functions
Type of loss in a linear model
RMSE - This is the square root of the L2 loss, i.e. of the MSE. The formula of RMSE is shown below.
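The standard RMSE formula is simply the square root of the MSE:

\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}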
Loss functions
Type of loss in a linear model
• R-squared: It measures how well the model-predicted line fits the actual data values. The coefficient ranges from 0 to 1; a value close to 1 indicates a well-fitted line. The formula is shown below.
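The standard R-squared formula, with ȳ the mean of the actual values, is:

R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}

A short Python sketch computing all four losses for the same predictions (the function name is illustrative):

import numpy as np

def regression_losses(y_true, y_pred):
    # MAE, MSE, RMSE and R-squared for one set of predictions
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.mean(np.abs(y_true - y_pred))
    mse = np.mean((y_true - y_pred) ** 2)
    rmse = np.sqrt(mse)
    r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, mse, rmse, r2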
Slope Equation
• Least squares regression is a method which fits the line by minimizing the sum of all the squared errors. Here are the steps used to calculate the least squares regression.
• First, the formula for calculating m = slope is
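The standard least-squares slope formula (with x̄ and ȳ the means of X and Y) is:

m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \text{intercept} = \bar{y} - m\bar{x}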
• The lower the error, the smaller the overall deviation from the original points.
Ordinary Least Squares (OLS) algorithm
Step 1: Calculate the mean of X and Y
Step 2: Calculate the errors of X and Y
Step 3: Get the product
Step 4: Get the summation of the products
Step 5: Square the difference of X
Step 6: Get the sum of the squared difference
Step 7: Divide output of step 4 by output of step 6 to
calculate ‘b’
Step 8: Calculate ‘a’ using the value of ‘b’
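A short Python sketch of these eight steps, with ‘b’ the slope and ‘a’ the intercept as in Step 7 and Step 8 (the data values below are placeholders):

import numpy as np

def ols_fit(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_mean, y_mean = x.mean(), y.mean()      # Step 1: means of X and Y
    x_err, y_err = x - x_mean, y - y_mean    # Step 2: errors (deviations) of X and Y
    products = x_err * y_err                 # Step 3: product of the errors
    sum_products = products.sum()            # Step 4: summation of the products
    x_err_sq = x_err ** 2                    # Step 5: squared differences of X
    sum_x_err_sq = x_err_sq.sum()            # Step 6: sum of the squared differences
    b = sum_products / sum_x_err_sq          # Step 7: slope 'b'
    a = y_mean - b * x_mean                  # Step 8: intercept 'a' from 'b'
    return a, b

# Placeholder internal/external marks, not the data from the exercise below
a, b = ols_fit([45, 60, 72, 80], [48, 65, 70, 85])
print(f"y = {a:.2f} + {b:.2f}x")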
Exercise Problem
• A college professor believes that if the grade for
internal examination is high in a class, the grade for
external examination will also be high. A random
sample of 15 students in that class was selected, and
the data is given below:
• Solution
Maximum and minimum point of curves
• Maximum and minimum points on a graph are
found at points where the slope of the curve is zero.
• The maximum point is the point on the curve of the
graph with the highest y-coordinate and a slope of
zero.
• The minimum point is the point on the curve of the
graph with the lowest y-coordinate and a slope of
zero.
Maximum point
Minimum point
Multiple Linear Regression
• In a multiple regression model, two or more independent
variables, i.e. predictors are involved.
• Example: A model which can predict the correct value of a real
estate if it has certain standard inputs such as area (sq. m.) of the
property, location, floor, number of years since purchase,
amenities available etc as independent variables.
• We can form a multiple regression equation as shown below:
PriceProperty = f (AreaProperty , location, floor, Ageing, Amenities)
• The following expression describes the equation involving the
relationship with two predictor variables, namely X1 and X2 .
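In standard form (using the same symbols a, b1 and b2 as the next slide), that expression is:

\hat{Y} = a + b_1 X_1 + b_2 X_2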
Multiple Linear Regression
• The model describes a plane in the three-dimensional space of
Ŷ, X1 , and X2 . Parameter ‘a’ is the intercept of this plane.
Parameters ‘b1’ and ‘b2’ are referred to as partial regression
coefficients.
• Parameter b1 represents the change in the mean response
corresponding to a unit change in X1 when X2 is held constant.
• Parameter b2 represents the change in the mean response
corresponding to a unit change in X2 when X1 is held constant.
Multiple Linear Regression
• Consider the following example of a multiple linear
regression model with two predictor variables, namely
X1 and X2
Multiple regression plane
Multiple Linear Regression
• The multiple regression estimating equation when there are ‘n’ predictor variables is as follows:
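In standard form, with predictors X1 … Xn and partial regression coefficients b1 … bn, the equation is:

\hat{Y} = a + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n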
• While finding the best-fit line, we can also fit a polynomial or a curvilinear function instead of a straight line. These are known as polynomial regression and curvilinear regression, respectively.
Use the following steps to fit a multiple
linear regression model
Step 1: Calculate X1^2, X2^2, X1y, X2y and X1X2
Step 2: Calculate Regression Sums.
Step 3: Calculate b0, b1, and b2.
Step 4: Place b0, b1, and b2 in the estimated linear
regression equation.
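A compact Python sketch of Steps 1-4 for two predictors, using the usual deviation-based regression sums (the data handling and function name are illustrative):

import numpy as np

def fit_two_predictor(x1, x2, y):
    x1, x2, y = (np.asarray(v, float) for v in (x1, x2, y))
    n = len(y)
    # Steps 1-2: column products and regression sums
    s_x1x1 = np.sum(x1 ** 2) - np.sum(x1) ** 2 / n
    s_x2x2 = np.sum(x2 ** 2) - np.sum(x2) ** 2 / n
    s_x1y = np.sum(x1 * y) - np.sum(x1) * np.sum(y) / n
    s_x2y = np.sum(x2 * y) - np.sum(x2) * np.sum(y) / n
    s_x1x2 = np.sum(x1 * x2) - np.sum(x1) * np.sum(x2) / n
    # Step 3: partial regression coefficients b1, b2 and then the intercept b0
    denom = s_x1x1 * s_x2x2 - s_x1x2 ** 2
    b1 = (s_x2x2 * s_x1y - s_x1x2 * s_x2y) / denom
    b2 = (s_x1x1 * s_x2y - s_x1x2 * s_x1y) / denom
    b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()
    # Step 4: estimated equation is y_hat = b0 + b1*x1 + b2*x2
    return b0, b1, b2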
Assumptions in Regression Analysis
1. Linear relationship between the features and target
2. Little or no multicollinearity between the features
3. Normal Distribution of error terms
4. Little or no autocorrelation among residuals
5. Homoscedasticity of the errors i.e., the variance of
the residuals must be constant across the predicted
values
Improving Accuracy of the Linear Regression Model
• Accuracy refers to how close the estimate is to the actual value.
• Prediction refers to the continuous estimation of the value.
Bias and variance are analogous to accuracy and prediction:
• High bias = low accuracy (not close to real value)
• High variance = low prediction (values are scattered)
• Low bias = high accuracy (close to real value)
• Low variance = high prediction (values are close to each
other)
Improving Accuracy of the Linear Regression Model
• For a regression model that is highly accurate and highly predictive, the overall error of the model will be low, implying a low bias (high accuracy) and low variance (high prediction) - this is highly preferable.
• Similarly, if the variance increases (low prediction),
the spread of our data points increases, which results
in less accurate prediction. As the bias increases (low
accuracy), the error between our predicted value and
the observed values increases.
• Balancing out bias and variance is essential in a regression model.
Improving Accuracy of the Linear Regression Model
• In the linear regression model, it is assumed that the
number of observations (n) is greater than the number
of parameters (k) to be estimated, i.e. n > k, and in that
case, the least squares estimates tend to have low
variance and hence will perform well on test
observations.
• However, if the number of observations (n) is not much larger than the number of parameters (k), then there can be high variability in the least squares fit, resulting in overfitting and leading to poor predictions.
• If k > n, then linear regression is not usable.
Improving Accuracy of the Linear Regression Model
• Accuracy of linear regression can be improved using
the following three methods:
1. Shrinkage Approach
2. Subset Selection
3. Dimensionality (Variable) Reduction
Shrinkage (Regularization) approach
• This approach involves fitting a model involving all
predictors. However, the estimated coefficients are
shrunken towards zero relative to the least squares
estimates.
• This shrinkage (also known as regularization) has the
effect of reducing the overall variance. Some of the
coefficients may also be estimated to be exactly zero,
thereby indirectly performing variable selection.
• The two best-known techniques for shrinking the
regression coefficients towards zero are
1. ridge regression
2. lasso (Least Absolute Shrinkage Selector Operator)
Shrinkage (Regularization) approach
1. Ridge Regression: It modifies the over-fitted or under-fitted models by adding a penalty equivalent to the sum of the squares of the magnitude of the coefficients.
Ridge Regression performs regularization by shrinking
the coefficients present.
Ridge Regression
• Ridge regression decreases the complexity of a model but does not reduce the number of variables, since it never shrinks a coefficient to exactly zero; it only minimizes it.
• As the regularization parameter increases, the coefficient values tend towards zero. This reduces the variance (coefficients with a negligible effect contribute little to the prediction), at the cost of some increase in bias (shrinking the coefficients reduces the dependency of the prediction on any particular variable).
• Ridge regression is not good for feature reduction
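A minimal scikit-learn sketch of ridge regression; alpha is the regularization parameter, and the small data arrays here are placeholders:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2, 3], [2, 1, 0], [3, 4, 5], [4, 3, 2], [5, 6, 7], [6, 5, 4]], float)
y = np.array([10, 8, 16, 14, 22, 20], float)

# Larger alpha => stronger shrinkage of the coefficients towards (but never exactly) zero
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(model.named_steps["ridge"].coef_)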
Shrinkage (Regularization) approach
2. lasso (Least Absolute Shrinkage Selector Operator): It
modifies the over-fitted or under-fitted models by
adding the penalty equivalent to the sum of the
absolute values of coefficients.
Lasso Regression
• In Lasso regression, if the regularization parameter is sufficiently high, the coefficients of less important features are shrunk to exactly 0; Lasso can therefore be used to select the important features of a dataset.
• If the number of features (p) is greater than the number of observations (n), Lasso will pick at most n features as non-zero, even if all features are relevant.
• The difference between ridge and lasso regression is that lasso tends to shrink coefficients to absolute zero, whereas ridge never sets the value of a coefficient to absolute zero.
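The same sketch with Lasso; with a large enough alpha some coefficients become exactly zero (the alpha value here is only illustrative):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2, 3], [2, 1, 0], [3, 4, 5], [4, 3, 2], [5, 6, 7], [6, 5, 4]], float)
y = np.array([10, 8, 16, 14, 22, 20], float)

model = make_pipeline(StandardScaler(), Lasso(alpha=0.5))
model.fit(X, y)
print(model.named_steps["lasso"].coef_)  # coefficients of less important features may be exactly 0.0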
Subset selection
• Identify a subset of the predictors that is assumed to
be related to the response and then fit a model
using OLS on the selected reduced subset of
variables.
• There are two methods in which subset of the
regression can be selected:
1. Best subset selection (considers all 2^k possible subsets)
2. Stepwise subset selection
i. Forward stepwise selection (0 to k) - see the sketch after this list
ii. Backward stepwise selection (k to 0)
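A hedged sketch of forward stepwise selection using scikit-learn's SequentialFeatureSelector (one of several ways to implement it; the synthetic data and the choice of 3 features are illustrative):

from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data with 8 candidate predictors, only a few of them informative
X, y = make_regression(n_samples=100, n_features=8, n_informative=3, random_state=0)

# Forward stepwise: start from 0 predictors and add the best one at each step
selector = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                     direction="forward")
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected predictors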
Dimensionality reduction (Variable reduction)
• In dimensionality reduction, predictors (X) are
transformed, and the model is set up using the
transformed variables after dimensionality reduction.
• The number of variables is reduced using the
dimensionality reduction method.
• Principal component analysis is one of the most
important dimensionality (variable) reduction
techniques.
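A brief sketch of dimensionality reduction before regression, using PCA from scikit-learn (the number of components and the synthetic data are illustrative):

from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, random_state=0)

# Transform the predictors into 3 principal components, then fit linear regression on them
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pcr.fit(X, y)
print(pcr.score(X, y))  # R-squared on the training data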
Thank You
