Regression
Dr. Marwa M. Emam
Faculty of Computers and Information
Agenda
 Linear regression with one variable: model representation.
 What is linear regression?
 Fitting a line through a set of data points.
 Building a linear regression model to predict housing prices in a real dataset.
 Discussing examples of linear regression in the real world, such as medical applications and recommender systems.
 Cost function.
 Gradient descent.
Regression
 Regression in machine learning is a supervised learning
technique used to model the relationship between a dependent
variable (target) and one or more independent variables
(features or predictors).
 The goal of regression is to create a predictive model that can
make continuous predictions or estimates, as opposed to
classification, which deals with discrete categories or classes.
Regression
 In a regression problem, you're typically trying to find a function
that best fits the data, allowing you to predict a numeric value
for the dependent variable given the values of the independent
variables.
 This function is often represented as a linear equation, but in
more complex cases, it can be nonlinear as well.
 Regression models are the types of models that predict numerical data: the output of a regression model is a number.
A simple example of a regression
problem:
 Problem: Predicting House Prices.
 Description: Imagine you have a dataset that contains
information about various houses, including features like the
number of bedrooms, square footage, neighborhood, and other
relevant factors.
 Goal: build a regression model that can predict the selling price of a house based on these features.
A simple example of a regression
problem:
 In this example, the dependent variable (target) is the house's selling
price, which is a continuous numeric value.
 The independent variables (features) might include the number of
bedrooms, square footage, location, and other attributes that can
influence the price.
 You would use a regression algorithm, such as linear regression, to create a
predictive model. This model would learn the relationships between the
features and the house prices from the training data, and then you can
use it to make price predictions for new, unseen houses.
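For illustration, here is a minimal sketch of building such a model with scikit-learn (the two-feature setup, feature values, and prices below are invented toy data, not from the lecture):

    # A minimal sketch, assuming scikit-learn is installed.
    from sklearn.linear_model import LinearRegression

    # Each row is one house: [bedrooms, square footage] (toy values)
    X = [[2, 900], [3, 1200], [3, 1500], [4, 1800], [5, 2200]]
    y = [150000, 200000, 240000, 290000, 350000]  # selling prices

    model = LinearRegression()
    model.fit(X, y)  # learn the relationship between features and price

    # Predict the price of a new, unseen house
    print(model.predict([[4, 1600]]))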
Linear Regression
 Linear regression is a powerful and widely used method to estimate
values, such as the price of a house, the value of a certain stock,
the life expectancy of an individual, or the amount of time a user
will watch a video or spend on a website.
Linear regression with one variable
 Simple linear regression involves just two variables: one
independent variable (predictor) and one dependent variable
(target).
 The goal is to find a linear relationship between the predictor
and the target. This relationship is expressed as a straight-line
equation:
Y = b0 + b1*X
Linear regression with one variable
 where:
 Y is the dependent variable (target).
 X is the independent variable (predictor).
 b0 is the intercept (the point where the line intersects the Y-axis).
 b1 is the slope (the change in Y for a unit change in X).
 The model's objective is to find the values of b0 and b1 that
minimize the sum of squared differences between the predicted
values and the actual values.
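As a concrete sketch, the least-squares values of b0 and b1 can be computed directly with the standard formulas (the toy data below is invented for illustration):

    import numpy as np

    # Toy data (invented): X is the predictor, Y the target
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # b1 = cov(X, Y) / var(X);  b0 = mean(Y) - b1 * mean(X)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()

    print(f"Y = {b0:.2f} + {b1:.2f}*X")

These are exactly the values that minimize the sum of squared differences mentioned above.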
The problem: We need to predict the price
of a house
 Let’s say that we are real estate agents in charge of selling a new
house. We don’t know the price, and we want to infer it by comparing
it with other houses.
 We look at features of the house that could influence the price, such
as size, number of rooms, location, crime rate, school quality, and
distance to commerce. At the end of the day, we want a formula for
all these features that gives us the price of the house, or at least a
good estimate for it.
The solution: Building a regression model
for housing prices
 Let’s go with as simple an example as possible. We look at only
one of the features—the number of rooms.
 Our house has four rooms, and there are six houses nearby, with
one, two, three, five, six, and seven rooms, respectively. Their
prices are shown in table 3.1.
The solution: Building a regression model for housing prices
[Table 3.1 - prices of the nearby houses, reconstructed from the pattern described below (Price = 100 + 50 × number of rooms):
Rooms: 1, 2, 3, 5, 6, 7 → Price: $150, $200, $250, $350, $400, $450]
 What price would you give to house 4, based just on the information in this table? If you said $300, then we made the same guess.
 You probably saw a pattern and used it to infer the price of the house.
 What you did in your head was linear regression.
 You may have noticed that each time you add a room, $50 is added to the price of the
house.
 More specifically, we can think of the price of a house as a combination of two things:
a base price of $100, and an extra charge of $50 for each of the rooms.
 This can be summarized in a simple formula:
Price = 100 + 50(Number of rooms)
 What we did here is come up with a model represented by a
formula that gives us a prediction of the price of the house,
based on the feature, which is the number of rooms.
 The price per room is called the weight of that corresponding
feature, and the base price is called the bias of the model.
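Written as code, this model is just the formula itself (prices in the same units as the example):

    def predict_price(num_rooms):
        """Model: price = bias + weight * (number of rooms)."""
        bias = 100    # base price
        weight = 50   # extra charge per room
        return bias + weight * num_rooms

    print(predict_price(4))  # 300, the guess we made for house 4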
Important Concepts
 features The features of a data point are those properties that we use to make our
prediction. In this case, the features are the number of rooms in the house, the crime
rate, the age of the house, the size, and so on. For our case, we’ve decided on one
feature: the number of rooms in the house.
 labels This is the target that we try to predict from the features. In this case, the label
is the price of the house.
 model A machine learning model is a rule, or a formula, which predicts a label from the
features. In this case, the model is the equation we found for the price.
Important Concepts
 prediction The prediction is the output of the model. If the model says, “I
think the house with four rooms is going to cost $300,” then the prediction is
300.
 weights In the formula corresponding to the model, each feature is multiplied
by a corresponding factor. These factors are the weights. In the previous
formula, the only feature is the number of rooms, and its corresponding weight
is 50.
 bias As you can see, the formula corresponding to the model has a constant
that is not attached to any of the features. This constant is called the bias. In
this model, the bias is 100, and it corresponds to the base price of a house.
 Now the question is, how did we come up with this formula? Or more
specifically, how do we get the computer to come up with this weight and
bias?
How do we find this line?
Model Representation
Supervised learning: "right answers" or labeled data are given.
Regression: predict a continuous-valued output (the price).
[Figure: training data plotted with the independent variable on the x-axis and the dependent variable (price) on the y-axis]
Model Representation
Training set → Learning algorithm → h
The job of a learning algorithm is to output a function, usually denoted by a lowercase h, where h stands for hypothesis.
The job of the hypothesis function is to take the value of x and try to output the estimated value of y. So h is a function that maps from x's to y's.
Linear Equations
[Figure: a straight line on X-Y axes, with slope θ1 = change in Y (ΔY) / change in X (ΔX) and Y-intercept θ0]
 slope The slope of a line is a measure of how steep it is. It is calculated as the rise over the run (i.e., how many units the line goes up, divided by how many units it goes to the right). This ratio is constant over the whole line. In a machine learning model, the slope is the weight of the corresponding feature, and it tells us how much we expect the value of the label to go up when we increase the value of the feature by one unit. If the line is horizontal, the slope is zero, and if the line goes down, the slope is negative.
 y-intercept The y-intercept of a line is the height at which the line crosses the
vertical (y-) axis. In a machine learning model, it is the bias and tells us what the
label would be in a data point where all the features are precisely zero.
 linear equation This is the equation of a line.
 It is given by two parameters: the slope and the y-intercept.
 If the slope is m and the y-intercept is b, then the equation of
the line is y = mx + b, and the line is formed by all the points
(x,y) that satisfy the equation.
 In a machine learning model, x is the value of the feature and y
is the prediction for the label.
 The weight and bias of the model are m and b, respectively.
Types of Regression Models
[Figure: four scatterplot panels: positive linear relationship, negative linear relationship, relationship not linear, no relationship]
Cost function
 The cost function measures the error of the model; we use it to improve the model over successive iterations.
 To compute the error we use the MSE (mean squared error) function.
 How do we choose the θi's?
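A minimal sketch of the MSE cost in code (the toy values are invented; the 1/(2m) scaling is used here to match the cost function on the later gradient descent slides):

    import numpy as np

    def mse_cost(predictions, targets):
        """J = (1/2m) * sum of squared errors."""
        m = len(targets)
        return np.sum((predictions - targets) ** 2) / (2 * m)

    # Toy example (invented predictions and actual values)
    y_hat = np.array([310.0, 190.0, 255.0])
    y = np.array([300.0, 200.0, 250.0])
    print(mse_cost(y_hat, y))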
How to select the best values of the bias and the weights?
 Don't choose them randomly or manually.
 We use the gradient descent method.
Scatterplot
• 1. Plot of all (Xi, Yi) pairs
• 2. Suggests how well the model will fit
[Figure: scatterplot of the (Xi, Yi) pairs on X and Y axes]
Thinking Challenge
How would you draw a line through the points?
How do you determine which line 'fits best'?
[Figures: the same scatterplot with several candidate lines: one line as first drawn, one with the intercept unchanged, one with the slope unchanged and the intercept changed, and one with both the slope and the intercept changed]
Least Squares
• 'Best Fit' means the difference between the actual Y values and the predicted Y values is a minimum.
• So square the errors!

Σ_{i=1}^{m} ε̂_i² = Σ_{i=1}^{m} (Y_i − h(x_i))²
Least Squares Graphically
[Figure: four data points with vertical residuals ε̂1, ε̂2, ε̂3, ε̂4 measured from the fitted line hθ(x) = θ0 + θ1X]
Least squares minimizes Σ_{i=1}^{n} ε̂_i² = ε̂1² + ε̂2² + ε̂3² + ε̂4²
Least Squared Errors Linear Regression
Minimize J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x_i) − y_i)²
where the hθ(x_i) are the predictions on the training set and the y_i are the actual values.
Cost function visualization
Consider a simple case of the hypothesis by setting θ0 = 0; then h becomes hθ(x) = θ1x.
Each value of θ1 corresponds to a different hypothesis, since θ1 is the slope of a line passing through the origin (the y-intercept θ0 is nulled out).
[Figure: lines through the origin for θ1 = 2, θ1 = 1, and θ1 = 0.5, each with its cost value, e.g. J(0.5)]
Cost function visualization
[Figure: the hypothesis lines for θ1 = 2, θ1 = 1, and θ1 = 0.5, plotted next to the corresponding points on the J(θ1) curve]
Plotting points like this further, one gets a bowl-shaped graph of the cost function as it depends on the parameter θ1; each plotted value of θ1 corresponds to a different hypothesis.
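A short sketch of how these J(θ1) points could be computed (the toy training set is invented so that θ1 = 1 fits exactly, as in the lecture's plot):

    import numpy as np

    # Toy training set where y = x exactly, so theta1 = 1 is optimal
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    m = len(x)

    def J(theta1):
        """Cost of the hypothesis h(x) = theta1 * x (theta0 fixed at 0)."""
        return np.sum((theta1 * x - y) ** 2) / (2 * m)

    for theta1 in [0.5, 1.0, 2.0]:
        print(f"J({theta1}) = {J(theta1):.3f}")
    # J(1.0) = 0 is the minimum; J(0.5) and J(2.0) are larger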
Cost function visualization
What is the optimal value of θ1, the one that minimizes J(θ1)?
It is clear that the best value is θ1 = 1, since J(1) = 0, which is the minimum.
How do we find the best value of θ1 in general?
By plotting? That is not practical, especially in high dimensions.
The solutions:
1. Analytical solution: not applicable for large datasets.
2. Numerical solution, e.g., gradient descent.
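For completeness, a sketch of option 1: one standard analytical route is the normal equation θ = (XᵀX)⁻¹Xᵀy (the toy data below is invented):

    import numpy as np

    # Toy data (invented): y = 1 + 2x exactly
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.0, 5.0, 7.0, 9.0])

    # Prepend a column of ones so theta[0] plays the role of the bias
    X = np.column_stack([np.ones_like(x), x])

    # Solve (X^T X) theta = X^T y instead of inverting explicitly
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    print(theta)  # approximately [1., 2.]

Solving the normal equation scales roughly cubically with the number of features, which is one reason it becomes impractical for large problems.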
Gradient Descent
 An iterative solution, and not only for linear regression: it is used all over the place in machine learning.
 Objective: minimize any function (here, the cost function J).
[Figure: surface plot of J(θ0, θ1); red means high, blue means low. Imagine this is the landscape of a grassy park and you want to get to the lowest point in the park as rapidly as possible: from the starting point, you walk downhill until you reach a local minimum.]
[Figure: the same J(θ0, θ1) surface with a different starting point; the new starting point leads to a new local minimum.]
Gradient descent Algorithm
Repeat until convergence: θ1 := θ1 − α (d/dθ1) J(θ1), where α is the learning rate.
[Figure: the J(θ1) curve. To the right of the minimum the slope is positive, so θ1 := θ1 − α(+ve) moves θ1 left; to the left of the minimum the slope is negative, so θ1 := θ1 − α(−ve) moves θ1 right. Either way, θ1 moves toward the minimum.]
For the linear regression cost function, the derivative works out as follows:

(d/dθ_j) J(θ0, θ1) = (d/dθ_j) [ (1/2m) Σ_{i=1}^{m} (hθ(x_i) − y_i)² ]

j = 0: (d/dθ0) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x_i) − y_i)
j = 1: (d/dθ1) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x_i) − y_i) · x_i
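Putting these two derivatives into the update rule gives the full algorithm. A minimal sketch (the learning rate, iteration count, and toy data are all invented for illustration):

    import numpy as np

    # Toy data (invented), roughly following y = 1 + 2x
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.1, 4.9, 7.2, 8.8])
    m = len(x)

    theta0, theta1 = 0.0, 0.0  # initial guess
    alpha = 0.05               # learning rate (assumed value)

    for _ in range(2000):
        h = theta0 + theta1 * x          # predictions h_theta(x_i)
        grad0 = np.sum(h - y) / m        # d/d(theta0) J
        grad1 = np.sum((h - y) * x) / m  # d/d(theta1) J
        # Update both parameters simultaneously
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1

    print(theta0, theta1)  # should approach roughly 1 and 2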
Example after implementing some iterations of gradient descent
[Figure: the fitted line after each of iterations 1 through 11, gradually settling onto the data]
Thanks