Regression
Dr. Marwa M. Emam
Faculty of Computers and Information
Agenda
 Linear regression with one variable: model representation.
 What is linear regression?
 Fitting a line through a set of data points.
 Building a linear regression model to predict housing prices in a real dataset.
 Discussing examples of linear regression in the real world, such as medical applications and recommender systems.
 Cost function.
 Gradient descent.
Regression
 Regression in machine learning is a supervised learning
technique used to model the relationship between a dependent
variable (target) and one or more independent variables
(features or predictors).
 The goal of regression is to create a predictive model that can
make continuous predictions or estimates, as opposed to
classification, which deals with discrete categories or classes.
Regression
 In a regression problem, you're typically trying to find a function
that best fits the data, allowing you to predict a numeric value
for the dependent variable given the values of the independent
variables.
 This function is often represented as a linear equation, but in
more complex cases, it can be nonlinear as well.
 Regression models are the types of models that predict numerical data: the output of a regression model is a number.
A simple example of a regression
problem:
 Problem: Predicting House Prices.
 Description: Imagine you have a dataset that contains
information about various houses, including features like the
number of bedrooms, square footage, neighborhood, and other
relevant factors.
 Goal: build a regression model that can predict the selling price of a house based on these features.
A simple example of a regression
problem:
 In this example, the dependent variable (target) is the house's selling
price, which is a continuous numeric value.
 The independent variables (features) might include the number of
bedrooms, square footage, location, and other attributes that can
influence the price.
 You would use a regression algorithm, such as linear regression, to create a
predictive model. This model would learn the relationships between the
features and the house prices from the training data, and then you can
use it to make price predictions for new, unseen houses.
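For illustration, here is a minimal sketch of building such a model with scikit-learn (the two-feature setup, feature values, and prices below are invented toy data, not from the lecture):

    # A minimal sketch, assuming scikit-learn is installed.
    from sklearn.linear_model import LinearRegression

    # Each row is one house: [bedrooms, square footage] (toy values)
    X = [[2, 900], [3, 1200], [3, 1500], [4, 1800], [5, 2200]]
    y = [150000, 200000, 240000, 290000, 350000]  # selling prices

    model = LinearRegression()
    model.fit(X, y)  # learn the relationship between features and price

    # Predict the price of a new, unseen house
    print(model.predict([[4, 1600]]))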
Linear Regression
 Linear regression is a powerful and widely used method to estimate
values, such as the price of a house, the value of a certain stock,
the life expectancy of an individual, or the amount of time a user
will watch a video or spend on a website.
Linear regression with one variable
 Simple linear regression involves just two variables: one
independent variable (predictor) and one dependent variable
(target).
 The goal is to find a linear relationship between the predictor
and the target. This relationship is expressed as a straight-line
equation:
Y = b0 + b1*X
Linear regression with one variable
 where:
 Y is the dependent variable (target).
 X is the independent variable (predictor).
 b0 is the intercept (the point where the line intersects the Y-axis).
 b1 is the slope (the change in Y for a unit change in X).
 The model's objective is to find the values of b0 and b1 that
minimize the sum of squared differences between the predicted
values and the actual values.
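As a concrete sketch, the least-squares values of b0 and b1 can be computed directly with the standard formulas (the toy data below is invented for illustration):

    import numpy as np

    # Toy data (invented): X is the predictor, Y the target
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # b1 = cov(X, Y) / var(X);  b0 = mean(Y) - b1 * mean(X)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()

    print(f"Y = {b0:.2f} + {b1:.2f}*X")

These are exactly the values that minimize the sum of squared differences mentioned above.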
The problem: We need to predict the price
of a house
 Let’s say that we are real estate agents in charge of selling a new
house. We don’t know the price, and we want to infer it by comparing
it with other houses.
 We look at features of the house that could influence the price, such
as size, number of rooms, location, crime rate, school quality, and
distance to commerce. At the end of the day, we want a formula for
all these features that gives us the price of the house, or at least a
good estimate for it.
The solution: Building a regression model
for housing prices
 Let’s go with as simple an example as possible. We look at only
one of the features—the number of rooms.
 Our house has four rooms, and there are six houses nearby, with
one, two, three, five, six, and seven rooms, respectively. Their
prices are shown in table 3.1.
The solution: Building a regression model for housing prices
[Table 3.1 - prices of the nearby houses, reconstructed from the pattern described below (Price = 100 + 50 × number of rooms):
Rooms: 1, 2, 3, 5, 6, 7 → Price: $150, $200, $250, $350, $400, $450]
 What price would you give to house 4, based just on the information in this table? If you said $300, then we made the same guess.
 You probably saw a pattern and used it to infer the price of the house.
 What you did in your head was linear regression.
 You may have noticed that each time you add a room, $50 is added to the price of the
house.
 More specifically, we can think of the price of a house as a combination of two things:
a base price of $100, and an extra charge of $50 for each of the rooms.
 This can be summarized in a simple formula:
Price = 100 + 50(Number of rooms)
 What we did here is come up with a model represented by a
formula that gives us a prediction of the price of the house,
based on the feature, which is the number of rooms.
 The price per room is called the weight of that corresponding
feature, and the base price is called the bias of the model.
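Written as code, this model is just the formula itself (prices in the same units as the example):

    def predict_price(num_rooms):
        """Model: price = bias + weight * (number of rooms)."""
        bias = 100    # base price
        weight = 50   # extra charge per room
        return bias + weight * num_rooms

    print(predict_price(4))  # 300, the guess we made for house 4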
Important Concepts
 features The features of a data point are those properties that we use to make our
prediction. In this case, the features are the number of rooms in the house, the crime
rate, the age of the house, the size, and so on. For our case, we’ve decided on one
feature: the number of rooms in the house.
 labels This is the target that we try to predict from the features. In this case, the label
is the price of the house.
 model A machine learning model is a rule, or a formula, which predicts a label from the
features. In this case, the model is the equation we found for the price.
Important Concepts
 prediction The prediction is the output of the model. If the model says, “I
think the house with four rooms is going to cost $300,” then the prediction is
300.
 weights In the formula corresponding to the model, each feature is multiplied
by a corresponding factor. These factors are the weights. In the previous
formula, the only feature is the number of rooms, and its corresponding weight
is 50.
 bias As you can see, the formula corresponding to the model has a constant
that is not attached to any of the features. This constant is called the bias. In
this model, the bias is 100, and it corresponds to the base price of a house.
 Now the question is, how did we come up with this formula? Or more
specifically, how do we get the computer to come up with this weight and
bias?
How do we find this line?
Model Representation
Supervised learning: "right answers" or labeled data are given.
Regression: predict a continuous-valued output (the price).
[Figure: training data plotted with the independent variable on the x-axis and the dependent variable (price) on the y-axis]
Model Representation
Training set → Learning algorithm → h
The job of a learning algorithm is to output a function, usually denoted by a lowercase h, where h stands for hypothesis.
The job of the hypothesis function is to take the value of x and try to output the estimated value of y. So h is a function that maps from x's to y's.
Linear Equations
[Figure: a straight line on X-Y axes, with slope θ1 = change in Y (ΔY) / change in X (ΔX) and Y-intercept θ0]
 slope The slope of a line is a measure of how steep it is. It is calculated as the rise over the run (i.e., how many units the line goes up, divided by how many units it goes to the right). This ratio is constant over the whole line. In a machine learning model, the slope is the weight of the corresponding feature, and it tells us how much we expect the value of the label to go up when we increase the value of the feature by one unit. If the line is horizontal, the slope is zero, and if the line goes down, the slope is negative.
 y-intercept The y-intercept of a line is the height at which the line crosses the
vertical (y-) axis. In a machine learning model, it is the bias and tells us what the
label would be in a data point where all the features are precisely zero.
 linear equation This is the equation of a line.
 It is given by two parameters: the slope and the y-intercept.
 If the slope is m and the y-intercept is b, then the equation of
the line is y = mx + b, and the line is formed by all the points
(x,y) that satisfy the equation.
 In a machine learning model, x is the value of the feature and y
is the prediction for the label.
 The weight and bias of the model are m and b, respectively.
Types of Regression Models
[Figure: four scatterplot panels: positive linear relationship, negative linear relationship, relationship not linear, no relationship]
Cost function
 The cost function measures the error of the model; we use it to improve the model over successive iterations.
 To compute the error we use the MSE (mean squared error) function.
 How do we choose the θi's?
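A minimal sketch of the MSE cost in code (the toy values are invented; the 1/(2m) scaling is used here to match the cost function on the later gradient descent slides):

    import numpy as np

    def mse_cost(predictions, targets):
        """J = (1/2m) * sum of squared errors."""
        m = len(targets)
        return np.sum((predictions - targets) ** 2) / (2 * m)

    # Toy example (invented predictions and actual values)
    y_hat = np.array([310.0, 190.0, 255.0])
    y = np.array([300.0, 200.0, 250.0])
    print(mse_cost(y_hat, y))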
How to select the best values of the bias and the weights?
 Don't choose them randomly or manually.
 We use the gradient descent method.
Scatterplot
• 1. Plot of all (Xi, Yi) pairs
• 2. Suggests how well the model will fit
[Figure: scatterplot of the (Xi, Yi) pairs on X and Y axes]
Thinking Challenge
How would you draw a line through the points?
How do you determine which line 'fits best'?
[Figures: the same scatterplot with several candidate lines: one line as first drawn, one with the intercept unchanged, one with the slope unchanged and the intercept changed, and one with both the slope and the intercept changed]
Least Squares
• 'Best Fit' means the difference between the actual Y values and the predicted Y values is a minimum.
• So square the errors!

Σ_{i=1}^{m} ε̂_i² = Σ_{i=1}^{m} (Y_i − h(x_i))²
Least Squares Graphically
[Figure: four data points with vertical residuals ε̂1, ε̂2, ε̂3, ε̂4 measured from the fitted line hθ(x) = θ0 + θ1X]
Least squares minimizes Σ_{i=1}^{n} ε̂_i² = ε̂1² + ε̂2² + ε̂3² + ε̂4²
Least Squared Errors Linear Regression
Minimize J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x_i) − y_i)²
where the hθ(x_i) are the predictions on the training set and the y_i are the actual values.
Cost function visualization
Consider a simple case of the hypothesis by setting θ0 = 0; then h becomes hθ(x) = θ1x.
Each value of θ1 corresponds to a different hypothesis, since θ1 is the slope of a line passing through the origin (the y-intercept θ0 is nulled out).
[Figure: lines through the origin for θ1 = 2, θ1 = 1, and θ1 = 0.5, each with its cost value, e.g. J(0.5)]
Cost function visualization
[Figure: the hypothesis lines for θ1 = 2, θ1 = 1, and θ1 = 0.5, plotted next to the corresponding points on the J(θ1) curve]
Plotting points like this further, one gets a bowl-shaped graph of the cost function as it depends on the parameter θ1; each plotted value of θ1 corresponds to a different hypothesis.
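A short sketch of how these J(θ1) points could be computed (the toy training set is invented so that θ1 = 1 fits exactly, as in the lecture's plot):

    import numpy as np

    # Toy training set where y = x exactly, so theta1 = 1 is optimal
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    m = len(x)

    def J(theta1):
        """Cost of the hypothesis h(x) = theta1 * x (theta0 fixed at 0)."""
        return np.sum((theta1 * x - y) ** 2) / (2 * m)

    for theta1 in [0.5, 1.0, 2.0]:
        print(f"J({theta1}) = {J(theta1):.3f}")
    # J(1.0) = 0 is the minimum; J(0.5) and J(2.0) are larger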
Cost function visualization
What is the optimal value of θ1, the one that minimizes J(θ1)?
It is clear that the best value is θ1 = 1, since J(1) = 0, which is the minimum.
How do we find the best value of θ1 in general?
By plotting? That is not practical, especially in high dimensions.
The solutions:
1. Analytical solution: not applicable for large datasets.
2. Numerical solution, e.g., gradient descent.
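For completeness, a sketch of option 1: one standard analytical route is the normal equation θ = (XᵀX)⁻¹Xᵀy (the toy data below is invented):

    import numpy as np

    # Toy data (invented): y = 1 + 2x exactly
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.0, 5.0, 7.0, 9.0])

    # Prepend a column of ones so theta[0] plays the role of the bias
    X = np.column_stack([np.ones_like(x), x])

    # Solve (X^T X) theta = X^T y instead of inverting explicitly
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    print(theta)  # approximately [1., 2.]

Solving the normal equation scales roughly cubically with the number of features, which is one reason it becomes impractical for large problems.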
Gradient Descent
 An iterative solution, and not only for linear regression: it is used all over the place in machine learning.
 Objective: minimize any function (here, the cost function J).
[Figure: surface plot of J(θ0, θ1); red means high, blue means low. Imagine this is the landscape of a grassy park and you want to get to the lowest point in the park as rapidly as possible: from the starting point, you walk downhill until you reach a local minimum.]
[Figure: the same J(θ0, θ1) surface with a different starting point; the new starting point leads to a new local minimum.]
Gradient descent Algorithm
Repeat until convergence: θ1 := θ1 − α (d/dθ1) J(θ1), where α is the learning rate.
[Figure: the J(θ1) curve. To the right of the minimum the slope is positive, so θ1 := θ1 − α(+ve) moves θ1 left; to the left of the minimum the slope is negative, so θ1 := θ1 − α(−ve) moves θ1 right. Either way, θ1 moves toward the minimum.]
For the linear regression cost function, the derivative works out as follows:

(d/dθ_j) J(θ0, θ1) = (d/dθ_j) [ (1/2m) Σ_{i=1}^{m} (hθ(x_i) − y_i)² ]

j = 0: (d/dθ0) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x_i) − y_i)
j = 1: (d/dθ1) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x_i) − y_i) · x_i
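Putting these two derivatives into the update rule gives the full algorithm. A minimal sketch (the learning rate, iteration count, and toy data are all invented for illustration):

    import numpy as np

    # Toy data (invented), roughly following y = 1 + 2x
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.1, 4.9, 7.2, 8.8])
    m = len(x)

    theta0, theta1 = 0.0, 0.0  # initial guess
    alpha = 0.05               # learning rate (assumed value)

    for _ in range(2000):
        h = theta0 + theta1 * x          # predictions h_theta(x_i)
        grad0 = np.sum(h - y) / m        # d/d(theta0) J
        grad1 = np.sum((h - y) * x) / m  # d/d(theta1) J
        # Update both parameters simultaneously
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1

    print(theta0, theta1)  # should approach roughly 1 and 2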
Example after implementing some iterations of gradient descent
[Figure: the fitted line after each of iterations 1 through 11, gradually settling onto the data]
Thanks