Simple linear regression
Simple linear regression is a statistical method that allows us to summarize and study the relationship between two continuous (quantitative) variables:
• One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
• The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
• We will examine the relationship between quantitative variables x and y via a mathematical equation.
• The model has a deterministic component and a statistical (random) component, as the following house example illustrates.
Example: house cost versus house size. Most lots sell for $25,000, and building a house costs about $75 per square foot, so the deterministic part of the model is

House cost = 25,000 + 75(Size).

Since costs behave unpredictably, we add a random component e:

House cost = 25,000 + 75(Size) + e.
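A minimal sketch of this model in Python (the $10,000 noise standard deviation and the example house size are assumed for illustration; only the $25,000 lot price and $75-per-square-foot figures come from the example above):

```python
import numpy as np

rng = np.random.default_rng(0)

def house_cost(size_sqft, noise_sd=10_000):
    """Deterministic part (lot price + $75 per sq ft) plus a random component e."""
    deterministic = 25_000 + 75 * size_sqft
    e = rng.normal(0, noise_sd)  # random component; the SD is an assumed value
    return deterministic + e

# Deterministic prediction for a 2,000 sq ft house: 25,000 + 75 * 2,000 = 175,000
print(25_000 + 75 * 2_000)  # 175000
print(house_cost(2_000))    # 175,000 plus random noise
```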
• The simplest deterministic mathematical relationship between two variables x and y is a linear relationship: y = β0 + β1x (the true regression line).
• The objective is to develop an equivalent linear probabilistic model.
• If the two (random) variables are probabilistically related, then for a fixed value of x, there is
uncertainty in the value of the second variable.
• So, we assume y = β0 + β1x + ε, where ε is a random variable.
• The points (x1, y1), …, (xn, yn) resulting from n independent observations will then be scattered about the true regression line:

[Figure: scatter plot of observed points around the true regression line y = β0 + β1x, with x on the horizontal axis and y on the vertical axis.]
Estimating model parameters:
• The values of β0, β1 and ε will almost never be known to an investigator.
• Instead, sample data consists of n observed pairs (x1, y1), … , (xn, yn), from which
the model parameters and the true regression line itself can be estimated.
• Here Yi = β0 + β1xi + εi for i = 1, 2, …, n, and the n deviations ε1, ε2, …, εn are independent random variables.
• The aim is to find the best-fit line: the line for which the sum of the squared vertical distances (deviations) from the observed points is as small as possible (see the sketch below).
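Before introducing the closed-form solution, it helps to see the quantity being minimized. A minimal sketch of the objective in Python (the data points are made up for illustration):

```python
import numpy as np

def sse(b0, b1, x, y):
    """Sum of squared vertical deviations of the points from the line y = b0 + b1*x."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# Made-up data roughly following y = 2x: the closer candidate line has the smaller SSE.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(sse(0.0, 2.0, x, y))  # line close to the data's pattern -> SSE ≈ 0.10
print(sse(5.0, 0.0, x, y))  # flat line far from the pattern  -> SSE ≈ 18.90
```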
The sum of squared vertical deviations from the points (x1, y1), …, (xn, yn) to a candidate line y = b0 + b1x is

f(b0, b1) = Σ [yi − (b0 + b1xi)]²,

and minimizing it gives

b1 = SSxy / SSxx,    b0 = ȳ − b1x̄,

where

SSxy = Σ xiyi − (Σ xi)(Σ yi) / n
SSxx = Σ xi² − (Σ xi)² / n = (n − 1)sx².
The point estimates of β0 and β1, denoted by b0 and b1, are called the least squares estimates: they are the values that minimize f(b0, b1), found by setting its partial derivatives with respect to b0 and b1 equal to zero.
The predicted values are obtained using ŷ = b0 + b1x.
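A sketch of these formulas in Python, reusing the made-up data from the earlier objective-function example:

```python
import numpy as np

def least_squares(x, y):
    """Least squares estimates: b1 = SSxy / SSxx, b0 = ybar - b1 * xbar."""
    n = len(x)
    ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
    ss_xx = np.sum(x ** 2) - np.sum(x) ** 2 / n  # equals (n - 1) * sx**2
    b1 = ss_xy / ss_xx
    b0 = np.mean(y) - b1 * np.mean(x)
    return b0, b1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
b0, b1 = least_squares(x, y)
y_hat = b0 + b1 * x  # predicted values ŷ = b0 + b1*x
print(b0, b1)        # roughly 0.15 and 1.94 for this data
```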
Linear regression, while a powerful tool, has certain limitations that should be considered (a short residual-diagnostics sketch follows this list):
• Linearity: Assumes a linear relationship between the dependent and independent variables. If the relationship is non-linear, the model may not accurately capture the underlying pattern.
• Independence: Assumes that the errors are independent of each other. If there is autocorrelation in the errors, the
model's estimates may be biased and inefficient.
• Homoscedasticity: Assumes that the variance of the errors is constant across all levels of the independent variable. If
the variance is not constant (heteroscedasticity), the model's estimates may be inefficient.
• Normality: Assumes that the errors are normally distributed. If the errors are not normally distributed, the model's
inferences may be invalid.
• Sensitivity to Outliers: Linear regression can be sensitive to outliers, which can have a significant impact on the
model's estimates. Outliers can distort the relationship between the variables and lead to biased results.
• Limited Flexibility: Linear regression can only model linear relationships. If the relationship between the variables is
complex or non-linear, linear regression may not be able to adequately capture the pattern.
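A minimal sketch of residual checks for several of these assumptions (the data is made up, and np.polyfit is used only as a convenient least squares fitter):

```python
import numpy as np

# Made-up data roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.3, 11.9])

# Fit the simple linear regression (degree-1 polynomial least squares).
b1, b0 = np.polyfit(x, y, 1)  # returns highest-degree coefficient first
residuals = y - (b0 + b1 * x)

# Independence: a lag-1 autocorrelation far from 0 hints at autocorrelated errors.
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
print("lag-1 autocorrelation of residuals:", lag1)

# Homoscedasticity / linearity: plot residuals against fitted values and look
# for fanning or curvature, e.g. plt.scatter(b0 + b1 * x, residuals).
# Normality: a Q-Q plot or scipy.stats.shapiro(residuals) is the usual check.
# Outliers: points with unusually large residuals deserve a second look.
```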