Experimental Techniques in
Thermofluids
MEL7070
Regression Analysis
By: Dr. Shrutidhara Sarma
Regression Analysis
• Also known as curve fitting.
• A suitable plot of the data indicates the nature of the relation between the independent and the dependent variables.
• If the prediction lies within the given data range it is interpolation; otherwise it is extrapolation.
• The fit can be linear, semi-log, log-log, or nonlinear.
• The analysis tells us the error incurred in representing the data by the chosen relation.
A straight-line graph can be of the form
(i) y = ax + b (linear fit): straight line on a linear graph
(ii) y = ax^b (power-law fit): straight line on a log-log graph
(iii) y = ae^(bx) (exponential fit): straight line on a semi-log graph
A non-linear relationship may follow a polynomial of the form
y = ax³ + bx² + cx + d.
The parameters a, b, c, d are known as the fit parameters and need to be determined
as part of the regression analysis.
Linear between x and y
y = ax + b: linear fit, a straight line on a linear graph.
Linear between log x and log y
y = ax^b: power-law fit, a straight line on a log-log graph.
Non-linear between x and y
y = ax³ + bx² + cx + d: non-linear (polynomial) fit.
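The transformed plots matter because the power-law and exponential forms become straight lines after taking logarithms, so an ordinary linear fit recovers their parameters. A minimal Python sketch; the data values here are made up for illustration:

```python
import numpy as np

# Hypothetical data, assumed to roughly follow a power law y = a*x**b.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.5, 10.4])

# Power-law fit: log y = log a + b*log x is linear in log x,
# i.e. a straight line on a log-log graph.
b_pow, log_a = np.polyfit(np.log(x), np.log(y), 1)
a_pow = np.exp(log_a)

# Exponential fit: ln y = ln a + b*x is linear in x,
# i.e. a straight line on a semi-log graph.
b_exp, ln_a = np.polyfit(x, np.log(y), 1)
a_exp = np.exp(ln_a)

print(f"power law:   y = {a_pow:.3f} * x**{b_pow:.3f}")
print(f"exponential: y = {a_exp:.3f} * exp({b_exp:.3f}*x)")
```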
Least Square method
Let's consider that there is a linear relation between x and y, i.e. their trend is represented by a straight line. The straight line does not, in general, pass through any of the data points. If we treat the straight line as a local mean, then the deviations are distributed about the local mean as a normal distribution. The least squares principle is applied as:

Minimize
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_f\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left[y_i - (ax_i + b)\right]^2$$
where y_f = ax + b is the desired linear fit to the data.
Least square method contd.
s² gives the variance w.r.t. the local mean (the fit line). Hence, minimization requires:
$$\frac{\partial s^2}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (ax_i + b)\right]x_i = 0; \qquad \frac{\partial s^2}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (ax_i + b)\right] = 0$$
These may be rearranged as two simultaneous equations for a and b, known as the normal equations:
$$a\sum x_i^2 + b\sum x_i = \sum x_i y_i$$
$$a\sum x_i + nb = \sum y_i$$
Let's define:
$$\bar{x} = \frac{\sum x_i}{n}, \quad \bar{y} = \frac{\sum y_i}{n}, \quad \sigma_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2, \quad \sigma_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2, \quad \sigma_{xy} = \frac{\sum x_i y_i}{n} - \bar{x}\bar{y}$$
The quantity σ_xy is known as the covariance, i.e. the influence of the variability of x_i on y_i and vice versa.
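These formulas translate directly into code. A short sketch in Python; the function name linear_fit is my choice, not from the slides:

```python
import numpy as np

def linear_fit(x, y):
    """Least-squares slope a and intercept b for y = a*x + b,
    via the covariance/variance form of the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.sum() / n, y.sum() / n
    sigma_x2 = (x * x).sum() / n - xbar**2       # variance of x
    sigma_xy = (x * y).sum() / n - xbar * ybar   # covariance of x and y
    a = sigma_xy / sigma_x2
    b = ybar - a * xbar    # the line passes through (xbar, ybar)
    return a, b
```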
Least square method contd.
With these definitions the slope a and intercept b of the fitted line may be written as
$$a = \frac{\sigma_{xy}}{\sigma_x^2} = \frac{\dfrac{\sum x_i y_i}{n} - \bar{x}\,\bar{y}}{\dfrac{\sum x_i^2}{n} - \bar{x}^2}; \qquad b = \bar{y} - a\bar{x}$$
The regression line therefore passes through the point (x̄, ȳ) (quickly verify this).

Example:
The data below is expected to follow a linear relation y = ax + b. Find the slope and intercept. Find the correlation coefficient.

y_i    x_i
1.2    1.0
2.0    1.6
2.4    3.4
3.5    4.0
3.5    5.2
Least square method contd.
Solution:

y_i     x_i     x_i y_i    x_i²
1.2     1.0     1.2        1.0
2.0     1.6     3.2        2.56
2.4     3.4     8.16       11.56
3.5     4.0     14.0       16.0
3.5     5.2     18.2       27.04
Sum:    12.6    15.2       44.76      58.16

Use the normal equations (with n = 5):
$$a\sum x_i^2 + b\sum x_i = \sum x_i y_i \;\Rightarrow\; 58.16\,a + 15.2\,b = 44.76$$
$$a\sum x_i + nb = \sum y_i \;\Rightarrow\; 15.2\,a + 5\,b = 12.6$$
Answers: a = 0.540; b = 0.878.
Hence, y = 0.540x + 0.878.
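The hand calculation can be checked against the linear_fit sketch given earlier:

```python
x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]

a, b = linear_fit(x, y)   # linear_fit from the earlier sketch
print(f"a = {a:.3f}, b = {b:.3f}")   # expected: a = 0.540, b = 0.878
```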
Standard error
Let's say the computed value of y is y_f = ax + b, and suppose there are n data points. A linear fit has 2 parameters, a and b.
Hence, DOF = n − p = n − 2. [The two parameters a and b are calculated from the same data.]
The standard error is then given by:
$$e = \left\{\frac{\sum_{i=1}^{n}\left[y_i - y_f\right]^2}{n-2}\right\}^{1/2} = \left\{\frac{\sum_{i=1}^{n}\left[y_i - (ax_i + b)\right]^2}{n-2}\right\}^{1/2}$$
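A direct transcription of this formula, assuming the fit parameters are already known; standard_error is a name of my choosing:

```python
import numpy as np

def standard_error(x, y, a, b, p=2):
    """Standard error of a fit with p parameters; for the linear fit
    y = a*x + b, p = 2, so the divisor n - p = n - 2 is the DOF."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    residuals = y - (a * x + b)
    return np.sqrt((residuals**2).sum() / (len(x) - p))
```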
Goodness of fit
• A measure of how well the regression line represents the data.
• It is possible to fit two lines to the data by treating:
(a) x as the independent variable and y as the dependent variable (y = ax + b), or
(b) y as the independent variable and x as the dependent variable (x = a′y + b′).
Then
$$a' = \frac{\sigma_{xy}}{\sigma_y^2}; \qquad b' = \bar{x} - a'\bar{y}$$
The second fit line may be rewritten as:
$$y = \frac{1}{a'}x - \frac{b'}{a'}$$
The slope of this line is 1/a′, which is not in general the same as a.
• If the two slopes are the same, the two regression lines coincide.
• The ratio of the slopes of the two lines is a measure of how well the form of the fit represents the data.
Correlation coefficient
The correlation coefficient ρ is defined as:
$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}, \qquad \rho^2 = \frac{\sigma_{xy}^2}{\sigma_x^2 \sigma_y^2} = a\,a' = \frac{\text{slope of 1st regression line}}{\text{slope of 2nd regression line}}$$
• The sign of the correlation coefficient is determined by the sign of the covariance.
• The correlation is perfect if ρ = ±1.
• The correlation is poor if ρ ≈ 0.
• The absolute value of the correlation coefficient should be greater than 0.5 to indicate that y and x are related.
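A sketch computing ρ from the covariance and the two standard deviations; note that ρ² can be cross-checked as the product a·a′ of the two regression slopes:

```python
import numpy as np

def correlation(x, y):
    """Correlation coefficient rho = sigma_xy / (sigma_x * sigma_y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.sum() / n, y.sum() / n
    sigma_xy = (x * y).sum() / n - xbar * ybar
    sigma_x = np.sqrt((x * x).sum() / n - xbar**2)
    sigma_y = np.sqrt((y * y).sum() / n - ybar**2)
    return sigma_xy / (sigma_x * sigma_y)

# Cross-check: rho**2 == a * a', where a = sigma_xy / sigma_x**2
# and a' = sigma_xy / sigma_y**2 are the two regression slopes.
```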
Polynomial regression
Sometimes the data may show non-linear behavior that can be modeled by a polynomial relation, e.g. y_f = ax² + bx + c.
The variance of the data with respect to the fit,
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}\left[y_i - (ax_i^2 + bx_i + c)\right]^2$$
is again minimized with respect to the three fit parameters a, b, c to get three normal equations. The least squares principle requires:
$$\frac{\partial s^2}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (ax_i^2 + bx_i + c)\right]x_i^2 = 0$$
$$\frac{\partial s^2}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (ax_i^2 + bx_i + c)\right]x_i = 0$$
$$\frac{\partial s^2}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (ax_i^2 + bx_i + c)\right] = 0$$
Polynomial regression
The earlier equations give:
$$a\sum x_i^4 + b\sum x_i^3 + c\sum x_i^2 = \sum x_i^2 y_i$$
$$a\sum x_i^3 + b\sum x_i^2 + c\sum x_i = \sum x_i y_i$$
$$a\sum x_i^2 + b\sum x_i + nc = \sum y_i$$
These are solved simultaneously for the fit parameters.
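The three normal equations form a 3×3 linear system that can be solved directly. A sketch using NumPy; quadratic_fit is my name for it:

```python
import numpy as np

def quadratic_fit(x, y):
    """Fit y_f = a*x**2 + b*x + c by solving the three normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    # Coefficient matrix: sums of powers of x, as in the normal equations.
    A = np.array([
        [(x**4).sum(), (x**3).sum(), (x**2).sum()],
        [(x**3).sum(), (x**2).sum(), x.sum()],
        [(x**2).sum(), x.sum(),      float(n)],
    ])
    rhs = np.array([(x**2 * y).sum(), (x * y).sum(), y.sum()])
    a, b, c = np.linalg.solve(A, rhs)
    return a, b, c
```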
Goodness of fit and the index of correlation
In the case of a non-linear fit we define a quantity known as the index of correlation to determine the goodness of the fit:
$$r = \pm\sqrt{1 - \frac{\sum\left[y_i - y_f\right]^2}{\sum\left[y_i - \bar{y}\right]^2}} = \pm\sqrt{1 - \frac{s^2}{s_y^2}}$$
• If the index of correlation is close to ±1, the fit may be considered good.
• The index of correlation is identical to the correlation coefficient for a linear fit.
• The index of correlation compares the scatter of the data with respect to the regression curve against the scatter of the data with respect to its own mean.
General index of correlation
Let's suppose a function z of two variables is fitted as:
$$z = f(x, y), \qquad z_f = ax + by + c$$
The least squares principle minimizes the sum of squared residuals
$$S = \sum_{i=1}^{n}\left[z_i - (ax_i + by_i + c)\right]^2$$
which requires:
$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n}\left[z_i - (ax_i + by_i + c)\right]x_i = 0$$
$$\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n}\left[z_i - (ax_i + by_i + c)\right]y_i = 0$$
$$\frac{\partial S}{\partial c} = -2\sum_{i=1}^{n}\left[z_i - (ax_i + by_i + c)\right] = 0$$
The standard error follows as before, with DOF = n − 3 since three parameters are estimated. The index of correlation is:
$$r \;(\text{or } R) = \pm\sqrt{1 - \frac{s^2}{s_z^2}}$$
It basically compares the variance w.r.t. the local mean (the fit) to the variance w.r.t. the mean.
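The same normal-equation approach carries over to the two-variable fit. A sketch that also returns the index of correlation; plane_fit is an assumed name:

```python
import numpy as np

def plane_fit(x, y, z):
    """Fit z_f = a*x + b*y + c by the normal equations and
    return the parameters plus the index of correlation R."""
    x, y, z = (np.asarray(v, float) for v in (x, y, z))
    n = len(z)
    A = np.array([
        [(x * x).sum(), (x * y).sum(), x.sum()],
        [(x * y).sum(), (y * y).sum(), y.sum()],
        [x.sum(),       y.sum(),       float(n)],
    ])
    rhs = np.array([(x * z).sum(), (y * z).sum(), z.sum()])
    a, b, c = np.linalg.solve(A, rhs)
    zf = a * x + b * y + c
    s2 = ((z - zf)**2).sum() / n           # variance w.r.t. the fit
    sz2 = ((z - z.mean())**2).sum() / n    # variance w.r.t. the mean
    R = np.sqrt(1.0 - s2 / sz2)
    return a, b, c, R
```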
Parity plot
• The data and the fit may be compared by making a parity plot.
• The parity plot places the given data (z) along the abscissa and the fitted values (z_f) along the ordinate.
• The parity line is the line of equality between the two.
• The departure of the data points from the parity line indicates the quality of the fit. When the data is a function of more than one independent variable, it is not always possible to plot the dependent variable against each independent variable; in such a case the parity plot is a way out.
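A minimal parity-plot sketch with Matplotlib, following the conventions above (data on the abscissa, fit on the ordinate):

```python
import numpy as np
import matplotlib.pyplot as plt

def parity_plot(z, zf):
    """Plot fitted values against data; points on the parity line
    indicate perfect agreement between fit and data."""
    z, zf = np.asarray(z, float), np.asarray(zf, float)
    lo = min(z.min(), zf.min())
    hi = max(z.max(), zf.max())
    plt.plot([lo, hi], [lo, hi], 'k-', label='parity line')
    plt.plot(z, zf, 'o', label='data vs. fit')
    plt.xlabel('data, z')
    plt.ylabel('fit, z_f')
    plt.legend()
    plt.show()
```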
General non-linear fit:
What if the fit equation is a non-linear relation that is neither a polynomial nor reducible to a linear form? Examples:
$$\text{(1)}\;\; y = ae^{bx} + cx + d \qquad \text{(2)}\;\; y = ae^{\left(bx^2 + c\ln x + d\right)}$$
Here, parameter estimation requires a search method to determine the best parameter set, i.e. to find (a, b, ..., p) such that S is minimized for y_f = f(x; a, b, c, ..., p), a general non-linear function with p parameters, where the sum of the squares of the residuals is
$$S = \sum_{i=1}^{N}\left[y_i - y_f\right]^2 \;\;(\text{min})$$
Hence, choose the parameters such that
$$\frac{\partial S}{\partial a} = \frac{\partial S}{\partial b} = \frac{\partial S}{\partial c} = \cdots = \frac{\partial S}{\partial p} = 0$$
In general it is not possible to set these partial derivatives to zero analytically to obtain normal equations, so the fit parameters must be found iteratively.
General non-linear fit:
Let's consider a 3-parameter system with a, b, c as the parameters, so that S = f(a, b, c).
Assume initial values a⁽⁰⁾, b⁽⁰⁾, c⁽⁰⁾, which give some value S⁽⁰⁾ that may not be the minimum.
Then evaluate
$$\left.\frac{\partial S}{\partial a}\right|_{a^{(0)},b^{(0)},c^{(0)}}, \quad \left.\frac{\partial S}{\partial b}\right|_{a^{(0)},b^{(0)},c^{(0)}}, \quad \left.\frac{\partial S}{\partial c}\right|_{a^{(0)},b^{(0)},c^{(0)}}$$
If each of these is zero, the point is a minimum. Since S is a function of the parameters, we can write
$$\nabla S = \frac{\partial S}{\partial a}\hat{a} + \frac{\partial S}{\partial b}\hat{b} + \frac{\partial S}{\partial c}\hat{c}$$
and the minimum is achieved when ∇S = 0. The magnitude of the gradient is
$$\left|\nabla S\right| = \sqrt{\left(\frac{\partial S}{\partial a}\right)^2 + \left(\frac{\partial S}{\partial b}\right)^2 + \left(\frac{\partial S}{\partial c}\right)^2}$$
and the components of the unit vector along the gradient are:
$$\frac{\partial S/\partial a}{\left|\nabla S\right|}, \quad \frac{\partial S/\partial b}{\left|\nabla S\right|}, \quad \frac{\partial S/\partial c}{\left|\nabla S\right|}$$
General non-linear fit:
To minimize S we now move in the direction opposite to the gradient, reducing each parameter by a step δ times the corresponding unit-gradient component:
$$a^{(1)} = a^{(0)} - \delta\,(\text{component along }\hat{a}), \quad b^{(1)} = b^{(0)} - \delta\,(\text{component along }\hat{b}), \quad c^{(1)} = c^{(0)} - \delta\,(\text{component along }\hat{c})$$
Thus (note that δ is the same for all parameters):
$$a^{(1)} = a^{(0)} - \delta\,\frac{\partial S/\partial a}{\left|\nabla S\right|}\bigg|_{a^{(0)},b^{(0)},c^{(0)}}; \quad b^{(1)} = b^{(0)} - \delta\,\frac{\partial S/\partial b}{\left|\nabla S\right|}\bigg|_{a^{(0)},b^{(0)},c^{(0)}}; \quad c^{(1)} = c^{(0)} - \delta\,\frac{\partial S/\partial c}{\left|\nabla S\right|}\bigg|_{a^{(0)},b^{(0)},c^{(0)}}$$
This is repeated until S reaches its minimum. Since the method moves along the steepest path, it is known as the Steepest Descent method.
NOTE: Initially you may choose a larger value of δ, but once the iteration moves close to the minimum you must reduce its magnitude.
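The update rule above translates into a short generic loop. A sketch in Python; the stopping test on the gradient magnitude and the fixed step δ are my simplifications (the slides stop instead when S falls below a set tolerance):

```python
import numpy as np

def steepest_descent(grad_S, params0, delta=0.02, tol=1e-4, max_iter=100000):
    """Generic steepest descent: step each parameter by -delta times
    the corresponding component of the unit gradient vector."""
    p = np.asarray(params0, float)
    for _ in range(max_iter):
        g = grad_S(p)                  # (dS/da, dS/db, dS/dc, ...)
        norm = np.linalg.norm(g)       # magnitude of the gradient
        if norm < tol:                 # gradient ~ 0: near a minimum
            break
        p = p - delta * g / norm       # move opposite to the gradient
    return p
```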
Example : Steepest Descent
Q. Determine the fit parameters by general non-linear regression if the data follows the form y_f = ae^(bx) + cx.

x: 0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8
y: 1.196, 1.379, 1.581, 1.79, 2.013, 2.279, 2.545, 2.842, 3.173, 3.5
Solution
Sum of squares of residuals:
$$S = \sum_{i=1}^{10}\left[y_i - (ae^{bx_i} + cx_i)\right]^2$$
Hence,
$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{10}\left[y_i - (ae^{bx_i} + cx_i)\right]e^{bx_i}$$
$$\frac{\partial S}{\partial b} = -2\sum_{i=1}^{10}\left[y_i - (ae^{bx_i} + cx_i)\right]ax_i e^{bx_i}$$
$$\frac{\partial S}{\partial c} = -2\sum_{i=1}^{10}\left[y_i - (ae^{bx_i} + cx_i)\right]x_i$$
Assume a⁽⁰⁾ = 1, b⁽⁰⁾ = 0.2, c⁽⁰⁾ = 0.1. We then get:
$$S = 11.674; \quad \frac{\partial S}{\partial a} = -24.023; \quad \frac{\partial S}{\partial b} = -30.682; \quad \frac{\partial S}{\partial c} = -23.003$$
Magnitude of the gradient vector:
$$\left|\nabla S\right| = \sqrt{\left(\frac{\partial S}{\partial a}\right)^2 + \left(\frac{\partial S}{\partial b}\right)^2 + \left(\frac{\partial S}{\partial c}\right)^2} = 45.251$$
Hence the unit-vector components along the gradient are:
$$\frac{\partial S/\partial a}{\left|\nabla S\right|} = \frac{-24.023}{45.251} = -0.531; \quad \frac{\partial S/\partial b}{\left|\nabla S\right|} = \frac{-30.682}{45.251} = -0.678; \quad \frac{\partial S/\partial c}{\left|\nabla S\right|} = \frac{-23.003}{45.251} = -0.508$$
Hence, taking δ = 0.02:
$$a^{(1)} = a^{(0)} - \delta(\hat{a}\text{-component}) = 1 - (0.02 \times -0.531) = 1.011$$
$$b^{(1)} = b^{(0)} - \delta(\hat{b}\text{-component}) = 0.2 - (0.02 \times -0.678) = 0.214$$
$$c^{(1)} = c^{(0)} - \delta(\hat{c}\text{-component}) = 0.1 - (0.02 \times -0.508) = 0.110$$
For these new values, a⁽¹⁾ = 1.011, b⁽¹⁾ = 0.214, c⁽¹⁾ = 0.110, the new value of S is 10.948.
This is repeated until the value of S falls below a pre-specified tolerance, e.g. 0.01.
#Assignment: For this example, calculate the final values of a, b and c with a MATLAB program.
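The assignment asks for a MATLAB program; for reference, here is a Python sketch of the same iteration under the slide's stopping rule S ≤ 0.01. The first step reproduces the slide's values (S = 11.674, gradient (−24.023, −30.682, −23.003), update to a ≈ 1.011, b ≈ 0.214, c ≈ 0.110); the step-size schedule near the minimum is my choice and may need tuning:

```python
import numpy as np

x = np.array([0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8])
y = np.array([1.196, 1.379, 1.581, 1.79, 2.013,
              2.279, 2.545, 2.842, 3.173, 3.5])

def S(p):
    a, b, c = p
    return np.sum((y - (a * np.exp(b * x) + c * x))**2)

def grad_S(p):
    a, b, c = p
    r = y - (a * np.exp(b * x) + c * x)           # residuals
    return np.array([
        -2.0 * np.sum(r * np.exp(b * x)),         # dS/da
        -2.0 * np.sum(r * a * x * np.exp(b * x)), # dS/db
        -2.0 * np.sum(r * x),                     # dS/dc
    ])

p = np.array([1.0, 0.2, 0.1])    # initial guess from the slides
delta = 0.02
for _ in range(200000):          # guard in case S never reaches the target
    if S(p) <= 0.01:
        break
    g = grad_S(p)
    p = p - delta * g / np.linalg.norm(g)
    if S(p) < 0.5:
        delta = 0.002            # reduce the step near the minimum

print(f"a = {p[0]:.3f}, b = {p[1]:.3f}, c = {p[2]:.3f}, S = {S(p):.4f}")
```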