Regression Analysis
Understanding Supervised Learning through Regression Analysis
By
Sharmila Chidaravalli
Assistant Professor
Department of ISE
Global Academy of Technology
Introduction to Regression
• Regression analysis is a supervised learning method for predicting continuous variables.
• It is one of the oldest and most popular predictive techniques.
• It models the relationship between the independent variable(s) (x) and the dependent variable (y).
• Regression represents the relationship as:
y = f(x)
where:
x = independent variable(s)
y = dependent variable
• The feature variable x is also known as an explanatory variable, a predictor variable, an independent variable, a covariate, or a domain point.
• The dependent variable y is also called a label, target variable, or response variable.
• Regression analysis determines the change in the response variable when one explanatory variable is varied while all other parameters are kept constant. This is used to determine the relationship that each explanatory variable exhibits. Thus, regression analysis is used for prediction and forecasting.
Thus, the primary concern of regression analysis is to find answers to questions such as:
1. What is the relationship between the variables?
2. What is the strength of the relationships?
3. What is the nature of the relationship such as linear or non-linear?
4. What is the relevance of the attributes?
5. What is the contribution of each attribute?
There are many applications of regression analysis. Some applications of regression include predicting:
1. Sales of goods or services
2. Value of bonds in portfolio management
3. Insurance premiums
4. Yield of crops in agriculture
5. Prices of real estate
Introduction to Linearity, Correlation, and Causation
The quality of a regression analysis is determined by factors such as correlation and causation.
Regression and Correlation
• Scatter plots show the relationship between two variables:
X-axis: independent variable
Y-axis: dependent variable
• The Pearson correlation coefficient (r) measures the strength and direction of the relationship.
• Types:
- Positive correlation
- Negative correlation
- No correlation
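To make the coefficient concrete, here is a minimal NumPy sketch (illustrative data, not from the slides) that computes Pearson's r:

```python
import numpy as np

# Hypothetical sample: temperature (x) and ice-cream sales (y)
x = np.array([20, 24, 28, 32, 36])
y = np.array([120, 150, 210, 260, 310])

# Pearson r is the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")  # close to +1 => strong positive correlation
```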
Regression and Causation
Causation: one variable directly influences another.
Represented as: x → y
Example: increasing study time → higher test scores.
Causation is about the causal relationship among variables, say x and y: it means knowing whether x causes y to happen or vice versa. "x causes y" is often denoted as x implies y.
Correlation and regression relationships are not the same as causation.
Scenario and relationship type:
• High temperature ↔ more ice cream sales: correlation (not causation)
• Exercise → lower blood pressure (from a controlled study): causation
Linearity and Non-linearity
A linear relationship between the variables means that the relationship between the dependent and independent variables can be visualized as a straight line. A line of the form y = ax + b can be fitted to the data points to indicate the relationship between x and y. By linearity, it is meant that as one variable increases, the corresponding variable also changes in a linear manner. A non-linear relationship exists in functions such as the exponential function and the power function.
Types of Regression Methods
Regression methods can be classified as follows:
Linear Regression
A type of regression where a line is fitted to the given data to find the linear relationship between one independent variable and one dependent variable.
Multiple Regression
A type of regression where a line is fitted to find the linear relationship between two or more independent variables and one dependent variable.
Polynomial Regression
A non-linear regression method in which an nth-degree polynomial is used to model the relationship between one independent variable and one dependent variable. Polynomial multiple regression is used to model two or more independent variables and one dependent variable.
Logistic Regression
Used for predicting categorical variables involving one or more independent variables and one dependent variable. It is also known as a binary classifier.
Lasso and Ridge Regression Methods
Special variants of regression in which regularization is used to limit the number and size of the coefficients of the independent variables.
Limitations of Regression Methods
1. Outliers – Outliers are abnormal data. They can bias the outcome of the regression model, as outliers pull the regression line towards them.
2. Number of cases – The ratio of cases (samples) to independent variables should be at least 20:1; that is, for every explanatory variable, there should be at least 20 samples. At least five samples per variable are required in extreme cases.
3. Missing data – Missing data in the training data can make the model unfit for the sampled data.
4. Multicollinearity – If explanatory variables are highly correlated (0.9 and above), the regression is vulnerable to bias. Singularity is the extreme case of a perfect correlation of 1. The remedy is to remove explanatory variables that exhibit such high correlation. If there is a tie, the tolerance (1 − R²) is used, and variables with very low tolerance are eliminated.
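A quick sketch of the multicollinearity check described above, using hypothetical explanatory variables:

```python
import numpy as np

# Hypothetical explanatory variables; x2 is nearly a multiple of x1
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.1, 3.9, 6.1, 8.0, 10.2])  # ~ 2 * x1, highly correlated
x3 = np.array([5.0, 1.0, 4.0, 2.0, 3.0])   # unrelated variable

# Pairwise correlation matrix of the explanatory variables
R = np.corrcoef([x1, x2, x3])
print(np.round(R, 3))
# Off-diagonal entries of 0.9 and above (here corr(x1, x2) is ~1)
# flag variable pairs that are candidates for removal.
```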
Introduction To Linear Regression
Assumptions of Linear Regression
Linear regression typically rests on the following standard assumptions:
• Linearity: the relationship between x and y is linear.
• Independence: the observations and their errors are independent of one another.
• Homoscedasticity: the residuals have constant variance across values of x.
• Normality: the residuals are approximately normally distributed.
• No (or little) multicollinearity among the independent variables.
Ordinary Least Squares (OLS)
• The OLS approach fits a straight line through the data points.
• The goal is to minimize the errors (residuals) between the observed values and the predicted values on the line.
• The residual for a data point is:
ei = yi − ŷi, where ŷi = a0 + a1xi is the value predicted by the line.
Error Minimization Approaches
Define the Cost Function (Loss Function)
The cost function is the sum of squared residuals over all n data points:
J(a0, a1) = Σ (yi − (a0 + a1xi))²
Calculate Optimal Parameters
Minimizing J with respect to a0 and a1 gives the OLS estimates:
a1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
a0 = ȳ − a1x̄
where x̄ and ȳ are the means of x and y.
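The parameters can also be obtained iteratively rather than in closed form. Below is a minimal gradient-descent sketch of this cost minimization (illustrative; the learning rate and iteration count are arbitrary choices, not values from the slides):

```python
import numpy as np

def fit_line_gd(x, y, lr=0.01, iters=5000):
    """Minimize the mean squared error of y = a0 + a1*x by gradient descent."""
    a0, a1, n = 0.0, 0.0, len(x)
    for _ in range(iters):
        residual = y - (a0 + a1 * x)
        a0 += lr * 2.0 / n * np.sum(residual)      # step opposite dJ/da0
        a1 += lr * 2.0 / n * np.sum(residual * x)  # step opposite dJ/da1
    return a0, a1

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([12, 18, 22, 28, 35], dtype=float)
print(fit_line_gd(x, y))  # approaches the OLS solution (6.2, 5.6)
```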
Interpret the Regression Line
The slope a1 gives the change in y per unit change in x, and the intercept a0 gives the predicted y when x = 0.
Consider the following dataset, where the week and the number of working hours per week spent by a research scholar in a library are tabulated. Based on the dataset, predict the number of hours that the research scholar will spend in the library in the 7th and 9th weeks. Apply the linear regression model.
X (Week) 1 2 3 4 5
Y (Hours) 12 18 22 28 35
Here x̄ = 3 and ȳ = 23, so
a1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 56 / 10 = 5.6
a0 = ȳ − a1x̄ = 23 − 5.6 × 3 = 6.2
The regression equation is given as y = 6.2 + 5.6x.
Predictions: 7th week, y = 6.2 + 5.6 × 7 = 45.4 hours; 9th week, y = 6.2 + 5.6 × 9 = 56.6 hours.
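A short NumPy sketch that reproduces this worked example with the closed-form OLS formulas:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)       # week
y = np.array([12, 18, 22, 28, 35], dtype=float)  # hours in the library

# Closed-form OLS estimates for y = a0 + a1*x
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()
print(a0, a1)                      # 6.2 5.6
print(a0 + a1 * np.array([7, 9]))  # [45.4 56.6] -> weeks 7 and 9
```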
Consider the five weeks' sales data (in thousands) shown below in the table.
Apply the linear regression technique to predict the 7th and 9th week sales.
Linear Regression in Matrix Form
The linear regression equation for each data point is given by:
yi = a0 + a1xi + ei, for i = 1, …, n
This can be written as a system of equations:
y1 = a0 + a1x1 + e1
y2 = a0 + a1x2 + e2
⋮
yn = a0 + a1xn + en
Express in Matrix Form
This is written as Y = Xa + e, where
Y = [y1, y2, …, yn]ᵀ, X = [[1, x1], [1, x2], …, [1, xn]], a = [a0, a1]ᵀ
Estimating Coefficients Using Matrix Algebra
To find the best-fit line (least squares solution), minimize the sum of squared errors ‖Y − Xa‖². Setting the gradient to zero yields the normal equation:
(XᵀX) a = XᵀY, so a = (XᵀX)⁻¹XᵀY
X (Week) 1 2 3 4 5
Y (Hours) 12 18 22 28 35
Find the linear regression of the data. Use linear regression in matrix form.
Step 1: Create matrices X and Y
X = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], Y = [12, 18, 22, 28, 35]ᵀ
Step 2: Apply the Normal Equation a = (XᵀX)⁻¹XᵀY
XᵀX = [[5, 15], [15, 55]] and XᵀY = [115, 401]ᵀ
Inverse of a 2×2 Matrix
For a matrix [[p, q], [r, s]], the inverse is 1/(ps − qr) × [[s, −q], [−r, p]]. Here the determinant is 5 × 55 − 15 × 15 = 50, so
(XᵀX)⁻¹ = (1/50) × [[55, −15], [−15, 5]]
a = (1/50) × [[55, −15], [−15, 5]] × [115, 401]ᵀ = (1/50) × [310, 280]ᵀ = [6.2, 5.6]ᵀ
Final Regression Equation
y = 6.2 + 5.6x
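The same result via the normal equation in NumPy (a sketch; solving the linear system is preferred over forming the inverse explicitly):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([12, 18, 22, 28, 35], dtype=float)

X = np.column_stack([np.ones_like(x), x])  # design matrix with bias column
a = np.linalg.solve(X.T @ X, X.T @ Y)      # normal equation (X'X)a = X'Y
print(a)  # [6.2 5.6]
```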
Find the linear regression of the data. Use linear regression in matrix form.
Week (x) 1 2 3 4
Sales (y) 1 3 4 8
Multiple Linear Regression
Multiple Linear Regression (MLR) is a statistical technique that models the relationship between one
dependent variable and two or more independent variables. It extends simple linear regression, which
involves only one independent variable, to capture more complex real-world scenarios where multiple factors
influence an outcome.
The multiple regression on two variables x1 and x2 is given as:
y = a0 + a1x1 + a2x2
In general, this is given for 'n' independent variables as:
y = a0 + a1x1 + a2x2 + … + anxn
Using multiple regression, fit a line for the following dataset shown in the table. Here, z is the equity, x is the net sales, and y is the asset; z is the dependent variable and x, y are independent variables. All the data is in million dollars.
z x y
4 12 8
6 18 12
7 22 16
8 28 36
11 35 42
The design matrix X and the target vector Z are given as follows:
X = [[1, 12, 8], [1, 18, 12], [1, 22, 16], [1, 28, 36], [1, 35, 42]], Z = [4, 6, 7, 8, 11]ᵀ
The regression coefficients can be found from the normal equation a = (XᵀX)⁻¹XᵀZ. Substituting the values, one gets:
XᵀX = [[5, 115, 114], [115, 2961, 3142], [114, 3142, 3524]] and XᵀZ = [36, 919, 966]ᵀ
Solving this 3 × 3 system gives a0 ≈ −0.414, a1 ≈ 0.396, a2 ≈ −0.066.
Therefore, the regression line is given as
z ≈ −0.414 + 0.396x − 0.066y
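A NumPy check of this multiple-regression fit (a sketch; np.linalg.lstsq solves the least-squares problem directly):

```python
import numpy as np

x = np.array([12, 18, 22, 28, 35], dtype=float)  # net sales
y = np.array([8, 12, 16, 36, 42], dtype=float)   # assets
z = np.array([4, 6, 7, 8, 11], dtype=float)      # equity (dependent)

X = np.column_stack([np.ones_like(x), x, y])     # design matrix [1, x, y]
a, *_ = np.linalg.lstsq(X, z, rcond=None)
print(np.round(a, 3))  # approx [-0.414  0.396 -0.066]
```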
Apply multiple regression for the values given in the table, where weekly sales along with sales for products x1 and x2 are provided. Use the matrix approach for finding the multiple regression.
x1 (Product One Sales) x2 (Product Two Sales) y (Weekly Sales in Thousands)
1 4 1
2 5 6
3 8 8
4 2 12
Polynomial Regression
When the relationship between the independent and dependent variables is non-linear, standard
linear regression may not accurately model the data, resulting in large errors.
To address this, two main approaches can be used:
1. Transformation of non-linear data to linear data, so that the linear regression can handle the
data
2. Using polynomial regression
Transformation of Non-linear Data to Linear
This approach involves transforming the non-linear equation into a linear form, allowing the use of linear regression techniques. Common transformations include:
• Exponential: y = a·e^(bx) becomes ln y = ln a + bx
• Power: y = a·x^b becomes log y = log a + b·log x
• Logarithmic: y = a + b·ln x is linear in the transformed variable x' = ln x
• Reciprocal: y = a + b/x is linear in the transformed variable x' = 1/x
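As an example of such a transformation, an exponential trend can be fitted with ordinary linear regression after taking logarithms (a sketch with made-up data):

```python
import numpy as np

# Hypothetical data following y ~ a * exp(b*x)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.7, 7.4, 20.1, 54.6, 148.4], dtype=float)

# Transform: ln(y) = ln(a) + b*x, then fit a straight line in (x, ln y)
b, ln_a = np.polyfit(x, np.log(y), deg=1)
print(np.exp(ln_a), b)  # close to a = 1, b = 1 for this data
```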
Polynomial Regression
Polynomial regression is a technique used to model non-linear relationships between the independent
variable x and the dependent variable y by fitting a polynomial equation of degree n. It provides a flexible
approach to capture curvilinear trends in data without transforming the variables.
Polynomial regression provides a non-linear curve such as a quadratic or a cubic.
The second-degree polynomial (called quadratic transformation) is given as:
y = a0 + a1x + a2x²
The third-degree polynomial (called cubic transformation) is given as:
y = a0 + a1x + a2x² + a3x³
Generally, polynomials of maximum degree 4 are used, as higher-order polynomials take on strange shapes and make the curve overly flexible, which leads to overfitting; hence they are avoided.
Polynomial Regression in Matrix Form
For a second-degree polynomial, the least-squares normal equations can be written in matrix form:
[[N, Σx, Σx²], [Σx, Σx², Σx³], [Σx², Σx³, Σx⁴]] × [a0, a1, a2]ᵀ = [Σy, Σxy, Σx²y]ᵀ
This is of the form:
X a = B
Where:
• X is the matrix of sums of powers of x,
• a is the column vector of coefficients,
• B is the column vector of target sums.
To solve for the coefficients: a = X⁻¹B
Consider the data provided in Table and fit it using the second-order polynomial.
x y
1 6
2 11
3 18
4 27
5 38
Find the best-fitting quadratic polynomial of the form:
y = a0 + a1x + a2x²
Compute Summations
x y x² x³ x⁴ x·y x²·y
1 6 1 1 1 6 6
2 11 4 8 16 22 44
3 18 9 27 81 54 162
4 27 16 64 256 108 432
5 38 25 125 625 190 950
N = 5, Σx = 15, Σy = 100, Σx² = 55, Σx³ = 225, Σx⁴ = 979, Σxy = 380, Σx²y = 1594
Set Up Normal Equations
5a0 + 15a1 + 55a2 = 100
15a0 + 55a1 + 225a2 = 380
55a0 + 225a1 + 979a2 = 1594
Final Polynomial
Solving the system gives a0 = 3, a1 = 2, a2 = 1, so
y = 3 + 2x + x²
Check Fit
At x = 1, 2, 3, 4, 5 the polynomial gives 6, 11, 18, 27, 38, which matches the data exactly (zero residuals).
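A one-line check of this quadratic fit with NumPy (illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([6, 11, 18, 27, 38], dtype=float)

# np.polyfit returns coefficients from the highest degree to the lowest
a2, a1, a0 = np.polyfit(x, y, deg=2)
print(round(a0, 3), round(a1, 3), round(a2, 3))  # 3.0 2.0 1.0 -> y = 3 + 2x + x^2
```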
Logistic Regression
Linear regression predicts a numerical response but is not suitable for predicting categorical variables.
When categorical variables are involved, the task is called a classification problem. Logistic regression is suitable for binary classification problems.
Here, the output is a categorical variable.
For example, the following scenarios are instances of predicting categorical variables:
1. Is the mail spam or not spam? The answer is yes or no; the categorical dependent variable is a binary response of yes or no.
2. Whether a student should be admitted or not is based on entrance examination marks; the categorical response is admitted or not admitted.
3. Whether a student passes or fails is based on the marks secured.
Logistic Regression
Logistic regression is used as a binary classifier and works by predicting the probability of the categorical variable.
In general, it takes one or more features x and predicts the response y.
If the probability were predicted via plain linear regression, it would be given as:
p(x) = a0 + a1x
Linear regression generates values in the range −∞ to +∞, whereas the probability of the response variable ranges between 0 and 1. Hence, there must be a mapping function to map values from (−∞, +∞) to (0, 1).
The core of the mapping function in the logistic regression method is the sigmoid function: an 'S'-shaped function that yields values between 0 and 1 (its inverse is known as the logit function). It is mathematically represented as:
σ(z) = 1 / (1 + e^(−z))
where
• z: a real-valued input (here, the linear combination of the independent variables)
• e: Euler's number (~2.718)
The logistic function applied to the linear combination of the inputs is given by:
p(x) = 1 / (1 + e^(−(a0 + a1x)))
This function is S-shaped and maps any real value to the range (0, 1).
Here,
• x is the explanatory or predictor variable,
• e is the Euler number,
• a0, a1 are the regression coefficients.
The coefficients a0, a1 can be learned, and the predictor then assigns the class from p(x) using the threshold function:
ŷ = 1 (positive class) if p(x) ≥ 0.5, otherwise ŷ = 0
Let us assume a binomial logistic regression problem where the classes are pass and fail. The student dataset has entrance marks along with historic data on who was selected or not selected. Based on the logistic regression, the values of the learnt parameters are a0 = 1 and a1 = 8. Assuming marks of x = 60, compute the resultant class.
Given:
a0 = 1
a1 = 8
x = 60
Compute z:
z = a0 + a1x = 1 + 8 × 60 = 481
Compute the sigmoid function:
p(x) = 1 / (1 + e^(−481)) ≈ 1
Taking the threshold value as 0.5, p(x) ≈ 1 ≥ 0.5; therefore, the candidate with marks 60 is classified as pass (selected).
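A tiny sketch of this prediction step (illustrative; note that for z = 481 the sigmoid saturates to 1 in floating point):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(x, a0, a1, threshold=0.5):
    p = sigmoid(a0 + a1 * x)
    return ("pass" if p >= threshold else "fail"), p

print(predict_class(60, a0=1.0, a1=8.0))  # ('pass', 1.0): sigmoid saturates
```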