BIOSTATISTICS
LECT6: CORRELATION AND REGRESSION ANALYSIS
DR. ECEM YEĞİN
Correlation and Regression Analysis
These methods allow us to understand the relationships
between variables, determine the strength and direction of
those relationships, and even estimate the value of one
variable from another.
They play a critical role in many areas, especially in medical
research, such as determining risk factors, evaluating the
effectiveness of diagnostic tests, predicting treatment
outcomes, and understanding disease etiology.
Correlation and Regression Analysis
We can discover whether there is a relationship between two or
more variables, and if there is a relationship, the direction and
strength of the relationship, with "correlation analysis".
The analysis that examines how one variable changes when the
other changes by a certain unit is "regression analysis".
Correlation Coefficient (r):
• The most commonly used statistical value to measure the strength
and direction of a linear relationship between two variables is the
Pearson correlation coefficient (r).
• This coefficient takes values between -1 and +1.
• r = +1: Perfect positive correlation. Example: Under ideal
conditions, as the dose of a drug increases, the blood level
increases at the same rate.
• r = -1: Perfect negative correlation. Example: As the dose of a
drug increases, the pain score decreases at the same rate.
• r = 0: No correlation. There is no linear relationship between the
variables.
Correlation Coefficient (r):
• 0 < |r| < 1: A weak, moderate, or strong linear relationship.
The interpretation of values in this range may vary depending on
the domain and context being studied, but as a general guide:
• |r| < 0.3: Weak correlation
• 0.3 ≤ |r| < 0.7: Moderate correlation
• |r| ≥ 0.7: Strong correlation
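The rule-of-thumb bands above can be sketched as a small helper function. This is a minimal sketch; the thresholds (0.3 and 0.7) are the general guide from the slide, not a universal standard, and field-specific conventions may differ.

```python
def correlation_strength(r: float) -> str:
    """Classify a Pearson correlation coefficient by its absolute value,
    using the rule-of-thumb bands from the slide (0.3 and 0.7)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    a = abs(r)
    if a < 0.3:
        return "weak"
    elif a < 0.7:
        return "moderate"
    return "strong"

print(correlation_strength(0.25))   # weak
print(correlation_strength(-0.55))  # moderate
print(correlation_strength(0.85))   # strong
```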
Correlation Coefficient (r):
[Scatter plots: r = -1 (perfect negative relationship), r = 0 (no relationship), r = +1 (perfect positive relationship)]
Scatter plots provide general information about the relationship
between two variables. However, in order to comment on the
amount of relationship, we need to calculate the correlation
coefficient.
NOTES:
Important Points:
• Correlation Does Not Mean Causality! A strong correlation between two variables does
not mean that one causes the other. There may be a third factor (confounding variable) or
the relationship may be completely coincidental. Classic example: A positive correlation can
be observed between ice cream sales and drownings in the summer months, but it cannot
be concluded that eating ice cream causes drowning. Both events are associated with warm
weather and increased water activity.
• Measures Linear Relationship: The Pearson correlation coefficient only measures linear
relationships between variables.
• Suitable for Continuous Variables: Correlation analysis is generally used for continuous
(measurable, numerical) variables. Different methods such as the Chi-Square test are used
to examine relationships between categorical variables.
EXAMPLE
• A research team wanted to study the relationship between
children's height and shoe size. The height (cm) and shoe
size of 5 randomly selected children were recorded as
follows:
Child No. Height (cm) Shoe Size
1 110 30
2 115 32
3 120 33
4 125 35
5 130 36
Answer:
• Now let's calculate the Pearson correlation coefficient (r)
between these two variables.
• Step 1: Calculate the mean of each variable.
• Average Height (x̄): (110 + 115 + 120 + 125 + 130) / 5
= 600 / 5 = 120 cm
• Average Shoe Size (ȳ): (30 + 32 + 33 + 35 + 36) / 5
= 166 / 5 = 33.2
Answer
• Step 2: Calculate the difference of each data point from the
mean.
Child No. Height (x) x−x̄ Shoe Size (y) y−ȳ
1 110 -10 30 -3.2
2 115 -5 32 -1.2
3 120 0 33 -0.2
4 125 5 35 1.8
5 130 10 36 2.8
Answer
• Step 3: Calculate the product terms and squares.
Child No. x−x̄ y−ȳ (x−x̄)(y−ȳ) (x−x̄)² (y−ȳ)²
1 -10 -3.2 32 100 10.24
2 -5 -1.2 6 25 1.44
3 0 -0.2 0 0 0.04
4 5 1.8 9 25 3.24
5 10 2.8 28 100 7.84
Total 75 250 22.8
Answer
• Step 4: Calculate the Pearson correlation coefficient (r).
• r = Σ(x−x̄)(y−ȳ) / √(Σ(x−x̄)² · Σ(y−ȳ)²)
= 75 / √(250 × 22.8) = 75 / √5700 ≈ 75 / 75.50 ≈ 0.993
Answer
• Step 5: Comment; The correlation coefficient (r) we obtained is
approximately 0.993. Since this value is very close to +1, it shows
that there is a very strong and positive linear relationship between
children's height and shoe size.
• We can say that as height increases, shoe size also tends to
increase. This simple example shows how correlation measures the
direction and strength of a linear relationship between two
continuous variables.
• NOTE: This strong correlation does not mean that height directly
"causes" shoe size, but it can suggest that these two variables are
related to the growth process.
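The five steps of the worked example above can be reproduced in a few lines of Python. This sketch uses only the data from the slide and the deviation-from-the-mean formula for r:

```python
import math

# Height (cm) and shoe size of the 5 children from the example.
heights = [110, 115, 120, 125, 130]
shoe_sizes = [30, 32, 33, 35, 36]

n = len(heights)
mean_x = sum(heights) / n      # 120 cm
mean_y = sum(shoe_sizes) / n   # 33.2

# Sum of products of deviations and sums of squared deviations.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, shoe_sizes))  # 75
sxx = sum((x - mean_x) ** 2 for x in heights)                                # 250
syy = sum((y - mean_y) ** 2 for y in shoe_sizes)                             # 22.8

# Pearson correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # 0.993
```

The intermediate sums (75, 250, 22.8) match the totals in the Step 3 table.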
Regression Analysis:
It aims to predict the value of a dependent variable (output
variable or response variable) using one or more independent
variables (predictor variables) and to express this relationship
with a mathematical model.
It also helps us understand how much change a unit change in
the independent variables causes in the dependent variable.
[Scatter plots: (+) directional linear relationship, (−) directional linear relationship, nonlinear relationship, no relationship]
Simple Linear Regression:
• The most basic type of regression. It examines the linear
relationship between a single continuous independent variable and
a single continuous dependent variable. This relationship is
expressed mathematically as a straight line equation:
y = a + bx
• y = Value of the dependent variable
• a = Intercept of the regression line (constant value)
• b = Slope of the regression line
• x = Value of the independent variable
Example: Does blood sugar increase as BMI increases?
Multiple Linear Regression:
• It is used to examine the effect of more than one continuous
independent variable on the dependent variable.
• The model is expressed as follows:
y = b0 + b1x1 + b2x2 + ... + bpxp
• x1, x2, ..., xp represent the independent variables
• b1, b2, ..., bp represent the coefficients.
Example: Estimating HbA1c level based on age, BMI and physical
activity
Assumptions of Regression Analysis:
• In order for the results of regression analysis to be reliable, some basic
assumptions must be met:
• Linearity: The relationship between the independent variables and the
dependent variable is expected to be linear.
• Independence of Residuals: The residuals must be independent of each
other (there should be no autocorrelation). This assumption is especially
important in time series data.
• Homoscedasticity of Residuals: The variance of the residuals must be
constant across all values of the independent variables. Heteroscedasticity
(non-constant variance) can undermine the reliability of the model.
• Normal Distribution of Residuals: The residuals are assumed to have a
normal distribution. This assumption is especially important for hypothesis
testing and confidence intervals.
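The assumptions above are usually checked on the residuals after fitting. The sketch below is illustrative only, with made-up data: it fits a simple least-squares line and runs two crude checks (residuals summing to zero, comparable residual spread across the x-range). In practice one would use dedicated diagnostics (e.g. Durbin-Watson for autocorrelation, Breusch-Pagan for heteroscedasticity, Shapiro-Wilk for normality) from a statistics library.

```python
def fit_simple_ols(xs, ys):
    """Least-squares slope b and intercept a for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Made-up illustrative data (roughly y = 2x with small noise).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

a, b = fit_simple_ols(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# OLS residuals always sum to (numerically) zero; this is a sanity
# check on the fit, not an assumption test.
print(abs(sum(residuals)) < 1e-9)  # True

# Crude homoscedasticity check: residual spread in the lower half of
# the x-range should be comparable to the upper half.
half = len(residuals) // 2
spread_low = sum(r ** 2 for r in residuals[:half]) / half
spread_high = sum(r ** 2 for r in residuals[half:]) / (len(residuals) - half)
print(spread_low, spread_high)
```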
Example
• A researcher is studying the relationship between sleep duration
(hours) and a student's performance on an exam (score). The sleep
durations and exam scores of 3 randomly selected students are
recorded as follows: The researcher wants to perform simple linear
regression analysis to predict exam scores based on sleep
duration.
Student No. Sleep Duration (hours) (x) Exam Score (y)
1 6 60
2 7 70
3 8 80
Example
• Step 1: Calculate Averages.
• Average Sleep Duration (x̄): (6 + 7 + 8) / 3 = 21 / 3 = 7 hours
• Average Exam Score (ȳ): (60 + 70 + 80) / 3 = 210 / 3 = 70 points
Example
• Step 2: Calculate the Slope (b) and Y-intercept (a).
Student No. xi yi xi−x̄ yi−ȳ (xi−x̄)(yi−ȳ) (xi−x̄)²
1 6 60 -1 -10 10 1
2 7 70 0 0 0 0
3 8 80 1 10 10 1
Total 20 2
• Slope: b = Σ(xi−x̄)(yi−ȳ) / Σ(xi−x̄)² = 20 / 2 = 10
• Y-intercept: a = ȳ − b·x̄ = 70 − 10 × 7 = 0
Example
• Step 3: Write the Regression Equation.
• The predicted test score (ŷ) can be modeled with sleep duration
(x) as follows:
ŷ = a + bx = 0 + 10x = 10x
Example
• Step 4: Interpret the Equation.
• Y-intercept (a = 0): Theoretically, if a student sleeps 0 hours, the test
score would be expected to be 0.
• Slope (b = 10): For every 1-hour increase in sleep time, the student's
test score would be expected to increase by 10 points, on average.
Example
• Step 5: Make a Prediction (Example).
• If a student sleeps for 7.5 hours, we can predict their test
score: ŷ = 10 × 7.5 = 75 points
• According to our simple model, a student who sleeps for 7.5 hours
is expected to score 75 on the test.
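Steps 1–5 above can be reproduced directly with the three (sleep duration, score) pairs from the table, using the least-squares formulas for the slope and intercept:

```python
# Sleep duration (hours) and exam scores of the 3 students.
xs = [6, 7, 8]
ys = [60, 70, 80]

n = len(xs)
mean_x = sum(xs) / n   # 7 hours
mean_y = sum(ys) / n   # 70 points

# Slope b = Σ(xi−x̄)(yi−ȳ) / Σ(xi−x̄)², intercept a = ȳ − b·x̄.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)   # 20 / 2 = 10
a = mean_y - b * mean_x                  # 70 − 10 × 7 = 0

print(a, b)         # 0.0 10.0
print(a + b * 7.5)  # predicted score for 7.5 h of sleep: 75.0
```

With three perfectly collinear points the fit is exact, which is why the intercept comes out to exactly 0 and the slope to exactly 10.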
Applications of Correlation and Regression in Medical
Research
• Identifying Risk Factors: For example, quantifying how much smoking increases
the risk of lung cancer.
• Evaluating Diagnostic Tests: Examining the correlation between the results of
a new diagnostic test and the results of a gold standard test and evaluating how
reliable the new test is.
• Predicting Treatment Effectiveness: Modeling the relationship between
patients' baseline characteristics (age, disease severity, etc.) and treatment
outcomes (recovery time, risk of complications, etc.) using regression analysis
and determining the factors that affect treatment success.
• Examining Drug Dose-Response Relationships: Evaluating the effect of
different doses of a drug on patient response using regression analysis and
helping determine the optimal dose.
• Epidemiological Studies: Analyzing the correlation of environmental factors or
lifestyle habits with disease incidence and the strength of this relationship.
CONCLUSION
• Correlation and regression analysis are powerful and widely used
tools in medical research to understand relationships between
variables, determine the strength and direction of these
relationships, and predict future values.
• However, applying these methods correctly, checking their
assumptions, and carefully interpreting their results are critical to
obtaining clinically meaningful and reliable results, especially
keeping in mind that correlation does not imply causation.
Thank you.
Dr. Ecem YEĞİN