Correlation Coefficient Formula: The correlation coefficient is a statistical measure used to quantify the relationship between two sets of values, such as the predicted and observed values in a statistical analysis. It indicates how closely the two sets of values move together.
Correlation coefficients are used to measure how strong the connection between two variables is. There are different types of correlation coefficients; one of the most popular is Pearson's correlation (also known as Pearson's R), which is commonly used in linear regression.
In this article, learn about the correlation coefficient formula, along with what correlation is, its types, examples, and problems.
What is Correlation?
Correlation is a statistical measure that describes the extent to which two variables are related to each other. It quantifies the direction and strength of the linear relationship between variables. Generally, the correlation between any two variables is one of three types:
- Positive Correlation
- Zero Correlation
- Negative Correlation
Correlation Coefficient Definition
A statistical measure that quantifies the strength and direction of the linear relationship between two variables is called the Correlation coefficient. Generally, it is denoted by the symbol 'r' and ranges from -1 to 1.
The correlation coefficient is used to determine how strong the relationship between two sets of data is. It yields a value between -1 and 1, in which:
- -1 indicates a perfect negative relationship
- 1 indicates a perfect positive relationship
- Zero implies no linear connection at all
Understanding Correlation Coefficient
- A correlation coefficient of -1 means that for every positive increase in one variable, the other decreases by a fixed proportion. For example, the amount of gas in a tank decreases in perfect correlation with speed.
- A correlation coefficient of 1 means that for every positive increase in one variable, the other increases by a fixed proportion. For example, shoe size goes up in perfect correlation with foot length.
- A correlation coefficient of 0 means that for every increase in one variable, there is neither a consistent increase nor a consistent decrease in the other. The two are not linearly related.
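The three cases above can be illustrated with a small sketch. It assumes the raw-sums form of Pearson's formula given later in this article; the function name `pearson_r` and the data sets are made up for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the raw-sums formula (illustrative helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 3 for x in xs]        # y rises by a fixed amount per unit of x
falling = [20 - 4 * x for x in xs]      # y falls by a fixed amount per unit of x
symmetric_xs = [-2, -1, 0, 1, 2]
curved = [x * x for x in symmetric_xs]  # depends on x, but not linearly

print(round(pearson_r(xs, rising), 4))            # perfect positive correlation
print(round(pearson_r(xs, falling), 4))           # perfect negative correlation
print(round(pearson_r(symmetric_xs, curved), 4))  # zero linear correlation
```

The last data set is a reminder that a coefficient of 0 rules out a linear relationship only: here y is fully determined by x, yet r is 0.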
The formulas for the various types of Correlation Coefficient are given below.
Pearson's Correlation Coefficient Formula is added below:
R = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
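As a rough sketch, the formula can be evaluated directly from the column totals of a worked table. The function name `pearson_from_sums` and its parameter names are assumptions made for this example; the totals plugged in are those of Problem 7 later in this article.

```python
import math

def pearson_from_sums(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2):
    """Evaluate Pearson's formula from precomputed totals (illustrative helper)."""
    numerator = n * sum_xy - sum_x * sum_y
    spread = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if spread <= 0:
        # Happens when one variable is constant, so r is undefined.
        raise ValueError("correlation is undefined when a variable has no spread")
    return numerator / math.sqrt(spread)

# Totals from Problem 7: n = 6, sum(xy) = 7307, sum(x) = 182, sum(y) = 209,
# sum(x^2) = 7098, sum(y^2) = 9587
r = pearson_from_sums(6, 7307, 182, 209, 7098, 9587)
print(round(r, 4))  # 0.5071
```

Each bracket under the square root must be positive for real data with some spread, which makes a negative bracket a quick sign of an arithmetic slip in the table.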
Sample Correlation Coefficient Formula is added below:
r_{xy} = \frac{S_{xy}}{S_x S_y}
where,
- Sxy is the Covariance of the Sample
- Sx and Sy are the Standard Deviations of the Sample
Population Correlation Coefficient Formula is added below:
\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}
where,
- σx and σy are the Population Standard Deviations
- σxy is Population Covariance
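The covariance-based formulas can be sketched as follows. This is a minimal illustration, not a library API: the helper names and the `ddof` parameter (1 for the sample divisor n - 1, 0 for the population divisor n) are assumptions, and the data is taken from Problem 5 of this article.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys, ddof):
    """Covariance with divisor n - ddof (ddof=1: sample, ddof=0: population)."""
    mean_x, mean_y = mean(xs), mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - ddof)

def std_dev(values, ddof):
    # The variance is the covariance of a variable with itself.
    return math.sqrt(covariance(values, values, ddof))

def correlation(xs, ys, ddof):
    return covariance(xs, ys, ddof) / (std_dev(xs, ddof) * std_dev(ys, ddof))

xs = [5, 9, 14, 16]
ys = [6, 10, 16, 20]
print(round(correlation(xs, ys, ddof=1), 4))  # sample form
print(round(correlation(xs, ys, ddof=0), 4))  # population form
```

For the same set of numbers both forms give the same value, because the divisor appears once in the covariance and once in the product of the standard deviations and so cancels. The two formulas differ in what the data represents, not in the arithmetic.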
Pearson's Correlation
It is the most common correlation measure in statistics. Its full name is Pearson's Product Moment Correlation, or PPMC for short. It shows the linear relationship between two sets of data. Two letters are used to represent the Pearson correlation: the Greek letter rho (ρ) for a population and the letter "r" for a sample correlation coefficient.
How to Find Pearson's Correlation Coefficient?
Follow the steps below to find Pearson's correlation coefficient for any given data set:
Step 1: First, make a table with the given data in columns for subject, x, and y, and add three more columns for xy, x² and y².
Step 2: Now multiply the x and y columns to fill the xy column. For example, if x is 24 and y is 65, then xy will be 24×65 = 1560.
Step 3: Now, take the square of the numbers in the x column and fill the x² column.
Step 4: Now, take the square of the numbers in the y column and fill the y² column.
Step 5: Now, add up all the values in the columns and put the result at the bottom. Greek letter sigma (Σ) is the short way of saying summation.
Step 6: Now, use the formula for Pearson's correlation coefficient:
R = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
The sign of the result tells us whether the correlation is positive or negative.
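Steps 1 to 5 can be sketched in a few lines. The helper name `build_table` is an assumption for this example, and the data is taken from Problem 5 of this article.

```python
def build_table(xs, ys):
    """Return the rows (x, y, xy, x², y²) and the column totals."""
    rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals

rows, totals = build_table([5, 9, 14, 16], [6, 10, 16, 20])

print("X | Y | XY | X² | Y²")
for row in rows:
    print(" | ".join(str(value) for value in row))
print("∑ " + " | ".join(str(value) for value in totals))
```

The totals row gives ∑x, ∑y, ∑xy, ∑x² and ∑y² in that order, which are exactly the quantities Step 6 plugs into the formula.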
Linear Correlation Coefficient
Pearson's correlation coefficient is the linear correlation coefficient, which returns a value between -1 and +1. Here, -1 indicates a perfect negative correlation and +1 indicates a perfect positive correlation. If the value is 0, there is no linear correlation. This is also known as zero correlation.
The "crude estimates" for interpreting the strength of correlations using Pearson's Correlation:
r Value | Crude Estimates |
---|
+.70 or higher | A very strong positive relationship |
+.40 to +.69 | Strong positive relationship |
+.30 to +.39 | Moderate positive relationship |
+.20 to +.29 | Weak positive relationship |
+.01 to +.19 | No or negligible relationship |
0 | No relationship [zero correlation] |
-.01 to -.19 | No or negligible relationship |
-.20 to -.29 | Weak negative relationship |
-.30 to -.39 | Moderate negative relationship |
-.40 to -.69 | Strong negative relationship |
-.70 or lower | Very strong negative relationship |
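The table above can be turned into a small lookup, sketched here under one stated assumption: the table lists two-decimal bands, so values that fall between bands (such as 0.695) are assigned to the band below the next threshold. The function name `describe_r` is made up for this example.

```python
def describe_r(r):
    """Map an r value to the crude estimate from the table (illustrative)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No relationship (zero correlation)"
    size = abs(r)
    direction = "positive" if r > 0 else "negative"
    if size >= 0.70:
        return f"Very strong {direction} relationship"
    if size >= 0.40:
        return f"Strong {direction} relationship"
    if size >= 0.30:
        return f"Moderate {direction} relationship"
    if size >= 0.20:
        return f"Weak {direction} relationship"
    return "No or negligible relationship"

print(describe_r(0.3471))  # Moderate positive relationship
print(describe_r(-0.25))   # Weak negative relationship
```

Rejecting values outside the range -1 to 1 is a deliberate choice: such a value can only come from an arithmetic error, so it is better to fail than to label it.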
Cramer’s V Correlation
It is similar in purpose to the Pearson correlation coefficient, but it measures the association between categorical variables. It is used for tables with more than 2×2 rows and columns. Cramer's V varies between 0 and 1. A value close to 0 indicates very little association between the variables, and a value close to 1 indicates a very strong association.
The "crude estimates" for interpreting the strength of correlations using Cramer's V Correlation:
Cramer’s V | Crude Estimates |
---|
.25 or higher | Very strong relationship |
.15 to .25 | Strong relationship |
.11 to .15 | Moderate relationship |
.06 to .10 | Weak relationship |
.01 to .05 | No or negligible relationship |
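This article does not give the formula for Cramer's V, so the following is only a sketch based on the standard definition V = √(χ² / (n · min(rows − 1, columns − 1))), without any bias correction. The function name `cramers_v` and both contingency tables are invented for illustration.

```python
import math

def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    smaller_dim = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * smaller_dim))

# Hypothetical counts: each row category maps to exactly one column category.
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
# Hypothetical counts: the second row is just twice the first, so no association.
independent = [[10, 20, 30], [20, 40, 60]]

print(round(cramers_v(perfect), 4))      # 1.0
print(round(cramers_v(independent), 4))  # 0.0
```

The sketch assumes every row and column has a non-zero total; an empty row or column would make an expected count zero and the division fail.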
Problem 1: Calculate the correlation coefficient from the following table:
SUBJECT | AGE (X) | GLUCOSE LEVEL (Y) |
---|
1 | 42 | 98 |
2 | 23 | 68 |
3 | 22 | 73 |
4 | 47 | 79 |
5 | 50 | 88 |
6 | 60 | 82 |
Solution:
Make a table from the given data and add three more columns of XY, X², and Y².
SUBJECT | AGE (X) | GLUCOSE LEVEL (Y) | XY | X² | Y² |
---|
1 | 42 | 98 | 4116 | 1764 | 9604 |
2 | 23 | 68 | 1564 | 529 | 4624 |
3 | 22 | 73 | 1606 | 484 | 5329 |
4 | 47 | 79 | 3713 | 2209 | 6241 |
5 | 50 | 88 | 4400 | 2500 | 7744 |
6 | 60 | 82 | 4920 | 3600 | 6724 |
∑ | 244 | 488 | 20319 | 11086 | 40266 |
∑xy = 20319
∑x = 244
∑y = 488
∑x² = 11086
∑y² = 40266
n = 6
Put all the values in the Pearson's correlation coefficient formula:
R = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
R = [6(20319) - (244)(488)] / √{[6(11086)-(244)²][6(40266)-(488)²]}
R = 2842 / √([6980][3452])
R = 2842/4908.662
R = 0.5790
It shows that the relationship between the variables of the data is a strong positive relationship.
Problem 2: Calculate the correlation coefficient from the following table:
SUBJECT | AGE (X) | Weight (Y) |
---|
1 | 40 | 99 |
2 | 25 | 79 |
3 | 22 | 69 |
4 | 54 | 89 |
Solution:
Make a table from the given data and add three more columns of XY, X², and Y².
SUBJECT | AGE (X) | Weight (Y) | XY | X² | Y² |
---|
1 | 40 | 99 | 3960 | 1600 | 9801 |
2 | 25 | 79 | 1975 | 625 | 6241 |
3 | 22 | 69 | 1518 | 484 | 4761 |
4 | 54 | 89 | 4806 | 2916 | 7921 |
∑ | 141 | 336 | 12259 | 5625 | 28724 |
∑xy = 12259
∑x = 141
∑y = 336
∑x² = 5625
∑y² = 28724
n = 4
Put all the values in the Pearson's correlation coefficient formula:
R = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
R = [4(12259) - (141)(336)] / √{[4(5625)-(141)²][4(28724)-(336)²]}
R = 1660 / √([2619][2000])
R = 1660/2288.668
R = 0.7253
It shows that the relationship between the variables of the data is a very strong positive relationship.
Problem 3: Calculate the correlation coefficient for the following data:
X = 7,9,14 and Y = 17,19,21
Solution:
Given variables are,
X = 7,9,14
and,
Y = 17,19,21
To find the correlation coefficient of the given variables, first construct a table as follows to get the values required in the formula.
X | Y | XY | X² | Y² |
---|
7 | 17 | 119 | 49 | 289 |
9 | 19 | 171 | 81 | 361 |
14 | 21 | 294 | 196 | 441 |
∑ 30 | ∑ 57 | ∑ 584 | ∑ 326 | ∑ 1091 |
∑xy = 584
∑x = 30
∑y = 57
∑x² = 326
∑y² = 1091
n = 3
Put all the values in the Pearson's correlation coefficient formula:
R = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
R = [3(584) - (30)(57)] / √{[3(326)-(30)²][3(1091)-(57)²]}
R = 42 / √([78][24])
R = 42/43.267
R = 0.9707
It shows that the relationship between the variables of the data is a very strong positive relationship.
Problem 4: Calculate the correlation coefficient for the following data:
X = 21, 31, 25, 40, 47, 38 and Y = 70,55,60,78,66,80
Solution:
Given variables are,
X = 21,31,25,40,47,38
And,
Y = 70,55,60,78,66,80
To find the correlation coefficient of the given variables, first construct a table as follows to get the values required in the formula.
X | Y | XY | X² | Y² |
---|
21 | 70 | 1470 | 441 | 4900 |
31 | 55 | 1705 | 961 | 3025 |
25 | 60 | 1500 | 625 | 3600 |
40 | 78 | 3120 | 1600 | 6084 |
47 | 66 | 3102 | 2209 | 4356 |
38 | 80 | 3040 | 1444 | 6400 |
∑202 | ∑409 | ∑13937 | ∑7280 | ∑28365 |
∑xy = 13937
∑x = 202
∑y = 409
∑x² = 7280
∑y² = 28365
n = 6
Put all the values in the Pearson's correlation coefficient formula:
R = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
R = [6(13937) - (202)(409)] / √{[6(7280) - (202)²][6(28365) - (409)²]}
R = 1004 /√[2876][2909]
R = 1004 / 2892.452938
R = 0.3471
It shows that the relationship between the variables of the data is a moderate positive relationship.
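As a cross-check, the same result can be reached through the equivalent deviation-from-mean form of the coefficient, r = ∑(x − x̄)(y − ȳ) / √(∑(x − x̄)² ∑(y − ȳ)²). This form is not used elsewhere in this article, so treat the snippet as an independent sketch; the variable names are made up for illustration.

```python
import math

xs = [21, 31, 25, 40, 47, 38]
ys = [70, 55, 60, 78, 66, 80]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Sums of products and squares of the deviations from each mean.
dev_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
dev_xx = sum((x - mean_x) ** 2 for x in xs)
dev_yy = sum((y - mean_y) ** 2 for y in ys)

r = dev_xy / math.sqrt(dev_xx * dev_yy)
print(round(r, 4))  # 0.3471
```

Working from the raw data rather than from the table totals makes this a useful guard against transcription errors in the XY, X² and Y² columns.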
Problem 5: Calculate the correlation coefficient for the following data:
X = 5, 9, 14, 16 and Y = 6, 10, 16, 20
Solution:
Given variables are,
X = 5 ,9 ,14, 16
And
Y = 6, 10, 16, 20.
To find the correlation coefficient of the given variables, first construct a table as follows, then add up all the values in each column to get the totals used in the formula.
X | Y | XY | X² | Y² |
---|
5 | 6 | 30 | 25 | 36 |
9 | 10 | 90 | 81 | 100 |
14 | 16 | 224 | 196 | 256 |
16 | 20 | 320 | 256 | 400 |
∑44 | ∑52 | ∑664 | ∑558 | ∑792 |
∑xy = 664
∑x = 44
∑y = 52
∑x² = 558
∑y² = 792
n = 4
Put all the values in the Pearson's correlation coefficient formula:
R = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
R = 4(664) - (44)(52) / √[4(558) - (44)²][4(792) - (52)²]
R = 368 / √[296][464]
R = 368/370.599
R = 0.9930
It shows that the relationship between the variables of the data is a very strong positive relationship.
Problem 6: Calculate the correlation coefficient for the following data:
X = 10, 13, 15 ,17 ,19 and Y = 5,10,15,20,25.
Solution:
Given variables are,
X = 10, 13, 15 ,17 ,19 and Y = 5, 10, 15, 20, 25.
To find the correlation coefficient of the given variables, first construct a table as follows, then add up all the values in each column to get the totals used in the formula.
X | Y | XY | X² | Y² |
---|
10 | 5 | 50 | 100 | 25 |
13 | 10 | 130 | 169 | 100 |
15 | 15 | 225 | 225 | 225 |
17 | 20 | 340 | 289 | 400 |
19 | 25 | 475 | 361 | 625 |
∑74 | ∑75 | ∑1220 | ∑1144 | ∑1375 |
∑xy = 1220
∑x = 74
∑y = 75
∑x² = 1144
∑y² = 1375
n = 5
Put all the values in the Pearson's correlation coefficient formula:
R = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
R = [5(1220) - (74)(75)] / √{[5(1144) - (74)²][5(1375) - (75)²]}
R = 550 / √([244][1250])
R = 550/552.268
R = 0.9959
It shows that the relationship between the variables of the data is a very strong positive relationship.
Problem 7: Calculate the correlation coefficient for the following data:
X = 12, 10, 42, 27, 35, 56 and Y = 13, 15, 56, 34, 65, 26
Solution:
Given variables are,
X = 12, 10, 42, 27, 35, 56 and Y = 13, 15, 56, 34, 65, 26
To find the correlation coefficient of the given variables, first construct a table as follows, then add up all the values in each column to get the totals used in the formula.
X | Y | XY | X² | Y² |
---|
12 | 13 | 156 | 144 | 169 |
10 | 15 | 150 | 100 | 225 |
42 | 56 | 2352 | 1764 | 3136 |
27 | 34 | 918 | 729 | 1156 |
35 | 65 | 2275 | 1225 | 4225 |
56 | 26 | 1456 | 3136 | 676 |
∑182 | ∑209 | ∑7307 | ∑7098 | ∑9587 |
∑xy = 7307
∑x = 182
∑y = 209
∑x² = 7098
∑y² = 9587
n = 6
Put all the values in the Pearson's correlation coefficient formula:
R = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
R = 6(7307) - (182)(209) / √ {[6(7098) - (182)²][6(9587)-(209)²]}
R = 5804 / √[9464][13841]
R = 5804/11445.139
R = 0.5071
It shows that the relationship between the variables of the data is a strong positive relationship.
The correlation coefficient serves as a statistical tool to assess the relationship between two variables in a dataset. Represented by the symbol r, its value ranges from -1 to 1, indicating the strength and direction of the linear association. A correlation of 1 signifies a perfect positive linear relationship, while -1 indicates a perfect negative linear relationship. A value of 0 implies no linear relationship. The formula to calculate the correlation coefficient involves the number of data points, the sum of products of corresponding values of the variables, and their sums and squares. This coefficient aids in understanding the extent to which one variable can predict the other, providing valuable insights in various fields including economics, social sciences, and engineering.