Program for Spearman's Rank Correlation
Last Updated: 23 May, 2023
Prerequisite: Correlation Coefficient
Given two arrays X and Y, find Spearman's rank correlation coefficient. Instead of working with the data values themselves (as discussed in Correlation Coefficient), Spearman's rank correlation works with the ranks of those values: the observations are first ranked, and the ranks are then used to compute the correlation. The algorithm is as follows:
Rank each observation in X and store it in Rank_X
Rank each observation in Y and store it in Rank_Y
Obtain Pearson Correlation Coefficient for Rank_X and Rank_Y
The formula used to calculate Pearson's correlation coefficient (r or rho) of sets X and Y is as follows:
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Algorithm for calculating Pearson's coefficient of sets X and Y:
function correlationCoefficient(X, Y)
    n = X.size
    sigma_x = sigma_y = sigma_xy = 0
    sigma_xsq = sigma_ysq = 0
    for i in 0...n-1
        sigma_x = sigma_x + X[i]
        sigma_y = sigma_y + Y[i]
        sigma_xy = sigma_xy + X[i] * Y[i]
        sigma_xsq = sigma_xsq + X[i] * X[i]
        sigma_ysq = sigma_ysq + Y[i] * Y[i]
    num = n * sigma_xy - sigma_x * sigma_y
    den = sqrt((n * sigma_xsq - sigma_x^2) * (n * sigma_ysq - sigma_y^2))
    return num / den
While assigning ranks, ties may occur, i.e. two or more observations may have the same value and hence the same rank. To resolve ties, we use the fractional ranking scheme. In this scheme, if n observations share the same rank, each of them gets a fractional rank given by:
fractional_rank = rank + (n-1)/2
The next rank that gets assigned is rank + n, not rank + 1. For instance, if 3 items share the rank r, each gets the fractional rank given above (here r + 1), and the next rank that can be given to another observation is r + 3. Note that fractional ranks need not be fractions: they are the arithmetic mean of the n consecutive ranks r, r + 1, r + 2, ..., r + n-1.
(r + r+1 + r+2 + ... + r+n-1) / n = r + (n-1)/2
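The identity above can be checked numerically. The snippet below is a minimal sketch; the helper name `fractional_rank` is chosen here for illustration and does not appear in the programs that follow.

```python
def fractional_rank(r, n):
    """Arithmetic mean of the n consecutive ranks r, r+1, ..., r+n-1."""
    return sum(range(r, r + n)) / n


# Three tied observations that would occupy ranks 4, 5 and 6
r, n = 4, 3
print(fractional_rank(r, n))   # 5.0
print(r + (n - 1) / 2)         # 5.0

# Two tied observations occupying ranks 1 and 2
print(fractional_rank(1, 2))   # 1.5
```

Both expressions agree, and the second call shows a case where the fractional rank really is a fraction.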
Some examples:
Input : X = [15 18 19 20 21]
Y = [25 26 28 27 29]
Solution : Rank_X = [1 2 3 4 5]
Rank_Y = [1 2 4 3 5 ]
sigma_x = 1+2+3+4+5 = 15
sigma_y = 1+2+4+3+5 = 15
sigma_xy = 1*1+2*2+3*4+4*3+5*5 = 54
sigma_xsq = 1*1+2*2+3*3+4*4+5*5 = 55
sigma_ysq = 1*1+2*2+3*3+4*4+5*5 = 55
Substitute values in formula
Coefficient = Pearson(Rank_X, Rank_Y) = 0.9
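The substitution step for this example can be written out explicitly. This is a small sketch that plugs the sums listed above into the Pearson formula; the variable names simply mirror the ones used in the example.

```python
from math import sqrt

# Sums taken from the worked example above
n = 5
sigma_x, sigma_y = 15, 15
sigma_xy = 54
sigma_xsq, sigma_ysq = 55, 55

# Numerator: 5 * 54 - 15 * 15 = 45
num = n * sigma_xy - sigma_x * sigma_y

# Denominator: sqrt((5 * 55 - 225) * (5 * 55 - 225)) = sqrt(50 * 50) = 50
den = sqrt((n * sigma_xsq - sigma_x ** 2) * (n * sigma_ysq - sigma_y ** 2))

coefficient = num / den
print(coefficient)  # 0.9
```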
Input: X = [15 18 21 15 21 ]
Y = [25 25 27 27 27 ]
Solution: Rank_X = [1.5 3 4.5 1.5 4.5]
Rank_Y = [1.5 1.5 4 4 4]
Calculate and substitute values of sigma_x, sigma_y,
sigma_xy, sigma_xsq, sigma_ysq.
Coefficient = Pearson(Rank_X, Rank_Y) = 0.456435
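For the tied example, the intermediate sums are less obvious, so the sketch below computes them from the rank vectors before substituting. The variable names follow the earlier pseudocode; the values in the comments are what the sums evaluate to for these ranks.

```python
from math import sqrt

rank_x = [1.5, 3, 4.5, 1.5, 4.5]
rank_y = [1.5, 1.5, 4, 4, 4]
n = len(rank_x)

sigma_x = sum(rank_x)                                  # 15.0
sigma_y = sum(rank_y)                                  # 15.0
sigma_xy = sum(x * y for x, y in zip(rank_x, rank_y))  # 48.75
sigma_xsq = sum(x * x for x in rank_x)                 # 54.0
sigma_ysq = sum(y * y for y in rank_y)                 # 52.5

num = n * sigma_xy - sigma_x * sigma_y                 # 18.75
den = sqrt((n * sigma_xsq - sigma_x ** 2) * (n * sigma_ysq - sigma_y ** 2))

coefficient = num / den
print(round(coefficient, 6))  # 0.456435
```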
The algorithm for the fractional ranking scheme is given below:
function rankify(X)
    N = X.size()
    // Vector to store ranks
    Rank_X(N)
    for i = 0 ... N-1
        r = 1 and s = 1
        // Count smaller (r) and equal (s) elements in 0...i-1
        for j = 0 ... i-1
            if X[j] < X[i]
                r = r + 1
            if X[j] == X[i]
                s = s + 1
        // Count smaller (r) and equal (s) elements in i+1...N-1
        for j = i+1 ... N-1
            if X[j] < X[i]
                r = r + 1
            if X[j] == X[i]
                s = s + 1
        // Assign fractional rank
        Rank_X[i] = r + (s-1) * 0.5
    return Rank_X
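The ranking above compares every pair of elements. As an alternative, the same fractional ranks can be obtained by sorting the indices and averaging the ranks within each run of equal values. The following is a sketch of that sort-based variant, not the method used in the programs below; the function name `rankify_sorted` is an assumption made for this example.

```python
def rankify_sorted(values):
    """Fractional ranks via sorting; tied values share the mean of their ranks."""
    n = len(values)
    # Indices of the observations, ordered by their values
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n

    start = 0
    while start < n:
        # Extend the group while the next sorted value equals the current one
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        # The group occupies 1-based ranks start+1 .. end+1; assign their mean
        average_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


print(rankify_sorted([15, 18, 21, 15, 21]))  # [1.5, 3.0, 4.5, 1.5, 4.5]
print(rankify_sorted([25, 25, 27, 27, 27]))  # [1.5, 1.5, 4.0, 4.0, 4.0]
```

The sort dominates the cost, so this variant runs in O(N log N) time, at the price of slightly more bookkeeping than the pairwise version.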
Note:
There is a direct formula to calculate Spearman's coefficient, given by $r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the two ranks of the i-th observation. However, this formula is exact only when there are no ties; with ties, a correction term is needed for each tie, and hence the formula is not discussed here. Calculating Spearman's coefficient as the correlation coefficient of the ranks is the most general method.
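To illustrate why the uncorrected direct formula is avoided, the sketch below compares it with the Pearson-on-ranks method on the two examples from this article. The helper names `pearson` and `direct_formula` are chosen for this illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))


def direct_formula(rank_x, rank_y):
    """Uncorrected direct formula 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    d_sq = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_sq / (n * (n * n - 1))


# Ranks without ties: both methods agree
rx, ry = [1, 2, 3, 4, 5], [1, 2, 4, 3, 5]
print(round(pearson(rx, ry), 6), round(direct_formula(rx, ry), 6))  # 0.9 0.9

# Ranks with ties: the uncorrected direct formula gives a different value
tx, ty = [1.5, 3, 4.5, 1.5, 4.5], [1.5, 1.5, 4, 4, 4]
print(round(pearson(tx, ty), 6), round(direct_formula(tx, ty), 6))  # 0.456435 0.55
```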
Programs to evaluate Spearman's coefficient are given below:
C++
// Program to find correlation
// coefficient
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;
typedef vector<float> Vector;
// Utility Function to print
// a Vector
void printVector(const Vector &X)
{
for (auto i: X)
cout << i << " ";
cout << endl;
}
// Function returns the rank vector
// of the set of observations
Vector rankify(Vector & X) {
int N = X.size();
// Rank Vector
Vector Rank_X(N);
for(int i = 0; i < N; i++)
{
int r = 1, s = 1;
// Count no of smaller elements
// in 0 to i-1
for(int j = 0; j < i; j++) {
if (X[j] < X[i] ) r++;
if (X[j] == X[i] ) s++;
}
// Count no of smaller elements
// in i+1 to N-1
for (int j = i+1; j < N; j++) {
if (X[j] < X[i] ) r++;
if (X[j] == X[i] ) s++;
}
// Use Fractional Rank formula
// fractional_rank = r + (n-1)/2
Rank_X[i] = r + (s-1) * 0.5;
}
// Return Rank Vector
return Rank_X;
}
// function that returns
// Pearson correlation coefficient.
float correlationCoefficient
(Vector &X, Vector &Y)
{
int n = X.size();
float sum_X = 0, sum_Y = 0,
sum_XY = 0;
float squareSum_X = 0,
squareSum_Y = 0;
for (int i = 0; i < n; i++)
{
// sum of elements of array X.
sum_X = sum_X + X[i];
// sum of elements of array Y.
sum_Y = sum_Y + Y[i];
// sum of X[i] * Y[i].
sum_XY = sum_XY + X[i] * Y[i];
// sum of square of array elements.
squareSum_X = squareSum_X +
X[i] * X[i];
squareSum_Y = squareSum_Y +
Y[i] * Y[i];
}
// use formula for calculating
// correlation coefficient.
float corr = (float)(n * sum_XY -
sum_X * sum_Y) /
sqrt((n * squareSum_X -
sum_X * sum_X) *
(n * squareSum_Y -
sum_Y * sum_Y));
return corr;
}
// Driver function
int main()
{
Vector X = {15,18,21, 15, 21};
Vector Y= {25,25,27,27,27};
// Get ranks of vector X
Vector rank_x = rankify(X);
// Get ranks of vector y
Vector rank_y = rankify(Y);
cout << "Vector X" << endl;
printVector(X);
// Print rank vector of X
cout << "Rankings of X" << endl;
printVector(rank_x);
// Print Vector Y
cout << "Vector Y" << endl;
printVector(Y);
// Print rank vector of Y
cout << "Rankings of Y" << endl;
printVector(rank_y);
// Print Spearmans coefficient
cout << "Spearman's Rank correlation: "
<< endl;
cout<<correlationCoefficient(rank_x,
rank_y);
return 0;
}
Java
// Java Program to find correlation
// coefficient
import java.util.*;
class GFG
{
// Utility Function to print
// a Vector
static void printVector(ArrayList<Double> X)
{
for (double i : X)
System.out.print(i + " ");
System.out.println();
}
// Function returns the rank vector
// of the set of observations
static ArrayList<Double> rankify(ArrayList<Double> X)
{
int N = X.size();
// Rank Vector
ArrayList<Double> Rank_X = new ArrayList<Double>();
for (int i = 0; i < N; i++) {
Rank_X.add(0d);
int r = 1, s = 1;
// Count no of smaller elements
// in 0 to i-1
for (int j = 0; j < i; j++) {
if (X.get(j) < X.get(i))
r++;
// Compare values, not Double object references
if (X.get(j).equals(X.get(i)))
s++;
}
// Count no of smaller elements
// in i+1 to N-1
for (int j = i + 1; j < N; j++) {
if (X.get(j) < X.get(i))
r++;
// Compare values, not Double object references
if (X.get(j).equals(X.get(i)))
s++;
}
// Use Fractional Rank formula
// fractional_rank = r + (n-1)/2
Rank_X.set(i, (r + (s - 1) * 0.5));
}
// Return Rank Vector
return Rank_X;
}
// function that returns
// Pearson correlation coefficient.
static double
correlationCoefficient(ArrayList<Double> X,
ArrayList<Double> Y)
{
int n = X.size();
double sum_X = 0, sum_Y = 0, sum_XY = 0;
double squareSum_X = 0, squareSum_Y = 0;
for (int i = 0; i < n; i++) {
// sum of elements of array X.
sum_X = sum_X + X.get(i);
// sum of elements of array Y.
sum_Y = sum_Y + Y.get(i);
// sum of X[i] * Y[i].
sum_XY = sum_XY + X.get(i) * Y.get(i);
// sum of square of array elements.
squareSum_X = squareSum_X + X.get(i) * X.get(i);
squareSum_Y = squareSum_Y + Y.get(i) * Y.get(i);
}
// use formula for calculating
// correlation coefficient.
double corr
= (n * sum_XY - sum_X * sum_Y)
/ Math.sqrt(
(n * squareSum_X - sum_X * sum_X)
* (n * squareSum_Y - sum_Y * sum_Y));
return corr;
}
// Driver function
public static void main(String[] args)
{
ArrayList<Double> X = new ArrayList<Double>(
Arrays.asList(15d, 18d, 21d, 15d, 21d));
ArrayList<Double> Y = new ArrayList<Double>(
Arrays.asList(25d, 25d, 27d, 27d, 27d));
// Get ranks of vector X
ArrayList<Double> rank_x = rankify(X);
// Get ranks of vector y
ArrayList<Double> rank_y = rankify(Y);
System.out.println("Vector X");
printVector(X);
// Print rank vector of X
System.out.println("Rankings of X");
printVector(rank_x);
// Print Vector Y
System.out.println("Vector Y");
printVector(Y);
// Print rank vector of Y
System.out.println("Rankings of Y");
printVector(rank_y);
// Print Spearmans coefficient
System.out.println("Spearman's Rank correlation: ");
System.out.println(
correlationCoefficient(rank_x, rank_y));
}
}
// This code is contributed by phasing17
Python3
# Python3 Program to find correlation coefficient


# Utility Function to print
# a Vector
def printVector(X):
    print(*X)


# Function returns the rank vector
# of the set of observations
def rankify(X):
    N = len(X)

    # Rank Vector
    Rank_X = [None for _ in range(N)]

    for i in range(N):
        r = 1
        s = 1

        # Count no of smaller (r) and equal (s)
        # elements in 0 to i-1
        for j in range(i):
            if (X[j] < X[i]):
                r += 1
            if (X[j] == X[i]):
                s += 1

        # Count no of smaller (r) and equal (s)
        # elements in i+1 to N-1
        for j in range(i+1, N):
            if (X[j] < X[i]):
                r += 1
            if (X[j] == X[i]):
                s += 1

        # Use Fractional Rank formula
        # fractional_rank = r + (s-1)/2
        Rank_X[i] = r + (s-1) * 0.5

    # Return Rank Vector
    return Rank_X


# function that returns
# Pearson correlation coefficient.
def correlationCoefficient(X, Y):
    n = len(X)
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    for i in range(n):
        # sum of elements of array X.
        sum_X = sum_X + X[i]

        # sum of elements of array Y.
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of square of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

    # use formula for calculating
    # correlation coefficient.
    corr = (n * sum_XY - sum_X * sum_Y) / (
        (n * squareSum_X - sum_X * sum_X)
        * (n * squareSum_Y - sum_Y * sum_Y)) ** 0.5

    return corr


# Driver function
X = [15, 18, 21, 15, 21]
Y = [25, 25, 27, 27, 27]

# Get ranks of vector X
rank_x = rankify(X)

# Get ranks of vector y
rank_y = rankify(Y)

print("Vector X")
printVector(X)

# Print rank vector of X
print("Rankings of X")
printVector(rank_x)

# Print Vector Y
print("Vector Y")
printVector(Y)

# Print rank vector of Y
print("Rankings of Y")
printVector(rank_y)

# Print Spearmans coefficient
print("Spearman's Rank correlation: ")
print(correlationCoefficient(rank_x, rank_y))
# This code is contributed by phasing17
C#
// Program to find correlation
// coefficient
using System;
using System.Collections.Generic;
class GFG {
// Utility Function to print
// a Vector
static void printVector(List<double> X)
{
foreach(var i in X) Console.Write(i + " ");
Console.WriteLine();
}
// Function returns the rank vector
// of the set of observations
static List<double> rankify(List<double> X)
{
int N = X.Count;
// Rank Vector
List<double> Rank_X = new List<double>();
for (int i = 0; i < N; i++) {
Rank_X.Add(0);
int r = 1, s = 1;
// Count no of smaller elements
// in 0 to i-1
for (int j = 0; j < i; j++) {
if (X[j] < X[i])
r++;
if (X[j] == X[i])
s++;
}
// Count no of smaller elements
// in i+1 to N-1
for (int j = i + 1; j < N; j++) {
if (X[j] < X[i])
r++;
if (X[j] == X[i])
s++;
}
// Use Fractional Rank formula
// fractional_rank = r + (n-1)/2
Rank_X[i] = (r + (s - 1) * 0.5);
}
// Return Rank Vector
return Rank_X;
}
// function that returns
// Pearson correlation coefficient.
static double correlationCoefficient(List<double> X,
List<double> Y)
{
int n = X.Count;
double sum_X = 0, sum_Y = 0, sum_XY = 0;
double squareSum_X = 0, squareSum_Y = 0;
for (int i = 0; i < n; i++) {
// sum of elements of array X.
sum_X = sum_X + X[i];
// sum of elements of array Y.
sum_Y = sum_Y + Y[i];
// sum of X[i] * Y[i].
sum_XY = sum_XY + X[i] * Y[i];
// sum of square of array elements.
squareSum_X = squareSum_X + X[i] * X[i];
squareSum_Y = squareSum_Y + Y[i] * Y[i];
}
// use formula for calculating
// correlation coefficient.
double corr
= (n * sum_XY - sum_X * sum_Y)
/ Math.Sqrt(
(n * squareSum_X - sum_X * sum_X)
* (n * squareSum_Y - sum_Y * sum_Y));
return corr;
}
// Driver function
public static void Main(string[] args)
{
List<double> X = new List<double>(
new double[] { 15, 18, 21, 15, 21 });
List<double> Y = new List<double>(
new double[] { 25, 25, 27, 27, 27 });
// Get ranks of vector X
List<double> rank_x = rankify(X);
// Get ranks of vector y
List<double> rank_y = rankify(Y);
Console.WriteLine("Vector X");
printVector(X);
// Print rank vector of X
Console.WriteLine("Rankings of X");
printVector(rank_x);
// Print Vector Y
Console.WriteLine("Vector Y");
printVector(Y);
// Print rank vector of Y
Console.WriteLine("Rankings of Y");
printVector(rank_y);
// Print Spearmans coefficient
Console.WriteLine("Spearman's Rank correlation: ");
Console.WriteLine(
correlationCoefficient(rank_x, rank_y));
}
}
// This code is contributed by phasing17
JavaScript
// Program to find correlation
// coefficient
// Utility Function to print
// a Vector
function printVector(X)
{
for (var i of X)
process.stdout.write(i + " ");
process.stdout.write("\n");
}
// Function returns the rank vector
// of the set of observations
function rankify(X) {
let N = X.length;
// Rank Vector
let Rank_X = new Array(N);
for(var i = 0; i < N; i++)
{
var r = 1, s = 1;
// Count no of smaller elements
// in 0 to i-1
for(var j = 0; j < i; j++) {
if (X[j] < X[i] ) r++;
if (X[j] == X[i] ) s++;
}
// Count no of smaller elements
// in i+1 to N-1
for (var j = i+1; j < N; j++) {
if (X[j] < X[i] ) r++;
if (X[j] == X[i] ) s++;
}
// Use Fractional Rank formula
// fractional_rank = r + (n-1)/2
Rank_X[i] = r + (s-1) * 0.5;
}
// Return Rank Vector
return Rank_X;
}
// function that returns
// Pearson correlation coefficient.
function correlationCoefficient
(X, Y)
{
let n = X.length;
let sum_X = 0, sum_Y = 0,
sum_XY = 0;
let squareSum_X = 0,
squareSum_Y = 0;
for (var i = 0; i < n; i++)
{
// sum of elements of array X.
sum_X = sum_X + X[i];
// sum of elements of array Y.
sum_Y = sum_Y + Y[i];
// sum of X[i] * Y[i].
sum_XY = sum_XY + X[i] * Y[i];
// sum of square of array elements.
squareSum_X = squareSum_X +
X[i] * X[i];
squareSum_Y = squareSum_Y +
Y[i] * Y[i];
}
// use formula for calculating
// correlation coefficient.
let corr = (n * sum_XY -
sum_X * sum_Y) /
Math.sqrt((n * squareSum_X -
sum_X * sum_X) *
(n * squareSum_Y -
sum_Y * sum_Y));
return corr;
}
// Driver function
let X = [15,18,21, 15, 21];
let Y= [25,25,27,27,27];
// Get ranks of vector X
let rank_x = rankify(X);
// Get ranks of vector y
let rank_y = rankify(Y);
console.log("Vector X");
printVector(X);
// Print rank vector of X
console.log("Rankings of X");
printVector(rank_x);
// Print Vector Y
console.log("Vector Y");
printVector(Y);
// Print rank vector of Y
console.log("Rankings of Y");
printVector(rank_y);
// Print Spearmans coefficient
console.log("Spearman's Rank correlation: ");
console.log(correlationCoefficient(rank_x,
rank_y));
// This code is contributed by phasing17
Output:
Vector X
15 18 21 15 21
Rankings of X
1.5 3 4.5 1.5 4.5
Vector Y
25 25 27 27 27
Rankings of Y
1.5 1.5 4 4 4
Spearman's Rank correlation:
0.456435
Time Complexity: O(N*N)
Auxiliary Space: O(N)
Python code to calculate Spearman's correlation using the SciPy library
We can use SciPy to calculate Spearman's correlation coefficient. SciPy is one of the most widely used Python libraries for mathematical calculations.
Python3
from scipy.stats import spearmanr
# sample data
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
# calculate Spearman's correlation coefficient and p-value
corr, pval = spearmanr(x, y)
# print the result
print("Spearman's correlation coefficient:", corr)
print("p-value:", pval)
Output:
Spearman's correlation coefficient: -0.9999999999999999
p-value: 1.4042654220543672e-24