χ² Test of Independence - I
Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT STUDIES
Agenda
• To understand the χ² test of independence
χ² Test of Independence
• It is used to analyze the frequencies of two variables with multiple
categories to determine whether the two variables are independent.
• Qualitative Variables
• Nominal Data
χ² Test of Independence: Investment Example
• In which region of the country do you reside?
A. Northeast B. Midwest C. South D. West
• Which type of financial investment are you most likely to make today?
E. Stocks F. Bonds G. Treasury bills
Contingency Table

                    Type of Financial Investment
                    E      F      G      Total
Geographic   A                    O13    nA
Region       B                           nB
             C                           nC
             D                           nD
             Total  nE     nF     nG     N

O13 = observed frequency for region A (row 1) and investment G (column 3)
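χ² Test of Independence: Investment Example
If A and F are independent, then
$$P(A \cap F) = P(A)\,P(F), \qquad P(A) = \frac{n_A}{N}, \qquad P(F) = \frac{n_F}{N},$$
so
$$P(A \cap F) = \frac{n_A}{N} \cdot \frac{n_F}{N}.$$
The expected frequency of cell AF is therefore
$$e_{AF} = N \cdot P(A \cap F) = N \left(\frac{n_A}{N}\right)\left(\frac{n_F}{N}\right) = \frac{n_A \, n_F}{N}.$$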
Contingency Table

                    Type of Financial Investment
                    E      F      G      Total
Geographic   A             e12           nA
Region       B                           nB
             C                           nC
             D                           nD
             Total  nE     nF     nG     N

e12 = expected frequency for region A (row 1) and investment F (column 2)
χ² Test of Independence: Formulas
Expected frequencies:
$$e_{ij} = \frac{n_i \, n_j}{N}$$
where:
i = the row
j = the column
n_i = the total of row i
n_j = the total of column j
N = the total of all frequencies
χ² Test of Independence: Formulas
Calculated (observed) χ²:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where:
f_o = the observed frequency in a cell
f_e = the expected frequency in a cell
df = (r - 1)(c - 1)
r = the number of rows
c = the number of columns
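The expected-frequency and test-statistic formulas above translate almost directly into code. The following is a minimal sketch in plain Python, not the lecture's own code; the function name `chi_square_independence` is an assumption made for this illustration. The sample counts are made up (they do not come from the slides) and are chosen so that one row is an exact multiple of the other, which is what perfect independence looks like in a table, so the statistic comes out as zero.

```python
def chi_square_independence(observed):
    """Return (statistic, df, expected) for an r x c table of counts.

    `observed` is a list of rows. The name and signature are illustrative.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # e_ij = (total of row i) * (total of column j) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # chi-square = sum over all cells of (f_o - f_e)^2 / f_e
    statistic = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for fo, fe in zip(obs_row, exp_row):
            statistic += (fo - fe) ** 2 / fe

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Made-up counts (not from the slides): row 2 is exactly 3 x row 1,
# so observed and expected frequencies coincide.
statistic, df, expected = chi_square_independence([[10, 20], [30, 60]])
print(statistic, df)  # 0.0 1
print(expected)       # [[10.0, 20.0], [30.0, 60.0]]
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same computation; note that by default it applies a continuity correction to 2 x 2 tables, so its result for such tables can differ from this sketch.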
Example for Independence
χ² Test of Independence
H0: Type of gasoline is independent of income
Ha: Type of gasoline is not independent of income
χ² Test of Independence
r = 4 (income categories), c = 3 (types of gasoline)

                     Type of Gasoline
Income               Regular   Premium   Extra Premium
Less than $30,000
$30,000 to $49,999
$50,000 to $99,000
At least $100,000
χ² Test of Independence: Gasoline Preference Versus Income Category
α = .01
$$df = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6$$
$$\chi^2_{.01,\,6} = 16.812$$
If $\chi^2_{Cal} > 16.812$, reject H0.
If $\chi^2_{Cal} \le 16.812$, do not reject H0.
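The critical value 16.812 can be checked numerically. The sketch below is an illustration, not the lecture's own code. It uses only the standard library and relies on the closed-form upper-tail probability of the chi-square distribution that holds when df is even, then finds the critical value by bisection. Both function names are assumptions made for this example.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of df.

    For df = 2k the tail equals exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


def chi2_critical_even_df(alpha, df, upper=1000.0):
    """Find x with P(X > x) = alpha by bisection (the tail decreases in x)."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail_even_df(mid, df) > alpha:
            lo = mid   # tail still too large: critical value is further right
        else:
            hi = mid
    return (lo + hi) / 2.0


critical = chi2_critical_even_df(alpha=0.01, df=6)
print(round(critical, 3))  # 16.812
```

With SciPy installed, `scipy.stats.chi2.ppf(0.99, 6)` should return the same value and works for any df, odd or even.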
Python code
Gasoline Preference Versus Income Category: Observed Frequencies

                     Type of Gasoline
Income               Regular   Premium   Extra Premium   Total
Less than $30,000    85        16        6               107
$30,000 to $49,999   102       27        13              142
$50,000 to $99,000   36        22        15              73
At least $100,000    15        23        25              63
Total                238       88        59              385
Gasoline Preference Versus Income Category: Expected Frequencies
$$e_{ij} = \frac{n_i \, n_j}{N}$$
$$e_{11} = \frac{(107)(238)}{385} = 66.15$$
$$e_{12} = \frac{(107)(88)}{385} = 24.46$$
$$e_{13} = \frac{(107)(59)}{385} = 16.40$$

Observed frequencies, with expected frequencies in parentheses:

                     Type of Gasoline
Income               Regular        Premium       Extra Premium   Total
Less than $30,000    85 (66.15)     16 (24.46)    6 (16.40)       107
$30,000 to $49,999   102 (87.78)    27 (32.46)    13 (21.76)      142
$50,000 to $99,000   36 (45.13)     22 (16.69)    15 (11.19)      73
At least $100,000    15 (38.95)     23 (14.40)    25 (9.65)       63
Total                238            88            59              385
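As a check on the expected frequencies, the sketch below recomputes every e_ij from the observed counts. It is plain Python written for illustration; the variable names are assumptions, not the lecture's code.

```python
# Observed frequencies from the slide: rows are income categories,
# columns are Regular, Premium, Extra Premium.
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

row_totals = [sum(row) for row in observed]        # 107, 142, 73, 63
col_totals = [sum(col) for col in zip(*observed)]  # 238, 88, 59
n = sum(row_totals)                                # 385

# e_ij = n_i * n_j / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

Rounded to two decimals, the printed rows should match the expected frequencies quoted on the slide, for example 66.15, 24.46 and 16.4 in the first row.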
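Gasoline Preference Versus Income Category: χ² Calculation
$$
\begin{aligned}
\chi^2 &= \sum \frac{(f_o - f_e)^2}{f_e} \\
&= \frac{(85 - 66.15)^2}{66.15} + \frac{(16 - 24.46)^2}{24.46} + \frac{(6 - 16.40)^2}{16.40} \\
&\quad + \frac{(102 - 87.78)^2}{87.78} + \frac{(27 - 32.46)^2}{32.46} + \frac{(13 - 21.76)^2}{21.76} \\
&\quad + \frac{(36 - 45.13)^2}{45.13} + \frac{(22 - 16.69)^2}{16.69} + \frac{(15 - 11.19)^2}{11.19} \\
&\quad + \frac{(15 - 38.95)^2}{38.95} + \frac{(23 - 14.40)^2}{14.40} + \frac{(25 - 9.65)^2}{9.65} \\
&= 70.75
\end{aligned}
$$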
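Gasoline Preference Versus Income Category: Conclusion
df = 6, α = 0.01
[Figure: chi-square distribution with df = 6. The non-rejection region lies to the left of the critical value 16.812; the rejection region, with area 0.01, lies to the right.]
$\chi^2_{Cal} = 70.78 > 16.812$, reject H0.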
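The calculation and the decision for the gasoline example can be reproduced in a few lines. This is a sketch in plain Python at α = .01, not the lecture's own code, and the variable names are illustrative. Because it keeps the expected frequencies unrounded, the statistic may differ slightly in the second decimal place from a hand calculation with rounded values; the decision is unaffected.

```python
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (f_o - f_e)^2 / f_e over all 12 cells
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi_sq += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (4 - 1)(3 - 1) = 6
critical_value = 16.812  # table value for alpha = .01, df = 6 (from the slide)

print(f"chi-square = {chi_sq:.2f}, df = {df}")
if chi_sq > critical_value:
    print("Reject H0: type of gasoline is not independent of income.")
else:
    print("Do not reject H0.")
```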
Contingency Tables
• Useful in situations involving multiple population proportions
• Used to classify sample observations according to two or more
characteristics
• Also called a cross-classification table.
Contingency Table Example
Hand Preference vs. Gender
Dominant Hand: Left vs. Right
Gender: Male vs. Female
• 2 categories for each variable, so the table is called a 2 x 2 table
• Suppose we examine a sample of 300 college students
Contingency Table Example
Sample results organized in a contingency table:
sample size = n = 300:
120 Females, 12 were left handed
180 Males, 24 were left handed

                  Gender
Hand Preference   Female   Male   Total
Left              12       24     36
Right             108      156    264
Total             120      180    300
Contingency Table Example
H0: π1 = π2 (The proportion of females who are left handed is equal to the proportion of males who are left handed)
H1: π1 ≠ π2 (The two proportions are not the same; hand preference is not independent of gender)
• If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males.
• The two proportions above should be the same as the proportion of left-handed people overall.
The Chi-Square Test Statistic
The Chi-square test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
where:
fo = observed frequency in a particular cell
fe = expected frequency in a particular cell if H0 is true
χ² for the 2 x 2 case has 1 degree of freedom
Assumed: each cell in the contingency table has expected frequency of at least 5
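The Chi-Square Test Statistic
The χ² test statistic approximately follows a chi-square distribution with one degree of freedom.
Decision Rule:
If $\chi^2 > \chi^2_U$, reject H0; otherwise, do not reject H0.
[Figure: chi-square distribution. The region from 0 to $\chi^2_U$ is "Do not reject H0"; the region beyond $\chi^2_U$ is "Reject H0".]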
Observed vs. Expected Frequencies

                  Gender
Hand Preference   Female              Male                Total
Left              Observed = 12       Observed = 24       36
                  Expected = 14.4     Expected = 21.6
Right             Observed = 108      Observed = 156      264
                  Expected = 105.6    Expected = 158.4
Total             120                 180                 300
The Chi-Square Test Statistic
The test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$
The Chi-Square Test Statistic
The test statistic is $\chi^2 = 0.7576$; $\chi^2_U$ with 1 d.f. = 3.841 (α = .05)
Decision Rule:
If $\chi^2 > 3.841$, reject H0; otherwise, do not reject H0.
Here, $\chi^2 = 0.7576 < \chi^2_U = 3.841$, so you do not reject H0 and conclude that there is insufficient evidence that the two proportions are different.
[Figure: chi-square distribution with 1 d.f. The region from 0 to $\chi^2_U = 3.841$ is "Do not reject H0"; the region beyond it, with area α = .05, is "Reject H0".]
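The whole 2 x 2 example can be verified with the standard library alone. The sketch below is illustrative, not the lecture's code, and its variable names are assumptions. It recomputes the expected counts and the statistic, and adds a p-value using the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 degree of freedom.

```python
import math

# Rows: Left, Right; columns: Female, Male (counts from the slide)
observed = [
    [12, 24],
    [108, 156],
]

row_totals = [sum(row) for row in observed]        # 36, 264
col_totals = [sum(col) for col in zip(*observed)]  # 120, 180
n = sum(row_totals)                                # 300

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n     # e.g. 36 * 120 / 300 = 14.4
        chi_sq += (fo - fe) ** 2 / fe

# Upper-tail probability, valid for 1 degree of freedom only
p_value = math.erfc(math.sqrt(chi_sq / 2.0))

print(round(chi_sq, 4))  # 0.7576
print(p_value > 0.05)    # True, so H0 is not rejected at alpha = .05
```

Comparing the p-value with α leads to the same decision as comparing the statistic with the critical value 3.841.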
χ² Test for the Differences Among More Than Two Proportions
• Extend the χ² test to the case with more than two independent populations:
H0: π1 = π2 = … = πc
H1: Not all of the πj are equal (j = 1, 2, …, c)
The Chi-Square Test Statistic
The Chi-square test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
where:
• fo = observed frequency in a particular cell of the 2 x c table
• fe = expected frequency in a particular cell if H0 is true
• χ² for the 2 x c case has (2-1)(c-1) = c - 1 degrees of freedom
Assumed: each cell in the contingency table has expected frequency of at least 5
χ² Test with More Than Two Proportions: Example
The sharing of patient records is a controversial issue in health care. A survey
of 500 respondents asked whether they objected to their records being
shared by insurance companies, by pharmacies, and by medical researchers.
The results are summarized in the following table:
χ² Test with More Than Two Proportions: Example

                           Organization
Object to Record Sharing   Insurance Companies   Pharmacies   Medical Researchers   Row Sum
Yes                        410                   295          335                   1040
No                         90                    205          165                   460
Column Sum                 500                   500          500                   1500
χ² Test with More Than Two Proportions: Example
The overall proportion is:
$$p = \frac{X_1 + X_2 + \dots + X_c}{n_1 + n_2 + \dots + n_c} = \frac{410 + 295 + 335}{500 + 500 + 500} = 0.6933$$

                           Organization
Object to Record Sharing   Insurance Companies   Pharmacies     Medical Researchers
Yes                        fo = 410              fo = 295       fo = 335
                           fe = 346.667          fe = 346.667   fe = 346.667
No                         fo = 90               fo = 205       fo = 165
                           fe = 153.333          fe = 153.333   fe = 153.333
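The overall proportion and the expected frequencies in the table can be recomputed as follows. This is an illustrative plain-Python sketch; the dictionary keys and variable names are assumptions made for the example.

```python
# "Yes" (object to sharing) counts and sample sizes per organization, from the slide
yes_counts = {"Insurance Companies": 410, "Pharmacies": 295, "Medical Researchers": 335}
sample_sizes = {"Insurance Companies": 500, "Pharmacies": 500, "Medical Researchers": 500}

# Overall proportion: under H0 all groups share this one proportion
p_overall = sum(yes_counts.values()) / sum(sample_sizes.values())

# Expected "Yes" and "No" counts for each organization under H0
expected_yes = {org: size * p_overall for org, size in sample_sizes.items()}
expected_no = {org: size * (1 - p_overall) for org, size in sample_sizes.items()}

print(round(p_overall, 4))                   # 0.6933
print(round(expected_yes["Pharmacies"], 3))  # 346.667
print(round(expected_no["Pharmacies"], 3))   # 153.333
```

Because all three sample sizes are 500, the expected counts are the same in every column.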
χ² Test with More Than Two Proportions: Example
Contribution of each cell, (fo − fe)² / fe:

                           Organization
Object to Record Sharing   Insurance Companies   Pharmacies   Medical Researchers
Yes                        11.571                7.700        0.3926
No                         26.159                17.409       0.888

The Chi-square test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 64.1196$$
χ² Test with More Than Two Proportions: Example
H0: π1 = π2 = π3
H1: Not all of the πj are equal (j = 1, 2, 3)
Decision Rule: If $\chi^2 > \chi^2_U$, reject H0; otherwise, do not reject H0.
$\chi^2_U$ = 5.991 is from the chi-square distribution with 2 degrees of freedom.
Conclusion: Since 64.1196 > 5.991, you reject H0 and conclude that the proportion of respondents who object to their records being shared differs for at least one of the three organizations.
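A short check of the statistic and the critical value, again as an illustrative sketch rather than the lecture's code. For 2 degrees of freedom the upper-tail probability is exp(−x/2), so the critical value has the closed form −2 ln α. The level α = .05 is an assumption inferred from the slide's value 5.991. Because the expected frequencies are not rounded here, the statistic can differ from the slide's 64.1196 in the later decimal places.

```python
import math

observed = [
    [410, 295, 335],  # Yes: object to record sharing
    [90, 205, 165],   # No
]

row_totals = [sum(row) for row in observed]        # 1040, 460
col_totals = [sum(col) for col in zip(*observed)]  # 500, 500, 500
n = sum(row_totals)                                # 1500

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi_sq += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(3 - 1) = 2

alpha = 0.05                             # assumed; consistent with 5.991
critical_value = -2.0 * math.log(alpha)  # closed form valid for df = 2 only

print(round(chi_sq, 2), round(critical_value, 3))  # 64.12 5.991
print(chi_sq > critical_value)                     # True, so H0 is rejected
```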
Thank You