
Test of Independence - I: Dr. A. Ramesh

The document provides an overview of the Chi-Square (χ2) Test of Independence, which is used to analyze the relationship between two categorical variables to determine if they are independent. It includes examples, formulas for calculating expected frequencies and the test statistic, and decision rules for hypothesis testing. Additionally, it discusses the application of the test in scenarios involving multiple population proportions.

χ² Test of Independence - I

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT STUDIES

Agenda

• To understand the χ² Test of Independence

χ² Test of Independence

• It is used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.
• Qualitative Variables
• Nominal Data

χ² Test of Independence: Investment Example

• In which region of the country do you reside?
  A. Northeast   B. Midwest   C. South   D. West
• Which type of financial investment are you most likely to make today?
  E. Stocks   F. Bonds   G. Treasury bills

Contingency Table

                       Type of Financial Investment
Geographic Region      E        F        G        Total
A                                        O13      nA
B                                                 nB
C                                                 nC
D                                                 nD
Total                  nE       nF       nG       N
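
χ² Test of Independence: Investment Example

If A and F are independent, then

$$P(A \cap F) = P(A) \cdot P(F), \qquad P(A) = \frac{n_A}{N}, \qquad P(F) = \frac{n_F}{N}$$

$$P(A \cap F) = \frac{n_A}{N} \cdot \frac{n_F}{N}$$

so the expected frequency of cell (A, F) is

$$e_{AF} = N \cdot P(A \cap F) = N \left( \frac{n_A}{N} \cdot \frac{n_F}{N} \right) = \frac{n_A n_F}{N}$$

Contingency Table

                       Type of Financial Investment
Geographic Region      E        F        G        Total
A                               e12               nA
B                                                 nB
C                                                 nC
D                                                 nD
Total                  nE       nF       nG       N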

χ² Test of Independence: Formulas

Expected frequencies:

$$e_{ij} = \frac{(n_i)(n_j)}{N}$$

where:
i = the row
j = the column
n_i = the total of row i
n_j = the total of column j
N = the total of all frequencies

χ² Test of Independence: Formulas

Calculated (observed) χ²:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where:
df = (r − 1)(c − 1)
r = the number of rows
c = the number of columns
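
The two formulas above can be combined into one short routine. The following is a minimal sketch in plain Python; the function name `chi_square_independence` and the 2 x 2 table are hypothetical and serve only as an illustration.

```python
def chi_square_independence(observed):
    """Return (expected, chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected frequency of each cell: e_ij = (n_i)(n_j) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Test statistic: sum over all cells of (f_o - f_e)^2 / f_e
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (r - 1)(c - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


# Hypothetical 2 x 2 table, for illustration only
table = [[10, 20], [30, 40]]
expected, chi2, df = chi_square_independence(table)
print(expected, round(chi2, 4), df)
```

The function only computes the statistic; comparing it with a critical value is a separate step.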

Example for Independence

χ² Test of Independence

H0: Type of gasoline is independent of income
Ha: Type of gasoline is not independent of income

χ² Test of Independence

r = 4 income categories (rows), c = 3 types of gasoline (columns)

                       Type of Gasoline
Income                 Regular   Premium   Extra Premium
Less than $30,000
$30,000 to $49,999
$50,000 to $99,000
At least $100,000

χ² Test of Independence: Gasoline Preference Versus Income Category

$$\alpha = .01$$

$$df = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6$$

$$\chi^2_{.01,\,6} = 16.812$$

If $\chi^2_{Cal} > 16.812$, reject H0.

If $\chi^2_{Cal} \le 16.812$, do not reject H0.

Python code
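
The critical value in the decision rule can be looked up in code instead of a printed table. This is a small sketch that assumes SciPy is installed; `scipy.stats.chi2.ppf` is SciPy's inverse CDF for the chi-square distribution, and the variable names are illustrative.

```python
from scipy.stats import chi2

alpha = 0.01
df = (4 - 1) * (3 - 1)  # r = 4 income rows, c = 3 gasoline columns

# Upper-tail critical value: the point with area alpha to its right
critical_value = chi2.ppf(1 - alpha, df)
print(df, round(critical_value, 3))
```

The result should agree with the tabled value of 16.812 for α = .01 and 6 degrees of freedom.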

Gasoline Preference Versus Income Category: Observed Frequencies

                       Type of Gasoline
Income                 Regular   Premium   Extra Premium   Total
Less than $30,000           85        16               6     107
$30,000 to $49,999         102        27              13     142
$50,000 to $99,000          36        22              15      73
At least $100,000           15        23              25      63
Total                      238        88              59     385

Gasoline Preference Versus Income Category: Expected Frequencies

$$e_{ij} = \frac{(n_i)(n_j)}{N}$$

$$e_{11} = \frac{(107)(238)}{385} = 66.15$$

$$e_{12} = \frac{(107)(88)}{385} = 24.46$$

$$e_{13} = \frac{(107)(59)}{385} = 16.40$$

Observed frequencies, with expected frequencies in parentheses:

                       Type of Gasoline
Income                 Regular        Premium       Extra Premium   Total
Less than $30,000       85 (66.15)    16 (24.46)     6 (16.40)       107
$30,000 to $49,999     102 (87.78)    27 (32.46)    13 (21.76)       142
$50,000 to $99,000      36 (45.13)    22 (16.69)    15 (11.19)        73
At least $100,000       15 (38.95)    23 (14.40)    25 (9.65)         63
Total                  238            88            59               385
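
All twelve expected frequencies can be produced at once from the row and column totals. The sketch below assumes NumPy is available and uses `np.outer` to form every product (n_i)(n_j) before dividing by N; the variable names are illustrative.

```python
import numpy as np

# Observed frequencies: rows = income categories, columns = Regular, Premium, Extra Premium
observed = np.array([
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
])

row_totals = observed.sum(axis=1)  # n_i
col_totals = observed.sum(axis=0)  # n_j
n = observed.sum()                 # N

# e_ij = (n_i)(n_j) / N for every cell
expected = np.outer(row_totals, col_totals) / n
print(np.round(expected, 2))
```

Each row of `expected` sums to the corresponding observed row total, which is a quick sanity check on the arithmetic.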

Gasoline Preference Versus Income Category: χ² Calculation

$$
\begin{aligned}
\chi^2 &= \sum \frac{(f_o - f_e)^2}{f_e} \\
&= \frac{(85 - 66.15)^2}{66.15} + \frac{(16 - 24.46)^2}{24.46} + \frac{(6 - 16.40)^2}{16.40} \\
&\quad + \frac{(102 - 87.78)^2}{87.78} + \frac{(27 - 32.46)^2}{32.46} + \frac{(13 - 21.76)^2}{21.76} \\
&\quad + \frac{(36 - 45.13)^2}{45.13} + \frac{(22 - 16.69)^2}{16.69} + \frac{(15 - 11.19)^2}{11.19} \\
&\quad + \frac{(15 - 38.95)^2}{38.95} + \frac{(23 - 14.40)^2}{14.40} + \frac{(25 - 9.65)^2}{9.65} \\
&= 70.75
\end{aligned}
$$
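
Gasoline Preference Versus Income Category: Conclusion

df = 6, α = 0.01, critical value = 16.812

(Figure: chi-square distribution with the non-rejection region to the left of 16.812 and the rejection region of area 0.01 to its right.)

$\chi^2_{Cal} = 70.75 > 16.812$, reject H0. Type of gasoline is not independent of income.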
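
As a cross-check, the whole gasoline test can be run in a few lines. This is a sketch that assumes SciPy is installed; `chi2_contingency` is SciPy's routine, not something named in the slides. Because it works with unrounded expected frequencies, its statistic can differ from the hand calculation in the second decimal place.

```python
from scipy.stats import chi2, chi2_contingency

# Observed frequencies: rows = income categories, columns = Regular, Premium, Extra Premium
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

stat, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.01
critical_value = chi2.ppf(1 - alpha, dof)

# Reject H0 (independence) when the statistic exceeds the critical value
reject_h0 = stat > critical_value
print(round(stat, 2), dof, round(critical_value, 3), reject_h0)
```

Comparing `p_value` with `alpha` leads to the same decision as comparing the statistic with the critical value.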

Contingency Tables

• Useful in situations involving multiple population proportions
• Used to classify sample observations according to two or more characteristics
• Also called a cross-classification table

Contingency Table Example

Hand Preference vs. Gender

Dominant Hand: Left vs. Right
Gender: Male vs. Female

• 2 categories for each variable, so the table is called a 2 x 2 table
• Suppose we examine a sample of 300 college students

Contingency Table Example

Sample results organized in a contingency table (sample size n = 300):

• 120 females, 12 of whom were left handed
• 180 males, 24 of whom were left handed

                  Gender
Hand Preference   Female   Male   Total
Left                  12     24      36
Right                108    156     264
Total                120    180     300


Contingency Table Example

H0: π1 = π2 (The proportion of females who are left handed is equal to the proportion of males who are left handed)
H1: π1 ≠ π2 (The two proportions are not the same; hand preference is not independent of gender)

• If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males.
• The two proportions above should be the same as the proportion of left-handed people overall.
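
To make the two bullets concrete, the sample proportions can be computed directly from the survey counts. This is a plain-Python sketch with illustrative variable names; it also shows how the overall proportion yields the left-handed counts expected in each group if H0 is true.

```python
left_handed = {"Female": 12, "Male": 24}
group_size = {"Female": 120, "Male": 180}

# Sample proportion of left-handed students within each gender
p_female = left_handed["Female"] / group_size["Female"]
p_male = left_handed["Male"] / group_size["Male"]

# Overall proportion of left-handed students in the sample of 300
p_overall = sum(left_handed.values()) / sum(group_size.values())

# Under H0 both groups share p_overall, which fixes the expected left-handed counts
expected_left = {g: size * p_overall for g, size in group_size.items()}
print(round(p_female, 4), round(p_male, 4), round(p_overall, 4), expected_left)
```

The test asks whether the gap between the two sample proportions is larger than sampling variation alone would plausibly produce.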

The Chi-Square Test Statistic

The Chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where:
f_o = observed frequency in a particular cell
f_e = expected frequency in a particular cell if H0 is true

χ² for the 2 x 2 case has 1 degree of freedom

Assumed: each cell in the contingency table has expected frequency of at least 5
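
The Chi-Square Test Statistic

The χ² test statistic approximately follows a chi-square distribution with one degree of freedom.

Decision Rule:
If $\chi^2 > \chi^2_U$, reject H0; otherwise, do not reject H0.

(Figure: chi-square distribution with the do-not-reject region between 0 and $\chi^2_U$ and the rejection region of area α to the right of $\chi^2_U$.)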

Observed vs. Expected Frequencies

                  Gender
Hand Preference   Female               Male                 Total
Left              Observed = 12        Observed = 24           36
                  Expected = 14.4      Expected = 21.6
Right             Observed = 108       Observed = 156         264
                  Expected = 105.6     Expected = 158.4
Total             120                  180                    300


The Chi-Square Test Statistic

Using the observed and expected frequencies from the table above, the test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$

The Chi-Square Test Statistic

The test statistic is $\chi^2 = 0.7576$; $\chi^2_U$ with 1 d.f. = 3.841 (α = .05)

Decision Rule:
If $\chi^2 > 3.841$, reject H0; otherwise, do not reject H0.

Here, $\chi^2 = 0.7576 < \chi^2_U = 3.841$, so you do not reject H0 and conclude that there is insufficient evidence that the two proportions are different.

(Figure: chi-square distribution with the do-not-reject region between 0 and $\chi^2_U = 3.841$ and the rejection region of area α = .05 to its right.)
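
The hand-preference result can be reproduced in code. This sketch assumes SciPy is installed. One detail matters for 2 x 2 tables: `chi2_contingency` applies Yates' continuity correction by default, so `correction=False` is passed here to match the uncorrected statistic used in the slides.

```python
from scipy.stats import chi2, chi2_contingency

# Rows: Left, Right; columns: Female, Male
observed = [
    [12, 24],
    [108, 156],
]

# correction=False turns off Yates' continuity correction for the 2 x 2 case
stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

alpha = 0.05
critical_value = chi2.ppf(1 - alpha, dof)
print(round(stat, 4), dof, round(critical_value, 3), stat > critical_value)
```

Leaving the correction switched on would give a smaller statistic, which would not change the decision in this example.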

χ² Test for the Differences Among More Than Two Proportions

• Extend the χ² test to the case with more than two independent populations:

H0: π1 = π2 = … = πc
H1: Not all of the πj are equal (j = 1, 2, …, c)

The Chi-Square Test Statistic

The Chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where:
• f_o = observed frequency in a particular cell of the 2 x c table
• f_e = expected frequency in a particular cell if H0 is true
• χ² for the 2 x c case has (2 − 1)(c − 1) = c − 1 degrees of freedom

Assumed: each cell in the contingency table has expected frequency of at least 5

χ² Test with More Than Two Proportions: Example

The sharing of patient records is a controversial issue in health care. A survey of 500 respondents asked whether they objected to their records being shared by insurance companies, by pharmacies, and by medical researchers. The results are summarized in the following table:

χ² Test with More Than Two Proportions: Example

                           Organization
Object to Record Sharing   Insurance Companies   Pharmacies   Medical Researchers   Row Sum
Yes                                        410          295                   335      1040
No                                          90          205                   165       460
Column Sum                                 500          500                   500      1500

χ² Test with More Than Two Proportions: Example

The overall proportion is:

$$\bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{410 + 295 + 335}{500 + 500 + 500} = 0.6933$$

If H0 is true, each expected "Yes" frequency is $n_j \bar{p} = 346.667$ and each expected "No" frequency is $n_j (1 - \bar{p}) = 153.333$.

                           Organization
Object to Record Sharing   Insurance Companies   Pharmacies       Medical Researchers
Yes                        f_o = 410             f_o = 295        f_o = 335
                           f_e = 346.667         f_e = 346.667    f_e = 346.667
No                         f_o = 90              f_o = 205        f_o = 165
                           f_e = 153.333         f_e = 153.333    f_e = 153.333
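
The pooled proportion and the expected frequencies in the table can be recomputed in a few lines. This is a plain-Python sketch; the names `objecting`, `sample_sizes`, and `p_bar` are illustrative and do not come from the slides.

```python
# Respondents objecting to sharing with: insurance companies, pharmacies, medical researchers
objecting = [410, 295, 335]
sample_sizes = [500, 500, 500]

# Overall (pooled) proportion of "Yes" answers across the three groups
p_bar = sum(objecting) / sum(sample_sizes)

# Expected frequencies under H0: every group shares the same proportion p_bar
fe_yes = [n * p_bar for n in sample_sizes]
fe_no = [n * (1 - p_bar) for n in sample_sizes]

print(round(p_bar, 4))
print([round(v, 3) for v in fe_yes])
print([round(v, 3) for v in fe_no])
```

Because the three group sizes are equal, the expected frequencies are identical across the columns.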

χ² Test with More Than Two Proportions: Example

Contribution of each cell, $(f_o - f_e)^2 / f_e$:

                           Organization
Object to Record Sharing   Insurance Companies   Pharmacies   Medical Researchers
Yes                                     11.571        7.700                0.3926
No                                      26.159       17.409                0.888

The Chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 64.1196$$

χ² Test with More Than Two Proportions: Example

H0: π1 = π2 = π3
H1: Not all of the πj are equal (j = 1, 2, 3)

Decision Rule:
If $\chi^2 > \chi^2_U$, reject H0; otherwise, do not reject H0.
$\chi^2_U = 5.991$ is from the chi-square distribution with 2 degrees of freedom (the upper-tail critical value at α = .05).

Conclusion: Since 64.1196 > 5.991, you reject H0 and conclude that the proportion of respondents who object to their records being shared differs for at least one of the three organizations.
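
The same conclusion can be checked numerically. The sketch below assumes SciPy is installed and takes α = 0.05, the level at which the chi-square critical value with 2 degrees of freedom equals 5.991.

```python
from scipy.stats import chi2, chi2_contingency

# Rows: Yes, No; columns: insurance companies, pharmacies, medical researchers
observed = [
    [410, 295, 335],
    [90, 205, 165],
]

stat, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05  # assumed significance level, consistent with the critical value 5.991
critical_value = chi2.ppf(1 - alpha, dof)
print(round(stat, 2), dof, round(critical_value, 3), stat > critical_value)
```

The statistic may differ from 64.1196 in the later decimals because the slides round intermediate values.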

Thank You
