χ² Test of Independence - II
Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT STUDIES
Agenda
• Using Python to test the independence of variables
• Understanding the goodness-of-fit test for the Poisson distribution
Example
• Records of 50 students studying in ABN School are sampled at random; the
first 10 entries are:
res_num aa pe sm ae r g c
1 99 19 1 2 0 0 1
2 46 12 0 0 0 0 0
3 57 15 1 1 0 0 0
4 94 18 2 2 1 1 1
5 82 13 2 1 1 1 1
6 59 12 0 0 2 0 0
7 61 12 1 2 0 0 0
8 29 9 0 0 1 1 0
9 36 13 1 1 0 0 0
10 91 16 2 2 1 1 0
Example
Here:
• res_num = registration number
• aa = academic ability
• pe = parent education
• sm = student motivation
• r = religion
• g = gender
Python code
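The original code for this slide did not survive, so the following is only a minimal sketch of how the records could be loaded. It assumes pandas is available and uses just the 10 rows listed above; the data file name and the remaining 40 rows are not given in the lecture, so none are assumed here. The variable names `excerpt` and `students` are illustrative.

```python
import io

import pandas as pd

# First 10 of the 50 student records, copied from the table above.
# The other 40 rows are not listed in the lecture, so they are not included.
excerpt = """\
res_num aa pe sm ae r g c
1 99 19 1 2 0 0 1
2 46 12 0 0 0 0 0
3 57 15 1 1 0 0 0
4 94 18 2 2 1 1 1
5 82 13 2 1 1 1 1
6 59 12 0 0 2 0 0
7 61 12 1 2 0 0 0
8 29 9 0 0 1 1 0
9 36 13 1 1 0 0 0
10 91 16 2 2 1 1 0
"""

# Parse the whitespace-separated text into a DataFrame.
students = pd.read_csv(io.StringIO(excerpt), sep=r"\s+")

print(students.head())
print(students.shape)
```

With the real data file, the `io.StringIO(excerpt)` argument would be replaced by the file path.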
Hypothesis
• Test the hypothesis that “gender and student motivation” are
independent
Python code
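A cross-tabulation of gender against student motivation is the usual first step for this hypothesis. Because the full 50-record file is not reproduced in the lecture, this sketch rebuilds one (g, sm) pair per student from the observed counts reported in the lecture and then applies `pandas.crosstab`. The rebuilt records are a stand-in for the real data, not the original file.

```python
import pandas as pd

# Observed counts from the lecture, keyed by (gender, motivation).
counts = {
    (0, 0): 10, (0, 1): 13, (0, 2): 6,
    (1, 0): 4,  (1, 1): 9,  (1, 2): 8,
}

# Stand-in for the real data file: one (g, sm) row per student.
records = [(g, sm) for (g, sm), n in counts.items() for _ in range(n)]
df = pd.DataFrame(records, columns=["g", "sm"])

# Cross-tabulate gender (rows) against motivation (columns),
# adding row and column sums.
observed = pd.crosstab(df["g"], df["sm"], margins=True, margins_name="Sum")
print(observed)
```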
Observed values
Gender        Student motivation
              0 (Disagree)   1 (Not decided)   2 (Agree)   Row sum
0 (Male)      10             13                6           29
1 (Female)    4              9                 8           21
Column sum    14             22                14          50
Expected frequency (contingency table)
Gender   Student motivation
         0                  1        2
0        29*14/50 = 8.12    12.76    8.12
1        5.88               9.24     5.88
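Each expected frequency is (row sum × column sum) / n, as in the 29*14/50 entry. A minimal sketch of that calculation for the whole table, assuming NumPy is available, is:

```python
import numpy as np

# Observed counts: rows = gender (0, 1), columns = motivation (0, 1, 2).
observed = np.array([[10, 13, 6],
                     [4, 9, 8]])

row_sums = observed.sum(axis=1)   # totals per gender
col_sums = observed.sum(axis=0)   # totals per motivation level
n = observed.sum()                # total number of students

# Expected count under independence: (row sum * column sum) / n.
expected = np.outer(row_sums, col_sums) / n
print(np.round(expected, 2))
```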
Frequency Table
Gender   Student motivation
         0            1             2
0        fo = 10      fo = 13       fo = 6
         fe = 8.12    fe = 12.76    fe = 8.12
1        fo = 4       fo = 9        fo = 8
         fe = 5.88    fe = 9.24     fe = 5.88
Chi sq. calculation
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
= 0.435 + 0.005 + 0.554 + 0.601 + 0.006 + 0.764
= 2.365
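The hand calculation above can be checked with a few lines of plain Python. This sketch uses the observed and expected frequencies from the frequency table, read row by row; the list names are illustrative.

```python
# Observed and expected frequencies, cell by cell (gender 0 row, then gender 1 row).
f_obs = [10, 13, 6, 4, 9, 8]
f_exp = [8.12, 12.76, 8.12, 5.88, 9.24, 5.88]

# One (fo - fe)^2 / fe term per cell.
terms = [(fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp)]

# The chi-square statistic is the sum of the per-cell terms.
chi_sq = sum(terms)

print([round(t, 3) for t in terms])
print(round(chi_sq, 3))
```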
Python code
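The code shown on this slide is not recoverable. One common way to run the test in Python is `scipy.stats.chi2_contingency`, which returns the statistic, the p-value, the degrees of freedom, and the expected frequencies in one call. The sketch below assumes SciPy is installed and is not necessarily the code used in the lecture.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = gender (0, 1), columns = motivation (0, 1, 2).
observed = np.array([[10, 13, 6],
                     [4, 9, 8]])

# Returns the chi-square statistic, p-value, degrees of freedom,
# and the table of expected frequencies under independence.
chi_sq, p_value, dof, expected = chi2_contingency(observed)

print("chi-square:", round(chi_sq, 3))
print("p-value   :", round(p_value, 4))
print("dof       :", dof)
print("expected  :")
print(np.round(expected, 2))
```

The statistic should agree with the hand calculation of 2.365, and the expected array with the expected frequency table.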
Python code
Degrees of freedom = (2-1)*(3-1) = 2
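With the degrees of freedom known, the calculated statistic can be compared against a critical value. The lecture does not state a significance level for this example, so α = 0.05 below is an assumption; the sketch uses `scipy.stats.chi2.ppf` to look up the critical value.

```python
from scipy.stats import chi2

rows, cols = 2, 3                  # gender levels, motivation levels
dof = (rows - 1) * (cols - 1)      # degrees of freedom = 2

alpha = 0.05                       # assumed significance level (not stated in the lecture)
critical = chi2.ppf(1 - alpha, dof)

chi_sq = 2.365                     # calculated statistic from the lecture
reject = chi_sq > critical

print("critical value:", round(critical, 3))
print("reject H0 (independence)?", reject)
```

Under this assumed α, the calculated value falls below the critical value, so independence of gender and student motivation would not be rejected.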
Python code
Contingency table
χ² Goodness of Fit Test
χ² Goodness-of-Fit Test
• The χ² goodness-of-fit test compares expected (theoretical)
frequencies of categories from a population distribution to the
observed (actual) frequencies from a distribution to determine
whether there is a difference between what was expected and what
was observed
χ² Goodness-of-Fit Test
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$df = k - 1 - p$$
where: $f_o$ = frequency of observed values
$f_e$ = frequency of expected values
$k$ = number of categories
$p$ = number of parameters estimated from the sample data
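The statistic and its degrees of freedom can be wrapped in a small helper. This is a sketch in plain Python; the function name `chi_square_gof` and its parameter names are hypothetical. The usage example borrows the grouped, rounded frequencies from the parking-garage example later in this lecture, where one parameter (the Poisson mean) is estimated from the sample.

```python
def chi_square_gof(f_obs, f_exp, n_estimated=0):
    """Return (chi-square statistic, degrees of freedom).

    f_obs       -- observed frequency per category
    f_exp       -- expected frequency per category
    n_estimated -- p, the number of parameters estimated from the sample
    """
    if len(f_obs) != len(f_exp):
        raise ValueError("f_obs and f_exp must have the same length")
    stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp))
    df = len(f_obs) - 1 - n_estimated      # df = k - 1 - p
    return stat, df


# Grouped frequencies from the parking-garage example (rounded expected values).
f_obs = [5, 10, 14, 20, 12, 12, 9, 8, 10]
f_exp = [6.20, 8.92, 13.39, 16.06, 16.06, 13.77, 10.33, 6.88, 8.39]

stat, df = chi_square_gof(f_obs, f_exp, n_estimated=1)
print(round(stat, 2), df)
```

Because the expected values are rounded to two decimals here, the statistic can differ slightly in the third decimal from a calculation with unrounded probabilities.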
Goodness of Fit Test: Poisson Distribution
1. Set up the null and alternative hypotheses.
H0: Population has a Poisson probability distribution
Ha: Population does not have a Poisson distribution
2. Select a random sample and
• Record the observed frequency fi for each value of the Poisson
random variable.
• Compute the mean number of occurrences μ.
3. Compute the expected frequency of occurrences ei
for each value of the Poisson random variable.
Goodness of Fit Test: Poisson Distribution
4. Compute the value of the test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where:
fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories
Goodness of Fit Test: Poisson Distribution
5. Rejection rule:
p-value approach: Reject H0 if p-value < α
Critical value approach: Reject H0 if $\chi^2 > \chi^2_\alpha$
where α is the significance level and
there are k - 2 degrees of freedom
Goodness of Fit Test: Poisson Distribution
• Example: Parking Garage
In studying the need for an additional entrance to a city parking
garage, a consultant has recommended an analysis approach that is
applicable only in situations where the number of cars entering
during a specified time period follows a Poisson distribution.
Goodness of Fit Test: Poisson Distribution
A random sample of 100 one-minute time intervals resulted in the
customer arrivals listed below. A statistical test must be conducted to
see if the assumption of a Poisson distribution is reasonable.
# Arrivals   0   1   2   3   4   5   6   7   8   9   10   11   12
Frequency    0   1   4   10  14  20  12  12  9   8   6    3    1
Goodness of Fit Test: Poisson Distribution
• Hypotheses
H0: Number of cars entering the garage during
a one-minute interval is Poisson distributed
Ha: Number of cars entering the garage during a
one-minute interval is not Poisson distributed
Python Code
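The slide's code is not recoverable. As a minimal sketch, the sample from the arrivals table can be entered as two plain Python lists and checked against the stated sample size of 100 intervals. The variable names are illustrative.

```python
# Number of arrivals per one-minute interval (0 through 12).
arrivals = list(range(13))

# How many of the sampled intervals had that many arrivals.
frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]

# The frequencies should add up to the 100 sampled intervals.
n_intervals = sum(frequency)
print("intervals sampled:", n_intervals)

for x, f in zip(arrivals, frequency):
    print(f"{x:>2} arrivals: {f:>2} intervals")
```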
Goodness of Fit Test: Poisson Distribution
• Estimate of Poisson Probability Function
Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600
Total Time Periods = 100
Estimate of μ = 600/100 = 6
Hence,
$$f(x) = \frac{6^x e^{-6}}{x!}$$
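The estimate of the mean and the resulting probability function can be reproduced with the standard library alone. In this sketch the helper name `poisson_pmf` is hypothetical; it implements the formula above directly.

```python
import math

arrivals = list(range(13))
frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]

# Total arrivals = sum of (arrival count * number of intervals with that count).
total_arrivals = sum(x * f for x, f in zip(arrivals, frequency))
total_periods = sum(frequency)

# Estimated mean number of arrivals per one-minute interval.
mean_rate = total_arrivals / total_periods


def poisson_pmf(x, mean):
    """Poisson probability f(x) = mean^x * e^(-mean) / x!"""
    return mean ** x * math.exp(-mean) / math.factorial(x)


print(total_arrivals, total_periods, mean_rate)
print(round(poisson_pmf(5, mean_rate), 4))
```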
Goodness of Fit Test: Poisson Distribution
• Expected Frequencies
x     f(x)     nf(x)        x       f(x)      nf(x)
0     .0025    .25          7       .1377     13.77
1     .0149    1.49         8       .1033     10.33
2     .0446    4.46         9       .0688     6.88
3     .0892    8.92         10      .0413     4.13
4     .1339    13.39        11      .0225     2.25
5     .1606    16.06        12+     .0201     2.01
6     .1606    16.06        Total   1.0000    100.00
Python code
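The original code is missing here. A plausible sketch of the expected-frequency table uses `scipy.stats.poisson`, with the last category taken as the upper tail P(X ≥ 12) so that the probabilities sum to 1. SciPy is assumed to be installed; this is not necessarily the lecture's code.

```python
from scipy.stats import poisson

mu = 6      # estimated mean arrivals per minute
n = 100     # number of one-minute intervals sampled

# Probabilities for x = 0, 1, ..., 11.
probs = [poisson.pmf(x, mu) for x in range(12)]

# Last category "12+": sf(11) = P(X > 11) = P(X >= 12).
probs.append(poisson.sf(11, mu))

# Expected frequency = n * f(x).
expected = [n * p for p in probs]

labels = [str(x) for x in range(12)] + ["12+"]
for label, p, e in zip(labels, probs, expected):
    print(f"{label:>3}  f(x) = {p:.4f}  n f(x) = {e:.2f}")
```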
Python code
Goodness of Fit Test: Poisson Distribution
• Observed and Expected Frequencies
i              fi     ei       fi - ei
0 or 1 or 2    5      6.20     -1.20
3              10     8.92     1.08
4              14     13.39    0.61
5              20     16.06    3.94
6              12     16.06    -4.06
7              12     13.77    -1.77
8              9      10.33    -1.33
9              8      6.88     1.12
10 or more     10     8.39     1.61
Python code
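The table above merges the low and high arrival counts into "0 or 1 or 2" and "10 or more". The lecture does not say why, but this is consistent with the common rule of thumb that every category should have an expected frequency of at least 5. A sketch of the grouping in plain Python, with illustrative names:

```python
import math

frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]  # arrivals 0..12
mu, n = 6, 100

# Poisson probabilities for x = 0..9; everything above 9 goes in the tail.
pmf = [mu ** x * math.exp(-mu) / math.factorial(x) for x in range(10)]

# Each entry maps a category label to (observed, expected).
groups = {"0 or 1 or 2": (sum(frequency[:3]), n * sum(pmf[:3]))}
for x in range(3, 10):
    groups[str(x)] = (frequency[x], n * pmf[x])
groups["10 or more"] = (sum(frequency[10:]), n * (1 - sum(pmf)))

for label, (f_i, e_i) in groups.items():
    print(f"{label:<12} f = {f_i:>2}  e = {e_i:6.2f}  f - e = {f_i - e_i:6.2f}")
```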
Goodness of Fit Test: Poisson Distribution
• Rejection Rule
With α = .05 and k - p - 1 = 9 - 1 - 1 = 7 d.f.
(where k = number of categories and p = number of
population parameters estimated), $\chi^2_{.05} = 14.067$
Reject H0 if p-value < .05 or $\chi^2 > 14.067$.
• Test Statistic
$$\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \cdots + \frac{(1.61)^2}{8.39} = 3.268$$
Python code
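The slide's code is not available. One way to run the whole test is `scipy.stats.chisquare`, passing `ddof=1` so that the degrees of freedom become k - 1 - 1 = 7 to account for the estimated mean. The sketch below assumes SciPy is installed and computes the expected frequencies from unrounded Poisson probabilities.

```python
from scipy.stats import chi2, chisquare, poisson

mu, n, alpha = 6, 100, 0.05

# Observed frequencies for: 0-2, 3, 4, 5, 6, 7, 8, 9, 10 or more arrivals.
f_obs = [5, 10, 14, 20, 12, 12, 9, 8, 10]

# Matching Poisson probabilities: lower tail, single values, upper tail.
probs = ([poisson.cdf(2, mu)]
         + [poisson.pmf(x, mu) for x in range(3, 10)]
         + [poisson.sf(9, mu)])
f_exp = [n * p for p in probs]

# ddof=1 removes one extra degree of freedom for the estimated mean.
stat, p_value = chisquare(f_obs, f_exp=f_exp, ddof=1)

dof = len(f_obs) - 1 - 1
critical = chi2.ppf(1 - alpha, dof)

print("chi-square    :", round(stat, 3))
print("p-value       :", round(p_value, 3))
print("critical value:", round(critical, 3))
print("reject H0?    :", stat > critical)
```

Both decision rules should agree: the p-value is well above .05 and the statistic is below the critical value, so the Poisson assumption is not rejected.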
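Goodness of Fit Test: Poisson Distribution
[Figure: χ² distribution with df = 7; the upper-tail area of 0.05 lies to the right of the critical value 14.067, and the non-rejection region lies to its left.]
Calculated χ² = 3.268 < 14.067, do not reject H0.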
Thank You