Business Statistics, 5th ed.
by Ken Black
Chapter 12
Analysis of
Categorical Data
Discrete Distributions
PowerPoint presentations prepared by Lloyd Jaisingh,
Morehead State University
Learning Objectives
Understand the χ² goodness-of-fit test and how to use it.
Analyze data using the χ² test of independence.
χ² Goodness-of-Fit Test
The χ² goodness-of-fit test compares
expected (theoretical) frequencies
of categories from a population distribution
to the observed (actual) frequencies
from a distribution to determine whether
there is a difference between what was
expected and what was observed.
χ² Goodness-of-Fit Test
The formula used to compute the test statistic for
a chi-square goodness-of-fit test is given below.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \qquad df = k - 1 - c$$

where:
$f_o$ = frequency of observed values
$f_e$ = frequency of expected values
$k$ = number of categories
$c$ = number of parameters estimated from the sample data
Milk Sales Data for Demonstration Problem 12.1
Month        Gallons
January      1,610
February     1,585
March        1,649
April        1,590
May          1,540
June         1,397
July         1,410
August       1,350
September    1,495
October      1,564
November     1,602
December     1,655
Total        18,447
Hypotheses and Decision Rules
for Demonstration Problem 12.1
$H_o$: The monthly figures for milk sales are uniformly distributed.
$H_a$: The monthly figures for milk sales are not uniformly distributed.

$\alpha = .01$
$df = k - 1 - c = 12 - 1 - 0 = 11$
$\chi^2_{.01,11} = 24.725$

If $\chi^2_{Cal} > 24.725$, reject $H_o$.
If $\chi^2_{Cal} \le 24.725$, do not reject $H_o$.
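The critical value in the decision rule is the point that leaves an area of α in the upper tail of the chi-square distribution. As a rough check of the table value, the sketch below computes that upper-tail area with the finite-sum forms available for whole-number degrees of freedom. The function name `chi2_upper_tail` is an assumption made for this illustration, not part of the original slides.

```python
import math

def chi2_upper_tail(x, df):
    """Area to the right of x under a chi-square curve with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * total

alpha = 0.01
df = 12 - 1 - 0            # k - 1 - c for Demonstration Problem 12.1
critical_value = 24.725    # table value quoted in the decision rule
tail_area = chi2_upper_tail(critical_value, df)
print(f"df = {df}, area to the right of {critical_value}: {tail_area:.4f}")
```

If the printed area is close to α = .01, the table value and the decision rule agree.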
Calculations
for Demonstration Problem 12.1
Month        f_o       f_e          (f_o - f_e)^2 / f_e
January      1,610     1,537.25      3.44
February     1,585     1,537.25      1.48
March        1,649     1,537.25      8.12
April        1,590     1,537.25      1.81
May          1,540     1,537.25      0.00
June         1,397     1,537.25     12.80
July         1,410     1,537.25     10.53
August       1,350     1,537.25     22.81
September    1,495     1,537.25      1.16
October      1,564     1,537.25      0.47
November     1,602     1,537.25      2.73
December     1,655     1,537.25      9.02
Total        18,447    18,447.00    74.38

$$f_e = \frac{18447}{12} = 1537.25$$

$$\chi^2_{Cal} = 74.37$$
The observed chi-square value of 74.37 is
greater than the critical value of 24.725.
The decision is to reject the null
hypothesis. The data provides enough
evidence to indicate that the distribution
of milk sales is not uniform.
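The arithmetic for Demonstration Problem 12.1 can be reproduced in a few lines. This is a minimal sketch using only the milk sales figures and the critical value given above; the variable names are chosen for this illustration.

```python
# Monthly milk sales (gallons), January through December.
observed = [1610, 1585, 1649, 1590, 1540, 1397,
            1410, 1350, 1495, 1564, 1602, 1655]

k = len(observed)   # number of categories (months)
c = 0               # no parameters are estimated from the sample
df = k - 1 - c

# Under a uniform distribution every month has the same expected frequency.
expected = sum(observed) / k

chi_sq = sum((f_o - expected) ** 2 / expected for f_o in observed)
critical_value = 24.725   # chi-square table value for alpha = .01, df = 11

print(f"f_e = {expected:.2f}, df = {df}, chi-square = {chi_sq:.2f}")
print("reject H_o" if chi_sq > critical_value else "do not reject H_o")
```

Carrying full precision gives a statistic of about 74.38. The 74.37 quoted above is the sum of the twelve terms after each has been rounded to two decimals. The decision is the same either way.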
Bank Customer Arrival Data
for Demonstration Problem 12.2
Number of Arrivals    Observed Frequencies
0                      7
1                     18
2                     25
3                     17
4                     12
≥5                     5
Hypotheses and Decision Rules
for Demonstration Problem 12.2
$H_o$: The frequency distribution is Poisson.
$H_a$: The frequency distribution is not Poisson.

$\alpha = .05$
$df = k - 1 - c = 6 - 1 - 1 = 4$
$\chi^2_{.05,4} = 9.488$

If $\chi^2_{Cal} > 9.488$, reject $H_o$.
If $\chi^2_{Cal} \le 9.488$, do not reject $H_o$.
Calculations
for Demonstration Problem 12.2:
Estimating the Mean Arrival Rate
Number of Arrivals (X)    Observed Frequencies (f)    f·X
0                          7                            0
1                         18                           18
2                         25                           50
3                         17                           51
4                         12                           48
≥5                         5                           25
Total                     84                          192

Mean arrival rate:

$$\lambda = \frac{\sum fX}{\sum f} = \frac{192}{84} = 2.3 \text{ customers per minute}$$
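The mean arrival rate is a frequency-weighted average of the arrival counts. A minimal sketch of that calculation follows, with variable names chosen for this illustration. The open-ended last category is scored as X = 5, which is what the f·X column does.

```python
# Number of arrivals (X) and observed frequencies (f) for Problem 12.2.
# The open-ended last category (5 or more) is scored as X = 5.
arrivals = [0, 1, 2, 3, 4, 5]
frequencies = [7, 18, 25, 17, 12, 5]

total_fx = sum(x * f for x, f in zip(arrivals, frequencies))  # sum of f*X
n = sum(frequencies)                                          # sum of f
mean_rate = total_fx / n

print(f"sum fX = {total_fx}, sum f = {n}, lambda = {mean_rate:.4f}")
```

Because this mean is estimated from the sample, it counts as one estimated parameter (c = 1) in the degrees of freedom.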
Calculations for Demonstration Problem
12.2: Poisson Probabilities for λ = 2.3
Number of Arrivals (X)    Expected Probabilities P(X)    Expected Frequencies n·P(X)
0                         0.1003                          8.42
1                         0.2306                         19.37
2                         0.2652                         22.28
3                         0.2033                         17.08
4                         0.1169                          9.82
≥5                        0.0838                          7.04

$$n = \sum f = 84$$
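The expected probabilities come from the Poisson formula P(X) = λ^X e^(-λ) / X!, with the last category taking whatever probability remains. The sketch below, using only the standard library, shows one way to reproduce the table. The variable names are illustrative.

```python
import math

lam = 2.3   # estimated mean arrival rate
n = 84      # total number of observations (sum of f)

# P(X = x) for x = 0..4 from the Poisson formula lam^x * e^(-lam) / x!
probs = [lam ** x * math.exp(-lam) / math.factorial(x) for x in range(5)]
# The open-ended category (5 or more) gets the remaining probability.
probs.append(1 - sum(probs))

# Expected frequency for each category is n * P(X).
expected = [n * p for p in probs]

labels = ["0", "1", "2", "3", "4", "5 or more"]
for label, p, e in zip(labels, probs, expected):
    print(f"{label:>9}  P(X) = {p:.4f}  expected = {e:.2f}")
```

Assigning the remainder to the last category makes the probabilities sum to 1, so the expected frequencies sum to n.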
χ² Calculations
for Demonstration Problem 12.2
Number of Arrivals (X)    Observed Frequencies (f_o)    Expected Frequencies n·P(X) (f_e)    (f_o - f_e)^2 / f_e
0                          7                             8.42                                 0.24
1                         18                            19.37                                 0.10
2                         25                            22.28                                 0.33
3                         17                            17.08                                 0.00
4                         12                             9.82                                 0.48
≥5                         5                             7.04                                 0.59
Total                     84                            84.00                                 1.74

$$\chi^2_{Cal} = 1.74$$
The observed chi-square value of 1.74 is
less than the critical value of 9.4877.
The decision is not to reject the null
hypothesis. The data does not provide
enough evidence to indicate that the
distribution of bank arrivals is not Poisson.
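The same comparison can be sketched in code. The expected frequencies below are the rounded values from the table for Demonstration Problem 12.2, and the variable names are chosen for this illustration.

```python
# Observed and expected frequencies for 0, 1, 2, 3, 4, and 5-or-more arrivals.
observed = [7, 18, 25, 17, 12, 5]
expected = [8.42, 19.37, 22.28, 17.08, 9.82, 7.04]  # n * P(X), rounded as tabulated

k = len(observed)   # number of categories
c = 1               # one parameter (the mean arrival rate) estimated from the sample
df = k - 1 - c

chi_sq = sum((f_o - f_e) ** 2 / f_e for f_o, f_e in zip(observed, expected))
critical_value = 9.488   # chi-square table value for alpha = .05, df = 4

print(f"df = {df}, chi-square = {chi_sq:.2f}")
print("reject H_o" if chi_sq > critical_value else "do not reject H_o")
```

Failing to reject the null hypothesis does not prove the arrivals are Poisson. It means only that the sample is consistent with a Poisson distribution.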
Using a χ² Goodness-of-Fit Test
to Test a Population Proportion
$H_o: p = .08$
$H_a: p \ne .08$

$\alpha = .05$
$df = k - 1 - c = 2 - 1 - 0 = 1$
$\chi^2_{.05,1} = 3.841$

If $\chi^2_{Cal} > 3.841$, reject $H_o$.
If $\chi^2_{Cal} \le 3.841$, do not reject $H_o$.
Using a χ² Goodness-of-Fit Test to Test
a Population Proportion: Calculations
              f_o    f_e
Defects        33     16
Nondefects    167    184
n =           200    200

$$f_e(\text{Defects}) = n \cdot P = (200)(.08) = 16$$
$$f_e(\text{Nondefects}) = n \cdot (1 - P) = (200)(.92) = 184$$

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(33 - 16)^2}{16} + \frac{(167 - 184)^2}{184} = 18.0625 + 1.5707 = 19.6332$$
The observed chi-square value of 19.63 is
greater than the critical value of 3.8415.
The decision is to reject the null
hypothesis. The data provides
enough evidence to indicate that the
proportion of defective items the
manufacturer produces is not 8%.
Observing the actual sample result, in
which 0.165 of the sample was defective,
indicates that the proportion of the
population that is defective might be
greater than 8%.
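With only two categories (defective and nondefective), the goodness-of-fit statistic has just two terms. A minimal sketch of the calculation follows, with variable names chosen for this illustration.

```python
n = 200     # sample size
p0 = 0.08   # hypothesized proportion of defects under H_o

observed = {"defects": 33, "nondefects": 167}
expected = {"defects": n * p0, "nondefects": n * (1 - p0)}

chi_sq = sum((observed[cat] - expected[cat]) ** 2 / expected[cat]
             for cat in observed)
df = 2 - 1 - 0            # k - 1 - c
critical_value = 3.841    # chi-square table value for alpha = .05, df = 1

sample_proportion = observed["defects"] / n

print(f"chi-square = {chi_sq:.4f}, df = {df}")
print(f"sample proportion defective = {sample_proportion:.3f}")
print("reject H_o" if chi_sq > critical_value else "do not reject H_o")
```

The chi-square statistic alone does not show the direction of the difference, so the sketch also prints the sample proportion for comparison with the hypothesized .08.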
Using a χ² Goodness-of-Fit Test to Test a Population Proportion: MINITAB Solution
χ² Test of Independence
Used to analyze the frequencies of two
variables with multiple categories to
determine whether the two variables
are independent.
Qualitative Variables
Nominal Data
χ² Test of Independence: Investment Example
In which region of the country do you reside?
A. Northeast B. Midwest C. South D. West
Which type of financial investment are you most likely to
make today?
E. Stocks F. Bonds G. Treasury bills

Contingency Table
                          Type of Financial Investment
                          E       F       G       Total
Geographic    A                           O_13    n_A
Region        B                                   n_B
              C                                   n_C
              D                                   n_D
              Total       n_E     n_F     n_G     N
χ² Test of Independence: Investment Example

Contingency Table
                          Type of Financial Investment
                          E       F       G       Total
Geographic    A                   e_12            n_A
Region        B                                   n_B
              C                                   n_C
              D                                   n_D
              Total       n_E     n_F     n_G     N

If A and F are independent,

$$P(A \cap F) = P(A) \cdot P(F)$$

$$P(A) = \frac{n_A}{N}, \qquad P(F) = \frac{n_F}{N}$$

$$P(A \cap F) = \frac{n_A}{N} \cdot \frac{n_F}{N}$$

$$e_{AF} = N \cdot P(A \cap F) = N \left( \frac{n_A}{N} \cdot \frac{n_F}{N} \right) = \frac{n_A \, n_F}{N}$$
χ² Test of Independence: Formulas

Expected Frequencies

$$e_{ij} = \frac{(n_i)(n_j)}{N}$$

where:
$i$ = the row
$j$ = the column
$n_i$ = the total of row $i$
$n_j$ = the total of column $j$
$N$ = the total of all frequencies

Calculated χ² (Observed χ²)

$$\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e}$$

where:
$df = (r - 1)(c - 1)$
$r$ = the number of rows
$c$ = the number of columns
χ² Test of Independence: Gasoline
Preference Versus Income Category
                        Type of Gasoline
Income                  Regular    Premium    Extra Premium
Less than $30,000
$30,000 to $49,999
$50,000 to $99,000
At least $100,000

r = 4
c = 3
$H_o$: Type of gasoline is independent of income.
$H_a$: Type of gasoline is not independent of income.

$\alpha = .01$
$df = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6$
$\chi^2_{.01,6} = 16.812$

If $\chi^2_{Cal} > 16.812$, reject $H_o$.
If $\chi^2_{Cal} \le 16.812$, do not reject $H_o$.
Gasoline Preference Versus Income
Category: Observed Frequencies
                        Type of Gasoline
Income                  Regular    Premium    Extra Premium    Total
Less than $30,000        85         16          6              107
$30,000 to $49,999      102         27         13              142
$50,000 to $99,000       36         22         15               73
At least $100,000        15         23         25               63
Total                   238         88         59              385
Gasoline Preference Versus Income
Category: Expected Frequencies
Observed frequencies, with expected frequencies in parentheses:
                        Type of Gasoline
Income                  Regular         Premium        Extra Premium    Total
Less than $30,000        85 (66.15)     16 (24.46)      6 (16.40)       107
$30,000 to $49,999      102 (87.78)     27 (32.46)     13 (21.76)       142
$50,000 to $99,000       36 (45.13)     22 (16.69)     15 (11.19)        73
At least $100,000        15 (38.95)     23 (14.40)     25 (9.65)         63
Total                   238             88             59               385

$$e_{ij} = \frac{(n_i)(n_j)}{N}$$

$$e_{11} = \frac{(107)(238)}{385} = 66.15$$
$$e_{12} = \frac{(107)(88)}{385} = 24.46$$
$$e_{13} = \frac{(107)(59)}{385} = 16.40$$
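Every expected frequency is a row total times a column total, divided by the grand total, so the whole table can be generated at once. A minimal sketch using the observed gasoline counts follows. The variable names are chosen for this illustration.

```python
# Observed frequencies: rows are income categories, columns are
# Regular, Premium, and Extra Premium.
observed = [
    [85, 16, 6],     # Less than $30,000
    [102, 27, 13],   # $30,000 to $49,999
    [36, 22, 15],    # $50,000 to $99,000
    [15, 23, 25],    # At least $100,000
]

row_totals = [sum(row) for row in observed]          # n_i
col_totals = [sum(col) for col in zip(*observed)]    # n_j
N = sum(row_totals)                                  # grand total

# e_ij = (n_i)(n_j) / N for every cell
expected = [[n_i * n_j / N for n_j in col_totals] for n_i in row_totals]

for row in expected:
    print("  ".join(f"{e:6.2f}" for e in row))
```

Each row of expected frequencies sums to its row total and each column to its column total, which makes a quick check on the arithmetic.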
Gasoline Preference Versus Income
Category: χ² Calculation

$$
\begin{aligned}
\chi^2 &= \sum \sum \frac{(f_o - f_e)^2}{f_e} \\
&= \frac{(85 - 66.15)^2}{66.15} + \frac{(16 - 24.46)^2}{24.46} + \frac{(6 - 16.40)^2}{16.40} \\
&\quad + \frac{(102 - 87.78)^2}{87.78} + \frac{(27 - 32.46)^2}{32.46} + \frac{(13 - 21.76)^2}{21.76} \\
&\quad + \frac{(36 - 45.13)^2}{45.13} + \frac{(22 - 16.69)^2}{16.69} + \frac{(15 - 11.19)^2}{11.19} \\
&\quad + \frac{(15 - 38.95)^2}{38.95} + \frac{(23 - 14.40)^2}{14.40} + \frac{(25 - 9.65)^2}{9.65} \\
&= 70.78
\end{aligned}
$$
The observed chi-square value of 70.78 is
greater than the critical value of 16.8119.
The decision is to reject the null
hypothesis. The data does provide
enough evidence to indicate that the type
of gasoline preferred is not independent
of income.
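The full test of independence can be sketched end to end from the observed table. The code below uses only the counts and critical value given in this example. The variable names are illustrative.

```python
# Observed frequencies: rows are income categories, columns are
# Regular, Premium, and Extra Premium.
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# Sum (f_o - f_e)^2 / f_e over every cell, with f_e = (row total)(column total) / N.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / N
        chi_sq += (f_o - f_e) ** 2 / f_e

r, c = len(observed), len(observed[0])
df = (r - 1) * (c - 1)
critical_value = 16.812   # chi-square table value for alpha = .01, df = 6

print(f"df = {df}, chi-square = {chi_sq:.2f}")
print("reject H_o" if chi_sq > critical_value else "do not reject H_o")
```

Because the code keeps the expected frequencies unrounded, its statistic may differ slightly in the second decimal place from the hand-calculated 70.78. The decision to reject is unaffected.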
Gasoline Preference Versus Income
Category: MINITAB Output
Copyright 2008 John Wiley & Sons, Inc.
All rights reserved. Reproduction or translation
of this work beyond that permitted in section 117
of the 1976 United States Copyright Act without
express permission of the copyright owner is
unlawful. Request for further information should
be addressed to the Permissions Department, John
Wiley & Sons, Inc. The purchaser may make
back-up copies for his/her own use only and not
for distribution or resale. The Publisher assumes
no responsibility for errors, omissions, or damages
caused by the use of these programs or from the
use of the information herein.