Categorical Data Analysis
• Independent (Explanatory) Variable is
Categorical (Nominal or Ordinal)
• Dependent (Response) Variable is
Categorical (Nominal or Ordinal)
• Special Cases:
– 2x2 (Each variable has 2 levels)
– Nominal/Nominal
– Nominal/Ordinal
– Ordinal/Ordinal
Contingency Tables
• Tables representing all combinations of
levels of explanatory and response
variables
• Numbers in table represent Counts of the
number of cases in each cell
• Row and column totals are called
Marginal counts
Example – EMT Assessment of Kids
• Explanatory Variable
– Child Age (Infant,
Toddler, Pre-school,
School-age,
Adolescent)
• Response Variable –
EMT Assessment
(Accurate, Inaccurate)
Assessment
Age Acc Inac Tot
Inf 168 73 241
Tod 230 73 303
Pre 254 53 307
Sch 379 58 437
Ado 652 124 776
Tot 1683 381 2064
Source: Foltin, et al (2002)
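The marginal counts in the table above can be reproduced directly from the cell counts. The following is a minimal sketch; the variable names (`observed`, `row_totals`, `col_totals`) are illustrative assumptions, not part of the original slides.

```python
# Observed counts from the EMT table: one entry per age group,
# each holding (Accurate, Inaccurate) counts.
observed = {
    "Infant": (168, 73),
    "Toddler": (230, 73),
    "Pre-school": (254, 53),
    "School-age": (379, 58),
    "Adolescent": (652, 124),
}

# Row marginal counts: total children assessed in each age group.
row_totals = {age: sum(counts) for age, counts in observed.items()}

# Column marginal counts: total Accurate and total Inaccurate assessments.
col_totals = [sum(col) for col in zip(*observed.values())]

# Overall number of cases.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals, grand_total)
```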
2x2 Tables
• Each variable has 2 levels
– Explanatory Variable – Groups (Typically
based on demographics, exposure, or treatment)
– Response Variable – Outcome (Typically
presence or absence of a characteristic)
• Measures of association
– Relative Risk (Prospective Studies)
– Odds Ratio (Prospective or Retrospective)
– Absolute Risk (Prospective Studies)
2x2 Tables - Notation
                 Outcome Present   Outcome Absent   Group Total
Group 1          n11               n12              n1.
Group 2          n21               n22              n2.
Outcome Total    n.1               n.2              n..
Relative Risk
• Ratio of the probability that the outcome
characteristic is present for one group,
relative to the other
• Sample proportions with characteristic from
groups 1 and 2:
$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$
Relative Risk
• Estimated Relative Risk:
$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2}$$
95% Confidence Interval for Population Relative Risk:
$$\left( RR \cdot e^{-1.96\sqrt{v}},\ RR \cdot e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1-\hat{\pi}_1}{n_{11}} + \frac{1-\hat{\pi}_2}{n_{21}}$$
Relative Risk
• Interpretation
– Conclude that the probability that the outcome
is present is higher (in the population) for
group 1 if the entire interval is above 1
– Conclude that the probability that the outcome
is present is lower (in the population) for
group 1 if the entire interval is below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the
interval contains 1
Example - Coccidioidomycosis and
TNF-antagonists
• Research Question: Is the risk of developing coccidioidomycosis
associated with arthritis therapy?
• Groups: Patients receiving tumor necrosis factor α (TNFα) antagonists
versus patients not receiving them (all patients arthritic)
COC No COC Total
TNFa 7 240 247
Other 4 734 738
Total 11 974 985
Source: Bergstrom, et al (2004)
Example - Coccidioidomycosis and
TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$
$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24 \qquad v = \frac{1-.0283}{7} + \frac{1-.0054}{4} = .3874$$
$$95\%\ CI:\ \left( 5.24\, e^{-1.96\sqrt{.3874}},\ 5.24\, e^{1.96\sqrt{.3874}} \right) = (1.55,\ 17.76)$$
Entire CI above 1 ⇒ Conclude higher risk if on TNF
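The relative risk calculation for this example can be sketched in a few lines. This is an illustrative sketch only; the function name `relative_risk_ci` and its argument names are assumptions. Because it works with unrounded proportions, its results differ slightly in the second decimal from the hand calculation, which rounds the proportions to .0283 and .0054 first.

```python
import math

def relative_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Estimated relative risk, v, and the large-sample CI (hypothetical helper)."""
    p1 = n11 / n1_total          # sample proportion with outcome, group 1
    p2 = n21 / n2_total          # sample proportion with outcome, group 2
    rr = p1 / p2                 # estimated relative risk
    v = (1 - p1) / n11 + (1 - p2) / n21   # estimated variance of ln(RR)
    half_width = z * math.sqrt(v)
    return rr, v, (rr * math.exp(-half_width), rr * math.exp(half_width))

# Group 1: 7 of 247 patients on TNF antagonists developed COC.
# Group 2: 4 of 738 other patients developed COC.
rr, v, (lower, upper) = relative_risk_ci(7, 247, 4, 738)
print(round(rr, 2), round(v, 4), round(lower, 2), round(upper, 2))
```

The interval lies entirely above 1 whether or not the proportions are rounded first, so the conclusion is unchanged.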
Odds Ratio
• Odds of an event is the probability it occurs
divided by the probability it does not occur
• Odds ratio is the odds of the event for group 1
divided by the odds of the event for group 2
• Sample odds of the outcome for each group:
$$odds_1 = \frac{n_{11}/n_{1.}}{n_{12}/n_{1.}} = \frac{n_{11}}{n_{12}} \qquad odds_2 = \frac{n_{21}}{n_{22}}$$
Odds Ratio
• Estimated Odds Ratio:
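$$OR = \frac{odds_1}{odds_2} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$$
95% Confidence Interval for Population Odds Ratio:
$$\left( OR \cdot e^{-1.96\sqrt{v}},\ OR \cdot e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$$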
Odds Ratio
• Interpretation
– Conclude that the probability that the outcome
is present is higher (in the population) for
group 1 if the entire interval is above 1
– Conclude that the probability that the outcome
is present is lower (in the population) for
group 1 if the entire interval is below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the
interval contains 1
Example - NSAIDs and GBM
• Case-Control Study (Retrospective)
– Cases: 137 Self-Reporting Patients with Glioblastoma
Multiforme (GBM)
– Controls: 401 Population-Based Individuals matched to
cases with respect to demographic factors
                  GBM Present   GBM Absent   Total
NSAID User        32            138          170
NSAID Non-User    105           263          368
Total             137           401          538
Source: Sivak-Sears, et al (2004)
Example - NSAIDs and GBM
$$OR = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58 \qquad v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518$$
$$95\%\ CI:\ \left( 0.58\, e^{-1.96\sqrt{0.0518}},\ 0.58\, e^{1.96\sqrt{0.0518}} \right) = (0.37,\ 0.91)$$
Interval is entirely below 1, NSAID use appears
to be lower among cases than controls
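The odds ratio and its interval for the NSAID example can be checked with a short sketch. The function name `odds_ratio_ci` and its parameter names are assumptions made for illustration.

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Estimated odds ratio, v, and the large-sample CI (hypothetical helper)."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22   # estimated variance of ln(OR)
    half_width = z * math.sqrt(v)
    ci = (odds_ratio * math.exp(-half_width), odds_ratio * math.exp(half_width))
    return odds_ratio, v, ci

# Rows: NSAID users, NSAID non-users. Columns: GBM present, GBM absent.
odds_ratio, v, (lower, upper) = odds_ratio_ci(32, 138, 105, 263)
print(round(odds_ratio, 2), round(v, 4), round(lower, 2), round(upper, 2))
```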
Absolute Risk
• Difference between the proportions of cases with
the outcome characteristic in the 2 groups
• Sample proportions with characteristic
from groups 1 and 2:
$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$
Absolute Risk
Estimated Absolute Risk:
$$AR = \hat{\pi}_1 - \hat{\pi}_2$$
95% Confidence Interval for Population Absolute Risk:
$$AR \pm 1.96 \sqrt{ \frac{\hat{\pi}_1 (1-\hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2 (1-\hat{\pi}_2)}{n_{2.}} }$$
Absolute Risk
• Interpretation
– Conclude that the probability that the outcome
is present is higher (in the population) for
group 1 if the entire interval is positive
– Conclude that the probability that the outcome
is present is lower (in the population) for
group 1 if the entire interval is negative
– Do not conclude that the probability of the
outcome differs for the two groups if the
interval contains 0
Example - Coccidioidomycosis and
TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$
$$AR = \hat{\pi}_1 - \hat{\pi}_2 = .0283 - .0054 = .0229$$
$$95\%\ CI:\ .0229 \pm 1.96 \sqrt{ \frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738} } = .0229 \pm .0213 = (0.0016,\ 0.0442)$$
Interval is entirely positive, TNF is
associated with higher risk
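As a check on the arithmetic, the absolute risk interval can be computed as follows. This is a sketch under the large-sample formula stated earlier; `absolute_risk_ci` is a hypothetical helper name. Adding the margin of about .0213 to the estimate of .0229 gives an upper limit near .044.

```python
import math

def absolute_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Difference in sample proportions with its large-sample CI (hypothetical helper)."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    ar = p1 - p2
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
    margin = z * se
    return ar, margin, (ar - margin, ar + margin)

ar, margin, (lower, upper) = absolute_risk_ci(7, 247, 4, 738)
print(round(ar, 4), round(margin, 4), round(lower, 4), round(upper, 4))
```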
Fisher’s Exact Test
• Method of testing for association for 2x2 tables
when one or both of the group sample sizes is
small
• Measures (conditional on the group sizes and
number of cases with and without the
characteristic) the chances we would see
differences of this magnitude or larger in the
sample proportions, if there were no differences
in the populations
Example – Echinacea Purpurea for Colds
• Healthy adults randomized to receive EP
(n1.=24) or placebo (n2.=22, two were dropped)
• Among EP subjects, 14 of 24 developed cold
after exposure to RV-39 (58%)
• Among Placebo subjects, 18 of 22 developed
cold after exposure to RV-39 (82%)
• Out of a total of 46 subjects, 32 developed cold
• Out of a total of 46 subjects, 24 received EP
Source: Sperber, et al (2004)
Example – Echinacea Purpurea for Colds
• Conditional on 32 people
developing colds and 24
receiving EP, the
following table gives the
outcomes that would
have been as strong or
stronger evidence that EP
reduced risk of
developing cold (1-sided
test). P-value from SPSS
is .079.
EP/Cold Plac/Cold
14 18
13 19
12 20
11 21
10 22
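The one-sided p-value can be reproduced by summing hypergeometric probabilities over the five tables listed above. This is a minimal sketch using only the standard library; the helper name `hypergeom_pmf` and the variable names are assumptions.

```python
from math import comb

def hypergeom_pmf(k, total, successes, draws):
    """P(exactly k successes among `draws` items sampled without replacement)."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)

total_subjects = 46   # all subjects
total_colds = 32      # subjects who developed a cold
ep_subjects = 24      # subjects who received EP
observed_ep_colds = 14

# Smallest possible number of EP subjects with colds given the fixed margins:
# at most 14 subjects had no cold, so at least 24 - 14 = 10 EP subjects had one.
min_ep_colds = max(0, ep_subjects - (total_subjects - total_colds))

# Sum over outcomes at least as favorable to EP as the observed one (10..14).
p_one_sided = sum(
    hypergeom_pmf(k, total_subjects, total_colds, ep_subjects)
    for k in range(min_ep_colds, observed_ep_colds + 1)
)
print(round(p_one_sided, 3))
```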
Example - SPSS Output
Chi-Square Tests
                           Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                           (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square         2.990(b)   1    .084
Continuity Correction(a)   1.984      1    .159
Likelihood Ratio           3.071      1    .080
Fisher's Exact Test                                      .114         .079
N of Valid Cases           46
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 6.70.

TRT * COLD Crosstabulation (Count)
                  COLD
TRT        No     Yes    Total
EP         10     14     24
Placebo    4      18     22
Total      14     32     46
McNemar’s Test for Paired Samples
• Common subjects being observed under 2
conditions (2 treatments, before/after, 2
diagnostic tests) in a crossover setting
• Two possible outcomes (Presence/Absence of
Characteristic) on each measurement
• Four possibilities for each subject with respect to outcome:
– Present in both conditions
– Absent in both conditions
– Present in Condition 1, Absent in Condition 2
– Absent in Condition 1, Present in Condition 2
McNemar’s Test for Paired Samples
                   Condition 2
Condition 1    Present   Absent
Present        n11       n12
Absent         n21       n22
McNemar’s Test for Paired Samples
• H0: Probability the outcome is Present is same
for the 2 conditions
• HA: Probabilities differ for the 2 conditions (Can
also be conducted as 1-sided test)
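Test statistic (T.S.):
$$z_{obs} = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}$$
Rejection region (R.R.):
$$|z_{obs}| \ge z_{\alpha/2} \qquad (1.96 \text{ if } \alpha = 0.05)$$
P-value:
$$P\text{-val} = 2P(Z \ge |z_{obs}|)$$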
Example - Reporting of Silicone Breast
Implant Leakage in Revision Surgery
• Subjects - 165 women having revision surgery
involving silicone gel breast implants
• Conditions (Each being observed on all women)
– Self Report of Presence/Absence of Rupture/Leak
– Surgical Record of Presence/Absence of Rupture/Leak
SELF * SURGICAL Crosstabulation (Count)
                     SURGICAL
SELF          Rupture   No Rupture   Total
Rupture       69        28           97
No Rupture    5         63           68
Total         74        91           165
Source: Brown and Pennello (2002)
Example - Reporting of Silicone Breast
Implant Leakage in Revision Surgery
• H0: Tendency to report ruptures/leaks is the
same for self reports and surgical records
• HA: Tendencies differ
$$T.S.:\ z_{obs} = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}} = \frac{28 - 5}{\sqrt{28 + 5}} = 4.00$$
$$R.R.:\ |z_{obs}| \ge 1.96$$
$$P\text{-val}:\ 2P(Z \ge |z_{obs}|) = 2P(Z \ge 4.00) \approx 0$$
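McNemar's statistic uses only the two discordant cells, which the sketch below makes explicit. The function name `mcnemar_z` is an assumption; the two-sided normal tail probability is obtained from the standard library's complementary error function, using the identity 2P(Z ≥ |z|) = erfc(|z|/√2).

```python
import math

def mcnemar_z(n12, n21):
    """Large-sample McNemar z statistic and two-sided p-value (hypothetical helper)."""
    z = (n12 - n21) / math.sqrt(n12 + n21)
    # For a standard normal Z, 2 * P(Z >= |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# n12: self-report rupture, surgical record no rupture (28 women).
# n21: self-report no rupture, surgical record rupture (5 women).
z_obs, p_value = mcnemar_z(28, 5)
print(round(z_obs, 2), p_value)
```

The 69 and 63 women whose two reports agree do not enter the calculation.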
Pearson’s Chi-Square Test
• Can be used for nominal or ordinal explanatory
and response variables
• Variables can have any number of distinct levels
• Tests whether the distribution of the response
variable is the same for each level of the
explanatory variable (H0: No association
between the variables)
• r = # of levels of explanatory variable
• c = # of levels of response variable
Pearson’s Chi-Square Test
• Intuition behind test statistic
– Obtain marginal distribution of outcomes for
the response variable
– Apply this common distribution to all levels of
the explanatory variable, by multiplying each
proportion by the corresponding sample size
– Measure the difference between actual cell
counts and the expected cell counts in the
previous step
Pearson’s Chi-Square Test
• Notation to obtain test statistic
– Rows represent explanatory variable (r levels)
– Cols represent response variable (c levels)
         1      2      …    c      Total
1        n11    n12    …    n1c    n1.
2        n21    n22    …    n2c    n2.
…        …      …      …    …      …
r        nr1    nr2    …    nrc    nr.
Total    n.1    n.2    …    n.c    n..
Pearson’s Chi-Square Test
• Marginal distribution of response and expected cell
counts under hypothesis of no association:
$$\hat{\pi}_{.1} = \frac{n_{.1}}{n_{..}} \quad \cdots \quad \hat{\pi}_{.c} = \frac{n_{.c}}{n_{..}}$$
$$E(n_{ij}) = n_{i.}\, \hat{\pi}_{.j} = \frac{n_{i.}\, n_{.j}}{n_{..}}$$
Pearson’s Chi-Square Test
• H0: No association between variables
• HA: Variables are associated
$$T.S.:\ X^2 = \sum_{i} \sum_{j} \frac{\left( n_{ij} - E(n_{ij}) \right)^2}{E(n_{ij})}$$
$$R.R.:\ X^2 \ge \chi^2_{\alpha,\,(r-1)(c-1)}$$
$$P\text{-val}:\ P(\chi^2 \ge X^2)$$
Example – EMT Assessment of Kids
Observed
         Assessment
Age    Acc     Inac    Tot
Inf    168     73      241
Tod    230     73      303
Pre    254     53      307
Sch    379     58      437
Ado    652     124     776
Tot    1683    381     2064

Expected
         Assessment
Age    Acc     Inac    Tot
Inf    197     44      241
Tod    247     56      303
Pre    250     57      307
Sch    356     81      437
Ado    633     143     776
Tot    1683    381     2064
Example – EMT Assessment of Kids
• Note that each expected count is the row total
times the column total, divided by the overall
total. For the first cell in the table:
$$E(n_{11}) = \frac{n_{1.}\, n_{.1}}{n_{..}} = \frac{241(1683)}{2064} \approx 197$$
• The contribution to the test statistic for this cell is
$$\frac{(168 - 197)^2}{197} = 4.27$$
Example – EMT Assessment of Kids
• H0: No association between variables
• HA: Variables are associated
$$T.S.:\ X^2 = \frac{(168-197)^2}{197} + \cdots + \frac{(124-143)^2}{143} = 40.1$$
$$R.R.:\ X^2 \ge \chi^2_{.05,\,(5-1)(2-1)} = \chi^2_{.05,\,4} = 9.488$$
Reject H0, conclude that the accuracy of
assessments differs among age groups
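The expected counts and the test statistic for this example can be computed from the observed table alone. The sketch below is illustrative; the variable names are assumptions, and the critical value 9.488 is the one quoted above for 4 degrees of freedom at the .05 level rather than something the code derives.

```python
# Observed counts: rows are age groups (Infant .. Adolescent),
# columns are (Accurate, Inaccurate).
observed = [
    [168, 73],
    [230, 73],
    [254, 53],
    [379, 58],
    [652, 124],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under no association: row total * column total / overall total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected.
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 9.488   # chi-square critical value for alpha = .05, df = 4 (from the slide)
reject_h0 = x2 >= critical_value

print(round(expected[0][0], 1), round(x2, 3), df, reject_h0)
```

Using unrounded expected counts gives a statistic of about 40.07, in line with the rounded hand calculation of 40.1.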
Example - SPSS Output
AGE * ASSESS Crosstabulation
                                      ASSESS
AGE                             Accurate   Inaccurate   Total
Infant        Count             168        73           241
              Expected Count    196.5      44.5         241.0
Toddler       Count             230        73           303
              Expected Count    247.1      55.9         303.0
Pre-school    Count             254        53           307
              Expected Count    250.3      56.7         307.0
School age    Count             379        58           437
              Expected Count    356.3      80.7         437.0
Adolescent    Count             652        124          776
              Expected Count    632.8      143.2        776.0
Total         Count             1683       381          2064
              Expected Count    1683.0     381.0        2064.0

Chi-Square Tests
                                Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square              40.073(a)    4    .000
Likelihood Ratio                37.655       4    .000
Linear-by-Linear Association    29.586       1    .000
N of Valid Cases                2064
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 44.49.
Ordinal Explanatory and Response
Variables
• Pearson’s Chi-square test can be used to test
associations among ordinal variables, but more
powerful methods exist
• When theories exist that the association is
directional (positive or negative), measures exist
to describe and test for these specific
alternatives to independence:
– Gamma
– Kendall’s τb (tau-b)
Concordant and Discordant Pairs
• Concordant Pairs - Pairs of individuals where
one individual scores “higher” on both ordered
variables than the other individual
• Discordant Pairs - Pairs of individuals where one
individual scores “higher” on one ordered
variable and the other individual scores “higher”
on the other
• C = # Concordant Pairs D = # Discordant Pairs
– Under Positive association, expect C > D
– Under Negative association, expect C < D
– Under No association, expect C ≈ D
Example - Alcohol Use and Sick Days
• Alcohol Risk (Without Risk, Hardly any Risk,
Some to Considerable Risk)
• Sick Days (0, 1-6, 7+)
• Concordant Pairs - Pairs of respondents where
one scores higher on both alcohol risk and sick
days than the other
• Discordant Pairs - Pairs of respondents where
one scores higher on alcohol risk and the other
scores higher on sick days
Source: Hermansson, et al (2003)
Example - Alcohol Use and Sick Days
ALCOHOL * SICKDAYS Crosstabulation (Count)
                                     SICKDAYS
ALCOHOL                   0 days   1-6 days   7+ days   Total
Without Risk              347      113        145       605
Hardly any Risk           154      63         56        273
Some-Considerable Risk    52       25         34        111
Total                     553      201        235       989
• Concordant Pairs: Each individual in a given cell
is concordant with each individual in cells
“Southeast” of theirs
• Discordant Pairs: Each individual in a given cell is
discordant with each individual in cells “Southwest”
of theirs
Example - Alcohol Use and Sick Days
ALCOHOL * SICKDAYS Crosstabulation (Count)
                                     SICKDAYS
ALCOHOL                   0 days   1-6 days   7+ days   Total
Without Risk              347      113        145       605
Hardly any Risk           154      63         56        273
Some-Considerable Risk    52       25         34        111
Total                     553      201        235       989
$$C = 347(63+56+25+34) + 113(56+34) + 154(25+34) + 63(34) = 83164$$
$$D = 145(154+63+52+25) + 113(154+52) + 56(52+25) + 63(52) = 73496$$
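The "Southeast"/"Southwest" counting rule generalizes to any table with ordered rows and columns. The following is a minimal sketch of that rule; the function name `concordant_discordant` is an assumption.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordered r x c table."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each cell only with cells in lower rows so no pair is counted twice.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:      # "Southeast": higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:    # "Southwest": higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant

# Rows: alcohol risk (Without, Hardly any, Some-Considerable).
# Columns: sick days (0, 1-6, 7+).
alcohol_sickdays = [
    [347, 113, 145],
    [154, 63, 56],
    [52, 25, 34],
]
C, D = concordant_discordant(alcohol_sickdays)
print(C, D)
```

Pairs tied on either variable (same row or same column) are counted in neither C nor D.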
Measures of Association
• Goodman and Kruskal’s Gamma:
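$$\hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1$$
• Kendall’s τb:
$$\hat{\tau}_b = \frac{2(C - D)}{\sqrt{ \left( n_{..}^2 - \sum_i n_{i.}^2 \right) \left( n_{..}^2 - \sum_j n_{.j}^2 \right) }}$$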
When there’s no association between the ordinal variables,
the population based values of these measures are 0.
Statistical software packages provide these tests.
Example - Alcohol Use and Sick Days
$$\hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617$$
Symmetric Measures
                                        Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal   Kendall's tau-b    .035    .030                   1.187          .235
                     Gamma              .062    .052                   1.187          .235
N of Valid Cases                        989
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
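Both point estimates in the SPSS output can be recovered from C, D, and the marginal counts. In this sketch tau-b is written as 2(C − D) divided by the square root of (n² − Σ row totals²)(n² − Σ column totals²), which is the normalization that reproduces the reported value of .035; the variable names are assumptions, and the standard errors and significance tests are not reproduced here.

```python
import math

# Concordant and discordant pair counts from the alcohol / sick days example.
C, D = 83164, 73496

# Marginal counts from the crosstabulation.
row_totals = [605, 273, 111]   # alcohol risk levels
col_totals = [553, 201, 235]   # sick day categories
n = sum(row_totals)

# Goodman and Kruskal's gamma: ignores tied pairs entirely.
gamma_hat = (C - D) / (C + D)

# Kendall's tau-b: adjusts the denominator for ties on each variable.
row_term = n ** 2 - sum(t ** 2 for t in row_totals)
col_term = n ** 2 - sum(t ** 2 for t in col_totals)
tau_b_hat = 2 * (C - D) / math.sqrt(row_term * col_term)

print(round(gamma_hat, 4), round(tau_b_hat, 3))
```

Both estimates are small and positive, consistent with the weak association reported in the output.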
