ANALYSIS OF VARIANCE 
(ANOVA) 
DR RAVI ROHILLA 
COMMUNITY MEDICINE 
PT. B.D. SHARMA PGIMS, ROHTAK 
Contents 
 Parametric tests 
 Difference b/w parametric & non-parametric tests 
 Introduction of ANOVA 
 Defining the Hypothesis 
 Rationale for ANOVA 
 Basic ANOVA situation and assumptions 
 Methodology for Calculations 
 F- distribution 
 Example of 1 way ANOVA 
Contents 
 Violation of assumptions 
 Two way ANOVA 
 Null hypothesis and data layout 
 Methodology for calculations and example 
 Comparisons in ANOVA 
 ANOVA with repeated measures 
 Assumptions and example 
 MANOVA 
 Assumptions and example 
 Other tests of interest and References 
PARAMETRIC TESTS 
 Parameter: a summary value that describes the population, such as the mean, variance, correlation coefficient, or proportion. 
 Parametric test: a test that uses population constants as described above, such as means and variances, where the data are assumed to follow an established distribution such as the normal, binomial, or Poisson. 
Parametric vs. non-parametric tests 

Situation                          Parametric test                       Non-parametric test 
Correlation                        Pearson                               Spearman 
Independent measures, 2 groups     Independent-measures t-test           Mann-Whitney test 
Independent measures, >2 groups    One-way independent-measures ANOVA    Kruskal-Wallis test 
Repeated measures, 2 conditions    Matched-pair t-test                   Wilcoxon test 
Repeated measures, >2 conditions   One-way repeated-measures ANOVA       Friedman's test 
ANOVA (ANalysis Of VAriance) 
• Idea: For two or more groups, test difference 
between means, for quantitative normally 
distributed variables. 
• Just an extension of the t-test (an ANOVA with only 
two groups is mathematically equivalent to a t-test). 
EXAMPLE 
Question 1. 
Marks of 8th-class students of two schools are given. Find 
whether the scores differ significantly from each other. 
A 45 35 45 46 48 41 42 39 49 
B 49 47 36 48 42 38 41 42 45 
ANSWER: The applicable test here is the t-test, since there 
are two groups involved. 
EXAMPLE 
Question 2. 
Marks of 8th-class students of four schools are given. Find 
whether the scores differ significantly from each other. 
A 45 35 45 46 48 41 42 39 49 
B 49 47 36 48 42 38 41 42 45 
C 43 45 42 37 39 40 41 35 47 
D 34 48 47 42 36 41 45 42 48 
ANSWER: The applicable test here is ANOVA, since there 
are more than two groups (4) involved. 
Why Not Use t-test Repeatedly? 
• The t-test can only be used to test differences 
between two means. 
• Conducting multiple t-tests leads to severe 
inflation of the Type I error rate (false positives) and 
is NOT RECOMMENDED. 
• ANOVA is used to test for differences among several 
means without increasing the Type I error rate. 
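The inflation can be quantified: with m independent comparisons, each run at significance level α, the probability of at least one false positive is 1 − (1 − α)^m. A minimal sketch for the four-school example above (four groups give six pairwise t-tests):

```python
alpha = 0.05           # per-test significance level
k = 4                  # number of groups, as in Question 2 above
m = k * (k - 1) // 2   # number of pairwise comparisons: 6

# Probability of at least one Type I error across all m tests,
# assuming the tests are independent.
familywise_error = 1 - (1 - alpha) ** m
print(round(familywise_error, 3))  # → 0.265
```

So six "5%-level" t-tests actually carry roughly a 26% chance of a spurious finding, which is why a single ANOVA is preferred.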
Defining the hypothesis 
The Null & Alternate hypotheses for one-way 
ANOVA are: 
H0: μ1 = μ2 = … = μk (all group means are equal) 
Ha: not all of the μi are equal 
Rationale of the test 
• The Null Hypothesis states that all groups come from the 
same population, i.e. have the same means! 
• But do all groups have the same population mean?? 
• We need to compare the sample means. 
We compare the variation among (between) the 
means of several groups with the variation within 
groups. 
[Figure: dot plots of three treatments under two scenarios, each with the same sample means x̄1 = 10, x̄2 = 15, x̄3 = 20. 
FIGURE 1: a small variability within the samples makes it easier to draw a conclusion about the population means. 
FIGURE 2: the sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.]
[Figure: the same two scenarios redrawn as bell-shaped distributions for each group — FIGURE 1 with little overlap between the curves, FIGURE 2 with substantial overlap.]
Rationale of the test 
• Clearly, we can conclude that the groups appear 
most different or distinct in the 1st figure. Why? 
• Because there is relatively little overlap between the 
bell-shaped curves. In the high variability case, the 
group difference appears least striking because the 
bell-shaped distributions overlap so much. 
Rationale of the test 
• This leads us to a very important conclusion: when 
we are looking at the differences between scores for 
two or more groups, we have to judge the difference 
between their means relative to the spread or 
variability of their scores. 
Types of variability 
Two types of variability are employed when 
testing for the equality of the means: 
Between Group variability 
& 
Within Group variability
 To distinguish between the groups, the variability 
between (or among) the groups must be greater 
than the variability within the groups. 
 If the within-groups variability is large compared with 
the between-groups variability, any difference 
between the groups is difficult to detect. 
 To determine whether or not the group means are 
significantly different, the variability between groups 
and the variability within groups are compared. 
The basic ANOVA situation 
• Two variables: 1 Categorical, 1 Quantitative 
• Main Question: Are there any significant 
differences between the means of three or more 
independent (unrelated) groups? 
Assumptions 
• Normality: the values within each group are 
normally distributed. 
• Homogeneity of variances: the variance within each 
group should be equal for all groups. 
• Independence of error: the errors (variation of each 
value around its own group mean) should be 
independent of one another. 
ANOVA Calculations 
Sums of squares represent the variation present in the data. They are calculated 
by summing squared deviations. The simple ANOVA design has 3 sums 
of squares. 

SS_TOT = Σ (X_i − X̄_G)² 
The total sum of squares comes from the distance of all the scores from the grand mean. 

SS_W = Σ (X_i − X̄_A)² 
The within-group or within-cell sum of squares comes from the distance of the observations to the cell means. This indicates error. 

SS_B = Σ N_A (X̄_A − X̄_G)² 
The between-cells or between-groups sum of squares tells of the distance of the cell means from the grand mean. This indicates IV effects. 

SS_TOT = SS_B + SS_W 
Calculating variance between groups 
1. Calculate the mean of each sample. 
2. Calculate the grand average: 
   X̄ = (n1x̄1 + n2x̄2 + … + nk x̄k) / (n1 + n2 + … + nk) 
3. Take the difference between the means of the 
various samples and the grand average. 
4. Square these deviations, weight each by its sample size, and obtain the 
total, which gives the sum of squares between the samples: 
   SST = Σ ni (x̄i − x̄)² 
5. Divide the total obtained in step 4 by its degrees of 
freedom (k − 1) to calculate the mean square for 
treatment (MST). 
Calculating Variance within groups 
1. Calculate the mean value of each sample. 
2. Take the deviations of the various items in a sample 
from the mean values of the respective samples. 
3. Square these deviations and obtain the total, which 
gives the sum of squares within the samples: 
   SSE = Σij (xij − x̄i)² 
4. Divide the total obtained in the 3rd step by its degrees 
of freedom (n − k) to calculate the mean square for 
error (MSE). 
The mean sum of squares 
• To perform the test we need to calculate the 
mean squares as follows: 
MST = SST / (k − 1)   (Mean Square for Treatments) 
MSE = SSE / (n − k)   (Mean Square for Error) 
ANOVA: one-way classification model 

Source of Variation   SS (Sum of Squares)                        Degrees of Freedom   MS (Mean Square)     Variance Ratio F 
Between Samples       Sum of squares between samples (SSB/SST)   k − 1                MST = SST/(k − 1)    MST/MSE 
Within Samples        Sum of squares within samples (SSW/SSE)    n − k                MSE = SSE/(n − k) 
Total                 Total sum of squares of variation          n − 1 

k = No. of Groups, n = Total No. of observations 
The F-distribution 
 The F distribution is the probability 
distribution associated with the F statistic. 
 The F-test tests the hypothesis that two 
variances are equal. 
F-Distribution 
[Figure: the shape of the F-distribution.]
Calculation of the test statistic 
F = Variability between groups / Variability within groups = MST / MSE 
with the following degrees of freedom: 
v1 = k − 1 and v2 = n − k 
We compare the F-statistic value with the F(critical) value, 
which is obtained by looking it up in F-distribution tables 
against the respective degrees of freedom. 
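The steps above can be sketched in plain Python, using the four hypothetical groups from the worked example that follows:

```python
# The four groups of marks from the worked example.
groups = [
    [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],  # Group 1
    [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],  # Group 2
    [48, 49, 50, 55, 56, 61, 61, 60, 59, 64],  # Group 3
    [47, 67, 54, 67, 68, 65, 65, 56, 60, 65],  # Group 4
]
k = len(groups)                                 # number of groups
n = sum(len(g) for g in groups)                 # total observations
grand_mean = sum(x for g in groups for x in g) / n

means = [sum(g) / len(g) for g in groups]       # group means

# Between-groups sum of squares: SST = sum_i n_i * (mean_i - grand_mean)^2
sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
mst = sst / (k - 1)                             # mean square for treatments

# Within-groups sum of squares: SSE = sum_ij (x_ij - mean_i)^2
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
mse = sse / (n - k)                             # mean square for error

f_stat = mst / mse
print(round(sst, 1), round(sse, 1), round(f_stat, 2))  # → 196.5 2060.6 1.14
```

These values match the SSB, SSW, and F entries computed step by step on the next slides.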
Example 
Group1 Group 2 Group3 Group 4 
60 50 48 47 
67 52 49 67 
42 43 50 54 
67 67 55 67 
56 67 56 68 
62 59 61 65 
64 67 61 65 
59 64 60 56 
72 63 59 60 
71 65 64 65 
       Group1   Group 2   Group3   Group 4 
       60       50        48       47 
       67       52        49       67 
       42       43        50       54 
       67       67        55       67 
       56       67        56       68 
       62       59        61       65 
       64       67        61       65 
       59       64        60       56 
       72       63        59       60 
       71       65        64       65 
Mean   62.0     59.7      56.3     61.4 

Step 1) calculate the sum of squares between groups: 
Grand mean = 59.85 
SSB = [(62 − 59.85)² + (59.7 − 59.85)² + (56.3 − 59.85)² + (61.4 − 59.85)²] × n per group 
    = 19.65 × 10 = 196.5 
DEGREES OF FREEDOM (df): k − 1 = 4 − 1 = 3 
Step 2) calculate the sum of squares within groups (same data as above): 
SSW = (60 − 62)² + (67 − 62)² + … + (50 − 59.7)² + (52 − 59.7)² + … + (48 − 56.3)² + (49 − 56.3)² + … (sum of all 40 squared deviations) 
    = 2060.6 
DEGREES OF FREEDOM (df): N − k = 40 − 4 = 36 
RESULTS 

Source of variation          df   Sum of Squares   Mean Sum of Squares   F-Statistic   P-value 
Between (Treatment effect)   3    196.5            65.5                  1.14          .344 
Within (Error)               36   2060.6           57.2 
Total                        39   2257.1 

F(critical) = 2.866 
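If SciPy is available, the whole table can be checked in a single call; `scipy.stats.f_oneway` performs a one-way ANOVA on the raw groups:

```python
from scipy.stats import f_oneway

# The four groups from the worked example above.
g1 = [60, 67, 42, 67, 56, 62, 64, 59, 72, 71]
g2 = [50, 52, 43, 67, 67, 59, 67, 64, 63, 65]
g3 = [48, 49, 50, 55, 56, 61, 61, 60, 59, 64]
g4 = [47, 67, 54, 67, 68, 65, 65, 56, 60, 65]

f_stat, p_value = f_oneway(g1, g2, g3, g4)
# F ≈ 1.14 < F(critical) = 2.866 and p ≈ .344 > 0.05,
# so we fail to reject H0: no significant difference between the group means.
```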
Violations of Assumptions 
Testing for Normality 
• Each of the populations being compared should 
follow a normal distribution. This can be tested using 
various normality tests, such as: 
 Shapiro-Wilk test or 
 Kolmogorov–Smirnov test or 
 Assessed graphically using a normal quantile plot or 
Histogram. 
Remedies for Non-normal data 
• Transform your data using various algorithms so that 
the shape of your distribution becomes normally 
distributed. Common transformations used are: 
 Logarithm 
 Square root and 
 Multiplicative inverse 
• Choose the non-parametric Kruskal-Wallis H Test, 
which does not require the assumption of normality. 
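As a quick sketch with made-up, right-skewed values, the three transformations can be applied like this; note how the logarithm turns multiplicative spacing into even, additive spacing:

```python
import math

data = [1, 2, 4, 8, 16, 32]  # hypothetical right-skewed measurements

log_t = [math.log(x) for x in data]    # logarithm
sqrt_t = [math.sqrt(x) for x in data]  # square root
inv_t = [1 / x for x in data]          # multiplicative inverse

# The log-transformed values are evenly spaced (every gap is log 2),
# i.e. the right skew has been removed.
gaps = [round(b - a, 6) for a, b in zip(log_t, log_t[1:])]
```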
Testing for variances 
• The populations being compared should have the 
same variance. Tested by using 
 Levene's test, 
 Modified Levene's test 
 Bartlett's test 
Remedies for heterogeneous data 
• Two tests that we can run when the assumption of 
homogeneity of variances has been violated are: 
Welch test or 
 Brown and Forsythe test. 
• Alternatively, we can run a Kruskal-Wallis H Test. But 
for most situations it has been shown that the Welch 
test is best. 
Two-Way ANOVA 
• One dependent variable (quantitative variable), 
• Two independent variables (classifying variables = 
factors) 
• Key Advantages 
– Compare relative influences on the DV 
– Examine interactions between the IVs 
Example 

IV#1                 IV#2                 DV 
– Drug Level         Age of Patient       Anxiety Level 
– Type of Therapy    Length of Therapy    Anxiety Level 
– Type of Exercise   Type of Diet         Weight Change 
Null hypothesis 
 The two-way ANOVA includes tests of three 
null hypotheses: 
 That the means of observations grouped by one 
factor are the same; 
 That the means of observations grouped by the 
other factor are the same; and 
 That there is no interaction between the two 
factors. The interaction test tells you whether 
the effects of one factor depend on the other 
factor. 
Two-Way ANOVA Data Layout 

Factor A    Factor B: 1   2           ...   b 
Level 1     X111…X11n     X121…X12n   ...   X1b1…X1bn 
Level 2     X211…X21n     X221…X22n   ...   X2b1…X2bn 
:           :             :           :     : 
Level a     Xa11…Xa1n     Xa21…Xa2n   ...   Xab1…Xabn 

Each cell holds n observations; observation k in cell (i, j) is Xijk, 
with i = 1,…,a; j = 1,…,b; k = 1,…,n. 
There are a × b treatment combinations. 
Formula for calculations 
• Just as we had Sums of Squares and Mean 
Squares in One-way ANOVA, we have the 
same in Two-way ANOVA. 
• In balanced Two-way ANOVA, we measure 
the overall variability in the data by: 
SS_T = Σ_i Σ_j Σ_k (X_ijk − X̄)²,  df = N − 1 
(sums run over i = 1…a, j = 1…b, k = 1…n) 
Formula for calculations 
Sum of Squares for factor A 
SS_A = Σ_i Σ_j Σ_k (X̄_i − X̄)² = bn Σ_i (X̄_i − X̄)²,  df = a − 1 
Sum of Squares for factor B 
SS_B = Σ_i Σ_j Σ_k (X̄_j − X̄)² = an Σ_j (X̄_j − X̄)²,  df = b − 1 
Formula for calculations 
Interaction Sum of Squares 
SS_AB = Σ_i Σ_j Σ_k (X̄_ij − X̄_i − X̄_j + X̄)²,  df = (a − 1)(b − 1) 
Measures the variation in the response due to the interaction 
between factors A and B. 
Error or Residual Sum of Squares 
SS_E = Σ_i Σ_j Σ_k (X_ijk − X̄_ij)²,  df = ab(n − 1) 
Measures the variation in the response within the a × b factor 
combinations. 
Formula for calculations 
• So the Two-way ANOVA Identity is: 
SS_T = SS_A + SS_B + SS_AB + SS_E 
• This partitions the Total Sum of Squares 
into four pieces of interest for our 
hypotheses to be tested. 
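The identity can be checked numerically. A minimal sketch with a hypothetical balanced 2 × 2 design (two observations per cell; all data made up for illustration):

```python
# data[i][j] holds the responses for level i of factor A, level j of factor B.
data = [
    [[3.0, 5.0], [9.0, 11.0]],
    [[6.0, 8.0], [4.0, 6.0]],
]
a, b = len(data), len(data[0])
n = len(data[0][0])
N = a * b * n

grand = sum(x for row in data for cell in row for x in cell) / N
mean_A = [sum(x for cell in row for x in cell) / (b * n) for row in data]
mean_B = [sum(x for row in data for x in row[j]) / (a * n) for j in range(b)]
mean_AB = [[sum(cell) / n for cell in row] for row in data]

ss_t = sum((x - grand) ** 2 for row in data for cell in row for x in cell)
ss_a = b * n * sum((m - grand) ** 2 for m in mean_A)
ss_b = a * n * sum((m - grand) ** 2 for m in mean_B)
ss_ab = n * sum(
    (mean_AB[i][j] - mean_A[i] - mean_B[j] + grand) ** 2
    for i in range(a) for j in range(b)
)
ss_e = sum(
    (x - mean_AB[i][j]) ** 2
    for i in range(a) for j in range(b) for x in data[i][j]
)

# The Two-way ANOVA identity: SS_T = SS_A + SS_B + SS_AB + SS_E
assert abs(ss_t - (ss_a + ss_b + ss_ab + ss_e)) < 1e-9
```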
Two-way ANOVA Table 

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square   F-ratio            P-value 
Factor A              a − 1                SSA              MSA           FA = MSA / MSE     Tail area 
Factor B              b − 1                SSB              MSB           FB = MSB / MSE     Tail area 
Interaction           (a − 1)(b − 1)       SSAB             MSAB          FAB = MSAB / MSE   Tail area 
Error                 ab(n − 1)            SSE              MSE 
Total                 abn − 1              SST 
WHAT AFTER ANOVA RESULTS?? 
• The ANOVA test tells us whether we have an 
overall difference between our groups, but it 
does not tell us which specific groups differed. 
• Two possibilities are then available: 
For specific predictions, known as a priori tests (contrasts) 
For predictions made after seeing the results, known as 
post-hoc comparisons. 
CONTRASTS 
• Known as a priori or planned comparisons 
– Used when a researcher plans to compare specific group 
means prior to collecting the data, or 
– decides to compare group means after the data have been 
collected, having noticed that some of the means appear to 
be different. 
– This can be tested even when H0 cannot be rejected. 
– The Bonferroni t procedure (also referred to as Dunn's test) is used. 
Post-hoc Tests 
• Post-hoc tests provide a solution to this and therefore 
should only be run when we have an overall 
significant difference in group means. 
• Post-hoc tests are termed a posteriori tests - that is, 
performed after the event. 
– Tukey’s HSD Procedure 
– Scheffe’s Procedure 
– Newman-Keuls Procedure 
– Dunnett’s Procedure 
Example in SPSS 
• A clinical psychologist is interested in comparing the relative 
effectiveness of three forms of psychotherapy for alleviating 
depression. Fifteen individuals are randomly assigned to each 
of three treatment groups: cognitive-behavioral, Rogerian, 
and assertiveness training. The Depression Scale of MMPI 
serves as the response. The psychologist also wished to 
incorporate information about the patient’s severity of 
depression, so all subjects in the study were classified as 
having mild, moderate, or severe depression. 
ANOVA with Repeated Measures 
• An ANOVA with repeated measures is for comparing 
three or more group means where the participants 
are the same in each group. 
• This usually occurs in two situations – 
– when participants are measured multiple times to see 
changes in response to an intervention, or 
– when participants are subjected to more than one 
condition/trial and we want to compare their responses 
across these conditions. 
Assumptions 
• The dependent variable is interval or ratio 
(continuous). 
• Dependent variable is approximately normally 
distributed. 
• Sphericity 
• One independent variable where participants are 
tested on the same dependent variable at least 2 
times. 
Sphericity is the condition where the variances of the differences between all 
combinations of related groups (levels) are equal. 
Sphericity violation 
• Sphericity can be likened to homogeneity of 
variances in a between-subjects ANOVA. 
• The violation of sphericity is serious for the Repeated 
Measures ANOVA, causing the test to have an 
increased Type I error rate. 
• Mauchly's Test of Sphericity tests the assumption of 
sphericity. 
Sphericity violation 
• The corrections employed to combat the violation of 
the assumption of sphericity are: 
 Lower-bound estimate, 
 Greenhouse-Geisser correction and 
 Huynh-Feldt correction. 
• The corrections are applied to the degrees of 
freedom (df) such that a valid critical F-value can be 
obtained. 
Problems 
• In a 6-month exercise training program with 20 
participants, a researcher measured CRP levels of the 
subjects before training, 2 weeks into training and 
post-6-months-training. 
• The researcher wished to know whether protection 
against heart disease might be afforded by exercise 
and whether this protection might be gained over a 
short period of time or whether it took longer. 
MANOVA 
• MANOVA is a procedure used to test the significance 
of the effects of one or more IVs on two or more 
DVs. 
• MANOVA can be viewed as an extension of ANOVA 
with the key difference that we are dealing with 
many dependent variables (not a single DV as in the 
case of ANOVA) 
Data requirements 
• Dependent Variables 
– Interval (or ratio) level variables 
– May be correlated 
– Multivariate normality 
– Homogeneity of variance 
• Independent Variable(s) 
– Nominal level variable(s) 
– At least two groups with each independent variable 
– Each independent variable should be independent of 
each other 
Various tests to use 
 Wilks' Lambda 
 Widely used; good balance between power and 
assumptions 
 Pillai's Trace 
 Useful when sample sizes are small, cell sizes are unequal, 
or covariances are not homogeneous 
 Hotelling's (Lawley-Hotelling) Trace 
 Useful when examining differences between two groups 
Results 
• The result of a MANOVA simply tells us that a 
difference exists (or not) across groups. 
• It does not tell us which treatment(s) differ or what is 
contributing to the differences. 
• For such information, we need to run ANOVAs with 
post hoc tests. 
Example 
• A high school takes its intake from three different 
primary schools. A teacher was concerned that, even 
after a few years, there were academic differences 
between the pupils from the different schools. As 
such, she randomly selected 20 pupils from each 
school and measured their academic performance by 
end-of-year English and Maths exams. 

INDEPENDENT VARIABLE: primary school attended 
(three categories) 
DEPENDENT VARIABLES: English and Maths scores 
Other tests of Interest 
• ANCOVA 
Analysis of Covariance 
This test is a blend of ANOVA and linear regression. 
• MANCOVA 
Multivariate analysis of covariance 
One or more continuous covariates are present. 
THANKS

Analysis of variance

  • 1.
    ANALYSIS OF VARIANCE (ANOVA) DR RAVI ROHILLA COMMUNITY MEDICINE PT. B.D. SHARMA PGIMS, ROHTAK 1
  • 2.
    Contents  Parametrictests  Difference b/w parametric & non-parametric tests  Introduction of ANOVA  Defining the Hypothesis  Rationale for ANOVA  Basic ANOVA situation and assumptions  Methodology for Calculations  F- distribution  Example of 1 way ANOVA 2
  • 3.
    Contents  Violationof assumptions  Two way ANOVA  Null hypothesis and data layout  Methodology for calculations and example  Comparisons in ANOVA  ANOVA with repeated measures  Assumptions and example  MANOVA  Assumptions and example  Other tests of interest and References 3
  • 4.
    PARAMETRIC TESTS Parameter- summary value that describes the population such as mean, variance, correlation coefficient, proportion etc.  Parametric test- population constants as described above are used such as mean, variances etc. and data tend to follow one assumed or established distribution such as normal, binomial, poisson etc. 4
  • 5.
    Parametric vs. non-parametrictests Choosing parametric test Choosing non-parametric test Correlation test Pearson Spearman Independent measures, 2 groups Independent-measures t-test Mann-Whitney test Independent measures, >2 groups One-way, independent-measures ANOVA Kruskal-Wallis test Repeated measures, 2 conditions Matched-pair t-test Wilcoxon test Repeated measures, >2 conditions One-way, repeated measures ANOVA Friedman's test 5
  • 6.
    (ANalysis Of VAriance) • Idea: For two or more groups, test difference between means, for quantitative normally distributed variables. • Just an extension of the t-test (an ANOVA with only two groups is mathematically equivalent to a t-test). 6
  • 7.
    EXAMPLE Question 1. Marks of 8th class students of two schools are given. Find whether the scores are differing significantly with each other. A 45 35 45 46 48 41 42 39 49 B 49 47 36 48 42 38 41 42 45 ANSWER: The test applicable here is T-test since there are two groups involved here. 7
  • 8.
    EXAMPLE Question 2. Marks of 8th class students of four schools are given. Find whether the scores are differing significantly with each other. A 45 35 45 46 48 41 42 39 49 B 49 47 36 48 42 38 41 42 45 C 43 45 42 37 39 40 41 35 47 D 34 48 47 42 36 41 45 42 48 ANSWER: The test applicable here is ANOVA since there are more than two groups(4) involved here. 8
  • 9.
    Why Not Uset-test Repeatedly? • The t-test can only be used to test differences between two means. • Conducting multiple t-tests can lead to severe inflation of the Type I error rate (false positives) and is NOT RECOMMENDED • ANOVA is used to test for differences among several means without increasing the Type I error rate 9
  • 10.
    Defining the hypothesis 10 The Null & Alternate hypotheses for one-way ANOVA are: H0  1  2...  i Ha  not all of the i are equal
  • 11.
    Rationale of thetest • Null Hypothesis states that all groups come from the same population; have the same means! • But do all groups have the same population mean?? • We need to compare the sample means. We compare the variation among (between) the means of several groups with the variation within groups 11
  • 12.
    30 25 20 19 12 10 9 7 A small variability within the samples makes it easier to draw a conclusion about the population means. 1 Treatment 1 Treatment 2Treatment 3 20 16 15 14 11 10 9 10 x1  x2 15 x3  20 TreatmentT 1reatmentT 2reatment 3 x110 x2 15 20 x3  The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means. 12 FIGURE 1 FIGURE 2
  • 13.
    13 FIGURE 1FIGURE 2
  • 14.
    Rationale of thetest • Clearly, we can conclude that the groups appear most different or distinct in the 1st figure. Why? • Because there is relatively little overlap between the bell-shaped curves. In the high variability case, the group difference appears least striking because the bell-shaped distributions overlap so much. 14
  • 15.
    Rationale of thetest • This leads us to a very important conclusion: when we are looking at the differences between scores for two or more groups, we have to judge the difference between their means relative to the spread or variability of their scores. 15
  • 16.
    Types of variability Two types of variability are employed when testing for the equality of the means: 17 Between Group variability & Within Group variability
  • 17.
     To distinguishbetween the groups, the variability between (or among) the groups must be greater than the variability within the groups.  If the within-groups variability is large compared with the between-groups variability, any difference between the groups is difficult to detect.  To determine whether or not the group means are significantly different, the variability between groups and the variability within groups are compared 18
  • 18.
    The basic ANOVAsituation • Two variables: 1 Categorical, 1 Quantitative • Main Question: Whether there are any significant differences between the means of three or more independent (unrelated) groups? 19
  • 19.
    Assumptions • Normality:the values within each group are normally distributed. • Homogeneity of variances: the variance within each group should be equal for all groups. • Independence of error: The error(variation of each value around its own group mean) should be independent of each value. 20
  • 20.
    ANOVA Calculations Sumof squares represent variation present in the data. They are calculated by summing squared deviations. The simple ANOVA design have 3 sums of squares.   2 ( ) TOT i G SS X X The total sum of squares comes from the distance of all the scores from the grand mean.   2 ( ) W i A SS X X The within-group or within-cell sum of squares comes from the distance of the observations to the cell means. This indicates error.   2 ( ) B A A G SS N X X The between-cells or between-groups sum of squares tells of the distance of the cell means from the grand mean. This indicates IV effects. TOT B W SS  SS  SS 21
  • 21.
    Calculating variance betweengroups 1. Calculate the mean of each sample. 2. Calculate the Grand average 3. Take the difference between the means of the nx n x ... n x    various samples and the grand average. 1 1 2 2 k k n n ... n X  4. Square these deviations and obtain the total which    1 2 k will give sum of squares between the samples 5. Divide the total obtained in step 4 by the degrees of 2 i SST n (x x) i freedom to calculate the mean squares for treatment(MST). i 22
  • 22.
    Calculating Variance withingroups 1. Calculate the mean value of each sample 2. Take the deviations of the various items in a sample from the mean values of the respective samples. 3. Square these deviations and obtain the total which gives the sum of square within the samples 4. Divide the total obtained in 3rd step by the degrees (  )2 of freedom to SSE calculate the x mean x i squares for ij error(MSE). ij 23
  • 23.
    The mean sumof squares • To perform the test we need to calculate the mean squares as follows SST 1  k MST SSE n k MSE   Calculation of MST-Mean Square for Treatments Calculation of MSE Mean Square for Error 24
  • 24.
    ANOVA: one wayclassification model Source of Variation SS (Sum of Squares) Degrees of Freedom MS Mean Square Variance Ratio of F Between Samples Sum of squares between samples SSB/SST k-1 MST= SST/(k-1) MST/MSE Within Samples Sum of squares within samples Total sum of square of variations SSW/SSE n-k MSE= SSE/(n-k) Total SS(Total) n-1 k= No of Groups, n= Total No of observations 25
  • 25.
    The F-distribution The F distribution is the probability distribution associated with the f statistic  The F-test tests the hypothesis that two variances are equal. 26
  • 26.
  • 27.
    Calculation of thetest statistic Variability between groups Variability within groups F  with the following degrees of freedom: v1=k -1 and v2=n-k 28 F- statistic = 푀푆푇 푀푆퐸 We compare the F-statistic value with F(critical) value which is obtained by looking for it in F distribution tables against degrees of freedom respectively.
  • 28.
    Example Group1 Group2 Group3 Group 4 60 50 48 47 67 52 49 67 42 43 50 54 67 67 55 67 56 67 56 68 62 59 61 65 64 67 61 65 59 64 60 56 72 63 59 60 71 65 64 65 29
  • 29.
    Group1 Group 2Group3 Group 4 60 50 48 47 67 52 49 67 42 43 50 54 67 67 55 67 56 67 56 68 62 59 61 65 64 67 61 65 59 64 60 56 72 63 59 60 71 65 64 65 62.0 59.7 56.3 61.4 Step 1) calculate the sum of squares between groups: Grand mean= 59.85 SSB = [(62-59.85)2 + (59.7-59.85)2 + (56.3- 59.85)2 + (61.4-59.85)2] x n per group = 19.65x10 = 196.5 Mean DEGREES OF FREEDOM(df) are k-1 = 4-1 = 3 30
  • 30.
    Step 2) calculatethe sum of squares within groups: SSW=(60-62) 2+(67-62) 2+ … + (50-59.7) 2+ (52- 59.7) 2+ …+(48-56.3)2 +(49-56.3)2 +…(sum of 40 squared deviations) = 2060.6 Group1 Group 2 Group3 Group 4 60 50 48 47 67 52 49 67 42 43 50 54 67 67 55 67 56 67 56 68 62 59 61 65 64 67 61 65 59 64 60 56 72 63 59 60 71 65 64 65 Mean 62.0 59.7 56.3 61.4 DEGREES OF FREEDOM(df) are N-k = 40-4 = 36 31
  • 31.
    RESULTS Source of variation df Sum of Squares Mean Sum of Squares F-Statistic P-value Between (Treatment effect) 3 196.5 65.5 1.14 .344 Within (Error) 36 2060.6 57.2 - - Total 39 2257.1 - - - 32 F(critical) = 2.866
  • 32.
    Violations of Assumptions Testing for Normality • Each of the populations being compared should follow a normal distribution. This can be tested using various normality tests, such as:  Shapiro-Wilk test or  Kolmogorov–Smirnov test or  Assessed graphically using a normal quantile plot or Histogram. 34
  • 33.
    Remedies for Non-normal data • Transform your data using various algorithms so that the shape of your distributions become normally distributed . Common transformation used are:  Logarithm  Square root and  Multiplicative inverse • Choose the non-parametric Kruskal-Wallis H Test which does not require the assumption of normality. 35
  • 34.
    Testing for variances • The populations being compared should have the same variance. Tested by using  Levene's test,  Modified Levene's test  Bartlett's test 36
  • 35.
    Remedies for heterogenousdata • Two tests that we can run when the assumption of homogeneity of variances has been violated are: Welch test or  Brown and Forsythe test. • Alternatively, we can run a Kruskal-Wallis H Test. But for most situations it has been shown that the Welsh test is best. 37
  • 36.
    Two-Way ANOVA •One dependent variable (quantitative variable), • Two independent variables (classifying variables = factors) • Key Advantages – Compare relative influences on DV – Examine interactions between IV 38
  • 37.
    Example IV#1 IV#2DV – Drug Level Age of Patient Anxiety Level – Type of Therapy Length of Therapy Anxiety Level – Type of Exercise Type of Diet Weight Change 39
  • 38.
    Null hypothesis The two-way ANOVA include tests of three null hypotheses:  That the means of observations grouped by one factor are the same;  That the means of observations grouped by the other factor are the same; and  That there is no interaction between the two factors. The interaction test tells you whether the effects of one factor depend on the other factor. 40
  • 39.
    Two-Way ANOVA DataLayout Observation k in each cell Xijk Level i Factor A Level j Factor B Factor Factor B A 1 2 ... b 1 X111 X121 ... X1b1 X11n X12n ... X1bn 2 X211 X221 ... X2b1 X21n X22n ... X2bn : : : : : a Xa11 Xa21 ... Xab1 Xa1n Xa2n ... Xabn i = 1,…,a j = 1,…,b k = 1,…,n There are a X b treatment combinations 41
  • 40.
    Formula for calculations • Just as we had Sums of Squares and Mean Squares in One-way ANOVA, we have the same in Two-way ANOVA. • In balanced Two-way ANOVA, we measure the overall variability in the data by: a 2      SS X X df N ( ) 1 T ijk    1 1 1 i b j n k 42
  • 41.
    Formula for calculations Sum of Squares for factor A 2          SS X X bn X X df a ( ) ( ) 1 1 2 A i 1 1 1       a i i a i b j n k Sum of Squares for factor B a 2 2 ( ) ( ) 1       B j j SS X X an X X df b             i b j n k b j 1 1 1 1 43
  • 42.
    Formula for calculations Interaction Sum of Squares a     AB ij i j SS X X X X df a b             i b j n k 1 1 1 2 ( ) ( 1)( 1) Measures the variation in the response due to the interaction between factors A and B. Error or Residual Sum of Squares a     E ijk ij SS X X df ab n      i b j n k 1 1 1 2 ( ) ( 1) Measures the variation in the response within the a x b factor combinations. 44
  • 43.
    Formula for calculations
    • So the two-way ANOVA identity is:

    SS_T = SS_A + SS_B + SS_{AB} + SS_E

    • This partitions the Total Sum of Squares into four pieces, one for each of the hypotheses to be tested.
    45
    Two-way ANOVA Table
    Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square   F-ratio               P-value
    Factor A              a − 1                SS_A             MS_A          F_A = MS_A / MS_E     Tail area
    Factor B              b − 1                SS_B             MS_B          F_B = MS_B / MS_E     Tail area
    Interaction           (a − 1)(b − 1)       SS_AB            MS_AB         F_AB = MS_AB / MS_E   Tail area
    Error                 ab(n − 1)            SS_E             MS_E
    Total                 abn − 1              SS_T
    46
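The balanced two-way ANOVA computations above can be sketched in a few lines of plain Python. The data set below is made up purely for illustration (a = 2, b = 2, n = 3); the point is to show the sums of squares, the partition identity, and the F-ratios from the table.

```python
from statistics import mean

# data[i][j] holds the n replicate observations X_ijk for cell (i, j)
data = [
    [[10, 12, 11], [14, 15, 16]],   # factor A, level 1
    [[13, 12, 14], [20, 19, 21]],   # factor A, level 2
]
a, b, n = len(data), len(data[0]), len(data[0][0])

grand = mean(x for i in range(a) for j in range(b) for x in data[i][j])
row_means = [mean(x for j in range(b) for x in data[i][j]) for i in range(a)]
col_means = [mean(x for i in range(a) for x in data[i][j]) for j in range(b)]
cell_means = [[mean(data[i][j]) for j in range(b)] for i in range(a)]

SS_T = sum((x - grand) ** 2 for i in range(a) for j in range(b) for x in data[i][j])
SS_A = b * n * sum((m - grand) ** 2 for m in row_means)
SS_B = a * n * sum((m - grand) ** 2 for m in col_means)
SS_AB = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                for i in range(a) for j in range(b))
SS_E = sum((x - cell_means[i][j]) ** 2
           for i in range(a) for j in range(b) for x in data[i][j])

# Partition identity: SS_T = SS_A + SS_B + SS_AB + SS_E
assert abs(SS_T - (SS_A + SS_B + SS_AB + SS_E)) < 1e-9

# Mean squares and F-ratios; p-values come from F tables or software
MS_A, MS_B = SS_A / (a - 1), SS_B / (b - 1)
MS_AB = SS_AB / ((a - 1) * (b - 1))
MS_E = SS_E / (a * b * (n - 1))
print(MS_A / MS_E, MS_B / MS_E, MS_AB / MS_E)
```

Comparing each F-ratio against the critical F-value with the df from the table above completes the test.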
    WHAT AFTER ANOVA RESULTS?
    • The ANOVA test tells us whether we have an overall difference between our groups, but it does not tell us which specific groups differed.
    • Two possibilities are then available:
     For comparisons specified in advance, known as a priori tests (contrasts)
     For comparisons decided on after the test, known as post-hoc comparisons
    47
    CONTRASTS
    • Known as a priori or planned comparisons.
    – Used when a researcher plans to compare specific group means prior to collecting the data, or
    – decides to compare group means after the data have been collected, on noticing that some of the means appear to be different.
    – These can be tested even when H0 cannot be rejected.
    – The Bonferroni t procedure (also referred to as Dunn's test) is used.
    48
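The idea behind the Bonferroni adjustment used by Dunn's test can be shown in a tiny sketch: each of m comparisons is run at significance level alpha/m so that the family-wise error rate stays at most alpha. The numbers below are made up for illustration.

```python
from math import comb

k = 4                      # number of group means (assumed example)
alpha = 0.05               # desired family-wise error rate
m = comb(k, 2)             # all pairwise comparisons: k(k-1)/2 = 6
alpha_per_test = alpha / m # each pairwise t-test is judged at this level

print(m, alpha_per_test)   # 6 comparisons, each tested at about 0.0083
```

This is why Bonferroni-type procedures become conservative as the number of comparisons grows.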
    Post-hoc Tests
    • Post-hoc tests provide a solution here and therefore should only be run when we have an overall significant difference in group means.
    • Post-hoc tests are termed a posteriori tests - that is, performed after the event.
    – Tukey's HSD Procedure
    – Scheffe's Procedure
    – Newman-Keuls Procedure
    – Dunnett's Procedure
    49
    Example in SPSS
    • A clinical psychologist is interested in comparing the relative effectiveness of three forms of psychotherapy for alleviating depression. Fifteen individuals are randomly assigned to each of three treatment groups: cognitive-behavioral, Rogerian, and assertiveness training. The Depression Scale of the MMPI serves as the response. The psychologist also wished to incorporate information about each patient's severity of depression, so all subjects in the study were classified as having mild, moderate, or severe depression. 50
    ANOVA with Repeated Measures
    • An ANOVA with repeated measures compares three or more group means where the participants are the same in each group.
    • This usually occurs in two situations –
    – when participants are measured multiple times to see changes in response to an intervention, or
    – when participants are subjected to more than one condition/trial and the responses to these conditions are to be compared.
    51
    Assumptions
    • The dependent variable is interval or ratio (continuous).
    • The dependent variable is approximately normally distributed.
    • Sphericity.
    • One independent variable where participants are tested on the same dependent variable at least 2 times.
    Sphericity is the condition where the variances of the differences between all combinations of related groups (levels) are equal. 52
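The sphericity condition can be inspected descriptively by computing the variance of the differences for every pair of repeated-measures levels, exactly as the definition says. The data below are made up (5 subjects measured at 3 time points); Mauchly's test gives the formal answer.

```python
from itertools import combinations
from statistics import variance

scores = {                    # condition -> one score per subject (assumed data)
    "t1": [30, 27, 35, 24, 28],
    "t2": [27, 25, 30, 24, 26],
    "t3": [20, 22, 28, 20, 24],
}

diff_vars = {}
for c1, c2 in combinations(scores, 2):
    diffs = [x - y for x, y in zip(scores[c1], scores[c2])]
    diff_vars[(c1, c2)] = variance(diffs)    # sample variance of the differences

print(diff_vars)   # roughly equal variances -> sphericity is plausible
```

If these variances differ markedly, a correction such as Greenhouse-Geisser should be applied to the df.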
    Sphericity violation
    • Sphericity can be likened to homogeneity of variances in a between-subjects ANOVA.
    • The violation of sphericity is serious for the repeated measures ANOVA: violation causes the test to have an increased Type I error rate.
    • Mauchly's Test of Sphericity tests the assumption of sphericity.
    53
    Sphericity violation
    • The corrections employed to combat the violation of the assumption of sphericity are:
     Lower-bound estimate,
     Greenhouse-Geisser correction and
     Huynh-Feldt correction.
    • The corrections are applied to the degrees of freedom (df) such that a valid critical F-value can be obtained.
    54
    Problems
    • In a 6-month exercise training program with 20 participants, a researcher measured the CRP levels of the subjects before training, 2 weeks into training and after 6 months of training.
    • The researcher wished to know whether protection against heart disease might be afforded by exercise, and whether this protection might be gained over a short period of time or whether it took longer.
    55
    MANOVA
    • MANOVA is a procedure used to test the significance of the effects of one or more IVs on two or more DVs.
    • MANOVA can be viewed as an extension of ANOVA, with the key difference that we are dealing with many dependent variables (not a single DV as in the case of ANOVA).
    56
    Data requirements
    • Dependent Variables
    – Interval (or ratio) level variables
    – May be correlated
    – Multivariate normality
    – Homogeneity of variance
    • Independent Variable(s)
    – Nominal level variable(s)
    – At least two groups within each independent variable
    – The independent variables should be independent of one another
    57
    Various tests to use
     Wilks' Lambda
     Widely used; good balance between power and assumptions
     Pillai's Trace
     Useful when sample sizes are small, cell sizes are unequal, or covariances are not homogeneous
     Hotelling's (Lawley-Hotelling) Trace
     Useful when examining differences between two groups
    58
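For a one-way MANOVA with two DVs, Wilks' Lambda can be sketched directly: Lambda = det(W) / det(W + B), where W and B are the within- and between-group SSCP (sums of squares and cross-products) matrices. The three groups of (English, Maths) scores below are made up to mimic the three-schools example; real analyses would use statistical software.

```python
from statistics import mean

# groups[g] = list of (english, maths) scores for group g (assumed data)
groups = [
    [(70, 65), (72, 68), (68, 66)],
    [(60, 62), (63, 60), (66, 64)],
    [(75, 80), (78, 77), (72, 74)],
]

all_obs = [x for g in groups for x in g]
gm = [mean(x[d] for x in all_obs) for d in range(2)]              # grand means
cm = [[mean(x[d] for x in g) for d in range(2)] for g in groups]  # group means

W = [[0.0, 0.0], [0.0, 0.0]]   # within-group SSCP
B = [[0.0, 0.0], [0.0, 0.0]]   # between-group SSCP
for g, m in zip(groups, cm):
    for r in range(2):
        for c in range(2):
            B[r][c] += len(g) * (m[r] - gm[r]) * (m[c] - gm[c])
            for x in g:
                W[r][c] += (x[r] - m[r]) * (x[c] - m[c])

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

T = [[W[r][c] + B[r][c] for c in range(2)] for r in range(2)]  # total SSCP
wilks = det2(W) / det2(T)
print(wilks)   # Lambda near 0 -> groups differ on the DVs jointly
```

Software converts Lambda to an approximate F statistic to get a p-value; the sketch stops at the statistic itself.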
    Results
    • The result of a MANOVA simply tells us that a difference exists (or not) across groups.
    • It does not tell us which treatment(s) differ or what is contributing to the differences.
    • For such information, we need to run ANOVAs with post-hoc tests.
    59
    Example
    • A high school takes its intake from three different primary schools. A teacher was concerned that, even after a few years, there were academic differences between the pupils from the different schools. As such, she randomly selected 20 pupils from each school and measured their academic performance by end-of-year English and Maths exams.
    INDEPENDENT VARIABLE: primary school attended (three categories)
    DEPENDENT VARIABLES: English and Maths scores 60
    Other tests of Interest
    • ANCOVA (Analysis of Covariance)
    – A blend of ANOVA and linear regression
    • MANCOVA (Multivariate Analysis of Covariance)
    – Like MANOVA, with one or more continuous covariates present
    61
    References
    • Wikipedia: The Free Encyclopedia. Available from URL: http://en.wikipedia.org/wiki/
    • Methods in Biostatistics by BK Mahajan
    • Statistical Methods by SP Gupta
    • Basic & Clinical Biostatistics by Dawson and Trapp
    • Statistical Methods in Medical Research by Armitage, Berry, Matthews
    62