ANALYSIS OF VARIANCE 
(ANOVA) 
DR RAVI ROHILLA 
COMMUNITY MEDICINE 
PT. B.D. SHARMA PGIMS, ROHTAK 
Contents 
 Parametric tests 
 Difference b/w parametric & non-parametric tests 
 Introduction of ANOVA 
 Defining the Hypothesis 
 Rationale for ANOVA 
 Basic ANOVA situation and assumptions 
 Methodology for Calculations 
 F- distribution 
 Example of 1 way ANOVA 
Contents 
 Violation of assumptions 
 Two way ANOVA 
 Null hypothesis and data layout 
 Methodology for calculations and example 
 Comparisons in ANOVA 
 ANOVA with repeated measures 
 Assumptions and example 
 MANOVA 
 Assumptions and example 
 Other tests of interest and References 
PARAMETRIC TESTS 
 Parameter: a summary value that describes the population, such as the mean, variance, correlation coefficient, or proportion. 
 Parametric test: a test that uses population constants as described above, such as means and variances, where the data are assumed to follow an established distribution such as the normal, binomial, or Poisson. 
Parametric vs. non-parametric tests 

Situation                          Parametric test                       Non-parametric test 
Correlation                        Pearson                               Spearman 
Independent measures, 2 groups     Independent-measures t-test           Mann-Whitney test 
Independent measures, >2 groups    One-way independent-measures ANOVA    Kruskal-Wallis test 
Repeated measures, 2 conditions    Matched-pair t-test                   Wilcoxon test 
Repeated measures, >2 conditions   One-way repeated-measures ANOVA       Friedman's test 
ANOVA (ANalysis Of VAriance) 
• Idea: For two or more groups, test difference 
between means, for quantitative normally 
distributed variables. 
• Just an extension of the t-test (an ANOVA with only 
two groups is mathematically equivalent to a t-test). 
EXAMPLE 
Question 1. 
Marks of 8th-class students of two schools are given. Find 
whether the scores differ significantly from each other. 
A 45 35 45 46 48 41 42 39 49 
B 49 47 36 48 42 38 41 42 45 
ANSWER: The applicable test here is the t-test, since there 
are two groups involved. 
EXAMPLE 
Question 2. 
Marks of 8th-class students of four schools are given. Find 
whether the scores differ significantly from each other. 
A 45 35 45 46 48 41 42 39 49 
B 49 47 36 48 42 38 41 42 45 
C 43 45 42 37 39 40 41 35 47 
D 34 48 47 42 36 41 45 42 48 
ANSWER: The applicable test here is ANOVA, since there 
are more than two groups (4) involved. 
Why Not Use t-test Repeatedly? 
• The t-test can only be used to test differences 
between two means. 
• Conducting multiple t-tests leads to severe 
inflation of the Type I error rate (false positives) and 
is NOT RECOMMENDED. 
• ANOVA is used to test for differences among several 
means without increasing the Type I error rate. 
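The inflation can be quantified: with m independent comparisons, each run at significance level α, the probability of at least one false positive is 1 − (1 − α)^m. A minimal sketch for the four-school example above (four groups give six pairwise t-tests):

```python
alpha = 0.05           # per-test significance level
k = 4                  # number of groups, as in Question 2 above
m = k * (k - 1) // 2   # number of pairwise comparisons: 6

# Probability of at least one Type I error across all m tests,
# assuming the tests are independent.
familywise_error = 1 - (1 - alpha) ** m
print(round(familywise_error, 3))  # → 0.265
```

So six "5%-level" t-tests actually carry roughly a 26% chance of a spurious finding, which is why a single ANOVA is preferred.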
Defining the hypothesis 
The Null & Alternate hypotheses for one-way 
ANOVA are: 
H0: μ1 = μ2 = … = μk (all group means are equal) 
Ha: not all of the μi are equal 
Rationale of the test 
• The Null Hypothesis states that all groups come from the 
same population, i.e. have the same means! 
• But do all groups have the same population mean?? 
• We need to compare the sample means. 
We compare the variation among (between) the 
means of several groups with the variation within 
groups. 
[Figure: dot plots of three treatments under two scenarios, each with the same sample means x̄1 = 10, x̄2 = 15, x̄3 = 20. 
FIGURE 1: a small variability within the samples makes it easier to draw a conclusion about the population means. 
FIGURE 2: the sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.]
[Figure: the same two scenarios redrawn as bell-shaped distributions for each group — FIGURE 1 with little overlap between the curves, FIGURE 2 with substantial overlap.]
Rationale of the test 
• Clearly, we can conclude that the groups appear 
most different or distinct in the 1st figure. Why? 
• Because there is relatively little overlap between the 
bell-shaped curves. In the high variability case, the 
group difference appears least striking because the 
bell-shaped distributions overlap so much. 
Rationale of the test 
• This leads us to a very important conclusion: when 
we are looking at the differences between scores for 
two or more groups, we have to judge the difference 
between their means relative to the spread or 
variability of their scores. 
Types of variability 
Two types of variability are employed when 
testing for the equality of the means: 
Between Group variability 
& 
Within Group variability
 To distinguish between the groups, the variability 
between (or among) the groups must be greater 
than the variability within the groups. 
 If the within-groups variability is large compared with 
the between-groups variability, any difference 
between the groups is difficult to detect. 
 To determine whether or not the group means are 
significantly different, the variability between groups 
and the variability within groups are compared. 
The basic ANOVA situation 
• Two variables: 1 Categorical, 1 Quantitative 
• Main Question: Are there any significant 
differences between the means of three or more 
independent (unrelated) groups? 
Assumptions 
• Normality: the values within each group are 
normally distributed. 
• Homogeneity of variances: the variance within each 
group should be equal for all groups. 
• Independence of error: the errors (variation of each 
value around its own group mean) should be 
independent of one another. 
ANOVA Calculations 
Sums of squares represent the variation present in the data. They are calculated 
by summing squared deviations. The simple ANOVA design has 3 sums 
of squares. 

SS_TOT = Σ (X_i − X̄_G)² 
The total sum of squares comes from the distance of all the scores from the grand mean. 

SS_W = Σ (X_i − X̄_A)² 
The within-group or within-cell sum of squares comes from the distance of the observations to the cell means. This indicates error. 

SS_B = Σ N_A (X̄_A − X̄_G)² 
The between-cells or between-groups sum of squares tells of the distance of the cell means from the grand mean. This indicates IV effects. 

SS_TOT = SS_B + SS_W 
Calculating variance between groups 
1. Calculate the mean of each sample. 
2. Calculate the grand average: 
   X̄ = (n1x̄1 + n2x̄2 + … + nk x̄k) / (n1 + n2 + … + nk) 
3. Take the difference between the means of the 
various samples and the grand average. 
4. Square these deviations, weight each by its sample size, and obtain the 
total, which gives the sum of squares between the samples: 
   SST = Σ ni (x̄i − x̄)² 
5. Divide the total obtained in step 4 by its degrees of 
freedom (k − 1) to calculate the mean square for 
treatment (MST). 
Calculating Variance within groups 
1. Calculate the mean value of each sample. 
2. Take the deviations of the various items in a sample 
from the mean values of the respective samples. 
3. Square these deviations and obtain the total, which 
gives the sum of squares within the samples: 
   SSE = Σij (xij − x̄i)² 
4. Divide the total obtained in the 3rd step by its degrees 
of freedom (n − k) to calculate the mean square for 
error (MSE). 
The mean sum of squares 
• To perform the test we need to calculate the 
mean squares as follows: 
MST = SST / (k − 1)   (Mean Square for Treatments) 
MSE = SSE / (n − k)   (Mean Square for Error) 
ANOVA: one-way classification model 

Source of Variation   SS (Sum of Squares)                        Degrees of Freedom   MS (Mean Square)     Variance Ratio F 
Between Samples       Sum of squares between samples (SSB/SST)   k − 1                MST = SST/(k − 1)    MST/MSE 
Within Samples        Sum of squares within samples (SSW/SSE)    n − k                MSE = SSE/(n − k) 
Total                 Total sum of squares of variation          n − 1 

k = No. of Groups, n = Total No. of observations 
The F-distribution 
 The F distribution is the probability 
distribution associated with the F statistic. 
 The F-test tests the hypothesis that two 
variances are equal. 
F-Distribution 
[Figure: the shape of the F-distribution.]
Calculation of the test statistic 
F = Variability between groups / Variability within groups = MST / MSE 
with the following degrees of freedom: 
v1 = k − 1 and v2 = n − k 
We compare the F-statistic value with the F(critical) value, 
which is obtained by looking it up in F-distribution tables 
against the respective degrees of freedom. 
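The steps above can be sketched in plain Python, using the four hypothetical groups from the worked example that follows:

```python
# The four groups of marks from the worked example.
groups = [
    [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],  # Group 1
    [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],  # Group 2
    [48, 49, 50, 55, 56, 61, 61, 60, 59, 64],  # Group 3
    [47, 67, 54, 67, 68, 65, 65, 56, 60, 65],  # Group 4
]
k = len(groups)                                 # number of groups
n = sum(len(g) for g in groups)                 # total observations
grand_mean = sum(x for g in groups for x in g) / n

means = [sum(g) / len(g) for g in groups]       # group means

# Between-groups sum of squares: SST = sum_i n_i * (mean_i - grand_mean)^2
sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
mst = sst / (k - 1)                             # mean square for treatments

# Within-groups sum of squares: SSE = sum_ij (x_ij - mean_i)^2
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
mse = sse / (n - k)                             # mean square for error

f_stat = mst / mse
print(round(sst, 1), round(sse, 1), round(f_stat, 2))  # → 196.5 2060.6 1.14
```

These values match the SSB, SSW, and F entries computed step by step on the next slides.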
Example 
Group1 Group 2 Group3 Group 4 
60 50 48 47 
67 52 49 67 
42 43 50 54 
67 67 55 67 
56 67 56 68 
62 59 61 65 
64 67 61 65 
59 64 60 56 
72 63 59 60 
71 65 64 65 
       Group1   Group 2   Group3   Group 4 
       60       50        48       47 
       67       52        49       67 
       42       43        50       54 
       67       67        55       67 
       56       67        56       68 
       62       59        61       65 
       64       67        61       65 
       59       64        60       56 
       72       63        59       60 
       71       65        64       65 
Mean   62.0     59.7      56.3     61.4 

Step 1) calculate the sum of squares between groups: 
Grand mean = 59.85 
SSB = [(62 − 59.85)² + (59.7 − 59.85)² + (56.3 − 59.85)² + (61.4 − 59.85)²] × n per group 
    = 19.65 × 10 = 196.5 
DEGREES OF FREEDOM (df): k − 1 = 4 − 1 = 3 
Step 2) calculate the sum of squares within groups (same data as above): 
SSW = (60 − 62)² + (67 − 62)² + … + (50 − 59.7)² + (52 − 59.7)² + … + (48 − 56.3)² + (49 − 56.3)² + … (sum of all 40 squared deviations) 
    = 2060.6 
DEGREES OF FREEDOM (df): N − k = 40 − 4 = 36 
RESULTS 

Source of variation          df   Sum of Squares   Mean Sum of Squares   F-Statistic   P-value 
Between (Treatment effect)   3    196.5            65.5                  1.14          .344 
Within (Error)               36   2060.6           57.2 
Total                        39   2257.1 

F(critical) = 2.866 
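If SciPy is available, the whole table can be checked in a single call; `scipy.stats.f_oneway` performs a one-way ANOVA on the raw groups:

```python
from scipy.stats import f_oneway

# The four groups from the worked example above.
g1 = [60, 67, 42, 67, 56, 62, 64, 59, 72, 71]
g2 = [50, 52, 43, 67, 67, 59, 67, 64, 63, 65]
g3 = [48, 49, 50, 55, 56, 61, 61, 60, 59, 64]
g4 = [47, 67, 54, 67, 68, 65, 65, 56, 60, 65]

f_stat, p_value = f_oneway(g1, g2, g3, g4)
# F ≈ 1.14 < F(critical) = 2.866 and p ≈ .344 > 0.05,
# so we fail to reject H0: no significant difference between the group means.
```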
Violations of Assumptions 
Testing for Normality 
• Each of the populations being compared should 
follow a normal distribution. This can be tested using 
various normality tests, such as: 
 Shapiro-Wilk test or 
 Kolmogorov–Smirnov test or 
 Assessed graphically using a normal quantile plot or 
Histogram. 
Remedies for Non-normal data 
• Transform your data using various algorithms so that 
the shape of your distribution becomes normally 
distributed. Common transformations used are: 
 Logarithm 
 Square root and 
 Multiplicative inverse 
• Choose the non-parametric Kruskal-Wallis H Test, 
which does not require the assumption of normality. 
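As a quick sketch with made-up, right-skewed values, the three transformations can be applied like this; note how the logarithm turns multiplicative spacing into even, additive spacing:

```python
import math

data = [1, 2, 4, 8, 16, 32]  # hypothetical right-skewed measurements

log_t = [math.log(x) for x in data]    # logarithm
sqrt_t = [math.sqrt(x) for x in data]  # square root
inv_t = [1 / x for x in data]          # multiplicative inverse

# The log-transformed values are evenly spaced (every gap is log 2),
# i.e. the right skew has been removed.
gaps = [round(b - a, 6) for a, b in zip(log_t, log_t[1:])]
```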
Testing for variances 
• The populations being compared should have the 
same variance. Tested by using 
 Levene's test, 
 Modified Levene's test 
 Bartlett's test 
Remedies for heterogeneous data 
• Two tests that we can run when the assumption of 
homogeneity of variances has been violated are: 
Welch test or 
 Brown and Forsythe test. 
• Alternatively, we can run a Kruskal-Wallis H Test. But 
for most situations it has been shown that the Welch 
test is best. 
Two-Way ANOVA 
• One dependent variable (quantitative variable), 
• Two independent variables (classifying variables = 
factors) 
• Key Advantages 
– Compare relative influences on the DV 
– Examine interactions between the IVs 
Example 

IV#1                 IV#2                 DV 
– Drug Level         Age of Patient       Anxiety Level 
– Type of Therapy    Length of Therapy    Anxiety Level 
– Type of Exercise   Type of Diet         Weight Change 
Null hypothesis 
 The two-way ANOVA includes tests of three 
null hypotheses: 
 That the means of observations grouped by one 
factor are the same; 
 That the means of observations grouped by the 
other factor are the same; and 
 That there is no interaction between the two 
factors. The interaction test tells you whether 
the effects of one factor depend on the other 
factor. 
Two-Way ANOVA Data Layout 

Factor A    Factor B: 1   2           ...   b 
Level 1     X111…X11n     X121…X12n   ...   X1b1…X1bn 
Level 2     X211…X21n     X221…X22n   ...   X2b1…X2bn 
:           :             :           :     : 
Level a     Xa11…Xa1n     Xa21…Xa2n   ...   Xab1…Xabn 

Each cell holds n observations; observation k in cell (i, j) is Xijk, 
with i = 1,…,a; j = 1,…,b; k = 1,…,n. 
There are a × b treatment combinations. 
Formula for calculations 
• Just as we had Sums of Squares and Mean 
Squares in One-way ANOVA, we have the 
same in Two-way ANOVA. 
• In balanced Two-way ANOVA, we measure 
the overall variability in the data by: 
SS_T = Σ_i Σ_j Σ_k (X_ijk − X̄)²,  df = N − 1 
(sums run over i = 1…a, j = 1…b, k = 1…n) 
Formula for calculations 
Sum of Squares for factor A 
SS_A = Σ_i Σ_j Σ_k (X̄_i − X̄)² = bn Σ_i (X̄_i − X̄)²,  df = a − 1 
Sum of Squares for factor B 
SS_B = Σ_i Σ_j Σ_k (X̄_j − X̄)² = an Σ_j (X̄_j − X̄)²,  df = b − 1 
Formula for calculations 
Interaction Sum of Squares 
SS_AB = Σ_i Σ_j Σ_k (X̄_ij − X̄_i − X̄_j + X̄)²,  df = (a − 1)(b − 1) 
Measures the variation in the response due to the interaction 
between factors A and B. 
Error or Residual Sum of Squares 
SS_E = Σ_i Σ_j Σ_k (X_ijk − X̄_ij)²,  df = ab(n − 1) 
Measures the variation in the response within the a × b factor 
combinations. 
Formula for calculations 
• So the Two-way ANOVA Identity is: 
SS_T = SS_A + SS_B + SS_AB + SS_E 
• This partitions the Total Sum of Squares 
into four pieces of interest for our 
hypotheses to be tested. 
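The identity can be checked numerically. A minimal sketch with a hypothetical balanced 2 × 2 design (two observations per cell; all data made up for illustration):

```python
# data[i][j] holds the responses for level i of factor A, level j of factor B.
data = [
    [[3.0, 5.0], [9.0, 11.0]],
    [[6.0, 8.0], [4.0, 6.0]],
]
a, b = len(data), len(data[0])
n = len(data[0][0])
N = a * b * n

grand = sum(x for row in data for cell in row for x in cell) / N
mean_A = [sum(x for cell in row for x in cell) / (b * n) for row in data]
mean_B = [sum(x for row in data for x in row[j]) / (a * n) for j in range(b)]
mean_AB = [[sum(cell) / n for cell in row] for row in data]

ss_t = sum((x - grand) ** 2 for row in data for cell in row for x in cell)
ss_a = b * n * sum((m - grand) ** 2 for m in mean_A)
ss_b = a * n * sum((m - grand) ** 2 for m in mean_B)
ss_ab = n * sum(
    (mean_AB[i][j] - mean_A[i] - mean_B[j] + grand) ** 2
    for i in range(a) for j in range(b)
)
ss_e = sum(
    (x - mean_AB[i][j]) ** 2
    for i in range(a) for j in range(b) for x in data[i][j]
)

# The Two-way ANOVA identity: SS_T = SS_A + SS_B + SS_AB + SS_E
assert abs(ss_t - (ss_a + ss_b + ss_ab + ss_e)) < 1e-9
```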
Two-way ANOVA Table 

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square   F-ratio            P-value 
Factor A              a − 1                SSA              MSA           FA = MSA / MSE     Tail area 
Factor B              b − 1                SSB              MSB           FB = MSB / MSE     Tail area 
Interaction           (a − 1)(b − 1)       SSAB             MSAB          FAB = MSAB / MSE   Tail area 
Error                 ab(n − 1)            SSE              MSE 
Total                 abn − 1              SST 
WHAT AFTER ANOVA RESULTS?? 
• The ANOVA test tells us whether we have an 
overall difference between our groups, but it 
does not tell us which specific groups differed. 
• Two possibilities are then available: 
For specific predictions, known as a priori tests (contrasts) 
For predictions made after seeing the results, known as 
post-hoc comparisons. 
CONTRASTS 
• Known as a priori or planned comparisons 
– Used when a researcher plans to compare specific group 
means prior to collecting the data, or 
– decides to compare group means after the data have been 
collected, having noticed that some of the means appear to 
be different. 
– This can be tested even when H0 cannot be rejected. 
– The Bonferroni t procedure (also referred to as Dunn's test) is used. 
Post-hoc Tests 
• Post-hoc tests provide a solution to this and therefore 
should only be run when we have an overall 
significant difference in group means. 
• Post-hoc tests are termed a posteriori tests - that is, 
performed after the event. 
– Tukey’s HSD Procedure 
– Scheffe’s Procedure 
– Newman-Keuls Procedure 
– Dunnett’s Procedure 
Example in SPSS 
• A clinical psychologist is interested in comparing the relative 
effectiveness of three forms of psychotherapy for alleviating 
depression. Fifteen individuals are randomly assigned to each 
of three treatment groups: cognitive-behavioral, Rogerian, 
and assertiveness training. The Depression Scale of MMPI 
serves as the response. The psychologist also wished to 
incorporate information about the patient’s severity of 
depression, so all subjects in the study were classified as 
having mild, moderate, or severe depression. 
ANOVA with Repeated Measures 
• An ANOVA with repeated measures is for comparing 
three or more group means where the participants 
are the same in each group. 
• This usually occurs in two situations – 
– when participants are measured multiple times to see 
changes in response to an intervention, or 
– when participants are subjected to more than one 
condition/trial and we want to compare their responses 
across these conditions. 
Assumptions 
• The dependent variable is interval or ratio 
(continuous). 
• Dependent variable is approximately normally 
distributed. 
• Sphericity 
• One independent variable where participants are 
tested on the same dependent variable at least 2 
times. 
Sphericity is the condition where the variances of the differences between all 
combinations of related groups (levels) are equal. 
Sphericity violation 
• Sphericity can be likened to homogeneity of 
variances in a between-subjects ANOVA. 
• The violation of sphericity is serious for the Repeated 
Measures ANOVA, causing the test to have an 
increased Type I error rate. 
• Mauchly's Test of Sphericity tests the assumption of 
sphericity. 
Sphericity violation 
• The corrections employed to combat the violation of 
the assumption of sphericity are: 
 Lower-bound estimate, 
 Greenhouse-Geisser correction and 
 Huynh-Feldt correction. 
• The corrections are applied to the degrees of 
freedom (df) such that a valid critical F-value can be 
obtained. 
Problems 
• In a 6-month exercise training program with 20 
participants, a researcher measured CRP levels of the 
subjects before training, 2 weeks into training and 
post-6-months-training. 
• The researcher wished to know whether protection 
against heart disease might be afforded by exercise 
and whether this protection might be gained over a 
short period of time or whether it took longer. 
MANOVA 
• MANOVA is a procedure used to test the significance 
of the effects of one or more IVs on two or more 
DVs. 
• MANOVA can be viewed as an extension of ANOVA 
with the key difference that we are dealing with 
many dependent variables (not a single DV as in the 
case of ANOVA) 
Data requirements 
• Dependent Variables 
– Interval (or ratio) level variables 
– May be correlated 
– Multivariate normality 
– Homogeneity of variance 
• Independent Variable(s) 
– Nominal level variable(s) 
– At least two groups with each independent variable 
– Each independent variable should be independent of 
each other 
Various tests to use 
 Wilks' Lambda 
 Widely used; good balance between power and 
assumptions 
 Pillai's Trace 
 Useful when sample sizes are small, cell sizes are unequal, 
or covariances are not homogeneous 
 Hotelling's (Lawley-Hotelling) Trace 
 Useful when examining differences between two groups 
Results 
• The result of a MANOVA simply tells us that a 
difference exists (or not) across groups. 
• It does not tell us which treatment(s) differ or what is 
contributing to the differences. 
• For such information, we need to run ANOVAs with 
post hoc tests. 
Example 
• A high school takes its intake from three different 
primary schools. A teacher was concerned that, even 
after a few years, there were academic differences 
between the pupils from the different schools. As 
such, she randomly selected 20 pupils from each 
school and measured their academic performance by 
end-of-year English and Maths exams. 

INDEPENDENT VARIABLE: primary school attended 
(three categories) 
DEPENDENT VARIABLES: English and Maths scores 
Other tests of Interest 
• ANCOVA 
Analysis of Covariance 
This test is a blend of ANOVA and linear regression. 
• MANCOVA 
Multivariate analysis of covariance 
One or more continuous covariates are present. 
THANKS

Analysis of variance

  • 1.
    ANALYSIS OF VARIANCE (ANOVA) DR RAVI ROHILLA COMMUNITY MEDICINE PT. B.D. SHARMA PGIMS, ROHTAK 1
  • 2.
    Contents  Parametrictests  Difference b/w parametric & non-parametric tests  Introduction of ANOVA  Defining the Hypothesis  Rationale for ANOVA  Basic ANOVA situation and assumptions  Methodology for Calculations  F- distribution  Example of 1 way ANOVA 2
  • 3.
    Contents  Violationof assumptions  Two way ANOVA  Null hypothesis and data layout  Methodology for calculations and example  Comparisons in ANOVA  ANOVA with repeated measures  Assumptions and example  MANOVA  Assumptions and example  Other tests of interest and References 3
  • 4.
    PARAMETRIC TESTS Parameter- summary value that describes the population such as mean, variance, correlation coefficient, proportion etc.  Parametric test- population constants as described above are used such as mean, variances etc. and data tend to follow one assumed or established distribution such as normal, binomial, poisson etc. 4
  • 5.
    Parametric vs. non-parametrictests Choosing parametric test Choosing non-parametric test Correlation test Pearson Spearman Independent measures, 2 groups Independent-measures t-test Mann-Whitney test Independent measures, >2 groups One-way, independent-measures ANOVA Kruskal-Wallis test Repeated measures, 2 conditions Matched-pair t-test Wilcoxon test Repeated measures, >2 conditions One-way, repeated measures ANOVA Friedman's test 5
  • 6.
    (ANalysis Of VAriance) • Idea: For two or more groups, test difference between means, for quantitative normally distributed variables. • Just an extension of the t-test (an ANOVA with only two groups is mathematically equivalent to a t-test). 6
  • 7.
    EXAMPLE Question 1. Marks of 8th class students of two schools are given. Find whether the scores are differing significantly with each other. A 45 35 45 46 48 41 42 39 49 B 49 47 36 48 42 38 41 42 45 ANSWER: The test applicable here is T-test since there are two groups involved here. 7
  • 8.
    EXAMPLE Question 2. Marks of 8th class students of four schools are given. Find whether the scores are differing significantly with each other. A 45 35 45 46 48 41 42 39 49 B 49 47 36 48 42 38 41 42 45 C 43 45 42 37 39 40 41 35 47 D 34 48 47 42 36 41 45 42 48 ANSWER: The test applicable here is ANOVA since there are more than two groups(4) involved here. 8
  • 9.
    Why Not Uset-test Repeatedly? • The t-test can only be used to test differences between two means. • Conducting multiple t-tests can lead to severe inflation of the Type I error rate (false positives) and is NOT RECOMMENDED • ANOVA is used to test for differences among several means without increasing the Type I error rate 9
  • 10.
    Defining the hypothesis 10 The Null & Alternate hypotheses for one-way ANOVA are: H0  1  2...  i Ha  not all of the i are equal
  • 11.
    Rationale of thetest • Null Hypothesis states that all groups come from the same population; have the same means! • But do all groups have the same population mean?? • We need to compare the sample means. We compare the variation among (between) the means of several groups with the variation within groups 11
  • 12.
    30 25 20 19 12 10 9 7 A small variability within the samples makes it easier to draw a conclusion about the population means. 1 Treatment 1 Treatment 2Treatment 3 20 16 15 14 11 10 9 10 x1  x2 15 x3  20 TreatmentT 1reatmentT 2reatment 3 x110 x2 15 20 x3  The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means. 12 FIGURE 1 FIGURE 2
  • 13.
    13 FIGURE 1FIGURE 2
  • 14.
    Rationale of thetest • Clearly, we can conclude that the groups appear most different or distinct in the 1st figure. Why? • Because there is relatively little overlap between the bell-shaped curves. In the high variability case, the group difference appears least striking because the bell-shaped distributions overlap so much. 14
  • 15.
    Rationale of thetest • This leads us to a very important conclusion: when we are looking at the differences between scores for two or more groups, we have to judge the difference between their means relative to the spread or variability of their scores. 15
  • 16.
    Types of variability Two types of variability are employed when testing for the equality of the means: 17 Between Group variability & Within Group variability
  • 17.
     To distinguishbetween the groups, the variability between (or among) the groups must be greater than the variability within the groups.  If the within-groups variability is large compared with the between-groups variability, any difference between the groups is difficult to detect.  To determine whether or not the group means are significantly different, the variability between groups and the variability within groups are compared 18
  • 18.
    The basic ANOVAsituation • Two variables: 1 Categorical, 1 Quantitative • Main Question: Whether there are any significant differences between the means of three or more independent (unrelated) groups? 19
  • 19.
    Assumptions • Normality:the values within each group are normally distributed. • Homogeneity of variances: the variance within each group should be equal for all groups. • Independence of error: The error(variation of each value around its own group mean) should be independent of each value. 20
  • 20.
    ANOVA Calculations Sumof squares represent variation present in the data. They are calculated by summing squared deviations. The simple ANOVA design have 3 sums of squares.   2 ( ) TOT i G SS X X The total sum of squares comes from the distance of all the scores from the grand mean.   2 ( ) W i A SS X X The within-group or within-cell sum of squares comes from the distance of the observations to the cell means. This indicates error.   2 ( ) B A A G SS N X X The between-cells or between-groups sum of squares tells of the distance of the cell means from the grand mean. This indicates IV effects. TOT B W SS  SS  SS 21
  • 21.
    Calculating variance betweengroups 1. Calculate the mean of each sample. 2. Calculate the Grand average 3. Take the difference between the means of the nx n x ... n x    various samples and the grand average. 1 1 2 2 k k n n ... n X  4. Square these deviations and obtain the total which    1 2 k will give sum of squares between the samples 5. Divide the total obtained in step 4 by the degrees of 2 i SST n (x x) i freedom to calculate the mean squares for treatment(MST). i 22
  • 22.
    Calculating Variance withingroups 1. Calculate the mean value of each sample 2. Take the deviations of the various items in a sample from the mean values of the respective samples. 3. Square these deviations and obtain the total which gives the sum of square within the samples 4. Divide the total obtained in 3rd step by the degrees (  )2 of freedom to SSE calculate the x mean x i squares for ij error(MSE). ij 23
  • 23.
    The mean sumof squares • To perform the test we need to calculate the mean squares as follows SST 1  k MST SSE n k MSE   Calculation of MST-Mean Square for Treatments Calculation of MSE Mean Square for Error 24
  • 24.
    ANOVA: one wayclassification model Source of Variation SS (Sum of Squares) Degrees of Freedom MS Mean Square Variance Ratio of F Between Samples Sum of squares between samples SSB/SST k-1 MST= SST/(k-1) MST/MSE Within Samples Sum of squares within samples Total sum of square of variations SSW/SSE n-k MSE= SSE/(n-k) Total SS(Total) n-1 k= No of Groups, n= Total No of observations 25
  • 25.
    The F-distribution The F distribution is the probability distribution associated with the f statistic  The F-test tests the hypothesis that two variances are equal. 26
  • 26.
  • 27.
    Calculation of thetest statistic Variability between groups Variability within groups F  with the following degrees of freedom: v1=k -1 and v2=n-k 28 F- statistic = 푀푆푇 푀푆퐸 We compare the F-statistic value with F(critical) value which is obtained by looking for it in F distribution tables against degrees of freedom respectively.
  • 28.
    Example Group1 Group2 Group3 Group 4 60 50 48 47 67 52 49 67 42 43 50 54 67 67 55 67 56 67 56 68 62 59 61 65 64 67 61 65 59 64 60 56 72 63 59 60 71 65 64 65 29
  • 29.
    Group1 Group 2Group3 Group 4 60 50 48 47 67 52 49 67 42 43 50 54 67 67 55 67 56 67 56 68 62 59 61 65 64 67 61 65 59 64 60 56 72 63 59 60 71 65 64 65 62.0 59.7 56.3 61.4 Step 1) calculate the sum of squares between groups: Grand mean= 59.85 SSB = [(62-59.85)2 + (59.7-59.85)2 + (56.3- 59.85)2 + (61.4-59.85)2] x n per group = 19.65x10 = 196.5 Mean DEGREES OF FREEDOM(df) are k-1 = 4-1 = 3 30
  • 30.
    Step 2) calculatethe sum of squares within groups: SSW=(60-62) 2+(67-62) 2+ … + (50-59.7) 2+ (52- 59.7) 2+ …+(48-56.3)2 +(49-56.3)2 +…(sum of 40 squared deviations) = 2060.6 Group1 Group 2 Group3 Group 4 60 50 48 47 67 52 49 67 42 43 50 54 67 67 55 67 56 67 56 68 62 59 61 65 64 67 61 65 59 64 60 56 72 63 59 60 71 65 64 65 Mean 62.0 59.7 56.3 61.4 DEGREES OF FREEDOM(df) are N-k = 40-4 = 36 31
  • 31.
    RESULTS Source of variation df Sum of Squares Mean Sum of Squares F-Statistic P-value Between (Treatment effect) 3 196.5 65.5 1.14 .344 Within (Error) 36 2060.6 57.2 - - Total 39 2257.1 - - - 32 F(critical) = 2.866
  • 32.
    Violations of Assumptions Testing for Normality • Each of the populations being compared should follow a normal distribution. This can be tested using various normality tests, such as:  Shapiro-Wilk test or  Kolmogorov–Smirnov test or  Assessed graphically using a normal quantile plot or Histogram. 34
  • 33.
    Remedies for Non-normal data • Transform your data using various algorithms so that the shape of your distributions become normally distributed . Common transformation used are:  Logarithm  Square root and  Multiplicative inverse • Choose the non-parametric Kruskal-Wallis H Test which does not require the assumption of normality. 35
  • 34.
    Testing for variances • The populations being compared should have the same variance. Tested by using  Levene's test,  Modified Levene's test  Bartlett's test 36
  • 35.
    Remedies for heterogenousdata • Two tests that we can run when the assumption of homogeneity of variances has been violated are: Welch test or  Brown and Forsythe test. • Alternatively, we can run a Kruskal-Wallis H Test. But for most situations it has been shown that the Welsh test is best. 37
  • 36.
    Two-Way ANOVA •One dependent variable (quantitative variable), • Two independent variables (classifying variables = factors) • Key Advantages – Compare relative influences on DV – Examine interactions between IV 38
  • 37.
    Example IV#1 IV#2DV – Drug Level Age of Patient Anxiety Level – Type of Therapy Length of Therapy Anxiety Level – Type of Exercise Type of Diet Weight Change 39
  • 38.
    Null hypothesis The two-way ANOVA include tests of three null hypotheses:  That the means of observations grouped by one factor are the same;  That the means of observations grouped by the other factor are the same; and  That there is no interaction between the two factors. The interaction test tells you whether the effects of one factor depend on the other factor. 40
  • 39.
    Two-Way ANOVA DataLayout Observation k in each cell Xijk Level i Factor A Level j Factor B Factor Factor B A 1 2 ... b 1 X111 X121 ... X1b1 X11n X12n ... X1bn 2 X211 X221 ... X2b1 X21n X22n ... X2bn : : : : : a Xa11 Xa21 ... Xab1 Xa1n Xa2n ... Xabn i = 1,…,a j = 1,…,b k = 1,…,n There are a X b treatment combinations 41
  • 40.
    Formula for calculations • Just as we had Sums of Squares and Mean Squares in One-way ANOVA, we have the same in Two-way ANOVA. • In balanced Two-way ANOVA, we measure the overall variability in the data by: a 2      SS X X df N ( ) 1 T ijk    1 1 1 i b j n k 42
  • 41.
    Formula for calculations Sum of Squares for factor A 2          SS X X bn X X df a ( ) ( ) 1 1 2 A i 1 1 1       a i i a i b j n k Sum of Squares for factor B a 2 2 ( ) ( ) 1       B j j SS X X an X X df b             i b j n k b j 1 1 1 1 43
  • 42.
    Formula for calculations Interaction Sum of Squares a     AB ij i j SS X X X X df a b             i b j n k 1 1 1 2 ( ) ( 1)( 1) Measures the variation in the response due to the interaction between factors A and B. Error or Residual Sum of Squares a     E ijk ij SS X X df ab n      i b j n k 1 1 1 2 ( ) ( 1) Measures the variation in the response within the a x b factor combinations. 44
  • 43.
    Formula for calculations
    • So the two-way ANOVA identity is:

    SS_T = SS_A + SS_B + SS_{AB} + SS_E

    • This partitions the Total Sum of Squares into four pieces, one for each of the hypotheses to be tested.
    45
    Two-way ANOVA Table
    Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square   F-ratio               P-value
    Factor A              a − 1                SS_A             MS_A          F_A = MS_A / MS_E     Tail area
    Factor B              b − 1                SS_B             MS_B          F_B = MS_B / MS_E     Tail area
    Interaction           (a − 1)(b − 1)       SS_AB            MS_AB         F_AB = MS_AB / MS_E   Tail area
    Error                 ab(n − 1)            SS_E             MS_E
    Total                 abn − 1              SS_T
    46
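The balanced two-way ANOVA computations above can be sketched in a few lines of plain Python. The data set below is made up purely for illustration (a = 2, b = 2, n = 3); the point is to show the sums of squares, the partition identity, and the F-ratios from the table.

```python
from statistics import mean

# data[i][j] holds the n replicate observations X_ijk for cell (i, j)
data = [
    [[10, 12, 11], [14, 15, 16]],   # factor A, level 1
    [[13, 12, 14], [20, 19, 21]],   # factor A, level 2
]
a, b, n = len(data), len(data[0]), len(data[0][0])

grand = mean(x for i in range(a) for j in range(b) for x in data[i][j])
row_means = [mean(x for j in range(b) for x in data[i][j]) for i in range(a)]
col_means = [mean(x for i in range(a) for x in data[i][j]) for j in range(b)]
cell_means = [[mean(data[i][j]) for j in range(b)] for i in range(a)]

SS_T = sum((x - grand) ** 2 for i in range(a) for j in range(b) for x in data[i][j])
SS_A = b * n * sum((m - grand) ** 2 for m in row_means)
SS_B = a * n * sum((m - grand) ** 2 for m in col_means)
SS_AB = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                for i in range(a) for j in range(b))
SS_E = sum((x - cell_means[i][j]) ** 2
           for i in range(a) for j in range(b) for x in data[i][j])

# Partition identity: SS_T = SS_A + SS_B + SS_AB + SS_E
assert abs(SS_T - (SS_A + SS_B + SS_AB + SS_E)) < 1e-9

# Mean squares and F-ratios; p-values come from F tables or software
MS_A, MS_B = SS_A / (a - 1), SS_B / (b - 1)
MS_AB = SS_AB / ((a - 1) * (b - 1))
MS_E = SS_E / (a * b * (n - 1))
print(MS_A / MS_E, MS_B / MS_E, MS_AB / MS_E)
```

Comparing each F-ratio against the critical F-value with the df from the table above completes the test.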
    WHAT AFTER ANOVA RESULTS?
    • The ANOVA test tells us whether we have an overall difference between our groups, but it does not tell us which specific groups differed.
    • Two possibilities are then available:
     For comparisons specified in advance, known as a priori tests (contrasts)
     For comparisons decided on after the test, known as post-hoc comparisons
    47
    CONTRASTS
    • Known as a priori or planned comparisons.
    – Used when a researcher plans to compare specific group means prior to collecting the data, or
    – decides to compare group means after the data have been collected, on noticing that some of the means appear to be different.
    – These can be tested even when H0 cannot be rejected.
    – The Bonferroni t procedure (also referred to as Dunn's test) is used.
    48
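The idea behind the Bonferroni adjustment used by Dunn's test can be shown in a tiny sketch: each of m comparisons is run at significance level alpha/m so that the family-wise error rate stays at most alpha. The numbers below are made up for illustration.

```python
from math import comb

k = 4                      # number of group means (assumed example)
alpha = 0.05               # desired family-wise error rate
m = comb(k, 2)             # all pairwise comparisons: k(k-1)/2 = 6
alpha_per_test = alpha / m # each pairwise t-test is judged at this level

print(m, alpha_per_test)   # 6 comparisons, each tested at about 0.0083
```

This is why Bonferroni-type procedures become conservative as the number of comparisons grows.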
    Post-hoc Tests
    • Post-hoc tests provide a solution here and therefore should only be run when we have an overall significant difference in group means.
    • Post-hoc tests are termed a posteriori tests - that is, performed after the event.
    – Tukey's HSD Procedure
    – Scheffe's Procedure
    – Newman-Keuls Procedure
    – Dunnett's Procedure
    49
    Example in SPSS
    • A clinical psychologist is interested in comparing the relative effectiveness of three forms of psychotherapy for alleviating depression. Fifteen individuals are randomly assigned to each of three treatment groups: cognitive-behavioral, Rogerian, and assertiveness training. The Depression Scale of the MMPI serves as the response. The psychologist also wished to incorporate information about each patient's severity of depression, so all subjects in the study were classified as having mild, moderate, or severe depression. 50
    ANOVA with Repeated Measures
    • An ANOVA with repeated measures compares three or more group means where the participants are the same in each group.
    • This usually occurs in two situations –
    – when participants are measured multiple times to see changes in response to an intervention, or
    – when participants are subjected to more than one condition/trial and the responses to these conditions are to be compared.
    51
    Assumptions
    • The dependent variable is interval or ratio (continuous).
    • The dependent variable is approximately normally distributed.
    • Sphericity.
    • One independent variable where participants are tested on the same dependent variable at least 2 times.
    Sphericity is the condition where the variances of the differences between all combinations of related groups (levels) are equal. 52
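The sphericity condition can be inspected descriptively by computing the variance of the differences for every pair of repeated-measures levels, exactly as the definition says. The data below are made up (5 subjects measured at 3 time points); Mauchly's test gives the formal answer.

```python
from itertools import combinations
from statistics import variance

scores = {                    # condition -> one score per subject (assumed data)
    "t1": [30, 27, 35, 24, 28],
    "t2": [27, 25, 30, 24, 26],
    "t3": [20, 22, 28, 20, 24],
}

diff_vars = {}
for c1, c2 in combinations(scores, 2):
    diffs = [x - y for x, y in zip(scores[c1], scores[c2])]
    diff_vars[(c1, c2)] = variance(diffs)    # sample variance of the differences

print(diff_vars)   # roughly equal variances -> sphericity is plausible
```

If these variances differ markedly, a correction such as Greenhouse-Geisser should be applied to the df.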
    Sphericity violation
    • Sphericity can be likened to homogeneity of variances in a between-subjects ANOVA.
    • The violation of sphericity is serious for the repeated measures ANOVA: violation causes the test to have an increased Type I error rate.
    • Mauchly's Test of Sphericity tests the assumption of sphericity.
    53
    Sphericity violation
    • The corrections employed to combat the violation of the assumption of sphericity are:
     Lower-bound estimate,
     Greenhouse-Geisser correction and
     Huynh-Feldt correction.
    • The corrections are applied to the degrees of freedom (df) such that a valid critical F-value can be obtained.
    54
    Problems
    • In a 6-month exercise training program with 20 participants, a researcher measured the CRP levels of the subjects before training, 2 weeks into training and after 6 months of training.
    • The researcher wished to know whether protection against heart disease might be afforded by exercise, and whether this protection might be gained over a short period of time or whether it took longer.
    55
    MANOVA
    • MANOVA is a procedure used to test the significance of the effects of one or more IVs on two or more DVs.
    • MANOVA can be viewed as an extension of ANOVA, with the key difference that we are dealing with many dependent variables (not a single DV as in the case of ANOVA).
    56
    Data requirements
    • Dependent Variables
    – Interval (or ratio) level variables
    – May be correlated
    – Multivariate normality
    – Homogeneity of variance
    • Independent Variable(s)
    – Nominal level variable(s)
    – At least two groups within each independent variable
    – The independent variables should be independent of one another
    57
    Various tests to use
     Wilks' Lambda
     Widely used; good balance between power and assumptions
     Pillai's Trace
     Useful when sample sizes are small, cell sizes are unequal, or covariances are not homogeneous
     Hotelling's (Lawley-Hotelling) Trace
     Useful when examining differences between two groups
    58
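For a one-way MANOVA with two DVs, Wilks' Lambda can be sketched directly: Lambda = det(W) / det(W + B), where W and B are the within- and between-group SSCP (sums of squares and cross-products) matrices. The three groups of (English, Maths) scores below are made up to mimic the three-schools example; real analyses would use statistical software.

```python
from statistics import mean

# groups[g] = list of (english, maths) scores for group g (assumed data)
groups = [
    [(70, 65), (72, 68), (68, 66)],
    [(60, 62), (63, 60), (66, 64)],
    [(75, 80), (78, 77), (72, 74)],
]

all_obs = [x for g in groups for x in g]
gm = [mean(x[d] for x in all_obs) for d in range(2)]              # grand means
cm = [[mean(x[d] for x in g) for d in range(2)] for g in groups]  # group means

W = [[0.0, 0.0], [0.0, 0.0]]   # within-group SSCP
B = [[0.0, 0.0], [0.0, 0.0]]   # between-group SSCP
for g, m in zip(groups, cm):
    for r in range(2):
        for c in range(2):
            B[r][c] += len(g) * (m[r] - gm[r]) * (m[c] - gm[c])
            for x in g:
                W[r][c] += (x[r] - m[r]) * (x[c] - m[c])

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

T = [[W[r][c] + B[r][c] for c in range(2)] for r in range(2)]  # total SSCP
wilks = det2(W) / det2(T)
print(wilks)   # Lambda near 0 -> groups differ on the DVs jointly
```

Software converts Lambda to an approximate F statistic to get a p-value; the sketch stops at the statistic itself.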
    Results
    • The result of a MANOVA simply tells us that a difference exists (or not) across groups.
    • It does not tell us which treatment(s) differ or what is contributing to the differences.
    • For such information, we need to run ANOVAs with post-hoc tests.
    59
    Example
    • A high school takes its intake from three different primary schools. A teacher was concerned that, even after a few years, there were academic differences between the pupils from the different schools. As such, she randomly selected 20 pupils from each school and measured their academic performance by end-of-year English and Maths exams.
    INDEPENDENT VARIABLE: primary school attended (three categories)
    DEPENDENT VARIABLES: English and Maths scores 60
    Other tests of Interest
    • ANCOVA (Analysis of Covariance)
    – A blend of ANOVA and linear regression
    • MANCOVA (Multivariate Analysis of Covariance)
    – Like MANOVA, with one or more continuous covariates present
    61
    References
    • Wikipedia: The Free Encyclopedia. Available from URL: http://en.wikipedia.org/wiki/
    • Methods in Biostatistics by BK Mahajan
    • Statistical Methods by SP Gupta
    • Basic & Clinical Biostatistics by Dawson and Trapp
    • Statistical Methods in Medical Research by Armitage, Berry, Matthews
    62