One Way ANOVA
Dr. Sean P. Mackinnon
PSYO 2501
Slides created by Sean P. Mackinnon
The slides may be copied, edited, and/or shared via the CC BY-SA license
1
LO 17. Define analysis of variance (ANOVA) as a statistical inference method that is used to
determine if the variability in the sample means is so large that it seems unlikely to be from
chance alone by simultaneously considering many groups at once.
LO 18. Recognize that the null hypothesis in ANOVA sets all means equal to each other, and
the alternative hypothesis suggests that at least one mean is different.
LO 19. List the conditions necessary for performing ANOVA: (1) the observations should be
independent within and across groups, (2) the data within each group are nearly normal, and (3)
the variability across the groups is about equal; check whether these conditions are met using
graphical diagnostics.
LO 20. Recognize that the test statistic for ANOVA, the F statistic, is calculated as the ratio of
the mean square between groups (MSG, variability between groups) and the mean square error
(MSE, variability within groups), and has two degrees of freedom, one for the numerator and
one for the denominator.
LO 21. Describe why calculation of the p-value for ANOVA is always “one sided”.
LO 22. Describe why conducting many t-tests for differences between each pair of means
leads to an increased Type I error rate, and how to use a corrected significance level (e.g.,
Bonferroni) to combat inflating this error rate.
2
When And Why
• Between-subjects design (like the independent t-test)
• When we want to compare between-subjects means
we can use a t-test. This test has limitations:
– You can compare only 2 means
– It can be used only with one Independent Variable.
• ANOVA
– Compares several means.
– Can be used when you have manipulated more than one
Independent Variable.
3
Why Not Use Lots of t-Tests?
• If we want to compare several means why don’t we compare
pairs of means with t-tests?
– Can’t look at several independent variables.
– Inflates the Type I error rate.
– For 3 means, the new Type I error rate is .14 instead of .05!
[Diagram: three group means (1, 2, 3) and the three possible pairwise t-tests: 1 vs 2, 2 vs 3, 1 vs 3]
Familywise Error = 1 – (0.95)^n; with n = 3 tests, 1 – (0.95)^3 = .14
Image from Andy Field’s Textbook Slides “Discovering Statistics Using SPSS Statistics”
4
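As a quick check of the .14 figure above, here is a minimal Python sketch (the calculation, not the course software, is the point; any Python 3 will do):

    alpha = 0.05       # per-test Type I error rate
    n_tests = 3        # three pairwise t-tests for three means
    # Chance of at least one false positive across the three tests
    familywise_error = 1 - (1 - alpha) ** n_tests
    print(round(familywise_error, 2))   # 0.14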
How many t-tests would you need to
do for all pairwise comparisons?
Number of pairwise comparisons = k(k – 1) / 2
Where k = # of means
The above formula can
be used to calculate the
# of pairwise
comparisons you’d
need to do for a certain
number of means (k)
The more means you
compare, the more
pairwise comparisons
there are!
http://onlinestatbook.com/2/tests_of_means/pairwise.html
5
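A small Python sketch of the pairwise-comparison formula (the function name n_pairwise is just an illustrative choice):

    def n_pairwise(k):
        # Number of unique pairwise comparisons among k means
        return k * (k - 1) // 2

    for k in (3, 4, 5, 10):
        print(k, n_pairwise(k))   # 3, 6, 10, 45 comparisons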
With lots of tests, Type I error gets crazy!
http://www.claviusweb.net/pitfalls/
So, as you do more tests, the probability of finding at least one false positive increases quite a bit!
This is a graph of Familywise Error = 1 – (0.95)^n
6
What Does ANOVA Tell us?
• Null Hypothesis:
– ANOVA tests the null hypothesis that the means are the same.
– It does this with ONE TEST, thus circumventing the inflated Type I error
problem we just discussed.
• ANOVA is an Omnibus test
– It tests for an overall difference between groups.
– It tells us that the group means are different.
– It doesn’t tell us exactly which means differ.
• Follow-up tests can examine nuanced differences between groups
– Contrasts, post-hoc tests
– These sometimes need corrections for Type I error inflation
7
Theory of ANOVA
• We calculate how much variability there is in the data overall
– Total Sum of squares (SST).
• We then calculate how much of this variability can be explained
by the model we fit to the data
– How much variability is due to the experimental manipulation / group
membership, Between-Groups Sum of Squares (SSG)...
• … and how much cannot be explained
– How much variability is due to individual differences in performance,
Sum of Squared Error (SSE).
Pictures on upcoming slides taken from:
http://www.ripplestat.com/ripplestat/concepttutors.html
8
Theory of ANOVA: Variance Partitioning
[Diagram: the Total Sum of Squares (total variance in the data) is partitioned into the Between-Groups Sum of Squares (variance explained by the model) and the Sum of Squared Error (unexplained variance)]
SST = SSG + SSE
If the experiment is successful, then the model will explain
more variance than it leaves unexplained, i.e.,
SSG will be greater than SSE
9
Plot the Data
Three groups, 15 participants, ANOVA will compare the three means
Each dot is a separate participant
10
Calculate the Total Sum of Squares
The grand mean is 205.47. We get the sum of squared deviations from the grand mean
This is the Total Sum of Squares (SST)
11
Observed (Xi)   Grand Mean (x-bar)   Deviation (Xi – x-bar)   Deviation² (Xi – x-bar)²
254 205.47 48.53 2355.48
228 205.47 22.53 507.75
256 205.47 50.53 2553.62
211 205.47 5.53 30.62
267 205.47 61.53 3786.35
243 205.47 37.53 1408.75
249 205.47 43.53 1895.15
243 205.47 37.53 1408.75
213 205.47 7.53 56.75
239 205.47 33.53 1124.48
114 205.47 -91.47 8366.15
130 205.47 -75.47 5695.22
122 205.47 -83.47 6966.68
167 205.47 -38.47 1479.68
146 205.47 -59.47 3536.28
SST 41171.73
A step-by-step
calculation of
the total sum of
squares (SST)
12
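The same SST calculation as a short Python sketch, using the 15 scores from the table above:

    scores = [254, 228, 256, 211, 267, 243, 249, 243, 213, 239,
              114, 130, 122, 167, 146]
    grand_mean = sum(scores) / len(scores)            # 205.47 (rounded)
    sst = sum((x - grand_mean) ** 2 for x in scores)  # squared deviations from the grand mean
    print(round(grand_mean, 2), round(sst, 2))        # 205.47, 41171.73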
Calculate the Sum of Squared Error
The three group means are: 243.20, 237.40, 135.80.
We get the sum of squared deviations from the three group means (SSE)
13
Observed (Xi)   Group Mean (x-bar)   Deviation (Xi – x-bar)   Deviation² (Xi – x-bar)²
254 243.20 10.80 116.64
228 243.20 -15.20 231.04
256 243.20 12.80 163.84
211 243.20 -32.20 1036.84
267 243.20 23.80 566.44
243 237.40 5.60 31.36
249 237.40 11.60 134.56
243 237.40 5.60 31.36
213 237.40 -24.40 595.36
239 237.40 1.60 2.56
114 135.80 -21.80 475.24
130 135.80 -5.80 33.64
122 135.80 -13.80 190.44
167 135.80 31.20 973.44
146 135.80 10.20 104.04
SSE 4686.80
A step-by-step
calculation of the
sum of squared
error (SSE)
Note, this is
sometimes
referred to as the
“Within” sum of
squares (SSW)
Or … Sum of
squared residuals
(SSR)
In this course, I’ll
use SSE, like your
book.
14
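The same SSE calculation in Python, grouping the 15 scores by condition (the variable names g1–g3 are just placeholders):

    g1 = [254, 228, 256, 211, 267]   # mean 243.20
    g2 = [243, 249, 243, 213, 239]   # mean 237.40
    g3 = [114, 130, 122, 167, 146]   # mean 135.80

    def ss_within(group):
        # Squared deviations of each score from its own group mean
        m = sum(group) / len(group)
        return sum((x - m) ** 2 for x in group)

    sse = sum(ss_within(g) for g in (g1, g2, g3))
    print(round(sse, 2))   # 4686.8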
Calculate the Between Groups Sum of Squares
The three group means are: 243.20, 237.40, 135.80. The grand mean is 205.47
We get the SSG by looking at the difference between the group means and grand mean
15
Grand Mean   Group Mean   Deviation   Deviation²
205.47 243.20 -37.73 1423.80
205.47 243.20 -37.73 1423.80
205.47 243.20 -37.73 1423.80
205.47 243.20 -37.73 1423.80
205.47 243.20 -37.73 1423.80
205.47 237.40 -31.93 1019.74
205.47 237.40 -31.93 1019.74
205.47 237.40 -31.93 1019.74
205.47 237.40 -31.93 1019.74
205.47 237.40 -31.93 1019.74
205.47 135.80 69.67 4853.44
205.47 135.80 69.67 4853.44
205.47 135.80 69.67 4853.44
205.47 135.80 69.67 4853.44
205.47 135.80 69.67 4853.44
SSG 36484.93
A step-by-step calculation of the between-groups sum of squares (SSG)
Sometimes, people use
the acronym SSB to
mean the same thing.
Yet others use “Sum of
squares model” or SSM
to mean the same thing.
In general, this would be
a more generic term.
I am going to use SSG to
be consistent with your
book
16
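Continuing the sketch, the between-groups sum of squares and a check that the partition SST = SSG + SSE holds:

    grand_mean = 3082 / 15                 # 205.47, the grand mean from slide 11
    group_means = [243.20, 237.40, 135.80]
    n_per_group = 5
    # Each group mean's squared deviation from the grand mean, counted once per participant
    ssg = sum(n_per_group * (m - grand_mean) ** 2 for m in group_means)
    print(round(ssg, 2))                   # 36484.93
    print(round(ssg + 4686.80, 2))         # 41171.73, i.e., SSG + SSE = SST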
Practice Questions
1. If the SST = 1000, and the SSE = 800, what is
the value for SSG?
2. In an ANOVA with 3 groups (A, B & C), the
three group means are: (A) 20 (B) 35 (C) 10.
The Grand Mean is 21.67. Joe is a participant
in this study who was in group A, and scored
23. What would Joe’s deviance score be
when calculating SST?
17
Practice Answers
1. If the SST = 1000, and the SSE = 800, what is the value for SSG?
SST = SSG + SSE
SST – SSE = SSG
1000 - 800 = SSG
SSG = 200
2. In an ANOVA with 3 groups (A, B & C), the three group means are: (A) 20
(B) 35 (C) 10. The Grand Mean is 21.67. Joe is a participant in this study who
was in group A, and scored 23. What would Joe’s deviance score be when
calculating SST?
(Xi – grand mean)
23 – 21.67
1.33
18
Filling in those Sums of Squares into
an ANOVA table
Sums of Squares calculated as in the previous slides
These are a stepping stone to getting “F” which is closer to what we are interested in
*In statistics output, this might read “Residuals” which is the same thing
Source SS df MS F p
Group 36484.93
Error* 4686.80
Total 41171.73
19
Model Degrees of Freedom
• How many values did we use to calculate SSG?
– We used the 3 means.
dfM = k – 1 = 3 – 1 = 2
20
Ok, but why?
Mean Group 1 243.20
Mean Group 2 237.40
Mean Group 3 135.80
Grand Mean 205.47
Mean Group 1 243.20
Mean Group 2 237.40
Mean Group 3
Grand Mean 205.47
Mean Group 1
Mean Group 2
Mean Group 3
Grand Mean 205.47
You start knowing
the grand mean
The means for G1 & G2
could be ANY numbers
(positive or negative)
However once you know
the other 3 numbers,
group 3 can only be 1
number (in this case,
135.80)
Thus the last number is not “free” to be anything
It’s like being the last person picked for a team – there is no choice for the last pick!
Thus, in its simplest form, degrees of freedom = k – 1
Where k = # of groups
See also: “Why divide by (n-1).pdf” on Brightspace
21
Residual Degrees of Freedom
• How many values did we use to calculate SSE?
– We used the 5 scores in each group (for each group’s SS).
Full formula (to explain why it is –3):
dfR = dfgroup1 + dfgroup2 + dfgroup3 = (n1 – 1) + (n2 – 1) + (n3 – 1) = (5 – 1) + (5 – 1) + (5 – 1) = 12
Simplified formula:
dfR = n – k = 15 – 3 = 12
n = # of observations
k = # of groups
22
Adding in Degrees of Freedom
Source SS df MS F p
Group 36484.93 2
Error 4686.80 12
Total 41171.73 14
df total is just 2+12 = 14
23
Concept Review
What happens to the sum of squares as the
sample size increases?
Why might this be a problem?
Hint: Remember back to the concept of
“variance” from lecture 2.
24
Adding Mean Square Errors
The sum of squares increases as the df increases, so sums of squares are not directly comparable
Thus, we will divide the SS by df to get the “Mean Squares”
It is basically an estimate of “variance”
MSG = SSG / dfG
MSE = SSE / dfE
Source SS df MS F p
Group 36484.93 2 18242.47
Error 4686.80 12 390.57
Total 41171.73 14 2940.84
25
Calculating the F-test
F = MSG / MSE
Compare to critical F table for a p-value
Source SS df MS F p
Group 36484.93 2 18242.47 46.71
Error 4686.80 12 390.57
Total 41171.73 14 2940.84
26
The F Distribution
Things to note about F
• The F-values are all non-negative
• The distribution is not symmetrical like z
• Under the null hypothesis, F is expected to be about 1, not zero
• There are many different F distributions,
one for each pair of degrees of freedom.
• Let’s take a look at different distributions:
2501.psychology.dal.ca/dist_calc
https://people.richland.edu/james/lecture/m170/ch13-f.html
27
Calculating the p-value
So, we reject the null hypothesis (that all of the group means are equal)
In other words, the group means are not all equal to each other!
243.20, 237.40, 135.80
Source SS df MS F p
Group 36484.93 2 18242.47 46.71 < .001
Error 4686.80 12 390.57
Total 41171.73 14 2940.84
28
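Putting slides 25–28 together, here is a hedged Python sketch that goes from the sums of squares to F and its p-value (assumes the scipy library; stats.f.sf gives the upper-tail area of the F distribution, which is why the ANOVA p-value is “one sided”):

    from scipy import stats

    ssg, sse = 36484.93, 4686.80
    df_g, df_e = 2, 12

    msg = ssg / df_g                           # 18242.47
    mse = sse / df_e                           # 390.57
    f_stat = msg / mse                         # 46.71
    p_value = stats.f.sf(f_stat, df_g, df_e)   # upper-tail probability only
    print(round(f_stat, 2), p_value)           # 46.71, far below .001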
Practice
Source SS df MS F p
Group 91.467 2 45.733 ? ?
Error 276.40 27 10.237
Total 367.867 29 12.69
Given this data: (a) What is the value of F? (b) What is the value of p? (c) how many
groups were in this experiment?
You can use 2501.psychology.dal.ca/dist_calc for p
29
Practice: Answers
Source SS df MS F p
Group 91.467 2 45.733 4.467 .021
Error 276.40 27 10.237
Total 367.867 29 12.69
There are 3 groups, because dfG = k -1
Also, this is a significant F-test … we reject the null (i.e., that the means are the same)
30
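If you prefer software to the distribution calculator, a one-line check of this p-value (again assuming scipy):

    from scipy import stats
    print(round(stats.f.sf(45.733 / 10.237, 2, 27), 3))   # about 0.021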
ANOVA Assumptions
• Independence of Observations
– Random sampling, participants don’t influence each other.
• Approximately normal
– The distributions in the population for each group are normally
distributed.
– No extreme outliers (i.e., very high or low values)
– This assumption can be relaxed as sample size increases due to the CLT
– When population distribution is unknown, some people check to see if
the residuals (i.e., the deviance scores for SSE) are normally distributed.
• Homogeneity of Variance
– The variance in each of the k groups is assumed to be equal
– Tests known as the “Brown-Forsythe” or “Welch” F-tests are among the many
alternatives when this assumption is violated.
When any of these are not true, the p-values will tend to be inaccurate
31
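A rough Python sketch of how these assumptions might be screened with scipy (these are common choices, not the only ones; with n = 5 per group the tests have little power, so plots remain important):

    from scipy import stats

    g1 = [254, 228, 256, 211, 267]
    g2 = [243, 249, 243, 213, 239]
    g3 = [114, 130, 122, 167, 146]

    # Shapiro-Wilk test of approximate normality within each group
    for g in (g1, g2, g3):
        print(round(stats.shapiro(g).pvalue, 3))

    # Brown-Forsythe (median-centered Levene) test of equal variances
    print(round(stats.levene(g1, g2, g3, center='median').pvalue, 3))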
“Post-Hoc” Tests
• The F-Test tells you that the means are not equal to each
other. However, it does not give more nuanced results.
• In our example:
– 243.20, 237.40, 135.80
• Are all 3 means different from each other? Or is it just that
blue differs from red and green?
• We can’t just run a bunch of t-tests, because of the
familywise error rate … so what do we do?
32
“Post-Hoc” Tests
• Post-hoc tests (for the most part) are just variations on the t-
test formula to control for the familywise error rate.
• The easiest one to implement is the Bonferroni correction.
This just adjusts the critical value for significance.
• So, if your critical α was .05, and you have 3 groups:
– .05/3 = .0167. So the new critical alpha is .0167.
– To reject the null, p-values must be less than .0167 NOT .05
Bonferroni α = α / Number of Tests
Where the number of tests is calculated by: k(k – 1) / 2
33
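As a sketch of Bonferroni-corrected follow-ups in Python (Welch t-tests via scipy on the example data from earlier slides; the group labels are placeholders):

    from itertools import combinations
    from scipy import stats

    groups = {"Group 1": [254, 228, 256, 211, 267],
              "Group 2": [243, 249, 243, 213, 239],
              "Group 3": [114, 130, 122, 167, 146]}

    n_tests = len(groups) * (len(groups) - 1) // 2   # 3 pairwise tests
    alpha_bonf = 0.05 / n_tests                      # .0167

    for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
        result = stats.ttest_ind(a, b, equal_var=False)   # Welch t-test
        print(name_a, "vs", name_b, round(result.pvalue, 4),
              "reject" if result.pvalue < alpha_bonf else "retain")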
There are too many types of post-hoc
test to list!
• Some people have a problem with the Bonferroni correction, because
it is too “conservative” (i.e., it makes it too hard to reject hypotheses,
and may inflate the Type II error rate!).
• There are MANY other variations (I know of ~18). Everyone has their
favourites. Here are some alternatives you might see reported:
– Basic t-tests with no correction (do not trust!)
• Least Significant Difference (LSD test)
– Assumptions met, less conservative than Bonferroni
• Tukey HSD (“Honestly Significant Difference”)
– Better when there are Unequal Variances
• Games-Howell
– Better when there are Unequal Sample Sizes:
• Gabriel’s (small n), Hochberg’s GT2 (large n)
34
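For what it’s worth, recent SciPy versions (1.8+) include a Tukey HSD routine; a minimal sketch on the same example groups (your software setup may differ):

    from scipy import stats

    g1 = [254, 228, 256, 211, 267]
    g2 = [243, 249, 243, 213, 239]
    g3 = [114, 130, 122, 167, 146]

    res = stats.tukey_hsd(g1, g2, g3)   # all pairwise comparisons, familywise-corrected
    print(res)                          # mean differences, confidence intervals, adjusted p-values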
ANOVA Example Output
• Participants (N = 30 students)
• Outcome: # of recalled words (out of 40)
• Predictor: Three Experimental Conditions
– Before (Recall the words in the same order they were recalled, with all the
distractor words at the end)
– Meshed (Recall the words in the same order they were recalled, BUT they
are mixed together with distractor words)
– SFR (Control condition, fully randomized presentation)
Friendly, M., & Franklin, P. (1980). Interactive presentation in multitrial free recall. Memory & Cognition, 8(3), 265-270.
Data obtained from: http://vincentarelbundock.github.io/Rdatasets/
35
Visualizing the Data
Combination side-by-side box plot & dot plot
What does this seem to suggest about the assumptions and hypothesis test?
36
Visualizing the Data
This is the same data in a density plot
It’s a close relative of the histogram; it differs because it has been smoothed out
It is generally more aesthetically pleasing, but works better with lots of data
We’ve actually seen this before with the z, t, and F distribution plots!
37
The F-Test
In software, sometimes the
output doesn’t have all the
parts. This is missing the
“Total” row, but that’s ok.
You would report this as:
F(2, 27) = 4.34, p = .023
We reject the null, and
conclude that the means are
different!
38
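In Python, the omnibus test itself would be one call to scipy.stats.f_oneway on the three condition vectors (not reproduced here); as a sanity check, the reported p-value can also be recovered from the printed F and df:

    from scipy import stats

    # With the raw data: stats.f_oneway(before, meshed, sfr)
    # Checking the reported result from the F and df alone:
    print(round(stats.f.sf(4.34, 2, 27), 3))   # about 0.023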
Before (M = 36.6) vs. Meshed (M = 36.6)
t(14.24) = 0.00, p = 1.0
No difference!
Before (M = 36.6) vs. SFR (M = 30.3)
t(16.45) = 2.20, p = .043
Sig difference! (p < .05)
Not significant with the Bonferroni correction (would need p < .0167)
Meshed (M = 36.6) vs. SFR (M = 30.3)
t(11.98) = 2.51, p = .028
Sig difference (p < .05)
Not significant with the Bonferroni correction (would need p < .0167)
39
Conclusions
• The F-test shows that the 3 means are not equal
– But, be cautious: The homogeneity of variance assumption probably did
not hold in this dataset. Control had a lot more variance.
• Looking at the pairwise tests, the meshed and before conditions
differed from control, but not from each other.
– With this small sample, you can see why some people criticize the
Bonferroni method! It is because sometimes, you’ll get a significant F, but
all non-significant t-tests, which is a bit nonsensical.
– If we used an alternative algorithm, like “Tukey’s HSD”, the pattern I
describe here would hold.
• Thus, recalling things in the same order you remembered them seems
to confer a memory advantage in this data.
40
Practice
• A researcher wants to conduct an experiment with 5
conditions.
• A) How many pairwise comparisons would be needed to
compare every condition to all the others?
• B) What would the critical value be for α using the Bonferroni
method? (assuming they are using .05 as their criterion)
41
Practice: Answers
• A researcher wants to conduct an experiment with 5
conditions.
• A) How many pairwise comparisons would be needed to
compare every condition to all the others?
– k(k-1) / 2
– 5(5-1) / 2 = 20/2 = 10
• B) What would the critical value be for α using the Bonferroni
method? (assuming they are using .05 as their criterion)
– .05 / 10 = .005
42
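The same arithmetic as a two-line Python check, reusing the pairwise-count formula from earlier:

    k = 5
    n_tests = k * (k - 1) // 2        # 10 pairwise comparisons
    print(n_tests, 0.05 / n_tests)    # 10, 0.005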