One Way ANOVA
Dr. Sean P. Mackinnon
PSYO 2501
Slides created by Sean P. Mackinnon
The slides may be copied, edited, and/or shared via the CC BY-SA license
1
LO 17. Define analysis of variance (ANOVA) as a statistical inference method that is used to
determine if the variability in the sample means is so large that it seems unlikely to be from
chance alone by simultaneously considering many groups at once.
LO 18. Recognize that the null hypothesis in ANOVA sets all means equal to each other, and
the alternative hypothesis suggests that at least one mean is different.
LO 19. List the conditions necessary for performing ANOVA: (1) the observations should be
independent within and across groups, (2) the data within each group are nearly normal, and (3)
the variability across the groups is about equal; check whether these conditions are met using
graphical diagnostics.
LO 20. Recognize that the test statistic for ANOVA, the F statistic, is calculated as the ratio of
the mean square between groups (MSG, variability between groups) and the mean square error
(MSE, variability within groups), and has two degrees of freedom, one for the numerator and
one for the denominator.
LO 21. Describe why calculation of the p-value for ANOVA is always “one sided”.
LO 22. Describe why conducting many t-tests for differences between each pair of means
leads to an increased Type I error rate, and how to use a corrected significance level (e.g.,
Bonferroni) to combat inflating this error rate.
2
When And Why
• Between-subjects design (like the independent t-test)
• When we want to compare between-subjects means
we can use a t-test. This test has limitations:
– You can compare only 2 means
– It can be used only with one Independent Variable.
• ANOVA
– Compares several means.
– Can be used when you have manipulated more than one
Independent Variable.
3
Why Not Use Lots of t-Tests?
• If we want to compare several means why don’t we compare
pairs of means with t-tests?
– Can’t look at several independent variables.
– Inflates the Type I error rate.
– For 3 means, the new Type I error rate is .14 instead of .05!
[Diagram: three group means (1, 2, 3) and the three possible pairwise t-tests: 1 vs 2, 2 vs 3, 1 vs 3]
Familywise Error = 1 – (0.95)^n; with n = 3 tests, 1 – (0.95)^3 = .14
Image from Andy Field’s Textbook Slides “Discovering Statistics Using SPSS Statistics”
4
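As a quick check of the .14 figure above, here is a minimal Python sketch (the calculation, not the course software, is the point; any Python 3 will do):

    alpha = 0.05       # per-test Type I error rate
    n_tests = 3        # three pairwise t-tests for three means
    # Chance of at least one false positive across the three tests
    familywise_error = 1 - (1 - alpha) ** n_tests
    print(round(familywise_error, 2))   # 0.14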
How many t-tests would you need to
do for all pairwise comparisons?
Number of pairwise comparisons = k(k – 1) / 2
Where k = # of means
The above formula can
be used to calculate the
# of pairwise
comparisons you’d
need to do for a certain
number of means (k)
The more means you
compare, the more
pairwise comparisons
there are!
http://onlinestatbook.com/2/tests_of_means/pairwise.html
5
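A small Python sketch of the pairwise-comparison formula (the function name n_pairwise is just an illustrative choice):

    def n_pairwise(k):
        # Number of unique pairwise comparisons among k means
        return k * (k - 1) // 2

    for k in (3, 4, 5, 10):
        print(k, n_pairwise(k))   # 3, 6, 10, 45 comparisons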
With lots of tests, Type I error gets crazy!
http://www.claviusweb.net/pitfalls/
So, as you do more tests, the probability of finding at least one false positive increases quite a bit!
This is a graph of Familywise Error = 1 – (0.95)^n
6
What Does ANOVA Tell us?
• Null Hypothesis:
– ANOVA tests the null hypothesis that the means are the same.
– It does this with ONE TEST, thus circumventing the inflated Type I error
problem we just discussed.
• ANOVA is an Omnibus test
– It tests for an overall difference between groups.
– It tells us that the group means are different.
– It doesn’t tell us exactly which means differ.
• Follow-up tests can examine nuanced differences between groups
– Contrasts, post-hoc tests
– These sometimes need corrections for Type I error inflation
7
Theory of ANOVA
• We calculate how much variability there is in the data overall
– Total Sum of squares (SST).
• We then calculate how much of this variability can be explained
by the model we fit to the data
– How much variability is due to the experimental manipulation / group
membership, Between-Groups Sum of Squares (SSG)...
• … and how much cannot be explained
– How much variability is due to individual differences in performance,
Sum of Squared Error (SSE).
Pictures on upcoming slides taken from:
http://www.ripplestat.com/ripplestat/concepttutors.html
8
Theory of ANOVA: Variance Partitioning
[Diagram: the Total Sum of Squares (total variance in the data) is partitioned into the Between-Groups Sum of Squares (variance explained by the model) and the Sum of Squared Error (unexplained variance)]
SST = SSG + SSE
If the experiment is successful, then the model will explain
more variance than it leaves unexplained, i.e.,
SSG will be greater than SSE
9
Plot the Data
Three groups, 15 participants, ANOVA will compare the three means
Each dot is a separate participant
10
Calculate the Total Sum of Squares
The grand mean is 205.47. We get the sum of squared deviations from the grand mean
This is the Total Sum of Squares (SST)
11
Observed (Xi)   Grand Mean (x-bar)   Deviation (Xi – x-bar)   Deviation² (Xi – x-bar)²
254 205.47 48.53 2355.48
228 205.47 22.53 507.75
256 205.47 50.53 2553.62
211 205.47 5.53 30.62
267 205.47 61.53 3786.35
243 205.47 37.53 1408.75
249 205.47 43.53 1895.15
243 205.47 37.53 1408.75
213 205.47 7.53 56.75
239 205.47 33.53 1124.48
114 205.47 -91.47 8366.15
130 205.47 -75.47 5695.22
122 205.47 -83.47 6966.68
167 205.47 -38.47 1479.68
146 205.47 -59.47 3536.28
SST 41171.73
A step-by-step
calculation of
the total sum of
squares (SST)
12
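The same SST calculation as a short Python sketch, using the 15 scores from the table above:

    scores = [254, 228, 256, 211, 267, 243, 249, 243, 213, 239,
              114, 130, 122, 167, 146]
    grand_mean = sum(scores) / len(scores)            # 205.47 (rounded)
    sst = sum((x - grand_mean) ** 2 for x in scores)  # squared deviations from the grand mean
    print(round(grand_mean, 2), round(sst, 2))        # 205.47, 41171.73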
Calculate the Sum of Squared Error
The three group means are: 243.20, 237.40, 135.80.
We get the sum of squared deviations from the three group means (SSE)
13
Observed (Xi)   Group Mean (x-bar)   Deviation (Xi – x-bar)   Deviation² (Xi – x-bar)²
254 243.20 10.80 116.64
228 243.20 -15.20 231.04
256 243.20 12.80 163.84
211 243.20 -32.20 1036.84
267 243.20 23.80 566.44
243 237.40 5.60 31.36
249 237.40 11.60 134.56
243 237.40 5.60 31.36
213 237.40 -24.40 595.36
239 237.40 1.60 2.56
114 135.80 -21.80 475.24
130 135.80 -5.80 33.64
122 135.80 -13.80 190.44
167 135.80 31.20 973.44
146 135.80 10.20 104.04
SSE 4686.80
A step-by-step
calculation of the
sum of squared
error (SSE)
Note, this is
sometimes
referred to as the
“Within” sum of
squares (SSW)
Or … Sum of
squared residuals
(SSR)
In this course, I’ll
use SSE, like your
book.
14
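The same SSE calculation in Python, grouping the 15 scores by condition (the variable names g1–g3 are just placeholders):

    g1 = [254, 228, 256, 211, 267]   # mean 243.20
    g2 = [243, 249, 243, 213, 239]   # mean 237.40
    g3 = [114, 130, 122, 167, 146]   # mean 135.80

    def ss_within(group):
        # Squared deviations of each score from its own group mean
        m = sum(group) / len(group)
        return sum((x - m) ** 2 for x in group)

    sse = sum(ss_within(g) for g in (g1, g2, g3))
    print(round(sse, 2))   # 4686.8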
Calculate the Between Groups Sum of Squares
The three group means are: 243.20, 237.40, 135.80. The grand mean is 205.47
We get the SSG by looking at the difference between the group means and grand mean
15
Grand Mean   Group Mean   Deviation   Deviation²
205.47 243.20 -37.73 1423.80
205.47 243.20 -37.73 1423.80
205.47 243.20 -37.73 1423.80
205.47 243.20 -37.73 1423.80
205.47 243.20 -37.73 1423.80
205.47 237.40 -31.93 1019.74
205.47 237.40 -31.93 1019.74
205.47 237.40 -31.93 1019.74
205.47 237.40 -31.93 1019.74
205.47 237.40 -31.93 1019.74
205.47 135.80 69.67 4853.44
205.47 135.80 69.67 4853.44
205.47 135.80 69.67 4853.44
205.47 135.80 69.67 4853.44
205.47 135.80 69.67 4853.44
SSG 36484.93
A step-by-step calculation of the between-groups sum of squares (SSG)
Sometimes, people use
the acronym SSB to
mean the same thing.
Yet others use “Sum of
squares model” or SSM
to mean the same thing.
In general, this would be
a more generic term.
I am going to use SSG to
be consistent with your
book
16
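Continuing the sketch, the between-groups sum of squares and a check that the partition SST = SSG + SSE holds:

    grand_mean = 3082 / 15                 # 205.47, the grand mean from slide 11
    group_means = [243.20, 237.40, 135.80]
    n_per_group = 5
    # Each group mean's squared deviation from the grand mean, counted once per participant
    ssg = sum(n_per_group * (m - grand_mean) ** 2 for m in group_means)
    print(round(ssg, 2))                   # 36484.93
    print(round(ssg + 4686.80, 2))         # 41171.73, i.e., SSG + SSE = SST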
Practice Questions
1. If the SST = 1000, and the SSE = 800, what is
the value for SSG?
2. In an ANOVA with 3 groups (A, B & C), the
three group means are: (A) 20 (B) 35 (C) 10.
The Grand Mean is 21.67. Joe is a participant
in this study who was in group A, and scored
23. What would Joe’s deviance score be
when calculating SST?
17
Practice Answers
1. If the SST = 1000, and the SSE = 800, what is the value for SSG?
SST = SSG + SSE
SST – SSE = SSG
1000 - 800 = SSG
SSG = 200
2. In an ANOVA with 3 groups (A, B & C), the three group means are: (A) 20
(B) 35 (C) 10. The Grand Mean is 21.67. Joe is a participant in this study who
was in group A, and scored 23. What would Joe’s deviance score be when
calculating SST?
(Xi – grand mean)
23 – 21.67
1.33
18
Filling in those Sums of Squares into
an ANOVA table
Sums of Squares calculated as in the previous slides
These are a stepping stone to getting “F” which is closer to what we are interested in
*In statistics output, this might read “Residuals” which is the same thing
Source SS df MS F p
Group 36484.93
Error* 4686.80
Total 41171.73
19
Model Degrees of Freedom
• How many values did we use to calculate SSG?
– We used the 3 means.
dfM = k – 1 = 3 – 1 = 2
20
Ok, but why?
Mean Group 1 243.20
Mean Group 2 237.40
Mean Group 3 135.80
Grand Mean 205.47
Mean Group 1 243.20
Mean Group 2 237.40
Mean Group 3
Grand Mean 205.47
Mean Group 1
Mean Group 2
Mean Group 3
Grand Mean 205.47
You start knowing
the grand mean
The means for G1 & G2
could be ANY numbers
(positive or negative)
However once you know
the other 3 numbers,
group 3 can only be 1
number (in this case,
135.80)
Thus the last number is not “free” to be anything
It’s like being the last person picked for a team – there is no choice for the last pick!
Thus, in its simplest form, degrees of freedom = k – 1
Where k = # of groups
See also: “Why divide by (n-1).pdf” on Brightspace
21
Residual Degrees of Freedom
• How many values did we use to calculate SSE?
– We used the 5 scores in each group (for each group’s SS).
Full formula (to explain why it is –3):
dfR = dfgroup1 + dfgroup2 + dfgroup3 = (n1 – 1) + (n2 – 1) + (n3 – 1) = (5 – 1) + (5 – 1) + (5 – 1) = 12
Simplified formula:
dfR = n – k = 15 – 3 = 12
n = # of observations
k = # of groups
22
Adding in Degrees of Freedom
Source SS df MS F p
Group 36484.93 2
Error 4686.80 12
Total 41171.73 14
df total is just 2+12 = 14
23
Concept Review
What happens to the sum of squares as the
sample size increases?
Why might this be a problem?
Hint: Remember back to the concept of
“variance” from lecture 2.
24
Adding Mean Square Errors
The sum of squares increases as the df increases, so sums of squares are not directly comparable
Thus, we will divide the SS by df to get the “Mean Squares”
It is basically an estimate of “variance”
MSG = SSG / dfG
MSE = SSE / dfE
Source SS df MS F p
Group 36484.93 2 18242.47
Error 4686.80 12 390.57
Total 41171.73 14 2940.84
25
Calculating the F-test
F = MSG / MSE
Compare to critical F table for a p-value
Source SS df MS F p
Group 36484.93 2 18242.47 46.71
Error 4686.80 12 390.57
Total 41171.73 14 2940.84
26
The F Distribution
Things to note about F
• The F-values are all non-negative
• The distribution is not symmetrical like z
• Under the null hypothesis, F is expected to be about 1, not zero
• There are many different F distributions,
one for each pair of degrees of freedom.
• Let’s take a look at different distributions:
2501.psychology.dal.ca/dist_calc
https://people.richland.edu/james/lecture/m170/ch13-f.html
27
Calculating the p-value
So, we reject the null hypothesis (that all of the group means are equal)
In other words, the group means are not all equal to each other!
243.20, 237.40, 135.80
Source SS df MS F p
Group 36484.93 2 18242.47 46.71 < .001
Error 4686.80 12 390.57
Total 41171.73 14 2940.84
28
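Putting slides 25–28 together, here is a hedged Python sketch that goes from the sums of squares to F and its p-value (assumes the scipy library; stats.f.sf gives the upper-tail area of the F distribution, which is why the ANOVA p-value is “one sided”):

    from scipy import stats

    ssg, sse = 36484.93, 4686.80
    df_g, df_e = 2, 12

    msg = ssg / df_g                           # 18242.47
    mse = sse / df_e                           # 390.57
    f_stat = msg / mse                         # 46.71
    p_value = stats.f.sf(f_stat, df_g, df_e)   # upper-tail probability only
    print(round(f_stat, 2), p_value)           # 46.71, far below .001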
Practice
Source SS df MS F p
Group 91.467 2 45.733 ? ?
Error 276.40 27 10.237
Total 367.867 29 12.69
Given this data: (a) What is the value of F? (b) What is the value of p? (c) how many
groups were in this experiment?
You can use 2501.psychology.dal.ca/dist_calc for p
29
Practice: Answers
Source SS df MS F p
Group 91.467 2 45.733 4.467 .021
Error 276.40 27 10.237
Total 367.867 29 12.69
There are 3 groups, because dfG = k -1
Also, this is a significant F-test … we reject the null (i.e., that the means are the same)
30
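If you prefer software to the distribution calculator, a one-line check of this p-value (again assuming scipy):

    from scipy import stats
    print(round(stats.f.sf(45.733 / 10.237, 2, 27), 3))   # about 0.021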
ANOVA Assumptions
• Independence of Observations
– Random sampling, participants don’t influence each other.
• Approximately normal
– The distributions in the population for each group are normally
distributed.
– No extreme outliers (i.e., very high or low values)
– This assumption can be relaxed as sample size increases due to the CLT
– When population distribution is unknown, some people check to see if
the residuals (i.e., the deviance scores for SSE) are normally distributed.
• Homogeneity of Variance
– The variance in each of the k groups is assumed to be equal
– Tests known as the “Brown-Forsythe” or “Welch” F-tests are among the many
alternatives when this assumption is violated.
When any of these are not true, the p-values will tend to be inaccurate
31
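A rough Python sketch of how these assumptions might be screened with scipy (these are common choices, not the only ones; with n = 5 per group the tests have little power, so plots remain important):

    from scipy import stats

    g1 = [254, 228, 256, 211, 267]
    g2 = [243, 249, 243, 213, 239]
    g3 = [114, 130, 122, 167, 146]

    # Shapiro-Wilk test of approximate normality within each group
    for g in (g1, g2, g3):
        print(round(stats.shapiro(g).pvalue, 3))

    # Brown-Forsythe (median-centered Levene) test of equal variances
    print(round(stats.levene(g1, g2, g3, center='median').pvalue, 3))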
“Post-Hoc” Tests
• The F-Test tells you that the means are not equal to each
other. However, it does not give more nuanced results.
• In our example:
– 243.20, 237.40, 135.80
• Are all 3 means different from each other? Or is it just that
blue differs from red and green?
• We can’t just run a bunch of t-tests, because of the
familywise error rate … so what do we do?
32
“Post-Hoc” Tests
• Post-hoc tests (for the most part) are just variations on the t-
test formula to control for the familywise error rate.
• The easiest one to implement is the Bonferroni correction.
This just adjusts the critical value for significance.
• So, if your critical α was .05, and you have 3 groups:
– .05/3 = .0167. So the new critical alpha is .0167.
– To reject the null, p-values must be less than .0167 NOT .05
Bonferroni α = α / Number of Tests
Where the number of tests is calculated by: k(k – 1) / 2
33
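As a sketch of Bonferroni-corrected follow-ups in Python (Welch t-tests via scipy on the example data from earlier slides; the group labels are placeholders):

    from itertools import combinations
    from scipy import stats

    groups = {"Group 1": [254, 228, 256, 211, 267],
              "Group 2": [243, 249, 243, 213, 239],
              "Group 3": [114, 130, 122, 167, 146]}

    n_tests = len(groups) * (len(groups) - 1) // 2   # 3 pairwise tests
    alpha_bonf = 0.05 / n_tests                      # .0167

    for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
        result = stats.ttest_ind(a, b, equal_var=False)   # Welch t-test
        print(name_a, "vs", name_b, round(result.pvalue, 4),
              "reject" if result.pvalue < alpha_bonf else "retain")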
There are too many types of post-hoc
test to list!
• Some people have a problem with the Bonferroni correction, because
it is too “conservative” (i.e., it makes it too hard to reject hypotheses,
and may inflate the Type II error rate!).
• There are MANY other variations (I know of ~18). Everyone has their
favourites. Here are some alternatives you might see reported:
– Basic t-tests with no correction (do not trust!)
• Least Significant Difference (LSD test)
– Assumptions met, less conservative than Bonferroni
• Tukey HSD (“Honestly Significant Difference”)
– Better when there are Unequal Variances
• Games-Howell
– Better when there are Unequal Sample Sizes:
• Gabriel’s (small n), Hochberg’s GT2 (large n)
34
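For what it’s worth, recent SciPy versions (1.8+) include a Tukey HSD routine; a minimal sketch on the same example groups (your software setup may differ):

    from scipy import stats

    g1 = [254, 228, 256, 211, 267]
    g2 = [243, 249, 243, 213, 239]
    g3 = [114, 130, 122, 167, 146]

    res = stats.tukey_hsd(g1, g2, g3)   # all pairwise comparisons, familywise-corrected
    print(res)                          # mean differences, confidence intervals, adjusted p-values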
ANOVA Example Output
• Participants (N = 30 students)
• Outcome: # of recalled words (out of 40)
• Predictor: Three Experimental Conditions
– Before (Recall the words in the same order they were recalled, with all the
distractor words at the end)
– Meshed (Recall the words in the same order they were recalled, BUT they
are mixed together with distractor words)
– SFR (Control condition, fully randomized presentation)
Friendly, M., & Franklin, P. (1980). Interactive presentation in multitrial free recall. Memory & Cognition, 8(3), 265-270.
Data obtained from: http://vincentarelbundock.github.io/Rdatasets/
35
Visualizing the Data
Combination side-by-side box plot & dot plot
What does this seem to suggest about the assumptions and hypothesis test?
36
Visualizing the Data
This is the same data in a density plot
It’s a close relative of the histogram; it differs because it has been smoothed out
It is generally more aesthetically pleasing, but works better with lots of data
We’ve actually seen this before with the z, t, and F distribution plots!
37
The F-Test
In software, sometimes the
output doesn’t have all the
parts. This is missing the
“Total” row, but that’s ok.
You would report this as:
F(2, 27) = 4.34, p = .023
We reject the null, and
conclude that the means are
different!
38
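In Python, the omnibus test itself would be one call to scipy.stats.f_oneway on the three condition vectors (not reproduced here); as a sanity check, the reported p-value can also be recovered from the printed F and df:

    from scipy import stats

    # With the raw data: stats.f_oneway(before, meshed, sfr)
    # Checking the reported result from the F and df alone:
    print(round(stats.f.sf(4.34, 2, 27), 3))   # about 0.023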
Before (M = 36.6) vs. Meshed (M = 36.6)
t(14.24) = 0.00, p = 1.0
No difference!
Before (M = 36.6) vs. SFR (M = 30.3)
t(16.45) = 2.20, p = .043
Sig difference! (p < .05)
Not significant with the Bonferroni correction (would need p < .0167)
Meshed (M = 36.6) vs. SFR (M = 30.3)
t(11.98) = 2.51, p = .028
Sig difference (p < .05)
Not significant with the Bonferroni correction (would need p < .0167)
39
Conclusions
• The F-test shows that the 3 means are not equal
– But, be cautious: The homogeneity of variance assumption probably did
not hold in this dataset. Control had a lot more variance.
• Looking at the pairwise tests, the meshed and before conditions
differed from control, but not from each other.
– With this small sample, you can see why some people criticize the
Bonferroni method! It is because sometimes, you’ll get a significant F, but
all non-significant t-tests, which is a bit nonsensical.
– If we used an alternative algorithm, like “Tukey’s HSD”, the pattern I
describe here would hold.
• Thus, recalling things in the same order you remembered them seems
to confer a memory advantage in this data.
40
Practice
• A researcher wants to conduct an experiment with 5
conditions.
• A) How many pairwise comparisons would be needed to
compare every condition to all the others?
• B) What would the critical value be for α using the Bonferroni
method? (assuming they are using .05 as their criterion)
41
Practice: Answers
• A researcher wants to conduct an experiment with 5
conditions.
• A) How many pairwise comparisons would be needed to
compare every condition to all the others?
– k(k-1) / 2
– 5(5-1) / 2 = 20/2 = 10
• B) What would the critical value be for α using the Bonferroni
method? (assuming they are using .05 as their criterion)
– .05 / 10 = .005
42
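The same arithmetic as a two-line Python check, reusing the pairwise-count formula from earlier:

    k = 5
    n_tests = k * (k - 1) // 2        # 10 pairwise comparisons
    print(n_tests, 0.05 / n_tests)    # 10, 0.005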