The phrase "post hoc" is Latin for "after this." Post hoc analysis in research and statistics is the process of examining data after a study is complete to find patterns or information that were not among its original goals. It is usually done when an omnibus statistical test such as Analysis of Variance (ANOVA) gives a significant result but does not indicate which groups differ.
Post hoc tests let researchers pinpoint specific group differences while controlling the rate of Type I errors (false positives). This added layer of analysis makes research findings deeper and clearer and supports more accurate conclusions. Several post hoc approaches are available; they often lead to similar conclusions, but they differ in statistical power and in the assumptions they make, so some are better suited than others to a given study design and data distribution.
Post hoc tests are typically employed when:
- An analysis of variance (ANOVA) shows significant differences between group means.
- Researchers want to identify which specific groups differ from each other.
- Exploratory research needs to discover non-hypothesized patterns.
Although post hoc analysis can be tremendously enlightening, it needs to be interpreted with caution to prevent overfitting.
In ANOVA, post hoc tests are used to compare the means of more than two groups to determine statistically significant pairwise differences. The common strategy is to control the family-wise error rate (FWER), which is the probability of making one or more Type I errors (false positives) in multiple comparisons.
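The omnibus ANOVA comes first, and post hoc tests follow only if it is significant. The snippet below is a minimal sketch of that first step using `scipy.stats.f_oneway`; the three small groups are illustrative values (the same ones used in the Tukey example later in this article), and the 0.05 threshold is an assumed significance level.

```python
from scipy.stats import f_oneway

# Illustrative data: three groups with three observations each
group_a = [23, 21, 22]
group_b = [30, 29, 28]
group_c = [18, 19, 20]

# One-way ANOVA: tests whether all group means are equal
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.6f}")

# A significant result says that some means differ, not which ones
if p_value < 0.05:
    print("Significant: follow up with a post hoc test")
else:
    print("Not significant: post hoc tests are usually not run")
```

A significant F statistic here only justifies the follow-up; the pairwise conclusions come from the post hoc test itself.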
Family-Wise Error Rate
FWER = P(\text{at least one Type I error among all tests}),
where FWER increases as the number of comparisons grows, unless properly controlled.
Post hoc tests adjust the significance level to maintain the FWER below a chosen threshold (e.g., 0.05). Different methods achieve this by modifying the critical value used to evaluate the significance of pairwise comparisons.
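To see how quickly the FWER grows without correction, the sketch below evaluates 1 - (1 - α)^m for m comparisons. This closed form assumes the tests are independent, which is a simplification (pairwise comparisons that share groups are not); the function name `fwer_independent` is illustrative.

```python
def fwer_independent(alpha, m):
    """FWER for m independent tests, each run at level alpha."""
    return 1 - (1 - alpha) ** m

# Uncorrected FWER for an increasing number of comparisons
for m in (1, 3, 10, 20):
    print(f"{m:2d} comparisons -> FWER = {fwer_independent(0.05, m):.4f}")
```

Under this assumption, three comparisons at 0.05 each already give an FWER of about 0.14, and ten give about 0.40.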
Common Post Hoc Tests
Below are some widely used post hoc tests, their mathematical principles, and key use cases:
1. Tukey's Honest Significant Difference (HSD) Test
Tukey's HSD test compares all possible pairs of group means while controlling the FWER.
Formula for Critical Difference:
CD = q \cdot \sqrt{\frac{MSE}{n}},
where:
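- 𝑞 : Critical value of the studentized range distribution for the chosen significance level, the number of groups, and the error degrees of freedom.
- 𝑀𝑆𝐸 : Mean square error from the ANOVA table.
- 𝑛 : Sample size of each group (the formula assumes equal group sizes).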
Tukey's Honest Significant Difference (HSD) Test Implementation in Python
Python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Three groups (A, B, C) with three observations each
data = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'Value': [23, 21, 22, 30, 29, 28, 18, 19, 20]
})

# Perform Tukey's HSD test on every pair of groups, holding the FWER at 0.05
tukey = pairwise_tukeyhsd(endog=data['Value'], groups=data['Group'], alpha=0.05)
print(tukey)
Output
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------
A B 7.0 0.0003 4.4948 9.5052 True
A C -3.0 0.0242 -5.5052 -0.4948 True
B C -10.0 0.0 -12.5052 -7.4948 True
-----------------------------------------------------
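The half-width of each confidence interval in the output above (upper minus meandiff) is the critical difference from the formula. As a rough check, the sketch below recomputes it by hand, assuming SciPy 1.7 or later for `scipy.stats.studentized_range`; the variable names are illustrative.

```python
import math
from scipy.stats import studentized_range

groups = {"A": [23, 21, 22], "B": [30, 29, 28], "C": [18, 19, 20]}
k = len(groups)   # number of groups
n = 3             # observations per group (equal sizes assumed)

# MSE = pooled within-group sum of squares / error degrees of freedom
sse = sum(
    sum((x - sum(vals) / len(vals)) ** 2 for x in vals)
    for vals in groups.values()
)
df_error = k * n - k
mse = sse / df_error

# q: upper 5% point of the studentized range distribution
q_crit = studentized_range.ppf(1 - 0.05, k, df_error)

# Critical difference: CD = q * sqrt(MSE / n)
cd = q_crit * math.sqrt(mse / n)
print(f"MSE = {mse:.3f}, q = {q_crit:.3f}, CD = {cd:.3f}")
```

Any pair whose absolute mean difference exceeds CD is declared significantly different, which matches the `reject` column in the table.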
2. Bonferroni Correction
This method adjusts the significance level by dividing it by the number of comparisons.
Adjusted Significance Level:
\alpha_{\text{adjusted}} = \frac{\alpha}{k},
where 𝛼 is the original significance level (e.g., 0.05) and 𝑘 is the number of comparisons.
Bonferroni Correction for Multiple Comparisons Implementation in Python
Python
# original significance level
alpha = 0.05
# the number of comparisons
num_comparisons = 3
# Compute the Bonferroni-adjusted significance level
alpha_adjusted = alpha / num_comparisons
print("Bonferroni-adjusted significance level:", alpha_adjusted)
Output
Bonferroni-adjusted significance level: 0.016666666666666666
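The adjusted level only matters once it is compared against actual p-values. The following sketch runs an ordinary two-sample t-test for each pair with `scipy.stats.ttest_ind` and applies the Bonferroni threshold; using separate t-tests per pair (rather than a pooled error term) and the sample data are assumptions made for illustration.

```python
from itertools import combinations
from scipy.stats import ttest_ind

groups = {"A": [23, 21, 22], "B": [30, 29, 28], "C": [18, 19, 20]}
alpha = 0.05

# All pairwise comparisons and the Bonferroni-adjusted threshold
pairs = list(combinations(groups, 2))
alpha_adjusted = alpha / len(pairs)

results = {}
for g1, g2 in pairs:
    # Standard two-sample t-test (equal variances assumed by default)
    p = ttest_ind(groups[g1], groups[g2]).pvalue
    results[(g1, g2)] = p < alpha_adjusted
    print(f"{g1} vs {g2}: p = {p:.4f}, reject = {results[(g1, g2)]}")
```

With these nine values, the A vs C pair falls just short of the adjusted threshold even though Tukey's HSD flags it. Part of the reason is that each t-test here uses only the two groups being compared, so it has fewer error degrees of freedom than a test built on the pooled MSE; it also illustrates how conservative the Bonferroni correction can be.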
3. Scheffé's Test
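Scheffé's test is a conservative post hoc method that can be used with unequal sample sizes. It controls the FWER across all possible contrasts among the group means, not only pairwise comparisons, which makes it flexible but less powerful than Tukey's HSD when only pairwise differences are of interest.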
4. Holm-Bonferroni Method
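An improvement over the Bonferroni correction, this sequential procedure sorts the p-values from smallest to largest and compares the i-th smallest to α / (k − i + 1), stopping at the first comparison that is not significant. It controls the FWER while being at least as powerful as the Bonferroni correction.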
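The step-down logic can be sketched in a few lines of plain Python. The helper name `holm_adjust` and the three p-values are hypothetical, chosen only to show a case where Holm rejects more hypotheses than plain Bonferroni.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by (m - rank)
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Keep adjusted p-values non-decreasing along the sorted order
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

p_values = [0.01, 0.04, 0.02]  # hypothetical raw p-values
holm = holm_adjust(p_values)
bonferroni = [min(1.0, p * len(p_values)) for p in p_values]

for p, h, b in zip(p_values, holm, bonferroni):
    print(f"raw = {p:.2f}  holm = {h:.2f}  bonferroni = {b:.2f}")
```

At a 0.05 level, all three Holm-adjusted values stay below the threshold, while Bonferroni keeps only the smallest one. In practice, `statsmodels.stats.multitest.multipletests` with `method='holm'` performs the same adjustment.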
5. Dunnett's Test
Dunnett's test compares each group to a control group, rather than performing pairwise comparisons across all groups. This makes it more efficient when the focus is on comparing groups against a standard or baseline.
Applications of Post Hoc Analysis
1. Clinical Trials:
- Identifying significant differences in treatment effectiveness across various groups.
- Comparing new drugs to existing treatments or placebo groups.
2. Education Research:
- Analyzing the effectiveness of teaching methods or curricula.
- Comparing student performance across different schools or educational programs.
3. Psychology:
- Evaluating the impact of interventions on behavioral or cognitive outcomes.
- Comparing experimental conditions in psychological studies.
Advantages of Post Hoc Analysis
- Enables detailed pairwise comparisons when significant differences are detected in the overall analysis.
- Helps identify patterns or trends not initially hypothesized.
- Controls for Type I errors when multiple comparisons are made.
Limitations of Post Hoc Analysis
- Results may lack generalizability if post hoc tests are performed without prior hypotheses.
- Over-reliance on post hoc analysis can lead to overfitting or spurious findings.
- Interpretation requires caution, especially when the sample size is small or the data do not meet the test's assumptions (for example, normality).