Improving your project through
pre-registration
Dorothy V. M. Bishop
Professor of Developmental Neuropsychology
University of Oxford
@deevybee
The Reproducibility Crisis
“There is increasing concern about the
reliability of biomedical research, with recent
articles suggesting that up to 85% of
research funding is wasted.”
Bustin, S. A. (2015). The reproducibility of biomedical research: Sleepers awake! Biomolecular Detection and Quantification.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi: 10.1371/journal.pmed.0020124
Four key factors leading to poor reproducibility:
• Publication bias
• Low power
• P-hacking
• HARKing
Thought experiment:
You have submitted a paper to Current Biology
evaluating effect of computer games on dyslexia
How likely is your paper to be accepted if you report:
• 20 participants; beneficial effect of intervention, p < .05
• 20 participants; group difference is non-significant
• 200 participants; group difference is non-significant
• http://deevybee.blogspot.co.uk/2013/03/high-impact-journals-where.html
Ample evidence that many journals – especially ‘high impact’ journals – prioritise newsworthiness over methodological quality and are reluctant to publish null results:
= PUBLICATION BIAS
Publication bias
Timeline: 1956 De Groot; 1975 Greenwald, prejudice against the null; 1979 Rosenthal, the “file drawer” problem
“As it is functioning in at least some areas of behavioral science research, the research-publication system may be regarded as a device for systematically generating and propagating anecdotal information.” (Greenwald, 1975)
Low power
Timeline: 1987 Newcombe: “Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant.”
POWER problem: journals are typically willing to publish a significant finding with a very small sample, even if they would not think of doing so for a null result.
P-hacking and HARKing
Historical timeline: in 1956 De Groot noted the failure to distinguish between hypothesis-testing and hypothesis-generating (exploratory) research -> misuse of statistical tests
De Groot, A. D. (1956). The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers et al., 2014]. Acta Psychologica, 148, 188-194. doi: 10.1016/j.actpsy.2014.02.001
Here is a correlation matrix from a study that measured
various perceptual skills in relation to reading ability in
children. How should you report/interpret results?
N = 20 subjects
Created from script on: https://osf.io/skz3j/
* p < .05, ** p < .01
Key question: Did researcher specifically predict this
association?
• Probability that a specific prespecified pair of variables will be correlated at p
< .05 level when null is true = .05.
• Probability that at least one of 21 correlations will meet p < .05 when null is
true:
= 1 - .95^21 ≈ .66 (i.e. one minus the probability that NONE is significant)
• Bonferroni-corrected significance level = .05/21 = .002
• With N = 20 subjects, to reach .002 significance level, r = .61
If you didn’t predict this specific association and you report
uncorrected p-values, this is P-hacking
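The arithmetic above can be checked in a few lines (a sketch; the 21 pairwise correlations come from the 7 variables in the matrix):

```python
# Familywise false-positive risk for a correlation matrix of 7 variables
k = 7 * 6 // 2          # 21 distinct pairwise correlations
alpha = 0.05

# Chance that at least one of the 21 tests is "significant" under the null
familywise = 1 - (1 - alpha) ** k

# Bonferroni-corrected per-test threshold
bonferroni = alpha / k

print(f"familywise risk = {familywise:.2f}")    # ~0.66
print(f"Bonferroni alpha = {bonferroni:.4f}")   # ~0.0024
```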
P-hacking -> huge risk of false positives
You run a study investigating how a drug, X, affects
anxiety. You plot the results by age, and see this:
No significant effect of X on anxiety overall
[Figure: “Treatment effect by age” – scatterplot of symptom improvement (-1 to 1) against age (16-60 yr)]
But you notice that there is a treatment effect for
those aged over 36
How should you analyse/report this result?
• We tested whether X affects anxiety
• We tested whether X affects anxiety in people aged over 36 years
• We tested whether age affects the impact of X on anxiety
How should you analyse/report this result?
• We tested whether X affects anxiety – TRUE
• We tested whether X affects anxiety in people aged over 36 years – UNTRUE, and most would agree unacceptable
• We tested whether age affects the impact of X on anxiety – UNTRUE, but many would think acceptable, given the results
Improve your study with pre-registration
Close link between p-hacking and HARKing
You are HARKing if you had no prior predictions but, on seeing the results, you write up the paper as if you had planned to look at the effect of age on the drug effect.
This kind of thing is endemic in psychology.
• It is OK to say that this association was observed in exploratory analysis, and that it
suggests a new hypothesis that needs to be tested in a new sample.
• It is NOT OK to pretend that you predicted the association if you didn’t.
• And it is REALLY REALLY NOT OK to report only the data that support your new hypothesis
(e.g. dropping those aged below 36 from the analysis)
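A small simulation makes the inflation concrete. This is an illustrative sketch, not the study's analysis: the age cut-offs, sample size, and the rough t threshold of 2 are all made up for the example:

```python
import math
import random
import statistics

def significant(sample, crit=2.0):
    # Rough one-sample t test against zero (crit ~ 2 approximates p < .05)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return abs(m / se) > crit

def hunting_rate(n=60, reps=2000, seed=7):
    # Null world: the drug has no effect at any age
    random.seed(seed)
    hits = 0
    for _ in range(reps):
        ages = [random.uniform(16, 60) for _ in range(n)]
        improvement = [random.gauss(0, 1) for _ in range(n)]
        # Test the overall effect, then hunt through post-hoc age cut-offs
        candidates = [improvement] + [
            [y for a, y in zip(ages, improvement) if a > cut]
            for cut in (24, 30, 36, 42, 48)
        ]
        if any(significant(g) for g in candidates if len(g) > 5):
            hits += 1
    return hits / reps
```

Because the hunted subgroups add extra chances to cross the threshold, the rate of “discoveries” in this null world runs well above the nominal 5%.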
HARKING
Capitalises on chance and produces huge risk of
false positives
Widespread in many fields – and even explicitly
encouraged by some influential people
Which Article Should You Write?
There are two possible articles you can write: (a) the article you planned to
write when you designed your study or (b) the article that makes the most sense
now that you have seen the results. They are rarely the same, and the correct
answer is (b).
On data analysis: “Examine them from every angle. Analyze the sexes separately.
Make up new composite indexes. If a datum suggests a new hypothesis, try to
find additional evidence for it elsewhere in the data. If you see dim traces of
interesting patterns, try to reorganize the data to bring them into bolder relief. If
there are participants you don’t like, or trials, observers, or interviewers who
gave you anomalous results, drop them (temporarily). Go on a fishing expedition
for something – anything – interesting.”
Writing the Empirical Journal Article
Daryl J. Bem
The Compleat Academic: A Practical Guide for the Beginning Social
Scientist, 2nd Edition. Washington, DC: American Psychological
Association, 2004.
“This book provides invaluable guidance that will help new academics plan,
play, and ultimately win the academic career game.”
Explicitly advises
HARKing!
HARKING seems innocuous but it fills the
literature with dross
One solution to protect against HARKing: suggested wording in the write-up to keep researchers honest:
‘We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.’
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2012). A 21 Word Solution. SPSP Dialogue, 26(2), Fall 2012.
A more comprehensive solution: Pre-registration
Classic publishing:
Plan study → Do study → Submit to journal → Respond to reviewer comments → Acceptance! → Publish paper

Registered reports:
Plan study → Submit to journal → Respond to reviewer comments → Acceptance! → Do study → Publish paper
Registered Reports solve the issues of:
• Publication bias: publication decision made on the
basis of quality of introduction/methods, before
results are known
• Low power: researchers required to have 90%
power
• P-hacking: analysis plan specified up-front
• HARKing: hypotheses specified up-front.
Unanticipated findings can be reported but clearly
demarcated as ‘exploratory’
Registered reports
But problematic for student projects because:
• Time-scale means delay before data collected
• Power requirements often hard to meet
Plan study → Submit to journal → Respond to reviewer comments → Acceptance! → Do study → Publish paper
An alternative, “pre-registration lite”: the Open Science Framework
Pre-registration on OSF
• Similar to regular publication route
• No guarantee of publication
• But reviewers are generally positive about preregistered papers, because preregistration prevents p-hacking and HARKing
• And there are benefits to having a well-worked-out plan – less stress when it comes to making sense of the data
Plan study → Submit plan to OSF → Check by OSF statistician → Do study → Submit to journal → Respond to reviewer comments → Acceptance! → Publish paper
Advantages of pre-registration through OSF
• Free methodological/statistical consulting:
http://cos.io/stats_consulting
• Have a date-stamped record of what was planned
which can be referred to when publishing
• Encourages open data and scripts
• Work has greater impact
• Errors get detected
• Improves reproducibility
• Possibility of winning $1000!
https://osf.io/jea94/
Even if you decide not to formally pre-register, this template is likely to be useful for planning your project.
I won’t go through the template here, but I will note points that crop up when trying to complete it.
What is your research question?
Points to consider
• What type of question? Yes/No, Why, How?
• How could it be improved? Is it too general, or too precise?
Hypotheses
• Can you formulate specific predictions?
• E.g. X will be bigger than Y
• X will be bigger than zero
• X will vary systematically with Y
• Are predictions directional?
Study information
What is your study type?
Are you systematically manipulating a variable to see its effect?
= Experiment
Or are you looking at relationships between variables that occur
naturally?
= Observational Study
Rationale for proposed sample size
OSF notes that this could include a power analysis, but also constraints such as time, money, or the availability of a particular group
Power analysis
Can use GPower
http://www.gpower.hhu.de/en.html
For more complex designs, simulate data with given effect size and repeatedly
run through analysis to see how often you can detect the effect of interest
(see Lazic, 2016)
N.B. Power analysis is not the only way to rationalise sample size, but it is the most common
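The simulation approach can be sketched in a few lines (the effect size, sample sizes, and the ~1.96 threshold are illustrative assumptions, not requirements from the lecture):

```python
import math
import random

def welch_t(a, b):
    # Welch t statistic for two independent samples
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def power_sim(n_per_group, effect_size, reps=2000, crit=1.96, seed=1):
    # Proportion of simulated experiments in which |t| exceeds crit
    # (crit ~ 1.96 approximates two-tailed p < .05 for largish samples)
    random.seed(seed)
    hits = 0
    for _ in range(reps):
        control = [random.gauss(0, 1) for _ in range(n_per_group)]
        treated = [random.gauss(effect_size, 1) for _ in range(n_per_group)]
        if abs(welch_t(treated, control)) > crit:
            hits += 1
    return hits / reps
```

For a medium effect (d = 0.5), 20 per group gives power well under 50%, while around 100 per group is needed for roughly 90%.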
Making sense of null results:
1. Positive controls
When we talk of ‘control’ we usually mean ‘negative’ controls – where we compare the effect of X with a situation that is identical except for X – to isolate the specific effect.
A positive control aims to rule out trivial explanations for a null result – e.g. the manipulation didn’t work, or participants didn’t attend.
Examples:
You are comparing autistic and typically developing children on a false memory task:
– does the paradigm yield a false memory effect in the typically-developing group?
You are interested in whether there is suppression of the mu frequency band in EEG (regarded as ‘mirror neuron’ activity) when participants view hand gestures:
– is there mu suppression when participants perform hand gestures?
Making sense of null results:
2. Bayes factors
“Bayes factors provide a coherent approach to determining whether non-significant results support a null hypothesis over a theory, or whether the data are just insensitive.”
Designing an analysis script using simulated data
• Aim: create a process that is completely transparent and increase the
likelihood that your analysis can be replicated
• Analysis script reads in raw data as input, does all analysis and generates
Tables/Figures/Summary statistics as output
• Avoids common scenario where researcher cannot remember how results
were generated
• Scripts need to be extensively ’commented’ to explain what the code is doing
• Increasingly, researchers are using scripts written in R, Matlab or Python, but you can also create scripts easily in SPSS.
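A minimal sketch of such a script in Python (the ‘group’/‘score’ CSV layout and the analyse function are hypothetical, chosen just to illustrate the raw-data-in, results-out pattern):

```python
import csv
import math
import statistics

def analyse(path):
    """Read raw data, return per-group summary stats and a Welch t statistic.

    Expects a CSV with 'group' and 'score' columns (hypothetical layout).
    """
    groups = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            groups.setdefault(row["group"], []).append(float(row["score"]))

    # Per-group summary statistics (the "Tables/Summary statistics" output)
    summary = {g: {"n": len(v),
                   "mean": statistics.mean(v),
                   "sd": statistics.stdev(v)}
               for g, v in sorted(groups.items())}

    # Welch t statistic comparing the two groups
    s1, s2 = summary.values()
    t = (s1["mean"] - s2["mean"]) / math.sqrt(
        s1["sd"] ** 2 / s1["n"] + s2["sd"] ** 2 / s2["n"])
    return summary, t
```

Because everything flows from the raw file, rerunning the script on a corrected or extended dataset regenerates every number in the write-up.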
Building a script in SPSS
We start by simulating some data (see last
week’s lecture)
Can be either random data (null hypothesis)
or data with an effect of interest added
Here are random numbers (Score) allocated
to groups 1 and 2
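The same simulation step can be sketched outside SPSS, e.g. in Python (the group sizes and the 0.5 SD effect are illustrative choices):

```python
import random
random.seed(1)

# Null-hypothesis data: 40 random scores alternately allocated to groups 1 and 2
null_data = [(1 + i % 2, random.gauss(0, 1)) for i in range(40)]

# Data with an effect of interest added: shift group 2 scores up by 0.5 SD
effect_data = [(g, s + (0.5 if g == 2 else 0.0)) for g, s in null_data]
```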
Building a script in SPSS
Now do whatever analysis steps you
usually do via the GUI
Here we have selected options for a t-test
But instead of hitting OK, we hit Paste
Building a script in SPSS
Hitting Paste opens a new
window with a script in it
This shows the code behind
the analysis done in the GUI
You can run the analysis by
pressing the big green
button
You can add to the script,
add comments, and save it.
Then you can rerun it any
time with a new dataset
Benefit: you have a
complete record of the
analysis
Free Coursera lectures
https://www.coursera.org/learn/statistical-inferences
Further suggestions for study
https://www.slideshare.net/deevybishop/bishop-reproducibility-references-nov2016
https://www.slideshare.net/deevybishop/what-is-the-reproducibility-crisis-in-science-
and-what-can-we-do-about-it
Experimental Design for
Laboratory Biologists :
Maximising Information and
Improving Reproducibility
Stanley E Lazic
The Seven Deadly Sins of Psychology
Chris Chambers
http://christophergandrud.github.io/RepResR-RStudio/
“Treat all of your research files as if someone who has not worked on
the project will, in the future, try to understand them.”
Going further: author/reviewer guidelines for Registered Reports in Nature Human Behaviour
https://images.nature.com/original/nature-cms/uploads/ckeditor/attachments/4127/RegisteredReportsGuidelines_NatureHumanBehaviour.pdf
These require either 95% power or a Bayesian equivalent.
For Bayes, they recommend https://osf.io/d4dcu/
Improve your study with pre-registration

More Related Content

PPTX
Alcohol dependent syndrome
PPTX
Alcohol: The Drink, The Addiction and the Solution
PPTX
Mental disorders ppt
PPTX
Management of epilepsy
PPTX
Organisation of data
PPTX
Diabetes Mellitus
PPTX
Hypertension
PPTX
Republic Act No. 11313 Safe Spaces Act (Bawal Bastos Law).pptx
Alcohol dependent syndrome
Alcohol: The Drink, The Addiction and the Solution
Mental disorders ppt
Management of epilepsy
Organisation of data
Diabetes Mellitus
Hypertension
Republic Act No. 11313 Safe Spaces Act (Bawal Bastos Law).pptx

What's hot (20)

PPTX
Beyond Fact Checking — Modelling Information Change in Scientific Communication
PDF
Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, A...
PPTX
Knowledge Graph Introduction
PDF
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
PDF
Engineering data quality
PPTX
Scientific Research Paper Writing
PPTX
Writing and publishing a research article
PDF
Best Practices for Building and Deploying Data Pipelines in Apache Spark
PDF
How Graphs Enhance AI
PPTX
Critical appraisal of randomized clinical trials
PDF
Introduction to Knowledge Graphs: Data Summit 2020
PPT
Writing good scientific_papers_v2
PDF
Spark DataFrames and ML Pipelines
PDF
Data Curation and Debugging for Data Centric AI
PPTX
Academic writing and and publishing
PPTX
Review of Related LiteratureLessons.pptx
PPTX
How to write a biomedical research paper
PPTX
Critical appraisal of published article
PPT
How to-write-a-research-paper
Beyond Fact Checking — Modelling Information Change in Scientific Communication
Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, A...
Knowledge Graph Introduction
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
Engineering data quality
Scientific Research Paper Writing
Writing and publishing a research article
Best Practices for Building and Deploying Data Pipelines in Apache Spark
How Graphs Enhance AI
Critical appraisal of randomized clinical trials
Introduction to Knowledge Graphs: Data Summit 2020
Writing good scientific_papers_v2
Spark DataFrames and ML Pipelines
Data Curation and Debugging for Data Centric AI
Academic writing and and publishing
Review of Related LiteratureLessons.pptx
How to write a biomedical research paper
Critical appraisal of published article
How to-write-a-research-paper
Ad

Similar to Improve your study with pre-registration (20)

PDF
تحليل البيانات وتفسير المعطيات
PDF
محاضرة د.سعاد
PPT
Replication Crisis in Psycholhhhyogy.ppt
PPTX
sience 2.0 : an illustration of good research practices in a real study
PPTX
Chap3_ business reaserch
PPT
What is research
PDF
What is the reproducibility crisis in science and what can we do about it?
PPT
A well-defined research question is the cornerstone of any successful investi...
PPTX
Lecture 2 Steps in Concluding Research.pptx
PPT
SAMPLE_AND_OTHER.ppt
PPTX
Does preregistration improve the interpretability and credibility of research...
PPT
What is research
PPTX
Research misconduct an introduction
PPTX
Hypothesis testing
PDF
Research Method for Business chapter 5
PDF
Scientific method
PDF
RM UNIT 1.pdf
PPTX
research methodology , Hypothesis.pptx
PDF
Roche_open_science_NIOO_KNAW_workshop_NL
تحليل البيانات وتفسير المعطيات
محاضرة د.سعاد
Replication Crisis in Psycholhhhyogy.ppt
sience 2.0 : an illustration of good research practices in a real study
Chap3_ business reaserch
What is research
What is the reproducibility crisis in science and what can we do about it?
A well-defined research question is the cornerstone of any successful investi...
Lecture 2 Steps in Concluding Research.pptx
SAMPLE_AND_OTHER.ppt
Does preregistration improve the interpretability and credibility of research...
What is research
Research misconduct an introduction
Hypothesis testing
Research Method for Business chapter 5
Scientific method
RM UNIT 1.pdf
research methodology , Hypothesis.pptx
Roche_open_science_NIOO_KNAW_workshop_NL
Ad

More from Dorothy Bishop (20)

PPTX
Exercise/fish oil intervention for dyslexia
PPTX
Open Research Practices in the Age of a Papermill Pandemic
PDF
Language-impaired preschoolers: A follow-up into adolescence.
PPTX
Journal club summary: Open Science save lives
PPTX
Short talk on 2 cognitive biases and reproducibility
PPTX
Otitis media with effusion: an illustration of ascertainment bias
PPTX
Insights from psychology on lack of reproducibility
PPTX
What are metrics good for? Reflections on REF and TEF
PPTX
Biomarkers for psychological phenotypes?
PPTX
Data simulation basics
PPTX
Simulating data to gain insights into power and p-hacking
PPTX
Talk on reproducibility in EEG research
PPTX
What is Developmental Language Disorder
PPTX
Developmental language disorder and auditory processing disorder: 
Same or di...
PDF
Fallibility in science: Responsible ways to handle mistakes
PPTX
Introduction to simulating data to improve your research
PPTX
Southampton: lecture on TEF
DOCX
Reading list: What’s wrong with our universities
DOCX
IJLCD Winter Lecture 2016-7 : References
DOCX
What's wrong with our Universities, and will the Teaching Excellence Framewor...
Exercise/fish oil intervention for dyslexia
Open Research Practices in the Age of a Papermill Pandemic
Language-impaired preschoolers: A follow-up into adolescence.
Journal club summary: Open Science save lives
Short talk on 2 cognitive biases and reproducibility
Otitis media with effusion: an illustration of ascertainment bias
Insights from psychology on lack of reproducibility
What are metrics good for? Reflections on REF and TEF
Biomarkers for psychological phenotypes?
Data simulation basics
Simulating data to gain insights into power and p-hacking
Talk on reproducibility in EEG research
What is Developmental Language Disorder
Developmental language disorder and auditory processing disorder: 
Same or di...
Fallibility in science: Responsible ways to handle mistakes
Introduction to simulating data to improve your research
Southampton: lecture on TEF
Reading list: What’s wrong with our universities
IJLCD Winter Lecture 2016-7 : References
What's wrong with our Universities, and will the Teaching Excellence Framewor...

Recently uploaded (20)

PDF
From Molecular Interactions to Solubility in Deep Eutectic Solvents: Explorin...
PDF
CHEM - GOC general organic chemistry.ppt
PPTX
Basic principles of chromatography techniques
PDF
CuO Nps photocatalysts 15156456551564161
PPTX
Preformulation.pptx Preformulation studies-Including all parameter
PDF
No dilute core produced in simulations of giant impacts on to Jupiter
PDF
Glycolysis by Rishikanta Usham, Dhanamanjuri University
PPTX
ELISA(Enzyme linked immunosorbent assay)
PDF
final prehhhejjehehhehehehebesentation.pdf
PPT
Chapter 6 Introductory course Biology Camp
PPTX
LIPID & AMINO ACID METABOLISM UNIT-III, B PHARM II SEMESTER
PDF
Sustainable Biology- Scopes, Principles of sustainiability, Sustainable Resou...
PDF
Telemedicine: Transforming Healthcare Delivery in Remote Areas (www.kiu.ac.ug)
PDF
Exploring PCR Techniques and Applications
PDF
cell_morphology_organelles_Physiology_ 07_02_2019.pdf
PPTX
Introduction to Immunology (Unit-1).pptx
PPTX
CELL DIVISION Biology meiosis and mitosis
PPTX
Targeted drug delivery system 1_44299_BP704T_03-12-2024.pptx
PPTX
Cells and Organs of the Immune System (Unit-2) - Majesh Sir.pptx
PPTX
Thyroid disorders presentation for MBBS.pptx
From Molecular Interactions to Solubility in Deep Eutectic Solvents: Explorin...
CHEM - GOC general organic chemistry.ppt
Basic principles of chromatography techniques
CuO Nps photocatalysts 15156456551564161
Preformulation.pptx Preformulation studies-Including all parameter
No dilute core produced in simulations of giant impacts on to Jupiter
Glycolysis by Rishikanta Usham, Dhanamanjuri University
ELISA(Enzyme linked immunosorbent assay)
final prehhhejjehehhehehehebesentation.pdf
Chapter 6 Introductory course Biology Camp
LIPID & AMINO ACID METABOLISM UNIT-III, B PHARM II SEMESTER
Sustainable Biology- Scopes, Principles of sustainiability, Sustainable Resou...
Telemedicine: Transforming Healthcare Delivery in Remote Areas (www.kiu.ac.ug)
Exploring PCR Techniques and Applications
cell_morphology_organelles_Physiology_ 07_02_2019.pdf
Introduction to Immunology (Unit-1).pptx
CELL DIVISION Biology meiosis and mitosis
Targeted drug delivery system 1_44299_BP704T_03-12-2024.pptx
Cells and Organs of the Immune System (Unit-2) - Majesh Sir.pptx
Thyroid disorders presentation for MBBS.pptx

Improve your study with pre-registration

  • 1. Improving your project through pre-registration Dorothy V. M. Bishop Professor of Developmental Neuropsychology University of Oxford @deevybee
  • 2. The Reproducibility Crisis “There is increasing concern about the reliability of biomedical research, with recent articles suggesting that up to 85% of research funding is wasted.” Bustin, S. A. (2015). The reproducibility of biomedical research: Sleepers awake! Biomolecular Detection and Quantification 2005. PLoS Medicine, 2(8), e124. doi: 10.1371/journal.pmed.0020124
  • 3. Four key factors leading to poor reproducibility P-hackingPublication bias Low power HARKing
  • 4. Thought experiment: You have submitted a paper to Current Biology evaluating effect of computer games on dyslexia How likely is your paper to be accepted if you report: • 20 participants; beneficial effect of intervention, p < .05 • 20 participants; group difference is non-significant • 200 participants; group difference is non-significant
  • 5. Thought experiment: You have submitted a paper to Current Biology evaluating effect of computer games on dyslexia How likely is your paper to be accepted if you report: • 20 participants; beneficial effect of intervention, p < .05 • 20 participants; group difference is non-significant • 200 participants; group difference is non-significant • http://deevybee.blogspot.co.uk/2013/03/high-impact-journals-where.html Ample evidence that many journals – especially ‘high impact’ journals prioritise newsworthiness over methodological quality: Reluctant to publish null results = PUBLICATION BIAS
  • 6. 1956 De Groot 1975 Greenwald The “file drawer” problem 1979 Rosenthal Prejudice against the null “As it is functioning in at least some areas of behavioral science research, the research- publication system may be regarded as a device for systematically generating and propagating anecdotal information.” Publication bias
  • 7. 1956 De Groot 1975 Greenwald 1987 Newcombe “Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant.” 1979 Rosenthal Low power POWER problem: Journals typically willing to publish a significant finding with a very small sample, even if they would not think of doing so for a null result
  • 9. 1956 De Groot Failure to distinguish between hypothesis-testing and hypothesis-generating (exploratory) research -> misuse of statistical tests Historical timeline: concerns about reproducibility Acta Psychologica 148, 2014, pp. 188-194 The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers et al. //doi.org/10.1016/j.actpsy.2014.02.001
  • 10. Here is a correlation matrix from a study that measured various perceptual skills in relation to reading ability in children. How should you report/interpret results? N = 20 subjects Created from script on: https://osf.io/skz3j/ * p < .05, ** p < .01
  • 11. Key question: Did researcher specifically predict this association?
  • 12. • Probability that a specific prespecified pair of variables will be correlated at p < .05 level when null is true = .05. • Probability that at least one of 21 correlations will meet p < .05 when null is true: = 1 - .95^21 = .64 (i.e. one minus prob. that NONE is significant) • Bonferroni-corrected significance level = .05/21 = .002 • With N = 20 subjects, to reach .002 significance level, r = .61 If you didn’t predict this specific association and you report uncorrected p-values, this is P-hacking
  • 13. P-hacking -> huge risk of false positives
  • 14. You run a study investigating how a drug, X, affects anxiety. You plot the results by age, and see this: No significant effect of X on anxiety overall -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 16 20 24 28 32 36 40 44 48 52 56 60 Symptomimprovement Age (yr) Treatment effect by age
  • 15. But you notice that there is a treatment effect for those aged over 36 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 16 20 24 28 32 36 40 44 48 52 56 60 Symptomimprovement Age (yr) Treatment effect by age
  • 16. How should you analyse/report this result? • We tested whether X affects anxiety • We tested whether X affects anxiety in people aged over 36 years • We tested whether age affects the impact of X on anxiety -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 16 20 24 28 32 36 40 44 48 52 56 60 Symptomimprovement Age (yr) Treatment effect by age
  • 17. How should you analyse/report this result? • We tested whether X affects anxiety -TRUE • We tested whether X affects anxiety in people aged over 36 years – UNTRUE, and most would agree unacceptable • We tested whether age affects the impact of X on anxiety – UNTRUE, but many would think acceptable, given results -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 16 20 24 28 32 36 40 44 48 52 56 60 Symptomimprovement Age (yr) Treatment effect by age
  • 19. Close link between p-hacking and HARKing You are HARKing if you have no prior predictions, but on seeing results you write up paper as if you planned to look at effect of age on drug effect. This kind of thing is endemic in psychology. • It is OK to say that this association was observed in exploratory analysis, and that it suggests a new hypothesis that needs to be tested in a new sample. • It is NOT OK to pretend that you predicted the association if you didn’t. • And it is REALLY REALLY NOT OK to report only the data that support your new hypothesis (e.g. dropping those aged below 36 from the analysis) -1 -0.5 0 0.5 1 16 20 24 28 32 36 40 44 48 52 56 60 Symptom improvement Age (yr) Treatment effect by age
  • 20. HARKING Capitalises on chance and produces huge risk of false positives Widespread in many fields – and even explicitly encouraged by some influential people
  • 21. Which Article Should You Write? There are two possible articles you can write: (a) the article you planned to write when you designed your study or (b) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (b). re Data Analysis: Examine them from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something— anything —interesting. Writing the Empirical Journal Article Daryl J. Bem The Compleat Academic: A Practical Guide for the Beginning Social Scientist, 2nd Edition. Washington, DC: American Psychological Association, 2004. “This book provides invaluable guidance that will help new academics plan, play, and ultimately win the academic career game.” Explicitly advises HARKing!
  • 22. HARKING seems innocuous but it fills the literature with dross
  • 23. ‘We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.’ Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2012). A 21 Word Solution. SPSP Dialogue,26, 2, Fall 2012 issue. One solution to protect against HARKing: Suggested wording in write-up to keep researchers honest
  • 24. A more comprehensive solution: Pre-registration
  • 25. Plan study Do study Submit to journal Respond to reviewer comments Publish paper Acceptance! Classic publishing
  • 26. Plan study Do study Submit to journal Respond to reviewer comments Publish paper Plan study Submit to journal Respond to reviewer comments Do study Publish paper Acceptance! Classic publishing Registered reports Acceptance!
  • 27. Registered reports solves issues of: • Publication bias: publication decision made on the basis of quality of introduction/methods, before results are known • Low power: researchers required to have 90% power • P-hacking: analysis plan specified up-front • HARKing: hypotheses specified up-front. Unanticipated findings can be reported but clearly demarcated as ‘exploratory’
  • 28. Registered reports But problematic for student projects because: • Time-scale means delay before data collected • Power requirements often hard to meet Plan study Submit to journal Respond to reviewer comments Do study Publish paper Acceptance!
  • 29. An alternative: Preregistration lite: Open Science Framework
  • 30. Pre-registration on OSF • Similar to regular publication route • No guarantee of publication • But reviewers generally positive about preregistered papers because prevents p-hacking or HARKing • And benefits of having well-worked out plan – less stress when it comes to making sense of data Plan study Submit plan to OSF Check by OSF statistician Do study Submit to journal Respond to reviewer comments Publish paper Acceptance!
• 31. Advantages of pre-registration through OSF • Free methodological/statistical consulting: http://cos.io/stats_consulting • A date-stamped record of what was planned, which can be referred to when publishing • Encourages open data and scripts • Work has greater impact • Errors get detected • Improves reproducibility • Possibility of winning $1000!
• 32. https://osf.io/jea94/ Even if you decide not to formally pre-register, this template is likely to be useful for planning your project. I won't go through the template here, but I will note points that crop up when trying to complete it
• 33. Study information. What is your research question? Points to consider: • What type of question is it? Yes/no, why, how? • How could it be improved? Is it too general or too precise? Hypotheses • Can you formulate specific predictions? E.g. X will be bigger than Y; X will be bigger than zero; X will vary systematically with Y • Are your predictions directional?
  • 34. What is your study type? Are you systematically manipulating a variable to see its effect? = Experiment Or are you looking at relationships between variables that occur naturally? = Observational Study
• 35. Sample size. Rationale for proposed sample size: OSF note this could include a power analysis, but also constraints such as time, money, or the availability of a particular group. Power analysis: you can use G*Power http://www.gpower.hhu.de/en.html For more complex designs, simulate data with a given effect size and repeatedly run it through the analysis to see how often you can detect the effect of interest (see Lazic, 2016). N.B. Power analysis is not the only way to rationalise sample size, but it is the most common
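The simulation approach mentioned above (simulate data with a given effect size, run the analysis repeatedly, count how often the effect is detected) can be sketched in a few lines of Python. This is an illustrative sketch, not from the slides: the function name `simulated_power`, the two-group design, and the effect size of 0.5 SD are all assumptions chosen for the example, and a normal approximation to the t distribution is used to keep the code self-contained.

```python
import random
import statistics
from statistics import NormalDist

def two_sample_t(x, y):
    """Pooled two-sample t statistic."""
    nx, ny = len(x), len(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sp2 = ((nx - 1) * statistics.variance(x) +
           (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (mx - my) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

def simulated_power(n_per_group, effect_size, n_sims=2000, alpha=0.05, seed=1):
    """Estimate power: simulate two groups whose means differ by
    `effect_size` SDs, test each simulated dataset, and count how often
    the result is significant. (Normal critical value used as an
    approximation to the t distribution; fine for a planning sketch.)"""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        if abs(two_sample_t(treated, control)) > z_crit:
            hits += 1
    return hits / n_sims

# Power for a medium effect (d = 0.5) with 20 vs 200 per group
print(simulated_power(20, 0.5))   # typically around 0.35: badly underpowered
print(simulated_power(200, 0.5))  # close to 1.0
```

This echoes the thought experiment earlier in the lecture: with 20 participants per group and a medium effect, a null result tells you very little.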
• 36. Making sense of null results: 1. Positive controls. When we talk of 'controls' we usually mean negative controls, where we compare the effect of X with a situation that is identical except for X, to isolate the specific effect. In a positive control, the aim is to rule out trivial explanations for a null result, e.g. the manipulation didn't work, or participants didn't attend. Examples: You are comparing autistic and typically developing children on a false memory task: does the paradigm yield a false memory effect in the typically developing group? You are interested in whether there is suppression of the mu frequency band in EEG (regarded as 'mirror neuron' activity) when participants view hand gestures: is there mu suppression when participants perform hand gestures?
  • 37. Making sense of null results: 2. Bayes factors ”Bayes factors provide a coherent approach to determining whether non- significant results support a null hypothesis over a theory, or whether the data are just insensitive.”
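To give a flavour of the idea on this slide, here is a minimal sketch of a Bayes factor for a two-group comparison, using the BIC approximation (BF01 ≈ exp((BIC_alt − BIC_null)/2)). This is an illustration only: the function name `bf01_bic` and the example data are invented, and a full Bayesian analysis with dedicated software would normally be preferred to this approximation.

```python
import math
import statistics

def bf01_bic(group_a, group_b):
    """Approximate Bayes factor in favour of the null (no group
    difference), via the BIC approximation
    BF01 ≈ exp((BIC_alt − BIC_null) / 2)."""
    data = group_a + group_b
    n = len(data)
    grand = statistics.fmean(data)
    rss_null = sum((x - grand) ** 2 for x in data)  # one-mean model
    ma, mb = statistics.fmean(group_a), statistics.fmean(group_b)
    rss_alt = (sum((x - ma) ** 2 for x in group_a) +
               sum((x - mb) ** 2 for x in group_b))  # two-means model
    bic_null = n * math.log(rss_null / n) + 1 * math.log(n)
    bic_alt = n * math.log(rss_alt / n) + 2 * math.log(n)
    return math.exp((bic_alt - bic_null) / 2)

# Two groups with essentially identical means: a BF01 above about 3
# supports the null, whereas a BF01 near 1 would mean the data are
# simply insensitive (exactly the distinction in the quote above).
a = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.9, 5.1]
b = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1]
print(round(bf01_bic(a, b), 1))  # > 3: some evidence for the null
```

The point of the sketch is the interpretation, not the arithmetic: unlike p > .05, the Bayes factor can positively favour the null hypothesis.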
• 38. Designing an analysis script using simulated data • Aim: create a process that is completely transparent and increase the likelihood that your analysis can be replicated • The analysis script reads in raw data as input, does all the analysis, and generates tables/figures/summary statistics as output • This avoids the common scenario where a researcher cannot remember how results were generated • Scripts need to be extensively 'commented' to explain what the code is doing • Increasingly, researchers are using scripts written in R, Matlab or Python, but you can also create scripts easily in SPSS
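The workflow in these bullets (simulate data, then run a script that takes the raw data file as its only input and produces the results) can be sketched in Python. The file name `raw_data.csv` and the column names are illustrative assumptions, not from the slides; the simulation step stands in for your real data file.

```python
import csv
import random
import statistics

# --- Step 1: simulate raw data (stands in for your real data file) ---
rng = random.Random(7)
with open("raw_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["participant", "group", "score"])
    for i in range(40):
        group = 1 if i < 20 else 2  # random scores allocated to groups 1 and 2
        writer.writerow([i + 1, group, round(rng.gauss(0, 1), 3)])

# --- Step 2: the analysis script proper ---
# It reads the raw data as its only input and prints summary statistics,
# so the whole route from data to results is recorded and re-runnable.
scores = {1: [], 2: []}
with open("raw_data.csv", newline="") as f:
    for row in csv.DictReader(f):
        scores[int(row["group"])].append(float(row["score"]))

for g, vals in sorted(scores.items()):
    print(f"Group {g}: n = {len(vals)}, "
          f"mean = {statistics.fmean(vals):.2f}, SD = {statistics.stdev(vals):.2f}")
```

Because the script contains every step from raw data to output, rerunning it with a new dataset regenerates the results exactly, which is the "complete record of the analysis" benefit described on the SPSS slides that follow.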
  • 39. Building a script in SPSS We start by simulating some data (see last week’s lecture) Can be either random data (null hypothesis) or data with an effect of interest added Here are random numbers (Score) allocated to groups 1 and 2
• 40. Building a script in SPSS. Now do whatever analysis steps you usually do via the GUI. Here we have selected options for a t-test. But instead of hitting OK, we hit Paste
• 41. Building a script in SPSS. Hitting Paste opens a new window with a script in it. This shows the code behind the analysis done in the GUI. You can run the analysis by pressing the big green button. You can add to the script, add comments, and save it. Then you can rerun it any time with a new dataset. Benefit: you have a complete record of the analysis
  • 44. http://christophergandrud.github.io/RepResR-RStudio/ “Treat all of your research files as if someone who has not worked on the project will, in the future, try to understand them.”
• 45. Going further: author/reviewer guidelines for Registered Reports in Nature Human Behaviour. https://images.nature.com/original/nature-cms/uploads/ckeditor/attachments/4127/RegisteredReportsGuidelines_NatureHumanBehaviour.pdf They require either 95% power or a Bayesian equivalent. For Bayes, they recommend https://osf.io/d4dcu/