Mathematically Elegant Answers
to Questions No One is Asking
Uri Simonsohn
The overarching concern motivating this talk
• Reality check
• Stat folks: sorry, we have mere supporting roles
• Our research has no intrinsic value
• Extrinsic value: help researchers answer their questions
• As a JDMer I worry
• "Do we study things we find interesting, but aren't useful?"
• As a methodologist I worry
• "Do we study things we find interesting, but aren't useful?"
• But it's worse
• Most MBA students can decide whether 'embodied cognition'* is silly
• Most researchers can't decide whether 'Random Effects' are silly
• It's on us to be more transparent about what a method actually does
• Stop taking the math literally
• Start taking researchers seriously
web: http://urisohn.com | Blog: http://datacolada.org
I think of it as a transparency issue
• Important that other methodologists can check our work
• Also important: researchers can evaluate if our work is useful
• Need to transparently (non-technically) explain actual trade-offs
• Not philosophical platitudes (likely to be misinterpreted)
How do researchers study things?
How do they choose study designs?
(meta-analytical mean; random effects; Bayes factors)
Taking math literally | Taking researchers seriously
Drawn at random | Carefully curated, actively non-random
From defined populations | From undefined/nonexistent populations (generally)
With known distributions | No population → no distribution (if they exist, each researcher has their own)
Goal: estimate population mean effect | Goal: local test of this effect; qualitative generalization based on thinking
Outline
My Claim: Researchers don't want the answers provided by these tools
1. Mixed models (Platonic generalizability)
2. Meta-analysis (Overall means or subgroup means)
3. Bayes Factors (testing some average hypothesis)
Non-Random Effects:
Designing & Analyzing Experiments with
Multiple Stimuli (in The Real World)
Uri Simonsohn
ESADE, Barcelona
Andres Montealegre
Cornell (PhD Student)
Ioannis Evangelidis
ESADE, Barcelona
Psychology’s unique experimental challenge
• Hard & applied sciences
• What’s the impact of this vaccine? Randomize vaccine → Got Covid?
• What’s the impact of defaults? Randomize default → % organ donors?
• Psychology
• What’s the impact of disgust on moral judgments? → Moral judgment: "Is this ok?" (1-7)
Psychology experiments produce mere correlations
(this seems simultaneously obvious and earth shattering)
• We randomly assign stimuli to participants
• We do not randomly assign attributes to stimuli
• Stimuli are confounded
⇒ psychology experiments are confounded
• Example 1: Rubenstein et al (1971)
• Homophonic words: slower recognition
• Participants randomly shown words, e.g. Pray & Pest
• Pray NOT randomly assigned to have homophone
• Reaction time to Pray vs Pest is confounded
Psychology experiments produce mere correlations
(this seems simultaneously obvious and earth shattering)
Example 2: Emotion induction papers
[Diagram: random assignment to the Trainspotting toilet scene (disgust, arousal, study perceived as objectionable) vs. The Champ dead-father scene (no disgust, unfairness, nuclear family reminder) → judged morality of incestuous sex]
Endowment effect
Mixed-model consensus
• Concern is external validity
• Generalize beyond chosen stimuli
• Recommendations:
• Many stimuli
• Use mixed models
• Says nothing on:
• How to select stimuli (beyond "choose many, at random")
• How to learn from stimuli variation
[Clark 1971] >2,900 citations
[Baayen, Davidson, & Bates, 2008] >8,400 citations
[Barr, Levy, Scheepers, & Tily, 2013] >8,100 citations
[Judd, Westfall, & Kenny, 2012] >1,100 citations
Skipping:
Our paper proposes "Match-and-Mix 1.0"
6 steps to choosing (a few) stimuli
For this talk:
Let's focus on the statistical analysis of multi-stimuli experiments
Analyzing Studies with Many Stimuli
Example: Endowment-effect
TOOLS \ CHALLENGES | Dependence | Variation across stimuli | Generalizability
t-test | aggregate at subject level | cancels out or can’t do this | researcher does it (critical thinking)
regression | cluster SE at subject level | stimuli fixed effects | researcher does it (critical thinking)
mixed model | subject random effect | stimuli random intercept | Platonic generalizability (stimuli random slopes)
[Figure values: $75, $14, $6]
Platonic Generalizability
1. Assume a population of all possible stimuli exists
• All goods that exist
• All goods one could imagine
→ Now average them
• People. 50:50 Women:Men
• Endowment effect:
• x% Mugs
• y% Obama dinners
• z% refurbished iPhone 11
• "The" effect we estimate: weighted mean
2. Assume stimuli were chosen at random from it
3. Assume researcher wants to generalize / estimate (1)
(1) exists in theory only → We call it platonic generalizability.
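To make the "weighted mean" concrete, here is a minimal sketch of how "the" effect moves with the assumed population mix. The per-good effects and both sets of weights are hypothetical numbers picked for illustration; none of them come from the talk or the paper.

```python
# Hypothetical endowment effects per good (illustrative numbers only).
effects = {"mug": 2.0, "Obama dinner": 0.5, "refurbished iPhone 11": 1.0}


def weighted_mean_effect(effects, weights):
    """Weighted mean of per-stimulus effects; weights act as population shares."""
    total = sum(weights.values())
    return sum(effects[name] * weights[name] for name in effects) / total


# Two equally arbitrary guesses about the x%, y%, z% mix of "all goods".
mix_a = {"mug": 0.6, "Obama dinner": 0.1, "refurbished iPhone 11": 0.3}
mix_b = {"mug": 0.1, "Obama dinner": 0.6, "refurbished iPhone 11": 0.3}

effect_a = weighted_mean_effect(effects, mix_a)
effect_b = weighted_mean_effect(effects, mix_b)
print(f"mix A: {effect_a:.2f}, mix B: {effect_b:.2f}")
```

The same three stimuli yield a different "population" effect under each mix, and nothing in the data says which mix is the right one.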
If platonic generalizability were free, we might buy it.
But it is very expensive.
Next: simulations for statistical power
1) Participants see n out of n stimuli
2) Participants see 2 out of n stimuli
Case 1. Subjects see n out of n
Takeaways:
- Nothing beats t-test
- Platonic mixed model has real power costs (but power recovers as the number of stimuli increases)
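The flavor of the Case 1 comparison can be sketched with a small simulation. This is not the authors' simulation code: the effect sizes, sample size, and noise levels are hypothetical, and the "treat stimuli as random" analysis is approximated by a one-sample t-test on the k stimulus means (a crude stand-in for a mixed model with stimulus random slopes, which is not fitted here). Every stimulus has a true positive effect, so every rejection is a true positive.

```python
import random
from statistics import mean, stdev


def one_sample_t(values):
    """t statistic for H0: mean == 0."""
    return mean(values) / (stdev(values) / len(values) ** 0.5)


def simulate_power(stimulus_effects, n_subjects=30, n_sims=500, seed=1,
                   t_crit_subject=2.045,   # two-sided 5%, df = 29 (n_subjects - 1)
                   t_crit_stimulus=2.776): # two-sided 5%, df = 4 (5 stimuli - 1)
    """Share of simulations in which each analysis rejects H0.

    Every subject sees every stimulus (n out of n). The critical values
    are hard-coded for the default of 30 subjects and 5 stimuli.
    """
    rng = random.Random(seed)
    hits_subject = hits_stimulus = 0
    for _ in range(n_sims):
        scores = []  # scores[i][s]: effect score of subject i on stimulus s
        for _ in range(n_subjects):
            subject_shift = rng.gauss(0, 1)  # subject-level dependence
            scores.append([d + subject_shift + rng.gauss(0, 1)
                           for d in stimulus_effects])
        subject_means = [mean(row) for row in scores]         # aggregate by subject
        stimulus_means = [mean(col) for col in zip(*scores)]  # aggregate by stimulus
        hits_subject += abs(one_sample_t(subject_means)) > t_crit_subject
        hits_stimulus += abs(one_sample_t(stimulus_means)) > t_crit_stimulus
    return hits_subject / n_sims, hits_stimulus / n_sims


# Hypothetical true effects: all positive, but varying across stimuli.
power_subject, power_stimulus = simulate_power([0.1, 0.2, 0.5, 1.0, 1.2])
print(f"t-test on subject aggregates: {power_subject:.2f}")
print(f"t-test on stimulus means:     {power_stimulus:.2f}")
```

Under these assumptions the by-stimulus test should reject noticeably less often, because variation across a handful of stimuli is treated as sampling error. The gap should narrow as more stimuli are added, which would require updating the hard-coded critical value.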
Case 2. Subjects see 2 out of n
Takeaway:
- Controlling for stimuli increases power when participants see k of n
- Mixed model still expensive
Same pattern
Mixed model advocates know about power
• But they don’t care
• They worry t-tests have too many false-positives
• We sure care about false-positives
• But not about those
• We think they are true-positives.
• This can get philosophical…
…let's make it super concrete.
Next. Let’s contrast those two perspectives in a figure.
[Figure: distribution of effects across stimuli. x-axis: effect of stimulus on dependent variable (effect of disgust stimulus on immorality of cousin-sex); y-axis: proportion of stimuli]
Stimuli S1-S6 (labels and descriptions as extracted): watching bestiality video; read story of organs thief who BBQs them; imagine man clipping nails in metro; watch video of man clipping nails in metro; watch scene from “Trainspotting”; hold bucket of vomit for 3 minutes
Platonic generalizability: Is the mean of all possible stimuli 0?
Construct validity: Do we generally get the effect when expected?
Again, "the" population mean does not exist:
Men:Women 50:50
Nail-clipping men vs Trainspotting videos x%:y%?
S4 and S5 are true-positive effects
• Our interest as researchers should guide the tools we use
• Not vice versa
• We thus propose a tool to assess if you generally get an effect when you expect it: ‘Stimuli Plots’
Stimuli Plots
• Compute effect for each matched-pair of stimuli in control condition
• Assess if effect is obtained in general
• Assess if variation identifies
• Possible confounds
• Interesting moderators
• Ideas for the next study
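A rough sketch of the computation behind such a plot, assuming "effect" means the treatment-minus-control difference in means for each matched pair of stimuli. The data, the pair labels, and the text rendering are all hypothetical; this is not the authors' implementation.

```python
from statistics import mean

# Hypothetical ratings for four matched pairs of stimuli (made-up numbers).
data = {
    "pair 1": {"treatment": [5, 6, 7], "control": [4, 5, 6]},
    "pair 2": {"treatment": [4, 5, 6], "control": [4, 5, 6]},
    "pair 3": {"treatment": [6, 7, 8], "control": [3, 4, 5]},
    "pair 4": {"treatment": [3, 4, 5], "control": [4, 5, 6]},
}


def stimulus_effects(data):
    """Effect per matched pair (treatment mean minus control mean), sorted ascending."""
    effects = {label: mean(cells["treatment"]) - mean(cells["control"])
               for label, cells in data.items()}
    return dict(sorted(effects.items(), key=lambda item: item[1]))


def share_in_expected_direction(effects):
    """Share of pairs showing a positive effect (the expected direction here)."""
    return sum(effect > 0 for effect in effects.values()) / len(effects)


effects = stimulus_effects(data)
for label, effect in effects.items():
    bar = "#" * round(abs(effect) * 5)  # crude text bar, 5 characters per unit
    print(f"{label:<8} {effect:+.2f} {bar}")
print(f"pairs with the expected sign: {share_in_expected_direction(effects):.0%}")
```

Sorting by effect size puts outlying pairs at the ends of the display, which is where a possible confound or moderator would be easiest to spot.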
Next: stimuli plots for three published papers
Paper 1. Kupfer et al (2020)
[Figure panels: Means by Stimuli; Effects by Stimuli. p = .0046]
Paper 2. Salerno & Slepian (2022)
[Figure annotation: No effect]
Paper 3. Rottman & Young (2019)
Two points from our perspective:
1) Stimuli not matched across purity/harm
2) Within harm: deer-hunting is an outlier
Generalizability
• Contrast information provided by t-test & stimuli-level data with mixed-model results
Outline
My Claim: Researchers don't want the answers provided by these tools
1. Mixed models (Platonic generalizability)
2. Meta-analysis (Overall means or subgroup means)
3. Bayes Factors (average hypothesis)
Also makes sense if taking math literally
1. Population of effects exists
2. Researchers sample at random
3. Estimand: overall mean
Why meaningless?
1) No quality control (skip here)
2) Combining incommensurate results
Example #1 of Incommensurate Findings
PNAS Nudge Meta-analysis
http://datacolada.org/105
Estimate #1
Effect size: d = -.12
Estimate #2
Effect size: d = 1.18
meta-analysis
• Our estimate of 'the' effect of reminders:
(-.12 + 1.18) / 2 = .53, i.e., d = .53
Yeah. That's what we wanted to know
Example #2 of Incommensurate Findings
Econometrica Nudge Meta-analysis
http://datacolada.org/106
+ 51%
+ 7%
+ 4%
The average environment nudge: ~21%
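The arithmetic behind both examples can be checked in a few lines. The inputs are the numbers shown on the slides; the "distance to the closest study" summary is an added illustration, not something the talk reports.

```python
from statistics import mean

reminder_estimates = [-0.12, 1.18]  # Cohen's d, Example #1 (reminders)
environment_nudges = [51, 7, 4]     # effects in %, Example #2 (environment nudges)

avg_d = mean(reminder_estimates)
avg_pct = mean(environment_nudges)

# How far is the average from the study that sits closest to it?
gap_d = min(abs(d - avg_d) for d in reminder_estimates)
gap_pct = min(abs(p - avg_pct) for p in environment_nudges)

print(f"average d = {avg_d:.2f}; closest study is {gap_d:.2f} away")
print(f"average nudge = {avg_pct:.0f}%; closest study is {gap_pct:.0f} points away")
```

In both cases the average lands far from every individual result, so it describes none of the studies it summarizes.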
That average only makes sense if we take the math literally.
• There is no population of effects
(What % of nudges involve website defaults vs researchers stopping by?)
• Researchers do not run studies at random
• Readers do not want to know the average effect
Outline
My Claim: Researchers don't want the answers provided by these tools
1. Mixed models (Platonic generalizability)
2. Meta-analysis (Overall means or subgroup means)
3. Bayes Factors (the average hypothesis)
Data Colada [78]
Likelihoods of observing d = .5
• Let's do that for every possible hypothesis
• Not just t-shirt sizes
Uri's claims
1) Confidently: many researchers would like this chart and would find that it speaks to their question.
2) Semi-confidently: but they could probably be persuaded that confidence intervals actually have the info they want.
3) Most confidently: nobody wants the average blue number, especially not weighted by an assumed N(0, .71), i.e., the Bayes Factor.
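The "average weighted by N(0, .71)" can be written out directly. The sketch below makes several assumptions that are not in the slides: the observed d is treated as normally distributed around the true effect with a hypothetical standard error of 0.2, the .71 is read as the standard deviation of the prior, and the average is computed on a simple grid. The function names are illustrative.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def likelihood(d_obs, true_d, se):
    """Likelihood of observing d_obs if the true effect is true_d (normal approximation)."""
    return normal_pdf(d_obs, true_d, se)


def bayes_factor_10(d_obs, se, prior_sd=0.71, step=0.001, half_width=6.0):
    """Prior-weighted average likelihood across all hypotheses, divided by the likelihood under d = 0."""
    n_steps = int(round(2 * half_width / step))
    average = 0.0
    for i in range(n_steps + 1):
        true_d = -half_width + i * step
        weight = normal_pdf(true_d, 0.0, prior_sd) * step  # prior mass near true_d
        average += weight * likelihood(d_obs, true_d, se)
    return average / likelihood(d_obs, 0.0, se)


def bayes_factor_10_closed_form(d_obs, se, prior_sd=0.71):
    """Same quantity in closed form: under a normal prior, d_obs ~ N(0, se^2 + prior_sd^2)."""
    marginal_sd = math.sqrt(se ** 2 + prior_sd ** 2)
    return normal_pdf(d_obs, 0.0, marginal_sd) / normal_pdf(d_obs, 0.0, se)


bf_grid = bayes_factor_10(d_obs=0.5, se=0.2)
bf_exact = bayes_factor_10_closed_form(d_obs=0.5, se=0.2)
print(f"grid average: {bf_grid:.3f}, closed form: {bf_exact:.3f}")
```

Written this way, the numerator is visibly an average over every hypothesized effect size, including ones no researcher is entertaining, with weights fixed by the assumed prior.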
Bayes Factors
• Taking math literally
• Assume there is a population of effect sizes
• Assume it is centered at 0 and symmetric
• Assume researchers draw studies at random
• Assume they wish to know if any particular study is more consistent with:
A) that family of all possible effects (including 0), or
B) the null of d = 0.
What researcher would read that and say "that's exactly what I want" ?
Discussions
Math literally
• Ha ha, that's not "evidence"
• This or that paradox
• Don't you want to have a principled guide for inference?
Researchers seriously
• Does your research question involve an average of hypotheses with these particular weights?
Shortcomings to my argument
• I am equating my own take with taking researchers seriously
• It is possible to make common-sense arguments against many ideas
• That's OK. We can have those arguments.
• The meta-point:
→ We need methods arguments that real researchers can play jury to

Mathematically Elegant Answers to Research Questions No One is Asking (meta-analysis, random effects models, and Bayes factors)
