ABC for model choice
1 simulation-based methods in Econometrics
2 Genetics of ABC
3 Approximate Bayesian computation
4 ABC for model choice
5 ABC model choice via random forests
6 ABC estimation via random forests
7 [some] asymptotics of ABC
Bayesian model choice
Several models M1, M2, . . . are considered simultaneously for a
dataset y and the model index M is part of the inference.
Use of a prior distribution π(M = m), plus a prior distribution π_m(θ_m) on the
parameter conditional on the value m of the model index
Goal is to derive the posterior distribution of M, a challenging
computational target when models are complex.
Generic ABC for model choice
Algorithm 4 Likelihood-free model choice sampler (ABC-MC)
for t = 1 to T do
  repeat
    Generate m from the prior π(M = m)
    Generate θ_m from the prior π_m(θ_m)
    Generate z from the model f_m(z|θ_m)
  until ρ{η(z), η(y)} < ε
  Set m^(t) = m and θ^(t) = θ_m
end for
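As a concrete illustration, here is a minimal Python sketch of the sampler above, with a uniform prior over models; the names `priors`, `simulators` and `rho` are illustrative placeholders, and each simulator is assumed to return the summary η(z) directly:

```python
import random

def abc_mc(y_summary, priors, simulators, rho, eps, T):
    """Likelihood-free model choice sampler (ABC-MC sketch).

    priors: list of functions, priors[m]() draws theta from pi_m
    simulators: list of functions, simulators[m](theta) returns eta(z), z ~ f_m(.|theta)
    rho: distance between summary statistics; eps: tolerance
    Returns the accepted (m, theta) pairs, one per iteration.
    """
    M = len(priors)
    out = []
    for _ in range(T):
        while True:
            m = random.randrange(M)       # m ~ pi(M = m), uniform here
            theta = priors[m]()           # theta_m ~ pi_m
            z = simulators[m](theta)      # eta(z) with z ~ f_m(.|theta_m)
            if rho(z, y_summary) < eps:   # accept when within tolerance
                break
        out.append((m, theta))
    return out
```

Each iteration keeps simulating until a draw lands within tolerance ε of the observed summary, so the accepted model frequencies approximate π(M = m|y).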
ABC estimates
Posterior probability π(M = m|y) approximated by the frequency
of acceptances from model m
(1/T) ∑_{t=1}^T I_{m^(t) = m} .
Issues with implementation:
• should tolerances be the same for all models?
• should summary statistics vary across models (incl. their
dimension)?
• should the distance measure ρ vary as well?
Extension to a weighted polychotomous logistic regression estimate
of π(M = m|y), with non-parametric kernel weights
[Cornuet et al., DIYABC, 2009]
The Great ABC controversy
On-going controversy in phylogeographic genetics about the validity
of using ABC for testing
Against: Templeton, 2008,
2009, 2010a, 2010b, 2010c
argues that nested hypotheses
cannot have higher probabilities
than nesting hypotheses (!)
Replies: Fagundes et al., 2008,
Beaumont et al., 2010, Berger et
al., 2010, Csilléry et al., 2010
point out that the criticisms are
addressed at [Bayesian]
model-based inference and have
nothing to do with ABC...
Gibbs random fields
Gibbs distribution
The rv y = (y_1, . . . , y_n) is a Gibbs random field associated with
the graph G if
f(y) = (1/Z) exp{ −∑_{c∈C} V_c(y_c) } ,
where Z is the normalising constant, C is the set of cliques of G
and V_c is an arbitrary function, also called the potential; the
sufficient statistic U(y) = ∑_{c∈C} V_c(y_c) is the energy function
Note: Z is usually unavailable in closed form
Potts model
Potts model
V_c(y) is of the form
V_c(y) = θ S(y) = θ ∑_{l∼i} δ_{y_l = y_i}
where l∼i denotes a neighbourhood structure
In most realistic settings, the summation
Z_θ = ∑_{x∈X} exp{θ^T S(x)}
involves too many terms to be manageable and numerical
approximations cannot always be trusted
[Cucala, Marin, CPR & Titterington, 2009]
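To make the combinatorial blow-up concrete, brute-force evaluation of Z_θ is only possible for a handful of sites, since the sum ranges over q^n configurations. A hypothetical sketch (the graph is passed as an edge list, all names illustrative):

```python
import itertools
import math

def potts_logZ(theta, n_sites, q, neighbours):
    """Brute-force log normalising constant of a q-state Potts model.

    neighbours: list of (i, l) index pairs encoding the l~i relation.
    Enumerates all q**n_sites configurations -- feasible only for tiny n.
    """
    Z = 0.0
    for x in itertools.product(range(q), repeat=n_sites):
        S = sum(1 for (i, l) in neighbours if x[i] == x[l])  # S(x)
        Z += math.exp(theta * S)
    return math.log(Z)
```

On a 10×10 binary grid the sum would already have 2^100 terms, which is why ABC sidesteps computing Z_θ altogether.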
Bayesian Model Choice
Comparing a model with potential S0 taking values in Rp0 versus a
model with potential S1 taking values in Rp1 can be done through
the Bayes factor corresponding to the priors π0 and π1 on each
parameter space
B_{m0/m1}(x) = [ ∫ exp{θ_0^T S_0(x)}/Z_{θ_0,0} π_0(dθ_0) ] / [ ∫ exp{θ_1^T S_1(x)}/Z_{θ_1,1} π_1(dθ_1) ]
Use of Jeffreys’ scale to select most appropriate model
Neighbourhood relations
Choice to be made between M neighbourhood relations
i ∼_m i′ (0 ≤ m ≤ M − 1)
with
S_m(x) = ∑_{i ∼_m i′} I_{x_i = x_{i′}}
driven by the posterior probabilities of the models.
Model index
Formalisation via a model index M that appears as a new
parameter with prior distribution π(M = m) and
π(θ|M = m) = πm(θm)
Computational target:
P(M = m|x) ∝ ∫_{Θ_m} f_m(x|θ_m) π_m(θ_m) dθ_m × π(M = m) ,
Sufficient statistics
By definition, if S(x) is a sufficient statistic for the joint parameters
(M, θ_0, . . . , θ_{M−1}), then
P(M = m|x) = P(M = m|S(x)) .
If each model m has its own sufficient statistic S_m(·), then
S(·) = (S_0(·), . . . , S_{M−1}(·)) is also sufficient.
Sufficient statistics in Gibbs random fields
For Gibbs random fields,
x|M = m ∼ f_m(x|θ_m) = f¹_m(x|S(x)) f²_m(S(x)|θ_m)
                     = (1/n(S(x))) f²_m(S(x)|θ_m)
where
n(S(x)) = card{ x̃ ∈ X : S(x̃) = S(x) }
Note: S(x) is therefore also sufficient for the joint parameters
[Specific to Gibbs random fields!]
ABC model choice algorithm
ABC-MC
• Generate m∗ from the prior π(M = m).
• Generate θ∗_{m∗} from the prior π_{m∗}(·).
• Generate x∗ from the model f_{m∗}(·|θ∗_{m∗}).
• Compute the distance ρ(S(x⁰), S(x∗)).
• Accept (θ∗_{m∗}, m∗) if ρ(S(x⁰), S(x∗)) < ε.
Note: when ε = 0 the algorithm is exact
ABC approximation to the Bayes factor
Frequency ratio:
B̂F_{m0/m1}(x⁰) = [ P̂(M = m0|x⁰) / P̂(M = m1|x⁰) ] × [ π(M = m1) / π(M = m0) ]
               = [ ♯{m^(t) = m0} / ♯{m^(t) = m1} ] × [ π(M = m1) / π(M = m0) ] ,
replaced with
B̂F_{m0/m1}(x⁰) = [ (1 + ♯{m^(t) = m0}) / (1 + ♯{m^(t) = m1}) ] × [ π(M = m1) / π(M = m0) ]
to avoid indeterminacy (also a Bayes estimate).
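The regularised frequency estimate is immediate to code; this sketch (names illustrative) takes the accepted model indices from an ABC-MC run:

```python
def abc_bayes_factor(model_draws, m0, m1, prior_prob):
    """ABC frequency estimate of BF_{m0/m1} with the +1 regularisation.

    model_draws: accepted model indices m^(t) from an ABC-MC run
    prior_prob: dict mapping model index to pi(M = m)
    """
    n0 = sum(1 for m in model_draws if m == m0)
    n1 = sum(1 for m in model_draws if m == m1)
    # +1 in numerator and denominator avoids 0/0 and division by zero
    return (1 + n0) / (1 + n1) * prior_prob[m1] / prior_prob[m0]
```

With no accepted draw at all and equal priors, the estimate defaults to 1, which is the indeterminacy the +1 correction is designed to avoid.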
Toy example
iid Bernoulli model versus two-state first-order Markov chain, i.e.
f_0(x|θ_0) = exp{ θ_0 ∑_{i=1}^n I_{x_i=1} } / {1 + exp(θ_0)}^n ,
versus
f_1(x|θ_1) = (1/2) exp{ θ_1 ∑_{i=2}^n I_{x_i = x_{i−1}} } / {1 + exp(θ_1)}^{n−1} ,
with priors θ_0 ∼ U(−5, 5) and θ_1 ∼ U(0, 6) (inspired by “phase
transition” boundaries).
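Simulating from the two competing models is straightforward; in this sketch (function names illustrative) the success/stay probability exp(θ)/(1 + exp(θ)) follows from the forms of f_0 and f_1 above:

```python
import math
import random

def simulate_bernoulli(theta0, n, rng):
    """iid model f_0: P(x_i = 1) = exp(theta0)/(1 + exp(theta0))."""
    p = math.exp(theta0) / (1 + math.exp(theta0))
    return [1 if rng.random() < p else 0 for _ in range(n)]

def simulate_markov(theta1, n, rng):
    """Two-state chain f_1: x_1 uniform (the 1/2 factor),
    then P(x_i = x_{i-1}) = exp(theta1)/(1 + exp(theta1))."""
    p_stay = math.exp(theta1) / (1 + math.exp(theta1))
    x = [rng.randrange(2)]
    for _ in range(n - 1):
        x.append(x[-1] if rng.random() < p_stay else 1 - x[-1])
    return x
```

The natural summary statistics here are ∑ I{x_i = 1} for f_0 and ∑ I{x_i = x_{i−1}} for f_1, which is exactly the cross-model sufficiency issue discussed below.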
Toy example (2)
[Two scatterplots.] (left) Comparison of the true BF_{m0/m1}(x⁰) with the ABC
approximation B̂F_{m0/m1}(x⁰) (in logs) over 2,000 simulations and 4×10⁶
proposals from the prior. (right) Same when using a tolerance ε corresponding
to the 1% quantile on the distances.
Back to sufficiency
‘Sufficient statistics for individual models are unlikely to
be very informative for the model probability.’
[Scott Sisson, Jan. 31, 2011, X.’Og]
If η_1(x) is a sufficient statistic for model m = 1 and parameter θ_1, and
η_2(x) is a sufficient statistic for model m = 2 and parameter θ_2, then
(η_1(x), η_2(x)) is not always sufficient for (m, θ_m)
Note: potential loss of information at the testing level
Limiting behaviour of B12 (T → ∞)
ABC approximation
B̂_{12}(y) = [ ∑_{t=1}^T I_{m_t=1} I_{ρ{η(z_t),η(y)}≤ε} ] / [ ∑_{t=1}^T I_{m_t=2} I_{ρ{η(z_t),η(y)}≤ε} ] ,
where the (m_t, z_t)’s are simulated from the (joint) prior
As T goes to infinity, limit
B^ε_{12}(y) = [ ∫ I_{ρ{η(z),η(y)}≤ε} π_1(θ_1) f_1(z|θ_1) dz dθ_1 ] / [ ∫ I_{ρ{η(z),η(y)}≤ε} π_2(θ_2) f_2(z|θ_2) dz dθ_2 ]
            = [ ∫ I_{ρ{η,η(y)}≤ε} π_1(θ_1) f^η_1(η|θ_1) dη dθ_1 ] / [ ∫ I_{ρ{η,η(y)}≤ε} π_2(θ_2) f^η_2(η|θ_2) dη dθ_2 ] ,
where f^η_1(η|θ_1) and f^η_2(η|θ_2) are the distributions of η(z)
Limiting behaviour of B12 (ε → 0)
When ε goes to zero,
B^η_{12}(y) = [ ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1 ] / [ ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2 ] ,
Note: a Bayes factor based on the sole observation of η(y)
Limiting behaviour of B12 (under sufficiency)
If η(y) is a sufficient statistic for both models,
f_i(y|θ_i) = g_i(y) f^η_i(η(y)|θ_i)
Thus
B_{12}(y) = [ ∫_{Θ_1} π(θ_1) g_1(y) f^η_1(η(y)|θ_1) dθ_1 ] / [ ∫_{Θ_2} π(θ_2) g_2(y) f^η_2(η(y)|θ_2) dθ_2 ]
          = [ g_1(y) ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1 ] / [ g_2(y) ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2 ]
          = [ g_1(y) / g_2(y) ] B^η_{12}(y) .
[Didelot, Everitt, Johansen & Lawson, 2011]
Note: no discrepancy only under cross-model sufficiency
Poisson/geometric example
Sample
x = (x_1, . . . , x_n)
from either a Poisson P(λ) or from a geometric G(p). Then
S = ∑_{i=1}^n x_i = η(x)
is a sufficient statistic for either model but not simultaneously.
Discrepancy ratio
g_1(x)/g_2(x) = [ S! n^{−S} / ∏_i x_i! ] / [ 1 / C(n+S−1, S) ]
with C(n+S−1, S) the binomial coefficient.
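The discrepancy ratio is best evaluated in logs for numerical stability; the decomposition g_1(x) = S! n^{−S} / ∏ x_i! and g_2(x) = 1/C(n+S−1, S) follows from S being Poisson(nλ) under the first model and negative binomial under the second. A sketch, assuming this reconstruction of the ratio:

```python
import math

def discrepancy_ratio(x):
    """g_1(x)/g_2(x) for the Poisson vs geometric pair, computed in logs."""
    n, S = len(x), sum(x)
    # log g_1(x) = log S! - S log n - sum_i log x_i!
    log_g1 = (math.lgamma(S + 1) - S * math.log(n)
              - sum(math.lgamma(xi + 1) for xi in x))
    # log g_2(x) = -log C(n+S-1, S)
    log_g2 = -(math.lgamma(n + S) - math.lgamma(S + 1) - math.lgamma(n))
    return math.exp(log_g1 - log_g2)
```

For instance x = (1, 1) gives g_1 = 2!·2^{−2} = 1/2 and g_2 = 1/C(3, 2) = 1/3, hence a ratio of 3/2.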
Poisson/geometric discrepancy
Range of B_{12}(x) versus B^η_{12}(x): the values produced have
nothing in common.
Formal recovery
Creating an encompassing exponential family
f(x|θ_1, θ_2, α_1, α_2) ∝ exp{ θ_1^T η_1(x) + θ_2^T η_2(x) + α_1 t_1(x) + α_2 t_2(x) }
leads to a sufficient statistic (η_1(x), η_2(x), t_1(x), t_2(x))
[Didelot, Everitt, Johansen & Lawson, 2011]
In the Poisson/geometric case, if ∏_i x_i! is added to S, there is no
discrepancy
Only applies in genuine sufficiency settings...
Note: inability to evaluate the loss brought by the summary statistics
Meaning of the ABC-Bayes factor
‘This is also why focus on model discrimination typically
(...) proceeds by (...) accepting that the Bayes Factor
that one obtains is only derived from the summary
statistics and may in no way correspond to that of the
full model.’
[Scott Sisson, Jan. 31, 2011, X.’Og]
In the Poisson/geometric case, if E[y_i] = θ_0 > 0,
lim_{n→∞} B^η_{12}(y) = [ (θ_0 + 1)² / θ_0 ] e^{−θ_0}
MA(q) divergence
[Four barplots.] Evolution [against ε] of the ABC Bayes factor, in terms of
frequencies of visits to models MA(1) (left) and MA(2) (right) when ε is equal
to the 10, 1, 0.1, 0.01% quantiles on insufficient autocovariance distances.
Sample of 50 points from an MA(2) model with θ_1 = 0.6, θ_2 = 0.2. True Bayes
factor equal to 17.71.
MA(q) divergence
[Four barplots.] Evolution [against ε] of the ABC Bayes factor, in terms of
frequencies of visits to models MA(1) (left) and MA(2) (right) when ε is equal
to the 10, 1, 0.1, 0.01% quantiles on insufficient autocovariance distances.
Sample of 50 points from an MA(1) model with θ_1 = 0.6. True Bayes factor B21
equal to .004.
Further comments
‘There should be the possibility that for the same model,
but different (non-minimal) [summary] statistics (so
different η’s: η_1 and η_1∗) the ratio of evidences may no
longer be equal to one.’
[Michael Stumpf, Jan. 28, 2011, ’Og]
Using different summary statistics [on different models] may reveal the
loss of information brought by each set, but agreement between them does
not guarantee trustworthy approximations.
A stylised problem
Central question to the validation of ABC for model choice:
When is a Bayes factor based on an insufficient statistic T(y)
consistent?
Note/warning: the conclusion drawn on T(y) through B^T_{12}(y) necessarily
differs from the conclusion drawn on y through B_{12}(y)
[Marin, Pillai, X, & Rousseau, JRSS B, 2013]
A benchmark toy example
Comparison suggested by a referee of the PNAS paper [thanks!]:
[X, Cornuet, Marin, & Pillai, Aug. 2011]
Model M1: y ∼ N(θ_1, 1), opposed to model M2: y ∼ L(θ_2, 1/√2),
the Laplace distribution with mean θ_2 and scale parameter 1/√2
(variance one).
Four possible statistics
1 sample mean ȳ (sufficient for M1 but not M2);
2 sample median med(y) (insufficient);
3 sample variance var(y) (ancillary);
4 median absolute deviation mad(y) = med(|y − med(y)|);
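The four candidate statistics can be computed with the standard library alone; a minimal sketch (the comments restate the sufficiency/ancillarity labels from the list above):

```python
import statistics

def benchmark_summaries(y):
    """The four candidate statistics for the Gauss vs Laplace benchmark."""
    med = statistics.median(y)
    return {
        "mean": statistics.fmean(y),          # sufficient for M1 but not M2
        "median": med,                        # insufficient
        "variance": statistics.pvariance(y),  # ancillary
        "mad": statistics.median(abs(v - med) for v in y),  # median absolute deviation
    }
```

The point of the benchmark is that ABC model choice based on (mean, median, variance) behaves very differently from ABC based on statistics, like mad, that actually discriminate between the two models.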
[Two boxplot panels, labelled Gauss and Laplace, n = 100, comparing the
resulting ABC approximations under two collections of summary statistics.]
Framework
Starting from the observed sample
y = (y_1, . . . , y_n) ,
not necessarily iid, with true distribution y ∼ P^n.
Summary statistics
T(y) = T^n = (T_1(y), T_2(y), · · · , T_d(y)) ∈ R^d
with true distribution T^n ∼ G^n.
Framework
Comparison of
– under M1, y ∼ F_{1,n}(·|θ_1) where θ_1 ∈ Θ_1 ⊂ R^{p_1}
– under M2, y ∼ F_{2,n}(·|θ_2) where θ_2 ∈ Θ_2 ⊂ R^{p_2}
turned into
– under M1, T(y) ∼ G_{1,n}(·|θ_1), and θ_1|T(y) ∼ π_1(·|T^n)
– under M2, T(y) ∼ G_{2,n}(·|θ_2), and θ_2|T(y) ∼ π_2(·|T^n)
Assumptions
A collection of asymptotic “standard” assumptions:
[A1] is a standard central limit theorem under the true model with asymptotic mean µ_0
[A2] controls the large deviations of the estimator T^n from the model mean µ(θ)
[A3] is the standard prior mass condition found in Bayesian asymptotics (d_i effective dimension of the parameter)
[A4] restricts the behaviour of the model density against the true density
[Think CLT!]
Asymptotic marginals
Asymptotically, under [A1]–[A4]
m_i(t) = ∫_{Θ_i} g_i(t|θ_i) π_i(θ_i) dθ_i
is such that
(i) if inf{|µ_i(θ_i) − µ_0|; θ_i ∈ Θ_i} = 0,
C_l v_n^{d−d_i} ≤ m_i(T^n) ≤ C_u v_n^{d−d_i}
and
(ii) if inf{|µ_i(θ_i) − µ_0|; θ_i ∈ Θ_i} > 0,
m_i(T^n) = o_{P^n}[ v_n^{d−τ_i} + v_n^{d−α_i} ] .
Between-model consistency
A consequence of the above is that the asymptotic behaviour of the Bayes
factor is driven by the asymptotic mean value µ(θ) of T^n under both
models. And only by this mean value!
Indeed, if
inf{|µ_0 − µ_2(θ_2)|; θ_2 ∈ Θ_2} = inf{|µ_0 − µ_1(θ_1)|; θ_1 ∈ Θ_1} = 0
then
C_l v_n^{−(d_1−d_2)} ≤ m_1(T^n) / m_2(T^n) ≤ C_u v_n^{−(d_1−d_2)} ,
where C_l, C_u = O_{P^n}(1), irrespective of the true model.
Note: the ratio only depends on the difference d_1 − d_2: no consistency
Else, if
inf{|µ_0 − µ_2(θ_2)|; θ_2 ∈ Θ_2} > inf{|µ_0 − µ_1(θ_1)|; θ_1 ∈ Θ_1} = 0
then
m_1(T^n) / m_2(T^n) ≥ C_u min{ v_n^{−(d_1−α_2)}, v_n^{−(d_1−τ_2)} }
Checking for adequate statistics
Run a practical check of the relevance (or non-relevance) of T^n: test the
null hypothesis that both models are compatible with the statistic T^n,
H0 : inf{|µ_2(θ_2) − µ_0|; θ_2 ∈ Θ_2} = 0
against
H1 : inf{|µ_2(θ_2) − µ_0|; θ_2 ∈ Θ_2} > 0
The testing procedure provides estimates of the mean of T^n under each
model and checks for equality.
Checking in practice
• Under each model M_i, generate an ABC sample θ_{i,l}, l = 1, · · · , L
• For each θ_{i,l}, generate y_{i,l} ∼ F_{i,n}(·|θ_{i,l}), derive T^n(y_{i,l}) and
compute
µ̂_i = (1/L) ∑_{l=1}^L T^n(y_{i,l}) , i = 1, 2 .
• Conditionally on T^n(y),
√L { µ̂_i − E^π[µ_i(θ_i)|T^n(y)] } ⇝ N(0, V_i) ,
• Test for a common mean
H0 : µ̂_1 ∼ N(µ_0, V_1) , µ̂_2 ∼ N(µ_0, V_2)
against the alternative of different means
H1 : µ̂_i ∼ N(µ_i, V_i), with µ_1 ≠ µ_2 .
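In the one-dimensional case the common-mean test reduces to a standard two-sample z-type statistic; a hypothetical sketch of the check (the multivariate version in the slides compares the means through a normalised χ² statistic instead):

```python
import math
import statistics

def mean_check(t1, t2):
    """Two-sample z-type check that the T^n means agree across models.

    t1, t2: simulated summary values T^n(y_{i,l}) under models 1 and 2
    (one-dimensional case). Returns the standardised mean difference;
    a large |z| suggests the models are not both compatible with T^n.
    """
    m1, m2 = statistics.fmean(t1), statistics.fmean(t2)
    v1 = statistics.pvariance(t1) / len(t1)  # estimated variance of mu_hat_1
    v2 = statistics.pvariance(t2) / len(t2)  # estimated variance of mu_hat_2
    return (m1 - m2) / math.sqrt(v1 + v2)
```

Comparing |z| to a normal quantile then accepts or rejects H0, i.e. decides whether T^n can discriminate between the two models at all.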
Toy example: Laplace versus Gauss
[Boxplots of the normalised χ² statistic, without and with mad, under the
Gauss and Laplace models.]
ABC short course: model choice chapter

  • 1. ABC for model choice 1 simulation-based methods in Econometrics 2 Genetics of ABC 3 Approximate Bayesian computation 4 ABC for model choice 5 ABC model choice via random forests 6 ABC estimation via random forests 7 [some] asymptotics of ABC
  • 2. Bayesian model choice Several models M1, M2, . . . are considered simultaneously for a dataset y and the model index M is part of the inference. Use of a prior distribution. π(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, πm(θm) Goal is to derive the posterior distribution of M, challenging computational target when models are complex.
  • 3. Generic ABC for model choice Algorithm 4 Likelihood-free model choice sampler (ABC-MC) for t = 1 to T do repeat Generate m from the prior π(M = m) Generate θm from the prior πm(θm) Generate z from the model fm(z|θm) until ρ{η(z), η(y)} < Set m(t) = m and θ(t) = θm end for
  • 4. ABC estimates Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m 1 T T t=1 Im(t)=m . Issues with implementation: • should tolerances be the same for all models? • should summary statistics vary across models (incl. their dimension)? • should the distance measure ρ vary as well?
  • 5. ABC estimates Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m 1 T T t=1 Im(t)=m . Extension to a weighted polychotomous logistic regression estimate of π(M = m|y), with non-parametric kernel weights [Cornuet et al., DIYABC, 2009]
  • 6. The Great ABC controversy On-going controvery in phylogeographic genetics about the validity of using ABC for testing Against: Templeton, 2008, 2009, 2010a, 2010b, 2010c argues that nested hypotheses cannot have higher probabilities than nesting hypotheses (!)
  • 7. The Great ABC controversy On-going controvery in phylogeographic genetics about the validity of using ABC for testing Against: Templeton, 2008, 2009, 2010a, 2010b, 2010c argues that nested hypotheses cannot have higher probabilities than nesting hypotheses (!) Replies: Fagundes et al., 2008, Beaumont et al., 2010, Berger et al., 2010, Csill`ery et al., 2010 point out that the criticisms are addressed at [Bayesian] model-based inference and have nothing to do with ABC...
  • 8. Gibbs random fields Gibbs distribution The rv y = (y1, . . . , yn) is a Gibbs random field associated with the graph G if f (y) = 1 Z exp − c∈C Vc(yc) , where Z is the normalising constant, C is the set of cliques of G and Vc is any function also called potential sufficient statistic U(y) = c∈C Vc(yc) is the energy function
  • 9. Gibbs random fields Gibbs distribution The rv y = (y1, . . . , yn) is a Gibbs random field associated with the graph G if f (y) = 1 Z exp − c∈C Vc(yc) , where Z is the normalising constant, C is the set of cliques of G and Vc is any function also called potential sufficient statistic U(y) = c∈C Vc(yc) is the energy function c Z is usually unavailable in closed form
  • 10. Potts model Potts model Vc(y) is of the form Vc(y) = θS(y) = θ Σ_{l∼i} δ_{yl = yi} where l∼i denotes a neighbourhood structure
  • 11. Potts model Potts model Vc(y) is of the form Vc(y) = θS(y) = θ Σ_{l∼i} δ_{yl = yi} where l∼i denotes a neighbourhood structure In most realistic settings, the summation Zθ = Σ_{x∈X} exp{θᵀS(x)} involves too many terms to be manageable and numerical approximations cannot always be trusted [Cucala, Marin, CPR & Titterington, 2009]
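The Potts summary statistic S(x) counts equal-valued neighbouring pairs; a minimal sketch on a regular 2D grid with the usual 4-neighbour relation (an illustrative choice of neighbourhood structure, not prescribed by the slides):

```python
def potts_stat(x):
    """Potts summary statistic S(x) = Σ_{l∼i} δ_{x_l = x_i} on a 2D grid
    (list of rows), with a 4-neighbour (up/down/left/right) structure,
    each neighbouring pair counted once."""
    rows, cols = len(x), len(x[0])
    s = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows and x[i][j] == x[i + 1][j]:
                s += 1  # vertical neighbouring pair
            if j + 1 < cols and x[i][j] == x[i][j + 1]:
                s += 1  # horizontal neighbouring pair
    return s

# tiny two-colour configuration
x = [[0, 0, 1],
     [0, 1, 1],
     [0, 0, 0]]
```

Evaluating Zθ would require summing exp{θS(x)} over all |X| colourings, which is exactly the intractable part.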
  • 12. Bayesian Model Choice Comparing a model with potential S0 taking values in Rp0 versus a model with potential S1 taking values in Rp1 can be done through the Bayes factor corresponding to the priors π0 and π1 on each parameter space Bm0/m1(x) = ∫ exp{θ0ᵀS0(x)}/Z_{θ0,0} π0(dθ0) / ∫ exp{θ1ᵀS1(x)}/Z_{θ1,1} π1(dθ1)
  • 13. Bayesian Model Choice Comparing a model with potential S0 taking values in Rp0 versus a model with potential S1 taking values in Rp1 can be done through the Bayes factor corresponding to the priors π0 and π1 on each parameter space Bm0/m1(x) = ∫ exp{θ0ᵀS0(x)}/Z_{θ0,0} π0(dθ0) / ∫ exp{θ1ᵀS1(x)}/Z_{θ1,1} π1(dθ1) Use of Jeffreys' scale to select the most appropriate model
  • 14. Neighbourhood relations Choice to be made between M neighbourhood relations i ∼m i′ (0 ≤ m ≤ M − 1) with Sm(x) = Σ_{i′ ∼m i} I{xi = xi′} driven by the posterior probabilities of the models.
  • 15. Model index Formalisation via a model index M that appears as a new parameter with prior distribution π(M = m) and π(θ|M = m) = πm(θm)
  • 16. Model index Formalisation via a model index M that appears as a new parameter with prior distribution π(M = m) and π(θ|M = m) = πm(θm) Computational target: P(M = m|x) ∝ π(M = m) ∫_{Θm} fm(x|θm) πm(θm) dθm ,
  • 17. Sufficient statistics By definition, if S(x) is a sufficient statistic for the joint parameters (M, θ0, . . . , θM−1), then P(M = m|x) = P(M = m|S(x)) .
  • 18. Sufficient statistics By definition, if S(x) is a sufficient statistic for the joint parameters (M, θ0, . . . , θM−1), then P(M = m|x) = P(M = m|S(x)) . For each model m, its own sufficient statistic Sm(·), and S(·) = (S0(·), . . . , SM−1(·)) is also sufficient.
  • 19. Sufficient statistics in Gibbs random fields For Gibbs random fields, x|M = m ∼ fm(x|θm) = f¹m(x|S(x)) f²m(S(x)|θm) = (1/n(S(x))) f²m(S(x)|θm) where n(S(x)) = #{x̃ ∈ X : S(x̃) = S(x)} S(x) is therefore also sufficient for the joint parameters [Specific to Gibbs random fields!]
  • 20. ABC model choice Algorithm ABC-MC • Generate m∗ from the prior π(M = m). • Generate θ∗m∗ from the prior πm∗(·). • Generate x∗ from the model fm∗(·|θ∗m∗). • Compute the distance ρ(S(x0), S(x∗)). • Accept (θ∗m∗, m∗) if ρ(S(x0), S(x∗)) < ε. Note When ε = 0 the algorithm is exact
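The ABC-MC loop above can be sketched in a few lines. The concrete pair of models, priors (λ ∼ Exp(1), p ∼ U(0,1)) and summary statistic η(x) = Σᵢxᵢ below are illustrative assumptions chosen so the sketch is self-contained (they anticipate the Poisson/geometric example of the later slides), not the algorithm's prescribed inputs:

```python
import math
import random

def sample_poisson(lam, rng):
    """Poisson draw by sequential inversion (adequate for small lam)."""
    u, k, p = rng.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def sample_geometric(p, rng):
    """Number of failures before the first success, support {0, 1, 2, ...}."""
    return int(math.log(1 - rng.random()) / math.log(1 - p))

def abc_mc(x_obs, T=5000, eps=1.0, seed=0):
    """ABC-MC sketch for a two-model choice: m = 0 is Poisson P(lambda)
    with lambda ~ Exp(1), m = 1 is geometric G(p) with p ~ U(0, 1];
    summary statistic eta(x) = sum(x), distance rho = |.|.
    Returns the accepted model indices m(t)."""
    rng = random.Random(seed)
    n, s_obs = len(x_obs), sum(x_obs)
    accepted = []
    for _ in range(T):
        m = rng.randrange(2)                 # prior pi(M = m) = 1/2
        if m == 0:
            lam = rng.expovariate(1.0)
            z = [sample_poisson(lam, rng) for _ in range(n)]
        else:
            p = 1 - rng.random()             # p in (0, 1]
            z = [sample_geometric(p, rng) for _ in range(n)] if p < 1 else [0] * n
        if abs(sum(z) - s_obs) <= eps:       # rho{eta(z), eta(y)} <= eps
            accepted.append(m)
    return accepted
```

The frequency of each index among the accepted draws is the ABC estimate of π(M = m|y) from the preceding slides.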
  • 21. ABC approximation to the Bayes factor Frequency ratio: BFm0/m1(x⁰) = P̂(M = m0|x⁰)/P̂(M = m1|x⁰) × π(M = m1)/π(M = m0) = #{m(i∗) = m0}/#{m(i∗) = m1} × π(M = m1)/π(M = m0) ,
  • 22. ABC approximation to the Bayes factor Frequency ratio: BFm0/m1(x⁰) = P̂(M = m0|x⁰)/P̂(M = m1|x⁰) × π(M = m1)/π(M = m0) = #{m(i∗) = m0}/#{m(i∗) = m1} × π(M = m1)/π(M = m0) , replaced with BFm0/m1(x⁰) = (1 + #{m(i∗) = m0})/(1 + #{m(i∗) = m1}) × π(M = m1)/π(M = m0) to avoid indeterminacy (also a Bayes estimate).
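The corrected frequency ratio is a one-liner over the accepted model indices; equal model priors are an illustrative default here, not part of the slide's formula:

```python
def abc_bayes_factor(accepted, prior0=0.5, prior1=0.5):
    """Corrected ABC estimate of BF_{m0/m1}:
    (1 + #{m = m0}) / (1 + #{m = m1}) * pi(M = m1) / pi(M = m0),
    which stays finite even when one model is never accepted."""
    n0 = sum(1 for m in accepted if m == 0)
    n1 = len(accepted) - n0
    return (1 + n0) / (1 + n1) * (prior1 / prior0)
```

With no acceptances at all the estimate is the prior odds, which is the Bayes-estimate reading of the correction.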
  • 23. Toy example iid Bernoulli model versus two-state first-order Markov chain, i.e. f0(x|θ0) = exp(θ0 Σ_{i=1}^n I{xi = 1}) / {1 + exp(θ0)}^n , versus f1(x|θ1) = (1/2) exp(θ1 Σ_{i=2}^n I{xi = xi−1}) / {1 + exp(θ1)}^{n−1} , with priors θ0 ∼ U(−5, 5) and θ1 ∼ U(0, 6) (inspired by “phase transition” boundaries).
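Both toy likelihoods are exponential families in a single count statistic, so they evaluate directly; a short sketch of the two densities as written on the slide:

```python
import math

def f0(x, theta0):
    """iid Bernoulli likelihood: exp(theta0 * #{x_i = 1}) / (1 + e^theta0)^n."""
    return math.exp(theta0 * sum(x)) / (1 + math.exp(theta0)) ** len(x)

def f1(x, theta1):
    """Two-state first-order Markov chain likelihood, uniform initial state:
    (1/2) exp(theta1 * #{x_i = x_{i-1}, i >= 2}) / (1 + e^theta1)^(n - 1)."""
    n = len(x)
    s = sum(x[i] == x[i - 1] for i in range(1, n))
    return 0.5 * math.exp(theta1 * s) / (1 + math.exp(theta1)) ** (n - 1)
```

At θ0 = θ1 = 0 both reduce to the uniform distribution on {0,1}ⁿ (up to the 1/2 initial-state factor), a quick sanity check on the normalising constants.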
  • 24. Toy example (2) [Figure: (left) Comparison of the true BFm0/m1(x⁰) with its ABC approximation (in logs) over 2,000 simulations and 4·10⁶ proposals from the prior. (right) Same when using a tolerance ε corresponding to the 1% quantile on the distances.]
  • 25. Back to sufficiency ‘Sufficient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og]
  • 26. Back to sufficiency ‘Sufficient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og] If η1(x) is a sufficient statistic for model m = 1 and parameter θ1 and η2(x) is a sufficient statistic for model m = 2 and parameter θ2, (η1(x), η2(x)) is not always sufficient for (m, θm)
  • 27. Back to sufficiency ‘Sufficient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og] If η1(x) is a sufficient statistic for model m = 1 and parameter θ1 and η2(x) is a sufficient statistic for model m = 2 and parameter θ2, (η1(x), η2(x)) is not always sufficient for (m, θm) Potential loss of information at the testing level
  • 28. Limiting behaviour of B12 (T → ∞) ABC approximation B12(y) = Σ_{t=1}^T I{mt = 1} I{ρ{η(zt), η(y)} ≤ ε} / Σ_{t=1}^T I{mt = 2} I{ρ{η(zt), η(y)} ≤ ε} , where the (mt, zt)’s are simulated from the (joint) prior
  • 29. Limiting behaviour of B12 (T → ∞) ABC approximation B12(y) = Σ_{t=1}^T I{mt = 1} I{ρ{η(zt), η(y)} ≤ ε} / Σ_{t=1}^T I{mt = 2} I{ρ{η(zt), η(y)} ≤ ε} , where the (mt, zt)’s are simulated from the (joint) prior As T goes to infinity, the limit is B12(y) = ∫ I{ρ{η(z), η(y)} ≤ ε} π1(θ1) f1(z|θ1) dz dθ1 / ∫ I{ρ{η(z), η(y)} ≤ ε} π2(θ2) f2(z|θ2) dz dθ2 = ∫ I{ρ{η, η(y)} ≤ ε} π1(θ1) f1^η(η|θ1) dη dθ1 / ∫ I{ρ{η, η(y)} ≤ ε} π2(θ2) f2^η(η|θ2) dη dθ2 , where f1^η(η|θ1) and f2^η(η|θ2) are the distributions of η(z)
  • 30. Limiting behaviour of B12 (ε → 0) When ε goes to zero, B12^η(y) = ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / ∫ π2(θ2) f2^η(η(y)|θ2) dθ2 ,
  • 31. Limiting behaviour of B12 (ε → 0) When ε goes to zero, B12^η(y) = ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / ∫ π2(θ2) f2^η(η(y)|θ2) dθ2 , the Bayes factor based on the sole observation of η(y)
  • 32. Limiting behaviour of B12 (under sufficiency) If η(y) is a sufficient statistic for both models, fi(y|θi) = gi(y) fi^η(η(y)|θi) Thus B12(y) = ∫_{Θ1} π(θ1) g1(y) f1^η(η(y)|θ1) dθ1 / ∫_{Θ2} π(θ2) g2(y) f2^η(η(y)|θ2) dθ2 = g1(y) ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / g2(y) ∫ π2(θ2) f2^η(η(y)|θ2) dθ2 = {g1(y)/g2(y)} B12^η(y) . [Didelot, Everitt, Johansen & Lawson, 2011]
  • 33. Limiting behaviour of B12 (under sufficiency) If η(y) is a sufficient statistic for both models, fi(y|θi) = gi(y) fi^η(η(y)|θi) Thus B12(y) = ∫_{Θ1} π(θ1) g1(y) f1^η(η(y)|θ1) dθ1 / ∫_{Θ2} π(θ2) g2(y) f2^η(η(y)|θ2) dθ2 = g1(y) ∫ π1(θ1) f1^η(η(y)|θ1) dθ1 / g2(y) ∫ π2(θ2) f2^η(η(y)|θ2) dθ2 = {g1(y)/g2(y)} B12^η(y) . [Didelot, Everitt, Johansen & Lawson, 2011] No discrepancy only under cross-model sufficiency
  • 34. Poisson/geometric example Sample x = (x1, . . . , xn) from either a Poisson P(λ) or from a geometric G(p) Then S = Σ_{i=1}^n xi = η(x) is a sufficient statistic for either model but not simultaneously Discrepancy ratio g1(x)/g2(x) = {S! n^{−S} / Π_i xi!} / {1 / C(n+S−1, S)}
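The discrepancy ratio is the ratio of the two conditional densities of x given S, multinomial under the Poisson model versus uniform over compositions under the geometric model; it evaluates directly:

```python
from math import comb, factorial, prod

def discrepancy_ratio(x):
    """g1(x) / g2(x) = [S! n^{-S} / prod_i x_i!] / [1 / C(n+S-1, S)]:
    the term by which B12(x) and the eta-based Bayes factor B12^eta(x)
    differ in the Poisson/geometric example."""
    n, S = len(x), sum(x)
    g1 = factorial(S) / (n ** S * prod(factorial(xi) for xi in x))
    g2 = 1 / comb(n + S - 1, S)
    return g1 / g2
```

The ratio varies with the configuration x and not only with S, which is exactly why the η-based Bayes factor loses information here.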
  • 35. Poisson/geometric discrepancy [Figure: Range of B12(x) versus B12^η(x): the values produced have nothing in common.]
  • 36. Formal recovery Creating an encompassing exponential family f(x|θ1, θ2, α1, α2) ∝ exp{θ1ᵀη1(x) + θ2ᵀη2(x) + α1t1(x) + α2t2(x)} leads to a sufficient statistic (η1(x), η2(x), t1(x), t2(x)) [Didelot, Everitt, Johansen & Lawson, 2011]
  • 37. Formal recovery Creating an encompassing exponential family f(x|θ1, θ2, α1, α2) ∝ exp{θ1ᵀη1(x) + θ2ᵀη2(x) + α1t1(x) + α2t2(x)} leads to a sufficient statistic (η1(x), η2(x), t1(x), t2(x)) [Didelot, Everitt, Johansen & Lawson, 2011] In the Poisson/geometric case, if Π_i xi! is added to S, there is no discrepancy
  • 38. Formal recovery Creating an encompassing exponential family f(x|θ1, θ2, α1, α2) ∝ exp{θ1ᵀη1(x) + θ2ᵀη2(x) + α1t1(x) + α2t2(x)} leads to a sufficient statistic (η1(x), η2(x), t1(x), t2(x)) [Didelot, Everitt, Johansen & Lawson, 2011] Only applies in genuine sufficiency settings... Inability to evaluate the loss brought by summary statistics
  • 39. Meaning of the ABC-Bayes factor ‘This is also why focus on model discrimination typically (...) proceeds by (...) accepting that the Bayes Factor that one obtains is only derived from the summary statistics and may in no way correspond to that of the full model.’ [Scott Sisson, Jan. 31, 2011, X.’Og]
  • 40. Meaning of the ABC-Bayes factor ‘This is also why focus on model discrimination typically (...) proceeds by (...) accepting that the Bayes Factor that one obtains is only derived from the summary statistics and may in no way correspond to that of the full model.’ [Scott Sisson, Jan. 31, 2011, X.’Og] In the Poisson/geometric case, if E[yi] = θ0 > 0, lim_{n→∞} B12^η(y) = (θ0 + 1)² θ0 e^{−θ0}
  • 41. MA(q) divergence [Figure: Evolution [against ε] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right), when ε equals the 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from a MA(2) with θ1 = 0.6, θ2 = 0.2. True Bayes factor equal to 17.71.]
  • 42. MA(q) divergence [Figure: Evolution [against ε] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right), when ε equals the 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from a MA(1) model with θ1 = 0.6. True Bayes factor B21 equal to .004.]
  • 43. Further comments ‘There should be the possibility that for the same model, but different (non-minimal) [summary] statistics (so different η’s: η1 and η1∗) the ratio of evidences may no longer be equal to one.’ [Michael Stumpf, Jan. 28, 2011, ’Og] Using different summary statistics [on different models] may indicate the loss of information brought by each set but agreement does not lead to trustworthy approximations.
  • 44. A stylised problem Central question to the validation of ABC for model choice: When is a Bayes factor based on an insufficient statistic T(y) consistent?
  • 45. A stylised problem Central question to the validation of ABC for model choice: When is a Bayes factor based on an insufficient statistic T(y) consistent? Note/warning: the inference drawn on T(y) through B12^T(y) necessarily differs from the inference drawn on y through B12(y) [Marin, Pillai, X, & Rousseau, JRSS B, 2013]
  • 46. A benchmark toy example Comparison suggested by a referee of the PNAS paper [thanks!]: [X, Cornuet, Marin, & Pillai, Aug. 2011] Model M1: y ∼ N(θ1, 1) opposed to model M2: y ∼ L(θ2, 1/√2), Laplace distribution with mean θ2 and scale parameter 1/√2 (variance one). Four possible statistics: 1. sample mean ȳ (sufficient for M1 if not M2); 2. sample median med(y) (insufficient); 3. sample variance var(y) (ancillary); 4. median absolute deviation mad(y) = med(|y − med(y)|);
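The four candidate summaries compute in one pass with the standard library; the dictionary layout is just a convenience for this sketch:

```python
import statistics

def benchmark_stats(y):
    """The four candidate summaries of the Normal-vs-Laplace benchmark:
    sample mean, sample median, sample variance, and the median absolute
    deviation mad(y) = med(|y - med(y)|)."""
    med = statistics.median(y)
    return {
        "mean": statistics.fmean(y),     # sufficient for M1, not M2
        "median": med,                   # insufficient
        "var": statistics.variance(y),   # ancillary under both models
        "mad": statistics.median([abs(v - med) for v in y]),
    }
```

Only mad discriminates well between the two models in the slides' experiment, which is the point of the benchmark.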
  • 47. A benchmark toy example Comparison suggested by a referee of the PNAS paper [thanks!]: [X, Cornuet, Marin, & Pillai, Aug. 2011] Model M1: y ∼ N(θ1, 1) opposed to model M2: y ∼ L(θ2, 1/√2), Laplace distribution with mean θ2 and scale parameter 1/√2 (variance one). [Figure: boxplots of the ABC posterior probabilities of the Gauss and Laplace models, n = 100.]
  • 48. Framework Starting from y = (y1, . . . , yn), the observed sample, not necessarily iid, with true distribution y ∼ Pn Summary statistics T(y) = Tn = (T1(y), T2(y), · · · , Td(y)) ∈ Rd with true distribution Tn ∼ Gn.
  • 49. Framework Comparison of – under M1, y ∼ F1,n(·|θ1) where θ1 ∈ Θ1 ⊂ Rp1 – under M2, y ∼ F2,n(·|θ2) where θ2 ∈ Θ2 ⊂ Rp2 turned into – under M1, T(y) ∼ G1,n(·|θ1), and θ1|T(y) ∼ π1(·|Tn) – under M2, T(y) ∼ G2,n(·|θ2), and θ2|T(y) ∼ π2(·|Tn)
  • 50. Assumptions A collection of asymptotic “standard” assumptions: [A1] is a standard central limit theorem under the true model with asymptotic mean µ0 [A2] controls the large deviations of the estimator Tn from the model mean µ(θ) [A3] is the standard prior mass condition found in Bayesian asymptotics (di effective dimension of the parameter) [A4] restricts the behaviour of the model density against the true density [Think CLT!]
  • 51. Asymptotic marginals Asymptotically, under [A1]–[A4] mi(t) = ∫_{Θi} gi(t|θi) πi(θi) dθi is such that (i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, Cl vn^{d−di} ≤ mi(Tn) ≤ Cu vn^{d−di} and (ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0 mi(Tn) = oPn(vn^{d−τi} + vn^{d−αi}).
  • 52. Between-model consistency Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of Tn under both models. And only by this mean value!
  • 53. Between-model consistency Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of Tn under both models. And only by this mean value! Indeed, if inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0 then Cl vn^{−(d1−d2)} ≤ m1(Tn)/m2(Tn) ≤ Cu vn^{−(d1−d2)} , where Cl, Cu = OPn(1), irrespective of the true model. Only depends on the difference d1 − d2: no consistency
  • 54. Between-model consistency Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of Tn under both models. And only by this mean value! Else, if inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0 then m1(Tn)/m2(Tn) ≥ Cu min(vn^{−(d1−α2)}, vn^{−(d1−τ2)})
  • 55. Checking for adequate statistics Run a practical check of the relevance (or non-relevance) of Tn: null hypothesis that both models are compatible with the statistic Tn, H0 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} = 0 against H1 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} > 0 The testing procedure provides estimates of the mean of Tn under each model and checks for equality
  • 56. Checking in practice • Under each model Mi, generate an ABC sample θi,l, l = 1, · · · , L • For each θi,l, generate yi,l ∼ Fi,n(·|θi,l), derive Tn(yi,l) and compute µ̂i = (1/L) Σ_{l=1}^L Tn(yi,l), i = 1, 2 . • Conditionally on Tn(y), √L {µ̂i − Eπ[µi(θi)|Tn(y)]} ⇝ N(0, Vi), • Test for a common mean H0 : µ̂1 ∼ N(µ0, V1) , µ̂2 ∼ N(µ0, V2) against the alternative of different means H1 : µ̂i ∼ N(µi, Vi), with µ1 ≠ µ2 .
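For a scalar statistic Tn, the final common-mean test reduces to comparing two approximately normal estimates; the χ²(1) form below is a standard reduction for d = 1 (an assumption of this sketch, not the slides' general multivariate procedure):

```python
import math

def common_mean_test(mu1, v1, mu2, v2):
    """Scalar common-mean check: under H0 the two ABC-based estimates
    mu1 ~ N(mu0, V1) and mu2 ~ N(mu0, V2) are independent with equal means,
    so t = (mu1 - mu2)^2 / (V1 + V2) is approximately chi^2 with 1 df.
    Returns (t, tail probability), using P(chi2_1 > t) = erfc(sqrt(t / 2))."""
    t = (mu1 - mu2) ** 2 / (v1 + v2)
    return t, math.erfc(math.sqrt(t / 2))
```

A small tail probability flags that at least one model cannot reproduce the observed statistic, i.e. that Tn is relevant for discriminating between them.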
  • 57. Toy example: Laplace versus Gauss [Figure: boxplots of the normalised χ² statistic without and with mad, under the Gauss and Laplace models.]