Bayesian Inference and Uncertainty Quantification for Inverse Problems
Matt Moores
@MooresMt
https://uow.edu.au/~mmoores/
Centre for Environmental Informatics (CEI), University of Wollongong, NSW, Australia
TIDE Hub Seminar Series
joint with Matthew Berry, Mark Nelson, Brian Monaghan and Raymond Longbottom
Introduction

Physical model F(t; \vec{\theta})

Unknown parameters \vec{\theta} = \{\theta_1, \dots, \theta_k\}

Initial conditions F_0 and derivatives \partial F / \partial t, etc.

Observed data y_t at times t = 1, \dots, T:

    y_t = F(t; \vec{\theta}) + \varepsilon_t

where \varepsilon_t is random noise.
Example

Growth curve:

    \frac{dF}{dt} = -\beta \gamma^t \log\gamma

Solution:

    F(t_i; \vec{\theta}) = \alpha - \beta \gamma^{t_i}

at time points t_1, \dots, t_n with initial condition F_0 = \alpha - \beta.

Noisy observations:

    y_i = F(t_i; \alpha, \beta, \gamma) + \varepsilon_i

where \varepsilon_i \sim N(0, \sigma^2) is additive Gaussian noise.

Unknown parameters \vec{\theta} = \{\alpha, \beta, \gamma, \sigma^2\}
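To make the example concrete, here is a minimal R sketch (not from the original slides) that simulates noisy observations from this growth curve; the parameter values are illustrative assumptions, not the dugong estimates.

    # Simulate noisy observations y_i = alpha - beta * gamm^t_i + eps_i.
    # Parameter values below are illustrative assumptions.
    set.seed(42)
    alpha <- 2.7; beta <- 1.0; gamm <- 0.87; sigma <- 0.1
    t <- 1:30                                  # observation times
    F_t <- alpha - beta * gamm^t               # deterministic solution
    y <- F_t + rnorm(length(t), sd = sigma)    # additive Gaussian noise
    plot(t, y, xlab = "time", ylab = "y")
    lines(t, F_t)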
Observed Data for n = 27 dugongs

[figure]
Nonlinear Least Squares

nlm <- nls(y ~ alpha - beta * gamm^t, data=dat,
           start=list(alpha=1, beta=1, gamm=0.9))
summary(nlm)
##
## Formula: y ~ alpha - beta * gamm^t
##
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)
## alpha  2.65807    0.06151   43.21  < 2e-16 ***
## beta   0.96352    0.06968   13.83  6.3e-13 ***
## gamm   0.87146    0.02460   35.42  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09525 on 24 degrees of freedom
Drawbacks

No guarantee of convexity:
- will only find the nearest local optimum
- results are completely dependent on initialization

Standard errors are underestimated:
- 95% profile CIs for \alpha and \beta only achieve around 83% coverage
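Both drawbacks are easy to demonstrate in R; the sketch below assumes the `dat` data frame from the `nls()` example above, and uses the profile-likelihood `confint()` method for nls fits provided by MASS.

    # Refit from two different starting points: a poor start can fail to
    # converge ("singular gradient") or stop at a different local optimum.
    # Assumes the data frame `dat` from the nls() example above.
    fit1 <- try(nls(y ~ alpha - beta * gamm^t, data = dat,
                    start = list(alpha = 1, beta = 1, gamm = 0.9)))
    fit2 <- try(nls(y ~ alpha - beta * gamm^t, data = dat,
                    start = list(alpha = 5, beta = 0.1, gamm = 0.5)))

    # 95% profile confidence intervals (the intervals whose actual
    # coverage falls short of the nominal level):
    library(MASS)
    confint(fit1, level = 0.95)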
Inverse Problem

Find parameters \vec{\theta} consistent with \vec{y}.

Physical model F : \Theta \to \mathcal{Y} doesn't need to be invertible:
- although lack of identifiability for \vec{\theta} can create problems
- we don't need F in closed form: a numerical solver for the DE suffices

Likelihood L(\vec{y} \mid \vec{\theta}) is based on

    E[Y_i \mid \vec{\theta}] = F(t_i; \vec{\theta})

as well as the distribution of the random noise (not necessarily Gaussian).
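For the growth-curve example, this likelihood can be written as a short R function. This is a sketch: `F()` uses the closed-form solution here, but it could equally wrap a call to a numerical DE solver.

    # Gaussian log-likelihood built on E[Y_i | theta] = F(t_i; theta).
    # For models without a closed-form solution, replace F() with a
    # numerical DE solver.
    F <- function(t, alpha, beta, gamm) alpha - beta * gamm^t

    loglik <- function(theta, t, y) {
      mu <- F(t, theta[["alpha"]], theta[["beta"]], theta[["gamm"]])
      sum(dnorm(y, mean = mu, sd = sqrt(theta[["sigma2"]]), log = TRUE))
    }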
Bayesian Inference

We assign prior distributions to the parameters \vec{\theta}:

    \pi(\log\alpha) \sim N(0, 100)
    \pi(\log\beta) \sim N(0, 100)
    \pi(\mathrm{logit}\,\gamma) \sim N(0, 100)
    \pi(1/\sigma^2) \sim \mathrm{Ga}\left( \frac{\nu_0}{2}, \frac{2}{\nu_0 s_0^2} \right)

Then the joint posterior distribution is:

    \pi(\vec{\theta} \mid \vec{y}) = \frac{L(\vec{y} \mid \vec{\theta}) \, \pi(\vec{\theta})}{\pi(\vec{y})}

where the normalising constant

    \pi(\vec{y}) = \int_0^\infty \int_0^\infty \int_0^\infty \int_0^1 L(\vec{y} \mid \alpha, \beta, \gamma, \sigma) \, d\pi(\gamma)\, d\pi(\beta)\, d\pi(\alpha)\, d\pi(\sigma)

is intractable.

Astfalck, Cripps, Gosling, Hodkiewicz & Milne (2018) Ocean Engineering 161: 268–276.
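Combining these priors with `loglik()` from the earlier sketch gives the unnormalised log-posterior. The sketch below works on the transformed scale \phi = (\log\alpha, \log\beta, \mathrm{logit}\,\gamma, \log\sigma^2), and the hyperparameter values \nu_0 and s_0^2 are placeholders.

    # Unnormalised log-posterior on the transformed scale
    # phi = (log alpha, log beta, logit gamma, log sigma2).
    # The N(0, 100) priors have variance 100, i.e. sd = 10.
    nu0 <- 1; s0sq <- 1   # placeholder hyperparameter values

    logpost <- function(phi, t, y) {
      theta <- list(alpha = exp(phi[1]), beta = exp(phi[2]),
                    gamm = plogis(phi[3]), sigma2 = exp(phi[4]))
      dnorm(phi[1], 0, sd = 10, log = TRUE) +
        dnorm(phi[2], 0, sd = 10, log = TRUE) +
        dnorm(phi[3], 0, sd = 10, log = TRUE) +
        # Ga prior on the precision 1/sigma2, plus the log-Jacobian of the
        # change of variable from the precision to log(sigma2):
        dgamma(1 / theta$sigma2, shape = nu0 / 2,
               scale = 2 / (nu0 * s0sq), log = TRUE) - log(theta$sigma2) +
        loglik(theta, t, y)
    }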
Markov Chain Monte Carlo

Algorithm 1: Random-walk Metropolis sampler for \pi(\vec{\theta} \mid \vec{y})

1: Initialize \alpha_0 \sim \pi(\log\alpha), \beta_0 \sim \pi(\log\beta), \gamma_0 \sim \pi(\mathrm{logit}\,\gamma), \sigma_0 \sim \pi(1/\sigma^2)
2: Solve \vec{\mu}_0 = \vec{F}(\vec{t}; \alpha_0, \beta_0, \gamma_0) and calculate \ell_0 = \log\{ L(\vec{y} \mid \vec{\mu}_0, \sigma_0^2) \}
3: for iterations j = 1, \dots, J do
4:   Propose \log\alpha^* \sim N(\log\alpha_{j-1}, \sigma_\alpha^2), \log\beta^* \sim N(\log\beta_{j-1}, \sigma_\beta^2), \mathrm{logit}\,\gamma^* \sim N(\mathrm{logit}\,\gamma_{j-1}, \sigma_\gamma^2)
5:   Solve \vec{\mu}^* = \vec{F}(\vec{t}; \alpha^*, \beta^*, \gamma^*)
6:   Propose \sigma^* \sim \pi(1/\sigma^2 \mid \vec{\mu}^*, \vec{y}) and calculate \ell^* = \log\{ L(\vec{y} \mid \vec{\mu}^*, \sigma_*^2) \}
7:   Calculate

         \rho_t = \frac{\exp\{\ell^*\}\, \pi(\vec{\theta}^*)\, q(\vec{\theta}_{j-1} \mid \vec{\theta}^*)}{\exp\{\ell_{j-1}\}\, \pi(\vec{\theta}_{j-1})\, q(\vec{\theta}^* \mid \vec{\theta}_{j-1})}

8:   Accept \vec{\theta}_j = \vec{\theta}^* with probability \min\{\rho_t, 1\}; else \vec{\theta}_j = \vec{\theta}_{j-1}
9: end for
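A compact R sketch of a random-walk Metropolis sampler for this posterior, reusing `logpost()` from above. For brevity it updates all four parameters jointly with a symmetric Gaussian random walk (so the q() terms cancel in the acceptance ratio), rather than the slide's conjugate draw for \sigma^2 (Carlin & Gelfand, 1991); the step sizes and initial state are illustrative.

    # Simplified random-walk Metropolis sampler for pi(theta | y).
    rwm <- function(t, y, J = 10000, step = c(0.02, 0.05, 0.1, 0.2)) {
      phi <- c(log(2.5), log(1), qlogis(0.9), log(0.01))  # initial state
      lp <- logpost(phi, t, y)
      out <- matrix(NA_real_, J, 4,
                    dimnames = list(NULL, c("alpha", "beta", "gamm", "sigma2")))
      for (j in seq_len(J)) {
        phi_star <- phi + rnorm(4, sd = step)  # symmetric Gaussian proposal
        lp_star <- logpost(phi_star, t, y)
        if (log(runif(1)) < lp_star - lp) {    # accept w.p. min{rho, 1}
          phi <- phi_star
          lp <- lp_star
        }
        out[j, ] <- c(exp(phi[1]), exp(phi[2]), plogis(phi[3]), exp(phi[4]))
      }
      out
    }

    # Usage, e.g. with the dugong data: samples <- rwm(dat$t, dat$y)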
Markov Chains

[figure]

Posterior Distributions

[figure]

95% Predictive Interval

[figure]
Thermogravimetric Analysis

The TGA model for a single reaction involves the Arrhenius equations:

    \frac{dM}{dt} = -M A \exp\left( -\frac{E}{RT} \right)

    \frac{dT}{dt} = \alpha

where M is the mass fraction, T is the temperature, and R is the ideal gas constant. The initial mass M_0, the initial temperature T_0, and the heating rate \alpha are experimentally controlled.

The unknown parameters \vec{\theta} are:
- A, the pre-exponential factor
- E, the activation energy
- \sigma^2, the variance of the additive Gaussian noise
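The forward model is straightforward to solve numerically; below is a sketch using the deSolve package, with illustrative (not fitted) parameter values.

    # Numerical solution of the single-reaction TGA model with deSolve.
    library(deSolve)

    tga_rhs <- function(t, state, parms) {
      with(as.list(c(state, parms)), {
        dM <- -M * A * exp(-E / (R * Temp))  # Arrhenius mass-loss rate
        dTemp <- alpha                       # constant heating rate
        list(c(dM, dTemp))
      })
    }

    parms <- c(A = 3e6, E = 8.5e4, R = 8.314, alpha = 10/60)  # illustrative
    state <- c(M = 1, Temp = 300)   # initial mass fraction, temperature (K)
    times <- seq(0, 3600, by = 10)  # seconds

    sol <- ode(y = state, times = times, func = tga_rhs, parms = parms,
               method = "rk4")      # fixed-step Runge-Kutta
    # plot(sol) shows mass fraction and temperature against time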
Reparameterisation

We have good prior information for E (physically interpretable) and \sigma^2 (measurement noise), but not for A. The distributions for A and E are also very highly correlated, slowing the mixing of the MCMC.

Instead, we reparameterise the model in terms of the temperature T_m at which the rate \frac{dM}{dt} is maximised:

    A \exp\left( -\frac{E}{R T_m} \right) = \frac{E}{R T_m^2} \alpha

In our MCMC algorithm, we first propose q(E^* \mid E_{j-1}), then propose q(T_m^* \mid T_{m,j-1}). The value of A^* can then be obtained from the equation above. We solve \vec{F}(\vec{t}; E^*, A^*) numerically using a Runge-Kutta method.
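Given proposed values E^* and T_m^*, recovering A^* from the identity above is a one-liner in R (\alpha is the heating rate and R the gas constant; the example values are illustrative):

    # Recover the pre-exponential factor A from (E, Tm) via the identity
    # A * exp(-E / (R * Tm)) = (E / (R * Tm^2)) * alpha
    A_from_Tm <- function(E, Tm, alpha, R = 8.314) {
      (E * alpha / (R * Tm^2)) * exp(E / (R * Tm))
    }
    # e.g. A_from_Tm(E = 8.5e4, Tm = 635, alpha = 10/60)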
Posterior Distributions

[Figure: marginal posterior density plots for T_m, E, A, and a fourth parameter (presumably \sigma).]
Posterior for Functions of \vec{\theta}

A function of a random variable is also a random variable. Now that we have random samples from our posterior \pi(A, E, T_m, \sigma^2 \mid \vec{y}), we can use our model to obtain predictions for any measurable function of the parameters, g(A, E, T_m).

For example, the critical length of a stockpile:

    L_{cr} = g(A, E, T_m) = K \sqrt{ \frac{\exp\left\{ \frac{E}{R T_m^2} \right\}}{A} \frac{R T_m^2}{E} }

    E[g(A, E, T_m) \mid \vec{y}] = \int_0^{1000} \int_0^\infty \int_0^\infty g(A, E, T_m) \, d\pi(A, E, T_m \mid \vec{y}) \approx \frac{1}{J} \sum_{j=1}^{J} g(A_j, E_j, T_j)

where K is a constant and \{A_j, E_j, T_j\}_{j=1}^{J} are the MCMC samples (after discarding burn-in).
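In R, posterior summaries of L_{cr} follow directly by applying g to each draw. The sketch below assumes a matrix `samples` of post-burn-in MCMC draws with columns A, E and Tm, and uses a placeholder value for the constant K.

    # Posterior samples of the critical length Lcr = g(A, E, Tm).
    # `samples` is assumed to hold post-burn-in draws; K is a placeholder.
    g_Lcr <- function(A, E, Tm, K = 1, R = 8.314) {
      K * sqrt(exp(E / (R * Tm^2)) / A * (R * Tm^2) / E)
    }

    Lcr <- g_Lcr(samples[, "A"], samples[, "E"], samples[, "Tm"])
    mean(Lcr)                        # Monte Carlo estimate of E[g | y]
    quantile(Lcr, c(0.025, 0.975))   # 95% credible interval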
Posterior for L_{cr}

[Figure: posterior distribution of L_{cr}.]
Advanced Methods

- Parallel and distributed computation
- Sequential updating of \pi(\vec{\theta} \mid \vec{y}) and streaming inference
- Spatio-temporal modelling of F(x, y, z, t; \vec{\theta})
- Emulation of the forward map \Theta \to \mathcal{Y} (e.g. using artificial neural networks)
- Approximating the likelihood L (e.g. ABC)
- Accounting for multi-modality (e.g. ill-posed problems)
- Accounting for error in the numerical solver (probabilistic numerics)
- Accounting for model misspecification (discrepancy function)
- Multi-level Monte Carlo methods
- Hamiltonian Monte Carlo
Further Reading

R. J. Longbottom, B. J. Monaghan, D. J. Pinson, N. A. S. Webster & S. J. Chew
In situ Phase Analysis during Self-sintering of BOS Filter Cake for Improved Recycling.
ISIJ International 60(11): 2436–2445, 2020.

A. M. Stuart
Inverse problems: A Bayesian perspective.
Acta Numerica 19: 451–559, 2010.

L. C. Astfalck, E. J. Cripps, J. P. Gosling, M. R. Hodkiewicz & I. A. Milne
Expert elicitation of directional metocean parameters.
Ocean Engineering 161: 268–276, 2018.

B. P. Carlin & A. E. Gelfand
An iterative Monte Carlo method for nonconjugate Bayesian analysis.
Statistics & Computing 1(2): 119–128, 1991.