A walk through the intersection between machine learning and mechanistic modeling
Data Science Seminar, EURECOM, France. May 3, 2018
Juan Pablo Carbajal juanpablo.carbajal@eawag.ch
Eawag: Swiss Federal Institute of Aquatic Science and Technology
This work is licensed under a Creative Commons “Attribution-ShareAlike 4.0 International” license.
Background
Machine Learning and Mechanistic modeling
[Venn diagram: emulation sits at the overlap between machine learning and mechanistic modeling.]
Machine Learning
• Discover patterns from data (prediction)
• Minimal knowledge of processes
Emulation
• Discover a "useful" approximation of a numerical model from data
• Partial/fuzzy knowledge of (sub)processes
Mechanistic modeling
• Interpret data/models using models/data
• Detailed knowledge of (sub)processes
Environmental science and engineering
Model calibration, Fast (extreme) event prediction, & Design
source: https://www.dailyrecord.co.uk/news/scottish-news/storm-frank-rescuers-brave-storm-7096067
Engineered complex systems
Optimization, Control, & Maintenance
source: Leaflet, CC BY-SA 3.0, commons.wikimedia.org/w/index.php?curid=5704247
Data fusion/interpretation
source: Azzimonti et al. 10.1080/01621459.2014.946036
What is emulation?
How is emulation even possible?
Alien researcher
source: Svjo, CC BY-SA 3.0, commons.wikimedia.org/w/index.php?curid=25795521
From molecules to fluids
source: doi:10.1088/0965-0393/12/6/R01
Emulation
many names & similarities: surrogate modeling, metamodeling, coarse graining, model order reduction, interpolation, extrapolation, response surface, phenomenological modeling, empirical modeling, ...
Types of simulators
Partial Differential Equations (PDE)
• Fluid dynamics
• Convection-Diffusion-Reaction systems
• Heat transfer
• Multi-physics, e.g. fluid-structure interaction
• . . .
source: Wikipedia
Ordinary Differential Equations (ODE)
• Biochemical reactions
• Interconnected tanks
• Population dynamics
• Mechanics (Lagrangian/Hamiltonian)
• . . .
Algebraic equations
• Equilibrium solutions of PDE/ODE
• Conservation laws (e.g. mass (stoichiometry), energy, . . .)
• Empirical formulas
• Data-driven models
• . . .
source: doi:10.1039/C4CY00409D
Agent-based systems
Description of individual (groups of) agents and their interactions
• Sociology
• Biology
• Epidemiology
• Network science
• Molecular dynamics
• . . .
source: www.youtube.com/watch?v=GUkjC-69vaw
Cellular automata
agents in a graph, discrete states
source: www.youtube.com/watch?v=-FaqC4h5Ftg
• Agents correspond to nodes in a grid (graph)
• Each agent carries one or more discrete internal states
• Each agent interacts with a set of nodes: its neighborhood
• A rule-set changes states according to the neighborhood (a minimal sketch follows below)
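The four ingredients above fit in a few lines. Below is a minimal sketch using a 1-D elementary cellular automaton (Wolfram rule 110); the grid size, rule and seed are illustrative choices, not taken from the talk.

```python
import numpy as np

def ca_step(state: np.ndarray, rule: int = 110) -> np.ndarray:
    """One synchronous update of an elementary CA with periodic boundaries."""
    left = np.roll(state, 1)                 # neighborhood: left neighbor
    right = np.roll(state, -1)               # neighborhood: right neighbor
    idx = 4 * left + 2 * state + right       # encode each 3-cell neighborhood as 0..7
    table = (rule >> np.arange(8)) & 1       # the rule-set as a lookup table
    return table[idx]                        # apply the rule to every agent at once

state = np.zeros(64, dtype=int)
state[32] = 1                                # a single seeded cell
for _ in range(20):
    state = ca_step(state)
```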
Gas lattices (Boltzmann)
"continuous"-state cellular automata
source: www.coursera.org/learn/modeling-simulation-natural-processes
• Probabilistic: collisions redistribute the pdf
• Observables = moments of the pdf
source: www.youtube.com/watch?v=p67-Qiad5zc
• Incompressible Navier-Stokes
• Natural multiphysics (fluid-structure)
source: www.youtube.com/watch?v=LGcy5OyYHmI
Combination of models
Dimensionally heterogeneous models
• Some details are less relevant
• "Simplify" those:
  • 1D phenomenological model
  • 1D emulator of the detailed model
  • Reduced model
• Coupling is challenging
Technical background
Intra-, extra-, inter-polation
interpolation ≠ smoothing/approximation
intrapolation ≠ extrapolation
Regression
The Regression problem
Dataset: {(x, y)}
Hypotheses set: H
$$\hat{f} = \arg\min_{f \in H} \; \mathrm{Loss}(f[x], y; \theta)$$
Regression
The Constrained Regression problem
Dataset: {(x, y)}
Hypotheses set: H
(nonlinear) Operator: D
$$\hat{f} = \arg\min_{f \in H} \; \mathrm{Loss}(f[x], y; \theta) \quad \text{s.t.} \quad D(f; \mu) = 0, \quad g(f; \mu) = 0, \quad h(f; \mu) \le 0$$
Regression
The Regularized Regression problem
Dataset: {(x, y)}
Hypotheses set: H
(nonlinear) Operator: D
$$\hat{f} = \arg\min_{f \in H} \; \mathrm{Loss}(f[x], y; \theta) + \kappa_D \lVert D(f; \mu) \rVert + \kappa_g \lVert g(f; \mu) \rVert + \kappa_h \lVert m(h(f; \mu)) \rVert$$
Regression
The Regularized and Constrained Regression problem
Dataset: {(x, y)}
Hypotheses set: H
(nonlinear) Operator: D
$$\hat{f} = \arg\min_{f \in H} \; \mathrm{Loss}(f[x], y; \theta) + \kappa_D \lVert D(f; \mu) \rVert \quad \text{s.t.} \quad g(f; \mu) = 0, \quad h(f; \mu) \le 0$$
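As a toy numerical sketch of this last variant: a cubic-polynomial hypothesis set, a curvature penalty standing in for D, and hypothetical constraints g (f(0) = 0) and h (f(1) ≤ 1). All of these choices, and the data, are illustrative assumptions, not from the talk.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)
y = np.sin(2 * x) + 0.05 * rng.standard_normal(x.size)

def f(c, t):
    """Hypothesis set H: cubic polynomials with coefficients c."""
    return np.polyval(c, t)

def objective(c, kappa=1e-3):
    loss = np.sum((f(c, x) - y) ** 2)            # Loss(f[x], y; theta)
    curvature = np.polyval(np.polyder(c, 2), x)  # D(f; mu): second derivative
    return loss + kappa * np.linalg.norm(curvature)

constraints = [
    {"type": "eq",   "fun": lambda c: f(c, 0.0)},        # g(f; mu) = 0 : f(0) = 0
    {"type": "ineq", "fun": lambda c: 1.0 - f(c, 1.0)},  # h(f; mu) <= 0 : f(1) <= 1
]
result = minimize(objective, np.zeros(4), constraints=constraints, method="SLSQP")
c_hat = result.x   # coefficients of the fitted f
```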
Regularized Regression
Linear Differential Operators
Gaussian Processes
Kalman Filters
Regularized Regression (RR)
with quadratic (convex) Loss function
Given a design data set with input-output values {(t_i, y_i)} = (T, y(T)), find f(t) that approximates the data and provides good predictions for unseen t:
$$\min_f \sum_{j=1}^{N} \big(y_j - f(t_j)\big)^2 + \kappa \lVert Rf \rVert^2$$
f(t) should be close to the design data, but it should be regular according to R.
Regularized Regression (RR)
with quadratic (convex) Loss function
N-degree polynomial regression on N points with
$$\lVert Rf \rVert^2 = \int \left( \frac{d^2 f}{dt^2} \right)^2 dt$$
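A numerical sketch of this slide, under stated assumptions: the polynomial degree is chosen so the unregularized fit interpolates, the integral of the squared second derivative is evaluated by Gauss-Legendre quadrature on [0, 1], and the data and κ are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 9)
y = np.exp(-3 * t) + 0.02 * rng.standard_normal(t.size)
deg = t.size - 1                              # high enough to interpolate the points

V = np.vander(t, deg + 1, increasing=True)    # design matrix: f(t) = V @ c
tq, wq = np.polynomial.legendre.leggauss(64)  # quadrature nodes/weights on [-1, 1]
tq = 0.5 * (tq + 1)                           # map nodes to [0, 1]
wq = 0.5 * wq
j = np.arange(deg + 1)
# rows of B evaluate the 2nd derivative of each monomial basis at the nodes
B = np.where(j >= 2, j * (j - 1), 0) * tq[:, None] ** np.clip(j - 2, 0, None)
S = B.T @ (wq[:, None] * B)                   # S[j, k] = int phi_j'' phi_k'' dt

kappa = 1e-6                                  # regularization strength
c = np.linalg.solve(V.T @ V + kappa * S, V.T @ y)   # normal equations of RR
```

With κ → 0 this reproduces the (oscillatory) interpolating polynomial; increasing κ trades data fit for smoothness.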
Regularization operator¹
$$\min_f \sum_{j=1}^{N} \big(y_j - f(t_j)\big)^2 + \kappa \lVert Rf \rVert^2$$
$$J[f] = \sum_{j=1}^{N} \left( y_j - \int f(t)\, \delta(t - t_j)\, dt \right)^2 + \kappa \langle Rf(t), Rf(t) \rangle$$
Searching for the critical point of this functional leads to
$$R^{\dagger} R f(t) = \sum_{j=1}^{N} \frac{y_j - f_j}{\kappa}\, \delta(t - t_j), \qquad R^{\dagger} R\, G(t, t') = \delta(t - t')$$
¹ Tomaso Poggio and F. Girosi (1990). “Networks for approximation and learning.” In: Proceedings of the IEEE 78.9, pp. 1481–1497. issn: 00189219. doi: 10.1109/5.58326.
Regularization operator¹
$$\min_f \sum_{j=1}^{N} \big(y_j - f(t_j)\big)^2 + \kappa \lVert Rf \rVert^2$$
Solution
$$f(t) = G(t, T) \big[ G(T, T) + \kappa I \big]^{-1} \big( y(T) - n(T) \big) + n(t)$$
where G(t, t') is the Green's function of the operator R†R. If R has a Green's function g(t, t') (and R†, g†(t, t')), then
$$G(t, t') = \int g(t, u)\, g^{\dagger}(u, t')\, du$$
¹ Tomaso Poggio and F. Girosi (1990). “Networks for approximation and learning.” In: Proceedings of the IEEE 78.9, pp. 1481–1497. issn: 00189219. doi: 10.1109/5.58326.
Gaussian processes (GP): characterization
A GP is a distribution over functions defined by a mean function m(t) and a covariance function k(t, t').
A prior GP conditioned on a design data set {(t_i, y_i)} = (T, y(T)) gives a posterior GP.
mean of the posterior GP
$$f(t) = k(t, T) \big[ k(T, T) + \kappa I \big]^{-1} \big( y(T) - m(T) \big) + m(t)$$
The evaluated covariance function k(T, T) is the covariance matrix.
Comparing GP and RR
RR solution
$$f(t) = G(t, T) \big[ G(T, T) + \kappa I \big]^{-1} \big( y(T) - n(T) \big) + n(t)$$
GP predictive mean
$$f(t) = k(t, T) \big[ k(T, T) + \kappa I \big]^{-1} \big( y(T) - m(T) \big) + m(t)$$
The two expressions share the same structure: the Green's function of R†R plays the role of the covariance function, and the regularization strength κ that of the observation noise.
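As a sanity check on this shared structure, here is the GP predictive mean computed exactly as written above. The squared-exponential kernel, the zero prior mean and the value of κ are illustrative assumptions.

```python
import numpy as np

def k(a, b, ell=0.3, sigma=1.0):
    """Covariance function k(t, t'): squared exponential."""
    return sigma**2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def m(t):
    """Prior mean function m(t); zero here for simplicity."""
    return np.zeros_like(t)

T = np.linspace(0, 2, 8)      # design inputs
yT = np.sin(3 * T)            # design outputs y(T)
t = np.linspace(0, 2, 200)    # prediction inputs
kappa = 1e-4                  # noise / regularization strength

# f(t) = k(t, T) [k(T, T) + kappa I]^{-1} (y(T) - m(T)) + m(t)
alpha = np.linalg.solve(k(T, T) + kappa * np.eye(T.size), yT - m(T))
f_mean = k(t, T) @ alpha + m(t)
```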
Example: ODE
A 1st-order linear ODE with piecewise-constant input and random input, giving a distribution of functions:
$$L f(t) - u(t) = \eta(t), \qquad \eta(t) \sim \mathcal{N}\big(0, \Sigma\, \delta(t - t')\big)$$
[Figure: sample trajectories of f over time.]
Example: ODE
(non-stationary) Covariance function
[Figure: the induced covariance over time × time, and covariance features over time.]
Example: ODE
$$f(t) = k(t, t_i)\, k(t_i, t_i)^{-1} \hat{f}(t_i) + L^{-1} u(t)$$
[Figure: posterior mean, actual trajectory, and observations over time.]
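One way to see where the non-stationary covariance in this example comes from is to simulate an ensemble of a first-order linear ODE, f' + a f = u + η, driven by white noise, and estimate cov(f(t), f(t')) empirically. The coefficient a, the piecewise-constant input u and the noise level are illustrative guesses, not the values used in the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
a, sigma, dt, T = 2.0, 0.3, 0.01, 2.0
t = np.arange(0.0, T, dt)
u = np.where(t < 1.0, 1.0, 0.0)        # piecewise-constant input

n_paths = 2000
f = np.zeros((n_paths, t.size))
for i in range(1, t.size):             # Euler-Maruyama integration of the SDE
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    f[:, i] = f[:, i - 1] + dt * (-a * f[:, i - 1] + u[i - 1]) + noise

cov = np.cov(f, rowvar=False)          # empirical k(t, t'): non-stationary
```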
Relations between GP and RR²
² Florian Steinke and Bernhard Schölkopf (Nov. 2008). “Kernels, regularization and differential equations.” In: Pattern Recognition 41.11, pp. 3271–3286. issn: 00313203. doi: 10.1016/j.patcog.2008.06.011.
Kalman filters (KF): characterization
An iterative method (O(Tn³)) to estimate the (hidden) states of an n-state state-space dynamic model:
$$\dot{x}(t) = A\, x(t) + B\, u(t) + \nu(t), \qquad y(t_k) = H\, x(t_k) + D\, u(t_k) + \epsilon(t_k)$$
$$x \in \mathbb{R}^n, \quad y \in \mathbb{R}^m, \quad u \in \mathbb{R}^l, \qquad \nu \sim \mathcal{GP}(0, Q), \quad \epsilon \sim \mathcal{N}(0, P)$$
This is the continuous-time model, discrete-time measurements flavor of the KF. It can be extended to nonlinear (and non-Gaussian) systems: the Extended KF (model linearization) or the Unscented KF (posterior approximation).
KF: algorithm structure
State update
1. Simulate forward: predict state.
2. Propagate state error covariance.
Measurement update
1. Compute Kalman gain.
2. Update state estimate with data.
3. Update state error covariance.
[Diagram: the two phases alternate in a loop, seeded with initial values and driven by each observation.]
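A minimal discrete-time sketch of this two-phase loop (the slide's flavor is continuous-time dynamics with discrete measurements; here both are discrete for brevity). Following the slide's notation, Q is the process noise covariance and P the measurement noise covariance; C denotes the state error covariance, and all matrices are placeholders to be supplied by the caller.

```python
import numpy as np

def kalman_filter(ys, A, H, Q, P, x0, C0):
    """Return the filtered state means for observations ys (one row per step)."""
    x, C = x0, C0
    out = []
    for y in ys:
        # State update
        x = A @ x                                     # 1. simulate forward: predict state
        C = A @ C @ A.T + Q                           # 2. propagate state error covariance
        # Measurement update
        K = C @ H.T @ np.linalg.inv(H @ C @ H.T + P)  # 1. compute Kalman gain
        x = x + K @ (y - H @ x)                       # 2. update state estimate with data
        C = (np.eye(C.shape[0]) - K @ H) @ C          # 3. update state error covariance
        out.append(x)
    return np.array(out)
```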
Relations between GP and KF
... all [reviewers] mention the affinity between my results ... and the Kalman filter, ... If only the work in these fields were more readily accessible to the statistician who (like me) is not a specialist pure mathematician in terms familiar to him, much duplication could be avoided. In fact I explicitly denied any originality for these results.³
³ Anthony O’Hagan and JFC Kingman (1978). “Curve Fitting and Optimal Design for Prediction.” In: Journal of the Royal Statistical Society. Series B (Methodological) 40.1, pp. 1–42. issn: 00359246. doi: 10.2307/2984861.
Relations between GP and KF
• Evidence of GP and KF equivalence based on optimal estimation of the solution (O’Hagan, Steinke & Schölkopf, . . .)
• Stationary covariance matrices can be converted to state-space models, where the KF can be applied³, via the Wiener-Khinchin theorem
• The KF has been extended to non-Gaussian likelihoods⁴
³ Simo Särkkä, Arno Solin, and Jouni Hartikainen (July 2013). “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” In: IEEE Signal Processing Magazine 30.4, pp. 51–61. issn: 1053-5888. doi: 10.1109/MSP.2013.2246292.
⁴ Hannes Nickisch, Arno Solin, and Alexander Grigorievskiy (Feb. 2018). “State Space Gaussian Processes with Non-Gaussian Likelihood.” In: arXiv: 1802.04846. url: http://arxiv.org/abs/1802.04846.
Summary: Linear operators
• KF, GP and RR are unified (linear, stationary, convex loss function): interpretability, algorithmic versatility
• GP → KF: GP inference linear in the number of time samples
• RR → GP: non-parametric curve fitting
Conclusion: information provided as linear operators can be merged with ML!
Technical challenges: optimal algorithms, fast implementations, (sample) convergence rates, language abstractions, optimized hardware
Nonlinear operators
Here it gets sketchy!
Surrogate trajectory
Differential operators that are linear in the differential part:
$$D(x) = Lx - f(x, \theta) = 0$$
We use a surrogate φ for x:
$$D(\varphi) \simeq 0$$
How do we find the surrogate? One alternative is the fixed-point iteration
$$\varphi_{n+1} = L^{-1} f(\varphi_n, \theta), \qquad \varphi_n = x \cdot \phi$$
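A minimal sketch of this iteration for the simplest choice L = d/dt, so that L⁻¹ is integration from 0 plus the initial condition and the update becomes a Picard iteration on a grid. The logistic right-hand side and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

theta, x0 = 3.0, 0.1
t = np.linspace(0.0, 1.0, 200)

def f(phi):
    """Right-hand side f(x, theta): logistic growth."""
    return theta * phi * (1.0 - phi)

phi = np.full_like(t, x0)        # initial surrogate phi_0
for n in range(30):              # phi_{n+1} = x0 + int_0^t f(phi_n(s)) ds
    phi = x0 + cumulative_trapezoid(f(phi), t, initial=0.0)
# phi now approximates the trajectory x(t) of x' = f(x, theta), x(0) = x0
```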
Thank you!
Q&A
GP to KF
Stationary covariance matrices can be converted to state-space models, where the KF can be applied⁵, via the Wiener-Khinchin theorem:
$$\kappa(t - t') \propto \int S(\omega)\, e^{i\omega(t - t')}\, d\omega$$
⁵ Simo Särkkä, Arno Solin, and Jouni Hartikainen (July 2013). “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” In: IEEE Signal Processing Magazine 30.4, pp. 51–61. issn: 1053-5888. doi: 10.1109/MSP.2013.2246292.
Concretely
Spatio-temporal GP representation
$$f(r, t) \sim \mathcal{GP}\big(0, \kappa(r, t; r', t')\big), \qquad y_k = H_k f(r, t_k) + \epsilon_k$$
⇒ Linear stochastic partial differential equation
$$\frac{\partial f(r, t)}{\partial t} = \mathcal{F} f(r, t) + \mathcal{L} w(r, t), \qquad y_k = H_k f(r, t_k) + \epsilon_k$$
O((Tn)³) general matrix inversion vs. O(Tn³) infinite-dimensional Kalman filtering and smoothing.
1. Compute the spectral density (SD) via the Fourier transform of the covariance
2. Approximate the SD with a rational function (if necessary)
3. Find a stable transfer function (poles in the upper half complex plane)
4. Transform to state space (SS) using control theory methods
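The recipe is easiest to see on the exponential (Ornstein-Uhlenbeck) covariance κ(τ) = σ² e^{−λ|τ|}, an illustrative stand-in for the talk's examples: its spectral density S(ω) = 2σ²λ/(λ² + ω²) is already rational (steps 1-2), the stable transfer function is 1/(iω + λ) (step 3), and the state-space model is dx/dt = −λx + w(t) (step 4). The sketch below only verifies step 1 numerically; the remaining steps are stated in the comments.

```python
import numpy as np

lam, s2 = 2.0, 1.0
tau = np.linspace(-20, 20, 4001)
kappa = s2 * np.exp(-lam * np.abs(tau))            # exponential covariance

dt = tau[1] - tau[0]
S_num = np.fft.fftshift(np.abs(np.fft.fft(kappa))) * dt    # step 1: numerical FT
w = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(tau.size, d=dt))
S_ref = 2 * s2 * lam / (lam**2 + w**2)             # analytic spectral density
# S_num and S_ref agree up to discretization error.
# Steps 2-4 (symbolic): S is rational; stable factor H(iw) = 1/(iw + lam);
# state-space form: dx/dt = -lam * x + w(t), observed directly.
```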
Example: isotropic Matérn covariance function
Temporal process
$$\kappa(t - t') = \sigma^2 \lambda |t - t'|\, K_1\big(\lambda |t - t'|\big)$$
$$\frac{\partial \mathbf{f}(t)}{\partial t} = \begin{bmatrix} 0 & 1 \\ -\lambda & -2\sqrt{\lambda} \end{bmatrix} \mathbf{f}(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} w(t)$$
Spatio-temporal process
$$r = \lVert (x, t) - (x', t') \rVert, \qquad \kappa(r) = \sigma^2 \lambda r\, K_1(\lambda r)$$
$$\frac{\partial \mathbf{f}(x, t)}{\partial t} = \begin{bmatrix} 0 & 1 \\ \frac{\partial^2}{\partial x^2} - \lambda & -2\sqrt{\lambda - \frac{\partial^2}{\partial x^2}} \end{bmatrix} \mathbf{f}(x, t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} w(x, t)$$
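A rough simulation sketch of the temporal state-space model above (as reconstructed here) using Euler-Maruyama; the first state is a draw from the corresponding Matérn-type prior. The noise intensity q is left as a free illustrative scale rather than matched to σ², which would require the spectral-density bookkeeping of the previous slide.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, q, dt, T = 4.0, 1.0, 1e-3, 10.0
F = np.array([[0.0, 1.0],
              [-lam, -2.0 * np.sqrt(lam)]])   # drift matrix from the slide
L = np.array([0.0, 1.0])                      # noise enters the last state only

n = int(T / dt)
f = np.zeros((n, 2))
for i in range(1, n):                         # Euler-Maruyama step
    dw = np.sqrt(q * dt) * rng.standard_normal()
    f[i] = f[i - 1] + dt * (F @ f[i - 1]) + L * dw

sample_path = f[:, 0]                         # a draw from the (approximate) prior
```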
Surrogate trajectory
Adomian polynomials, RKHS regularization
Other methods
Volterra-Wiener kernels