A walk through the intersection between machine learning and mechanistic modeling
Data Science Seminar, EURECOM, France. May 3, 2018
Juan Pablo Carbajal juanpablo.carbajal@eawag.ch
Eawag: Swiss Federal Institute of Aquatic Science and Technology
This work is licensed under a Creative Commons “Attribution-ShareAlike 4.0 International” license.
Background
Machine Learning and Mechanistic modeling
[Venn diagram: emulation sits at the overlap between machine learning and mechanistic modeling.]
Machine Learning
• Discover patterns from data (prediction)
• Minimal knowledge of processes
Emulation
• Discover a "useful" approximation of a numerical model from data
• Partial/fuzzy knowledge of (sub)processes
Mechanistic modeling
• Interpret data/models using models/data
• Detailed knowledge of (sub)processes
Environmental science and engineering
Model calibration, Fast (extreme) event prediction, & Design
source: https://www.dailyrecord.co.uk/news/scottish-news/storm-frank-rescuers-brave-storm-7096067
Engineered complex systems
Optimization, Control, & Maintenance
source: Leaflet, CC BY-SA 3.0, commons.wikimedia.org/w/index.php?curid=5704247
Data fusion/interpretation
source: Azzimonti et al. 10.1080/01621459.2014.946036
What is emulation?
How is emulation even possible?
Alien researcher
source: Svjo, CC BY-SA 3.0, commons.wikimedia.org/w/index.php?curid=25795521
From molecules to fluids
source: doi:10.1088/0965-0393/12/6/R01
Emulation
many names & similarities: surrogate modeling, metamodeling, coarse graining, model order reduction, interpolation, extrapolation, response surface, phenomenological modeling, empirical modeling, ...
Types of simulators
Partial Differential Equations (PDE)
• Fluid dynamics
• Convection-Diffusion-Reaction systems
• Heat transfer
• Multi-physics, e.g. fluid-structure interaction
• . . .
source: Wikipedia
Ordinary Differential Equations (ODE)
• Biochemical reactions
• Interconnected tanks
• Population dynamics
• Mechanics (Lagrangian/Hamiltonian)
• . . .
Algebraic equations
• Equilibrium solutions of PDE/ODE
• Conservation laws (e.g. mass (stoichiometry), energy, . . .)
• Empirical formulas
• Data-driven models
• . . .
source: doi:10.1039/C4CY00409D
Agent-based systems
Description of individual (groups of) agents and their interactions
• Sociology
• Biology
• Epidemiology
• Network science
• Molecular dynamics
• . . .
source: www.youtube.com/watch?v=GUkjC-69vaw
Cellular automata
agents in a graph, discrete states
source: www.youtube.com/watch?v=-FaqC4h5Ftg
• Agents correspond to nodes in a grid (graph)
• Each agent carries one or more discrete internal states
• Each agent interacts with a set of nodes: its neighborhood
• A rule-set changes states according to the neighborhood (a minimal sketch follows below)
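The four ingredients above fit in a few lines. Below is a minimal sketch using a 1-D elementary cellular automaton (Wolfram rule 110); the grid size, rule and seed are illustrative choices, not taken from the talk.

```python
import numpy as np

def ca_step(state: np.ndarray, rule: int = 110) -> np.ndarray:
    """One synchronous update of an elementary CA with periodic boundaries."""
    left = np.roll(state, 1)                 # neighborhood: left neighbor
    right = np.roll(state, -1)               # neighborhood: right neighbor
    idx = 4 * left + 2 * state + right       # encode each 3-cell neighborhood as 0..7
    table = (rule >> np.arange(8)) & 1       # the rule-set as a lookup table
    return table[idx]                        # apply the rule to every agent at once

state = np.zeros(64, dtype=int)
state[32] = 1                                # a single seeded cell
for _ in range(20):
    state = ca_step(state)
```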
Gas lattices (Boltzmann)
"continuous"-state cellular automata
source: www.coursera.org/learn/modeling-simulation-natural-processes
• Probabilistic: collisions redistribute the pdf
• Observables = moments of the pdf
source: www.youtube.com/watch?v=p67-Qiad5zc
• Incompressible Navier-Stokes
• Natural multiphysics (fluid-structure)
source: www.youtube.com/watch?v=LGcy5OyYHmI
Combination of models
Dimensionally heterogeneous models
• Some details are less relevant
• "Simplify" those:
  • 1D phenomenological model
  • 1D emulator of the detailed model
  • Reduced model
• Coupling is challenging
Technical background
Intra-, extra-, inter-polation
interpolation ≠ smoothing/approximation
intrapolation ≠ extrapolation
Regression
The Regression problem
Dataset: {(x, y)}
Hypotheses set: H
$$\hat{f} = \arg\min_{f \in H} \; \mathrm{Loss}(f[x], y; \theta)$$
Regression
The Constrained Regression problem
Dataset: {(x, y)}
Hypotheses set: H
(nonlinear) Operator: D
$$\hat{f} = \arg\min_{f \in H} \; \mathrm{Loss}(f[x], y; \theta) \quad \text{s.t.} \quad D(f; \mu) = 0, \quad g(f; \mu) = 0, \quad h(f; \mu) \le 0$$
Regression
The Regularized Regression problem
Dataset: {(x, y)}
Hypotheses set: H
(nonlinear) Operator: D
$$\hat{f} = \arg\min_{f \in H} \; \mathrm{Loss}(f[x], y; \theta) + \kappa_D \lVert D(f; \mu) \rVert + \kappa_g \lVert g(f; \mu) \rVert + \kappa_h \lVert m(h(f; \mu)) \rVert$$
Regression
The Regularized and Constrained Regression problem
Dataset: {(x, y)}
Hypotheses set: H
(nonlinear) Operator: D
$$\hat{f} = \arg\min_{f \in H} \; \mathrm{Loss}(f[x], y; \theta) + \kappa_D \lVert D(f; \mu) \rVert \quad \text{s.t.} \quad g(f; \mu) = 0, \quad h(f; \mu) \le 0$$
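As a toy numerical sketch of this last variant: a cubic-polynomial hypothesis set, a curvature penalty standing in for D, and hypothetical constraints g (f(0) = 0) and h (f(1) ≤ 1). All of these choices, and the data, are illustrative assumptions, not from the talk.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)
y = np.sin(2 * x) + 0.05 * rng.standard_normal(x.size)

def f(c, t):
    """Hypothesis set H: cubic polynomials with coefficients c."""
    return np.polyval(c, t)

def objective(c, kappa=1e-3):
    loss = np.sum((f(c, x) - y) ** 2)            # Loss(f[x], y; theta)
    curvature = np.polyval(np.polyder(c, 2), x)  # D(f; mu): second derivative
    return loss + kappa * np.linalg.norm(curvature)

constraints = [
    {"type": "eq",   "fun": lambda c: f(c, 0.0)},        # g(f; mu) = 0 : f(0) = 0
    {"type": "ineq", "fun": lambda c: 1.0 - f(c, 1.0)},  # h(f; mu) <= 0 : f(1) <= 1
]
result = minimize(objective, np.zeros(4), constraints=constraints, method="SLSQP")
c_hat = result.x   # coefficients of the fitted f
```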
Regularized Regression
Linear Differential Operators
Gaussian Processes
Kalman Filters
Regularized Regression (RR)
with quadratic (convex) Loss function
Given a design data set with input-output values {(t_i, y_i)} = (T, y(T)), find f(t) that approximates the data and provides good predictions for unseen t:
$$\min_f \sum_{j=1}^{N} \big(y_j - f(t_j)\big)^2 + \kappa \lVert Rf \rVert^2$$
f(t) should be close to the design data, but it should be regular according to R.
Regularized Regression (RR)
with quadratic (convex) Loss function
N-degree polynomial regression on N points with
$$\lVert Rf \rVert^2 = \int \left( \frac{d^2 f}{dt^2} \right)^2 dt$$
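A numerical sketch of this slide, under stated assumptions: the polynomial degree is chosen so the unregularized fit interpolates, the integral of the squared second derivative is evaluated by Gauss-Legendre quadrature on [0, 1], and the data and κ are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 9)
y = np.exp(-3 * t) + 0.02 * rng.standard_normal(t.size)
deg = t.size - 1                              # high enough to interpolate the points

V = np.vander(t, deg + 1, increasing=True)    # design matrix: f(t) = V @ c
tq, wq = np.polynomial.legendre.leggauss(64)  # quadrature nodes/weights on [-1, 1]
tq = 0.5 * (tq + 1)                           # map nodes to [0, 1]
wq = 0.5 * wq
j = np.arange(deg + 1)
# rows of B evaluate the 2nd derivative of each monomial basis at the nodes
B = np.where(j >= 2, j * (j - 1), 0) * tq[:, None] ** np.clip(j - 2, 0, None)
S = B.T @ (wq[:, None] * B)                   # S[j, k] = int phi_j'' phi_k'' dt

kappa = 1e-6                                  # regularization strength
c = np.linalg.solve(V.T @ V + kappa * S, V.T @ y)   # normal equations of RR
```

With κ → 0 this reproduces the (oscillatory) interpolating polynomial; increasing κ trades data fit for smoothness.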
Regularization operator¹
$$\min_f \sum_{j=1}^{N} \big(y_j - f(t_j)\big)^2 + \kappa \lVert Rf \rVert^2$$
$$J[f] = \sum_{j=1}^{N} \left( y_j - \int f(t)\, \delta(t - t_j)\, dt \right)^2 + \kappa \langle Rf(t), Rf(t) \rangle$$
Searching for the critical point of this functional leads to
$$R^{\dagger} R f(t) = \sum_{j=1}^{N} \frac{y_j - f_j}{\kappa}\, \delta(t - t_j), \qquad R^{\dagger} R\, G(t, t') = \delta(t - t')$$
¹ Tomaso Poggio and F. Girosi (1990). “Networks for approximation and learning.” In: Proceedings of the IEEE 78.9, pp. 1481–1497. issn: 00189219. doi: 10.1109/5.58326.
Regularization operator¹
$$\min_f \sum_{j=1}^{N} \big(y_j - f(t_j)\big)^2 + \kappa \lVert Rf \rVert^2$$
Solution
$$f(t) = G(t, T) \big[ G(T, T) + \kappa I \big]^{-1} \big( y(T) - n(T) \big) + n(t)$$
where G(t, t') is the Green's function of the operator R†R. If R has a Green's function g(t, t') (and R†, g†(t, t')), then
$$G(t, t') = \int g(t, u)\, g^{\dagger}(u, t')\, du$$
¹ Tomaso Poggio and F. Girosi (1990). “Networks for approximation and learning.” In: Proceedings of the IEEE 78.9, pp. 1481–1497. issn: 00189219. doi: 10.1109/5.58326.
Gaussian processes (GP): characterization
A GP is a distribution over functions defined by a mean function m(t) and a covariance function k(t, t').
A prior GP conditioned on a design data set {(t_i, y_i)} = (T, y(T)) gives a posterior GP.
mean of the posterior GP
$$f(t) = k(t, T) \big[ k(T, T) + \kappa I \big]^{-1} \big( y(T) - m(T) \big) + m(t)$$
The evaluated covariance function k(T, T) is the covariance matrix.
Comparing GP and RR
RR solution
$$f(t) = G(t, T) \big[ G(T, T) + \kappa I \big]^{-1} \big( y(T) - n(T) \big) + n(t)$$
GP predictive mean
$$f(t) = k(t, T) \big[ k(T, T) + \kappa I \big]^{-1} \big( y(T) - m(T) \big) + m(t)$$
The two expressions share the same structure: the Green's function of R†R plays the role of the covariance function, and the regularization strength κ that of the observation noise.
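As a sanity check on this shared structure, here is the GP predictive mean computed exactly as written above. The squared-exponential kernel, the zero prior mean and the value of κ are illustrative assumptions.

```python
import numpy as np

def k(a, b, ell=0.3, sigma=1.0):
    """Covariance function k(t, t'): squared exponential."""
    return sigma**2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def m(t):
    """Prior mean function m(t); zero here for simplicity."""
    return np.zeros_like(t)

T = np.linspace(0, 2, 8)      # design inputs
yT = np.sin(3 * T)            # design outputs y(T)
t = np.linspace(0, 2, 200)    # prediction inputs
kappa = 1e-4                  # noise / regularization strength

# f(t) = k(t, T) [k(T, T) + kappa I]^{-1} (y(T) - m(T)) + m(t)
alpha = np.linalg.solve(k(T, T) + kappa * np.eye(T.size), yT - m(T))
f_mean = k(t, T) @ alpha + m(t)
```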
Example: ODE
A 1st-order linear ODE with piecewise-constant input and random input, giving a distribution of functions:
$$L f(t) - u(t) = \eta(t), \qquad \eta(t) \sim \mathcal{N}\big(0, \Sigma\, \delta(t - t')\big)$$
[Figure: sample trajectories of f over time.]
Example: ODE
(non-stationary) Covariance function
[Figure: the induced covariance over time × time, and covariance features over time.]
Example: ODE
$$f(t) = k(t, t_i)\, k(t_i, t_i)^{-1} \hat{f}(t_i) + L^{-1} u(t)$$
[Figure: posterior mean, actual trajectory, and observations over time.]
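One way to see where the non-stationary covariance in this example comes from is to simulate an ensemble of a first-order linear ODE, f' + a f = u + η, driven by white noise, and estimate cov(f(t), f(t')) empirically. The coefficient a, the piecewise-constant input u and the noise level are illustrative guesses, not the values used in the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
a, sigma, dt, T = 2.0, 0.3, 0.01, 2.0
t = np.arange(0.0, T, dt)
u = np.where(t < 1.0, 1.0, 0.0)        # piecewise-constant input

n_paths = 2000
f = np.zeros((n_paths, t.size))
for i in range(1, t.size):             # Euler-Maruyama integration of the SDE
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    f[:, i] = f[:, i - 1] + dt * (-a * f[:, i - 1] + u[i - 1]) + noise

cov = np.cov(f, rowvar=False)          # empirical k(t, t'): non-stationary
```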
Relations between GP and RR²
² Florian Steinke and Bernhard Schölkopf (Nov. 2008). “Kernels, regularization and differential equations.” In: Pattern Recognition 41.11, pp. 3271–3286. issn: 00313203. doi: 10.1016/j.patcog.2008.06.011.
Kalman filters (KF): characterization
An iterative method (O(Tn³)) to estimate the (hidden) states of an n-state state-space dynamic model:
$$\dot{x}(t) = A\, x(t) + B\, u(t) + \nu(t), \qquad y(t_k) = H\, x(t_k) + D\, u(t_k) + \epsilon(t_k)$$
$$x \in \mathbb{R}^n, \quad y \in \mathbb{R}^m, \quad u \in \mathbb{R}^l, \qquad \nu \sim \mathcal{GP}(0, Q), \quad \epsilon \sim \mathcal{N}(0, P)$$
This is the continuous-time model, discrete-time measurements flavor of the KF. It can be extended to nonlinear (and non-Gaussian) systems: the Extended KF (model linearization) or the Unscented KF (posterior approximation).
KF: algorithm structure
State update
1. Simulate forward: predict state.
2. Propagate state error covariance.
Measurement update
1. Compute Kalman gain.
2. Update state estimate with data.
3. Update state error covariance.
[Diagram: the two phases alternate in a loop, seeded with initial values and driven by each observation.]
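A minimal discrete-time sketch of this two-phase loop (the slide's flavor is continuous-time dynamics with discrete measurements; here both are discrete for brevity). Following the slide's notation, Q is the process noise covariance and P the measurement noise covariance; C denotes the state error covariance, and all matrices are placeholders to be supplied by the caller.

```python
import numpy as np

def kalman_filter(ys, A, H, Q, P, x0, C0):
    """Return the filtered state means for observations ys (one row per step)."""
    x, C = x0, C0
    out = []
    for y in ys:
        # State update
        x = A @ x                                     # 1. simulate forward: predict state
        C = A @ C @ A.T + Q                           # 2. propagate state error covariance
        # Measurement update
        K = C @ H.T @ np.linalg.inv(H @ C @ H.T + P)  # 1. compute Kalman gain
        x = x + K @ (y - H @ x)                       # 2. update state estimate with data
        C = (np.eye(C.shape[0]) - K @ H) @ C          # 3. update state error covariance
        out.append(x)
    return np.array(out)
```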
Relations between GP and KF
... all [reviewers] mention the affinity between my results ... and the Kalman filter, ... If only the work in these fields were more readily accessible to the statistician who (like me) is not a specialist pure mathematician in terms familiar to him, much duplication could be avoided. In fact I explicitly denied any originality for these results.³
³ Anthony O’Hagan and JFC Kingman (1978). “Curve Fitting and Optimal Design for Prediction.” In: Journal of the Royal Statistical Society. Series B (Methodological) 40.1, pp. 1–42. issn: 00359246. doi: 10.2307/2984861.
Relations between GP and KF
• Evidence of GP and KF equivalence based on optimal estimation of the solution (O’Hagan, Steinke & Schölkopf, . . .)
• Stationary covariance matrices can be converted to state-space models, where the KF can be applied³, via the Wiener-Khinchin theorem
• The KF has been extended to non-Gaussian likelihoods⁴
³ Simo Särkkä, Arno Solin, and Jouni Hartikainen (July 2013). “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” In: IEEE Signal Processing Magazine 30.4, pp. 51–61. issn: 1053-5888. doi: 10.1109/MSP.2013.2246292.
⁴ Hannes Nickisch, Arno Solin, and Alexander Grigorievskiy (Feb. 2018). “State Space Gaussian Processes with Non-Gaussian Likelihood.” In: arXiv: 1802.04846. url: http://arxiv.org/abs/1802.04846.
Summary: Linear operators
• KF, GP and RR are unified (linear, stationary, convex loss function): interpretability, algorithmic versatility
• GP → KF: GP inference linear in the number of time samples
• RR → GP: non-parametric curve fitting
Conclusion: information provided as linear operators can be merged with ML!
Technical challenges: optimal algorithms, fast implementations, (sample) convergence rates, language abstractions, optimized hardware
Nonlinear operators
Here it gets sketchy!
Surrogate trajectory
Differential operators that are linear in the differential part:
$$D(x) = Lx - f(x, \theta) = 0$$
We use a surrogate φ for x:
$$D(\varphi) \simeq 0$$
How do we find the surrogate? One alternative is the fixed-point iteration
$$\varphi_{n+1} = L^{-1} f(\varphi_n, \theta), \qquad \varphi_n = x \cdot \phi$$
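A minimal sketch of this iteration for the simplest choice L = d/dt, so that L⁻¹ is integration from 0 plus the initial condition and the update becomes a Picard iteration on a grid. The logistic right-hand side and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

theta, x0 = 3.0, 0.1
t = np.linspace(0.0, 1.0, 200)

def f(phi):
    """Right-hand side f(x, theta): logistic growth."""
    return theta * phi * (1.0 - phi)

phi = np.full_like(t, x0)        # initial surrogate phi_0
for n in range(30):              # phi_{n+1} = x0 + int_0^t f(phi_n(s)) ds
    phi = x0 + cumulative_trapezoid(f(phi), t, initial=0.0)
# phi now approximates the trajectory x(t) of x' = f(x, theta), x(0) = x0
```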
Thank you!
Q&A
GP to KF
Stationary covariance matrices can be converted to state-space models, where the KF can be applied⁵, via the Wiener-Khinchin theorem:
$$\kappa(t - t') \propto \int S(\omega)\, e^{i\omega(t - t')}\, d\omega$$
⁵ Simo Särkkä, Arno Solin, and Jouni Hartikainen (July 2013). “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” In: IEEE Signal Processing Magazine 30.4, pp. 51–61. issn: 1053-5888. doi: 10.1109/MSP.2013.2246292.
Concretely
Spatio-temporal GP representation
$$f(r, t) \sim \mathcal{GP}\big(0, \kappa(r, t; r', t')\big), \qquad y_k = H_k f(r, t_k) + \epsilon_k$$
⇒ Linear stochastic partial differential equation
$$\frac{\partial f(r, t)}{\partial t} = \mathcal{F} f(r, t) + \mathcal{L} w(r, t), \qquad y_k = H_k f(r, t_k) + \epsilon_k$$
O((Tn)³) general matrix inversion vs. O(Tn³) infinite-dimensional Kalman filtering and smoothing.
1. Compute the spectral density (SD) via the Fourier transform of the covariance
2. Approximate the SD with a rational function (if necessary)
3. Find a stable transfer function (poles in the upper half complex plane)
4. Transform to state space (SS) using control theory methods
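The recipe is easiest to see on the exponential (Ornstein-Uhlenbeck) covariance κ(τ) = σ² e^{−λ|τ|}, an illustrative stand-in for the talk's examples: its spectral density S(ω) = 2σ²λ/(λ² + ω²) is already rational (steps 1-2), the stable transfer function is 1/(iω + λ) (step 3), and the state-space model is dx/dt = −λx + w(t) (step 4). The sketch below only verifies step 1 numerically; the remaining steps are stated in the comments.

```python
import numpy as np

lam, s2 = 2.0, 1.0
tau = np.linspace(-20, 20, 4001)
kappa = s2 * np.exp(-lam * np.abs(tau))            # exponential covariance

dt = tau[1] - tau[0]
S_num = np.fft.fftshift(np.abs(np.fft.fft(kappa))) * dt    # step 1: numerical FT
w = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(tau.size, d=dt))
S_ref = 2 * s2 * lam / (lam**2 + w**2)             # analytic spectral density
# S_num and S_ref agree up to discretization error.
# Steps 2-4 (symbolic): S is rational; stable factor H(iw) = 1/(iw + lam);
# state-space form: dx/dt = -lam * x + w(t), observed directly.
```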
Example: isotropic Matérn covariance function
Temporal process
$$\kappa(t - t') = \sigma^2 \lambda |t - t'|\, K_1\big(\lambda |t - t'|\big)$$
$$\frac{\partial \mathbf{f}(t)}{\partial t} = \begin{bmatrix} 0 & 1 \\ -\lambda & -2\sqrt{\lambda} \end{bmatrix} \mathbf{f}(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} w(t)$$
Spatio-temporal process
$$r = \lVert (x, t) - (x', t') \rVert, \qquad \kappa(r) = \sigma^2 \lambda r\, K_1(\lambda r)$$
$$\frac{\partial \mathbf{f}(x, t)}{\partial t} = \begin{bmatrix} 0 & 1 \\ \frac{\partial^2}{\partial x^2} - \lambda & -2\sqrt{\lambda - \frac{\partial^2}{\partial x^2}} \end{bmatrix} \mathbf{f}(x, t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} w(x, t)$$
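A rough simulation sketch of the temporal state-space model above (as reconstructed here) using Euler-Maruyama; the first state is a draw from the corresponding Matérn-type prior. The noise intensity q is left as a free illustrative scale rather than matched to σ², which would require the spectral-density bookkeeping of the previous slide.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, q, dt, T = 4.0, 1.0, 1e-3, 10.0
F = np.array([[0.0, 1.0],
              [-lam, -2.0 * np.sqrt(lam)]])   # drift matrix from the slide
L = np.array([0.0, 1.0])                      # noise enters the last state only

n = int(T / dt)
f = np.zeros((n, 2))
for i in range(1, n):                         # Euler-Maruyama step
    dw = np.sqrt(q * dt) * rng.standard_normal()
    f[i] = f[i - 1] + dt * (F @ f[i - 1]) + L * dw

sample_path = f[:, 0]                         # a draw from the (approximate) prior
```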
Surrogate trajectory
Adomian polynomials, RKHS regularization
Other methods
Volterra-Wiener kernels