Model Calibration with
Neural Networks
Andres Hernandez
Motivation
The point of this talk is to present a method that performs the calibration significantly faster regardless of the model, thereby removing calibration speed as a constraint on a model's practicality.
As an added benefit, though not addressed here, neural networks are fully differentiable and could therefore provide sensitivities of the model parameters to market prices, indicating when a model should be recalibrated.
While examples of calibrating a Hull-White model are used, they are not intended to showcase best practice in calibrating it or in selecting the market instruments.
Table of contents
1 Background
   Calibration Problem
   Example: Hull-White
   Neural Networks
2 Supervised Training Approach
   Training
   Neural Network Topology
   Results
   Generating Training Set
3 Unsupervised Training Approach
   Reinforcement Learning
   Neural networks training other neural networks
Background
Definition
Model calibration is the process by which model parameters are adjusted to 'best' describe/fit known observations. For a given model M, an instrument's theoretical quote is obtained as

Q(τ) = M(θ; τ, ϕ),

where θ represents the model parameters, τ represents the identifying properties of the particular instrument, e.g. maturity, day-count convention, etc., and ϕ represents other exogenous factors used for pricing, e.g. the interest rate curve.
Definition
The calibration problem consists then in finding the parameters θ which best match a set of quotes

θ = arg min_{θ*∈S⊆Rⁿ} Cost(θ*, {Q̂}; {τ}, ϕ) = Θ({Q̂}; {τ}, ϕ),

where {τ} is the set of instrument properties and {Q̂} is the set of relevant market quotes

{Q̂} = {Q̂_i | i = 1 … N},  {τ} = {τ_i | i = 1 … N}

The cost can vary, but is usually some sort of weighted average of all the errors

Cost(θ*, {Q̂}; {τ}, ϕ) = ∑_{i=1}^{N} w_i (Q(τ_i) − Q̂(τ_i))²
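To make the objective concrete, here is a minimal Python sketch of such a weighted least-squares calibration; the pricer model_quote, the quotes, and the weights are toy stand-ins for M and the market data, not the talk's actual instruments.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy stand-in for the pricer M(theta; tau, phi); not the talk's actual model
def model_quote(theta, taus):
    a, s = theta
    return s * np.exp(-a * taus)

taus = np.linspace(1.0, 10.0, 12)                     # instrument properties {tau}
market_quotes = model_quote([0.10, 0.02], taus) * (1 + 0.01 * rng.standard_normal(12))
weights = np.ones_like(taus)                          # w_i

# Cost(theta*, {Q_hat}; {tau}) = sum_i w_i (Q(tau_i) - Q_hat(tau_i))^2
def cost(theta):
    return np.sum(weights * (model_quote(theta, taus) - market_quotes) ** 2)

# Theta({Q_hat}; {tau}), realised here by a local optimizer from a default starting point
result = minimize(cost, x0=[0.05, 0.01], method='L-BFGS-B',
                  bounds=[(1e-6, 1.0), (1e-6, 1.0)])
print(result.x)                                       # calibrated parameters theta
```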
Definition
The calibration problem can be seen as a function with N inputs and n outputs

Θ : Rᴺ → Rⁿ

It need not be everywhere smooth, and may in fact contain a few discontinuities, either in the function itself or in its derivatives, but in general it is expected to be continuous and smooth almost everywhere. As N can often be quite large, this presents a good use case for a neural network.
Hull-White Model
As examples, the single-factor Hull-White model and two-factor
model calibrated to 156 GBP ATM swaptions will be used
1-factor: dr_t = (θ(t) − α r_t) dt + σ dW_t
2-factor: dr_t = (θ(t) + u_t − α r_t) dt + σ₁ dW¹_t,   du_t = −b u_t dt + σ₂ dW²_t

with dW¹_t dW²_t = ρ dt. All parameters α, σ, σ₁, σ₂, and b are positive and shared across all option maturities; ρ ∈ [−1, 1]. θ(t) is picked to replicate the current yield curve y(t).
The related calibration problems are then

(α, σ) = Θ_1F({Q̂}; {τ}, y(t))
(α, σ₁, σ₂, b, ρ) = Θ_2F({Q̂}; {τ}, y(t))
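For concreteness, a minimal QuantLib-Python sketch of a 1-factor calibration in the spirit of the accompanying repository is shown below; the flat curve, the single 5y-into-5y swaption, and the quoted volatility are placeholders rather than the 156-instrument GBP setup used in the talk.

```python
import QuantLib as ql

today = ql.Date(1, ql.July, 2015)
ql.Settings.instance().evaluationDate = today

# Flat curve as a stand-in for the bootstrapped GBP curve y(t) used in the talk
curve = ql.YieldTermStructureHandle(ql.FlatForward(today, 0.02, ql.Actual365Fixed()))
index = ql.GBPLibor(ql.Period('6M'), curve)

model = ql.HullWhite(curve)                     # parameters theta = (alpha, sigma)
engine = ql.JamshidianSwaptionEngine(model)

# A single illustrative ATM swaption quote; the talk calibrates to 156 of them
vol = ql.QuoteHandle(ql.SimpleQuote(0.30))
helper = ql.SwaptionHelper(ql.Period('5Y'), ql.Period('5Y'), vol, index,
                           ql.Period('1Y'), ql.Thirty360(ql.Thirty360.BondBasis),
                           ql.Actual365Fixed(), curve)
helper.setPricingEngine(engine)

# Theta_1F: least-squares fit of (alpha, sigma) to the quoted volatilities
model.calibrate([helper], ql.LevenbergMarquardt(),
                ql.EndCriteria(1000, 100, 1e-8, 1e-8, 1e-8))
print(model.params())                           # calibrated (alpha, sigma)
```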
Artificial neural networks
Artificial neural networks are a family of machine learning techniques, which are currently used in state-of-the-art solutions for image and speech recognition, and natural language processing.
In general, artificial neural networks are an extension of regression, e.g. linear aX + b, quadratic aX² + bX + c, or logistic 1/(1 + exp(−a(X − b))).
Neural Networks
In neural networks, independent regression units are stacked together
in layers, with layers stacked on top of each other
Supervised Training Approach
Calibration through neural networks
The calibration problem can be reduced to finding a neural network that approximates Θ. The problem is split into two parts: a training phase, which would normally be done offline, and the evaluation, which gives the model parameters for a given input.
Training phase:
1 Collect a large training set of calibrated examples
2 Propose a neural network
3 Train, validate, and test it
Calibration of a model then proceeds simply by applying the previously trained neural network to the new input.
Supervised Training
If one is provided with a set of associated input and output samples, one can 'train' the neural network to best reproduce the desired output given the known inputs.
The most common training methods are variations of gradient descent, which consists of calculating the gradient and moving along the opposite direction. At each iteration, the current position x_m is updated as

x_{m+1} = x_m − γ∇F(x_m),

with γ called the learning rate. What is used in practice is a form of stochastic gradient descent, where the parameters are not updated after calculating the gradient over all samples, but only over a small random subsample.
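As a minimal illustration, here is a mini-batch stochastic gradient descent loop on a toy least-squares problem; the data, learning rate, and batch size are arbitrary choices, not the talk's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                     # toy inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

w = np.zeros(3)                                    # parameters x_m
gamma, batch = 0.05, 32                            # learning rate and mini-batch size

for epoch in range(100):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)   # gradient of F on the mini-batch
        w = w - gamma * grad                              # x_{m+1} = x_m - gamma * grad F(x_m)

print(w)                                           # close to the true coefficients
```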
Feed-forward neural network for 2-factor HW
Input: 156 ATM swaption volatilities (SWO) and 44 interest-rate curve points (IR), concatenated into p (200 × 1).
Hidden layer 1: a₁ = elu(W₁ · BN(p) + b₁), with W₁ of size 64 × 200 and batch normalisation (BN) followed by dropout (DO).
Hidden layers 2-9 (×8, residual): aᵢ = aᵢ₋₁ + elu(Wᵢ · BN(aᵢ₋₁) + bᵢ), each 64 units wide with batch normalisation and dropout.
Output layer: a₁₀ = W₁₀ · a₉ + b₁₀, with W₁₀ of size 5 × 64, giving the five 2-factor Hull-White parameters.
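A rough Keras sketch of this topology follows, as one reasonable reading of the diagram above; the dropout rate, optimizer, and loss are assumptions not stated on the slide.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, units=64, rate=0.2):
    # a_i = a_{i-1} + elu(W_i · BN(a_{i-1}) + b_i), with dropout on the branch
    h = layers.BatchNormalization()(x)
    h = layers.Dense(units, activation='elu')(h)
    h = layers.Dropout(rate)(h)
    return layers.Add()([x, h])

inputs = layers.Input(shape=(200,))              # 156 swaption vols + 44 curve points
h = layers.BatchNormalization()(inputs)          # BN(p)
h = layers.Dense(64, activation='elu')(h)        # a_1 = elu(W_1 · BN(p) + b_1)
h = layers.Dropout(0.2)(h)
for _ in range(8):                               # 8 residual hidden layers
    h = residual_block(h)
outputs = layers.Dense(5)(h)                     # a_10: the five 2-factor HW parameters

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')      # supervised regression onto parameters
model.summary()
```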
Hull-White 1-Factor: train from 01-2013 to 06-2014
Sample set created from historical examples from January 2013 to
June 2014
[Figure: average volatility error from 01-2013 to 01-2016 (in sample up to 06-2014, out of sample afterwards) for the default starting point, the historical starting point, and the feed-forward neural net.]
Hull-White 1-Factor: train from 01-2013 to 06-2015
[Figure: average volatility error from 01-2013 to 01-2016 (in sample up to 06-2015, out of sample afterwards) for the default starting point, the historical starting point, and the feed-forward neural net.]
Cost Function on 01-07-2015
The historical point lies in the trough; the default starting point (α = 0.1, σ = 0.01) starts up on the side.
Hull-White 2-Factor
Comparison of local optimizer against global optimizer
[Figure: average volatility error from 01-2013 to 01-2016 for the local optimizer and the global optimizer.]
Hull-White 2-Factor - Global vs local optimizer
[Figure: cost-function surface (colour scale roughly 1.0-2.2).]
The figure shows the plane defined by the global minimum, the local minimum, and the default starting point.
Hull-White 2-Factor - retrained every 2 months
To train, a 1-year rolling window is used.
[Figure: average volatility error from 01-2013 to 01-2016 (in sample / out of sample) for simulated annealing and the neural network.]
Generating Training Set
The large training set has not yet been discussed. Taking all historical values and calibrating them could be a possibility. However, the inverse of Θ is known: it is simply the regular valuation of the instruments under a given set of parameters

{Q} = Θ⁻¹(α, σ; {τ}, y(t))

This means that we can generate new examples by simply generating random parameters α and σ. There are some complications, e.g. examples of y(t) also need to be generated, and the parameters and y(t) need to be correlated properly for the examples to be meaningful.
Generating Training Set
The intention is to collect historical examples, infer some kind of statistical model from them, and then draw from that distribution (a code sketch of the full procedure follows the list on the next slide).
1 Calibrate the model over the training history
2 Obtain errors for each instrument for each day
3 As the parameters are positive, take the logarithm of the historical values
4 Rescale yield curves, parameters, and errors to have zero mean and variance 1
5 Apply dimensional reduction via PCA to the yield curve, and keep enough components for a given explained variance (e.g. 99.5%)
Generating Training Set - From normal distribution
6 Calculate covariance of rescaled log-parameters, PCA yield
curve values, and errors
7 Generate random normally distributed vectors consistent with
given covariance
8 Apply inverse transformations: rescale to original mean,
variance, and dimensionality, and take exponential of
parameters
9 Select reference date randomly
10 Obtain implied volatility for all swaptions, and apply random
errors
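Below is a condensed numpy/scikit-learn sketch of steps 3-8; the historical arrays are synthetic placeholders standing in for the calibrated history, and the sample count is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical historical arrays, one row per business day:
# params_hist (calibrated alpha, sigma), curves_hist (yield-curve nodes),
# errors_hist (per-instrument calibration errors)
params_hist = np.abs(rng.normal([0.1, 0.01], 0.02, size=(300, 2)))
curves_hist = np.cumsum(rng.normal(0.0, 0.001, size=(300, 44)), axis=1) + 0.02
errors_hist = rng.normal(0.0, 0.001, size=(300, 156))

log_params = np.log(params_hist)                      # step 3: parameters are positive
pca = PCA(n_components=0.995).fit(curves_hist)        # step 5: keep 99.5% explained variance
curve_pcs = pca.transform(curves_hist)

features = np.hstack([log_params, curve_pcs, errors_hist])
mean, std = features.mean(0), features.std(0)
z = (features - mean) / std                           # step 4: zero mean, unit variance

cov = np.cov(z, rowvar=False)                         # step 6: joint covariance
draws = rng.multivariate_normal(np.zeros(z.shape[1]), cov, size=10_000)   # step 7

x = draws * std + mean                                # step 8: invert the transformations
new_params = np.exp(x[:, :2])
new_curves = pca.inverse_transform(x[:, 2:2 + curve_pcs.shape[1]])
new_errors = x[:, 2 + curve_pcs.shape[1]:]
# steps 9-10: pick reference dates, reprice the swaptions under (new_params, new_curves),
# and perturb the resulting implied vols with new_errors to form training examples
```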
Generating Training Set - Variational autoencoder
Variational autoencoders learn a latent variable model that parametrizes a probability distribution of the output contingent on the input.
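As an illustration of the idea, here is a compact Keras sketch of a variational autoencoder on this kind of data; the latent dimension, layer sizes, and input dimension are assumptions, not the configuration used in the talk.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

latent_dim, data_dim = 10, 46        # assumed sizes (e.g. curve nodes + parameters)

class Sampling(layers.Layer):
    """Draw z ~ q(z|x) and add the KL term against the N(0, I) prior to the loss."""
    def call(self, inputs):
        z_mean, z_logvar = inputs
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1.0 + z_logvar - tf.square(z_mean) - tf.exp(z_logvar), axis=1))
        self.add_loss(kl)
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_logvar) * eps

# Encoder: observation -> parameters of a Gaussian over the latent code
enc_in = layers.Input(shape=(data_dim,))
h = layers.Dense(64, activation='elu')(enc_in)
z_mean = layers.Dense(latent_dim)(h)
z_logvar = layers.Dense(latent_dim)(h)
z = Sampling()([z_mean, z_logvar])

# Decoder: latent code -> distribution over the output (here just its mean)
dec_hidden = layers.Dense(64, activation='elu')
dec_out = layers.Dense(data_dim)
vae = tf.keras.Model(enc_in, dec_out(dec_hidden(z)))
vae.compile(optimizer='adam', loss='mse')   # reconstruction term; KL added by Sampling
# vae.fit(x_hist, x_hist, epochs=100, batch_size=64)   # x_hist: historical examples

# New synthetic examples are then generated by decoding draws from the prior
z_in = layers.Input(shape=(latent_dim,))
decoder = tf.keras.Model(z_in, dec_out(dec_hidden(z_in)))
samples = decoder.predict(np.random.normal(size=(5, latent_dim)))
```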
Normal distribution vs variational autoencoder (no
retraining)
[Figure: average volatility error from 01-2013 to 01-2016 (in sample / out of sample) for the global optimizer, the FNN trained on normally distributed samples, and the FNN trained on VAE samples.]
Unsupervised Training Approach
Bespoke optimizer
But what about the case where one doesn't have a long time series? Reinforcement learning can be used to create better bespoke optimizers than the traditional local or global optimization procedures.
Deep Q-learning
A common approach for reinforcement learning with a large space of possible actions and states is Q-learning:
An agent's behaviour is defined by a policy π, which maps states to a probability distribution over the actions, π : S → P(A).
The return R_t from an action is defined as the sum of discounted future rewards, R_t = ∑_{i=t}^{T} γ^{i−t} r(s_i, a_i).
The quality of an action is the expected return of taking action a_t in state s_t

Q^π(a_t, s_t) = E_{r_{i≥t}, s_{i>t}, a_{i>t}}[R_t | s_t, a_t]
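The talk uses a deep variant (a network in place of the table), but the update itself is easiest to see in a tabular toy example; the chain environment and hyperparameters below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, lr = 5, 2, 0.9, 0.1
Q = np.zeros((n_states, n_actions))              # Q(s, a): estimate of the expected return

def step(s, a):
    # Toy chain MDP: action 1 moves right, action 0 moves left; reward in the last state
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1)

for episode in range(500):
    s = 0
    for t in range(20):
        a = int(rng.integers(n_actions))         # random exploratory behaviour policy
        s_next, r = step(s, a)
        # Q-learning update towards the one-step target r + gamma * max_a' Q(s', a')
        Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))                      # greedy policy: should prefer moving right
```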
Learning to learn without gradient descent with
gradient descent
A long short-term memory (LSTM) architecture was used to represent the whole agent. The standard LSTM block is composed of several gates with an internal state. In the current case, 100 LSTM blocks were used per layer, and 3 layers were stacked on top of each other.
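For orientation only, a rough Keras sketch of such a stacked-LSTM optimizer network (3 layers of 100 units) is given below; the input encoding (previous query point plus observed cost), the horizon, and the output head are assumptions about how such an agent might be wired, not the talk's exact setup.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_params = 5                                   # e.g. the 2-factor Hull-White parameters
steps = 30                                     # optimisation horizon (unrolled length)

# At each step the input is the last proposed point and the cost observed there
inputs = layers.Input(shape=(steps, n_params + 1))
h = inputs
for _ in range(3):                             # 3 stacked layers of 100 LSTM blocks
    h = layers.LSTM(100, return_sequences=True)(h)
proposals = layers.TimeDistributed(layers.Dense(n_params))(h)   # next query point per step

optimizer_net = tf.keras.Model(inputs, proposals)
optimizer_net.summary()
# Training would unroll this network over a distribution of calibration problems and
# minimise the (discounted) costs observed along the proposed trajectory.
```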
Unrolled recurrent network
Train the optimizer
Train it with an approximation of F(x) whose gradient is available
   Advantage: training proceeds fast
   Disadvantage: potentially will not reach its full potential
Train it with a non-gradient-based optimizer
   Local optimizer: generally requires a number of evaluations proportional to the number of dimensions to take the next step
   Global optimizer: very hard to set hyperparameters
Train a second NN to train the first NN
Bespoke optimizer
[Figure: average volatility error from 01-2013 to 07-2015 (in sample / out of sample) for the neural network and the global optimizer.]
References
https://github.com/Andres-Hernandez/CalibrationNN
A. Hernandez, Model calibration with neural networks, Risk,
June 2017
A. Hernandez, Model Calibration: Global Optimizer vs.
Neural Network, SSRN abstract id=2996930
Y. Chen et al., Learning to Learn without Gradient Descent by Gradient Descent, arXiv:1611.03824
Future work
Calibration of a local stochastic volatility model. Work is being undertaken in collaboration with Professors J. Teichmann from ETH Zürich and C. Cuchiero from the University of Vienna, and with W. Khosrawi-Sardroudi from the University of Freiburg.
Improvement of bespoke optimizers, in particular training with a more random environment: different currencies, constituents, etc.
Use of the bespoke optimizer as a large-dimensional PDE solver
©2017 PricewaterhouseCoopers GmbH Wirtschaftsprüfungsgesellschaft. All rights reserved. In this
document, “PwC” refers to PricewaterhouseCoopers GmbH Wirtschaftsprüfungsgesellschaft, which is a
member firm of PricewaterhouseCoopers International Limited (PwCIL). Each member firm of PwCIL is a
separate and independent legal entity.