Using machine learning results for calibration of stochastic volatility models.
Blanka Horvath
Department of Mathematics, King's College London
7CCMFM18 Machine Learning Lecture, 9th March 2020
Original paper available on SSRN: 3322085
Contents of this lecture
How to calibrate stochastic volatility models (including rough volatility models) using Neural Networks
Hands-on example with Python code
We address the issue of black-box functions in finance and out-of-sample performance
Deep Learning as an approximation, not as a (data-driven) model in itself
Typical problems in mathematical finance:
Curse of dimensionality
Monte Carlo pricing too slow for calibration; XVA adds even more dimensions
What ML could do for us:
Speed up pricing
Allow us to calibrate any stochastic volatility model as fast as SABR (especially rough volatility models)
Review of Model Calibration
$\mathcal{M} := \{\mathcal{M}(\theta)\}_{\theta \in \Theta}$ a financial model with parameters $\theta$ in the set $\Theta \subset \mathbb{R}^n$, $n \in \mathbb{N}$
$P : \mathcal{M}(\theta, \zeta) \to X \subset \mathbb{R}^p$ the pricing map, where $\zeta$ is the payoff
$$\hat{\theta} = \operatorname*{argmin}_{\theta \in \Theta} \mathcal{L}\big(P(\mathcal{M}(\theta), \zeta),\, P^{MKT}(\zeta)\big) \qquad (1)$$
$\hat{\theta}$ is the parameter combination that brings the model closest to the market
In practice: the true price $P(\cdot)$ is unknown and is replaced by a numerical approximation $\widetilde{P}(\cdot)$, $\widetilde{P} \approx P$
Model Calibration in Practice
Using $\widetilde{P} \approx P$, solve
$$\hat{\theta} = \operatorname*{argmin}_{\theta \in \Theta} \mathcal{L}\big(\widetilde{P}(\mathcal{M}(\theta), \zeta),\, P^{MKT}(\zeta)\big) \qquad (2)$$
Remarks:
$\widetilde{P}$ is not exact, e.g. due to Monte Carlo error
One famous example of a fast approximation formula, the Hagan et al. SABR expansion, has been widely used
Natural idea:
Approximate $P$ via a neural network $P^{NN}$ instead
The interpretation of the NN is the same as that of the underlying model, as long as $P \approx P^{NN}$
No black box: we can always check the output of the NN with Monte Carlo or PDE methods
No grey area: all parameter combinations can be checked by model validation
Road Map
Generate data from a model
Step 1 (NN training): train a neural network to learn the pricing map
Step 2 (Model calibration): apply standard optimisers (e.g. in Python) to the neural network approximation to calibrate the model to market data
Final result: calibration to historical data
Now let's get our hands dirty...
A NN Perspective on Pricing and (ε-)Calibration
Let $\mathcal{M}$ denote a model (BS, Heston, SABR, Bergomi, ...)
$\Theta$ the set of all possible parameter combinations $\theta \in \Theta$ in this model
$P(\mathcal{M}(\theta))$ = true no-arbitrage price of an option for the chosen parameter combination $\theta$
For example, if $\mathcal{M}$ = Black-Scholes, then $\Theta = \{\sigma > 0\}$ and for any $\sigma > 0$: $P(\mathcal{M}(\theta)) = P(BS(\sigma))$ is given by the Black-Scholes formula
For other models, $P$ is usually approximated numerically: $\widetilde{P}(\mathcal{M}(\theta)) = P(\mathcal{M}(\theta)) + O(\varepsilon)$
A NN Perspective on Pricing and (ε-)Calibration
Given a small $\varepsilon > 0$, a neural network $f$ is said to approximate the numerical approximation $\widetilde{P}$ of the true option price $P$ up to the original $\varepsilon$ precision if, for any parameter combination $\theta \in \Theta$ of a stochastic model $\mathcal{M}(\Theta)$,
$$f(\theta) = \widetilde{P}(\mathcal{M}(\theta)) + O(\varepsilon). \qquad (3)$$
Whenever $\widetilde{P}(\mathcal{M}(\theta)) = P(\mathcal{M}(\theta)) + O(\varepsilon)$, it then follows that $f(\theta) = P(\mathcal{M}(\theta)) + O(\varepsilon)$ for any $\theta \in \Theta$.
That is, the network approximation $f$ of the true option price $P$ remains within the approximation precision of the numerical pricer $\widetilde{P}$.
In this case, calibration via the NN is as accurate as with traditional numerical methods.
Advantages of Neural Network Pricing
New horizons for accurate and consistent arbitrage-free pricing:
By learning the pricing map from model parameters to expected payoffs, we relocate the time-consuming numerical simulation and evaluation procedure into an offline pre-processing step. Even rough volatility models can then be calibrated accurately within milliseconds.
Digression: Rough Volatility
Suppose a generic Itô process framework for the stock price $(S_t)_{t \geq 0}$:
$$\frac{dS_t}{S_t} = \mu_t\, dt + \sigma_t\, dB_t, \qquad t \geq 0.$$
The phrase "rough volatility" refers to the idea that sample paths of the log volatility $\log(\sigma_t)$, $t \geq 0$, are rougher than the sample paths of Brownian motion.
Rough volatility models have been around since October 2014 (see the Rough Volatility website for a chronicle of developments)
These models have repeatedly proven superior to standard models in many areas: volatility forecasting, option pricing, close fits to the implied volatility surface, ...
Relaxing the assumption of independence of volatility increments was crucial for the superior performance of rough volatility models ⇒ but: several standard pricing methods are no longer available & naive Monte Carlo methods are slow
Calibration time has been a bottleneck for rough volatility; several advances have been made to speed up the calibration process [BLP '15, MP '17, HJM '17].
Rough Volatility
Gatheral, Jaisson and Rosenbaum (2014) suggested that volatility is rough. The slogan "volatility is rough" refers to the idea that sample paths of the log volatility $\log(\sigma_t)$, $t \geq 0$, are rougher than the sample paths of Brownian motion (in terms of Hölder regularity).
Fractional Brownian motion
A fractional Brownian motion with Hurst parameter $H \in (0,1)$ is a continuous centered Gaussian process $(B^H_t)_{t \in \mathbb{R}}$ with covariance function
$$\mathrm{Cov}(B^H_t, B^H_s) = \tfrac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right), \qquad s, t \in \mathbb{R}. \qquad (4)$$
[Figure: simulated paths of fBM with H = 0.25, plotted against time on [0, 2]]
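For intuition, here is a minimal Python sketch of how such fBM paths could be simulated via the (exact but O(n³)) Cholesky factorisation of the covariance in (4). This is an illustration, not necessarily the scheme behind the figure; all names and parameter values are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_fbm(H, T=2.0, n_steps=500, n_paths=5, seed=42):
    """Simulate fBM on [0, T] via Cholesky factorisation of the covariance in (4)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n_steps, T, n_steps)                 # exclude t = 0 (B^H_0 = 0)
    tt, ss = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (tt**(2 * H) + ss**(2 * H) - np.abs(tt - ss)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))    # small jitter for stability
    paths = (L @ rng.standard_normal((n_steps, n_paths))).T  # rows = paths
    return np.insert(t, 0, 0.0), np.hstack([np.zeros((n_paths, 1)), paths])

time, fbm_paths = simulate_fbm(H=0.25)
plt.plot(time, fbm_paths.T)
plt.xlabel("Time"); plt.title("Simulated fBM paths, H = 0.25")
plt.show()
```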
Examples of models presented in this lecture
The Rough Bergomi model
$$dX_t = -\tfrac{1}{2} V_t\, dt + \sqrt{V_t}\, dW_t, \quad \text{for } t > 0,\ X_0 = 0,$$
$$V_t = \xi_0(t)\, \mathcal{E}\left(\sqrt{2H}\,\nu \int_0^t (t - s)^{H - 1/2}\, dZ_s\right), \quad \text{for } t > 0,\ V_0 = v_0 > 0, \qquad (5)$$
where $H \in (0, 1)$ is the Hurst parameter, $\nu > 0$,
$\mathcal{E}(\cdot)$ is the (Wick) stochastic exponential,
$\xi_0(\cdot) > 0$ denotes the initial (forward) variance curve, and
$W$ and $Z$ are standard Brownian motions with correlation parameter $\rho \in [-1, 1]$.
Big challenge to calibrate in real time
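To make the dynamics (5) concrete, below is a minimal, illustrative Monte Carlo sketch using a naive left-point Riemann sum for the Volterra integral. It is not the (much more efficient) scheme behind the results later in the lecture; function names and parameter values are assumptions.

```python
import numpy as np

def simulate_rbergomi(xi0, nu, rho, H, T=1.0, n_steps=100, n_paths=20000, seed=0):
    """Crude left-point Riemann-sum discretisation of the rough Bergomi model (5)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    grid = dt * np.arange(n_steps + 1)                       # t_0 = 0, ..., t_n = T
    dZ = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    dB = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    dW = rho * dZ + np.sqrt(1.0 - rho**2) * dB               # Corr(dW, dZ) = rho

    X = np.zeros((n_paths, n_steps + 1))                     # log-price, X_0 = 0
    for k in range(n_steps):
        t_k = grid[k]
        if k == 0:
            V = np.full(n_paths, xi0)                        # V_0 = xi0 (flat forward variance)
        else:
            # Volterra term sqrt(2H) * nu * int_0^{t_k} (t_k - s)^{H - 1/2} dZ_s
            kernel = (t_k - grid[:k]) ** (H - 0.5)           # left-point weights, no singularity
            volterra = np.sqrt(2.0 * H) * nu * (dZ[:, :k] * kernel).sum(axis=1)
            # Wick exponential: the Volterra term has variance nu^2 * t_k^(2H)
            V = xi0 * np.exp(volterra - 0.5 * nu**2 * t_k ** (2.0 * H))
        X[:, k + 1] = X[:, k] - 0.5 * V * dt + np.sqrt(V) * dW[:, k]
    return np.exp(X)                                         # price paths S_t = exp(X_t), S_0 = 1

# Monte Carlo price of an at-the-money call with spot normalised to 1:
S = simulate_rbergomi(xi0=0.04, nu=1.5, rho=-0.7, H=0.1)
print(np.mean(np.maximum(S[:, -1] - 1.0, 0.0)))
```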
And a more classical example: the 1-Factor Bergomi model
$$dX_t = -\tfrac{1}{2} V_t\, dt + \sqrt{V_t}\, dW_t, \quad \text{for } t > 0,\ X_0 = 0,$$
$$V_t = \xi_0(t)\, \mathcal{E}\left(\eta \int_0^t e^{-\beta(t - s)}\, dZ_s\right), \quad \text{for } t > 0,\ V_0 = v_0 > 0, \qquad (6)$$
where $\eta > 0$, $\beta > 0$, and $W$ and $Z$ are correlated standard Brownian motions with correlation parameter $\rho \in [-1, 1]$.
One of the most popular and well-understood models for equities
Typically SABR-type approximations are used in the calibration
Step 1: Generating Data from a Model
Pick your favourite pricing scheme: finite differences, Monte Carlo, finite elements, etc.
In our example we focus on Monte Carlo
Compute implied volatilities for
strikes = {50%, 60%, 70%, 80%, 90%, 100%, 110%, 120%, 130%, 140%, 150%} (strikes as a percentage of spot)
maturities = {0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.0}
Training set: 34,000 samples; test set: 6,000 samples
Rough Bergomi parameter sampling: $(\xi_0, \nu, \rho, H) \sim \mathcal{U}[0.01, 0.16] \times \mathcal{U}[0.5, 4.0] \times \mathcal{U}[-0.95, -0.1] \times \mathcal{U}[0.025, 0.5]$ (a data-generation sketch is shown below)
Note that each sample is an 8 × 11 grid of implied volatilities
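A minimal sketch of this data-generation step, assuming the parameter bounds and grids above; `surface_from_params` is a placeholder for any pricer, e.g. a wrapper around the Monte Carlo sketch above plus an implied-volatility solver.

```python
import numpy as np

# Assumed parameter bounds from the slide: (xi0, nu, rho, H)
PARAM_BOUNDS = np.array([[0.01, 0.16], [0.5, 4.0], [-0.95, -0.1], [0.025, 0.5]])
STRIKES = np.array([0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5])
MATURITIES = np.array([0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.0])

def sample_parameters(n_samples, seed=1):
    """Draw model parameters uniformly within the bounds above."""
    rng = np.random.default_rng(seed)
    lo, hi = PARAM_BOUNDS[:, 0], PARAM_BOUNDS[:, 1]
    return lo + (hi - lo) * rng.uniform(size=(n_samples, len(lo)))

def generate_dataset(n_samples, surface_from_params):
    """Build (parameters, implied-vol grid) pairs from a user-supplied pricer
    returning an 8 x 11 grid of implied volatilities for a given theta."""
    thetas = sample_parameters(n_samples)
    surfaces = np.stack([surface_from_params(theta, MATURITIES, STRIKES) for theta in thetas])
    return thetas, surfaces.reshape(n_samples, -1)           # flatten each grid to 88 outputs
```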
Monte Carlo Error analysis
Figure 1: rBergomi MC 95% confidence intervals
These benchmark the errors in our sampling procedure
We would like the NN errors to be smaller than these
Data normalization
Standard practice in ML to normalize data
We have upper and lower bounds for the model parameters, so we map $\theta \mapsto \text{scale}(\theta) \in [-1, 1]$:
$$\text{scale}(\theta_i) = \frac{2\theta_i - (\theta_i^{\max} + \theta_i^{\min})}{\theta_i^{\max} - \theta_i^{\min}}, \qquad i = 1, \ldots, |\Theta|$$
Standardise the implied volatilities:
$$\text{scale}(\sigma^{BS}_{i,j}) = \frac{\sigma^{BS}_{i,j} - \mathbb{E}[\sigma^{BS}_{i,j}]}{\text{std}(\sigma^{BS}_{i,j})}, \qquad i \in \text{Maturities},\ j \in \text{Strikes}$$
Motivation: this simplifies the learning problem for the NN (no need to learn the magnitude of each parameter); a normalisation sketch is given below
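A minimal numpy sketch of these two transformations; names are illustrative and the bounds array is the one assumed above.

```python
import numpy as np

def scale_params(theta, bounds):
    """Map each parameter into [-1, 1] using its known lower/upper bound."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    return (2.0 * theta - (hi + lo)) / (hi - lo)

def unscale_params(theta_scaled, bounds):
    """Inverse map, needed later to recover calibrated parameters."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    return 0.5 * (theta_scaled * (hi - lo) + hi + lo)

def standardise_vols(surfaces):
    """Standardise each grid point of the implied-vol surface across the training set."""
    mean, std = surfaces.mean(axis=0), surfaces.std(axis=0)
    return (surfaces - mean) / std, mean, std
```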
Review of the Feedforward Neural Network used
Definition (Neural network)
Let $L \in \mathbb{N}$ and let the tuple $(N_1, N_2, \ldots, N_L) \in \mathbb{N}^L$ denote the number of layers (depth) and the number of nodes (neurons) on each layer respectively. Furthermore, we introduce the affine functions
$$w_l : \mathbb{R}^{N_l} \longrightarrow \mathbb{R}^{N_{l+1}}, \qquad x \mapsto A^{l+1} x + b^{l+1}, \qquad 1 \leq l \leq L - 1, \qquad (7)$$
acting between layers, for some $A^{l+1} \in \mathbb{R}^{N_{l+1} \times N_l}$. The vector $b^{l+1} \in \mathbb{R}^{N_{l+1}}$ denotes the bias term, and each entry $A^{l+1}_{(i,j)}$ denotes the weight connecting node $i \in N_l$ of layer $l$ with node $j \in N_{l+1}$ of layer $l + 1$.
For the collection of affine functions on each layer we fix the notation $w = (w_1, \ldots, w_L)$ and call the tuple $w$ the network weights. Then a neural network $F(w, \cdot) : \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$ is defined as the composition
$$F := F_L \circ \cdots \circ F_1, \qquad (8)$$
where each component is of the form $F_l := \sigma_l \circ w_l$. The function $\sigma_l : \mathbb{R} \to \mathbb{R}$ is referred to as the activation function; it is typically nonlinear and applied componentwise to the outputs of the affine function $w_l$.
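A minimal sketch of such a feedforward network in Keras, assuming the image-based setup used later in the lecture (4 model parameters in, 8 × 11 = 88 implied volatilities out, 3 hidden layers of 30 ELU neurons); the exact architecture behind the lecture's results may differ.

```python
from tensorflow import keras

def build_pricing_network(n_params=4, n_grid=88, n_hidden=3, width=30):
    """Feedforward network: model parameters -> flattened implied-vol grid."""
    model = keras.Sequential(
        [keras.layers.Dense(width, activation="elu") for _ in range(n_hidden)]
        + [keras.layers.Dense(n_grid, activation="linear")]   # linear output layer
    )
    model.build(input_shape=(None, n_params))
    return model

model = build_pricing_network()
model.summary()
```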
[Figure: diagram of the feedforward neural network]
[Figure: zoom on a single node: inputs $x_1, x_2, x_3$, weights $a_{i,1}, a_{i,2}, a_{i,3}$ and bias $b_i$ feed the activation $\sigma_{ELU}$, giving output $y = \sigma_{ELU}\big(b_i + \sum_{j=1}^{3} a_{i,j} x_j\big)$]
Convergence results for Neural Networks
Theorem (Universal approximation theorem (Hornik, Stinchcombe and White))
Let $\mathcal{NN}^{\sigma}_{d_0, d_1}$ be the set of neural networks with activation function $\sigma : \mathbb{R} \to \mathbb{R}$, input dimension $d_0 \in \mathbb{N}$ and output dimension $d_1 \in \mathbb{N}$. Then, if $\sigma$ is continuous and non-constant, $\mathcal{NN}^{\sigma}_{d_0, 1}$ is dense in $L^p(\mu)$ for all finite measures $\mu$.
Theorem (Universal approximation theorem for derivatives (Hornik, Stinchcombe and White))
Let $F^* \in C^n$ with $F^* : \mathbb{R}^{d_0} \to \mathbb{R}$, and let $\mathcal{NN}^{\sigma}_{d_0, 1}$ be the set of single-layer neural networks with activation function $\sigma : \mathbb{R} \to \mathbb{R}$, input dimension $d_0 \in \mathbb{N}$ and output dimension 1. Then, if the (non-constant) activation function satisfies $\sigma \in C^n(\mathbb{R})$, $\mathcal{NN}^{\sigma}_{d_0, 1}$ arbitrarily approximates $F^*$ and all its derivatives up to order $n$.
Note that one should use at least $C^1$ activation functions to approximate derivatives.
Training Neural Networks
Definition (Training and test sets)
Let $F^* : \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$ be a generic function. Then we say that the labelled set $X_{\text{train}} = \{x_i, F^*(x_i)\}_{i=1,\ldots,M}$ is a training set corresponding to $F^*$. A part of the data, $X_{\text{test}}$, is set aside to test the network on unseen data.
Definition (Neural network training/calibration)
Let $F^* : \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$ be a generic function that is only available through a set of input-output pairs, and let $X_{\text{train}} = \{x_i, F^*(x_i)\}_{i=1,\ldots,M}$ denote its corresponding training set. Let $F(w, \cdot) : \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$ be a neural network with $w \in \Omega$, where $\Omega$ denotes the set of parameters that characterises the network. Then training $F$ amounts to solving
$$\hat{w} = \operatorname*{argmin}_{w \in \Omega} \mathcal{L}\big(\{F(w, x_i)\}_{i=1}^{M}, \{F^*(x_i)\}_{i=1}^{M}\big) \qquad (9)$$
for a given training set $X_{\text{train}}$, where $\mathcal{L}$ is a loss function corresponding to an objective function of our choice.
Stochastic Gradient Descent
Definition (Gradient descent algorithm)
A gradient descent (GD) algorithm associated to the objective function $\mathcal{L}$ corresponds to the iterative update rule
$$w_0 \in \Omega \quad \text{and} \quad w_n = w_{n-1} - \alpha\, \nabla_w \mathcal{L}\big(\{F(w_{n-1}, x_i)\}_{i=1}^{M}, \{F^*(x_i)\}_{i=1}^{M}\big), \qquad \alpha > 0. \qquad (10)$$
The standard practice is to perform Stochastic Gradient Descent (SGD) or modifications of it such as Adam:
Faster iterations, by reducing the number of data points used to compute the gradient $\nabla_w$.
By randomly sampling from the training set, one prevents overfitting to individual data points.
Bypasses local minima through random sampling.
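A minimal training sketch with Keras and the Adam optimiser, assuming the `model` built above and placeholder arrays `x_train_scaled`, `y_train_scaled`, `x_test_scaled`, `y_test_scaled` produced by the data-generation and normalisation sketches; batch size and epoch count are illustrative.

```python
# Mini-batch training drives the stochastic gradient in (10); Adam is the SGD variant used here.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
history = model.fit(
    x_train_scaled, y_train_scaled,                 # scaled parameters -> standardised vol grids
    validation_data=(x_test_scaled, y_test_scaled),
    batch_size=32,
    epochs=200,
    verbose=0,
)
```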
Pointwise learning
Learn the map $P(\theta, T, k) = \sigma^{\mathcal{M}(\theta)}_{BS}(T, k)$ via a neural network $\widetilde{F}(\theta, T, k) := F(\hat{w}, \theta, T, k)$, where
$$F^* : \Theta \times [0, T_{\max}] \times [k_{\min}, k_{\max}] \longrightarrow \mathbb{R}, \qquad (\theta, T, k) \mapsto F^*(\theta, T, k). \qquad (11)$$
Image-based learning
Learn the map $F^*(\theta) = \{\sigma^{\mathcal{M}(\theta)}_{BS}(T_i, k_j)\}_{i=1,\ldots,n;\ j=1,\ldots,m}$ via a neural network $\widetilde{F}(\theta) := F(\hat{w}, \theta)$, where
$$F^* : \Theta \longrightarrow \mathbb{R}^{n \times m}, \qquad \theta \mapsto F^*(\theta). \qquad (12)$$
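A small illustration of the difference in input/output shapes, assuming the Keras `model` above was trained in the image-based fashion on scaled inputs; all names and values are placeholders.

```python
import numpy as np

# Pointwise learning: one network evaluation per (theta, T, k) triple -> a single vol.
x_pointwise = np.array([0.04, 1.5, -0.7, 0.1, 1.0, 0.9])     # (xi0, nu, rho, H, T, k)

# Image-based learning: one evaluation per parameter set -> the full 8 x 11 grid.
x_image = np.array([0.04, 1.5, -0.7, 0.1])                   # (xi0, nu, rho, H), to be scaled first
surface = model.predict(x_image[None, :], verbose=0).reshape(8, 11)  # rows = maturities, cols = strikes
```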
Context
Learn with the right amount of context
Figure 2: Left: Context of pixels in image-recognition problems. Right: Creating context in financial problems. Neighbouring points for different strikes help create context and reduce problem complexity.
Advantages of Image-based learning
Reduced problem complexity: we don't need to learn the strike and maturity dimensions!
In the literature, pointwise learning yields layers with 4,000 neurons, i.e. huge networks
Image-based learning works with only 30 neurons per layer
Similarly to image recognition, we exploit neighbouring points and the structure of the data
Sampling does not depend on strikes and maturities, only on model parameters
The main advantage is the structure of the data:
The implied volatility surface contains much interconnected information: ATM level, skew, curvature, etc.
It is much easier to pick this up on a surface than at a single point
Speed benchmark

                                        Flat forward variance   Piecewise constant forward variance
MC pricing 1F Bergomi, full surface           300,000 μs                300,000 μs
MC pricing rBergomi, full surface             500,000 μs                500,000 μs
NN pricing, full surface                         14.3 μs                   30.9 μs
NN gradient, full surface                          47 μs                    113 μs
Speed-up NN vs. MC                           21,000-35,000             9,000-16,000

Table 1: Computational time of the pricing map (entire implied volatility surface) and of its gradients via the neural network approximation and Monte Carlo (MC)
Indeed we achieve a remarkable speed-up!
So far...
Step 1 works well and we have $\widetilde{P} \approx P^{NN}$.
We still need to calibrate the model (Step 2):
$$\hat{\theta} := \operatorname*{argmin}_{\theta \in \Theta} \mathcal{L}\big(P^{NN}(\theta), \sigma^{MKT}\big).$$
We choose $\mathcal{L}(\cdot, \cdot)$ to be the $L^2$ error, i.e.
$$\hat{\theta} := \operatorname*{argmin}_{\theta \in \Theta} \sum_{i=1}^{n} \sum_{j=1}^{m} \big(\widetilde{F}(\theta)_{ij} - \sigma^{MKT}_{BS}(T_i, k_j)\big)^2.$$
Note that the stability of the calibrated parameters over time largely depends on the choice of $\mathcal{L}(\cdot, \cdot)$; a calibration sketch is given below.
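A minimal Step 2 sketch using scipy's bounded least-squares solver, assuming the trained `model`, the scaling helpers `unscale_params`/`PARAM_BOUNDS` from the normalisation sketch, and a standardised market surface `sigma_mkt_scaled`; this is one possible optimiser, not necessarily the one used for the lecture's results.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(theta_scaled, sigma_mkt_scaled):
    """Difference between NN-predicted and market implied vols (both standardised)."""
    pred = model.predict(theta_scaled[None, :], verbose=0).ravel()
    return pred - sigma_mkt_scaled.ravel()

theta0 = np.zeros(4)                                # start at the centre of the scaled box
res = least_squares(residuals, theta0, args=(sigma_mkt_scaled,),
                    bounds=(-1.0, 1.0))             # parameters live in [-1, 1] after scaling
theta_hat = unscale_params(res.x, PARAM_BOUNDS)     # back to (xi0, nu, rho, H)
print("Calibrated parameters:", theta_hat)
```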
Choices of optimisation algorithms

                                           Gradient-based        Gradient-free
Global solution                            Depends on problem    Always
Convergence speed                          Very fast             Slow
Smooth activation function needed          Yes*                  No
Accurate gradient approximation needed     Yes                   No

* needed to apply the universal approximation theorem for derivatives

Pricing is so fast that we can use either method!
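For the gradient-free column, one illustrative choice is scipy's differential evolution (a genetic-style global optimiser), reusing the `residuals` helper from the sketch above; again, all names are assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

def l2_objective(theta_scaled):
    """Sum of squared residuals between NN and market surface."""
    r = residuals(theta_scaled, sigma_mkt_scaled)
    return float(np.sum(r**2))

res_global = differential_evolution(l2_objective, bounds=[(-1.0, 1.0)] * 4, seed=0)
theta_hat_global = unscale_params(res_global.x, PARAM_BOUNDS)
```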
Online Calibration Times
Calibration within milliseconds
Opens the door to genetic algorithms to ensure global optima
Extension to term structures
Many financial models take curves as input, e.g. interest rate or forward variance curves
We can add the curve to the model parameter space via a parametrisation, e.g. piecewise constant (see the sketch below):
$$(\xi_0, \nu, \rho, H) \longrightarrow (\xi_0, \xi_1, \ldots, \xi_n, \nu, \rho, H)$$
You can find examples on GitHub: NN-StochVol-Calibrations
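A small sketch of what a piecewise-constant forward-variance input could look like; the knots, values and names are purely illustrative.

```python
import numpy as np

# Hypothetical piecewise-constant forward variance curve xi0(t) on the maturity buckets
# used for training.
maturity_knots = np.array([0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.0])
xi_values = np.array([0.020, 0.022, 0.025, 0.027, 0.030, 0.032, 0.033, 0.035])

def xi0(t):
    """Return the forward variance for maturity t (piecewise constant)."""
    idx = np.searchsorted(maturity_knots, t, side="left")
    return xi_values[min(idx, len(xi_values) - 1)]

# The network input then becomes (xi_0, ..., xi_n, nu, rho, H) instead of (xi_0, nu, rho, H).
theta = np.concatenate([xi_values, [1.5, -0.7, 0.1]])
```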
Further applications: Model recognition
We have seen that, after learning, calibrating many parameters is fast ⇒ approximate several models at the same time.
New learning procedure:
Train the generator on several models at the same time (here Heston and rBergomi) in Monte Carlo experiments as before, with a new mixture parameter $a$:
$$a \times \text{Heston} + (1 - a) \times \text{rBergomi}$$
Calibrate the mixture model ⇒ determine the best-fit mixture of the two models for a given data set.
Controlled experiments: train on both Bergomi and Heston ⇒ test on data generated by Heston.