Using machine learning results for calibration of stochastic volatility models.
Blanka Horvath
Department of Mathematics, King's College London
7CCMFM18 Machine Learning Lecture, 9th March 2020
Original paper available on SSRN: 3322085
Contents of this lecture
How to calibrate stochastic volatility models (including rough volatility models) using Neural Networks
Hands-on example with Python code
We address the issue of black-box functions in finance and out-of-sample performance
Deep Learning as an approximation, not as a (data-driven) model in itself
Typical problems in mathematical finance:
Curse of dimensionality
Monte Carlo pricing too slow for calibration; XVA adds even more dimensions
What ML could do for us:
Speed up pricing
Allow us to calibrate any stochastic volatility model as fast as SABR (especially rough volatility models)
Review of Model Calibration
$\mathcal{M} := \{\mathcal{M}(\theta)\}_{\theta \in \Theta}$ a financial model with parameters $\theta$ in the set $\Theta \subset \mathbb{R}^n$, $n \in \mathbb{N}$
$P : \mathcal{M}(\theta, \zeta) \to X \subset \mathbb{R}^p$ the pricing map, where $\zeta$ is the payoff
$$\hat{\theta} = \operatorname*{argmin}_{\theta \in \Theta} \mathcal{L}\big(P(\mathcal{M}(\theta), \zeta),\, P^{MKT}(\zeta)\big) \qquad (1)$$
$\hat{\theta}$ is the parameter combination that brings the model closest to the market
In practice: the true price $P(\cdot)$ is unknown and is replaced by a numerical approximation $\widetilde{P}(\cdot)$, $\widetilde{P} \approx P$
Model Calibration in Practice
Using $\widetilde{P} \approx P$, solve
$$\hat{\theta} = \operatorname*{argmin}_{\theta \in \Theta} \mathcal{L}\big(\widetilde{P}(\mathcal{M}(\theta), \zeta),\, P^{MKT}(\zeta)\big) \qquad (2)$$
Remarks:
$\widetilde{P}$ is not exact, e.g. due to Monte Carlo error
One famous example of a fast approximation formula, the Hagan et al. SABR expansion, has been widely used
Natural idea:
Approximate $P$ via a neural network $P^{NN}$ instead
The interpretation of the NN is the same as that of the underlying model, as long as $P \approx P^{NN}$
No black box: we can always check the output of the NN with Monte Carlo or PDE methods
No grey area: all parameter combinations can be checked by model validation
Road Map
Generate data from a model
Step 1 (NN training): train a neural network to learn the pricing map
Step 2 (Model calibration): apply standard optimisers (e.g. in Python) to the neural network approximation to calibrate the model to market data
Final result: calibration to historical data
Now let's get our hands dirty...
A NN Perspective on Pricing and (ε-)Calibration
Let $\mathcal{M}$ denote a model (BS, Heston, SABR, Bergomi, ...)
$\Theta$ the set of all possible parameter combinations $\theta \in \Theta$ in this model
$P(\mathcal{M}(\theta))$ = true no-arbitrage price of an option for the chosen parameter combination $\theta$
For example, if $\mathcal{M}$ = Black-Scholes, then $\Theta = \{\sigma > 0\}$ and for any $\sigma > 0$: $P(\mathcal{M}(\theta)) = P(BS(\sigma))$ is given by the Black-Scholes formula
For other models, $P$ is usually approximated numerically: $\widetilde{P}(\mathcal{M}(\theta)) = P(\mathcal{M}(\theta)) + O(\varepsilon)$
A NN Perspective on Pricing and (ε-)Calibration
Given a small $\varepsilon > 0$, a neural network $f$ is said to approximate the numerical approximation $\widetilde{P}$ of the true option price $P$ up to the original $\varepsilon$ precision if, for any parameter combination $\theta \in \Theta$ of a stochastic model $\mathcal{M}(\Theta)$,
$$f(\theta) = \widetilde{P}(\mathcal{M}(\theta)) + O(\varepsilon). \qquad (3)$$
Whenever $\widetilde{P}(\mathcal{M}(\theta)) = P(\mathcal{M}(\theta)) + O(\varepsilon)$, it then follows that $f(\theta) = P(\mathcal{M}(\theta)) + O(\varepsilon)$ for any $\theta \in \Theta$.
That is, the network approximation $f$ of the true option price $P$ remains within the approximation precision of the numerical pricer $\widetilde{P}$.
In this case, calibration via the NN is as accurate as with traditional numerical methods.
Advantages of Neural Network Pricing
New horizons for accurate and consistent arbitrage-free pricing:
By learning the pricing map from model parameters to expected payoffs, we relocate the time-consuming numerical simulation and evaluation procedure into an offline pre-processing step. Even rough volatility models can then be calibrated accurately within milliseconds.
Digression: Rough Volatility
Suppose a generic Itô process framework for the stock price $(S_t)_{t \geq 0}$:
$$\frac{dS_t}{S_t} = \mu_t\, dt + \sigma_t\, dB_t, \qquad t \geq 0.$$
The phrase "rough volatility" refers to the idea that sample paths of the log volatility $\log(\sigma_t)$, $t \geq 0$, are rougher than the sample paths of Brownian motion.
Rough volatility models have been around since October 2014 (see the Rough Volatility website for a chronicle of developments)
These models have repeatedly proven superior to standard models in many areas: volatility forecasting, option pricing, close fits to the implied volatility surface, ...
Relaxing the assumption of independence of volatility increments was crucial for the superior performance of rough volatility models ⇒ but: several standard pricing methods are no longer available & naive Monte Carlo methods are slow
Calibration time has been a bottleneck for rough volatility; several advances have been made to speed up the calibration process [BLP '15, MP '17, HJM '17].
Rough Volatility
Gatheral, Jaisson and Rosenbaum (2014) suggested that volatility is rough. The slogan "volatility is rough" refers to the idea that sample paths of the log volatility $\log(\sigma_t)$, $t \geq 0$, are rougher than the sample paths of Brownian motion (in terms of Hölder regularity).
Fractional Brownian motion
A fractional Brownian motion with Hurst parameter $H \in (0,1)$ is a continuous centered Gaussian process $(B^H_t)_{t \in \mathbb{R}}$ with covariance function
$$\mathrm{Cov}(B^H_t, B^H_s) = \tfrac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right), \qquad s, t \in \mathbb{R}. \qquad (4)$$
[Figure: simulated paths of fBM with H = 0.25, plotted against time on [0, 2]]
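For intuition, here is a minimal Python sketch of how such fBM paths could be simulated via the (exact but O(n³)) Cholesky factorisation of the covariance in (4). This is an illustration, not necessarily the scheme behind the figure; all names and parameter values are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_fbm(H, T=2.0, n_steps=500, n_paths=5, seed=42):
    """Simulate fBM on [0, T] via Cholesky factorisation of the covariance in (4)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n_steps, T, n_steps)                 # exclude t = 0 (B^H_0 = 0)
    tt, ss = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (tt**(2 * H) + ss**(2 * H) - np.abs(tt - ss)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))    # small jitter for stability
    paths = (L @ rng.standard_normal((n_steps, n_paths))).T  # rows = paths
    return np.insert(t, 0, 0.0), np.hstack([np.zeros((n_paths, 1)), paths])

time, fbm_paths = simulate_fbm(H=0.25)
plt.plot(time, fbm_paths.T)
plt.xlabel("Time"); plt.title("Simulated fBM paths, H = 0.25")
plt.show()
```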
Examples of models presented in this lecture
The Rough Bergomi model
$$dX_t = -\tfrac{1}{2} V_t\, dt + \sqrt{V_t}\, dW_t, \quad \text{for } t > 0,\ X_0 = 0,$$
$$V_t = \xi_0(t)\, \mathcal{E}\left(\sqrt{2H}\,\nu \int_0^t (t - s)^{H - 1/2}\, dZ_s\right), \quad \text{for } t > 0,\ V_0 = v_0 > 0, \qquad (5)$$
where $H \in (0, 1)$ is the Hurst parameter, $\nu > 0$,
$\mathcal{E}(\cdot)$ is the (Wick) stochastic exponential,
$\xi_0(\cdot) > 0$ denotes the initial (forward) variance curve, and
$W$ and $Z$ are standard Brownian motions with correlation parameter $\rho \in [-1, 1]$.
Big challenge to calibrate in real time
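To make the dynamics (5) concrete, below is a minimal, illustrative Monte Carlo sketch using a naive left-point Riemann sum for the Volterra integral. It is not the (much more efficient) scheme behind the results later in the lecture; function names and parameter values are assumptions.

```python
import numpy as np

def simulate_rbergomi(xi0, nu, rho, H, T=1.0, n_steps=100, n_paths=20000, seed=0):
    """Crude left-point Riemann-sum discretisation of the rough Bergomi model (5)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    grid = dt * np.arange(n_steps + 1)                       # t_0 = 0, ..., t_n = T
    dZ = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    dB = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    dW = rho * dZ + np.sqrt(1.0 - rho**2) * dB               # Corr(dW, dZ) = rho

    X = np.zeros((n_paths, n_steps + 1))                     # log-price, X_0 = 0
    for k in range(n_steps):
        t_k = grid[k]
        if k == 0:
            V = np.full(n_paths, xi0)                        # V_0 = xi0 (flat forward variance)
        else:
            # Volterra term sqrt(2H) * nu * int_0^{t_k} (t_k - s)^{H - 1/2} dZ_s
            kernel = (t_k - grid[:k]) ** (H - 0.5)           # left-point weights, no singularity
            volterra = np.sqrt(2.0 * H) * nu * (dZ[:, :k] * kernel).sum(axis=1)
            # Wick exponential: the Volterra term has variance nu^2 * t_k^(2H)
            V = xi0 * np.exp(volterra - 0.5 * nu**2 * t_k ** (2.0 * H))
        X[:, k + 1] = X[:, k] - 0.5 * V * dt + np.sqrt(V) * dW[:, k]
    return np.exp(X)                                         # price paths S_t = exp(X_t), S_0 = 1

# Monte Carlo price of an at-the-money call with spot normalised to 1:
S = simulate_rbergomi(xi0=0.04, nu=1.5, rho=-0.7, H=0.1)
print(np.mean(np.maximum(S[:, -1] - 1.0, 0.0)))
```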
And a more classical example: the 1-Factor Bergomi model
$$dX_t = -\tfrac{1}{2} V_t\, dt + \sqrt{V_t}\, dW_t, \quad \text{for } t > 0,\ X_0 = 0,$$
$$V_t = \xi_0(t)\, \mathcal{E}\left(\eta \int_0^t e^{-\beta(t - s)}\, dZ_s\right), \quad \text{for } t > 0,\ V_0 = v_0 > 0, \qquad (6)$$
where $\eta > 0$, $\beta > 0$, and $W$ and $Z$ are correlated standard Brownian motions with correlation parameter $\rho \in [-1, 1]$.
One of the most popular and well-understood models for equities
Typically SABR-type approximations are used in the calibration
Step 1: Generating Data from a Model
Pick your favourite pricing scheme: finite differences, Monte Carlo, finite elements, etc.
In our example we focus on Monte Carlo
Compute implied volatilities for
strikes = {50%, 60%, 70%, 80%, 90%, 100%, 110%, 120%, 130%, 140%, 150%} (strikes as a percentage of spot)
maturities = {0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.0}
Training set: 34,000 samples; test set: 6,000 samples
Rough Bergomi parameter sampling: $(\xi_0, \nu, \rho, H) \sim \mathcal{U}[0.01, 0.16] \times \mathcal{U}[0.5, 4.0] \times \mathcal{U}[-0.95, -0.1] \times \mathcal{U}[0.025, 0.5]$ (a data-generation sketch is shown below)
Note that each sample is an 8 × 11 grid of implied volatilities
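A minimal sketch of this data-generation step, assuming the parameter bounds and grids above; `surface_from_params` is a placeholder for any pricer, e.g. a wrapper around the Monte Carlo sketch above plus an implied-volatility solver.

```python
import numpy as np

# Assumed parameter bounds from the slide: (xi0, nu, rho, H)
PARAM_BOUNDS = np.array([[0.01, 0.16], [0.5, 4.0], [-0.95, -0.1], [0.025, 0.5]])
STRIKES = np.array([0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5])
MATURITIES = np.array([0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.0])

def sample_parameters(n_samples, seed=1):
    """Draw model parameters uniformly within the bounds above."""
    rng = np.random.default_rng(seed)
    lo, hi = PARAM_BOUNDS[:, 0], PARAM_BOUNDS[:, 1]
    return lo + (hi - lo) * rng.uniform(size=(n_samples, len(lo)))

def generate_dataset(n_samples, surface_from_params):
    """Build (parameters, implied-vol grid) pairs from a user-supplied pricer
    returning an 8 x 11 grid of implied volatilities for a given theta."""
    thetas = sample_parameters(n_samples)
    surfaces = np.stack([surface_from_params(theta, MATURITIES, STRIKES) for theta in thetas])
    return thetas, surfaces.reshape(n_samples, -1)           # flatten each grid to 88 outputs
```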
Monte Carlo Error analysis
Figure 1: rBergomi MC 95% confidence intervals
These benchmark the errors in our sampling procedure
We would like the NN errors to be smaller than these
Data normalization
Standard practice in ML to normalize data
We have upper and lower bounds for the model parameters, so we map $\theta \mapsto \text{scale}(\theta) \in [-1, 1]$:
$$\text{scale}(\theta_i) = \frac{2\theta_i - (\theta_i^{\max} + \theta_i^{\min})}{\theta_i^{\max} - \theta_i^{\min}}, \qquad i = 1, \ldots, |\Theta|$$
Standardise the implied volatilities:
$$\text{scale}(\sigma^{BS}_{i,j}) = \frac{\sigma^{BS}_{i,j} - \mathbb{E}[\sigma^{BS}_{i,j}]}{\text{std}(\sigma^{BS}_{i,j})}, \qquad i \in \text{Maturities},\ j \in \text{Strikes}$$
Motivation: this simplifies the learning problem for the NN (no need to learn the magnitude of each parameter); a normalisation sketch is given below
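A minimal numpy sketch of these two transformations; names are illustrative and the bounds array is the one assumed above.

```python
import numpy as np

def scale_params(theta, bounds):
    """Map each parameter into [-1, 1] using its known lower/upper bound."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    return (2.0 * theta - (hi + lo)) / (hi - lo)

def unscale_params(theta_scaled, bounds):
    """Inverse map, needed later to recover calibrated parameters."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    return 0.5 * (theta_scaled * (hi - lo) + hi + lo)

def standardise_vols(surfaces):
    """Standardise each grid point of the implied-vol surface across the training set."""
    mean, std = surfaces.mean(axis=0), surfaces.std(axis=0)
    return (surfaces - mean) / std, mean, std
```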
Review of the Feedforward Neural Network used
Definition (Neural network)
Let $L \in \mathbb{N}$ and let the tuple $(N_1, N_2, \ldots, N_L) \in \mathbb{N}^L$ denote the number of layers (depth) and the number of nodes (neurons) on each layer respectively. Furthermore, we introduce the affine functions
$$w_l : \mathbb{R}^{N_l} \longrightarrow \mathbb{R}^{N_{l+1}}, \qquad x \mapsto A^{l+1} x + b^{l+1}, \qquad 1 \leq l \leq L - 1, \qquad (7)$$
acting between layers, for some $A^{l+1} \in \mathbb{R}^{N_{l+1} \times N_l}$. The vector $b^{l+1} \in \mathbb{R}^{N_{l+1}}$ denotes the bias term, and each entry $A^{l+1}_{(i,j)}$ denotes the weight connecting node $i \in N_l$ of layer $l$ with node $j \in N_{l+1}$ of layer $l + 1$.
For the collection of affine functions on each layer we fix the notation $w = (w_1, \ldots, w_L)$ and call the tuple $w$ the network weights. Then a neural network $F(w, \cdot) : \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$ is defined as the composition
$$F := F_L \circ \cdots \circ F_1, \qquad (8)$$
where each component is of the form $F_l := \sigma_l \circ w_l$. The function $\sigma_l : \mathbb{R} \to \mathbb{R}$ is referred to as the activation function; it is typically nonlinear and applied componentwise to the outputs of the affine function $w_l$.
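A minimal sketch of such a feedforward network in Keras, assuming the image-based setup used later in the lecture (4 model parameters in, 8 × 11 = 88 implied volatilities out, 3 hidden layers of 30 ELU neurons); the exact architecture behind the lecture's results may differ.

```python
from tensorflow import keras

def build_pricing_network(n_params=4, n_grid=88, n_hidden=3, width=30):
    """Feedforward network: model parameters -> flattened implied-vol grid."""
    model = keras.Sequential(
        [keras.layers.Dense(width, activation="elu") for _ in range(n_hidden)]
        + [keras.layers.Dense(n_grid, activation="linear")]   # linear output layer
    )
    model.build(input_shape=(None, n_params))
    return model

model = build_pricing_network()
model.summary()
```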
[Figure: diagram of the feedforward neural network]
[Figure: zoom on a single node: inputs $x_1, x_2, x_3$, weights $a_{i,1}, a_{i,2}, a_{i,3}$ and bias $b_i$ feed the activation $\sigma_{ELU}$, giving output $y = \sigma_{ELU}\big(b_i + \sum_{j=1}^{3} a_{i,j} x_j\big)$]
Convergence results for Neural Networks
Theorem (Universal approximation theorem (Hornik, Stinchcombe and White))
Let $\mathcal{NN}^{\sigma}_{d_0, d_1}$ be the set of neural networks with activation function $\sigma : \mathbb{R} \to \mathbb{R}$, input dimension $d_0 \in \mathbb{N}$ and output dimension $d_1 \in \mathbb{N}$. Then, if $\sigma$ is continuous and non-constant, $\mathcal{NN}^{\sigma}_{d_0, 1}$ is dense in $L^p(\mu)$ for all finite measures $\mu$.
Theorem (Universal approximation theorem for derivatives (Hornik, Stinchcombe and White))
Let $F^* \in C^n$ with $F^* : \mathbb{R}^{d_0} \to \mathbb{R}$, and let $\mathcal{NN}^{\sigma}_{d_0, 1}$ be the set of single-layer neural networks with activation function $\sigma : \mathbb{R} \to \mathbb{R}$, input dimension $d_0 \in \mathbb{N}$ and output dimension 1. Then, if the (non-constant) activation function satisfies $\sigma \in C^n(\mathbb{R})$, $\mathcal{NN}^{\sigma}_{d_0, 1}$ arbitrarily approximates $F^*$ and all its derivatives up to order $n$.
Note that one should use at least $C^1$ activation functions to approximate derivatives.
Training Neural Networks
Definition (Training and test sets)
Let $F^* : \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$ be a generic function. Then we say that the labelled set $X_{\text{train}} = \{x_i, F^*(x_i)\}_{i=1,\ldots,M}$ is a training set corresponding to $F^*$. A part of the data, $X_{\text{test}}$, is set aside to test the network on unseen data.
Definition (Neural network training/calibration)
Let $F^* : \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$ be a generic function that is only available through a set of input-output pairs, and let $X_{\text{train}} = \{x_i, F^*(x_i)\}_{i=1,\ldots,M}$ denote its corresponding training set. Let $F(w, \cdot) : \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$ be a neural network with $w \in \Omega$, where $\Omega$ denotes the set of parameters that characterises the network. Then training $F$ amounts to solving
$$\hat{w} = \operatorname*{argmin}_{w \in \Omega} \mathcal{L}\big(\{F(w, x_i)\}_{i=1}^{M}, \{F^*(x_i)\}_{i=1}^{M}\big) \qquad (9)$$
for a given training set $X_{\text{train}}$, where $\mathcal{L}$ is a loss function corresponding to an objective function of our choice.
Stochastic Gradient Descent
Definition (Gradient descent algorithm)
A gradient descent (GD) algorithm associated to the objective function $\mathcal{L}$ corresponds to the iterative update rule
$$w_0 \in \Omega \quad \text{and} \quad w_n = w_{n-1} - \alpha\, \nabla_w \mathcal{L}\big(\{F(w_{n-1}, x_i)\}_{i=1}^{M}, \{F^*(x_i)\}_{i=1}^{M}\big), \qquad \alpha > 0. \qquad (10)$$
The standard practice is to perform Stochastic Gradient Descent (SGD) or modifications of it such as Adam:
Faster iterations, by reducing the number of data points used to compute the gradient $\nabla_w$.
By randomly sampling from the training set, one prevents overfitting to individual data points.
Bypasses local minima through random sampling.
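A minimal training sketch with Keras and the Adam optimiser, assuming the `model` built above and placeholder arrays `x_train_scaled`, `y_train_scaled`, `x_test_scaled`, `y_test_scaled` produced by the data-generation and normalisation sketches; batch size and epoch count are illustrative.

```python
# Mini-batch training drives the stochastic gradient in (10); Adam is the SGD variant used here.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
history = model.fit(
    x_train_scaled, y_train_scaled,                 # scaled parameters -> standardised vol grids
    validation_data=(x_test_scaled, y_test_scaled),
    batch_size=32,
    epochs=200,
    verbose=0,
)
```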
Pointwise learning
Learn the map $P(\theta, T, k) = \sigma^{\mathcal{M}(\theta)}_{BS}(T, k)$ via a neural network $\widetilde{F}(\theta, T, k) := F(\hat{w}, \theta, T, k)$, where
$$F^* : \Theta \times [0, T_{\max}] \times [k_{\min}, k_{\max}] \longrightarrow \mathbb{R}, \qquad (\theta, T, k) \mapsto F^*(\theta, T, k). \qquad (11)$$
Image-based learning
Learn the map $F^*(\theta) = \{\sigma^{\mathcal{M}(\theta)}_{BS}(T_i, k_j)\}_{i=1,\ldots,n;\ j=1,\ldots,m}$ via a neural network $\widetilde{F}(\theta) := F(\hat{w}, \theta)$, where
$$F^* : \Theta \longrightarrow \mathbb{R}^{n \times m}, \qquad \theta \mapsto F^*(\theta). \qquad (12)$$
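A small illustration of the difference in input/output shapes, assuming the Keras `model` above was trained in the image-based fashion on scaled inputs; all names and values are placeholders.

```python
import numpy as np

# Pointwise learning: one network evaluation per (theta, T, k) triple -> a single vol.
x_pointwise = np.array([0.04, 1.5, -0.7, 0.1, 1.0, 0.9])     # (xi0, nu, rho, H, T, k)

# Image-based learning: one evaluation per parameter set -> the full 8 x 11 grid.
x_image = np.array([0.04, 1.5, -0.7, 0.1])                   # (xi0, nu, rho, H), to be scaled first
surface = model.predict(x_image[None, :], verbose=0).reshape(8, 11)  # rows = maturities, cols = strikes
```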
Context
Learn with the right amount of context
Figure 2: Left: Context of pixels in image-recognition problems. Right: Creating context in financial problems. Neighbouring points for different strikes help create context and reduce problem complexity.
Advantages of Image-based learning
Reduced problem complexity: we don't need to learn the strike and maturity dimensions!
In the literature, pointwise learning yields layers with 4,000 neurons, i.e. huge networks
Image-based learning works with only 30 neurons per layer
Similarly to image recognition, we exploit neighbouring points and the structure of the data
Sampling does not depend on strikes and maturities, only on model parameters
The main advantage is the structure of the data:
The implied volatility surface contains much interconnected information: ATM level, skew, curvature, etc.
It is much easier to pick this up on a surface than at a single point
Speed benchmark

                                        Flat forward variance   Piecewise constant forward variance
MC pricing 1F Bergomi, full surface           300,000 μs                300,000 μs
MC pricing rBergomi, full surface             500,000 μs                500,000 μs
NN pricing, full surface                         14.3 μs                   30.9 μs
NN gradient, full surface                          47 μs                    113 μs
Speed-up NN vs. MC                           21,000-35,000             9,000-16,000

Table 1: Computational time of the pricing map (entire implied volatility surface) and of its gradients via the neural network approximation and Monte Carlo (MC)
Indeed we achieve a remarkable speed-up!
So far...
Step 1 works well and we have $\widetilde{P} \approx P^{NN}$.
We still need to calibrate the model (Step 2):
$$\hat{\theta} := \operatorname*{argmin}_{\theta \in \Theta} \mathcal{L}\big(P^{NN}(\theta), \sigma^{MKT}\big).$$
We choose $\mathcal{L}(\cdot, \cdot)$ to be the $L^2$ error, i.e.
$$\hat{\theta} := \operatorname*{argmin}_{\theta \in \Theta} \sum_{i=1}^{n} \sum_{j=1}^{m} \big(\widetilde{F}(\theta)_{ij} - \sigma^{MKT}_{BS}(T_i, k_j)\big)^2.$$
Note that the stability of the calibrated parameters over time largely depends on the choice of $\mathcal{L}(\cdot, \cdot)$; a calibration sketch is given below.
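A minimal Step 2 sketch using scipy's bounded least-squares solver, assuming the trained `model`, the scaling helpers `unscale_params`/`PARAM_BOUNDS` from the normalisation sketch, and a standardised market surface `sigma_mkt_scaled`; this is one possible optimiser, not necessarily the one used for the lecture's results.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(theta_scaled, sigma_mkt_scaled):
    """Difference between NN-predicted and market implied vols (both standardised)."""
    pred = model.predict(theta_scaled[None, :], verbose=0).ravel()
    return pred - sigma_mkt_scaled.ravel()

theta0 = np.zeros(4)                                # start at the centre of the scaled box
res = least_squares(residuals, theta0, args=(sigma_mkt_scaled,),
                    bounds=(-1.0, 1.0))             # parameters live in [-1, 1] after scaling
theta_hat = unscale_params(res.x, PARAM_BOUNDS)     # back to (xi0, nu, rho, H)
print("Calibrated parameters:", theta_hat)
```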
Choices of optimisation algorithms

                                           Gradient-based        Gradient-free
Global solution                            Depends on problem    Always
Convergence speed                          Very fast             Slow
Smooth activation function needed          Yes*                  No
Accurate gradient approximation needed     Yes                   No

* needed to apply the universal approximation theorem for derivatives

Pricing is so fast that we can use either method!
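For the gradient-free column, one illustrative choice is scipy's differential evolution (a genetic-style global optimiser), reusing the `residuals` helper from the sketch above; again, all names are assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

def l2_objective(theta_scaled):
    """Sum of squared residuals between NN and market surface."""
    r = residuals(theta_scaled, sigma_mkt_scaled)
    return float(np.sum(r**2))

res_global = differential_evolution(l2_objective, bounds=[(-1.0, 1.0)] * 4, seed=0)
theta_hat_global = unscale_params(res_global.x, PARAM_BOUNDS)
```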
Online Calibration Times
Calibration within milliseconds
Opens the door to genetic algorithms to ensure global optima
Extension to term structures
Many financial models take curves as input, e.g. interest rate or forward variance curves
We can add the curve to the model parameter space via a parametrisation, e.g. piecewise constant (see the sketch below):
$$(\xi_0, \nu, \rho, H) \longrightarrow (\xi_0, \xi_1, \ldots, \xi_n, \nu, \rho, H)$$
You can find examples on GitHub: NN-StochVol-Calibrations
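A small sketch of what a piecewise-constant forward-variance input could look like; the knots, values and names are purely illustrative.

```python
import numpy as np

# Hypothetical piecewise-constant forward variance curve xi0(t) on the maturity buckets
# used for training.
maturity_knots = np.array([0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.0])
xi_values = np.array([0.020, 0.022, 0.025, 0.027, 0.030, 0.032, 0.033, 0.035])

def xi0(t):
    """Return the forward variance for maturity t (piecewise constant)."""
    idx = np.searchsorted(maturity_knots, t, side="left")
    return xi_values[min(idx, len(xi_values) - 1)]

# The network input then becomes (xi_0, ..., xi_n, nu, rho, H) instead of (xi_0, nu, rho, H).
theta = np.concatenate([xi_values, [1.5, -0.7, 0.1]])
```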
Further applications: Model recognition
We have seen that, after learning, calibrating many parameters is fast ⇒ approximate several models at the same time.
New learning procedure:
Train the generator on several models at the same time (here Heston and rBergomi) in Monte Carlo experiments as before, with a new mixture parameter $a$:
$$a \times \text{Heston} + (1 - a) \times \text{rBergomi}$$
Calibrate the mixture model ⇒ determine the best-fit mixture of the two models for a given data set.
Controlled experiments: train on both Bergomi and Heston ⇒ test on data generated by Heston.