ST227: Applications of R in Life Insurance
Optimisation in MLE & Applications in Markovian Models
27/02/2021
Optimisation
▶ Often in mathematical sciences, one is concerned with the following questions: given an input domain O and a criterion function f : O → R,
▶ Is there a minimum point x ∈ O?
▶ Is this explicitly solvable? If it’s not, can we efficiently find it numerically?
▶ The nature of the optimal point: min? max? inflection? local? global?
▶ If O is a subset of Rn, then this procedure often involves:
▶ Solving for the candidate solution, which satisfies:
∂_i f = 0, i = 1, . . . , n.
▶ Verify the nature of this point by considering the positive- or negative-definiteness of the Hessian matrix:
\[
D^2 f \triangleq
\begin{pmatrix}
\partial^2_{11} f & \partial^2_{12} f & \cdots & \partial^2_{1n} f \\
\partial^2_{21} f & \partial^2_{22} f & \cdots & \partial^2_{2n} f \\
\vdots & \vdots & \ddots & \vdots \\
\partial^2_{n1} f & \partial^2_{n2} f & \cdots & \partial^2_{nn} f
\end{pmatrix}
\]
Optimisation
▶ Let us consider a toy example where f : x → x². Then:
f′(x) = 2x = 0 ⟺ x = 0,
f′′(x) = 2, so f′′(x)|x=0 = 2 > 0.
The candidate optimiser is x = 0. The second derivative test implies that it is a
local minimum.
▶ We cannot conclude that it is a global minimum without further considerations.
▶ If the second derivative test is 0, we cannot make a conclusion.
▶ Example: g(x) = x⁴, h(x) = −x⁴ and k(x) = x³ all have x = 0 as the unique candidate, yet the nature of this point is different.
Optimisation
▶ In practice, it is often extremely difficult to solve f′ = 0 analytically.
▶ Example: O = (0, ∞) and f : x → ½x² ln(x) − ¼x² − 2x. It has (exercise!) a unique global minimiser. The first-order condition is: f′(x) = x ln(x) − 2 = 0.
Can you solve it “by hand”?
▶ Potential numerical approaches:
▶ Interval Bisection
▶ Newton-Raphson algorithm
▶ You can write your own algorithm (a minimal Newton-Raphson sketch follows below), or . . .
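▶ As an illustration only (our own sketch, not part of the original notes), a minimal Newton-Raphson iteration for the first-order condition f′(x) = x ln(x) − 2 = 0 from the example above; the starting value and tolerance are arbitrary choices:
g  <- function(x){ x*log(x) - 2 }   # f'(x): the function whose root we seek
dg <- function(x){ log(x) + 1 }     # f''(x): derivative of g
x <- 2                              # initial guess (our own choice)
for(i in 1:50){
  step <- g(x)/dg(x)                # Newton-Raphson update
  x <- x - step
  if(abs(step) < 1e-10) break       # stop once the update is negligible
}
x                                   # approximate root of f'(x) = 0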
Optimisation
▶ optim is a general-purpose optimisation routine in R.
▶ General syntax: optim(par, fn), where:
▶ par: initial values of parameters;
▶ fn: a function to be minimised.
▶ Additional fine-tuning is available. We will introduce the options as we need them. Type ?optim for its documentation. Some additional arguments of interest are:
▶ method: the method of optimisation. The default is the “Nelder-Mead” method.
▶ lower and upper: bounds for the candidate solution.
▶ Available methods include: “Nelder-Mead”, “BFGS”, “CG”, “L-BFGS-B”, “SANN”, and “Brent”. Their algorithms are beyond the scope of ST227.
Optimisation | Example: 1-dimensional Optimisation
▶ Consider the toy example f : x → x². Use 1 as the initial value.
sqr <- function(x){ x^2 }
optim(par = 1, sqr)
▶ The output is a list. We are interested in the following items: $par = -8.881784e-16 is the minimiser, and $value = 7.888609e-31 is the function value at this minimiser.
▶ Be careful of the warning, though: “Warning in optim(par = 1, sqr): one-dimensional optimization by Nelder-Mead is unreliable: use "Brent" or optimize() directly”.
▶ The default method is designed for multi-dimensional optimisation. Let us use the Brent method instead.
▶ The Brent method requires the lower and upper arguments:
optim(par = 1, sqr, method = "Brent", lower = -10, upper = 10)
Optimisation | Example: A Warning Example
▶ What if a function does not have a maximum/minimum?
id <- function(x){x}
optim(par = 0, id)
optim(par = 0, id, method = "Brent", lower = -10, upper = 10)
Does this result make sense?
▶ Use numeric optimisation with reservation.
▶ Exercise: Use optim with method = "Brent" to minimise on (0, ∞) the function f : x → ½x² ln(x) − ¼x² − 2x. (One possible setup is sketched below.)
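▶ One possible setup (a sketch, not the official solution; the search bounds are our own choice and any interval containing the minimiser would do):
f <- function(x){ 0.5*x^2*log(x) - 0.25*x^2 - 2*x }
optim(par = 1, fn = f, method = "Brent", lower = 1e-6, upper = 10)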
Optimisation | Example: 2-dimensional Optimisation
▶ Consider the function f : (x, y) → x² + y², which has a global minimum at the origin.
▶ The underlying syntax of optim requires f to have a single vectorial argument, instead of two real-valued arguments. That is, instead of:
f <- function(x,y){
  x^2 + y^2
}
we will need:
f <- function(x){
  x[1]^2 + x[2]^2
}
▶ For a multi-dimensional optimisation problem, we can use the default Nelder-Mead method:
optim(par = c(1,1), f)
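▶ As a side remark (our own addition), optim can also return a numerically differentiated Hessian at the reported solution via the optional hessian = TRUE argument, which links back to the second-order test discussed earlier:
res <- optim(par = c(1,1), fn = f, hessian = TRUE)
res$hessian                  # numerical Hessian at the reported minimiser
eigen(res$hessian)$values    # all positive suggests a local minimum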
General Warnings
▶ How does one choose initial values for an optimisation algorithm? There are no definitive answers. Some approaches are:
▶ Numerical Experimentation,
▶ Qualitative insight into the problem, e.g. constrained optimisation,
▶ Inspiration from similar, but simpler, problems,
▶ Guesses.
▶ Minimisation algorithms actually search for a turning point.
▶ No guarantee on the uniqueness/nature of the turning point.
▶ Might get trapped in local optima.
▶ Definitely NOT a replacement for mathematical insights.
▶ Widespread use in Statistics: Maximum Likelihood Estimation, Least Square
Estimation (e.g. Linear Regression and GLM), Stochastic Gradient Descent (Deep Learning, Neural Network).
Maximum-Likelihood Estimation
▶ Suppose that we observe a sample (X1, . . . , Xn) which depends on a common parameter θ ∈ R^d, where Xi ∼ fi(x, θ).
▶ Questions: 1. How to estimate θ based on (X1,...,Xn)? and 2. How sure are we of this estimate?
▶ There are various approaches:
▶ Method of Moments;
▶ Least Square Estimation;
▶ Maximum Likelihood Estimation.
▶ The likelihood L(θ;X) and log-likelihood l(θ;X) are defined as follows:
\[
L(\theta; X) = \prod_{i=1}^{n} f_i(x_i, \theta), \tag{1a}
\]
\[
\ell(\theta; X) = \sum_{i=1}^{n} \log f_i(x_i, \theta). \tag{1b}
\]
▶ The maximum likelihood estimator is defined as the solution of max_θ ℓ(θ; X), which is an optimisation problem.
▶ For most distributions seen in ST102, this problem is explicitly solvable. In general,
though, we will have to solve it numerically.
Maximum Likelihood Estimation | Example: Normal Distribution
▶ A random variable X follows a N(μ, σ²) distribution if it has density:
\[
f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}.
\]
▶ Given an i.i.d. normal sample X = (X1, . . . , Xn), let us estimate μ and σ² using the numerical MLE method.
▶ Suppose n = 100. We will simulate some dummy data for this task:
X <- rnorm(n=100, mean=1.5, sd=5)
Maximum Likelihood Estimation | Example: Normal Distribution
▶ Maximising log-likelihood means minimising its negative. That is, we are minimising the function:
\[
-\sum_{i=1}^{n} \log\left( \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} \right).
\]
▶ There are at least three ways I can think of to define this sum, with varying levels of difficulty:
▶ 1: using a for loop and elementary calculations only.
▶ 2: utilising R’s vectorisation.
▶ 3: utilising R’s built-in statistical functions.
Maximum Likelihood Estimation | Example: Normal Distribution
▶ The first method is a low-level method. If you have studied a “traditional” programming course before, this is probably how you define a sum:
nLL <- function(param){
  mu <- param[1]
  sigma2 <- param[2]
  out <- 0
  for(i in 1:length(X)){
    out <- out - log((2*pi*sigma2)^(-1/2)*exp(-(X[i]-mu)^2/(2*sigma2)))
  }
  return(out)
}
▶ This displays all the fine details involved.
▶ Very reminiscent of low-level languages e.g. C.
▶ This is not recommended. Too much bookkeeping obfuscates the essential ideas.
Maximum Likelihood Estimation | Example: Normal Distribution
▶ The second method employs R’s vectorised calculations. Those of you with some Matlab experience will be familiar with this.
nLL <- function(param){
  mu <- param[1]
  sigma2 <- param[2]
  -sum(log((2*pi*sigma2)^(-1/2)*exp(-(X-mu)^2/(2*sigma2))))
}
▶ Why does this work? Due to vectorisation, the code X-mu actually returns the vector (X1 − μ, X2 − μ, . . . , Xn − μ).
▶ The code (X-mu)^2 returns ((X1 − μ)², . . . , (Xn − μ)²).
▶ Tracing along all the calculations, we see that the code log((2*pi*sigma2)^(-1/2)*exp(-(X-mu)^2/(2*sigma2))) returns the vector (log f(X1; μ, σ²), . . . , log f(Xn; μ, σ²)). Summing this gives you the log-likelihood; negating the sum gives the negative log-likelihood returned by nLL.
Maximum Likelihood Estimation | Example: Normal Distribution
▶ The third method is faster still:
nLL <- function(params){
  mu <- params[1]
  sigma2 <- params[2]
  -sum(dnorm(X, mean=mu, sd=sqrt(sigma2), log=TRUE))
}
▶ dnorm evaluates the density function of the normal distribution (the “d” stands for density).
▶ The optional argument log=TRUE returns the log of the density, which is convenient for log-likelihood calculations.
▶ This makes your code succinct, human-readable and idiomatic to the R language.
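▶ As a quick sanity check (our own addition), the dnorm-based definition should agree with the vectorised formula at any parameter value, e.g. at the true values mean = 1.5 and variance = 25 used to simulate X:
nLL(c(1.5, 25))
-sum(log((2*pi*25)^(-1/2)*exp(-(X-1.5)^2/(2*25))))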
Maximum Likelihood Estimation | Example: Normal Distribution
▶ The negative log-likelihood has been defined. Let us optimise it.
optim(par = c(1,1), nLL)
▶ Above, I just guessed the initial parameter values.
▶ Difficult optimisation problems tend to be numerically unstable. That is, if initial
parameters are not chosen carefully:
▶ the algorithm might not converge at all, or
▶ might be very slow even when it converges.
▶ We want to choose a “good” initial value. One option is to use the Method of Moment Estimator.
optim(par = c(mean(X),var(X)),nLL)
Maximum Likelihood Estimation | Example: Gamma Distribution
▶ The previous example seems trivial:
▶ The explicit MLE estimator is known.
▶ MLE and Method of Moments estimator coincide.
▶ Let’s consider a different example. Suppose that X follows a Gamma distribution with parameters (α, β), i.e.
\[
f_X(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}, \quad x > 0.
\]
▶ We create some dummy data for this task:
Y <- rgamma(100,shape=10, rate=10)
▶ Exercise: Derive the method of moments estimators for α and β. Use them as initial values and fit an MLE estimator on the data Y. (One possible solution is sketched below.)
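▶ A possible solution sketch (assuming the standard moment identities E[X] = α/β and Var(X) = α/β²; the names alpha0, beta0 and nLL_gamma are our own):
alpha0 <- mean(Y)^2/var(Y)    # method-of-moments estimate of alpha
beta0  <- mean(Y)/var(Y)      # method-of-moments estimate of beta
nLL_gamma <- function(params){
  -sum(dgamma(Y, shape = params[1], rate = params[2], log = TRUE))
}
optim(par = c(alpha0, beta0), fn = nLL_gamma)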
Maximum Likelihood Estimation | Example: Weibull Distribution (Not Examinable)
▶ Say that a variable X follows a Weibull(λ, k) distribution if it has density:
\[
f(x; \lambda, k) =
\begin{cases}
\dfrac{k}{\lambda}\left(\dfrac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}, & x \ge 0, \\
0, & x < 0.
\end{cases}
\]
▶ We observe a sample (X1, . . . , Xn) which are i.i.d. Weibull(λ, k). Let us estimate λ and k.
▶ Suppose n = 100. Let’s simulate some dummy data for this task.
dummydata <- rweibull(100, shape=10, scale=20)
Maximum Likelihood Estimation | Example: Weibull Distribution (Not Examinable) (2)
▶ Maximising log-likelihood means minimising its negative. The negative log-likelihood can be defined as:
negLL <- function(params){
  shape <- params[1]; scale <- params[2]
  -sum(
    log(shape/scale*(dummydata/scale)^(shape-1)*exp(-(dummydata/scale)^shape))
  )
}
▶ R has a shorthand for this. Use the dweibull function to evaluate the density of the Weibull distribution.
negLL <- function(params){
  -sum(dweibull(dummydata, shape=params[1], scale=params[2], log=TRUE))
}
Maximum Likelihood Estimation | Example: Weibull Distribution (Not Examinable) (3)
▶ Let’s optimise this:
optim(par =c(10,10),fn = negLL)
▶ The output is k = 10.25863 (the shape) and λ = 19.99069 (the scale). This is pretty close to the parameters used to simulate the data.
▶ How did we choose the initial values?
▶ From other estimation methods, e.g. Method of Moments.
▶ Taking “shots in the dark”, i.e. guessing.