Business and Economic Forecasting
EMET3007/8012
Main Lecture Week 7 – Concentrating the Log-Likelihood
Main Lecture Week 7 – Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Plan for Today
Maximum Likelihood Computation
Grid Search and Function Evaluation
Concentrating the Likelihood Function
The Newton-Raphson method
Main Lecture Week 7 – Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Recall:
The log-likelihood function
`(✓ | x) = ln(f (x | ✓)
In particular, for X1, . . . ,Xn iid with pdf f
`(✓ | x) =
nX
i=1
ln(f (x | ✓))
We want to maximise `.
Main Lecture Week 7 – Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Analytic or Numerical
Often we can do analytic optimisation
Linear regression models
Autoregressive models
Examples from this week’s tutorials
Sometimes we can’t take the appropriate derivatives
Moving average models
Trend-Cycle Model with sin term
Many others
Instead we need to use numerical techniques
Main Lecture Week 7 – Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Grid Search
Simple method for finding maxima for one-dimensional
problems
max
✓2[a,b]
Q(✓)
Partition [a, b] into small parts
a = ✓0 < ✓1 < ✓2 < · · · < ✓N = b
Evaluate Q(✓i ) for each i . Pick the biggest.
argmax
✓0,...,✓N
Q(✓i )
Accuracy depends on grid fineness and ‘niceness’ of Q.
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Log-likelihood for the MA(1) Process
As an example, generate T = 100 data points according to
the MA(1) model with ✓ = 0.8 and �2 = 1
Assume that �2 = 1 is known
So the log-likelihood will be a function only of ✓
Evaluate the log-likelihood over a grid of ✓’s
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Yt EtTOEt I
MATLAB code
ngrid = 300;
theta grid = linspace(.2,1,ngrid);
%% construct A and B so that Gam = A + B*theta
A = speye(T);
B = spdiags(ones(T�1,1),[�1],T,T);
ell = zeros(ngrid,1);
for i=1:ngrid
theta = theta grid(i);
Gam = A + B*theta;
Gam2 = Gam*Gam';
ell(i) = �T/2*log(2*pi) �.5*log(det(Gam2)) +...
� .5*y'*(Gam2\y);
end
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Figure: The log-likelihood `(✓ | y) for the MA(1) model with �2 = 1
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Grid Search - MA(1) Process
Last week we built data from an MA(1) process, and
calculated log-likelihood
We used a grid with 300 points between 0.2 and 1
[Qmax id] = max(ell);
theta grid(id)
Gives maximum likelihood at ✓ = 0.7833
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Grid Search
Pros:
Easy to implement
Simple Solution
Cons:
Is prohibitively slow for large dimensions
Increasing accuracy can be very very expensive
Requires Q to be ‘nice’
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Functional Evaluations
Use built-in Matlab operations to maximise for us
Matlab can’t maximise, but
argmax
✓2⇥
Q(✓) = argmin
✓2⇥
�Q(✓)
Main functions: fminbnd and fminsearch
Use fminbnd for one-dimensional problems
Use fminsearch for multi-dimensional problems
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Functional Evaluations - MA(1) Process
Our example is one-dimensional, so we should use fminbnd
fminbnd takes three arguments: a function, a lower bound,
and an upper bound
It will then minimise the function over values between the
bounds
First we need to set-up the function
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
MA(1) Log-likelihood function
%% negative of the log�lilkelihood for MA(1)
%% input: x = [sig, theta]; y = data
function ell = loglike MA1(x,y)
sig = x(1); theta = x(2);
T = length(y);
A = speye(T);
B = spdiags(ones(T�1,1),[�1],T,T);
Gam = A + B*theta;
Gam2 = Gam*Gam';
ell = �T/2*log(2*pi*sig)�.5*log(det(Gam2))�.5/sig*y'*(Gam2\y);
ell = � ell;
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Functional Evaluations - MA(1) Process
Our new function loglike MA1 takes 2 arguments: a 2⇥ 1
vector of parameters (�2, ✓) and the data input y.
But we know �2 = 1, so need a new function
f = @(theta) loglike MA1([1 theta],y);
Now we can use fminbnd
%% minimize f in the interval [0,1]
[thetaMLE Qmax] = fminbnd(f,0,1);
Gives maximum likelihood at ✓ = 0.7827
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Functional Evaluations - MA(1) Process
What if we don’t know �2?
fminbnd will be cumbersome, as it takes univariate functions
Use fminsearch instead. fminsearch takes two arguments:
a function of n variables, and an n ⇥ 1 vector of initial
guesses.
g = @(x) loglike MA1(x,y);
[thetaMLE Qmax] = fminsearch(g,[.5 .7]);
%% starting values: sigmaˆ2 = .5 and theta = .7
Gives maximum likelihood at ✓ = 0.7827, �2 = 1.0879.
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Concentrating the Likelihood Function
Analytic tool, but useful for numerical optimisation
Partition parameter vector as ✓ = (✓01,✓
0
2)
0 and,
suppose we can analytically find ✓̂1 as a function of ✓2.
@
@✓1
`(✓ | y) = 0 =) ✓̂1 = ✓̂1(✓2)
Then build
`c(✓2 | y) = `(✓̂1(✓2),✓2 | y)
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Concentrating the Likelihood Function
`c(✓2 | y) = `(✓̂1(✓2),✓2 | y)
This is then a function of ✓2 only, so the dimensionality has
been reduced.
Once we have the MLE for ✓2, we can calculate ✓̂1(✓2)
directly.
This method is a specialised method for when we have some
nice components, and some horrible components.
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Concentrating the Likelihood Function - Trend-Cycle
Model
yt = a0 + a1t + b1 cos(!t) + b2 sin(!t) + ✏t , ✏t ⇠ N (0,�2)
`(✓ | y) =�
T
2
log(2⇡�2)+
�
1
2�2
TX
t=1
(yt � a0 � a1t � b1 cos(!t)� b2 sin(!t))2
But ✓ is 6-dimensional, which is a bit high. So let’s
concentrate the log-likelihood function.
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
cos w sMLw
s I in I b I sibSCTw sthLTD
T
omega
w
X w
Notice if we knew ! this would be a linear regression.
Let ✓ = (✓01,✓
0
2)
0 with ✓1 = (a0, a1, b1, b2,�
2)0 and ✓2 = !.
Then
`(✓ | y) = �
T
2
log(2⇡�2)�
1
2�2
(y � X(!)�)0(y � X(!)�)
Assuming we know !, we solve analytically to get
�̂ = (X0X)�1X0y, �̂2 =
1
T
(y � X�̂)0(y � X�̂)
So we can maximise the concentrated log-likelihood
`c(! | y) = `(✓̂1(!),! | y) = `(�̂, �̂2,! | y)
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Concentrated log-likelihood
function loglike = trend cycle lc(omega,y)
T = length(y);
t = (1:T)'; %% build a vector from 1 to T
X = [ones(T,1) t cos(omega*t) sin(omega*t)];
beta = (X'*X)\(X'*y);
err = y�X*beta;
sig2 = err'*err/T;
loglike = �(�T/2*log(2*pi*sig2) �.5*(err'*err)/sig2);
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Full log-likelihood
f = @(w) trend cycle lc(w,y); %% define f(w)
%% find the min of f from 2*pi/40 to 2*pi/4
omega = fminbnd(f,2*pi/40,2*pi/4);
t = (1:T)';
X = [ones(T,1) t cos(omega*t) sin(omega*t)];
beta = (X'*X)\(X'*y);
err = y�X*beta;
sig2 = err'*err/T;
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Concentrating the Likelihood Function - Results
Using earlier code, we made a dataset of the trend-cycle
model with T = 50 and
✓ = (a0, a1, b1, b2,�
2,!) = (2, 0.5, 4, 2, 1, 2⇡/10)
Running the likelihood estimation we estimated
✓̂ = (â0, â1, b̂1, b̂2, �̂
2, !̂) = (1.77, 0.506, 3.96, 2.06, 0.800, 0.631)
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Newton-Raphson Method
Define the Score Function S(✓) = @`(✓ | y)/@✓.
Then for the MLE ✓̂ we have S(✓̂) = 0.
The Newton-Raphson method is a procedure for ‘root-finding’
using approximation by linear functions.
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
i
Newton-Raphson Method
Let ✓̂ be the true maximiser, and ✓ be our first guess. Use
Taylor’s approximation:
S(✓̂) ⇡ S(✓) +
✓
@
@✓0
S(✓)
◆
(✓̂ � ✓) = S(✓) + (✓̂ � ✓)H`(✓)
where H(✓) is the Hessian of `. As S(✓̂) = 0,
S(✓) + (✓̂ � ✓)H`(✓) ⇡ 0
=) ✓̂ ⇡ ✓ � (H(✓))�1S(✓)
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
ifi i
f I din
_Oi Halil SLA
Newton-Raphson Method
✓̂ ⇡ ✓ � (H(✓))�1S(✓)
This suggests an iterative procedure
✓t+1 = ✓t � (H(✓t))�1S(✓t)
Use this recursion to make a sequence ✓0,✓1, . . . and continue
until the sequence converges
This is the (basic) Newton-Raphson method
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Newton-Raphson Method - Gamma Distribution
Yi ⇠ G (↵,�) =) f (yi | ↵,�) =
�↵
�(↵)
y
(↵�1)
i e
��yi
Let Y = (Y1, . . . ,Yn) be iid. Then f (y | ↵,�) is the product
distribution
To use Newton-Raphson we need the Score and Hessian.
For those we first need `(↵,� | y).
`(↵,� | y) = n↵ log ��n log �(↵)+(↵�1)
nX
i=1
log yi ��
nX
i=1
yi
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
h
fix
togtfCHI fix
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
n6gB n f t
hgcyil.de
Jp _f Iyo
2
g
n 4th
221 n
j Zf 2 28 f
t
n
Newton-Raphson Method - Gamma Distribution
S(↵,�) =
@
@↵`(↵,� | y)
@
@� `(↵,� | y)
!
=
n(log(�)� (↵)) +
Pn
i=1 log yi
n↵
� �
Pn
i=1 yi
!
H(↵,�) =
@2
@↵2
`(↵,� | y) @
2
@↵@� `(↵,� | y)
@2
@�@↵`(↵,� | y)
@2
@�2
`(↵,� | y)
!
= n
� 0(↵) 1�
1
� �
↵
�2
!
where (x) = �0(x)/�(x) and is in Matlab as psi(x). The
derivative �0 is psi(1,x).
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
Newton-Raphson Method - Starting Values
Newton-Raphson is an iterative process, so we have to start
somewhere.
It is often important to have a good choice of starting value.
A good start would match the population moments with the
sample moments.
Let ȳ and s2 be the first and second sample moment. For the
Gamma G (↵,�) distribution, E(Y ) = ↵/� and
Var(Y ) = ↵/�2, so let
↵0 =
ȳ2
s2
, �0 =
ȳ
s2
Main Lecture Week 7 - Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012
n = length(y);
sumlogy = sum(log(y)); sumy = sum(y);
count = 0;
alpt = mean(y)ˆ2/var(y);
bett = mean(y)/var(y); % initial guess
S = [n*(log(bett)�psi(alpt))+sumlogy; n*alpt/bett�sumy];
thetat = [alpt; bett];
while sum(abs(S) > 10ˆ(�5)) > 0 %% covergence criteria
S = [n*(log(bett)�psi(alpt))+sumlogy; n*alpt/bett�sumy];
H = n * [�psi(1,alpt) 1/bett; 1/bett �alpt/bettˆ2 ];
thetat = thetat � H\S;
alpt = thetat(1); bett = thetat(2);
count = count + 1;
end
Main Lecture Week 7 – Concentrating the Log-Likelihood Business and Economic Forecasting EMET3007/8012