ARMA Models Model fitting
7.6 Estimation of ARMA model parameters
So far we assumed that the parameters of the AR and MA models are given.
Now we consider how to estimate these parameters given n observations, x1, x2, . . . , xn.
Note lower case xt instead of upper case Xt.
We will assume the white noise is normally distributed.
The expectation μ is estimated using the sample mean μ̂ = x̄. A mean-zero AR, MA or ARMA model is then fitted to Xt − μ̂.
The parameters we want to estimate are φ1, . . . , φp for AR(p), θ1, . . . , θq for MA(q) and σW² for the white noise.
We have p + q + 1 unknown parameters in total.
In a first step we assume that p and q are known, e.g. from an analysis of the ACF and PACF.
Later we will discuss another model selection tool to choose p and q.

ARMA Models Model fitting
OLS regression for AR(p)
From now on we will assume that E(Xt) = 0.
A very simple way of fitting an AR(p) is to fit it as a regression model using OLS for estimating
Xt = φ1Xt−1 + φ2Xt−2 + · · · + φpXt−p + Wt.
Suppose we have n datapoints x1, x2, . . . , xn.
The response variable consists of the n − p datapoints Xp+1, . . . , Xn.
The explanatory variables are the lagged series from lag 1 (Xp, . . . , Xn−1) up to lag p (X1, . . . , Xn−p).
Hence, in using OLS regression, we will lose the information from the first p observations.
This makes this simple approach inefficient.
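For illustration, a minimal sketch of this manual approach in R (assuming the demeaned observations are stored in a numeric vector x and p = 4):
p = 4
X = embed(x, p + 1)                 # column 1 holds x_t, columns 2..(p+1) hold lags 1..p
ar.fit = lm(X[, 1] ~ X[, -1] - 1)   # no intercept, since the series has mean zero
coef(ar.fit)                        # OLS estimates of phi_1, ..., phi_p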

ARMA Models Model fitting
Example: OLS regression for US manufacturing data
Consider the US manufacturing data from the lab. This is monthly new orders in USD.
y = ts(scan("USNewOrders.txt"), frequency = 12,
       start = c(1960, 1))
ACF and PACF suggest an AR(4) model. (See next slide.) Fitting the AR model with OLS could be done by lm in R.
But generating the regressors in R would be tedious.
There is already a function in R for OLS on time series
ar.ols(y, aic = FALSE, order.max = 4)

ARMA Models Model fitting
Example: OLS regression for US manufacturing data
[Figure: time plot of the series y (US new orders, 1960–2010) and its sample ACF and PACF, plotted against lag in years (0 to 2).]

ARMA Models Model fitting
Example: OLS regression for US manufacturing data
> ar.ols(y, aic = FALSE, order.max = 4)
Call:
ar.ols(x = y, aic = FALSE, order.max = 4)
Coefficients:
      1        2        3        4
 0.8543   0.1102  -0.0220  -0.0786
Intercept: 0.009228 (0.1537)
Order selected 4 sigma^2 estimated as 14.12

ARMA Models Model fitting
Method of moments estimators
An alternative to OLS is the method of moments.
Idea
We can derive equations for the moments E(Xt^k) for k = 1, 2, . . . in terms of the parameters.
It is easy to estimate the moments E(Xt^k) by the corresponding sample moments (x1^k + · · · + xn^k)/n. This allows us to work out estimators for the parameters.
For example, the first moment is E(Xt) = μ. Its moment estimator is the sample mean
μ̂ = x̄ = (x1 + · · · + xn)/n.
The method of moments for AR(p) is given by the so-called Yule-Walker equations.

ARMA Models Model fitting
Example: Second moment of AR(2) process
Consider the mean zero stationary AR(2) model: Xt = φ1Xt−1 + φ2Xt−2 + Wt
Multiply both sides by Xt−h and take expectations to get:
E(XtXt−h) = φ1E(Xt−1Xt−h) + φ2E(Xt−2Xt−h) + E(WtXt−h)
For h ≥ 1, E(WtXt−h) = 0 as past values are unrelated to the current error. As E(Xt) = 0, the expectation of the product of two lags is the covariance:
γ(h) = φ1γ(h−1) + φ2γ(h−2) for h = 1, 2, . . .
Dividing by γ(0) = Var(Xt) gives the same relation for the correlation:
ρ(h) = φ1ρ(h−1) + φ2ρ(h−2) for h = 1, 2, . . .
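As a quick numerical check (a sketch with illustrative values φ1 = 0.5, φ2 = 0.3), the theoretical ACF returned by ARMAacf satisfies this recursion:
phi = c(0.5, 0.3)                      # illustrative AR(2) coefficients
rho = ARMAacf(ar = phi, lag.max = 5)   # theoretical ACF at lags 0, 1, ..., 5
# check rho(h) = phi1*rho(h-1) + phi2*rho(h-2) for h = 2, ..., 5
lhs = rho[as.character(2:5)]
rhs = phi[1] * rho[as.character(1:4)] + phi[2] * rho[as.character(0:3)]
all.equal(unname(lhs), unname(rhs))    # TRUE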

ARMA Models Model fitting
Yule-Walker equations
Consider the general mean zero stationary AR(p) model: Xt = φ1Xt−1 + · · · + φpXt−p + Wt
We can get a similar relation
ρ(h) = φ1ρ(h−1) + · · · + φpρ(h−p) for h = 1, 2, . . . , p.
The variance of the white noise can be written as
σW² = γ(0)[1 − φ1ρ(1) − · · · − φpρ(p)].
The formula for ρ(h) gives a system of p linear equations in p unknowns from the lags h = 1, . . . , p.
For estimation replace
γ(0) with the sample variance s² = γ̂(0); and
ρ(h) with their sample estimates from the ACF, i.e. ρ̂(h).
Then solve to find (φ̂1, φ̂2, . . . , φ̂p, σ̂W²).
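A minimal sketch of solving these equations directly in R (assuming the series is in y and p = 4, as in the example on the next slide; ar.yw performs essentially the same computation):
p = 4
g = acf(y, lag.max = p, type = "covariance", plot = FALSE)$acf   # gamma.hat(0), ..., gamma.hat(p)
rho = g[-1] / g[1]                    # sample ACF rho.hat(1), ..., rho.hat(p)
Rmat = toeplitz(c(1, rho[-p]))        # p x p matrix with entries rho.hat(|i - j|)
phi.hat = solve(Rmat, rho)            # Yule-Walker estimates of phi_1, ..., phi_p
sigma2.hat = g[1] * (1 - sum(phi.hat * rho))
The coefficients phi.hat should reproduce those reported by ar.yw on the next slide; the variance estimate may differ slightly because ar.yw applies a degrees-of-freedom correction.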

ARMA Models Model fitting
Yule-Walker estimation for an AR(1)
Consider a mean zero causal stationary AR(1) (|φ1| < 1): Xt = φ1Xt−1 + Wt
The Yule-Walker equations give:
ρ(1) = φ1ρ(0) = φ1
σW² = γ(0)[1 − φ1²]
Replacing ρ(1) with the sample ACF gives
φ̂1 = Σ (xt+1 − x̄)(xt − x̄) / Σ (xt − x̄)²,
where the numerator sum runs over t = 1, . . . , n − 1 and the denominator sum over t = 1, . . . , n.
The estimated error variance is then σ̂W² = γ̂(0)[1 − φ̂1²].

ARMA Models Model fitting
Example: Yule-Walker for US manufacturing data
> ar.yw(y, aic = FALSE, order.max = 4)
Call:
ar.yw.default(x = y, aic = FALSE, order.max = 4)
Coefficients:
      1        2        3        4
 0.8458   0.1137  -0.0205  -0.0772
Order selected 4 sigma^2 estimated as 14.57

ARMA Models Model fitting
Method of moments comments
For AR(p) models, the Yule-Walker equations provide optimal estimators, in the sense that the estimates are asymptotically normally distributed with minimum variance.
The reason is that given our assumptions, the AR(p) is a linear model and these equations are essentially least squares estimators.
Such a method of moments estimator for an MA (or ARMA) model will not provide optimal estimators because these processes are non-linear in the parameters.

ARMA Models Model fitting
Method of moments for MA(1) model
Consider the invertible MA(1) model with its AR(∞) representation:
Xt = Wt + θ1Wt−1 = − Σ (−θ1)^j Xt−j + Wt, with the sum over j = 1, 2, . . .
This is non-linear in the parameter θ1.
The autocorrelation at lag 1 is ρ(1) = θ1/(1 + θ1²).
Plugging in the sample estimate, ρ̂(1) = θ̂1/(1 + θ̂1²), and solving for θ̂1 gives
θ̂1 = (1 ± √(1 − 4ρ̂(1)²)) / (2ρ̂(1)),
the solution of a non-linear (quadratic) equation in θ̂1.
In general, for an arbitrary order MA(q) a closed form solution does not exist and typically an iterative quasi-Newton method is used to solve the non-linear problem.
A preferred alternative for MA(q) and ARMA(p, q) models is maximum likelihood estimation.
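A minimal sketch of this moment estimator (assuming the series is in a numeric vector x; a real, invertible root with |θ̂1| < 1 exists only when 0 < |ρ̂(1)| < 1/2):
r1 = acf(x, lag.max = 1, plot = FALSE)$acf[2]          # sample ACF at lag 1
roots = (1 + c(-1, 1) * sqrt(1 - 4 * r1^2)) / (2 * r1)
theta.hat = roots[abs(roots) < 1]                      # keep the invertible root
theta.hat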

ARMA Models Model fitting
Maximum likelihood estimation (MLE)
Idea
Choose the parameters which maximise the “likelihood” of observing the data.
We consider MLE for the stationary AR(1) with |φ1| < 1: Xt = φ1Xt−1 + Wt
The likelihood function is the joint distribution of the observations
L(φ1, σW²) = f(x1, x2, . . . , xn | φ1, σW²).
Dropping the conditioning on the parameters for notational convenience, this can be expanded as
L(φ1, σW²) = f(x1, x2, . . . , xn) = f(x1) f(x2|x1) f(x3|x1, x2) . . . f(xn|x1, . . . , xn−1)
which for AR(1) reduces to
L(φ1, σW²) = f(x1) f(x2|x1) f(x3|x2) . . . f(xn|xn−1).

ARMA Models Model fitting
Maximum likelihood estimation AR(1)
The likelihood for the AR(1) is:
L(φ1, σW²) = f(x1) f(x2|x1) f(x3|x2) . . . f(xn|xn−1).
The log-likelihood is l(φ1, σW²) = log L(φ1, σW²). Since Wt ∼ N(0, σW²) i.i.d. the log-likelihood is
l(φ1, σW²) = −((n−1)/2) log(2π) − ((n−1)/2) log(σW²) − S(φ1)/(2σW²) + log f(x1)
with
S(φ1) = Σt=2..n (xt − φ1xt−1)².

ARMA Models Model fitting
Maximum likelihood estimation AR(1)
Two approaches are commonly used to handle the initial value x1 and its contribution f(x1) to the likelihood function:
1. Treat x1 as fixed, so that log f(x1) does not contribute to the log-likelihood. This is referred to as conditional MLE.
2. Use the stationarity assumption x1 ∼ Normal(0, σW²/(1 − φ1²)) and thus
log f(x1) = −(1/2) log(2π) − (1/2) log(σW²) + (1/2) log(1 − φ1²) − x1²(1 − φ1²)/(2σW²).
The conditional MLE of φ1 has a closed form. The minimiser of the sum of squares S(φ1) = Σt=2..n (xt − φ1xt−1)² is found by differentiating with respect to φ1 and equating to zero. This gives:
φ̂1 = Σt=2..n xt−1xt / Σt=2..n xt−1².

ARMA Models Model fitting
Maximum likelihood estimation AR(1)
The estimate φ̂1 can be used to estimate the variance:
σ̂W² = (1/(n−1)) Σt=2..n (xt − φ̂1xt−1)².
Remark
In general, MLE of an AR(p) conditional on the first p observations is equivalent to least squares estimation. MLE for MA(q) and ARMA(p, q) is again non-linear.

ARMA Models Model fitting
Example: MLE for US manufacturing data
MLE is what we did in the lab, using the arima function:
> usorders.ar4 = arima(y, order = c(4, 0, 0))
> usorders.ar4
Call:
arima(x = y, order = c(4, 0, 0))
Coefficients:
         ar1     ar2      ar3      ar4  intercept
       0.853  0.1098  -0.0215  -0.0795     0.1376
s.e.   0.041  0.0541   0.0540   0.0410     1.1060
sigma^2 estimated as 14.29: log likelihood = -1655.58, aic = 3323.17
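As a check on the closed-form conditional MLE above, a minimal sketch on a simulated AR(1) series, compared with arima using method = "CSS" (conditional sum of squares); the two should agree closely:
set.seed(1)
x = arima.sim(model = list(ar = 0.7), n = 500)    # simulated mean-zero AR(1)
phi.hat = sum(x[-1] * x[-length(x)]) / sum(x[-length(x)]^2)
phi.hat
coef(arima(x, order = c(1, 0, 0), include.mean = FALSE, method = "CSS"))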

ARMA Models Model selection
7.7 Model selection
A commonly used criterion for model selection, i.e. for choosing p and q, is the AIC (Akaike Information Criterion).
It measures how much of the information present in the data is lost by using the model.
The AIC is available for all the model fitting options we have looked at.
The AIC is commonly the default in R.
When using MLE for an ARMA(p, q) the orders p and q are chosen to minimise
−2 l(φ̂1, . . . , φ̂p, θ̂1, . . . , θ̂q, σ̂W²) + 2(p + q + 1)n / (n − p − q − 2).
The final term is a penalty on the complexity of the model.
The smaller the AIC the better the model.
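A minimal sketch of using the AIC to choose the AR order for y with arima (arima reports the standard AIC; the corrected form above differs slightly but usually ranks models the same way):
orders = 1:6
aics = sapply(orders, function(p) arima(y, order = c(p, 0, 0))$aic)
rbind(order = orders, AIC = round(aics, 2))
orders[which.min(aics)]   # order with the smallest AIC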

ARMA Models Model selection
Model diagnostics – residuals
As with all statistical modelling, a key stage is to assess the assumptions underlying the model.
A key assumption is that the underlying innovations Wt are a white noise process.
It is common to plot the residuals over time, and their sample ACF to assess this assumption.
The tsdiag function in R plots the standardised innovations:
(xt − x̂t^(t−1)) / √P̂t^(t−1)
where x̂t^(t−1) is the one-step ahead prediction of xt and P̂t^(t−1) is the estimated one-step ahead prediction error (details in the textbook, not important here).
The standardised residuals for a good model should behave as
a white noise sequence with mean zero and variance one.
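For the AR(4) fit usorders.ar4 from the earlier slide this is a single call (its output is shown at the end of the section):
tsdiag(usorders.ar4)   # standardised residuals, their ACF, and Ljung-Box p-values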

ARMA Models Model selection
Model diagnostics – residuals
A Ljung-Box test should be used to assess whether the residuals are white noise.
H0: Wt is white noise. The test statistic at lag h is
Q(h) = n(n + 2) Σk=1..h ρ̂W(k)² / (n − k)
where ρ̂W(k) is the sample ACF of the residuals at lag k.
Under mild conditions Q(h) is asymptotically χ² distributed with h − p − q degrees of freedom.
Hence, p-values can be derived from the χ²(h − p − q) distribution.
Small p-values will lead to rejection of the null, suggesting that Wt is not white noise.
The p-values are plotted by tsdiag.
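The same test can be run numerically with Box.test; a sketch for the AR(4) fit, where fitdf subtracts the p + q fitted parameters from the degrees of freedom:
res = residuals(usorders.ar4)
Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 4)   # h = 10, p + q = 4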

ARMA Models Model selection
Example: Diagnostics for US manufacturing data
[Figure: tsdiag output for the AR(4) fit: standardised residuals over time (1960–2010), ACF of the residuals, and p-values for the Ljung-Box statistic at lags up to 10.]