7.8 Parameter redundancy
Remember: a zero-mean ARMA(p, q) model is defined by
Xt − φ1Xt−1 − ··· − φpXt−p = Wt + θ1Wt−1 + ··· + θqWt−q, i.e. φ(B)Xt = θ(B)Wt,
with AR and MA polynomials (φp ≠ 0 and θq ≠ 0)
φ(B) = 1 − φ1B − ··· − φpB^p
θ(B) = 1 + θ1B + ··· + θqB^q
and white noise Wt ∼ WN(0, σW^2) with finite variance σW^2 > 0.
An MA of possibly infinite order can produce any stationary time series, so we need to be careful that features of the AR part are not already modelled by the MA part. Otherwise some of the parameters in the model are redundant.
Example: Parameter redundancy
(Example 3.4 of Shumway and Stoffer) Consider the white noise process Xt = Wt .
Lagging by one time step and rescaling gives 0.5Xt−1 = 0.5Wt−1, which can be subtracted from both sides and rearranged to
Xt − 0.5Xt−1 = Wt − 0.5Wt−1.
This looks like an ARMA(1,1), but Xt is still white noise; the parameter redundancy hides that fact.
Let’s rewrite the apparent ARMA(1,1) in backshift form:
(1 − 0.5B)Xt = (1 − 0.5B)Wt.
Notice that there is the common factor (1 − 0.5B) on both sides. If we divide by the common factor, we recover the original white noise model Xt = Wt.
Common factors of the characteristic polynomial
Every polynomial factorises over the complex numbers.
This means the characteristic polynomials of the AR and MA components factorise as
φ(z) = a0(a1 − z)(a2 − z)···(ap − z)
θ(z) = b0(b1 − z)(b2 − z)···(bq − z).
Hence, a1, …, ap and b1, …, bq are the roots of the characteristic polynomials.
The polynomials have a common factor if and only if they have a common root.
To check for parameter redundancy, we can look for common roots of the characteristic polynomials.
The characteristic polynomials in the example above,
φ(z) = (1 − 0.5z) = θ(z),
make the parameter redundancy obvious.
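In R, common roots can be found numerically with polyroot(), which takes the coefficients in increasing powers of z. A minimal check for the example above, where both polynomials equal (1 − 0.5z):
> polyroot(c(1, -0.5))
[1] 2+0i
Both φ(z) and θ(z) have the single root z = 2, so the common factor cancels and the model collapses to white noise.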
Characteristic polynomials are key!
Three key features of the ARMA(p, q) can be checked using the characteristic polynomials.
The AR(p) part is stationary (and invertible) when the roots of the characteristic polynomial φ(z) are outside the unit circle, |z| > 1.
The MA(q) part is invertible if the roots of the characteristic polynomial θ(z) are outside the unit circle, |z| > 1.
ARMA(p, q) has no parameter redundancy if the characteristic polynomials φ(z) and θ(z) have no common factors.
Example: Parameter Redundancy
(Example 3.7 of Shumway and Stoffer) Consider the process
Xt − 0.4Xt−1 − 0.45Xt−2 = Wt + Wt−1 + 0.25Wt−2
The characteristic polynomials factorise as
φ(z) = 1 − 0.4z − 0.45z^2 = (1 + 0.5z)(1 − 0.9z)
θ(z) = 1 + z + 0.25z^2 = (1 + 0.5z)^2
> polyroot(c(1, -0.4, -0.45))
[1] 1.111111-0i -2.000000+0i
> polyroot(c(1, 1, 0.25))
[1] -2-0i -2+0i
So the AR component is stationary and the MA is invertible.
But both polynomials have a common factor due to the common root −2.
Xt − 0.4Xt−1 − 0.45Xt−2 = Wt + Wt−1 + 0.25Wt−2
Cancelling the common factor shows this is actually the ARMA(1, 1) model given by
(1 − 0.9B)Xt = (1 + 0.5B)Wt
Xt − 0.9Xt−1 = Wt + 0.5Wt−1.
It is stationary because the AR root z = 10/9 is outside the unit circle.
It is invertible because the MA root z = −2 is also outside the unit circle.
Simulated ARMA(1,1): Xt − 0.9Xt−1 = Wt + 0.5Wt−1
[Figure: a simulated path of the process (Xt against time t, n = 1000), with two panels below showing the sample autocorrelation ρ(h) and the sample partial autocorrelation φhh for lags h up to 20; the theoretical ACFs of the redundant ARMA(2,2) and of the reduced ARMA(1,1) are overlaid and coincide with each other and with the sample values.]
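A minimal R sketch to reproduce this comparison (seed and sample size are illustrative choices); ARMAacf() gives the theoretical ACF:
set.seed(1)
x <- arima.sim(model = list(ar = 0.9, ma = 0.5), n = 1000) # simulate the ARMA(1,1)
acf(x, lag.max = 20)  # sample ACF
pacf(x, lag.max = 20) # sample PACF
# theoretical ACFs of the redundant ARMA(2,2) and the reduced ARMA(1,1)
arma22 <- ARMAacf(ar = c(0.4, 0.45), ma = c(1, 0.25), lag.max = 20)
arma11 <- ARMAacf(ar = 0.9, ma = 0.5, lag.max = 20)
all.equal(arma22, arma11) # TRUE: the two parameterisations are the same process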
7.9 Forecasting
How does forecasting work with an ARMA(p, q) model? Assume we have a sample of data collected up to the present,
X1:n = {X1, X2, …, Xn}.
The parameters of the ARMA(p, q) are estimated from this historical data.
In the following slides we use the true parameters for notational convenience; in practice they are replaced by their sample estimates.
For simplicity, we neglect the error coming from estimating the ARMA coefficients.
We will only consider stationary and invertible models.
Forecasting errors and loss function
We are interested in the m-step ahead forecast X̃n+m|n of Xn+m after the current time point n.
The mean squared error is the most commonly used “loss function” for evaluating the performance of forecasts:
MSE(X̃n+m|n) = E[(Xn+m − X̃n+m|n)^2]
            = Var(Xn+m − X̃n+m|n) + E[Xn+m − X̃n+m|n]^2.
A straightforward forecast is the conditional expectation
X̃n+m|n = E(Xn+m | X1:n), where X1:n denotes X1, X2, …, Xn.
This forecast minimises the MSE.
The textbook denotes the forecast x^n_{n+m}, but the superscript n is easily confused with a power, so X̃n+m|n is used here.
Forecasting with an AR(1) model
Let’s motivate the concepts using an AR(1) model
Xt = φ1Xt−1 + Wt, where Wt ∼ WN(0, σW^2) and |φ1| < 1.
Consider a 1-step ahead prediction:
X̃n+1|n = E(Xn+1 | X1:n)
        = E(φ1Xn + Wn+1 | X1:n)
        = φ1E(Xn | X1:n) + E(Wn+1 | X1:n)
        = φ1Xn.
Notes:
Xn has been observed, so it is not random: E(Xn | X1:n) = Xn.
Also, Wn+1 is a future white noise error which is independent of the past and has mean zero: E(Wn+1 | X1:n) = E(Wn+1) = 0.
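A quick numerical check (simulated data; the AR coefficient 0.6 is an illustrative choice) that a fitted AR(1) forecasts with the estimated coefficient times the last observation:
set.seed(7)
x <- arima.sim(model = list(ar = 0.6), n = 300) # simulate an AR(1)
fit <- arima(x, order = c(1, 0, 0), include.mean = FALSE)
coef(fit)["ar1"] * x[300]      # manual 1-step forecast: phihat1 * Xn
predict(fit, n.ahead = 1)$pred # agrees with R's forecast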
The 1-step ahead prediction for an AR(1) is then
X̃n+1|n = φ1Xn.
The 2-step ahead prediction for an AR(1) is similarly
X̃n+2|n = E(Xn+2 | X1:n)
        = E(φ1Xn+1 + Wn+2 | X1:n)
        = φ1E(Xn+1 | X1:n) + E(Wn+2 | X1:n)
        = φ1^2 Xn.
In general, the m-step ahead prediction is (m ≥ 1)
X̃n+m|n = φ1X̃n+m−1|n = φ1^m Xn,
with the final result following by recursion.
What will the m-step ahead prediction be for large m?
The m-step ahead prediction is (m ≥ 1)
X̃n+m|n = φ1^m Xn.
Since |φ1| < 1, we have φ1^m → 0 as m → ∞.
Thus, the forecasts go to zero (or to the mean, if it is non-zero), so they are not good for distant multi-step ahead forecasts. This feature is called mean reversion.
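A minimal sketch of the mean reversion (the coefficient and the last observation are assumed, illustrative values):
phi1 <- 0.8 # illustrative AR(1) coefficient
xn <- 2.5   # illustrative last observation Xn
m <- 1:20
fc <- phi1^m * xn # m-step ahead forecasts phi1^m * Xn
plot(m, fc, type = "b", xlab = "steps ahead, m", ylab = "forecast")
abline(h = 0, lty = 3) # forecasts decay geometrically to the mean (zero)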
Example: Recruitment data
(Shumway and Stoffer, Example 3.25)
Recruitment (index of the number of new fish) for a period of 453 months ranging over the years 1950-1987.
library(astsa)
data(rec)
# fit AR(1) model using OLS
regr = ar.ols(rec, order = 1, demean = FALSE, intercept = TRUE)
# do up to 50-step ahead forecasts
fore = predict(regr, n.ahead = 50)
# plot the time series and the forecasts
ts.plot(rec, xlim = c(1980, 1992), ylab = "Recruitment")
lines(fore$pred, type = "b") # plot forecast
abline(h = mean(rec), lty = 3) # add sample mean line for comparison
[Figure: Recruitment series with the 50-step ahead forecasts (plotted as connected points) levelling off at the dotted sample mean line, 1980–1992.]
Don’t forecast too far into the future; it quickly becomes unreliable.
AR(1) mean square prediction error
The mean squared prediction error (MSPE) of the 1-step ahead forecast with an AR(1) model is
Var(X̃n+1|n) = E[(Xn+1 − X̃n+1|n)^2 | X1:n]
            = E[(φ1Xn + Wn+1 − φ1Xn)^2 | X1:n]
            = σW^2.
Similarly, for the 2-step ahead forecast,
Var(X̃n+2|n) = E[(Xn+2 − X̃n+2|n)^2 | X1:n]
            = E[(φ1^2 Xn + φ1Wn+1 + Wn+2 − φ1^2 Xn)^2 | X1:n]
            = σW^2 (1 + φ1^2).
For general m-step ahead forecasts,
Var(X̃n+m|n) = E[(Xn+m − X̃n+m|n)^2 | X1:n]
            = σW^2 (1 + φ1^2 + φ1^4 + ··· + φ1^(2(m−1))).
AR(1) MSPE converges to process variance
The MSPE of the general m-step ahead forecast with an AR(1) model is
Var(X̃n+m|n) = E[(Xn+m − X̃n+m|n)^2 | X1:n]
            = σW^2 (1 + φ1^2 + φ1^4 + ··· + φ1^(2(m−1))).
Hence, in the limit,
lim_{m→∞} Var(X̃n+m|n) = σW^2 / (1 − φ1^2),
which is the same as the AR(1) process variance Var(Xt).
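A quick numerical check of this limit (illustrative values for φ1 and σW^2):
phi1 <- 0.8; sigma2w <- 1 # illustrative values
mspe <- function(m) sigma2w * sum(phi1^(2 * (0:(m - 1))))
c(mspe(1), mspe(5), mspe(50)) # 1.000 2.480 2.778: increasing in m ...
sigma2w / (1 - phi1^2)        # ... towards the process variance 2.778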
Example: Recruitment data
[Figure: the same forecast plot, now with dashed ±1 standard error bounds around the forecasts, produced by the code below.]
ts.plot(rec, xlim = c(1980, 1992), ylab = "Recruitment")
lines(fore$pred, type = "b") # plot forecast
abline(h = mean(rec), lty = 3) # add sample mean line for comparison
lines(fore$pred + fore$se, lty = 2) # add +/- 1 standard error bounds
lines(fore$pred - fore$se, lty = 2)
Forecasting with an MA(1) model
We consider an invertible MA(1) model (|θ1| < 1)
Xt = Wt + θ1Wt−1.
The forecasts are
X̃n+1|n = E[Xn+1 | X1:n]
        = E[Wn+1 | X1:n] + θ1E[Wn | X1:n]
        = θ1Wn,
X̃n+2|n = E[Xn+2 | X1:n]
        = E[Wn+2 | X1:n] + θ1E[Wn+1 | X1:n]
        = 0.
Here invertibility is what lets us recover Wn (approximately, for large n) from the observed X1:n, so E[Wn | X1:n] = Wn.
For general m > 1, X̃n+m|n = 0, i.e. the forecast reverts to the mean (zero in this case) beyond the order q = 1.
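A short illustration (simulated data; θ1 = 0.5 is an assumed value): a fitted MA(1) produces only one non-trivial forecast.
set.seed(42)
x <- arima.sim(model = list(ma = 0.5), n = 500) # simulate an MA(1)
fit <- arima(x, order = c(0, 0, 1), include.mean = FALSE)
predict(fit, n.ahead = 4)$pred # only the 1-step ahead forecast is non-zero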
MSPE for an MA(1) model
The mean square prediction error for the MA(1) is
Var(X̃n+1|n) = E[(Xn+1 − X̃n+1|n)^2 | X1:n]
            = E[(Wn+1 + θ1Wn − θ1Wn)^2 | X1:n]
            = σW^2;
Var(X̃n+2|n) = E[(Xn+2 − X̃n+2|n)^2 | X1:n]
            = E[(Wn+2 + θ1Wn+1 − 0)^2 | X1:n]
            = σW^2 (1 + θ1^2);
and for general m > 1,
Var(X̃n+m|n) = E[(Xn+m − X̃n+m|n)^2 | X1:n]
            = σW^2 (1 + θ1^2).
The MSPE reverts to the MA(1) process variance beyond the order q = 1.
Forecast for ARMA(p,q) Models
The concepts can be extended to the general AR(p) or MA(q). In either case, the sample X1:n = {X1, X2, …, Xn} must be large compared to p and q.
The general ARMA(p, q) can be represented as an MA(∞) or an AR(∞), and the same arguments as above apply. However, we only have a finite sample, so we cannot evaluate the required infinite sums.
A practical workaround is to truncate the infinite sum, which often works well provided n is large, because the coefficients in the infinite sum decay to zero; a sketch follows below.
The truncation concept is covered in the textbook by Shumway and Stoffer.
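As a sketch of the truncation idea (model, coefficients, and truncation point are assumed for illustration), consider the invertible ARMA(1,1) from earlier, (1 − 0.9B)Xt = (1 + 0.5B)Wt. Its AR(∞) form is π(B)Xt = Wt with π(z) = φ(z)/θ(z), and the 1-step ahead forecast truncates the infinite sum at J terms:
phi1 <- 0.9; theta1 <- 0.5
set.seed(1)
x <- arima.sim(model = list(ar = phi1, ma = theta1), n = 500)
n <- length(x)
J <- 50 # truncation point for the AR(infinity) sum
j <- 1:J
pi.j <- -(phi1 + theta1) * (-theta1)^(j - 1) # pi weights of (1 - 0.9z)/(1 + 0.5z)
xhat <- -sum(pi.j * x[n + 1 - j]) # truncated forecast: -sum_j pi_j X_{n+1-j}
xhat
# close to the forecast from a fitted ARMA(1,1) (which estimates the coefficients)
predict(arima(x, order = c(1, 0, 1), include.mean = FALSE), n.ahead = 1)$pred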