Applied Time Series Analysis
Section 3: AutoRegressive Integrated Moving Average (ARIMA) Models
Applied Time Series Analysis – ARIMA Models
1 Forecast
2 ARMA models
3 Autocorrelation and Partial Autocorrelation
4 Estimation and Forecast with ARMA
5 ARIMA models
6 Regression with Autocorrelated Errors
7 Multiplicative Seasonal ARIMA Models
Example – Forecasting Global Temperature
Forecast for the year 2050?
[Figure: plot of the global temperature series (first difference) against time (year).]
Suppose we have data $Y_1,\ldots,Y_n$ and we would like to forecast the time series at the time points $Y_{n+m}$, $m = 1,2,\ldots,H$, up to some fixed horizon $H$.
Let $Y^n_{n+m}$ be a forecast of $Y_{n+m}$ based on $Y_1,\ldots,Y_n$. That is, for some function $g$,
$$Y^n_{n+m} = g(Y_1,\ldots,Y_n).$$
How do we evaluate whether a forecast is good?
We need some loss function $L(x,y)$, e.g., $|x-y|$ or $|x-y|^2$. The best forecast should minimize the expected loss, that is,
$$\min\big(E[L(Y^n_{n+m}, Y_{n+m})]\big).$$
Given data only, we will not be able to compute the expectation. The best we can do is estimate it.
Recall the example of estimating the mean by the sample mean. We need assumptions like stationarity to have a chance of estimating expectations.
Is our data stationary?
In the previous chapter, we learned techniques to transform (e.g. non-linear transformations, detrending, differencing) our data Y1, . . . , Yn to a stationary time series, say, X1,…,Xn.
For instance,
$$Y_t = \delta_0 + t\delta_1 + X_t.$$
=⇒ Pipeline: Transform data → Forecast the stationary data $X^n_{n+m}$ → Transform back to get $Y^n_{n+m}$.
How to forecast $X^n_{n+m}$?
We only want to assume (weak) stationarity, not strict stationarity. =⇒ We need to keep it simple.
How to keep it simple?
If $g$ is linear, e.g., $g(x_1,\ldots,x_n) = c_0 + \sum_{j=1}^n c_j x_j$, and $L(x,y) = |x-y|^2$, then
$$\min\big(E[L(X^n_{n+m}, X_{n+m})]\big) = \min E\Big[\big(X_{n+m} - c_0 - \sum_{j=1}^n c_j X_j\big)^2\Big].$$
Only first and second moments are involved. Stationarity is enough.
Remark – Loss Function
The squared loss $L(x,y) = |x-y|^2$ has the advantage that it is differentiable, i.e.,
$$\min E\Big[\big(X_{n+m} - c_0 - \sum_{j=1}^n c_j X_j\big)^2\Big]$$
can be solved analytically and only first and second moments show up.
We already know how to estimate the mean and the ACF under stationarity.
BUT, when dealing with a real problem, keep in mind what squared loss means and whether it fits your problem.
Example – Loss Function
[Figure: squared loss, absolute loss, and check loss (τ = 0.25) plotted as functions of x on [-2, 2].]
Example – Loss Function
Suppose, you work for a power supplier.
Based on your forecast for the next day, you buy a supply of $x$ electricity for your customers on the futures market. Let the demand be $y$. There is also a spot market to meet demand. If $y - x > 0$ you need to buy at the spot market; if $y - x < 0$ you need to sell at the spot market.¹
1 Absolute loss: Buying additional electricity costs you as much as selling surplus electricity. The price is independent of the amount of electricity you buy/sell.
2 Quadratic loss: Buying additional electricity costs you as much as selling surplus electricity. The per-unit costs increase with the amount of electricity you buy/sell.
3 Check loss: The costs for buying and selling can be different. The costs are independent of the amount of electricity you buy/sell.
Your forecast should depend on which of these cost setups you face.
¹In both cases it costs you something, since you sell the electricity to your customers based on something like the futures price and not the spot price. Buying at the spot market means you buy for a higher price than you sell. If you sell at the spot market you might get a very low price, so you sell for less than you bought it.
Example - Loss Function
For the problem
$$\operatorname*{argmin}_c E[L(X - c)]$$
1 Absolute loss: $c = \operatorname{median}(X)$
2 Quadratic loss: $c = \operatorname{mean}(X)$
3 τ-check loss: $c$ = τ-quantile of $X$ (τ = 0.25 in the figure)
=⇒ Which model is the best model depends on the loss function.
We focus here on the quadratic loss since it is mathematically the simplest and we only need to deal with the mean and the ACF.
If X is Gaussian, median(x) = mean(x)
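As a quick numerical illustration (not from the slides), the following Python sketch checks these three minimizers by Monte Carlo on a skewed sample; the sample distribution, the grid, and the function names are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=50_000)  # skewed sample, so mean != median

def expected_loss(grid, x, loss):
    """Monte Carlo estimate of E[L(X - c)] for each candidate forecast c in grid."""
    return np.array([loss(x - c).mean() for c in grid])

grid = np.linspace(0.0, 6.0, 601)
tau = 0.25
losses = {
    "quadratic":   lambda e: e ** 2,
    "absolute":    lambda e: np.abs(e),
    "check(0.25)": lambda e: np.where(e >= 0, tau * e, (tau - 1) * e),
}

for name, loss in losses.items():
    c_star = grid[np.argmin(expected_loss(grid, x, loss))]
    print(f"{name:12s} minimizer ~ {c_star:.3f}")

print("mean         :", x.mean())
print("median       :", np.median(x))
print("0.25-quantile:", np.quantile(x, tau))
```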
Definition - Best Linear Prediction (BLP) for a Stationary TS
Definition 3.1
Let $\{X_t\}$ be a stationary time series with data $X_1,\ldots,X_n$. The aim is to forecast $X_{n+m}$; its forecast based on $X_1,\ldots,X_n$ is denoted by $X^n_{n+m}$.
$X^n_{n+m}$ is the best linear m-step-ahead predictor for $X_{n+m}$ if it is linear in $X_1,\ldots,X_n$, i.e., $X^n_{n+m} = b_0 + \sum_{j=1}^n b_j X_j$ for some coefficients, and the coefficients are given by
$$(b_0, b_1, \ldots, b_n) = \operatorname*{argmin}_{b_0,b_1,\ldots,b_n} E\Big[\big(X_{n+m} - b_0 - \sum_{j=1}^n b_j X_j\big)^2\Big].$$
The mean square m-step-ahead prediction error is given by
$$P^n_{n+m} = E\Big[\big(X_{n+m} - b_0 - \sum_{j=1}^n b_j X_j\big)^2\Big].$$
Property - Prediction for a Stationary TS
We differentiate with respect to $b_0,\ldots,b_n$ and set the result equal to zero. This leads to (with $X_0 \equiv 1$)
$$E\Big[X_k\big(X_{n+m} - b_0 - \sum_{j=1}^n b_j X_j\big)\Big] = 0, \quad k = 0,\ldots,n.$$
The case $k = 0$ gives $b_0 = \mu_X\big(1 - \sum_{j=1}^n b_j\big)$. Hence, we can rewrite the above for $k = 1,\ldots,n$ as
$$E\Big[X_k\Big((X_{n+m} - \mu_X) - \sum_{j=1}^n b_j (X_j - \mu_X)\Big)\Big] = 0.$$
Using the autocovariance function $\gamma_X$, we rewrite this as
$$\gamma_X(n+m-k) - \sum_{j=1}^n b_j \gamma_X(j-k) = 0, \quad k = 1,\ldots,n.$$
Property - Prediction for a Stationary TS
Let us write the previous equation system in matrix notation. Let $\gamma_n = (\gamma_X(n+m-1),\ldots,\gamma_X(m))^\top$, $b_n = (b_1,\ldots,b_n)^\top$, and $\Gamma_n = (\gamma_X(i-j))_{i,j=1,\ldots,n}$. Then, we have
$$\gamma_n = \Gamma_n b_n.$$
If $\Gamma_n$ is invertible, we can solve for $b_n = \Gamma_n^{-1}\gamma_n$.
Usually, $\Gamma_n$ is invertible. Otherwise, we would achieve perfect prediction.²
²Since $\Gamma_n$ is non-negative definite by definition, non-invertibility means $\lambda_{\min}(\Gamma_n) = 0$. This implies there exists some $a = (a_1,\ldots,a_n) \neq 0$ such that $a^\top \Gamma_n a = 0$. Wlog (by stationarity), let $a_n \neq 0$. Then, we can normalize $a$ by dividing by $a_n$ and we obtain
$$\operatorname{var}\Big(X_n - \sum_{t=1}^{n-1}(-a_t/a_n)\,X_t\Big) = 0.$$
Hence, we can perfectly predict $X_n$ by $X_1,\ldots,X_{n-1}$.
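As an illustration of solving $\gamma_n = \Gamma_n b_n$ in practice, here is a minimal Python/numpy sketch. It assumes the ACF is known exactly and uses the AR(1) autocovariance $\gamma(h) = \sigma^2 a^{|h|}/(1-a^2)$ as a stand-in; the function and variable names are illustrative.

```python
import numpy as np

def ar1_acvf(h, a=0.7, sigma2=1.0):
    """Autocovariance of a stationary AR(1): gamma(h) = sigma2 * a^|h| / (1 - a^2)."""
    return sigma2 * a ** np.abs(h) / (1 - a ** 2)

n, m = 10, 1          # n observations, m-step-ahead forecast
idx = np.arange(n)

# Gamma_n = (gamma(i - j))_{i,j=1..n} and gamma_n = (gamma(n+m-1), ..., gamma(m))
Gamma_n = ar1_acvf(idx[:, None] - idx[None, :])
gamma_n = ar1_acvf(np.arange(n + m - 1, m - 1, -1))

b_n = np.linalg.solve(Gamma_n, gamma_n)   # coefficients b_1, ..., b_n
P = ar1_acvf(0) - gamma_n @ b_n           # mean square prediction error gamma(0) - gamma' Gamma^{-1} gamma

print("b_n:", np.round(b_n, 4))   # for an AR(1): only the weight on X_n is non-zero (= a^m)
print("P  :", round(P, 4))        # for m = 1 this equals the innovation variance sigma2
```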
Property - Prediction for a Stationary TS
Additionally, we have for the mean square m-step-ahead prediction error
$$P^n_{n+m} = E\Big[\big(X_{n+m} - b_0 - \sum_{j=1}^n b_j X_j\big)^2\Big] = E\Big[\big((X_{n+m} - \mu) - \sum_{j=1}^n b_j (X_j - \mu)\big)^2\Big]$$
$$= \operatorname{cov}\Big(X_{n+m} - \sum_{j=1}^n b_j X_j,\ X_{n+m} - \sum_{j=1}^n b_j X_j\Big)$$
$$= \gamma_X(0) - 2\sum_{j=1}^n b_j \gamma_X(n+m-j) + \sum_{j_1,j_2=1}^n b_{j_1} b_{j_2} \gamma_X(j_1 - j_2)$$
$$= \gamma_X(0) - 2 b_n^\top \gamma_n + b_n^\top \Gamma_n b_n = \gamma_X(0) - \gamma_n^\top \Gamma_n^{-1} \gamma_n.$$
Since $\Gamma_n$ is positive definite, $\gamma_n^\top \Gamma_n^{-1} \gamma_n$ is non-negative and non-decreasing in $n$.
=⇒ $P^n_{n+m}$ does not increase when more observations are used (on average, and without estimation error considered, more information cannot lead to worse forecasts).
Remark - Prediction for a Stationary TS
Suppose we have daily historic data $X_1,\ldots,X_n$, where $X_n$ is today's observation. We want to forecast tomorrow, i.e., we do a one-step-ahead prediction $X^n_{n+1}$. For this, we solve $\gamma_n = \Gamma_n b_n$.
Then, the next day, we have data $X_1,\ldots,X_{n+1}$ and again want to forecast tomorrow, i.e., the one-step-ahead prediction $X^{n+1}_{n+2}$. So, we solve $\gamma_{n+1} = \Gamma_{n+1} b_{n+1}$.
This seems computationally costly. Every time, we need to invert a big matrix.
Can we do some update step instead?
There exists an update procedure called the Durbin-Levinson algorithm.
First, let us rewrite our equation system a bit. Let $\gamma_n = (\gamma(1),\ldots,\gamma(n))^\top$ (so we reorder the equations: last comes first, first comes last) and let $\phi_n = (\phi_{n,1},\ldots,\phi_{n,n})^\top$ such that
$$\Gamma_n \phi_n = \gamma_n.$$
That means the best one-step-ahead predictor for $X_{n+1}$ is given by
$$X^n_{n+1} = \sum_{j=1}^n \phi_{n,n+1-j} X_j = \sum_{j=1}^n \phi_{n,j} X_{n+1-j}.$$
Property - The Durbin-Levinson Algorithm
Let $\rho(h) = \gamma(h)/\gamma(0)$ be the ACF of our stationary time series. Set $\phi_{0,0} = 0$ and $P^0_1 = \gamma(0)$.
Then, for $n \geq 1$, we set
$$\phi_{n,n} = \frac{\rho(n) - \sum_{k=1}^{n-1} \phi_{n-1,k}\,\rho(n-k)}{1 - \sum_{k=1}^{n-1} \phi_{n-1,k}\,\rho(k)}, \qquad P^n_{n+1} = P^{n-1}_{n}\,\big(1 - \phi_{n,n}^2\big),$$
where, for $n \geq 2$,
$$\phi_{n,k} = \phi_{n-1,k} - \phi_{n,n}\,\phi_{n-1,n-k}, \quad k = 1,2,\ldots,n-1.$$
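To make the recursion concrete, here is a minimal Python sketch (assuming numpy; the function name and the AR(1) sanity check are illustrative, not part of the slides).

```python
import numpy as np

def durbin_levinson(rho, gamma0):
    """Durbin-Levinson recursion.

    rho    : autocorrelations rho(1), ..., rho(N)
    gamma0 : gamma(0), the variance of the series
    Returns phi (phi[n-1, :n] holds phi_{n,1}, ..., phi_{n,n}) and the
    one-step-ahead prediction errors P^n_{n+1} for n = 0, ..., N.
    """
    N = len(rho)
    phi = np.zeros((N, N))
    P = np.zeros(N + 1)
    P[0] = gamma0                                   # P^0_1 = gamma(0)

    for n in range(1, N + 1):
        if n == 1:
            phi[0, 0] = rho[0]                      # phi_{1,1} = rho(1)
        else:
            num = rho[n - 1] - phi[n - 2, :n - 1] @ rho[n - 2::-1]
            den = 1.0 - phi[n - 2, :n - 1] @ rho[:n - 1]
            phi[n - 1, n - 1] = num / den
            # phi_{n,k} = phi_{n-1,k} - phi_{n,n} * phi_{n-1,n-k}
            phi[n - 1, :n - 1] = (phi[n - 2, :n - 1]
                                  - phi[n - 1, n - 1] * phi[n - 2, n - 2::-1])
        P[n] = P[n - 1] * (1.0 - phi[n - 1, n - 1] ** 2)
    return phi, P

# Sanity check with an AR(1): rho(h) = a^h, so phi_{n,1} = a and phi_{n,k} = 0 for k > 1.
a = 0.7
rho = a ** np.arange(1, 6)
phi, P = durbin_levinson(rho, gamma0=1.0 / (1 - a ** 2))
print(np.round(phi, 3))
print(np.round(P, 3))
```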
How do we implement the BLP?
Given stationary data $X_1,\ldots,X_n$, how do we implement the BLP?
In general, we only have access to the data but we do not know the ACF.
We can estimate the ACF by γˆ(0), . . . , γˆ(n − 1).
What is the problem? Actually, two problems:
First, estimating $\hat\gamma(0),\ldots,\hat\gamma(n-1)$ (and $\hat\mu = \bar X_n$) means we are estimating $n+1$ parameters using only $n$ observations. Even if we assume $\mu = 0$, we clearly overfit.
Second, given data $X_1,\ldots,X_n$, we have $\hat\gamma(n-1) = \frac{1}{n}(X_1 - \bar X_n)(X_n - \bar X_n)$. That means $\hat\gamma(n-1)$ uses only one pair of observations and, consequently, is most likely a bad estimator.
What can we do?
How do we implement the BLP?
What can we do?
One option: use only the data $X_{n-k},\ldots,X_n$, so we only need to estimate (using all observations) $\hat\gamma(0),\ldots,\hat\gamma(k)$ (for a one-step-ahead forecast).
Other option: we model the dependency of $\{X_t\}$, e.g., using AR(p) or MA(q) models.
Objective: Come up with some model which describes the dependency such that the remaining randomness is a white noise.
E.g., we model $\{X_t\}$ by an AR(1) model
$$X_t = aX_{t-1} + \varepsilon_t.$$
Since a possible mean of the time series has already been removed in the detrending step, we work in the following under the assumption $E[X_t] = 0$.
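As a rough sketch of the first option above (plug the sample ACF of only the last $k$ lags into the prediction equations), the following Python code estimates $\hat\gamma(0),\ldots,\hat\gamma(k)$ and forms a one-step-ahead forecast from the last $k$ observations. The simulated AR(1) data and the helper names are purely illustrative; in practice $x$ would be the (detrended) series.

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Biased sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[:n - h], xc[h:]) / n for h in range(max_lag + 1)])

def truncated_blp_one_step(x, k):
    """One-step-ahead BLP using only the last k observations and gamma_hat(0..k)."""
    g = sample_acvf(x, k)
    Gamma = np.array([[g[abs(i - j)] for j in range(k)] for i in range(k)])
    gamma = g[1:k + 1]                    # (gamma_hat(1), ..., gamma_hat(k))
    phi = np.linalg.solve(Gamma, gamma)   # phi_{k,1}, ..., phi_{k,k}
    xc = x - x.mean()
    # X_hat_{n+1} = mean + sum_j phi_{k,j} (X_{n+1-j} - mean)
    return x.mean() + phi @ xc[-1:-k - 1:-1]

# Illustration with a simulated AR(1) path.
rng = np.random.default_rng(1)
a, n = 0.7, 500
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = a * x[t - 1] + eps[t]

print("forecast:", truncated_blp_one_step(x, k=5))
print("a * X_n :", a * x[-1])   # what the true one-step BLP of this AR(1) would use
```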
ARIMA models
Recall, the autocovariance describes the linear dependency of the time series.
Modeling linear dependency can be done best by linear models (e.g., autoregressive models or moving average models).
The AutoRegressive Integrated Moving Average (ARIMA) model is a combination of
• Autoregressive (AR) models
• Moving Average models (MA)
• Difference Operator (Integrated can be understood as the inverse of differencing, just as integration is the inverse of differentiation); see the sketch below
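As a small illustration of the "Integrated" part (a Python/numpy sketch with an illustrative random-walk example, not from the slides): differencing removes the unit root, and cumulative summation ("integration") undoes it, which is how forecasts made on the differenced scale are mapped back to the original scale.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
eps = rng.normal(size=n)
y = np.cumsum(0.3 + eps)          # random walk with drift: y_t = y_{t-1} + 0.3 + eps_t

# Difference operator: x_t = (1 - B) y_t = y_t - y_{t-1}
x = np.diff(y)                    # this series is stationary (0.3 + eps_t)

# "Integration" = inverse of differencing: recover y from y_1 and the differences
y_rebuilt = np.concatenate(([y[0]], y[0] + np.cumsum(x)))
print(np.allclose(y, y_rebuilt))  # True
```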
Applied Time Series Analysis - ARIMA Models
1 Forecast
2 ARMA models
3 Autocorrelation and Partial Autocorrelation
4 Estimation and Forecast with ARMA
5 ARIMA models
6 Regression with Autocorrelated Errors
7 Multiplicative Seasonal ARIMA Models
Definition - Linear Filters as Polynomial
Recall that $B$ is the backshift operator, and let us define, with coefficients $c_j \in \mathbb{R}$, $j \in \mathbb{Z}$, satisfying $\sum_{j\in\mathbb{Z}} |c_j| < \infty$, a linear filter for the time series $\{X_t\}$. That is, we have
$$Y_t = \sum_{j\in\mathbb{Z}} c_j X_{t-j} = \sum_{j\in\mathbb{Z}} c_j B^j X_t.$$
Definition 3.2
Let $c_j \in \mathbb{R}$, $j \in \mathbb{Z}$, be coefficients satisfying $\sum_{j\in\mathbb{Z}} |c_j| < \infty$. Then, we call the power series (roughly speaking, polynomial) $C(z)$ a linear filter. It is defined by
$$C(z) = \sum_{j\in\mathbb{Z}} c_j z^j,$$
where $z$ can be a complex number.
The backshift operator can be treated as a complex number of modulus one, and we apply a linear filter to a time series by inserting the backshift operator $B$ for $z$, that is, $C(B) = \sum_{j\in\mathbb{Z}} c_j B^j$.
Definition - Autoregressive Polynomial and Operator
Definition 3.3
The autoregressive polynomial of order $p$ with coefficients $\phi_1, \phi_2, \ldots, \phi_p \in \mathbb{R}$, $\phi_p \neq 0$,
is defined as
$$\phi(z) = 1 - \sum_{j=1}^p \phi_j z^j,$$
where $z$ can be any complex number. The corresponding autoregressive operator is
defined as $\phi(B) = 1 - \sum_{j=1}^p \phi_j B^j$. Then, we can write an AR(p) model as
$$\phi(B)X_t = \varepsilon_t.$$
Remark - Autoregressive Polynomial and Operator
Let $\phi(z) = 1 - 0.5z$ and $\phi(B)X_t = \varepsilon_t$. If $\phi(z)^{-1} = 1/\phi(z)$ exists, we could obtain $\phi(B)^{-1}\phi(B)X_t = X_t = \phi(B)^{-1}\varepsilon_t$.
What does $\phi(z)^{-1}$ look like?
We have $\phi(z) = 0 \iff z = 2$. Hence, $\phi(z) \neq 0$ for all $|z| < 2$. That means $1/\phi(z) = 1/(1 - 0.5z)$ is well defined for all $|z| < 2$.
Recall the geometric series: for any $|\rho| < 1$ we have $\sum_{j=0}^\infty \rho^j = 1/(1-\rho)$. For any $|z| < 2$ we have $|0.5z| < 1$. Hence, $1/\phi(z) = 1/(1 - 0.5z) = \sum_{j=0}^\infty 0.5^j z^j$.
That gives us
$$X_t = \phi(B)^{-1}\varepsilon_t = \sum_{j=0}^\infty 0.5^j B^j \varepsilon_t = \sum_{j=0}^\infty 0.5^j \varepsilon_{t-j}.$$
=⇒ We can write this autoregressive process as an MA(∞) process. MA(∞) processes with absolutely summable coefficients are stationary, hence this autoregressive process is stationary.
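The following Python sketch (illustrative, assuming numpy) checks this numerically: the AR(1) recursion with $a = 0.5$ and the truncated MA(∞) sum $\sum_{j=0}^{J} 0.5^j \varepsilon_{t-j}$ produce essentially the same path once the truncation level $J$ (an assumption of the sketch) is moderately large.

```python
import numpy as np

rng = np.random.default_rng(3)
n, a = 300, 0.5
eps = rng.normal(size=n)

# AR(1) recursion X_t = 0.5 X_{t-1} + eps_t (started at zero, so there is a small burn-in effect)
x_ar = np.zeros(n)
for t in range(1, n):
    x_ar[t] = a * x_ar[t - 1] + eps[t]

# Truncated MA(infinity): X_t ~ sum_{j=0}^{J} 0.5^j eps_{t-j}
J = 30
x_ma = np.array([sum(a ** j * eps[t - j] for j in range(min(J, t) + 1)) for t in range(n)])

print(np.max(np.abs(x_ar[50:] - x_ma[50:])))  # tiny, since 0.5^30 is negligible
```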
Remark - Autoregressive Operator
The argument for $\phi(z) = 1 - 0.5z$ clearly holds for any $\phi(z) = 1 - \phi_1 z$ with $|\phi_1| < 1$. What about AR(p) polynomials $\phi(z) = 1 - \sum_{j=1}^p \phi_j z^j$? What does $\phi(z)^{-1}$ look like?
$\phi(z)$ is a polynomial of order $p$ =⇒ it has $p$ (possibly complex) roots. Let the roots be $c_1,\ldots,c_p$.
$$\phi(z) = \prod_{j=1}^p \Big(1 - \frac{z}{c_j}\Big) \quad\text{and}\quad \phi(z)^{-1} = \prod_{j=1}^p \Big(1 - \frac{z}{c_j}\Big)^{-1}.$$
We already know that if $|1/c_j| < 1$, we can write $\big(1 - \frac{z}{c_j}\big)^{-1} = \sum_{s=0}^\infty \big(\frac{1}{c_j}\big)^s z^s$.
Hence, if $|c_j| > 1$ for all $j = 1,\ldots,p$, we obtain
$$\phi(z)^{-1} = \prod_{j=1}^p \sum_{s=0}^\infty \Big(\frac{1}{c_j}\Big)^s z^s = \sum_{k=0}^\infty \theta_k z^k$$
for some coefficients $\theta_k$ (which can be obtained by comparing coefficients).
Property – Causal Autoregressive Processes
Let $\{X_t\}$ be an autoregressive process of order $p$ given by
$$X_t = \sum_{j=1}^p \phi_j X_{t-j} + \varepsilon_t,$$
where $\{\varepsilon_t\}$ is some white noise, and let $\phi(z) = 1 - \sum_{j=1}^p \phi_j z^j$ be the corresponding autoregressive polynomial.
If $\phi(z) \neq 0$ for all $|z| \leq 1$, i.e., all roots of $\phi(z)$ are greater than 1 in modulus, we have that
$$X_t = \sum_{k=0}^\infty \theta_k \varepsilon_{t-k}$$
for some coefficients $\{\theta_k\}$. This implies that the autoregressive process is stationary. We call $X_t = \sum_{k=0}^\infty \theta_k \varepsilon_{t-k}$ the stationary solution of the autoregressive process. Since this solution depends only on current and past $\varepsilon_t$, i.e., it depends on $\varepsilon_s$, $s \leq t$, we call the solution causal.
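In practice one can check this causality condition by computing the roots of $\phi(z)$ numerically. A minimal sketch (assuming Python/numpy; the helper name is_causal is illustrative):

```python
import numpy as np

def is_causal(phi):
    """Check whether phi(z) = 1 - phi_1 z - ... - phi_p z^p has all roots
    strictly outside the unit circle (the causality condition)."""
    # np.roots expects coefficients from the highest power down to the constant:
    # coefficients of z^p, ..., z^1, z^0 are -phi_p, ..., -phi_1, 1.
    coeffs = np.concatenate((-np.asarray(phi, dtype=float)[::-1], [1.0]))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0)), roots

print(is_causal([0.5]))          # X_t = 0.5 X_{t-1} + eps_t  -> causal (root z = 2)
print(is_causal([1.0]))          # random walk                -> not causal (root z = 1)
print(is_causal([1.5, -0.75]))   # the AR(2) example used later in this section
```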
Remark – Autoregressive Operator
What happens in the other cases? Does $\phi(z)^{-1}$ exist? We already know that
$X_t = X_{t-1} + \varepsilon_t$ is the random walk, and this is not stationary.
This AR process has the polynomial
φ(z) = 1 − z
which leads to the root z = 1.
This also holds for AR(p) processes: if $\phi(z)$ has a root on the unit circle (e.g. $z = 1$), we have no stationary solution.
Remark – Autoregressive Operator
What about $X_t = 2X_{t-1} + \varepsilon_t$?
We have the polynomial $\phi(z) = 1 - 2z$ and this has the root $z = 0.5$. We can write
$$\phi(z) = 1 - 2z = 2z\Big(\frac{1}{2z} - 1\Big), \qquad \phi(z)^{-1} = \frac{1}{2z}\cdot\frac{-1}{1 - 1/(2z)}.$$
We have $|1/(2z)| < 1$ for $|z| > 0.5$, so we can use the geometric series argument again:
$$\phi(z)^{-1} = -\frac{1}{2z}\cdot\frac{1}{1 - 1/(2z)} = -\frac{1}{2z}\sum_{j=0}^\infty \Big(\frac{1}{2z}\Big)^j = -\sum_{j=1}^\infty \Big(\frac{1}{2}\Big)^j z^{-j}.$$
That gives us
$$X_t = \phi(B)^{-1}\varepsilon_t = -\sum_{j=1}^\infty \Big(\frac{1}{2}\Big)^j B^{-j}\varepsilon_t = -\sum_{j=1}^\infty \Big(\frac{1}{2}\Big)^j \varepsilon_{t+j}.$$
We are going into the future! We can generalize this to AR(p) processes.
Property – Stationary Solutions of Autoregressive Processes
Let $\{X_t\}$ be an autoregressive process of order $p$ given by
$$X_t = \sum_{j=1}^p \phi_j X_{t-j} + \varepsilon_t,$$
where $\{\varepsilon_t\}$ is some white noise, and let $\phi(z) = 1 - \sum_{j=1}^p \phi_j z^j$ be the corresponding autoregressive polynomial.
If $\phi(z) \neq 0$ for all $|z| = 1$, we have that
$$X_t = \sum_{k\in\mathbb{Z}} \theta_k \varepsilon_{t-k}$$
for some coefficients $\{\theta_k\}$. This implies that the autoregressive process is stationary. We call $X_t = \sum_{k\in\mathbb{Z}} \theta_k \varepsilon_{t-k}$ the stationary solution of the autoregressive process. If all roots of $\phi(z)$ are greater than 1 in modulus, the solution is causal. If all roots are less than 1 in modulus, the solution depends only on future $\varepsilon_t$'s.
Definition – Moving Average Polynomial and Operator
Definition 3.4
The moving average polynomial of order $q$ with coefficients $\theta_0 = 1, \theta_1,\ldots,\theta_q \in \mathbb{R}$, $\theta_q \neq 0$, is defined as
$$\theta(z) = \sum_{j=0}^q \theta_j z^j,$$
where $z$ can be any complex number. The corresponding moving average operator is
defined as $\theta(B) = 1 + \sum_{j=1}^q \theta_j B^j$.
Then, we can write an MA(q) model for some white noise $\{\varepsilon_t\}$ with variance $\sigma_\varepsilon^2$ as
$$X_t = \theta(B)\varepsilon_t.$$
We use $\theta_0 = 1$ for normalization purposes.
Remark – Moving Average Operator
The same arguments used for the autoregressive operator apply to the moving average operator.
That means, if $\theta(z) \neq 0$ for all $|z| \leq 1$, we can invert $\theta(z)$ such that $X_t = \theta(B)\varepsilon_t$ becomes $\theta^{-1}(B)X_t = \varepsilon_t$.
That means we can write such a moving average process as an AR(∞) process, i.e.,
$$X_t = \sum_{j=1}^\infty \phi_j X_{t-j} + \varepsilon_t.$$
We call this property invertibility of an MA process.
If the MA polynomial has roots of modulus less than 1, then the autoregression is on future values.
Definition – AutoRegressive Moving Average(ARMA) Models
Definition 3.5
We call a process $\{X_t\}$ an AutoRegressive Moving Average process of order $(p,q)$, in short ARMA(p,q), if for coefficients $\phi_1,\phi_2,\ldots,\phi_p \in \mathbb{R}$, $\phi_p \neq 0$, and $\theta_1,\theta_2,\ldots,\theta_q \in \mathbb{R}$, $\theta_q \neq 0$, we have
$$X_t = \sum_{j=1}^p \phi_j X_{t-j} + \sum_{s=1}^q \theta_s \varepsilon_{t-s} + \varepsilon_t,$$
where $\varepsilon_t$ is a white noise with variance $\sigma_\varepsilon^2$.
We can write this with AR polynomial $\phi(z) = 1 - \sum_{j=1}^p \phi_j z^j$ and MA polynomial $\theta(z) = 1 + \sum_{j=1}^q \theta_j z^j$ as
$$\phi(B)X_t = \theta(B)\varepsilon_t.$$
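A minimal way to simulate an ARMA(p,q) path is to iterate the defining recursion with a burn-in period. The following Python sketch (illustrative, assuming numpy; libraries such as statsmodels also offer ready-made ARMA generators) does exactly that; the parameter values are arbitrary examples.

```python
import numpy as np

def simulate_arma(phi, theta, n, sigma=1.0, burn=500, seed=0):
    """Simulate X_t = sum_j phi_j X_{t-j} + sum_s theta_s eps_{t-s} + eps_t."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    eps = rng.normal(scale=sigma, size=n + burn)
    x = np.zeros(n + burn)
    for t in range(max(p, q), n + burn):
        ar_part = sum(phi[j] * x[t - j - 1] for j in range(p))
        ma_part = sum(theta[s] * eps[t - s - 1] for s in range(q))
        x[t] = ar_part + ma_part + eps[t]
    return x[burn:]   # drop the burn-in so the arbitrary start-up values are forgotten

# ARMA(1,1): X_t = 0.7 X_{t-1} + 0.4 eps_{t-1} + eps_t
x = simulate_arma(phi=[0.7], theta=[0.4], n=1000)
print(x.mean(), x.std())
```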
Remark – ARMA Models
Let $\{X_t\}$ be an ARMA(p,q) process with AR polynomial $\phi(z) = 1 - \sum_{j=1}^p \phi_j z^j$ and MA polynomial $\theta(z) = 1 + \sum_{j=1}^q \theta_j z^j$.
We can apply our knowledge of AR polynomials. Hence, if $\phi(z) \neq 0$ for all $|z| = 1$, we have
$$X_t = \phi^{-1}(B)\theta(B)\varepsilon_t.$$
Hence, $\{X_t\}$ is stationary and we call $X_t = \phi^{-1}(B)\theta(B)\varepsilon_t$ again the stationary solution
of the ARMA(p,q) process.
If $\phi(z) \neq 0$ for all $|z| \leq 1$, the solution is again causal.
The same applies to the MA polynomial: if $\theta(z) \neq 0$ for all $|z| \leq 1$, we can write the ARMA process as an AR(∞) process, i.e., we have invertibility.
Remark – ARMA Models and Redundancy
Suppose we have the ARMA process $\{X_t\}$ given by
$$X_t = 0.5X_{t-1} - 0.5\varepsilon_{t-1} + \varepsilon_t.$$
This results in the AR polynomial $\phi(z) = 1 - 0.5z$ with root $z = 2$ and the MA polynomial $\theta(z) = 1 - 0.5z$.
Let's look at the stationary solution
$$X_t = \frac{\theta(B)}{\phi(B)}\varepsilon_t = \frac{1 - 0.5B}{1 - 0.5B}\varepsilon_t = \varepsilon_t.$$
This ARMA(1,1) process is a white noise! We have redundancy! What happened? The AR and MA polynomials have a common root.
Keep it simple! We want to avoid ARMA models with common roots. Keep this in mind when fitting ARMA models.
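One practical way to spot such redundancy is to compare the roots of the AR and MA polynomials; (nearly) common roots indicate that the model can be simplified. A small illustrative check in Python/numpy for the example above (for higher orders one would sort or match the roots before comparing):

```python
import numpy as np

# ARMA(1,1) from the example: phi(z) = 1 - 0.5 z, theta(z) = 1 - 0.5 z
ar_roots = np.roots([-0.5, 1.0])   # coefficients of z^1, z^0
ma_roots = np.roots([-0.5, 1.0])

print("AR roots:", ar_roots)       # [2.]
print("MA roots:", ma_roots)       # [2.] -> common root, the ARMA(1,1) collapses to white noise
print(np.isclose(ar_roots, ma_roots, atol=1e-8))
```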
Remark – Why do we care about AR, MA and ARMA processes?
If $\{X_t\}$ is a stationary time series and $\sum_{h\in\mathbb{Z}} |\gamma(h)| < \infty$, then we can write
$$X_t = \sum_{j=0}^\infty b_j \varepsilon_{t-j},$$
where $\{\varepsilon_t\}$ is some white noise process.
This is called the Wold decomposition. {εt} does not need to be iid.
E.g., you have a two sided linear process with iid noise, then there is a one-sided MA representation with a possibly different noise which does not need to be iid.
If the autocovariance function is not only non-negative definite but positive definite, there also exists an AR(∞) representation
$$X_t = \sum_{j=1}^\infty a_j X_{t-j} + \varepsilon_t.$$
It is with respect to the same white noise as the MA(∞) representation.
These representations imply that any such autocovariance structure can be approximated by finite MA, AR, or ARMA models. However, the orders of the models can be large.
Applied Time Series Analysis - ARIMA Models
1 Forecast
2 ARMA models
3 Autocorrelation and Partial Autocorrelation
4 Estimation and Forecast with ARMA
5 ARIMA models
6 Regression with Autocorrelated Errors
7 Multiplicative Seasonal ARIMA Models
Property - Autocovariance Function of ARMA Models
Let $\{X_t\}$ be an ARMA(p,q) process with AR polynomial $\phi(z) = 1 - \sum_{j=1}^p \phi_j z^j$, $\phi(z) \neq 0$ for all $|z| \leq 1$, MA polynomial $\theta(z) = 1 + \sum_{j=1}^q \theta_j z^j$, and corresponding white noise $\{\varepsilon_t\}$ with variance $\sigma_\varepsilon^2$.
We know this ARMA process possesses a (causal) stationary solution
$$X_t = \phi^{-1}(B)\theta(B)\varepsilon_t = \sum_{j=0}^\infty \psi_j \varepsilon_{t-j}.$$
Hence, for the autocovariance function of $\{X_t\}$ we have for $h \geq 0$ (using our knowledge of the ACF of linear processes)
$$\gamma_X(h) = \operatorname{cov}\Big(\sum_{j_1=0}^\infty \psi_{j_1}\varepsilon_{t+h-j_1},\ \sum_{j_2=0}^\infty \psi_{j_2}\varepsilon_{t-j_2}\Big) = \sigma_\varepsilon^2 \sum_{j=0}^\infty \psi_j \psi_{j+h},$$
and $\gamma_X(h) = \gamma_X(-h)$ for $h < 0$.
Property - Autocorrelation Function of ARMA Models
How does ρX (h) = γX (h)/γX (0) behave for large h?
For MA(q) processes, we know $\gamma_{MA(q)}(h) = 0$ for $|h| > q$ and so $\rho_{MA(q)}(h) = 0$ for $|h| > q$.
Given the previous expression, it seems $\rho_X(h) \neq 0$ for all $h$. How fast does it decline?
Let's look at the AR(1) model $X_t = aX_{t-1} + \varepsilon_t$, $|a| < 1$. Then we have the stationary solution $X_t = \sum_{j=0}^\infty a^j \varepsilon_{t-j}$ and we obtain the ACF (for $h \geq 0$)
$$\gamma_X(h) = \sigma_\varepsilon^2 \sum_{j=0}^\infty a^j a^{j+h} = \frac{a^h \sigma_\varepsilon^2}{1 - a^2}.$$
Hence, $\rho_X(h) = a^h$. That is, the ACF decays geometrically fast with increasing $h$.
We may not have a neat expression for a general ARMA(p,q) process, but our argument with the geometric series still applies. Since the MA polynomial is of finite order $q$, it does not affect the overall decay behavior.
=⇒ The ACF of an ARMA(p,q) process decays geometrically fast with increasing $h$.
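To make this concrete, here is an illustrative Python sketch (assuming numpy) that computes the $\psi_j$ from $\phi(z)\psi(z) = \theta(z)$ by comparing coefficients and then evaluates $\gamma_X(h) = \sigma_\varepsilon^2 \sum_j \psi_j\psi_{j+h}$; for the AR(1) with $a = 0.7$ the resulting ACF indeed decays like $0.7^h$. The function names and truncation length are assumptions of the sketch.

```python
import numpy as np

def psi_weights(phi, theta, n_weights=200):
    """MA(infinity) coefficients psi_j of a causal ARMA, from phi(z) psi(z) = theta(z).

    phi   : [phi_1, ..., phi_p]
    theta : [theta_1, ..., theta_q] (theta_0 = 1 is added internally)
    """
    phi = list(phi)
    theta = [1.0] + list(theta)
    psi = np.zeros(n_weights)
    for j in range(n_weights):
        val = theta[j] if j < len(theta) else 0.0
        # psi_j = theta_j + sum_{k=1}^{min(j,p)} phi_k psi_{j-k}
        val += sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, len(phi)) + 1))
        psi[j] = val
    return psi

def acvf(h, psi, sigma2=1.0):
    """gamma(h) = sigma2 * sum_j psi_j psi_{j+h} (truncated at len(psi))."""
    return sigma2 * np.sum(psi[:len(psi) - h] * psi[h:])

psi = psi_weights(phi=[0.7], theta=[])        # AR(1) with a = 0.7
gamma = np.array([acvf(h, psi) for h in range(11)])
rho = gamma / gamma[0]
print(np.round(rho, 4))                       # approximately 0.7^h: geometric decay
```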
Example - ACF of an AR(2) and ARMA(4, 2)
The geometric decay describes only the long-run behavior.
For shorter horizons, we can have very different patterns. Let us look at
$$X_t = 1.5X_{t-1} - 0.75X_{t-2} + \varepsilon_t,$$
$$Y_t = 1.34Y_{t-1} - 1.88Y_{t-2} + 1.32Y_{t-3} - 0.8Y_{t-4} + \varepsilon_t + 0.71\varepsilon_{t-1} + 0.25\varepsilon_{t-2},$$
where $\varepsilon_t$ is iid standard normal.
Example - AR(2) Processes
[Figure: simulated sample path of the AR(2) process {Xt}.]
Example - AR(2) Processes - ACF and MA coefficients
[Figure: autocorrelation of {Xt} and MA coefficients {ψk}, plotted against lag/index 0 to 50.]
Example - ARMA(4, 2) Processes
[Figure: simulated sample path of the ARMA(4, 2) process {Yt}.]
Example - ARMA(4, 2) Processes - ACF and MA coefficients
[Figure: autocorrelation of {Yt} (lags 0 to 200) and MA coefficients {ψk}.]
Example - ACF of an AR(2) and ARMA(4, 2)
To better understand the behavior, we need to look at the roots of the AR polynomials.
AR(2) processes with