QBUS6840 Lecture 7 ARIMA Models (I)
QBUS6840 Lecture 7
ARIMA Models (I)
The University of Sydney Business School
ARIMA Models
Box-Jenkins Method: Part I
Online Textbook Sections 8.1-8.4
(https://otexts.com/fpp2/arima.html); and/or
BOK Ch 9 and Ch 10
Objectives
Conceptually distinguish between a stochastic process and a
time series
Understand the concept of stationarity
Be able to explain ACF/PACF, sample ACF/PACF
Be able to assess stationarity based on sample ACF
Fully understand AR(p) models and their basic properties
Be able to derive ACF, forecast and variance for AR(p) in
some cases
Be able to identify/apply transformations for stabilising time series
Half-semester Recall…
Basic concepts: Forecasting problems, process of forecasting,
time series components, etc.
Time series decomposition: mainly for interpretation, can be
useful for forecasting
Exponential smoothing: can be used for both interpretation
and forecasting
A class of formal statistical time series models, often called ARIMA models, built on a solid mathematical foundation
Can capture complicated underlying patterns in the time series, beyond just trend and seasonality
Can be used as an alternative to, or in conjunction with, other
forecasting techniques such as Exponential Smoothing
Best textbook (in terms of theoretical foundation): Time
Series Analysis: forecasting and control. 1st ed. 1976 (Box
and Jenkins), 5th ed. 2015 (Box, Jenkins, Reinsel, Ljung).
Time Series versus Stochastic Processes
We have discussed many time series so far. Each is a sequence of numbers (sales, production, etc.)
We introduced a number of ways to treat them: Smoothing,
Modelling and Forecasting
We rely on the patterns to decide which models to use and project those patterns into the future as our forecasts.
From now on, we will move further in theory, by considering a
(concrete) time series as a “product” from a “factory”
The factory is called a (stochastic) Process, which is
Y1,Y2,Y3, · · · ,Yt , · · · , · · ·
where each Yt (t = 1, 2, …) is a Random Variable.
When we observe a (concrete) value yt for each Yt , we have
obtained a time series.
Stationarity
The Box-Jenkins method relies heavily on the concept of
stationarity
Definition
A time series process is strictly stationary when the joint
distribution (of the data) does not depend on time. That is, the
joint distribution of
Yt ,Yt+1, . . . ,Yt+k
does not depend on t for any k .
Think about the case of k = 0: For any t, Yt has the same
distribution.
Visually Checking Stationarity
The mean of the series should not be a function of time.
(Figure source: http://www.blackarbs.com/blog/time-series-analysis-in-python-linear-models-to-garch/11/1/2016)
Visually Checking Stationarity
The variance of the series should not be a function of time.
(Figure source: http://www.blackarbs.com/blog/time-series-analysis-in-python-linear-models-to-garch/11/1/2016)
Visually Checking Stationarity
The covariance of the i-th term and the (i + k)-th term should not
be a function of time.
(Figure source: http://www.blackarbs.com/blog/time-series-analysis-in-python-linear-models-to-garch/11/1/2016)
Stationarity
Illustration
Non-stationarity
Illustration
Australian seasonally adjusted quarterly GDP growth
(1959-2015)
Stationary or non-stationary?
S&P 500 returns
Stationary or non-stationary?
Weak stationarity
Definition
A process {Yt} is weakly stationary if its mean, variance and
covariance functions do not change over time. That is,
E(Yt) = µ,
V(Yt) = σ2,
and for each integer k,
Cov(Yt ,Yt−k) = Cov(Yt ,Yt+k) = γk ,
for all t.
The covariance (and correlation) depends only on the time gap k = t − (t − k), not on the time t itself.
Strict and weak stationarity
If the mean, variance and covariances are finite (which is a
technical point really), then strict stationarity implies weak
stationarity.
If, in addition, the process is Gaussian (normally distributed), weak stationarity implies strict stationarity.
Autocorrelation function (ACF)
Assessing stationarity
Measures the correlation between an observation Yt and its lagged value Yt−k , hence the name autocorrelation
Gives insight into which statistical models best describe the time series data
Box and Jenkins advocate using the ACF and PACF plots to
assess stationarity and identify a suitable model.
We may need to apply a suitable variance stabilising transform
Autocorrelation function (ACF)
Definitions
ρk = E[(Yt − µ)(Yt+(or −)k − µ)] / √(V(Yt) V(Yt+(or −)k)) = Corr(Yt , Yt+(or −)k).
Sample ACF:
rk = Σ_{t=1}^{N−k} (yt+k − ȳ)(yt − ȳ) / Σ_{t=1}^{N} (yt − ȳ)²
What are the values of ρ0 and r0?
What we have done with ρk is to measure the correlation of Y1
and Y1+k , Y2 and Y2+k , etc., where k is called the lag value.
For Sample ACF, we can see that, e.g., when k = 2, we compare
the curve {y1, y2, y3, …., yN−2} with the curve {y3, y4, y5, …, yN}
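A minimal sketch of how rk can be computed directly from this formula, in Python with NumPy; the series y below is made-up illustrative data, not from the lecture:

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag, computed directly
    from the formula on the previous slide."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    ybar = y.mean()
    denom = np.sum((y - ybar) ** 2)                 # sum_{t=1}^{N} (y_t - ybar)^2
    r = []
    for k in range(1, max_lag + 1):
        num = np.sum((y[k:] - ybar) * (y[:N - k] - ybar))   # sum_{t=1}^{N-k}
        r.append(num / denom)
    return np.array(r)

# made-up series: a slow cycle plus noise
y = np.sin(np.linspace(0, 20, 200)) + np.random.default_rng(1).normal(scale=0.3, size=200)
print(sample_acf(y, 5))
```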
Sample ACF
Regression Explanation (optional)
Given a time series {y1, y2, …, yN} and a lag k, consider the
following linear regression
yt+k − ȳ = γ(yt − ȳ) + error (think of it as Y = γX, a regression without intercept)
Consider data set
X y1 − y y2 − y y3 − y · · · yN−k − y
Y y1+k − y y2+k − y y3+k − y · · · yN − y
Then, by the least squares solution,
γ̂ = Σ_{t=1}^{N−k} (yt − ȳ)(yt+k − ȳ) / Σ_{t=1}^{N−k} (yt − ȳ)²,
which is close to rk (they differ only in the upper limit of the sum in the denominator).
Autocorrelation function (ACF)
Standard errors (optional)
Often, we want to test whether or not H0 : ρk = 0, based on
the sample ACF rk . This is done using a t-test
Standard error of rk :
s_{rk} = 1/√N, if k = 1,
s_{rk} = √( (1 + 2 Σ_{j=1}^{k−1} r_j²) / N ), if k > 1.
The t-statistic is defined as t_{rk} = rk / s_{rk}.
Often, we reject the hypothesis H0 : ρk = 0 if |t_{rk}| > 2.
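A small sketch of the corresponding significance check, reusing the sample_acf helper and the made-up series y from the earlier sketch (both names are mine, not from the lecture files):

```python
import numpy as np

def acf_t_stats(r, N):
    """t-statistics t_rk = r_k / s_rk, with s_rk as defined above."""
    t = []
    for k, rk in enumerate(r, start=1):
        if k == 1:
            se = 1.0 / np.sqrt(N)
        else:
            se = np.sqrt((1.0 + 2.0 * np.sum(np.square(r[:k - 1]))) / N)
        t.append(rk / se)
    return np.array(t)

r = sample_acf(y, 10)                          # from the earlier sketch
spikes = np.where(np.abs(acf_t_stats(r, len(y))) > 2)[0] + 1
print("spikes at lags:", spikes)               # lags with |t_rk| > 2
```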
(Sample) ACF Plots
An ACF plot is a bar plot in which the height of the bar at lag k is rk
We say that the plot has a spike at lag k if rk is significantly large, i.e. its t-statistic satisfies |t_{rk}| > 2
The plot cuts off after lag k if there are no spikes at lags
greater than k
We say the ACF plot dies down if the plot doesn’t cut off, but
decreases in a steady fashion.
(Sample) ACF Plots
Behaviour of ACFs
This sample ACF plot has spikes at lags 1, 2 and 3.
(Sample) ACF Plots
Behaviour of ACFs
Assessing stationarity
We can assess the stationarity of {Yt} by assessing its (sample)
ACF plot. In general, it can be shown that for nonseasonal time series:
If the Sample ACF of a nonseasonal time series “cuts off” or
“dies down” reasonably quickly, then the time series should be
considered stationary.
If the Sample ACF of a nonseasonal time series “dies down”
extremely slowly or not at all, then the time series should be
considered nonstationary.
S&P 500 index
(a) Series (b) ACF
Visitor arrivals in Australia
(c) Series (d) ACF
Alcohol related assaults in NSW
(e) Series (f) ACF
Stationary?
Transforming
If the ACF of a time series dies down extremely slowly, data
transformation is necessary
Trying first-order differencing is usually a good first step; see Lecture07 Example01.py and the sketch after this slide
Zt = Yt+1 − Yt , t = 1, …,N − 1
If the ACF for the transformed data {Zt} dies down extremely
slowly, the transformed time series should be considered
nonstationary, and more transformations are needed.
For nonseasonal data, first or second differencing will generally
produce stationary time series values.
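A minimal sketch along the lines of Lecture07 Example01.py (the file itself is not reproduced here); the file name "series.csv" and column name "value" are placeholders, and pandas, matplotlib and statsmodels are assumed to be installed:

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# placeholder data source -- replace with the lecture's dataset
y = pd.read_csv("series.csv")["value"]

# first-order differencing: z_t = y_{t+1} - y_t
z = y.diff().dropna()

# if the ACF of y dies down extremely slowly but the ACF of z cuts off
# or dies down quickly, treat the differenced series as stationary
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=30, ax=axes[0], title="ACF of original series")
plot_acf(z, lags=30, ax=axes[1], title="ACF of differenced series")
plt.tight_layout()
plt.show()
```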
Transforming: original time series
Transforming: differenced time series
Partial ACF
Partial autocorrelations measure the linear dependence between two variables after removing the effect of the other variable(s) that affect both of them.
Yt = ρ10 + ρ11Yt−1 + εt
Yt = ρ20 + ρ21Yt−1 + ρ22Yt−2 + εt
Yt = ρk0 + ρk1Yt−1 + ρk2Yt−2 + . . .+ ρkkYt−k + εt
ρkk is the correlation between Yt and Yt−k , net of effects at
times t − 1, t − 2, . . . , t − k + 1.
For example, the partial autocorrelation of 2nd order measures
the effect (linear dependence) of Yt−2 on Yt after removing
the effect of Yt−1 on both Yt and Yt−2
Partial ACF: Calculation Examples
Each partial autocorrelation could be obtained as a series of
regressions of the form:
Yt ≈ ρ10 + ρ11Yt−1
Yt ≈ ρ20 + ρ21Yt−1 + ρ22Yt−2
Yt ≈ ρk0 + ρk1Yt−1 + ρk2Yt−2 + . . .+ ρkkYt−k
The estimate rkk of ρkk will give the value of the partial
autocorrelation of order k .
By contrast, the ACF coefficient ρk corresponds to the simple regression
Yt = ρ0 + ρkYt−k + εt
which does not control for the intermediate values Yt−k+1, …,Yt−1.
(Sample) Partial ACF: The Formula (optional)
The Sample Partial ACF at lag k is
rkk = r1, if k = 1,
rkk = ( rk − Σ_{j=1}^{k−1} r_{k−1,j} r_{k−j} ) / ( 1 − Σ_{j=1}^{k−1} r_{k−1,j} r_j ), if k = 2, 3, …,
where r_{k,j} = r_{k−1,j} − rkk r_{k−1,k−j} for j = 1, 2, …, k − 1.
The standard error of rkk is 1/√N.
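A short sketch implementing this recursion directly; it takes the sample ACF values r1, ..., rK as input (in practice the pacf function in statsmodels gives essentially the same quantities), and the helper name is mine:

```python
import numpy as np

def sample_pacf(acf_vals):
    """Sample partial ACF r_kk from sample ACF values r_1, ..., r_K,
    using the recursion on this slide."""
    r = np.asarray(acf_vals, dtype=float)    # r[0] holds r_1, r[1] holds r_2, ...
    K = len(r)
    pacf = np.zeros(K)
    phi = np.zeros((K + 1, K + 1))           # phi[k, j] stores r_{k,j}
    pacf[0] = phi[1, 1] = r[0]               # r_11 = r_1
    for k in range(2, K + 1):
        num = r[k - 1] - np.sum(phi[k - 1, 1:k] * r[k - 2::-1])
        den = 1.0 - np.sum(phi[k - 1, 1:k] * r[0:k - 1])
        pacf[k - 1] = phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
    return pacf

# e.g. for an AR(1)-like ACF r_k = 0.7**k, partial ACFs beyond lag 1 are ~0
print(sample_pacf(0.7 ** np.arange(1, 6)))
```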
First Simple Process: White noise processes
A sequence of independently and identically distributed
random variables {εt : t = 1, 2, …} with mean 0 and finite
variance σ2.
yt = εt with εt ∼ N(0, σ2)
ρk = ρkk = 0, for all k ≥ 1.
Is this a stationary time series? Can you expect to capture
any predictable pattern in this time series?
What would the ACF plot look like for a white noise process?
See Lecture07 Example02.py
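A minimal sketch of what Lecture07 Example02.py might look like (the actual file is not reproduced here); statsmodels and matplotlib are assumed to be installed, and the sample size and seed are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# simulate Gaussian white noise
rng = np.random.default_rng(0)
eps = rng.normal(loc=0.0, scale=1.0, size=500)

# the sample ACF should show no significant spikes at any lag k >= 1
plot_acf(eps, lags=30, title="Sample ACF of white noise")
plt.show()
```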
Autoregressive (AR) processes
AR(p) process:
Yt = c + φ1Yt−1 + φ2Yt−2 + . . .+ φpYt−p + εt ,
where εt is i.i.d. with mean zero and variance σ²
Example: AR(1) process
Properties
Yt = c + φ1Yt−1 + εt ,
where εt is i.i.d. with mean zero and variance σ², i.e., {εt} is a white noise process.
Unconditional:
E(Yt) = c + φ1E(Yt−1),
Under the assumption of stationarity E(Yt) = E(Yt−1), so E(Yt) = c / (1 − φ1)
AR(1) process
Properties
Yt = c + φ1Yt−1 + εt ,
V(Yt) = φ1² V(Yt−1) + σ²
Under the assumption of stationarity V(Yt) = V(Yt−1), so
V(Yt) = σ² / (1 − φ1²)
In general, we have
Cov(Yt ,Yt−k) = φ1^k V(Yt)
Example: AR(1) process
Properties
Cov(Yt ,Yt−1) = Cov(c + φ1Yt−1 + εt ,Yt−1)
= Cov(c ,Yt−1) + Cov(φ1Yt−1,Yt−1) + Cov(εt ,Yt−1)
= 0 + φ1V(Yt−1) + 0 = φ1V(Yt−1). Why?
ρ1 = Cov(Yt ,Yt−1) / √(V(Yt)V(Yt−1)) = Cov(Yt ,Yt−1) / V(Yt) = φ1,
where we have used V(Yt) = V(Yt−1).
Example: AR(1) process
Properties
Cov(Yt ,Yt−2) = Cov(c + φ1Yt−1 + εt ,Yt−2)
= Cov(φ1(c + φ1Yt−2 + εt−1),Yt−2)
= φ21V(Yt−2).
Thus, noting that V(Yt−2) = V(Yt−1) = V(Yt),
ρ2 = Cov(Yt ,Yt−2) / V(Yt) = φ1²
… (Similarly)
ρk = Cov(Yt ,Yt−k) / V(Yt) = φ1^k
Example: AR(1) process
What happens to the ACF when −1 < φ1 < 1 and k increases?
What happens when φ1 = 1? See Lecture07 Example03.py.
By the definition of the Partial ACF, it is easy to see that ρkk = 0 for all k > 1.
Example: AR(1) process
φ1 = 0.7: ACF (left) and Partial ACF (right)
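A minimal sketch of how plots like these can be produced (roughly what Lecture07 Example03.py might do; statsmodels is assumed to be installed and the sample size is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# AR(1) with phi_1 = 0.7; ArmaProcess expects the AR polynomial 1 - phi_1 L
phi1 = 0.7
ar1 = ArmaProcess(ar=np.array([1.0, -phi1]), ma=np.array([1.0]))
y = ar1.generate_sample(nsample=1000)

# ACF should die down geometrically (phi_1^k); PACF should cut off after lag 1
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
plot_acf(y, lags=20, ax=axes[0])
plot_pacf(y, lags=20, ax=axes[1])
plt.show()
```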
Example: AR(1) process
When |φ1| < 1, the AR(1) process is weakly stationary
ACF: ρk = φ1^k , k = 0, 1, 2, ...
Partial ACF: ρkk = 0 for all k > 1
How to check if a time series is an AR(1)?
The sample ACF plot dies down in a steady fashion
The sample Partial ACF cuts off after lag 1.
AR(1) process: Forecasting
Yt+1 = c + φ1Yt + εt+1,
where εt is i.i.d. with mean zero and variance σ². Conditional:
Ŷt+1 = E(Yt+1|y1:t) = E(Yt+1|y1, . . . , yt)
= E(Yt+1|yt) = E(c + φ1yt + εt+1|yt)
= c + φ1yt + E(εt+1) = c + φ1yt
How good is the forecast?
V(Yt+1|y1:t) = V(Yt+1|y1, . . . , yt) = V(Yt+1|yt)
= V(c + φ1yt + εt+1|yt)
= 0 + V(εt+1) = σ2
AR(1) process: Forecasting
Two steps-ahead
Ŷt+2 := E(Yt+2|y1:t)
= E(c + φ1Yt+1 + εt+2|y1:t)
= c + φ1E(Yt+1|y1:t)
= c + φ1(c + φ1yt)
= c(1 + φ1) + φ1² yt
V(Yt+2|y1:t) = V(c + φ1Yt+1 + εt+2|y1:t)
= φ1² V(Yt+1|y1:t) + σ²
= (1 + φ1²) σ²
Example: AR(1) process
Forecasting
Ŷt+h = c + φ1Ŷt+h−1
= c(1 + φ1 + φ1² + … + φ1^{h−1}) + φ1^h yt
V(Yt+h|y1:t) = φ1² V(Yt+h−1|y1:t) + σ²
= σ²(1 + φ1² + … + φ1^{2(h−1)})
What happens as h gets larger?
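A quick numerical check of these recursions, with made-up values of c, φ1, σ² and yt; as h grows, the forecast mean approaches c/(1 − φ1) and the forecast variance approaches σ²/(1 − φ1²):

```python
# illustrative values only
c, phi1, sigma2 = 2.0, 0.7, 1.0
y_t = 5.0                                # last observed value (made up)

mean_h, var_h = y_t, 0.0
for h in range(1, 21):
    mean_h = c + phi1 * mean_h           # yhat_{t+h} = c + phi1 * yhat_{t+h-1}
    var_h = phi1 ** 2 * var_h + sigma2   # V(Y_{t+h} | y_{1:t})
    print(h, round(mean_h, 4), round(var_h, 4))

print("limits:", c / (1 - phi1), sigma2 / (1 - phi1 ** 2))
```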
Example: AR(1) process
In-sample fit illustration
The red curve is Ŷt|t−1, t = 2, …,N.
Example: AR(1) process
Forecasting illustration
AR(p) processes
Properties
Yt = c + φ1Yt−1 + . . .+ φpYt−p + εt ,
E(Yt) = c + φ1E(Yt−1) + . . .+ φpE(Yt−p)
Suppose it is stationary, then
E(Yt) = c / (1 − φ1 − φ2 − … − φp)
AR(p) processes
Properties
Yt = c + φ1Yt−1 + . . .+ φpYt−p + εt ,
V(Yt) = V(c + φ1Yt−1 + . . .+ φpYt−p + εt)
Can we continue like this?
V(Yt) = V(c) + V(φ1Yt−1) + . . .+ V(φpYt−p) + V(εt)
NO! Because, in general, the covariances are not zero, e.g.
Cov(Yt−1,Yt−2) ≠ 0
Under the stationary condition, it can be proved that
V(Yt) = σ² / [ (1 − ρ11²)(1 − ρ22²) · · · (1 − ρpp²) ]
Example: AR(2) processes
Properties
Cov(Yt ,Yt−1) = Cov(c + φ1Yt−1 + φ2Yt−2 + εt ,Yt−1)
= φ1V(Yt−1) + φ2Cov(Yt−2,Yt−1)
Under the stationary condition we have Cov(Yt ,Yt−1) = Cov(Yt−2,Yt−1) = γ1, so
γ1 = φ1 V(Yt−1) + φ2 γ1, i.e. γ1 = φ1 V(Yt) / (1 − φ2).
ρ1 = Cov(Yt ,Yt−1) / √(V(Yt)V(Yt−1)) = φ1 / (1 − φ2),
where we have used V(Yt) = V(Yt−1).
Example: AR(2) processes
Properties
Cov(Yt ,Yt−2) = Cov(c + φ1Yt−1 + φ2Yt−2 + εt ,Yt−2)
= φ2V(Yt−2) + φ1Cov(Yt−1,Yt−2)
ρ2 = Cov(Yt ,Yt−2) / √(V(Yt)V(Yt−2)) = φ2 + φ1ρ1,
where we have used V(Yt) = V(Yt−2).
Example: AR(2) processes
Properties
Cov(Yt ,Yt−3) = Cov(c + φ1Yt−1 + φ2Yt−2 + εt ,Yt−3)
= φ1Cov(Yt−1,Yt−3) + φ2Cov(Yt−2,Yt−3)
= φ1ρ2V(Yt−3) + φ2ρ1V(Yt−3).
where we have used ρ2 = Cov(Yt−1,Yt−3)/V(Yt−3) and ρ1 = Cov(Yt−2,Yt−3)/V(Yt−3). Thus
ρ3 = φ1ρ2 + φ2ρ1
and, in general,
ρk = φ1ρk−1 + φ2ρk−2, for k ≥ 2.
AR(p) processes
Properties Summary
The process is defined as
Yt = c + φ1Yt−1 + φ2Yt−2 + · · ·+ φpYt−p + εt
where εt is i.i.d. with mean zero and variance σ². It can be shown that
ACF ρk dies down exponentially.
PACF ρkk cuts off to zero after lag p.
These properties are useful to recognize an AR(p) process.
AR(p) processes
Forecasting
ŷt+h = E(Yt+h|y1:t) = c + φ1E(Yt+h−1|y1:t) + . . .+ φpE(Yt+h−p|y1:t),
E(Yt+h−i |y1:t) = ŷt+h−i if h > i, and yt+h−i if h ≤ i.
For example, consider AR(3),
Yt+1 = c + φ1Yt + φ2Yt−1 + φ3Yt−2 + εt+1
ŷt+1 = c + φ1yt + φ2yt−1 + φ3yt−2
ŷt+2 = c + φ1ŷt+1 + φ2yt + φ3yt−1
ŷt+3 = c + φ1ŷt+2 + φ2ŷt+1 + φ3yt
AR(p) processes
Forecasting
ŷt+1 = c + φ1yt + φ2yt−1 + φ3yt−2
ŷt+2 = c + φ1ŷt+1 + φ2yt + φ3yt−1
= c + φ1(c + φ1yt + φ2yt−1 + φ3yt−2) + φ2yt + φ3yt−1
= c(1 + φ1) + (φ1² + φ2)yt + (φ1φ2 + φ3)yt−1 + φ1φ3yt−2
ŷt+3 = c + φ1ŷt+2 + φ2ŷt+1 + φ3yt
= · · ·
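A small sketch of this recursion in code; the helper name ar_forecast, the coefficients and the data are all made up for illustration (the series must contain at least p observations):

```python
import numpy as np

def ar_forecast(y, c, phi, h):
    """h-step-ahead point forecasts for an AR(p) with intercept c and
    coefficients phi = [phi_1, ..., phi_p], given observed data y."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    history = list(np.asarray(y, dtype=float))   # observations, then forecasts
    forecasts = []
    for _ in range(h):
        # most recent p values: earlier forecasts where needed, observations otherwise
        lags = history[-p:][::-1]                # [y_t, y_{t-1}, ..., y_{t-p+1}]
        yhat = c + float(np.dot(phi, lags))
        forecasts.append(yhat)
        history.append(yhat)
    return forecasts

# AR(3) example with made-up numbers
print(ar_forecast([1.2, 0.8, 1.5, 1.1], c=0.3, phi=[0.5, 0.2, 0.1], h=3))
```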
Finally what about the variance? (optional)