
QBUS6840 Lecture 7 ARIMA Models (I)

QBUS6840 Lecture 7


ARIMA Models (I)

The University of Sydney Business School

ARIMA Models

Box-Jenkins: Part I

Online Textbook Sections 8.1-8.4
(https://otexts.com/fpp2/arima.html); and/or

BOK Ch 9 and Ch 10

https://otexts.com/fpp2/arima.html

Objectives

Conceptually distinguish between a stochastic process and a
time series

Understand the concept of stationarity

Be able to explain ACF/PACF, sample ACF/PACF

Be able to assess stationarity based on sample ACF

Fully understand AR(p) models and their basic properties

Be able to derive ACF, forecast and variance for AR(p) in
some cases

Be able to identify/apply transformations for stabilising time series
Half-semester Recall…

Basic concepts: Forecasting problems, process of forecasting,
time series components, etc.

Time series decomposition: mainly for interpretation, can be
useful for forecasting

Exponential smoothing: can be used for both interpretation
and forecasting

A class of formal statistical time series models, often called
ARIMA models, built on a solid mathematical foundation

Can capture complicated underlying patterns in the time
series, beyond just trend and seasonality

Can be used as an alternative to, or in conjunction with, other
forecasting techniques such as Exponential Smoothing

Best textbook (in terms of theoretical foundation): Time
Series Analysis: forecasting and control. 1st ed. 1976 (Box
and Jenkins), 5th ed. 2015 (Box, Jenkins, Reinsel, Ljung).

Time Series versus Stochastic Processes

We have discussed so many time series. Each is a sequence of
numbers (sales, production, etc)

We introduced a number of ways to treat them: Smoothing,
Modelling and Forecasting

We rely on the patterns to decide what models to use and
project the patterns into future as our forecasts.

From now on, we will move further in theory, by considering a
(concrete) time series as a “product” from a “factory”

The factory is called a Process which is

Y1,Y2,Y3, · · · ,Yt , · · · , · · ·

where each Yt (t = 1, 2, …) is a Random Variable.

When we observe a (concrete) value yt for each Yt , we have
obtained a time series.

Stationarity

The Box-Jenkins method relies heavily on the concept of
stationarity

Definition

A time series process is strictly stationary when the joint
distribution (of the data) does not depend on time. That is, the
joint distribution of

Yt ,Yt+1, . . . ,Yt+k

does not depend on t for any k .

Think about the case of k = 0: For any t, Yt has the same
distribution.

Visually Checking Stationarity

The mean of series should not be a function of time.

Picture is stolen from

http://www.blackarbs.com/blog/time-series-analysis-in-python-linear-models-to-garch/11/1/2016

Visually Checking Stationarity

The variance of the series should not be a function of time.

Picture is stolen from

http://www.blackarbs.com/blog/time-series-analysis-in-python-linear-models-to-garch/11/1/2016

Visually Checking Stationarity

The covariance of the i-th term and the (i + k)-th term should not
be a function of time.

Picture is stolen from

http://www.blackarbs.com/blog/time-series-analysis-in-python-linear-models-to-garch/11/1/2016

Stationarity
Illustration

Non-stationarity
Illustration

Australian seasonally adjusted quarterly GDP growth
(1959-2015)
Stationary or non-stationary?

S&P 500 returns
Stationary or non-stationary?

Weak stationarity

Definition

A process {Yt} is weakly stationary if its mean, variance and
covariance functions do not change over time. That is,

E(Yt) = µ,

V(Yt) = σ2,

and for each integer k,

Cov(Yt ,Yt−k) = Cov(Yt ,Yt+k) = γk ,

for all t.

The covariance or correlation depends on the time gap, i.e.,
k = t − (t − k)

Strict and weak stationarity

If the mean, variance and covariances are finite (which is a
technical point really), then strict stationarity implies weak
stationarity.

Weak stationarity implies strict stationarity when the process is
Gaussian, i.e., the data are jointly normally distributed.

Autocorrelation function (ACF)
Assessing stationarity

Measure the correlation between observations Yt and its
lagged values Yt−k , hence the name autocorrelation

Give insights into statistical models that best describe the
time series data

Box and Jenkins advocate using the ACF and PACF plots to
assess stationarity and identify a suitable model.

We may need to apply a suitable variance stabilising transform

Autocorrelation function (ACF)

Definitions

ρk = E[(Yt − µ)(Yt±k − µ)] / √( V(Yt) V(Yt±k) ) = Corr(Yt, Yt±k).

Sample ACF:

rk = ∑_{t=1}^{N−k} (yt+k − ȳ)(yt − ȳ) / ∑_{t=1}^{N} (yt − ȳ)², where ȳ is the sample mean.

What are the values of ρ0 and r0?

What we have done with ρk is to measure the correlation of Y1
and Y1+k , Y2 and Y2+k , etc., where k is called the lag value.
For Sample ACF, we can see that, e.g., when k = 2, we compare

the curve {y1, y2, y3, …., yN−2} with the curve {y3, y4, y5, …, yN}
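The sample ACF formula above is easy to compute directly in NumPy. The sketch below is illustrative (the simulated series and the helper name sample_acf are not from the lecture files); the comparison with statsmodels.tsa.stattools.acf is only a sanity check, since statsmodels uses the same definition.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag using
    r_k = sum_{t=1}^{N-k} (y_{t+k} - ybar)(y_t - ybar) / sum_{t=1}^{N} (y_t - ybar)^2."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    ybar = y.mean()
    denom = np.sum((y - ybar) ** 2)
    return np.array([np.sum((y[k:] - ybar) * (y[:N - k] - ybar)) / denom
                     for k in range(1, max_lag + 1)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(size=200).cumsum()          # a random-walk-like series, for illustration only
    print(np.round(sample_acf(y, max_lag=10), 3))

    from statsmodels.tsa.stattools import acf  # acf(y, nlags=10) returns r_0, ..., r_10
    print(np.round(acf(y, nlags=10)[1:], 3))
```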

Sample ACF
Regression Explanation (optional)

Given a time series {y1, y2, …, yN} and a lag k, consider the
following linear regression

yt+k − ȳ = γ(yt − ȳ)   (think of it as Y = γX)

Consider the data set

X: y1 − ȳ, y2 − ȳ, y3 − ȳ, · · · , yN−k − ȳ
Y: y1+k − ȳ, y2+k − ȳ, y3+k − ȳ, · · · , yN − ȳ

Then, according to the least squares regression solution,

γ̂ = ∑_{t=1}^{N−k} (yt − ȳ)(yt+k − ȳ) / ∑_{t=1}^{N−k} (yt − ȳ)²,

which is close to rk.

Autocorrelation function (ACF)
Standard errors (optional)

Often, we want to test whether or not H0 : ρk = 0, based on
the sample ACF rk . This is done using a t-test

Standard error of rk:

s_rk = 1/√N, if k = 1,

s_rk = √( (1 + 2 ∑_{j=1}^{k−1} rj²) / N ), if k > 1.

The t-statistic is defined as t_rk = rk / s_rk.

Often, we reject the hypothesis H0 : ρk = 0 if |t_rk| > 2.
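A short sketch of how these t-statistics could be computed in Python, assuming the standard-error formula reconstructed above; the function name acf_t_stats and the example numbers are illustrative, not from the lecture files.

```python
import numpy as np

def acf_t_stats(r, N):
    """t-statistics t_rk = r_k / s_rk, using s_r1 = 1/sqrt(N) and
    s_rk = sqrt((1 + 2*sum_{j<k} r_j^2)/N) for k > 1."""
    r = np.asarray(r, dtype=float)          # r[0] = r_1, r[1] = r_2, ...
    se = np.empty_like(r)
    se[0] = 1.0 / np.sqrt(N)
    for k in range(1, len(r)):
        se[k] = np.sqrt((1.0 + 2.0 * np.sum(r[:k] ** 2)) / N)
    return r / se, se

# Example: lags with |t_rk| > 2 would be flagged as spikes.
t_stats, se = acf_t_stats(r=[0.60, 0.35, 0.10], N=100)
print(np.round(t_stats, 2))
```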

(Sample) ACF Plots

An ACF plot is a bar plot such that the height of the bar at lag k is rk.

We say that the plot has a spike at lag k if rk is significantly
large, i.e. its t-statistic satisfies |t_rk| > 2

The plot cuts off after lag k if there are no spikes at lags
greater than k

We say the ACF plot dies down if the plot doesn’t cut off, but
decreases in a steady fashion.

(Sample) ACF Plots
Behaviour of ACFs

This sample ACF plot has spikes at lags 1, 2 and 3.

(Sample) ACF Plots
Behaviour of ACFs

Assessing stationarity

We can assess the stationarity of {Yt} by assessing its (sample)
ACF plot. In general, it can be shown that for nonseasonal time series:
If the Sample ACF of a nonseasonal time series “cuts off” or
“dies down” reasonably quickly, then the time series should be
considered stationary.

If the Sample ACF of a nonseasonal time series “dies down”
extremely slowly or not at all, then the time series should be
considered nonstationary.

S&P 500 index

(a) Series (b) ACF

Visitor arrivals in Australia

(c) Series (d) ACF

Alcohol related assaults in NSW

(e) Series (f) ACF

Stationary?

Transforming

If the ACF of a time series dies down extremely slowly, data
transformation is necessary

Trying first-order differencing is usually a good first step. See the
example in Lecture07 Example01.py

Zt = Yt+1 − Yt , t = 1, …,N − 1

If the ACF for the transformed data {Zt} dies down extremely
slowly, the transformed time series should be considered
nonstationary. More transformations needed

For nonseasonal data, first or second differencing will generally
produce stationary time series values.

Transforming: original time series

Transforming: differenced time series
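A minimal sketch in the spirit of Lecture07 Example01.py (the actual file may differ): difference a nonstationary-looking series and compare the ACF plots before and after. The simulated drifting series below stands in for the lecture's dataset.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Illustrative series: a drifting random walk (nonstationary), not the lecture's data.
rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(loc=0.5, scale=1.0, size=300))

z = np.diff(y)            # Z_t = Y_{t+1} - Y_t, length N - 1

fig, axes = plt.subplots(2, 2, figsize=(10, 6))
axes[0, 0].plot(y)
axes[0, 0].set_title("Original series")
plot_acf(y, lags=30, ax=axes[0, 1], title="ACF: dies down very slowly")
axes[1, 0].plot(z)
axes[1, 0].set_title("First difference")
plot_acf(z, lags=30, ax=axes[1, 1], title="ACF: cuts off quickly")
plt.tight_layout()
plt.show()
```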

Partial ACF

Partial autocorrelations measure the linear dependence between two
variables after removing the effect of the other variable(s) that
affect both. Consider the sequence of regressions:

Yt = ρ10 + ρ11Yt−1 + εt

Yt = ρ20 + ρ21Yt−1 + ρ22Yt−2 + εt

Yt = ρk0 + ρk1Yt−1 + ρk2Yt−2 + . . .+ ρkkYt−k + εt

ρkk is the correlation between Yt and Yt−k , net of effects at
times t − 1, t − 2, . . . , t − k + 1.

For example, the partial autocorrelation of 2nd order measures
the effect (linear dependence) of Yt−2 on Yt after removing
the effect of Yt−1 on both Yt and Yt−2

Partial ACF: Calculation Examples

Each partial autocorrelation could be obtained as a series of
regressions of the form:

Yt ≈ ρ10 + ρ11Yt−1
Yt ≈ ρ20 + ρ21Yt−1 + ρ22Yt−2

Yt ≈ ρk0 + ρk1Yt−1 + ρk2Yt−2 + . . .+ ρkkYt−k

The estimate rkk of ρkk will give the value of the partial
autocorrelation of order k .

The meaning of ACF coefficient ρk is

Yt = ρ0 + ρkYt−k + εt

without considering other Yt−k+1, …,Yt−1.
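A hedged sketch of the regression definition of the PACF: for each k, regress Yt on its first k lags and keep the coefficient on Yt−k. The helper name pacf_by_regression and the simulated AR(1) series are illustrative; statsmodels' pacf with method="ols" is used only as a cross-check.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import pacf

def pacf_by_regression(y, max_lag):
    """r_kk = estimated coefficient on Y_{t-k} in the regression of Y_t on Y_{t-1}, ..., Y_{t-k}."""
    y = np.asarray(y, dtype=float)
    out = []
    for k in range(1, max_lag + 1):
        Y = y[k:]                                                            # response Y_t
        X = np.column_stack([y[k - j:len(y) - j] for j in range(1, k + 1)])  # lags 1..k
        beta = sm.OLS(Y, sm.add_constant(X)).fit().params
        out.append(beta[-1])                                                 # coefficient on lag k
    return np.array(out)

rng = np.random.default_rng(2)
e = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + e[t]                       # AR(1) with phi_1 = 0.7, for illustration

print(np.round(pacf_by_regression(y, 5), 3))
print(np.round(pacf(y, nlags=5, method="ols")[1:], 3))  # regression-based PACF from statsmodels
```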

(Sample) Partial ACF: The Formula (optional)

The Sample Partial ACF at lag k is

r11 = r1, if k = 1,

rkk = ( rk − ∑_{j=1}^{k−1} r_{k−1,j} r_{k−j} ) / ( 1 − ∑_{j=1}^{k−1} r_{k−1,j} r_j ), if k = 2, 3, …,

where

r_{k,j} = r_{k−1,j} − rkk r_{k−1,k−j}, for j = 1, 2, …, k − 1.

The standard error of rkk is approximately 1/√N.
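For completeness, a small sketch implementing the recursion above (a Durbin-Levinson-type recursion); the function name and array layout are illustrative, not from the lecture code.

```python
import numpy as np

def pacf_from_acf(r):
    """Sample PACF r_11, r_22, ... computed from the sample ACF r_1, r_2, ...
    via the recursion on the slide."""
    r = np.asarray(r, dtype=float)          # r[0] = r_1, r[1] = r_2, ...
    K = len(r)
    pacf_vals = np.empty(K)
    phi = np.zeros((K + 1, K + 1))          # phi[k, j] stores r_{k,j}
    pacf_vals[0] = phi[1, 1] = r[0]
    for k in range(2, K + 1):
        num = r[k - 1] - np.sum(phi[k - 1, 1:k] * r[k - 2::-1])
        den = 1.0 - np.sum(phi[k - 1, 1:k] * r[:k - 1])
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        pacf_vals[k - 1] = phi[k, k]
    return pacf_vals

# Feeding in the sample ACF values r_1, ..., r_K of a series should reproduce,
# up to small numerical differences, what statsmodels' pacf/plot_pacf reports.
```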

First Simple Process: White noise processes

A sequence of independently and identically distributed
random variables {εt : t = 1, 2, …} with mean 0 and finite
variance σ2.

yt = εt with εt ∼ N(0, σ2)

ρk = ρkk = 0, for all k ≥ 1.

Is this a stationary time series? Can you expect to capture
any predictable pattern in this time series?

What would the ACF plot look like for a white noise process?
See Lecture07 Example02.py
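A minimal sketch in the spirit of Lecture07 Example02.py (the actual file may differ): simulate Gaussian white noise and plot its sample ACF; roughly 95% of the spikes should fall inside the confidence band.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# White noise: i.i.d. N(0, sigma^2) draws; theoretical ACF/PACF are 0 for all k >= 1.
rng = np.random.default_rng(6840)
eps = rng.normal(loc=0.0, scale=1.0, size=300)

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
axes[0].plot(eps)
axes[0].set_title("White noise")
plot_acf(eps, lags=30, ax=axes[1])   # most spikes should stay inside the confidence band
plt.tight_layout()
plt.show()
```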

Autoregressive (AR) processes

AR(p) process:

Yt = c + φ1Yt−1 + φ2Yt−2 + . . .+ φpYt−p + εt ,

where εt is i.i.d. with mean zero and variance σ².

Example: AR(1) process
Properties

Yt = c + φ1Yt−1 + εt ,

where εt is i.i.d. with mean zero and variance σ², i.e., {εt} is a
white noise process.

Unconditional:

E(Yt) = c + φ1E(Yt−1).

Under the assumption of stationarity E(Yt) = E(Yt−1) = µ, so

µ = c / (1 − φ1).

AR(1) process
Properties

Yt = c + φ1Yt−1 + εt ,

V(Yt) = φ1² V(Yt−1) + σ².

Under the assumption of stationarity V(Yt) = V(Yt−1), so

V(Yt) = σ² / (1 − φ1²).

In general, we have

Cov(Yt, Yt−k) = φ1^k V(Yt), so that ρk = φ1^k.

Example: AR(1) process
Properties

Cov(Yt ,Yt−1) = Cov(c + φ1Yt−1 + εt ,Yt−1)

= Cov(c ,Yt−1) + Cov(φ1Yt−1,Yt−1) + Cov(εt ,Yt−1)

= 0 + φ1V(Yt−1) + 0 = φ1V(Yt−1). Why?

Hence

ρ1 = Cov(Yt, Yt−1) / √( V(Yt) V(Yt−1) ) = Cov(Yt, Yt−1) / V(Yt) = φ1.

Example: AR(1) process
Properties

Cov(Yt ,Yt−2) = Cov(c + φ1Yt−1 + εt ,Yt−2)

= Cov(φ1(c + φ1Yt−2 + εt−1),Yt−2)

= φ1² V(Yt−2).

Thus, noting that V(Yt−2) = V(Yt−1) = V(Yt),

ρ2 = Cov(Yt, Yt−2) / V(Yt) = φ1².

… (Similarly)

ρk = Cov(Yt, Yt−k) / V(Yt) = φ1^k.

Example: AR(1) process

What happens to the ACF when −1 < φ1 < 1 and k increases? What happens when φ1 = 1? See Lecture07 Example03.py.

By the definition of the Partial ACF, it is easy to see that ρkk = 0 for all k > 1.

Example: AR(1) process
φ = 0.7 ACF (left) and Partial ACF (right)

Example: AR(1) process

When |φ1| < 1, the AR(1) process is weakly stationary.

ACF: ρk = φ1^k, k = 0, 1, 2, ...

Partial ACF: ρkk = 0 for all k > 1

How to check if a time series is an AR(1)?

The sample ACF plot dies down in a steady fashion
The sample Partial ACF cuts off after lag 1.
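A minimal sketch in the spirit of Lecture07 Example03.py (the actual file may differ): simulate an AR(1) with φ1 = 0.7 and check the two identification rules above with ACF/PACF plots. The helper name simulate_ar1 is illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

def simulate_ar1(phi1, c=0.0, sigma=1.0, n=500, seed=0):
    """Simulate Y_t = c + phi1*Y_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = c + phi1 * y[t - 1] + rng.normal(scale=sigma)
    return y

y = simulate_ar1(phi1=0.7)

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(y, lags=20, ax=axes[0])    # should die down roughly like 0.7**k
plot_pacf(y, lags=20, ax=axes[1])   # should cut off after lag 1
plt.tight_layout()
plt.show()
```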

AR(1) process: Forecasting

Yt+1 = c + φ1Yt + εt+1,

where εt is i.i.d. with mean zero and variance σ². Conditional:

Ŷt+1 = E(Yt+1|y1:t) = E(Yt+1|y1, . . . , yt)
= E(Yt+1|yt) = E(c + φ1yt + εt+1|yt)
= c + φ1yt + E(εt+1) = c + φ1yt

How good is the forecasting:

V(Yt+1|y1:t) = V(Yt+1|y1, . . . , yt) = V(Yt+1|yt)
= V(c + φ1yt + εt+1|yt)
= 0 + V(εt+1) = σ2

AR(1) process: Forecasting
Two steps-ahead

Ŷt+2 := E(Yt+2|y1:t)
= E(c + φ1Yt+1 + εt+2|y1:t)
= c + φ1E(Yt+1|y1:t)
= c + φ1(c + φ1yt)

= c(1 + φ1) + φ1² yt

V(Yt+2|y1:t) = V(c + φ1Yt+1 + εt+2|y1:t)
= φ1² V(Yt+1|y1:t) + σ²
= (1 + φ1²) σ²

Example: AR(1) process
Forecasting

Ŷt+h = c + φ1Ŷt+h−1

= c(1 + φ1 + φ1² + … + φ1^(h−1)) + φ1^h yt

V(Yt+h|y1:t) = φ1² V(Yt+h−1|y1:t) + σ²

= σ² (1 + φ1² + … + φ1^(2(h−1)))

What happens as h gets larger?
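As h grows, the forecast converges to the unconditional mean c/(1 − φ1) and the forecast variance to the unconditional variance σ²/(1 − φ1²). The sketch below (the helper name ar1_forecast is illustrative) simply iterates the two recursions above and shows the convergence numerically.

```python
import numpy as np

def ar1_forecast(y_t, c, phi1, sigma2, h_max):
    """h-step-ahead forecasts and forecast variances for an AR(1), given the last observation y_t."""
    forecasts, variances = [], []
    f, v = y_t, 0.0
    for h in range(1, h_max + 1):
        f = c + phi1 * f                 # Yhat_{t+h} = c + phi1 * Yhat_{t+h-1}
        v = phi1 ** 2 * v + sigma2       # V(Y_{t+h}|y_{1:t}) = phi1^2 * V(Y_{t+h-1}|y_{1:t}) + sigma^2
        forecasts.append(f)
        variances.append(v)
    return np.array(forecasts), np.array(variances)

f, v = ar1_forecast(y_t=2.0, c=1.0, phi1=0.7, sigma2=1.0, h_max=20)
print(f[-1], 1.0 / (1 - 0.7))          # both close to the unconditional mean 3.33...
print(v[-1], 1.0 / (1 - 0.7 ** 2))     # both close to the unconditional variance 1.96...
```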

Example: AR(1) process
In-sample fit illustration

The red curve is Ŷt|t−1, t = 2, …,N.

Example: AR(1) process
Forecasting illustration

AR(p) processes
Properties

Yt = c + φ1Yt−1 + . . .+ φpYt−p + εt ,

E(Yt) = c + φ1E(Yt−1) + … + φpE(Yt−p).

Suppose it is stationary; then

E(Yt) = c / (1 − φ1 − φ2 − … − φp).

AR(p) processes
Properties

Yt = c + φ1Yt−1 + . . .+ φpYt−p + εt ,

V(Yt) = V(c + φ1Yt−1 + . . .+ φpYt−p + εt)

Can we continue like this?

V(Yt) = V(c) + V(φ1Yt−1) + . . .+ V(φpYt−p) + V(εt)

NO! Because in general the covariances Cov(Yt−i, Yt−j) ≠ 0.

Under the stationary condition, it can be proved that

V(Yt) = σ² / [ (1 − ρ11²)(1 − ρ22²) … (1 − ρpp²) ].

Example: AR(2) processes
Properties

Cov(Yt ,Yt−1) = Cov(c + φ1Yt−1 + φ2Yt−2 + εt ,Yt−1)

= φ1V(Yt−1) + φ2Cov(Yt−2,Yt−1)

Under the stationary condition we have Cov(Yt, Yt−1) = Cov(Yt−2, Yt−1) = γ1, so

γ1 = φ1 V(Yt) + φ2 γ1.

Hence

ρ1 = Cov(Yt, Yt−1) / √( V(Yt) V(Yt−1) ) = Cov(Yt, Yt−1) / V(Yt) = φ1 / (1 − φ2),

where we have used V(Yt) = V(Yt−1).

Example: AR(2) processes
Properties

Cov(Yt ,Yt−2) = Cov(c + φ1Yt−1 + φ2Yt−2 + εt ,Yt−2)

= φ2V(Yt−2) + φ1Cov(Yt−1,Yt−2)

Hence

ρ2 = Cov(Yt, Yt−2) / √( V(Yt) V(Yt−2) ) = Cov(Yt, Yt−2) / V(Yt) = φ2 + φ1ρ1,

where we have used V(Yt) = V(Yt−2).

Example: AR(2) processes
Properties

Cov(Yt ,Yt−3) = Cov(c + φ1Yt−1 + φ2Yt−2 + εt ,Yt−3)

= φ1Cov(Yt−1,Yt−3) + φ2Cov(Yt−2,Yt−3)

= φ1ρ2V(Yt−3) + φ2ρ1V(Yt−3).

where we have used ρ2 = Cov(Yt−1, Yt−3)/V(Yt−3) and ρ1 = Cov(Yt−2, Yt−3)/V(Yt−3). Hence

ρ3 = φ1ρ2 + φ2ρ1,

and in general,

ρk = φ1ρk−1 + φ2ρk−2, for k ≥ 2.

AR(p) processes
Properties Summary

The process is defined as

Yt = c + φ1Yt−1 + φ2Yt−2 + · · ·+ φpYt−p + εt

where εt is i.i.d. with mean zero and variance σ². It can be shown that

ACF ρk dies down exponentially.
PACF ρkk cuts off to zero after lag p.

These properties are useful to recognize an AR(p) process.
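In practice, these identification checks are usually complemented by fitting the model directly. A hedged sketch using statsmodels (AutoReg and ArmaProcess; this is not the lecture's code, and the simulated AR(2) is illustrative): simulate an AR(2) and recover its coefficients.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima_process import ArmaProcess

# Simulate an AR(2) process Y_t = 0.5*Y_{t-1} + 0.3*Y_{t-2} + eps_t for illustration.
ar = np.array([1.0, -0.5, -0.3])       # ArmaProcess uses the lag-polynomial sign convention
arma = ArmaProcess(ar=ar, ma=np.array([1.0]))
y = arma.generate_sample(nsample=1000, scale=1.0)

res = AutoReg(y, lags=2, trend="c").fit()
print(res.params)                       # estimates of c, phi_1, phi_2
print(res.forecast(steps=5))            # recursive h-step-ahead forecasts
```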

AR(p) processes
Forecasting

ŷt+h = E(Yt+h|y1:t) = c + φ1E(Yt+h−1|y1:t) + . . .+ φpE(Yt+h−p|y1:t),

E(Yt+h−i | y1:t) = ŷt+h−i, if h > i,

E(Yt+h−i | y1:t) = yt+h−i, if h ≤ i.

For example, consider AR(3),

Yt+1 = c + φ1Yt + φ2Yt−1 + φ3Yt−2 + εt+1

ŷt+1 = c + φ1yt + φ2yt−1 + φ3yt−2

ŷt+2 = c + φ1ŷt+1 + φ2yt + φ3yt−1

ŷt+3 = c + φ1ŷt+2 + φ2ŷt+1 + φ3yt

AR(p) processes
Forecasting

ŷt+1 = c + φ1yt + φ2yt−1 + φ3yt−2

ŷt+2 = c + φ1ŷt+1 + φ2yt + φ3yt−1

= c + φ1(c + φ1yt + φ2yt−1 + φ3yt−2) + φ2yt + φ3yt−1

= c(1 + φ1) + (φ1² + φ2)yt + (φ1φ2 + φ3)yt−1 + φ1φ3yt−2

ŷt+3 =c + φ1ŷt+2 + φ2ŷt+1 + φ3yt

= · · · · · ·

Finally what about the variance? (optional)
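The recursive forecasting rule above is easy to code directly. A small illustrative sketch (the function name ar_forecast and the example numbers are hypothetical): observed values are used for lags that are still in the sample, and earlier forecasts are plugged in otherwise.

```python
import numpy as np

def ar_forecast(history, c, phis, h_max):
    """Recursive AR(p) forecasts: yhat_{t+h} = c + sum_i phi_i * (yhat or y)_{t+h-i}.
    `history` holds the observed series up to time t; `phis` = [phi_1, ..., phi_p]."""
    vals = list(history)                 # observed values, later extended with forecasts
    p = len(phis)
    forecasts = []
    for h in range(1, h_max + 1):
        yhat = c + sum(phis[i] * vals[-(i + 1)] for i in range(p))
        vals.append(yhat)
        forecasts.append(yhat)
    return np.array(forecasts)

# Example with the AR(3) above: yhat_{t+1} = c + phi1*y_t + phi2*y_{t-1} + phi3*y_{t-2}, etc.
print(ar_forecast(history=[1.2, 0.8, 1.5], c=0.1, phis=[0.5, 0.2, 0.1], h_max=3))
```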
