7. ARMA Models (Shumway, Stoffer 3)
7.1 Introduction
ARMA models are a class of models for stationary time series. They are probably the most important class of time series models.
There exist many extensions like ARIMA and GARCH for non-stationary series. (Not covered in this course.)
To understand ARMA models we need to take the perspective of random variables and use some (very) basic probability theory.
We start by repeating some basic definitions.
7.2 Definitions
Definition
A time series is a family of random variables Xt1, Xt2, . . . , Xtn indexed in time with t1 < t2 < · · · < tn. We also write
{Xt | t = t1, t2, . . . , tn}.
Observations of the time series are denoted by xt1, xt2, . . . , xtn.
The value of xt is called the state of the time series at time t.
Remark
As usual, we use capital letters for random variables and lowercase letters for realisations (observations) of random variables.
Hence, xt1 , xt2 , . . . , xtn is time series data while the time series is a probabilistic object.
We introduce a new random variable for every time ti .
Remember
Random variables are the mathematical model for processes that are not entirely predictable.
We can assign probabilities to certain outcomes, e.g. P(Xt ≤ x) = p with p ∈ [0, 1].
Random variables can have an expectation E(Xt) and variance Var(Xt). These describe aspects of the probabilistic behaviour of Xt.
When we observe a random variable, we collect data xt. We cannot assign probabilities to xt. Once the random variable has been observed, the outcome is certain.
For data, only the sample average and sample variance exist. These are different from the expectation and variance of Xt.
Time series are stochastic processes.
Definition
A stochastic process is a collection of random variables {Xt | t ∈ M}
indexed by some index set M.
Remark
Stochastic processes are more general than univariate time series.
M can be a complicated set like M = R³ or M = S². Also the state space can be a complicated object.
Measures of dependency
Remember
We describe properties of random variables X, Y by
Expectation
E(X) = ∑ x P(X = x) (discrete),
E(X) = ∫ x fX(x) dx (continuous).
Variance
Var(X) = E[(X − E(X))²]
Covariance
Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
Correlation
Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))
It is straightforward to generalise these concepts to time series.
Expectation function
In a time series the expectation depends on time. We define the expectation function μX (t) for Xt
μX(t) = E(Xt)
Autocovariance function
The autocovariance function γX(s, t) reports the covariance of observations at two points in time s and t:
γX(s, t) = Cov(Xs, Xt) = E[(Xs − μX(s))(Xt − μX(t))].
It measures linear dependencies between observations.
It is positive if Xs and Xt tend to go into the same direction.
It is negative if Xs and Xt tend to go into opposite directions.
It typically takes larger values when s and t are close.
It is usually close or equal to 0 when s and t are farther apart.
γX(s, t) = γX(t, s).
γX(t, t) = Var(Xt).
Autocorrelation function (ACF)
Similar to the autocovariance function γX(s, t), we define the autocorrelation function ρX(s, t) for two points in time s and t:
ρX(s, t) = Corr(Xs, Xt) = γX(s, t) / √(γX(s, s) γX(t, t)).
ρX(s, t) = ρX(t, s)
ρX(t, t) = 1
The ACF is always −1 ≤ ρX(s, t) ≤ 1.
The ACF measures how close the dependency between observations in the series is to a perfect linear relationship.
If Xs = a + bXt, then ρX(s, t) = ±1.
If no linear dependency exists, ρX (s, t) = 0.
Note
Expectation, autocovariance, and autocorrelation functions are defined for random variables, not for data. There are sample counterparts, but they only work for stationary time series.
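As an illustration (not from the slides), here is a minimal NumPy sketch of how such sample counterparts can be computed for an observed, assumed-stationary series; the function name sample_acf is our own.

    import numpy as np

    def sample_acf(x, max_lag):
        """Sample autocovariance gamma(h) and sample ACF rho(h) of a series x,
        meaningful as an estimate only if x comes from a stationary process."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        xbar = x.mean()
        # sample autocovariance at lags h = 0, 1, ..., max_lag (normalised by n)
        gamma = np.array([np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n
                          for h in range(max_lag + 1)])
        rho = gamma / gamma[0]  # sample ACF: rho(h) = gamma(h) / gamma(0)
        return gamma, rho

    # For white noise the sample ACF should be close to 0 for all lags h >= 1.
    w = np.random.default_rng(1).standard_normal(500)
    gamma, rho = sample_acf(w, max_lag=5)
    print(rho.round(2))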
Stationarity
A stationary time series does not change its probabilistic behaviour (too much) over time.
Definition
A time series Xt is strictly stationary if for any collection of values
(Xt1, Xt2, . . . , Xtk)
the joint probability distribution is the same as for the shifted values
(Xt1+h, Xt2+h, . . . , Xtk+h) for any shift h. That is,
P(Xt1 ≤ c1, . . . , Xtk ≤ ck) = P(Xt1+h ≤ c1, . . . , Xtk+h ≤ ck)
for all c1, c2, . . . , ck ∈ R.
[Figure: Two strictly stationary time series. Moving average Xt = 1/√3 (Wt + Wt−1 + Wt−2); strong Gaussian white noise Wt with σW = 1.]
Remark
One consequence of strict stationarity is that all observations have the same distribution. We can write Xt ∼ X1.
This is a very strong assumption and usually more than we need. We will use the concept of weak stationarity.
Definition
A time series Xt is weakly stationary if
it has finite variance,
μX(t) is the same for all t, and
γX(t, t + h) depends only on h but not on t.
Note
All observations Xt have the same mean and variance but not necessarily the same distribution.
A strictly stationary series with finite variance is also weakly stationary but not the other way around.
Notation
We will write
stationary instead of weakly stationary,
μX instead of μX(t),
γX(h) instead of γX(t, t + h),
ρX(h) instead of ρX(t, t + h).
Properties
The autocovariance and autocorrelation functions γX(t, t + h) and ρX(t, t + h) only depend on h but not on t.
γX (h) = γX (−h)
ρX (h) = ρX (−h)
Remark
Stationarity is a very important concept in time series analysis.
It only makes sense for random variables and not for data.
Many time series methods are only valid for stationary time series.
Example
Examples of approximately stationary time series are
differences of log stock prices, i.e. returns,
log exchange rates,
electroencephalograms.
Non-stationary observations can often be transformed to approximately stationary series by detrending, log, or differencing.
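As a sketch of the last point (our own illustration, using simulated data rather than real prices): differencing the logarithm of a non-stationary price path yields approximately stationary returns.

    import numpy as np

    rng = np.random.default_rng(0)
    # Illustrative non-stationary "price" series: the exponential of a random walk.
    log_price = np.cumsum(0.001 + 0.01 * rng.standard_normal(1000))
    price = np.exp(log_price)

    # Differences of log prices are the (log) returns, which look roughly stationary.
    returns = np.diff(np.log(price))
    print(returns.mean(), returns.std())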
7.3 ARMA models
Idea of ARMA models
ARMA models are built from three simpler types of models:
white noise models,
autoregressive models,
moving average models.
The acronym ARMA stands for
AutoRegressive Moving Average model.
Let us introduce these models step by step.
White noise
Model for measurement errors and uninformative noise
Building block for several more complex models
Weak white noise
Random variables Wt, t = t1, t2, . . . , tn with
mean zero, i.e. μW(t) = E(Wt) = 0 for all t,
finite variance, i.e. γW(t, t) = Var(Wt) < ∞ for all t,
homoscedasticity, i.e. all Wt have the same variance; we write Var(Wt) = σW²,
uncorrelated, i.e. ρW(t, t′) = Corr(Wt, Wt′) = 0 for t ≠ t′.
We write Wt ∼ WN(0, σW²).
Strong white noise
In addition, all Wt are independent.
We only need to assume weak white noise in the following.
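A minimal simulation sketch (NumPy, our own illustration, not part of the slides): both variants below satisfy the weak white noise conditions with σW = 1; the Gaussian one is also the strong Gaussian white noise used in the figures.

    import numpy as np

    rng = np.random.default_rng(42)
    n = 500

    # Strong Gaussian white noise with sigma_W = 1.
    w_gauss = rng.standard_normal(n)

    # Strong uniform white noise rescaled to sigma_W = 1:
    # Uniform(-a, a) has variance a^2 / 3, so a = sqrt(3) gives variance 1.
    w_unif = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)

    print(w_gauss.mean(), w_gauss.var())  # both approximately 0 and 1
    print(w_unif.mean(), w_unif.var())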
[Figure: Strong Gaussian white noise with σW = 1, i.e. Wt ∼ N(0, σW²) i.i.d.; 100 observations (left), 500 observations (right).]
[Figure: Strong uniform white noise with σW = 1; 100 observations (left), 500 observations (right).]
Autoregression
One objective of a time series model is to capture the dependency or correlation of present and past observations.
A very successful model for correlations among random variables is of course linear regression.
We regress Xt on past values.
Autoregressive model
Xt = φ0 + φ1 Xt−1 + φ2 Xt−2 + · · · + φp Xt−p + Wt
with φ0, φ1, φ2, . . . , φp ∈ R, φp ≠ 0, and Wt white noise is an autoregressive model of order p, denoted AR(p).
Remark
The AR(1) model with φ1 = 1 is called a random walk
Xt = φ0 + Xt−1 + Wt.
It is an important model for stock prices. It is not stationary.
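A small sketch (NumPy, our own illustration) of simulating this recursion, with and without drift; the drift value 0.1 mirrors the figure that follows.

    import numpy as np

    rng = np.random.default_rng(7)
    w = rng.standard_normal(200)  # strong Gaussian white noise, sigma_W = 1

    def random_walk(w, drift=0.0, x0=0.0):
        """Simulate X_t = drift + X_{t-1} + W_t starting from x0."""
        x = np.empty(len(w))
        prev = x0
        for t, wt in enumerate(w):
            prev = drift + prev + wt
            x[t] = prev
        return x

    x_no_drift = random_walk(w, drift=0.0)    # phi_0 = 0
    x_with_drift = random_walk(w, drift=0.1)  # phi_0 = 0.1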
[Figure: Autoregression Xt = Xt−1 − 0.9 Xt−2 + Wt. Underlying strong Gaussian white noise Wt.]
[Figure: Random walk with strong Gaussian white noise and volatility σW = 1. Top panel without drift (φ0 = 0), bottom panel with drift (φ0 = 0.1).]
The random walk behaves similarly to stock prices.
[Figure: Standard and Poor's 500 index; random walk with strong Gaussian white noise, δ = 0, σW = 0.7.]
Autoregression concept
A time series with an autoregressive structure is assumed to be generated as a linear function of its past values plus a random shock.
Xt = φ0 + φ1 Xt−1 + φ2 Xt−2 + · · · + φp Xt−p + Wt
The weighted contribution of the past p observations essentially means the process has memory, where the (infinite) past can help to predict the future.
It is common to observe that the contribution reduces as the lag increases but never goes back to 0.
Overall, this structure induces an autocorrelation with all the preceding values, which for a stationary process will reduce with increasing lag.
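The recursion can be simulated directly. The sketch below (NumPy, our own illustration) generates the AR(2) example Xt = Xt−1 − 0.9 Xt−2 + Wt from the earlier figure; a burn-in period is discarded so that the arbitrary starting values have little influence.

    import numpy as np

    def simulate_ar(phi, n, phi0=0.0, burn_in=100, seed=0):
        """Simulate X_t = phi0 + phi[0] X_{t-1} + ... + phi[p-1] X_{t-p} + W_t."""
        rng = np.random.default_rng(seed)
        p = len(phi)
        total = n + burn_in + p
        w = rng.standard_normal(total)
        x = np.zeros(total)
        for t in range(p, total):
            # x[t-p:t][::-1] is (X_{t-1}, ..., X_{t-p})
            x[t] = phi0 + np.dot(phi, x[t - p:t][::-1]) + w[t]
        return x[p + burn_in:]

    # AR(2) example from the figure: X_t = X_{t-1} - 0.9 X_{t-2} + W_t
    x = simulate_ar([1.0, -0.9], n=500)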
Moving average
Sometimes we want the correlation to go back to 0 for large lags. This can be modelled by a moving average.
Xt = μ + Wt + θ1 Wt−1 + θ2 Wt−2 + · · · + θq Wt−q
with μ, θ1, θ2, . . . , θq ∈ R, θq ≠ 0, and Wt white noise is a moving average model of order q, denoted MA(q).
Concept
Xt and Xs are correlated if and only if |t − s| ≤ q.
If t ≥ s ≥ t − q, then Xt and Xs are both correlated with Ws, Ws−1, . . . , Wt−q.
The value at the current time point is just a weighted sum of recent errors.
The sequence (1, θ1, . . . , θq) determines the profile.
The model is often stated without μ. However, μ can be added to allow for a non-zero mean level.
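A sketch of simulating an MA(q) process (NumPy, our own illustration). Scaling the MA(2) with θ1 = θ2 = 1 by 1/√3 reproduces the example Xt = 1/√3 (Wt + Wt−1 + Wt−2) shown in the next figure.

    import numpy as np

    def simulate_ma(theta, n, mu=0.0, seed=3):
        """Simulate X_t = mu + W_t + theta[0] W_{t-1} + ... + theta[q-1] W_{t-q}."""
        rng = np.random.default_rng(seed)
        q = len(theta)
        w = rng.standard_normal(n + q)
        coeffs = np.concatenate(([1.0], theta))  # (1, theta_1, ..., theta_q)
        # Each X_t is a weighted sum of the q + 1 most recent noise terms.
        return mu + np.convolve(w, coeffs, mode="valid")

    # MA(2) example: X_t = (W_t + W_{t-1} + W_{t-2}) / sqrt(3)
    x = simulate_ma([1.0, 1.0], n=500) / np.sqrt(3)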
[Figure: Moving average Xt = 1/√3 (Wt + Wt−1 + Wt−2). Underlying strong Gaussian white noise Wt.]
Wold decomposition theorem
Another reason for MA models is given by the Wold decomposition theorem.
Theorem
Every weakly stationary time series can be represented as a linear combination of a sequence of uncorrelated random variables
Xt = μ + Wt + ψ1 Wt−1 + ψ2 Wt−2 + · · ·
where the white noise Wt ∼ WN(0, σW²) has finite variance σW² > 0, and the coefficients are square summable,
∑_{j=0}^∞ ψj² < ∞.
Consequence of the Wold decomposition
Xt = μ + Wt + ψ1Wt−1 + ψ2Wt−2 + · · ·
Note that the sequence can be infinite. We can say that every
stationary time series is an MA(∞) process.
Almost all |ψj| must be small and only some can be large. Otherwise, the sum ∑_{j=0}^∞ ψj² would not converge.
Hence, for every stationary Xt there is an MA(q) model, with sufficiently large q and suitable white noise, that provides a good approximation.
This is a strong motivation to use MA models for stationary time series.
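A concrete illustration (our own, not from the slides): a stationary AR(1) process Xt = φ Xt−1 + Wt with |φ| < 1 has the MA(∞) representation Xt = Wt + φ Wt−1 + φ² Wt−2 + · · ·, so truncating at a moderate q already approximates it well because ψj = φ^j decays quickly.

    import numpy as np

    rng = np.random.default_rng(5)
    phi, n, q = 0.7, 500, 20
    w = rng.standard_normal(n + q)

    # Truncated MA(infinity) representation of the AR(1): psi_j = phi**j, j = 0..q.
    psi = phi ** np.arange(q + 1)
    x_ma = np.convolve(w, psi, mode="valid")  # length n

    # Exact AR(1) recursion driven by the same noise, started at 0.
    x_ar = np.zeros(n + q)
    for t in range(1, n + q):
        x_ar[t] = phi * x_ar[t - 1] + w[t]
    x_ar = x_ar[q:]

    # Discrepancy is small: it comes from the truncation and the start-up value.
    print(np.max(np.abs(x_ar - x_ma)))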
ARMA
An AR(p) and an MA(q) model can be combined into an ARMA(p, q) model
Xt = φ0 + φ1 Xt−1 + · · · + φp Xt−p + Wt + θ1 Wt−1 + · · · + θq Wt−q
or equivalently
Xt − φ0 − φ1 Xt−1 − · · · − φp Xt−p = Wt + θ1 Wt−1 + · · · + θq Wt−q.
Problem
Although AR and MA are rather simple models it turns out to be quite tricky to combine them.
Challenges are to choose the right p and q, and to fit the model to data. (OLS works for AR but not for ARMA.)
We need to study some properties of AR(p) and MA(q) to make this work.
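For later reference, a direct simulation sketch of the ARMA(p, q) recursion (NumPy, our own illustration; the parameter values are arbitrary). Fitting the model to data is a different matter, as noted above.

    import numpy as np

    def simulate_arma(phi, theta, n, phi0=0.0, burn_in=200, seed=11):
        """Simulate X_t = phi0 + sum_i phi_i X_{t-i} + W_t + sum_j theta_j W_{t-j}."""
        rng = np.random.default_rng(seed)
        p, q = len(phi), len(theta)
        m = max(p, q)
        total = n + burn_in + m
        w = rng.standard_normal(total)
        x = np.zeros(total)
        for t in range(m, total):
            ar_part = np.dot(phi, x[t - p:t][::-1]) if p else 0.0
            ma_part = np.dot(theta, w[t - q:t][::-1]) if q else 0.0
            x[t] = phi0 + ar_part + w[t] + ma_part
        return x[m + burn_in:]

    # Example: ARMA(1, 1) with phi_1 = 0.6 and theta_1 = 0.4 (arbitrary values).
    x = simulate_arma([0.6], [0.4], n=500)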