Introducing Time Series Analysis
STAT317-455
Semester Two, 2021
Fidelio Statistical Services Introducing Time Series Analysis Semester Two, 2021 1 / 33
Time series data is pretty weird and can be a pain to handle in general-purpose statistical software. Here is an example of time series weirdness.
The lengths of months and years vary, so doing arithmetic with them can be unintuitive. Consider a simple operation: January 31st + one month. Should the answer be
1. February 31st (which doesn't exist); or
2. March 3rd (31 days after January 31st, assuming it's not a leap year); or
3. February 28th (assuming it's not a leap year)?
A basic property of arithmetic is that a + b - b = a. Only solution 1 obeys this property, but it is an invalid date.
Also what time is 24 hours after 8pm on the 25th September? How about 8pm on the 26th September?
Another oddity is that the calendar has a 28-year cycle.
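The date arithmetic puzzle can be made concrete with a minimal Python sketch using only the standard library. The helper `add_months_clamped` is a hypothetical name introduced here for illustration; it implements the "clamp to the last valid day" rule (option 3), which is only one of several possible conventions. The sketch also checks the 28-year cycle for one pair of years (2021 and 2049), where no skipped century leap year intervenes.

```python
import calendar
from datetime import date, timedelta


def add_months_clamped(d, months):
    """Add calendar months, clamping the day to the last valid day of the target month.

    Hypothetical helper for illustration: this is the 'February 28th' convention.
    """
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


jan31 = date(2021, 1, 31)

# "One month" read as 31 days.
plus_31_days = jan31 + timedelta(days=31)

# "One month" read as the same day next month, clamped to month end.
plus_one_month = add_months_clamped(jan31, 1)

# Subtracting the month again does not recover the starting date: a + b - b != a.
round_trip = add_months_clamped(plus_one_month, -1)

# The same date falls on the same weekday 28 years later (for this pair of years).
same_weekday_28y = date(2021, 1, 1).weekday() == date(2049, 1, 1).weekday()

print(plus_31_days, plus_one_month, round_trip, same_weekday_28y)
```

Whichever convention is chosen, the round trip shows that month arithmetic cannot behave like ordinary addition on valid dates.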
We need to put a statistical framework to what we have done so far.
Time Series
A (univariate) time series $\{y_t, t = 1, \dots, T\}$ is a sequence of observations on a single variable (random process), like sales, production, number of customers, ozone concentration, etc. We shall assume that observations are taken at equally spaced intervals or points in time (i.e. time is discrete); $\{y_t\}$ is the shorthand for this.
Time series methods (TSM)
Operations or procedures applied to $\{y_t, t = 1, \dots, T\}$ (e.g. linear combinations of the time series observations), designed to extract features (such as trends, cycles, seasonals, etc.) or to focus on the information required for forecasting.
TSM can be either model-based (parametric) or nonparametric. Which TSM you can consider depends on the scale of measurement of $\{y_t\}$.
Examples
Measurement scales
1. Categorical (qualitative)
   Nominal ($y_t$ = {Expansion, Recession}, $x_t$ = {Brand A, Brand B, Brand C})
   Ordinal ($y_t$ = {Sunny, Cloudy, Rainy}, $x_t$ = {Increase, No change, Decrease})
2. Quantitative
   Interval scale ($y_t$ = {Temperature in degrees Celsius})
   Ratio scale ($y_t$ = {Sales in dollars}, $y_t$ = {No. of customers})
Objectives of TSM
Our objectives will be twofold:
Analysis: describing the main characteristics of the series; and
Forecasting: predicting the likely evolution of the phenomenon, that is, the next measurement(s).
For categorical series, we may want to describe/forecast the marginal or conditional probability of being in a particular state $s_t$ (e.g. recession).
We will be dealing with TSM for quantitative series with continuous support, and we will aim at describing/forecasting the (conditional) mean of $\{y_t\}$ (note, however, that for financial time series interest lies in the conditional variance, as a measure of risk).
Forecasting is Hard
Forecasting is Hard, Even for Professionals
[Figure: "Commonwealth plans to drift back to surplus show the triumph of experience over hope". Actual and forecast Commonwealth underlying cash balance, per cent of GDP, by financial year ended. Forecast lines are labelled by the year in which the forecast was made and are plotted against the actual outcome.]
We set off by describing a few stylized facts about time series.
We start with descriptive methods for the analysis of time series. The most important are the time series plot and the correlogram. Later on we will discuss the periodogram and the spectral density.
Perhaps the most important descriptive tool is the time series plot.
Time series plots
The pairs $(t, y_t)$ are displayed in a graph, where $t$ is measured on the horizontal axis and $y_t$ on the vertical axis.
The time series plot is most helpful in revealing the key stylized facts, i.e. the essential features of the series (the presence of components such as trends, seasonality, etc.). It may also suggest which TSM may be appropriate, though more usually it shows which TSM not to try.
NZ Arrivals
Standard and Poor's 500
Transformations
The series may be easier to analyze on a transformed scale. A useful transformation is taking logarithms, which sometimes helps to stabilize the variance.
Here we present a case where it stabilizes the amplitude of the seasonal fluctuations → no trend in the seasonal amplitude.
Transformations Can Be Useful
[Figure: two panels plotted against Month, 1920 to 2020. Panel titles: "NZ Visitor Arrivals" and "NZ Visitor Arrivals (logged)". Vertical axis labels: "Number" and "Number (,000)".]
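The effect of the log transformation on seasonal amplitude can be sketched with synthetic data. The series below is an assumption made for illustration (exponential growth multiplied by a fixed seasonal factor); it is not the NZ arrivals data. The helper `yearly_range` is a hypothetical name.

```python
import numpy as np

# Synthetic monthly series (assumed for illustration): level grows exponentially
# and the seasonal factor multiplies the level.
t = np.arange(120)                                  # ten years of monthly time points
trend = np.exp(0.01 * t)
seasonal = 1.0 + 0.3 * np.sin(2 * np.pi * t / 12)
y = trend * seasonal


def yearly_range(x, year):
    """Max minus min of the series within one 12-month block."""
    block = x[12 * year: 12 * (year + 1)]
    return block.max() - block.min()


# On the raw scale the within-year swing grows with the level.
raw_first, raw_last = yearly_range(y, 0), yearly_range(y, 9)

# On the log scale the within-year swing is the same in every year.
log_first, log_last = yearly_range(np.log(y), 0), yearly_range(np.log(y), 9)

print(raw_first, raw_last)
print(log_first, log_last)
```

Taking logs turns the multiplicative structure into an additive one, which is why the seasonal swings stop growing with the level in this example.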
We are often interested in analyzing growth rates rather than the levels:
$$100 \times \frac{y_t - y_{t-s}}{y_{t-s}},$$
where $s$ is a positive integer which determines the horizon for growth measurement. For instance, if the series is quarterly (monthly) and $s = 4$ ($s = 12$), we measure the yearly growth rate.
Let us define the difference $\Delta_k y_t = y_t - y_{t-k}$. When applied to the log-transformed series, it can be shown that
$$\Delta_k \log y_t = \log y_t - \log y_{t-k} \approx \frac{y_t - y_{t-k}}{y_{t-k}} = r_{kt}.$$
[This is the first-order Taylor series approximation of $\log(1 + r_{kt})$ around $r_{kt} = 0$; thus it works when $r_{kt}$ is "small".]
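A quick numerical check shows how good the log-difference approximation is for small and large growth rates. The values below are made up for illustration, and the function names are hypothetical.

```python
import math


def growth_rate(y_now, y_prev):
    """Simple growth rate r = (y_t - y_{t-k}) / y_{t-k}."""
    return (y_now - y_prev) / y_prev


def log_difference(y_now, y_prev):
    """Difference of logs, log(y_t) - log(y_{t-k})."""
    return math.log(y_now) - math.log(y_prev)


# Small change: 100 -> 102. The two measures nearly coincide.
small = (growth_rate(102.0, 100.0), log_difference(102.0, 100.0))

# Large change: 100 -> 200. The approximation breaks down.
large = (growth_rate(200.0, 100.0), log_difference(200.0, 100.0))

print(small)
print(large)
```

For the 2% change the two numbers agree to about three decimal places; for the doubling they differ by roughly 0.3.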
Time series decomposition
Time series are often decomposed into a seasonal component and a non-seasonal one.
The latter is further decomposed into a trend and an irregular component.
Statistics NZ does this for the arrival of tourists using a non-parametric seasonal adjustment method (X-13ARIMA-SEATS).
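The idea of splitting a series into trend, seasonal and irregular parts can be sketched with a textbook classical additive decomposition based on moving averages. This is not X-13ARIMA-SEATS, which is far more elaborate; the function name `classical_decompose` and the synthetic series are assumptions made for illustration, and the sketch only handles an even seasonal period.

```python
import numpy as np


def classical_decompose(y, period=12):
    """Classical additive decomposition (sketch, even period only).

    Returns trend, seasonal and irregular arrays with y = trend + seasonal + irregular
    wherever the trend is defined (it is NaN for the first and last period/2 points).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = period // 2

    # Centred 2 x period moving average: half weights on the two end points.
    weights = np.ones(period + 1) / period
    weights[0] = weights[-1] = 0.5 / period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")

    # Seasonal component: average detrended value for each position in the cycle,
    # re-centred so that the seasonal effects sum to zero over one period.
    detrended = y - trend
    effects = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    effects -= effects.mean()
    seasonal = np.tile(effects, n // period + 1)[:n]

    irregular = y - trend - seasonal
    return trend, seasonal, irregular


# Synthetic example: linear trend plus a fixed sinusoidal seasonal pattern, no noise.
t = np.arange(120)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
trend, seasonal, irregular = classical_decompose(y, period=12)
```

Because the synthetic series contains no noise, the irregular component is essentially zero away from the ends of the series.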
Descriptive analysis
Let $y_t$, $t = 1, \dots, T$, be a time series. In most cases the usual statistical summaries for characterizing the distribution of $\{y_t\}$ that we learned in our first Statistics course are meaningless! In particular, they lack internal and external validity due to nonstationarity, possibly evolving distributions, etc.
Examples are
Sample mean: $\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t$
Sample variance: $\hat{\sigma}^2 = \frac{1}{n} \sum_{t=1}^{n} (x_t - \bar{x})^2$
Sample skewness: $\hat{S} = \frac{\frac{1}{n} \sum_{t=1}^{n} (x_t - \bar{x})^3}{\hat{\sigma}^3}$
Sample kurtosis, quantiles, etc.
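The summaries above are easy to compute, which is part of the danger. The sketch below implements them with divisor $n$ and applies them to a made-up series with a level shift halfway through: the overall mean describes neither half. The function name `sample_moments` and the data are assumptions for illustration.

```python
import numpy as np


def sample_moments(x):
    """Sample mean, variance (divisor n) and skewness."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = np.mean((x - mean) ** 2)
    skew = np.mean((x - mean) ** 3) / var ** 1.5
    return mean, var, skew


# Made-up nonstationary series: level 0 for 50 periods, then level 10.
level_shift = np.concatenate([np.zeros(50), np.full(50, 10.0)])

overall_mean, overall_var, overall_skew = sample_moments(level_shift)
first_half_mean = level_shift[:50].mean()
second_half_mean = level_shift[50:].mean()

print(overall_mean, first_half_mean, second_half_mean)
```

The overall mean sits between the two regimes and is never close to any observed value, so it is not a useful description of the series.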
These statistics are meaningful as long as they measure time-invariant features of the series. That is, the beginning of the series looks similar to the end of the series.
For the Dow Jones Euro Stoxx 50 returns the very high kurtosis is an important feature (a consequence of volatility). A more subtle feature is volatility clustering.
Ultimately you want the data generating process (DGP) at the beginning to be similar to that at the end or, more usually, for the DGP to evolve only slowly over time. That's why you might not want to fit your TSM to the whole series, but instead "cut off" the beginning.
Not Normal
Autocorrelation
Economic time series are not independent over time.
On the contrary, they display serial correlation, or autocorrelation, which is essential for forecasting.
This feature can be measured by computing the covariance and correlation between the sequence $\{y_t\}$ and the lagged sequence $\{y_{t-h}\}$.
The lag $h$ sample autocovariance is computed as follows:
$$\hat{\gamma}(h) = \frac{1}{n} \sum_{t=h+1}^{n} (y_t - \bar{y})(y_{t-h} - \bar{y})$$
The sample autocorrelation coefficient at lag $h$ is
$$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}.$$
We will explain later why it is so. For the time being let us stress that we will be able to make sense of $\hat{\rho}(h)$ only in particular situations (presumed stationarity).
The barplot of $\hat{\rho}(h)$ vs $h$ is known as the correlogram, or sample ACF (autocorrelation function), of $\{y_t\}$.
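The sample autocovariance and autocorrelation translate directly into a few lines of NumPy. This is a minimal sketch using divisor $n$ at every lag; the function names are hypothetical and the alternating series is a toy example.

```python
import numpy as np


def sample_autocovariance(y, h):
    """Lag-h sample autocovariance with divisor n."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    # Pairs (y_t, y_{t-h}) for t = h+1, ..., n.
    return np.sum(d[h:] * d[:n - h]) / n


def sample_acf(y, max_lag):
    """Sample autocorrelations for lags 0, 1, ..., max_lag."""
    g0 = sample_autocovariance(y, 0)
    return np.array([sample_autocovariance(y, h) / g0 for h in range(max_lag + 1)])


# Toy series that flips sign every period: strong negative lag-1 autocorrelation.
alternating = np.array([1.0, -1.0] * 5)
acf = sample_acf(alternating, 3)
print(acf)  # lags 0..3: 1.0, -0.9, 0.8, -0.7
```

Plotting these values as bars against the lag gives the correlogram. The autocorrelations shrink in absolute value with the lag because the divisor stays at $n$ while the number of terms in the sum falls.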
Time series models
A time series $\{y_t, t = 1, \dots, T\}$ is a finite realization of a random process in discrete time, i.e. a collection of random variables indexed by $t$.
Time series analysis can be performed by providing a model for the stochastic process. A key issue is modelling the serial dependence across the r.v.’s.
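Example 1: $y_t \sim N(\beta_0 + \beta_1 t, \sigma^2)$, or equivalently,
$$y_t = \beta_0 + \beta_1 t + \varepsilon_t, \quad \varepsilon_t \sim \text{NID}(0, \sigma^2).$$
Example 2: $y_t \mid y_{t-1} \sim N(\phi y_{t-1}, \sigma^2)$, which is the same as $y_t = \phi y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim \text{NID}(0, \sigma^2)$.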
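Both example models are easy to simulate, which helps build intuition for what a "realization" of a process looks like. The parameter values, the seed and the choice to start the second series at its first shock are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(317)   # arbitrary seed, for reproducibility only
n, sigma = 200, 1.0
beta0, beta1, phi = 2.0, 0.05, 0.7  # illustrative parameter values
t = np.arange(1, n + 1)

# Example 1: deterministic linear trend plus independent normal noise.
eps1 = rng.normal(0.0, sigma, size=n)
y_trend = beta0 + beta1 * t + eps1

# Example 2: each value depends on the previous one (first-order autoregression).
eps2 = rng.normal(0.0, sigma, size=n)
y_ar = np.zeros(n)
y_ar[0] = eps2[0]                   # assumed starting value
for i in range(1, n):
    y_ar[i] = phi * y_ar[i - 1] + eps2[i]
```

In the first model the serial dependence comes entirely through the deterministic trend; in the second it comes through the lagged value of the series itself.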
To make successful inferences on the stochastic process from a time series, the former has to possess certain properties. We focus on the class of covariance (second order) stationary processes.
Def: Stationarity. $\{y_t\}$ is (covariance) stationary if, for all $t$:
$$E(y_t) = \mu < \infty,$$
$$E(y_t - \mu)^2 = \gamma(0) < \infty,$$
and
$$E[(y_t - \mu)(y_{t-h} - \mu)] = \gamma(h).$$
The autocovariance function, $\gamma(h)$, is symmetric: $\gamma(h) = \gamma(-h)$.
Autocorrelation function (ACF):
$$\rho(h) = \frac{\gamma(h)}{\gamma(0)}$$
Properties:
i. $\rho(0) = 1$;
ii. $|\rho(h)| \le 1$;
iii. $\rho(h) = \rho(-h)$.
A stationary stochastic process is characterized, up to its second-order properties, by the mean, the variance, and the autocovariance function (or ACF); for a Gaussian process this characterization is complete. These quantities can be estimated from the available time series.
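sample mean: $\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t$
sample variance: $\hat{\gamma}(0) = \frac{1}{n} \sum_{t=1}^{n} (x_t - \bar{x})^2$
sample autocovariance: $\hat{\gamma}(h) = \frac{1}{n} \sum_{t=h+1}^{n} (x_t - \bar{x})(x_{t-h} - \bar{x})$
The ACF is estimated by $\hat{\rho}(h) = \hat{\gamma}(h)/\hat{\gamma}(0)$. The barplot of $(h, \hat{\rho}(h))$ is the correlogram.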
QGDP
ACF QGDP – Two Views
Simple tests for White Noise – Autocorrelation tests
If $x_t \sim \text{WN}(0, \sigma^2)$ then a large sample approximation to the distribution of $\hat{\rho}(h)$ for this series is $\hat{\rho}(h) \sim N(0, 1/n)$.
This result can be used to test $H_0: \rho(h) = 0$. A test with approximate size 5% rejects $H_0$ if the sample autocorrelation $\hat{\rho}(h)$ lies outside the interval
$$[-2/\sqrt{n},\ 2/\sqrt{n}].$$
The Ljung-Box test statistic
$$Q(m) = n(n+2) \sum_{h=1}^{m} \frac{\hat{\rho}(h)^2}{n-h}$$
is used to test $H_0: \rho(h) = 0,\ \forall h > 0$. Under $H_0$, $Q(m) \sim \chi^2$ with $m$ degrees of freedom.
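Both white noise checks can be sketched in NumPy. The function names are hypothetical, the simulated noise and seed are assumptions for illustration, and the 5% critical value for a $\chi^2$ with 10 degrees of freedom (18.307) is hard-coded from standard tables rather than computed.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag (divisor n at every lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    gamma = np.array([np.sum(d[h:] * d[:n - h]) / n for h in range(max_lag + 1)])
    return gamma / gamma[0]


def ljung_box_q(x, m):
    """Ljung-Box statistic Q(m) = n(n+2) * sum_{h=1}^{m} rho_hat(h)^2 / (n - h)."""
    n = len(x)
    rho = sample_acf(x, m)[1:]
    lags = np.arange(1, m + 1)
    return n * (n + 2) * np.sum(rho ** 2 / (n - lags))


# Simulated white noise (assumed data): compare each rho_hat(h) with +/- 2/sqrt(n).
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
band = 2 / np.sqrt(len(noise))
outside_band = np.abs(sample_acf(noise, 10)[1:]) > band

# Joint test over the first 10 lags against the tabulated 5% critical value.
CHI2_95_DF10 = 18.307
q_noise = ljung_box_q(noise, 10)
reject_noise = q_noise > CHI2_95_DF10

# A strongly autocorrelated toy series gives a very large Q even at one lag.
alternating = np.array([1.0, -1.0] * 50)
q_alt = ljung_box_q(alternating, 1)
```

Even for genuine white noise, roughly one lag in twenty is expected to fall outside the band, which is one reason to prefer the joint Ljung-Box test over eyeballing individual bars.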
Forecasting
Time series forecasting: it is like driving a car blindfolded while following directions given by a person looking out of the back window (do not try this!).
A forecast is a statement about the future r.v. $y_{n+m}$. This statement is conditional on the information available. The information available at time $t$ will be denoted $F_t$. Usually, $F_t = \{y_t, y_{t-1}, \dots, y_1\}$.
$y_{n+m}$ is a random variable;
a forecast is a conditional statement concerning $f(y_{n+m} \mid F_n)$;
the integer $m > 0$ is the forecast horizon.
The forecast statement
The statement concerning $y_{n+m}$ can take essentially 3 forms:
Point forecast: provide a representative value for the location of the distribution $f(y_{n+m} \mid F_n)$, e.g. $E(y_{n+m} \mid F_n)$, the median, or the mode. The choice depends on the loss function.
Interval forecast: provide $[l, u]$ s.t. $P(l \le y_{n+m} \le u \mid F_n) = 1 - \alpha$. The forecast interval is not necessarily symmetric, e.g. when the model is built and forecast on logged data but the forecast must be reported on the original measurement scale.
Density forecast: provide an estimate $\hat{f}(y_{n+m} \mid F_n)$.
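The asymmetry of a back-transformed interval is easy to see numerically. The sketch assumes a normal forecast distribution on the log scale; the point forecast and standard error below are made-up numbers.

```python
import math

# Hypothetical forecast on the log scale (assumed values for illustration).
log_point = 5.0    # point forecast of log(y)
log_se = 0.2       # forecast standard error on the log scale
z = 1.96           # normal quantile for a 95% interval

# The interval is symmetric around the point forecast on the log scale ...
log_lower = log_point - z * log_se
log_upper = log_point + z * log_se

# ... but not after transforming back to the original measurement scale.
lower, middle, upper = math.exp(log_lower), math.exp(log_point), math.exp(log_upper)
below, above = middle - lower, upper - middle

print(round(lower, 1), round(middle, 1), round(upper, 1))
```

The upper limit is further from the back-transformed point forecast than the lower limit, because the exponential stretches large values more than small ones.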
Loss function
Let $\hat{y}_t$ be a point forecast of the r.v. $y_t$. The forecast error is the difference:
$$e_t = y_t - \hat{y}_t$$
The loss function measures the cost associated with e, L(e). It has the following properties:
L(e) ≥ 0;
L(0) = 0;
Continuous and increasing in |e|, i.e. the size of the forecasting error.
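The choice of loss function determines which point forecast is best. The sketch below uses a made-up, skewed set of equally likely outcomes and a grid search to find the forecast that minimizes the average loss under two loss functions that satisfy the properties above: squared loss and absolute loss. All names and values are assumptions for illustration.

```python
import numpy as np

# Made-up, equally likely outcomes for the variable being forecast (note the outlier).
outcomes = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Candidate point forecasts on a grid with step 0.1.
candidates = np.linspace(0.0, 100.0, 1001)


def average_loss(loss, forecast):
    """Average loss of a single point forecast over the possible outcomes."""
    return np.mean(loss(outcomes - forecast))


def squared_loss(e):
    return e ** 2


def absolute_loss(e):
    return np.abs(e)


best_squared = candidates[np.argmin([average_loss(squared_loss, c) for c in candidates])]
best_absolute = candidates[np.argmin([average_loss(absolute_loss, c) for c in candidates])]

print(best_squared, outcomes.mean())        # squared loss picks the mean
print(best_absolute, np.median(outcomes))   # absolute loss picks the median
```

With skewed outcomes the two "best" forecasts are far apart, so the loss function is not a technicality.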
Point forecasts under squared loss
Let $\hat{y}_{n+m|n}$ denote a point forecast, a single value that provides the best summary (a representative value) for the random variable $y_{n+m}$. The error associated with such a forecast is
$$e_{n+m|n} = y_{n+m} - \hat{y}_{n+m|n}.$$
This is a random variable. We say that $\hat{y}_{n+m|n}$ is conditionally unbiased if
$$E(e_{n+m|n} \mid F_n) = 0.$$
Obviously, the conditional expectation $E(y_{n+m} \mid F_n)$ is conditionally unbiased.
Notice that, by the law of iterated expectations, $E(e_{n+m|n} \mid F_n) = 0 \Rightarrow E(e_{n+m|n}) = 0$.
Comparing Models
Once you start fitting models you need to have some process to compare and select the "best" model. The Mean Square Forecast Error (MSFE) associated with the point predictor $\hat{y}_{n+m|n}$ is
$$\text{MSE}(\hat{y}_{n+m|n}) = E[(y_{n+m} - \hat{y}_{n+m|n})^2 \mid F_n] = E[e_{n+m|n}^2 \mid F_n].$$
Under the loss function $L(e) = e^2$, $\text{MSE}(\hat{y}_{n+m|n})$ is the expected loss. The minimum MSFE predictor of $y_{n+m}$ based on $F_n = \{y_1, y_2, \dots, y_n\}$ is
$$\tilde{y}_{n+m|n} = E(y_{n+m} \mid F_n).$$
It is conditionally unbiased and no other unbiased predictor has a smaller forecast error variance. So theoretically this is the best predictor under squared loss.
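A small simulation illustrates the comparison. The sketch assumes the data follow a first-order autoregression $y_t = \phi y_{t-1} + \varepsilon_t$ with known $\phi$ (in practice $\phi$ would have to be estimated) and compares the empirical one-step-ahead MSFE of three predictors. The seed, parameter values and predictor labels are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2021)   # arbitrary seed
n, phi, sigma = 5000, 0.7, 1.0      # illustrative values; phi is treated as known

# Simulate y_t = phi * y_{t-1} + eps_t.
eps = rng.normal(0.0, sigma, size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

# One-step-ahead forecasts of y_t made at time t-1.
actual = y[1:]
forecasts = {
    "conditional mean": phi * y[:-1],        # E(y_t | F_{t-1}) under the assumed model
    "naive (last value)": y[:-1],            # random-walk style forecast
    "unconditional mean": np.zeros(n - 1),   # the process mean, which is 0 here
}

# Empirical MSFE: average squared forecast error over the sample.
msfe = {name: np.mean((actual - f) ** 2) for name, f in forecasts.items()}
for name, value in msfe.items():
    print(f"{name}: {value:.3f}")
```

In this setting the conditional mean attains the smallest empirical MSFE, close to the innovation variance $\sigma^2$, which matches the theoretical result.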