Day 1
MAS 640
1/17/2018
Reading
I Chapter 1 – Introduction
I Chapter 2 – Fundamental Concepts
Time Series
A time series is a sequence of ordered data. The “ordering” refers
generally to time, but other orderings could be envisioned (e.g., over
space, etc.).
In this class, we will be concerned exclusively with time series that
are
I measured on a single continuous random variable Y
I equally spaced in discrete time; that is, we will have a single
realization of Y at each second, hour, day, month, year, etc.
Time Series data are everywhere!
I Business: daily stock prices, weekly interest rates, quarterly
sales, monthly supply figures, annual earnings
I Medicine: EKG measurements, drug concentrations, blood
pressure readings
I Public Health: Flu cases per day, health-care clinic visits per
week, annual disease incidence
I Agriculture: annual yields, daily crop prices
I Social Sciences: annual birth and death rates, accident
frequencies, crime rates, school enrollment
I Meteorology: daily high temperatures, annual rainfall, hourly
wind speeds, earthquake frequency
Time Series Notation
A time series is denoted as $Y_t$, where
$Y_t$ = value of $Y$ at time $t$, for $t = 1, 2, \dots, n$
The subscript $t$ tells which time point the measurement $Y_t$
corresponds to.
Note that in the sequence $Y_1, Y_2, \dots, Y_n$ the subscripts are very
important because they correspond to a particular ordering of the
data.
This is perhaps a change in mindset from other courses where time
is ignored and the subscripts rarely matter.
Time Series Plot
A time series plot is the most basic graphical display in the
analysis of time series data. Always start here!
The plot is a scatterplot of $Y_t$ versus $t$, with straight lines connecting
the points.
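A minimal sketch of such a plot in R, assuming the built-in `AirPassengers` dataset (monthly air passengers in thousands, 1949-1960, from base R's datasets package) as a stand-in for course data. The axis labels and title are illustrative choices; `type = "o"` draws both the points and the connecting lines.

```r
# AirPassengers is a monthly ts object in base R's datasets package.
y <- AirPassengers

# Points joined by straight lines: Y_t on the vertical axis, t on the horizontal.
plot(y, type = "o", xlab = "Time", ylab = "Air Passengers (1000s)",
     main = "Monthly Air Passengers, 1949-1960")

# The time index is stored with the series.
start(y)      # first time point (year, month)
frequency(y)  # observations per year
```

Because `y` is a `ts` object, `plot()` places the calendar time on the horizontal axis automatically.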
Time Series Plots
When looking at a time series plot. . .
I Is there a trend? On average, increasing or decreasing over
time
I Is there seasonality? Regularly repeating patterns corresponding
to calendar time (seasons, quarters, months, weekdays, etc.)
I Are there any outliers?
I Is there constant variance over time?
I Are there any abrupt changes to either the level or variance?
Airline Miles
Monthly Airline Passenger−Miles in US, 1/1996−5/2005
[Figure: time series plot of Airline Miles (y-axis, 3.0e+07 to 5.5e+07) against Time (x-axis, 1996 to 2004).]
[Figure: the same Airline Miles series, replotted with each point labelled by its month initial (J, F, M, A, ...).]
Monthly Oil Filter Sales
[Figure: time series plot of oilfilters (y-axis, 2000 to 6000) against Time (x-axis, 1984 to 1987), with each point labelled by its month initial.]
Airline Passengers
Monthly Air Passengers, in Thousands, 1949−1960
[Figure: time series plot of Air Passengers (1000s) (y-axis, 100 to 600) against Time (x-axis, 1950 to 1960), with each point labelled by its month initial.]
Airline Passengers – After Log Transformation. . .
[Figure: time series plot of log(Air Passengers) (y-axis, 5.0 to 6.5) against Time (x-axis, 1950 to 1960).]
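A log transformation is commonly used when the seasonal swings grow with the level of the series. The sketch below, assuming the built-in `AirPassengers` dataset stands in for the series shown, compares the spread of the first and last calendar years before and after taking logs. The variable names are illustrative.

```r
y <- AirPassengers  # built-in monthly series, 1949-1960

# Spread of the first and last calendar years, before and after taking logs.
first_year <- window(y, start = c(1949, 1), end = c(1949, 12))
last_year  <- window(y, start = c(1960, 1), end = c(1960, 12))
sd_ratio_raw <- sd(last_year) / sd(first_year)
sd_ratio_log <- sd(log(last_year)) / sd(log(first_year))

# The log-scale ratio is the smaller of the two.
c(raw = sd_ratio_raw, log = sd_ratio_log)

plot(log(y), xlab = "Time", ylab = "log(Air Passengers)")
```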
Milk Production
[Figure: time series plot of milk (y-axis, 1300 to 1700) against Time (x-axis, 1994 to 2006), with each point labelled by its month initial.]
CREF Stock Fund
Daily Value of CREF Stock Fund, 8/26/04−8/15/06
[Figure: time series plot of CREF (y-axis, 170 to 220) against Time (x-axis, 0 to 500).]
Time Series Plots before R
Figure 1: First known time series plot, from the 10th or 11th century, showing
the inclinations of the planetary orbits.
Try one yourself!
I Using the birth dataset located in the astsa package:
I Plot the time series
I What noticeable patterns are present?
I Which month tends to have the most births? The least?
library(astsa)
data()  # See what's available
data(birth, package = "astsa")
Why Visualize the Time Series?
I It's important to know what patterns are present, as they will
guide the modeling process.
I If there's a trend, seasonality, non-constant variance, abrupt
changes, etc., we need to account for that when modeling.
I How else can you detect these patterns if not via visualization?
Goals in Time Series
1. Model the stochastic (random) mechanism that gives rise to
the data
2. Predict or forecast the future based on the past
Model and Predict
[Figure: time series plot of milk (y-axis, 1400 to 2000) against Time (x-axis, 1998 to 2008).]
What’s Different About Time Series?
The big thing about time series data is that they are not
independent! Instead, observations are correlated through time.
I Correlated data are generally more difficult to analyze
I Statistical theory without independence is markedly more
difficult
Classical Statistics
I Most classical methods (Regression, ANOVA, GLM) assume
that observations are independent. Consider the linear model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
I We typically assume that the $\varepsilon_i$ are independent and identically
distributed, normal with mean 0 and constant variance.
Course Goals
At the end of this course, I hope that you have an understanding of
how to build and use time series models.
1. Model specification
I Consider different classes of time series models
I Use descriptive statistics, graphs, subject matter knowledge to
propose sensible candidate models
I Abide by the Principle of Parsimony
2. Model fitting
I After choosing a model, estimate it!
I Least squares / MLE via software, understand output
3. Model diagnostics
I Inference and graphics to determine how well the model fits the
data
I Might suggest your model is inappropriate, or point toward a
more appropriate model
Chapter 2 – Some Fundamentals
Expected Value
$\mu = E(Y) = \int_{\mathbb{R}} y\, f(y)\, dy$
I For any real number $a$, we have...
I $E(a) = a$
I $E(aY) = aE(Y)$
I $E\left(\sum_{j=1}^{k} Y_j\right) = \sum_{j=1}^{k} E(Y_j)$
Variance
$\sigma_y^2 = \mathrm{var}(Y) = E\left[(Y - \mu)^2\right]$
I Typically easier to work with...
$\mathrm{var}(Y) = E(Y^2) - [E(Y)]^2$
I Can you show these are equivalent?
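One way to see the equivalence is to expand the square and apply the expected value rules for constants and sums, with $\mu = E(Y)$ treated as a constant:

```latex
\begin{align*}
E\left[(Y-\mu)^2\right] &= E\left(Y^2 - 2\mu Y + \mu^2\right) \\
&= E(Y^2) - 2\mu E(Y) + \mu^2 \\
&= E(Y^2) - 2\mu^2 + \mu^2 \\
&= E(Y^2) - [E(Y)]^2
\end{align*}
```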
Covariance
$\mathrm{cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - E(X)E(Y)$
I If $\mathrm{cov}(X, Y)$...
I $> 0 \rightarrow$ X and Y are positively linearly related
I $< 0 \rightarrow$ X and Y are negatively linearly related
I $= 0 \rightarrow$ X and Y are not linearly related
I If X and Y are independent...
I $E(XY) = E(X)E(Y) \rightarrow \mathrm{cov}(X, Y) = 0$
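The implication runs one way only: independence gives zero covariance, but zero covariance does not give independence. A small sketch with a hypothetical discrete distribution (the values and probabilities are made up for illustration), computed exactly from the definition:

```r
# Hypothetical discrete X: values -1, 0, 1, each with probability 1/3.
x <- c(-1, 0, 1)
p <- rep(1/3, 3)
y <- x^2                  # Y is completely determined by X

EX  <- sum(p * x)         # E(X)
EY  <- sum(p * y)         # E(Y)
EXY <- sum(p * x * y)     # E(XY) = E(X^3)
cov_xy <- EXY - EX * EY   # cov(X, Y) = E(XY) - E(X)E(Y)
cov_xy
```

Here the covariance is zero even though Y is a function of X, because the relationship is not linear.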
Correlation
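$\rho = \mathrm{corr}(X, Y) = \dfrac{\mathrm{cov}(X, Y)}{\sigma_x \sigma_y}$
I $-1 \le \rho \le 1$
I $\rho = -1 \rightarrow$ perfectly negatively linearly related
I $\rho = 1 \rightarrow$ perfectly positively linearly related
I $\rho = 0 \rightarrow$ not linearly related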
Time Series and Stochastic Processes
The sequence of random variables $\{Y_t : t = 0, 1, 2, \dots, n\}$, or simply
$Y_t$, is called a stochastic process. It is a collection of random
variables indexed by time $t$, so...
$Y_0$ = value of the process at time $t = 0$
$Y_1$ = value of the process at time $t = 1$
...
$Y_n$ = value of the process at time $t = n$
In most time series processes, most of what we need is captured
with only $E(Y_t)$ and $E(Y_t Y_{t-k})$
For the stochastic process $Y_t$, define the mean function as
$\mu_t = E(Y_t)$
Note that the mean might depend on the time $t$ (it can change
through time).
For the stochastic process $Y_t$, define the autocovariance as
$\gamma_{t,s} = \mathrm{cov}(Y_t, Y_s)$
For the stochastic process $Y_t$, define the autocorrelation as
$\rho_{t,s} = \dfrac{\mathrm{cov}(Y_t, Y_s)}{\sqrt{\mathrm{var}(Y_t)\,\mathrm{var}(Y_s)}}$
I $\rho_{t,s}$ near $\pm 1$ indicates strong linear dependence of $Y_t$ and $Y_s$
I $\rho_{t,s}$ near 0 indicates weak linear dependence
I $\rho_{t,s} = 0$ indicates $Y_t$ and $Y_s$ are uncorrelated
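A few consequences follow directly from these definitions and from the symmetry of covariance. They are worth keeping in mind when reading autocovariance and autocorrelation values:

```latex
\begin{align*}
\gamma_{t,t} &= \mathrm{cov}(Y_t, Y_t) = \mathrm{var}(Y_t), &
\gamma_{t,s} &= \gamma_{s,t}, \\
\rho_{t,t} &= \frac{\mathrm{var}(Y_t)}{\sqrt{\mathrm{var}(Y_t)\,\mathrm{var}(Y_t)}} = 1, &
\rho_{t,s} &= \rho_{s,t}.
\end{align*}
```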
Stationarity
I Stationarity is a very important concept in time series and one
that you will hear often. Broadly speaking, a time series is
called stationary if...
I There is no systematic change in the mean (no trend),
I There is no systematic change in the variance,
I No notable seasonal patterns exist.
In other words, the properties of one section of the data are the
same as those of any other section.
I Notationally, the stochastic process $Y_t$ is said to be stationary if
1. The mean function $\mu = E(Y_t)$ is constant through time
I $\mu_t$ is free of $t$
2. The covariance between any two observations depends only on
the time lag between them
I $\mathrm{cov}(Y_t, Y_s) = \mathrm{cov}(Y_{t-k}, Y_{s-k})$ for every shift $k$
I Why do we care if a time series is stationary or not?
I Because almost all of the theory and models used in time series
analysis apply to stationary time series.
I If you don't like that, enroll in a PhD program and develop more general
methods!
I Lucky for us, it’s typically straightforward to transform a
non-stationary time series to a stationary one.
Examples of Common Stochastic Processes
White Noise
I The process $\{e_t : t = 1, 2, \dots\}$ is called a white noise process
if the $e_t$ are independent and identically distributed with
$E(e_t) = \mu$
$\mathrm{var}(e_t) = \sigma^2$
I Both mean and variance are constant through time
I Is this process stationary?
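One way to approach the question is to check the two stationarity conditions directly. The sketch below assumes the $e_t$ are uncorrelated across time, which holds in particular for independent draws:

```latex
\begin{align*}
\mu_t &= E(e_t) = \mu \quad \text{(free of } t\text{)}, \\
\gamma_{t,s} &= \mathrm{cov}(e_t, e_s) =
\begin{cases} \sigma^2, & t = s, \\ 0, & t \neq s. \end{cases}
\end{align*}
```

The covariance depends on $t$ and $s$ only through whether the lag $t - s$ is zero, so under this assumption both conditions hold.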
[Figure: 100 observations from a white noise process; e (y-axis, -3 to 2) against Time (x-axis, 0 to 100).]
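A plot of this kind can be produced by simulation. The sketch below assumes normally distributed white noise with mean 0 and standard deviation 1. The distribution, its parameters, and the seed are illustrative choices, not taken from the slides.

```r
set.seed(640)                       # arbitrary seed, only for reproducibility
e <- rnorm(100, mean = 0, sd = 1)   # assumed: normal white noise, mu = 0, sigma = 1

plot(e, type = "o", xlab = "Time", ylab = "e",
     main = "100 observations from a white noise process")

mean(e)  # sample mean, should be near mu = 0
var(e)   # sample variance, should be near sigma^2 = 1
```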
Random Walk
I Suppose $e_t$ is a zero mean white noise process. Define
$Y_1 = e_1$
$Y_2 = e_1 + e_2$
...
$Y_n = e_1 + e_2 + \cdots + e_n$
or
$Y_t = Y_{t-1} + e_t$
I The process $Y_t$ is referred to as a random walk. These are
very frequently used in finance for modeling stock prices,
among other things.
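The mean and variance of the random walk follow from the sum representation. The sketch below assumes the $e_t$ are uncorrelated with common variance $\sigma_e^2$ (a symbol introduced here for illustration):

```latex
\begin{align*}
E(Y_t) &= E(e_1) + \cdots + E(e_t) = 0, \\
\mathrm{var}(Y_t) &= \mathrm{var}(e_1) + \cdots + \mathrm{var}(e_t) = t\,\sigma_e^2 .
\end{align*}
```

The mean is constant, but the variance grows linearly with $t$.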
[Figure: 100 observations from a random walk process; y (y-axis, 0 to 15) against Time (x-axis, 0 to 100).]
I Does the random walk appear stationary?
I How could we transform it to stationarity?
Differencing
I One thing we’ll do a lot in this course is “difference” the time
series in an effort to transform to a stationary process.
I Rather than look at $Y_t$, define
$\Delta Y_t = Y_t - Y_{t-1}$
I Try this for the random walk data and see how it looks. . .
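A minimal sketch of this exercise on simulated data, assuming zero-mean normal white noise (the seed and the distribution are illustrative choices). Since $Y_t = Y_{t-1} + e_t$, the first difference of a random walk is $e_t$ itself.

```r
set.seed(640)        # arbitrary seed, only for reproducibility
e <- rnorm(100)      # assumed: zero-mean normal white noise
y <- cumsum(e)       # random walk: Y_t = e_1 + ... + e_t

dy <- diff(y)        # Y_t - Y_{t-1} for t = 2, ..., 100

par(mfrow = c(2, 1))
plot(y,  type = "l", xlab = "Time", ylab = "y",       main = "Random walk")
plot(dy, type = "l", xlab = "Time", ylab = "diff(y)", main = "First difference")

# The differenced series matches the white noise from t = 2 onward.
all.equal(dy, e[-1])
```

Note that `diff()` returns one fewer observation than the original series, because the first observation has no predecessor.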
Model and Subtract the Trend (Detrending)
I An alternative to differencing is to build a model for the trend
and then subtract it from the original time series.
I This typically involves fitting a regression model of Yt against
time (or perhaps a more involved regression with quadratics,
splines, etc. . . )
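The detrending idea can be sketched on a simulated series with a known linear trend. The intercept, slope, noise level, and variable names below are hypothetical, chosen only for illustration.

```r
set.seed(640)                          # arbitrary seed, only for reproducibility
n <- 100
time_index <- 1:n
# Hypothetical series: linear trend plus white noise.
y <- 2 + 0.5 * time_index + rnorm(n)

fit <- lm(y ~ time_index)              # regression of Y_t on time
detrended <- residuals(fit)            # original series minus the fitted trend

plot(time_index, detrended, type = "l", xlab = "Time", ylab = "Detrended y")
coef(fit)                              # estimated intercept and slope
```

A quadratic trend could be handled the same way by adding `I(time_index^2)` to the model formula.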
Next Class
I Trends
I Modeling trends in time series data
I Differencing or Detrending - goal of obtaining a stationary time
series
I Residual analysis
I Chapter 3 of text