Lecture 2 – Modeling Deterministic Trends
MAS 640
1/19/2018
Outline
I Chapter 3 – Trends
I Linear regression models for modeling deterministic trends
I Should feel familiar
Introduction
I In general, the mean of a time series can be completely
arbitrary through time
I Though in practice, they often take a relatively simple function
of time
I Increasing/decreasing linearly, maybe exponentially, maybe
prominent seasonal patterns
I We want to work with stationary time series data (constant
mean through time)
I If there’s a trend, we want to remove it. . .
Trends
I Many time series data sets exhibit a trend, or a long term
change in the mean level
I Trends can be elusive and the same time series may be viewed
very differently by different analysts
I What do we mean by “long-term?”
I Short term trends may be misleading (random walk)
I Hard to detect trends if data is noisy
Detecting Trends
[Figure: Apple Stock, January 2017–Present (2017-01-03 / 2018-01-18); monthly x-axis, price axis from 120 to 170.]
Detecting Trends
[Figure: Apple Stock, Nov 2017–Present (2017-11-02 / 2018-01-18); price axis from 168 to 178.]
Deterministic Trend Models
I In this lecture, we consider trend models of the form
Yt = µt + Xt
I where µt is a deterministic function that describes the trend
and Xt is random error.
I In other words, we consider trends that are pretty easy to
model.
Deterministic Trend Models
I In practice, many different functions for µt can be considered.
I Linear trend:
µt = β0 + β1t
I Quadratic trend:
µt = β0 + β1t + β2t²
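A quick way to see what Yt = µt + Xt looks like is to simulate it. The sketch below (slope and intercept are invented illustration values) builds a linear µt and adds white noise:

```r
# Simulate Y_t = mu_t + X_t with a linear deterministic trend mu_t
# (the intercept 2 and slope 0.05 are made-up illustration values)
set.seed(640)
n  <- 100
t  <- 1:n
mu <- 2 + 0.05 * t            # deterministic trend component mu_t
y  <- mu + rnorm(n, sd = 1)   # add random error X_t
plot(t, y, type = "l")
lines(t, mu, col = "red")     # the trend the noise fluctuates around
```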
Deterministic Trend Models
I The linear regression extends easily to general kth order
polynomial trends:
µt = β0 + β1t + β2t² + · · · + βktᵏ
Deterministic Trend Models
I For seasonal trends:
µt = β1Jan + β2Feb + · · · + β12Dec
I For cyclical trends:
µt = β0 + α1 cos(ωt) + β1 sin(ωt)
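For the cyclical model, cos(ωt) and sin(ωt) are just two more covariates in lm(). A sketch with a made-up monthly frequency ω = 2π/12:

```r
# Harmonic regression sketch; omega = 2*pi/12 assumes a 12-point cycle,
# and the true coefficients 5, 2, 1 are invented for illustration
set.seed(640)
n     <- 120
t     <- 1:n
omega <- 2 * pi / 12
y   <- 5 + 2 * cos(omega * t) + 1 * sin(omega * t) + rnorm(n, sd = 0.5)
fit <- lm(y ~ cos(omega * t) + sin(omega * t))
coef(fit)   # estimates should land near 5, 2, and 1
```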
Goal in Modeling Trends
I If we see a trend in a time series, we want to remove it
I Want to convert the non-stationary time series to a stationary
one with constant mean level
Removing Trends from Time Series
I Two general ways to remove a trend from a time series:
1. Estimate the trend and subtract it – model the trend with a
regression model, use residuals
2. Difference the data, possibly multiple times, until differenced
observations appear stationary
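Both approaches, side by side on one simulated trending series (a sketch; the slope 0.1 is arbitrary):

```r
set.seed(640)
t <- 1:200
y <- 1 + 0.1 * t + rnorm(200)    # linear trend plus noise

# 1. Detrend: regress on time and keep the residuals
detrended <- residuals(lm(y ~ t))

# 2. Difference: y_t - y_{t-1}
differenced <- diff(y)

# detrended residuals average exactly 0; differences average near the slope 0.1
c(mean(detrended), mean(differenced))
```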
Regression Methods for Estimating µt
Straight line regression of the time series versus time. Simplest model.
Yt = β0 + β1t + Xt
Note that in your code you will need to create the time variable
(often with time(DATANAME)).
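With a ts object, time() produces the calendar-time covariate directly. A minimal sketch on made-up monthly data:

```r
# time() extracts the time index of a ts object for use in lm()
set.seed(640)
y <- ts(0.2 * (1:120) + rnorm(120), start = c(2000, 1), frequency = 12)
t <- time(y)    # 2000.000, 2000.083, ... one value per month
fit <- lm(y ~ t)
coef(fit)       # slope is in per-year units here: 0.2 per month = 2.4 per year
```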
Linear Regression Assumptions
I Assumptions placed on the error term:
1. Mean 0
2. Independence
3. Constant Variance
4. Normality
I With time series, at least one of these is typically violated.
Regression Assumptions
I Due to assumption violations, the estimated standard errors,
confidence intervals, test stats, and p-values output by the
software are typically wrong or not meaningful
I But the estimated coefficients are still ok
I So, we’ll use the model to get fitted values, but we don’t
concern ourselves with errors, p-values, etc. . .
I Which is fine, I’m generally not interested in inference regarding
the effect of time on the time series
Straight Line Regression Example
I Temperature deviations (from the 1951–1980 average), measured
in degrees centigrade, for 1880–2009.
[Figure: Global Temp Deviations; time series plot of Temp Deviation against time index 0–100+, values roughly −0.4 to 0.4.]
Straight Line Regression Example
I First we fit a simple linear regression to the series:
t <- time(gtemp2)
fit <- lm(gtemp2 ~ t)
summary(fit)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.226074058 0.027045671 -8.358974 4.913262e-13
t 0.004688777 0.000474376 9.884095 2.638413e-16
I Estimated regression equation: Ŷt = −0.2261 + 0.0047t
Straight Line Regression Example
[Figure: gtemp2 series (values roughly −0.4 to 0.4) over time index 0–100, with the fitted straight-line trend.]
Straight Line Regression Example
I After estimating the mean function, we simply subtract it to
obtain a detrended series
X̂t = Yt − Ŷt
I Of course, we can get these directly from the software. . .
I residuals(fit)
I fit$residuals
Straight Line Regression Example
I The detrended series looks closer to stationary than before. . .
[Figure: detrended residuals (resids) versus as.vector(time(gtemp2)); values roughly −0.3 to 0.2.]
Differencing
I Alternatively, we might consider differencing the time series
∆Yt = Yt − Yt−1
I Differencing in R:
I y.diff <- diff(y)
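Differencing once removes a linear trend; differencing twice removes a quadratic one. A quick sketch with simulated data:

```r
set.seed(640)
t  <- 1:100
y  <- 0.01 * t^2 + rnorm(100)   # quadratic trend plus noise
d1 <- diff(y)                   # still drifts upward (a linear trend remains)
d2 <- diff(diff(y))             # roughly stationary around a constant
```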
Differencing Example
I Applying this to the global temperature data gives the following:
[Figure: diff(gtemp2) versus time index 0–100; differenced values roughly −0.3 to 0.4, fluctuating around zero.]
Remembering the Goal. . .
I We want to convert a non-stationary process to a stationary
one
I Detrending involved estimating the mean function via
regression and using the residuals
I Differencing involved using the differenced time series
Detrending vs Differencing
I One advantage of differencing is that no parameters must be
estimated
I One disadvantage of differencing is that it does not provide an
estimate of the error process Xt
I Forecasting: probably differencing
I Inference: probably detrending
Polynomial Regression
I We can extend the simple linear regression fit previously to
allow for polynomial functions of time
Yt = β0 + β1t + β2t² + · · · + βktᵏ + Xt
I If k=1, µt = β0 + β1t is a linear trend
I If k=2, µt = β0 + β1t + β2t² is a quadratic trend
I If k=3, µt = β0 + β1t + β2t² + β3t³ is a cubic trend, etc. . .
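In lm(), powers of t must be wrapped in I() (or built with poly()) so that ^ is not read as a formula operator. A sketch that recovers made-up quadratic coefficients:

```r
# Quadratic trend fit; I(t^2) protects the power inside the formula
# (the true coefficients 435, -0.36, 0.003 are invented for illustration)
set.seed(640)
t <- 1:250
y <- 435 - 0.36 * t + 0.003 * t^2 + rnorm(250, sd = 5)
fit2 <- lm(y ~ t + I(t^2))
round(coef(fit2), 3)   # close to 435, -0.36, 0.003
```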
Polynomial Regression Example
[Figure: time series plot of gold over time index 0–250; prices range from about 420 to 540.]
Gold Prices - Linear Trend?
[Figure: gold series (price 420–540) over time index 0–250, with a fitted linear trend.]
Gold Prices - Residuals from Linear Trend Model
[Figure: residuals from lm(gold ~ time(gold)) versus index 0–250; values roughly −20 to 60.]
Gold Prices - Polynomial Trend Model
I Time series plot shows a clear non-linear trend
I Let’s add a quadratic term to the model. . .
Goldt = β0 + β1t + β2t² + Xt
Polynomial Regression Example
[Figure: gold series over time index 0–250 with the fitted quadratic trend overlaid.]
Ĝoldt = 435 − 0.36t + 0.003t²
Polynomial Regression Example
I Look at time series plot of residuals. . .
[Figure: residuals (Price of Gold) from the quadratic trend model versus time; values roughly −20 to 40.]
Practice with gnp Data
I Load the gnp dataset from the astsa package. data(gnp,
package='astsa')
I Plot the time series and notice the trend
I Fit a linear trend of gnp versus time. Plot the time series with
the fitted regression overlaid. How does it look?
I Construct a time series plot of the residuals from this model,
how do they look?
I Add a quadratic term and redo. How does it look now?
I Rather than model the trend, difference the time series and plot
it. How does it look?
I Difference the differenced data (i.e. use
diff(diff(DATANAME))), how does it look?
Practice with gtemp2 Data
I We have already fit a linear trend to the gtemp2 data
I Add a quadratic term to the model. Plot the time series with
the regression model overlaid. How do the predictions look?
I Add a cubic term to the model and repeat. How does this look?
Seasonal Means Model
I Consider monthly beer sales in the U.S.
I If building a regression model, what would you want in it?
[Figure: monthly beer sales (beersales), 1980–1990, values roughly 12–17; each observation is labeled by its month initial (J, F, M, . . . , D), and a strong within-year seasonal pattern repeats.]
Seasonal Means Model
I The beer sales time series clearly shows seasonality
I Fit a regression model with “seasonal” terms as covariates
I In this case months, but could be weekday, quarter, etc. . .
BeerSalest = β1Jan + β2Feb + · · · + β12Dec + Xt
I As with time, you will need to create these seasonal
components
I season(DATANAME)
I Might have to code manually in some scenarios. . .
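season() comes from the TSA package; without it, the same design can be coded by hand with a month factor. The "- 1" drops the intercept so every month gets its own mean, matching the model above (a sketch on made-up data):

```r
# Seasonal means model by hand; the monthly means below are invented
set.seed(640)
monthly_means <- rep(c(13, 13, 15, 15, 17, 17, 17, 16, 14, 14, 13, 12), 10)
y <- ts(rnorm(120, mean = monthly_means, sd = 0.5),
        start = c(1980, 1), frequency = 12)
month <- factor(cycle(y), labels = month.abb)  # stand-in for season(y)
fit <- lm(y ~ month - 1)   # no intercept: one coefficient per month
coef(fit)
```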
Beer Sales Resulting Model
betas
monthJanuary 13.16
monthFebruary 13.02
monthMarch 15.11
monthApril 15.40
monthMay 16.77
monthJune 16.88
monthJuly 16.83
monthAugust 16.57
monthSeptember 14.40
monthOctober 14.28
monthNovember 12.89
monthDecember 12.34
Beer Sales Resulting Model
[Figure: beersales series, 1980–1990 (values roughly 12–17), with the fitted seasonal means overlaid.]
Residuals from Beer Sales Seasonal Means Model
[Figure: residuals from the seasonal means model versus time, 1980–1990; values roughly −1.0 to 1.5.]
Milk Production Example
[Figure: monthly milk production (milk), 1994–2006; values roughly 1300–1700.]
Milk Production Example
I Time series shows seasonality
I Additionally shows linear trend
I So, include both components!
Milkt = β0 + β1t + β2Jan + β3Feb + · · · + Xt
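Combining both pieces in one lm() call (a sketch on simulated monthly data; keeping the intercept means one month serves as the baseline):

```r
# Linear trend plus seasonal means, as in the milk model above
# (trend slope and seasonal amplitude are made-up illustration values)
set.seed(640)
n <- 144
seasonal <- rep(5 * sin(2 * pi * (1:12) / 12), 12)
y <- ts(1300 + 2 * (1:n) + seasonal + rnorm(n, sd = 3),
        start = c(1994, 1), frequency = 12)
t <- time(y)
month <- factor(cycle(y), labels = month.abb)
fit <- lm(y ~ t + month)   # linear trend + 11 month contrasts vs the baseline
length(coef(fit))          # 13 coefficients: intercept, slope, 11 months
```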
Milk Production Example
Time
m
ilk
1994 1996 1998 2000 2002 2004 2006
1
3
0
0
1
4
0
0
1
5
0
0
1
6
0
0
1
7
0
0
Milk Production Example
Linear Trend Only
[Figure: milk series, 1994–2006, with only a linear trend fitted.]
Seasonal Trend Only
[Figure: milk series, 1994–2006, with only seasonal means fitted.]
Practice - Problem 3.7 from text
Residual Analysis
We’ve emphasized that in the linear regression output, only the
coefficient estimates are “useful”. But what about the standard
errors and t-tests?
I For unbiased variance estimates, we need Xt independent with
var(Xt) constant. Met if Xt is white noise.
I For tests and p-values to be valid, we additionally need to
assume that Xt is normally distributed.
Residual Analysis
I We can check normality in the usual way
I Histograms
I Normal QQ Plots
I Shapiro-Wilk test
I We can check constant variance in the usual way
I Scatterplot of residuals versus fitted values
I Want to see constant spread from left to right
Residual Analysis
I Previously we eyeballed the residual scatterplot and said “looks
ok” (not perfect, but common practice)
I Two alternatives
I Runs test
I Estimate and plot autocorrelation of standardized residuals for
a number of time lags
I Independence is particularly important to us now
I If correlations remain in the residuals, we will model that
correlation with a time series model
I Plot of estimated autocorrelation particularly useful, because it
will help determine which model to fit
Runs Test
I The runs test counts “runs” above or below the median in the
standardized residuals.
I It does what you do when you eyeball the scatterplot of
residuals, but it attaches a formal test to it and returns a
p-value.
H0 : Independence
Ha : Dependence
I Fit using runs(RESIDUALS)
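runs() comes from the TSA package, so it is not in base R. Counting the runs by hand shows what the test looks at (a sketch, not the TSA implementation):

```r
# Count runs above/below the median: each sign change starts a new run
runs_count <- function(x) {
  above <- x > median(x)
  1 + sum(above[-1] != above[-length(above)])
}
set.seed(640)
runs_count(rnorm(100))          # independent noise: runs are short and many
runs_count(cumsum(rnorm(100)))  # random walk: long stretches, far fewer runs
```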
Sample Autocorrelation
I Estimating and plotting the sample autocorrelation function is
essential
I When we detrend the data, we are hoping to get a stationary
time series
I It’s hard to see if remaining correlations exist, but if they do,
we want to model them
I The autocorrelations will help determine how we model them
Sample Autocorrelation
I Under independence, the sample autocorrelations of the
standardized residuals are each approximately N(0, 1/n)
I We can estimate and plot the sample autocorrelation function
using the standardized residuals, and if certain lags fall
outside ±2/√n, we suspect remaining correlation in the residuals.
I Plot using acf(rstudent(FITTED MODEL))
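acf() draws dashed reference lines at roughly ±1.96/√n; spikes outside them flag remaining correlation. A sketch using simulated autocorrelated “residuals”:

```r
# AR(1) series as a stand-in for correlated residuals (illustration only)
set.seed(640)
res <- arima.sim(model = list(ar = 0.7), n = 200)
a <- acf(res, plot = FALSE)
a$acf[2]           # lag-1 autocorrelation; should sit near 0.7
1.96 / sqrt(200)   # the reference bound, about 0.139
```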
Autocorrelation Plot Example - gtemp2 Data
[Figure: ACF of rstudent(fit) for the gtemp2 model, lags 1–15; ACF values roughly −0.2 to 0.4.]
Autocorrelation Plot Example - gold Data
[Figure: ACF of rstudent(fit) for the gold model, lags 1–20; ACF values roughly −0.2 to 0.8.]
Autocorrelation Plot Example - beersales Data
[Figure: ACF of rstudent(fit) for the beersales model, lags 1–20; ACF values roughly −0.2 to 0.3.]