
Lecture 2 – Modeling Deterministic Trends


MAS 640

1/19/2018

Outline

I Chapter 3 – Trends
I Linear regression models for modeling deterministic trends

I Should feel familiar

Introduction

I In general, the mean of a time series can be an arbitrary
function of time

I In practice, though, it often takes a relatively simple
functional form

I Increasing/decreasing linearly, maybe exponentially, maybe
prominent seasonal patterns

I We want to work with stationary time series data (constant
mean through time)

I If there’s a trend, we want to remove it. . .

Trends

I Many time series data sets exhibit a trend, or a long term
change in the mean level

I Trends can be elusive and the same time series may be viewed
very differently by different analysts

I What do we mean by “long-term?”
I Short term trends may be misleading (random walk)
I Hard to detect trends if data is noisy

Detecting Trends

[Figure: Apple Stock January 2017−Present (2017−01−03 / 2018−01−18); x-axis from Jan 03 2017 to Jan 02 2018, y-axis price roughly 120 to 170.]

Detecting Trends

[Figure: Apple Stock Nov 2017 − Present (2017−11−02 / 2018−01−18); x-axis from Nov 02 2017 to Jan 18 2018, y-axis price roughly 168 to 178.]

Deterministic Trend Models

I In this lecture, we consider trend models of the form

Yt = µt + Xt

I where µt is a deterministic function that describes the trend
and Xt is random error.

I In other words, we consider trends that are pretty easy to
model.

Deterministic Trend Models

I In practice, many different functions for µt can be considered.
I Linear trend:

µt = β0 + β1t

I Quadratic trend:

µt = β0 + β1t + β2t²

Deterministic Trend Models

I The linear regression extends easily to general kth order
polynomial trends:

µt = β0 + β1t + β2t² + · · · + βktᵏ
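As a sketch of how such a polynomial trend could be fit in R (on a simulated series, so all object names and values here are illustrative, not from the lecture's data):

```r
# Simulate a series with a known quadratic trend plus white noise
set.seed(1)
t <- 1:100
y <- ts(2 + 0.5 * t + 0.01 * t^2 + rnorm(100, sd = 1))

# I(t^2) tells lm() to treat t^2 as a regressor,
# rather than interpreting ^ as a formula operator
fit <- lm(y ~ t + I(t^2))
round(coef(fit), 2)  # estimates should be near the true 2, 0.5, 0.01
```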

Deterministic Trend Models

I For seasonal trends (one indicator per month):

µt = β1Jan + β2Feb + · · · + β12Dec

I For cyclical trends:

µt = β0 + α1cos(ωt) + β1sin(ωt)
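A minimal sketch of fitting such a cyclical (harmonic) trend in R, assuming monthly data with one cycle per year so that ω = 2π/12; the data are simulated and all names are illustrative:

```r
# Simulate monthly data with a known cosine/sine trend plus noise
set.seed(2)
t <- 1:120
omega <- 2 * pi / 12   # one full cycle every 12 observations
y <- ts(10 + 3 * cos(omega * t) + 1 * sin(omega * t) + rnorm(120, sd = 0.5),
        frequency = 12)

# cos() and sin() are ordinary functions here, so they can go
# straight into the lm() formula
fit <- lm(y ~ cos(omega * t) + sin(omega * t))
round(coef(fit), 1)  # roughly 10, 3, 1
```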

Goal in Modeling Trends

I If we see a trend in a time series, we want to remove it
I Want to convert the non-stationary time series to a stationary

one with constant mean level

Removing Trends from Time Series

I Two general ways to remove a trend from a time series:
1. Estimate the trend and subtract it – model the trend with a

regression model, use residuals
2. Difference the data, possibly multiple times, until differenced

observations appear stationary

Regression Methods for Estimating µt

Straight-line regression of the time series versus time. This is the simplest model.

Yt = β0 + β1t + Xt

Note that in your code you will need to create the time variable
(often with time(DATANAME)).
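A minimal sketch of this step, using a simulated ts object (the names y, t, and fit are illustrative stand-ins for any series, e.g. gtemp2 below):

```r
# Simulate a ts object with a linear trend plus noise
set.seed(3)
y <- ts(0.05 * (1:50) + rnorm(50, sd = 0.3), start = 2000)

t <- time(y)       # extracts the time index of the ts object
fit <- lm(y ~ t)   # straight-line trend: Yt = b0 + b1*t + Xt
coef(fit)
```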

Linear Regression Assumptions

I Assumptions placed on the error term:
1. Mean 0
2. Independence
3. Constant Variance
4. Normality

I With time series, at least one of these is typically violated.

Regression Assumptions

I Due to assumption violations, the estimated standard errors,
confidence intervals, test stats, and p-values output by the
software are typically wrong or not meaningful

I But the estimated coefficients are still ok
I So, we’ll use the model to get fitted values, but we don’t

concern ourselves with errors, p-values, etc. . .
I Which is fine, I’m generally not interested in inference regarding

the effect of time on the time series

Straight Line Regression Example
I Temperature deviations (from the 1951–1980 average),
measured in degrees centigrade, for 1880–2009.

[Figure: Global Temp Deviations; x-axis Time (0 to 100), y-axis Temp Deviation (roughly −0.4 to 0.4).]

Straight Line Regression Example

I First we fit a simple linear regression to the series:

t <- time(gtemp2)
fit <- lm(gtemp2 ~ t)
summary(fit)$coefficients

                Estimate   Std. Error   t value     Pr(>|t|)
(Intercept) -0.226074058  0.027045671 -8.358974 4.913262e-13
t            0.004688777  0.000474376  9.884095 2.638413e-16

I Estimated regression equation: Ŷt = −0.2261 + 0.0047t

Straight Line Regression Example

[Figure: gtemp2 series with the fitted straight-line trend overlaid; x-axis Time (0 to 100), y-axis gtemp2 (roughly −0.4 to 0.4).]

Straight Line Regression Example

I After estimating the mean function, we simply subtract it to
obtain a detrended series

X̂t = Yt − Ŷt

I Of course, we can get these directly from the software. . .
I residuals(fit)
I fit$residuals
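A small sketch confirming that the two extractors agree with the hand-computed detrended series; the simulated series and names here are illustrative:

```r
# Simulate a trending series and fit a straight-line trend
set.seed(4)
y <- ts(1 + 0.1 * (1:60) + rnorm(60))
fit <- lm(y ~ time(y))

# The detrended series is just the residuals of the trend fit:
detrended <- residuals(fit)      # identical to fit$residuals
all.equal(as.numeric(detrended), as.numeric(y - fitted(fit)))  # TRUE
```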

Straight Line Regression Example
I The detrended series looks closer to stationary than before. . .

[Figure: detrended residuals versus as.vector(time(gtemp2)); x-axis 0 to 100, y-axis resids (roughly −0.3 to 0.2).]
Differencing

I Alternatively, we might consider differencing the time series

∆Yt = Yt − Yt−1

I Differencing in R:
I y.diff <- diff(y)

Differencing Example

I Applying this to the global temperature data gives the following:

[Figure: diff(gtemp2) versus Time; x-axis 0 to 100, y-axis roughly −0.3 to 0.4.]

Remembering the Goal. . .

I We want to convert a non-stationary process to a stationary one
I Detrending involved estimating the mean function via regression and using the residuals
I Differencing involved using the differenced time series

Detrending vs Differencing

I One advantage of differencing is that no parameters must be estimated
I One disadvantage of differencing is that it does not provide an estimate of the error process Xt
I Forecasting: probably differencing
I Inference: probably detrending

Polynomial Regression

I We can extend the simple linear regression fit previously to allow for polynomial functions of time

Yt = β0 + β1t + β2t² + · · · + βktᵏ

I If k=1, µt = β0 + β1t is a linear trend
I If k=2, µt = β0 + β1t + β2t² is a quadratic trend
I If k=3, µt = β0 + β1t + β2t² + β3t³ is a cubic trend, etc. . .

Polynomial Regression Example

[Figure: gold price series versus Time; x-axis 0 to 250, y-axis gold (420 to 540).]

Gold Prices - Linear Trend?

[Figure: gold series with a fitted linear trend overlaid; x-axis Time (0 to 250), y-axis gold (420 to 540).]

Gold Prices - Residuals from Linear Trend Model

[Figure: residuals from lm(gold ~ time(gold)) versus Index; x-axis 0 to 250, y-axis roughly −20 to 60.]

Gold Prices - Polynomial Trend Model

I Time series plot shows a clear non-linear trend
I Let’s add a quadratic term to the model. . .

Goldt = β0 + β1t + β2t² + Xt

Polynomial Regression Example

[Figure: gold series with the fitted quadratic trend overlaid; x-axis Time (0 to 250), y-axis gold (420 to 540).]

Ĝoldt = 435 − 0.36t + 0.003t²

Polynomial Regression Example

I Look at time series plot of residuals. . .

[Figure: residuals versus Time; x-axis 0 to 250, y-axis Price of Gold (roughly −20 to 40).]

Practice with gnp Data

I Load the gnp dataset from the astsa package.
data(gnp, package='astsa')

I Plot the time series and notice the trend
I Fit a linear trend of gnp versus time. Plot the time series with the fitted regression overlaid. How does it look?
I Construct a time series plot of the residuals from this model, how do they look?
I Add a quadratic term and redo. How does it look now?
I Rather than model the trend, difference the time series and plot it. How does it look?
I Difference the differenced data (i.e. use diff(diff(DATANAME))), how does it look?

Practice with gtemp2 Data

I We have already fit a linear trend to the gtemp2 data
I Add a quadratic term to the model. Plot the time series with the regression model overlaid. How do the predictions look?
I Add a cubic term to the model and repeat. How does this look?

Seasonal Means Model

I Consider monthly beer sales in the U.S.
I If building a regression model, what would you want in it?

[Figure: monthly beersales series, 1980 to 1990; y-axis 12 to 17, points labeled by month (J, F, M, . . . , D).]

Seasonal Means Model

I The beer sales time series clearly shows seasonality
I Fit a regression model with “seasonal” terms as covariates
I In this case months, but could be weekday, quarter, etc. . .

BeerSalest = β1Jan + β2Feb + · · · + β12Dec + Xt

I As with time, you will need to create these seasonal components
I season(DATANAME)
I Might have to code manually in some scenarios. . .
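The season() helper comes from the TSA package; as a hedged fallback, the same seasonal means model can be sketched in base R with factor(cycle(...)). The simulated series and monthly levels below are made up for illustration:

```r
# Sketch: seasonal means model via base R, without TSA's season().
# cycle() returns the within-year position (1-12) of each observation.
set.seed(5)
means <- c(13, 13, 15, 15, 17, 17, 17, 16, 14, 14, 13, 12)  # made-up monthly levels
y <- ts(rep(means, 10) + rnorm(120, sd = 0.3), frequency = 12)

month <- factor(cycle(y), labels = month.name)
fit <- lm(y ~ month - 1)   # "-1" drops the intercept, so each month gets its own mean
round(coef(fit), 1)        # one coefficient per month (monthJanuary, ...)
```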
Beer Sales Resulting Model

                 betas
monthJanuary     13.16
monthFebruary    13.02
monthMarch       15.11
monthApril       15.40
monthMay         16.77
monthJune        16.88
monthJuly        16.83
monthAugust      16.57
monthSeptember   14.40
monthOctober     14.28
monthNovember    12.89
monthDecember    12.34

Beer Sales Resulting Model

[Figure: beersales series with the fitted seasonal means overlaid; x-axis Time (1980 to 1990), y-axis 12 to 17.]

Residuals from Beer Sales Seasonal Means Model

[Figure: residuals versus Time; x-axis 1980 to 1990, y-axis Residuals (roughly −1.0 to 1.5).]

Milk Production Example

[Figure: monthly milk series; x-axis Time (1994 to 2006), y-axis milk (1300 to 1700).]

Milk Production Example

I Time series shows seasonality
I Additionally shows linear trend
I So, include both components!

Milkt = β0 + β1t + β2Jan + β3Feb + · · · + Xt

Milk Production Example

[Figure: milk series with the fitted linear-plus-seasonal model overlaid; x-axis Time (1994 to 2006), y-axis 1300 to 1700.]

Milk Production Example

Linear Trend Only
[Figure: milk series with the linear trend only; x-axis Time (1994 to 2006), y-axis 1300 to 1700.]

Seasonal Trend Only
[Figure: milk series with the seasonal means only; x-axis Time (1994 to 2006), y-axis 1300 to 1700.]

Practice - Problem 3.7 from text

Residual Analysis

We’ve emphasized that in the linear regression output, only the coefficient estimates are “useful”. But what about the standard errors and t-tests?

I For unbiased variance estimates, we need Xt independent with var(Xt) constant. Met if Xt is white noise.
I For tests and p-values to be valid, we additionally need to assume that Xt is normally distributed.
Residual Analysis

I We can check normality in the usual way
I Histograms
I Normal QQ plots
I Shapiro-Wilk test
I We can check constant variance in the usual way
I Scatterplot of residuals versus fitted values
I Want to see constant spread from left to right

Residual Analysis

I Previously eyeballed residual scatterplot and said “looks ok” (not perfect but common practice)
I Two alternatives
I Runs test
I Estimate and plot autocorrelation of standardized residuals for a number of time lags
I Independence is particularly important to us now
I If correlations remain in the residuals, we will model that correlation with a time series model
I Plot of estimated autocorrelation particularly useful, because it will help determine which model to fit

Runs Test

I The runs test counts “runs” above or below the median in the standardized residuals.
I It’s doing what you’re doing when you eyeball the scatterplot of residuals, but it’s attaching a statistical model to it and returning a p-value.

H0 : Independence
Ha : Dependence

I Fit using runs(RESIDUALS)

Sample Autocorrelation

I Estimating and plotting the sample autocorrelation function is essential
I When we detrend the data, we are hoping to get a stationary time series
I It’s hard to see if remaining correlations exist, but if they do, we want to model them
I The autocorrelations will help determine how we model them

Sample Autocorrelation

I Under independence, the sample autocorrelations of the standardized residuals are approximately N(0, 1/n)
I We can estimate and plot the sample autocorrelation function of the standardized residuals, and if certain lags fall outside ±2/√n, we suspect remaining correlation in the residuals.
I Plot using acf(rstudent(FITTED MODEL))

Autocorrelation Plot Example - gtemp2 Data

[Figure: ACF of rstudent(fit) for the gtemp2 model; lags 1 to 18, ACF roughly −0.2 to 0.4.]

Autocorrelation Plot Example - gold Data

[Figure: ACF of rstudent(fit) for the gold model; lags 1 to 22, ACF roughly −0.2 to 0.8.]

Autocorrelation Plot Example - beersales Data

[Figure: ACF of rstudent(fit) for the beersales model; lags 1 to 22, ACF roughly −0.2 to 0.3.]
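A sketch of this residual ACF check on a series whose trend fully captures the mean, so the residuals are white noise; all names are illustrative:

```r
# Fit a linear trend to a simulated trend-plus-white-noise series,
# then check whether residual autocorrelations stay within +/- 2/sqrt(n)
set.seed(6)
y <- ts(0.02 * (1:200) + rnorm(200))
fit <- lm(y ~ time(y))

r <- acf(rstudent(fit), plot = FALSE)$acf[-1]  # drop lag 0 (always 1)
bound <- 2 / sqrt(length(y))
mean(abs(r) > bound)  # should be small, roughly 5% under independence
```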