Homework 3
Yining Zhou, Yiyun Tang, Manling Yao
Due February 12, 2018 by Midnight
Chapter 6 Problems (No need to fit models for the Ch 6 problems, only suggest
appropriate model orders. Ch 8 problems will require fitting.)
1. Problem 6.33 – The data file named deere1 in the TSA package contains 82 consecutive values for the
amount of deviation from a specified target value that an industrial machining process at Deere & Co.
produced under certain specified operating conditions.
a. Display the time series plot and comment on any unusual points
library(TSA)  # provides the deere1 data set and eacf()
data("deere1")
plot(deere1, type = "o")
[Figure: time series plot of deere1 (x-axis: Time, y-axis: deere1)]
The observation at time 27, with a value of 30, stands out as unusual: it lies well above the rest of the series.
b. Calculate (plot) the sample ACF for this series and comment on the results.
acf(deere1)
[Figure: sample ACF of deere1 (x-axis: Lag, y-axis: ACF)]
There appears to be no significant autocorrelation in the series.
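As a rough check on what "significant" means here, the dashed lines that acf() draws sit at approximately ±1.96/√n under a white-noise null. The following is a minimal sketch only: the helper name `acf_exceedances` is hypothetical, and simulated white noise of length 82 stands in for deere1 so that the block runs without the TSA package.

```r
# Hypothetical helper: report which sample autocorrelations fall outside
# the approximate 95% white-noise bounds drawn by acf().
acf_exceedances <- function(x, lag.max = 15) {
  n <- length(x)
  bound <- qnorm(0.975) / sqrt(n)
  # stats::acf includes lag 0 as the first element, so drop it
  r <- stats::acf(x, lag.max = lag.max, plot = FALSE)$acf[-1]
  list(bound = bound, lags = which(abs(r) > bound))
}

set.seed(42)      # arbitrary seed, for reproducibility only
z <- rnorm(82)    # stand-in series with the same length as deere1
res <- acf_exceedances(z)
res$bound         # about 0.216 for n = 82
res$lags          # lags (if any) outside the bounds
```

Roughly one lag in twenty can fall outside the bounds by chance alone, so an isolated marginal exceedance is weak evidence of correlation.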
c. Replace the unusual value by a much more typical value and recalculate the sample ACF. Comment on the change from what you saw in part (b).
deere1[deere1 == 30] <- 7
acf(deere1)
[Figure: sample ACF of the revised deere1 series (x-axis: Lag, y-axis: ACF)]
The sample autocorrelations changed slightly after the replacement. The picture is not clear-cut: either no significant
autocorrelation remains in the series, or an MA term with q = 1 may be appropriate.
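To illustrate why a single extreme value can change the sample ACF, here is a small sketch on simulated data. Everything in it is an assumption made for illustration: the MA(1) coefficient 0.6, the seed, and the choice of the median as the "typical" replacement value. It is not the deere1 data.

```r
set.seed(123)                                        # arbitrary seed
x <- as.numeric(arima.sim(list(ma = 0.6), n = 82))   # simulated MA(1) stand-in
x_out <- x
x_out[27] <- 30                                      # inject one extreme value

# Replace the flagged point with a typical value (here, the median of the rest).
x_fix <- x_out
x_fix[27] <- median(x_out[-27])

r_out <- stats::acf(x_out, lag.max = 5, plot = FALSE)$acf[-1]
r_fix <- stats::acf(x_fix, lag.max = 5, plot = FALSE)$acf[-1]
round(rbind(with_outlier = r_out, replaced = r_fix), 3)
```

A large additive outlier inflates the sample variance, which typically pulls the sample autocorrelations toward zero and can hide structure that reappears once the point is replaced.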
d. Calculate the PACF and EACF based on the revised series that you used in part (c). What model would you specify for this revised series?
pacf(deere1)
[Figure: sample PACF of the revised deere1 series (x-axis: Lag, y-axis: Partial ACF)]
eacf(deere1)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 o x o o o o o o o o o o o o
1 o x o o o o o o o o o o o o
2 x x o o o o o o o o o o o o
3 x x x o o o o o o o o o o o
4 x o o x o o o o o o o o o o
5 x o o o o o o o o o o o o o
6 x x x o o o o o o o o o o o
7 o x o o o o o o o o o o o o
I would probably start with ARMA(1,1) or MA(2).
2. Problem 6.34 - The data file deere2 contains 102 consecutive values for the amount of deviation from a
specified target value that another industrial machining process produced at Deere & Co.
a. Display the time series plot and comment on its appearance. Would a stationary model seem to
be appropriate?
b. Display the sample ACF, PACF, and EACF. Select tentative orders for an ARMA model for the
series.
data("deere2")
plot(deere2)
[Figure: time series plot of deere2 (x-axis: Time, y-axis: deere2)]
par(mfrow = c(2,2))
acf(deere2)
pacf(deere2)
par(mfrow = c(1,1))
[Figure: sample ACF and sample PACF of deere2 (x-axis: Lag)]
eacf(deere2)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x x x o o o x o o o o o o o
1 o o x o o o o o o o o o o o
2 x o o x o o o o o o o o o o
3 x o o x o o o o o o o o o o
4 x o x x o o o o o o o o o o
5 x x o o o o o o o o o o o o
6 o o x o o o o o o o o o o o
7 x o o o o o o o o o o o o o
A stationary model could be appropriate: the series looks approximately stationary over time.
If we fit an ARMA model with arima(), a tentative order would be (1,0,0).
3. Problem 6.36 - The data file robot contains a time series obtained from an industrial robot. The robot
was put through a sequence of maneuvers, and the distance from a desired ending point was recorded
in inches. This was repeated 324 times to form the time series.
a. Display the time series plot of the data. Based on this information, do these data appear to come
from a stationary or nonstationary process?
b. Plot the sample ACF, PACF, and EACF for this time series, and suggest model orders for an
ARMA model to model these data.
c. Difference the time series and repeat part (b). Do the models agree? Without actually fitting the
models, which do you prefer based on the evidence from parts (a) and (b)?
data("robot")
# a), b)
plot(robot)
[Figure: time series plot of robot (x-axis: Time, y-axis: robot)]
par(mfrow = c(2,2))
acf(robot)
pacf(robot)
par(mfrow = c(1,1))
[Figure: sample ACF and sample PACF of robot (x-axis: Lag)]
eacf(robot)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x x x x x x x x x o x x x x
1 x o o o o o o o o o o o o o
2 x x o o o o o o o o o o o o
3 x x o o o o o o o o o o o o
4 x x x x o o o o o o o o x o
5 x x x o o o o o o o o o x o
6 x o o o o x o o o o o o o o
7 x o o x o x x o o o o o o o
# c)
plot(diff(robot))
[Figure: time series plot of diff(robot) (x-axis: Time, y-axis: diff(robot))]
par(mfrow = c(2,2))
acf(diff(robot))
pacf(diff(robot))
par(mfrow = c(1,1))
[Figure: sample ACF and sample PACF of diff(robot) (x-axis: Lag)]
eacf(diff(robot))
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o o o o o o x o o o o
1 x x o o o o o o o x o o o o
2 x x x o o o o o o o o o o o
3 x x x x o o o o o o o o o o
4 x x x o o o o o o o o o o o
5 x o o o o x o o o o o o o o
6 x x o x o x x o o o o o o o
7 x o o o o x x o o o o o o o
a) From the plot, the series does not look stationary; there is a slight downward trend.
b) The ACF and PACF plots in part (b) do not clearly point to one model, so based on the EACF
I would start with ARMA(1,1).
c) No. After differencing, the series looks like an MA(1). Without actually fitting the models, I would
prefer the second (differenced) model, because a stationary ARMA model cannot account for nonstationary data.
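The "differenced series looks like an MA(1)" pattern can be sketched on simulated data: an MA(1) has a nonzero autocorrelation at lag 1 only, so the ACF of the differences should cut off after lag 1. The coefficient -0.6 and the seed below are arbitrary assumptions chosen for illustration, not estimates from the robot data.

```r
set.seed(2018)                    # arbitrary seed
e <- rnorm(325)
w <- e[-1] - 0.6 * e[-325]        # MA(1) increments, coefficient chosen for illustration
y <- cumsum(w)                    # integrating the increments gives an IMA(1,1) series

# Theoretical autocorrelations of the differenced series.
# ARMAacf uses the convention W_t = e_t + theta * e_{t-1}, so theta = -0.6 here.
rho <- ARMAacf(ma = -0.6, lag.max = 3)
rho                               # lag 1 equals -0.6 / (1 + 0.36); lags 2 and 3 are 0

# Sample autocorrelations of the differences, to compare with rho[-1].
r <- stats::acf(diff(y), lag.max = 3, plot = FALSE)$acf[-1]
round(r, 3)
```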
4. Problem 6.37 - Look at the ACF, PACF, and EACF for the logarithms of the larain time series. Argue
that the results confirm that the logs are white noise.
data("larain")
par(mfrow = c(2,2))
acf(log(larain))
pacf(log(larain))
par(mfrow = c(1,1))
[Figure: sample ACF and sample PACF of log(larain) (x-axis: Lag)]
eacf(log(larain))
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 o o o o o o o o o o o o o o
1 o o o o o o o o o o o o o o
2 x x o o o o o o o o o o o o
3 x o o o o o o o o o o o o o
4 x o o o o o o o o o o o o o
5 x x x x x o o o o o o o o o
6 x x o o x o o o o o o o o o
7 x o x o o o o o o o o o o o
Both the ACF and PACF plots show that the sample autocorrelations and partial autocorrelations stay within the
significance bounds, which indicates no significant serial correlation in the logged series. The EACF suggests
ARMA(0,0), which points to the same conclusion. Thus we can conclude that the logs are white noise.
5. Problem 6.38 - Using the color dataset. . .
a. Using the ACF and PACF, suggest a model for this time series.
b. Using the EACF, do you suggest the same model or a different one? Note: You will likely get an
error message when you run the EACF. By default, the eacf function calculates the EACF for
models up to q = 13 and p = 7, and it is running into problems in the higher orders. Fix this by
setting the max number of AR and MA terms to smaller numbers (in this particular example, I
believe they need to sum to less than 17).
data("color")
par(mfrow = c(2,2))
acf(color)
pacf(color)
par(mfrow = c(1,1))
[Figure: sample ACF and sample PACF of color (x-axis: Lag)]
eacf(color, ar.max = 6, ma.max = 10)
AR/MA
0 1 2 3 4 5 6 7 8 9 10
0 x o o o o o o o o o o
1 o o o o o o o o o o o
2 o o o o o o o o o o o
3 x o o o o o o o o o o
4 o o o o o o o o o o o
5 x o o o o o o o o o o
6 x o o o o o o o o o o
a) Using the ACF and PACF, I would suggest AR(1).
b) Using the EACF, I would suggest a different one, which is MA(1).
Chapter 8 Problems (Now you will need to fit models)
Note: If you need to work with residuals, you need to use arima() rather than sarima(). You will then
have access to residuals in the usual way (i.e. model$residuals, residuals(model), or rstandard(model)).
6. Problem 8.4 - Simulate an AR(1) model with n=30 and φ = 0.5 (use set.seed(1) so we all have the
same results).
a. Fit the correctly specified AR(1) model and look at the time series plot of the residuals. Does this
plot support the AR(1) specification? Why or why not?
b. Display a normal QQ plot of the standardized residuals. Does the plot support the AR(1)
specification?
c. Display the sample ACF of the residuals. Does the plot support the AR(1) specification?
d. Calculate the Ljung-Box statistic with K = 8. What are the null and alternative hypotheses being
tested? Do the results support the AR(1) specification?
set.seed(1)
ar1 <- arima.sim(list(order = c(1,0,0), ar = .5), n = 30)
# a)
m_ar1 <- arima(ar1, order = c(1,0,0))
plot(m_ar1$residuals)
[Figure: time series plot of m_ar1$residuals (x-axis: Time)]
# b)
qqnorm(rstandard(m_ar1))
qqline(rstandard(m_ar1))
[Figure: Normal Q-Q Plot of the standardized residuals (x-axis: Theoretical Quantiles, y-axis: Sample Quantiles)]
shapiro.test(rstandard(m_ar1))
Shapiro-Wilk normality test
data: rstandard(m_ar1)
W = 0.92483, p-value = 0.03586
# c)
acf(m_ar1$residuals)
[Figure: sample ACF of m_ar1$residuals (x-axis: Lag, y-axis: ACF)]
# d)
Box.test(m_ar1$residuals, lag = 8, type = 'Ljung-Box')
Box-Ljung test
data: m_ar1$residuals
X-squared = 4.621, df = 8, p-value = 0.7972
a) Yes. The residuals look random, and their mean is very close to 0.
b) Mostly yes. There are a few outlying points (probably related to the small sample size), but most points lie
along the line. Note that the Shapiro-Wilk p-value of 0.03586 is below 0.05, so the evidence for normality is mixed.
c) Yes. The sample autocorrelations of the residuals all stay within the significance bounds.
d) The p-value is 0.7972.
H0 : the residuals are uncorrelated (the ARMA(p,q) model is appropriate)
Ha : the residuals are correlated (the ARMA(p,q) model is not appropriate)
Since the p-value is large, we do not reject the null hypothesis. There is no evidence that the ARMA(p,q) model
is inadequate. This result supports the AR(1) specification.
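For reference, the Ljung-Box statistic is Q = n(n + 2) Σ r_k² / (n − k), summed over k = 1, ..., K. The sketch below recomputes it by hand on the same simulated series and compares it with Box.test(). One caveat, stated as a suggestion rather than a correction: when the test is applied to residuals of a fitted ARMA(p,q) model, many texts use K − p − q degrees of freedom, which Box.test() supports through its `fitdf` argument; the call in the write-up uses the default `fitdf = 0`, hence df = 8.

```r
set.seed(1)
ar1 <- arima.sim(list(order = c(1, 0, 0), ar = 0.5), n = 30)
fit <- arima(ar1, order = c(1, 0, 0))
res <- residuals(fit)

# Ljung-Box statistic by hand: Q = n (n + 2) * sum_{k=1}^{K} r_k^2 / (n - k)
K <- 8
n <- length(res)
r <- stats::acf(res, lag.max = K, plot = FALSE)$acf[-1]   # drop lag 0
Q <- n * (n + 2) * sum(r^2 / (n - seq_len(K)))

# fitdf = 1 removes one degree of freedom for the estimated AR coefficient,
# so the reference distribution is chi-squared with K - 1 = 7 df.
lb <- Box.test(res, lag = K, type = "Ljung-Box", fitdf = 1)
c(Q_manual = Q, Q_boxtest = unname(lb$statistic), df = unname(lb$parameter))
pchisq(Q, df = K - 1, lower.tail = FALSE)                 # agrees with lb$p.value
```

The statistic itself does not depend on `fitdf`; only the degrees of freedom, and therefore the p-value, change.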
7. We discussed the process of “overfitting” a model. Please describe what this means, what you are doing
at each iteration of the process, and what you hope to achieve. Your discussion should include talk
about the model order (p and q), coefficient estimates and significance, residual diagnostics, and AIC
values. Finally, if we determined an appropriate model order via ACF/PACF/EACF, why do we need
to do all of this overfitting now?
Overfitting means fitting a model slightly more complicated than the one currently being considered, then
examining the significance of the additional terms and how much the estimates change from the assumed model.
When overfitting an ARMA(p,q) model, we fit ARMA(p + 1,q) and then ARMA(p,q + 1), adding one term at a
time, and repeat these steps as needed. At each iteration we check whether the added term is significant,
whether the AIC decreases, and whether the residual diagnostics improve: do the standardized residuals look
random, does the ACF of the residuals stay within the boundary lines, are the residuals approximately normal,
and do the Ljung-Box p-values stay above the boundary line? Even if we determined an appropriate model order
via ACF/PACF/EACF, overfitting is still worthwhile as further evidence, because those tools only suggest
tentative orders. If the additional term is insignificant, the more complicated model is not needed; if the
additional term is significant, it is worth considering adding it to the model.
8. Using the CREF dataset - In previous lectures, we determined an AR(2) appeared appropriate (after
differencing) for this time series. Fit the AR(2) model using the sarima() function.
a. Are the two AR coefficients significant? What do the residual plots suggest? What is the AIC
value for this model?
b. Overfit the model by including an MA term. Is the MA term significant? What about the two AR
terms? How do the residual plots look? What is the AIC value?
c. Remove the insignificant AR term from the model and rerun. Compare this model with the
ARMA(2, 1) fit in part (b).
d. Finally, overfit the model by including yet another MA term. Is the additional MA term significant?
What is the AIC value for this model?
e. Based on everything you’ve seen, which model do you prefer and why?
library(astsa)  # provides sarima()
data("CREF")
m_CREF <- sarima(CREF, 2,1,0)
[Figure: sarima diagnostics for Model (2,1,0): Standardized Residuals, ACF of Residuals, Normal Q-Q Plot of Std Residuals, p values for Ljung-Box statistic]
# a)
m_CREF$ttable
Estimate SE t.value p.value
ar1 0.0641 0.0448 1.4312 0.1530
ar2 -0.0757 0.0448 -1.6903 0.0916
constant 0.0950 0.0569 1.6716 0.0952
m_CREF$AIC
[1] 1.514609
# b)
m_CREF_2 <- sarima(CREF, 2,1,1)
[Figure: sarima diagnostics for Model (2,1,1): Standardized Residuals, ACF of Residuals, Normal Q-Q Plot of Std Residuals, p values for Ljung-Box statistic]
m_CREF_2$ttable
Estimate SE t.value p.value
ar1 -0.5793 0.2207 -2.6251 0.0089
ar2 -0.0405 0.0548 -0.7392 0.4601
ma1 0.6502 0.2170 2.9960 0.0029
constant 0.0952 0.0584 1.6287 0.1040
m_CREF_2$AIC
[1] 1.513671
# c)
m_CREF_3 <- sarima(CREF, 1,1,1)
[Figure: sarima diagnostics for Model (1,1,1): Standardized Residuals, ACF of Residuals, Normal Q-Q Plot of Std Residuals, p values for Ljung-Box statistic]
m_CREF_3$ttable
Estimate SE t.value p.value
ar1 -0.6486 0.1586 -4.0906 0.0001
ma1 0.7362 0.1396 5.2746 0.0000
constant 0.0954 0.0604 1.5781 0.1152
m_CREF_3$AIC
[1] 1.510735
# d)
m_CREF_4 <- sarima(CREF, 1,1,2)
[Figure: sarima diagnostics for Model (1,1,2): Standardized Residuals, ACF of Residuals, Normal Q-Q Plot of Std Residuals, p values for Ljung-Box statistic]
m_CREF_4$ttable
Estimate SE t.value p.value
ar1 -0.5241 0.2707 -1.9359 0.0534
ma1 0.5955 0.2721 2.1882 0.0291
ma2 -0.0443 0.0603 -0.7340 0.4633
constant 0.0952 0.0584 1.6312 0.1035
m_CREF_4$AIC
[1] 1.513698
a) Neither AR coefficient is significant at the 5% level (p-values 0.1530 and 0.0916). The standardized residual
plot looks random, the ACF of the residuals stays within the boundary lines, and the Normal Q-Q plot suggests the
residuals are approximately normally distributed. The AIC value is 1.514609.
b) The MA term is significant (p = 0.0029), and so is the first AR coefficient (p = 0.0089); the second AR
coefficient is not (p = 0.4601). The residual plots lead to the same conclusions as in part (a): the residuals look
random, their ACF stays within the boundary lines, and they appear approximately normal. The AIC improves
slightly, to 1.513671.
c) Now both the AR term and the MA term are significant. Comparing the Ljung-Box plots, some of the p-values
are larger than for the ARMA(2,1) fit. The AIC improves again, to 1.510735.
d) The additional MA term is not significant (p = 0.4633). The AIC is 1.513698.
e) Comparing all the models fit above, I prefer the ARMA(1,1) model for the differenced series. Both of its ARMA
coefficients are significant, it has the lowest AIC value, and the Ljung-Box p-values also indicate that it is the
most appropriate model.
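The overfitting loop used in this problem can be sketched in base R. This is an illustration under stated assumptions, not the CREF analysis: the data are a simulated AR(1) with coefficient 0.5, the helper `fit_summary` is hypothetical, and arima() is used instead of sarima() so that no extra package is needed. The AIC reported by arima() need not be on the same scale as the values printed above.

```r
set.seed(10)                               # arbitrary seed
y <- arima.sim(list(ar = 0.5), n = 200)    # simulated AR(1), stand-in data

# Hypothetical helper: fit ARMA(p, q) and return z-ratios and AIC.
fit_summary <- function(y, p, q) {
  fit <- arima(y, order = c(p, 0, q), method = "ML")
  z <- coef(fit) / sqrt(diag(fit$var.coef))   # estimate / standard error
  list(z = round(z, 2), aic = fit$aic)
}

base  <- fit_summary(y, 1, 0)   # candidate model
over1 <- fit_summary(y, 2, 0)   # overfit with one extra AR term
over2 <- fit_summary(y, 1, 1)   # overfit with one extra MA term

over1$z                         # is |z| for ar2 below about 2?
over2$z                         # is |z| for ma1 below about 2?
c(AR1 = base$aic, AR2 = over1$aic, ARMA11 = over2$aic)
```

If the added term has a small z-ratio and the AIC does not drop, the simpler candidate is retained; otherwise the larger model becomes the new candidate and the loop repeats.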
Chapter 9 Problems
9. For an AR(1) model with $Y_t = 12.2$, $\phi = -0.5$, and $\mu = 10.8$ (think of $\mu$ as the $\beta_0$ you are used to
seeing).
a. Write down the corresponding AR(1) model.
$Y_t = \mu + \phi(Y_{t-1} - \mu) + e_t \Rightarrow Y_t = 10.8 - 0.5(Y_{t-1} - 10.8) + e_t$
b. Find $\hat{Y}_{t+1}$
$\hat{Y}_{t+1} = \mu + \phi(Y_t - \mu) = 10.8 - 0.5(12.2 - 10.8) = 10.1$
c. Find $\hat{Y}_{t+3}$
$\hat{Y}_{t+2} = \mu + \phi(\hat{Y}_{t+1} - \mu) = 10.8 - 0.5(10.1 - 10.8) = 11.15$
$\hat{Y}_{t+3} = \mu + \phi(\hat{Y}_{t+2} - \mu) = 10.8 - 0.5(11.15 - 10.8) = 10.625$
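The recursion above can be checked numerically. This is a minimal sketch; the helper name `ar1_forecast` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: l-step-ahead forecasts for a mean-centered AR(1).
ar1_forecast <- function(y_t, phi, mu, steps) {
  out <- numeric(steps)
  prev <- y_t
  for (l in seq_len(steps)) {
    prev <- mu + phi * (prev - mu)   # unknown future value replaced by its forecast
    out[l] <- prev
  }
  out
}

f <- ar1_forecast(y_t = 12.2, phi = -0.5, mu = 10.8, steps = 3)
f                                     # 10.100 11.150 10.625

# Closed form for the same forecasts: mu + phi^l * (y_t - mu)
10.8 + (-0.5)^(1:3) * (12.2 - 10.8)
```

Because |φ| < 1, the forecasts oscillate around µ = 10.8 and converge to it as the lead time grows.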
10. Suppose that annual sales (in millions of dollars) of the Acme Corporation follow the AR(2) model
$Y_t = 5 + 1.1Y_{t-1} - 0.5Y_{t-2} + e_t$. If sales for 2005, 2006, and 2007 were $9 million, $11 million, and $10
million, respectively, forecast sales for 2008 and 2009.
$\hat{Y}_{t+1} = 5 + 1.1Y_t - 0.5Y_{t-1}$
$\hat{Y}_{2008} = 5 + 1.1(10) - 0.5(11) = 10.5$
$\hat{Y}_{2009} = 5 + 1.1\hat{Y}_{2008} - 0.5Y_{2007} = 5 + 1.1(10.5) - 0.5(10) = 11.55$
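The same forecasts can be produced recursively in R. This is a minimal sketch; the helper name `ar2_forecast` and its argument names are hypothetical.

```r
# Hypothetical helper: recursive forecasts for
# Y_t = c0 + phi1 * Y_{t-1} + phi2 * Y_{t-2} + e_t.
ar2_forecast <- function(history, c0, phi1, phi2, steps) {
  y <- history
  for (l in seq_len(steps)) {
    n <- length(y)
    # the future error term is replaced by its mean, 0
    y <- c(y, c0 + phi1 * y[n] + phi2 * y[n - 1])
  }
  tail(y, steps)
}

sales <- c(9, 11, 10)   # 2005, 2006, 2007 (millions of dollars)
fc <- ar2_forecast(sales, c0 = 5, phi1 = 1.1, phi2 = -0.5, steps = 2)
fc                      # 10.50 11.55
```

The 2009 forecast uses the 2008 forecast (10.5) as the most recent value and the observed 2007 value (10) as the second lag.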