Homework 3
Yining Zhou, Yiyun Tang, Manling Yao

Due February 12, 2018 by Midnight

Chapter 6 Problems (No need to fit models for the Ch 6 problems, only suggest
appropriate model orders. Ch 8 problems will require fitting.)

1. Problem 6.33 – The data file named deere1 in the TSA package contains 82 consecutive values for the
amount of deviation from a specified target value that an industrial machining process at Deere & Co.
produced under certain specified operating conditions.
a. Display the time series plot and comment on any unusual points

library(TSA)  # provides the deere1 dataset and eacf()
data("deere1")
plot(deere1, type = "o")

[Figure: time series plot of deere1 (x-axis: Time; y-axis: deere1)]

The value at time 27 (a deviation of 30) stands out as unusual.

b. Calculate (plot) the sample ACF for this series and comment on the results.
acf(deere1)

[Figure: sample ACF of deere1, titled "Series deere1" (x-axis: Lag; y-axis: ACF)]
There seems to be no significant autocorrelation in the series.
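As a rough numerical companion to reading the ACF plot, the sketch below compares sample autocorrelations with the approximate 95% white-noise bounds of plus or minus 1.96/sqrt(n). It uses a simulated white-noise series of length 82 as a stand-in for deere1 (an assumption, so that the example runs without the TSA package); the names x, r, bound, and outside are illustrative only.

```r
# Sketch: flag sample autocorrelations outside the approximate 95% white-noise bounds.
# x is simulated white noise standing in for a series such as deere1 (assumption).
set.seed(123)
n <- 82
x <- rnorm(n)

# stats::acf includes lag 0 (always equal to 1), so drop the first element.
r <- stats::acf(x, lag.max = 15, plot = FALSE)$acf[-1]

bound <- qnorm(0.975) / sqrt(n)   # same bounds that the default ACF plot draws
outside <- which(abs(r) > bound)  # lags whose sample autocorrelation exceeds the bounds

print(round(bound, 3))
print(outside)
```

With real data, a small number of lags falling just outside the bounds can still occur by chance, so the count of flagged lags is only a guide.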

c. Replace the unusual value by a much more typical value and recalculate the sample ACF. Comment on the change from what you saw in part (b).
deere1[deere1 == 30] <- 7
acf(deere1)

[Figure: sample ACF of the revised deere1 series, titled "Series deere1" (x-axis: Lag; y-axis: ACF)]

The sample ACF changed slightly after replacing the unusual value. There still do not appear to be clearly significant autocorrelations, although an MA(1) model (q = 1) is a possibility.

d. Calculate the PACF and EACF based on the revised series that you used in part (c). What model would you specify for this revised series?
pacf(deere1)

[Figure: sample PACF of the revised deere1 series, titled "Series deere1" (x-axis: Lag; y-axis: Partial ACF)]

eacf(deere1)

AR/MA
  0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 o x o o o o o o o o o  o  o  o
1 o x o o o o o o o o o  o  o  o
2 x x o o o o o o o o o  o  o  o
3 x x x o o o o o o o o  o  o  o
4 x o o x o o o o o o o  o  o  o
5 x o o o o o o o o o o  o  o  o
6 x x x o o o o o o o o  o  o  o
7 o x o o o o o o o o o  o  o  o

I would probably start with ARMA(1,1) or MA(2).

2. Problem 6.34 - The data file deere2 contains 102 consecutive values for the amount of deviation from a specified target value that another industrial machining process produced at Deere & Co.
a. Display the time series plot and comment on its appearance. Would a stationary model seem to be appropriate?
b. Display the sample ACF, PACF, and EACF. Select tentative orders for an ARMA model for the series.

data("deere2")
plot(deere2)

[Figure: time series plot of deere2 (x-axis: Time; y-axis: deere2)]

par(mfrow = c(2,2))
acf(deere2)
pacf(deere2)
par(mfrow = c(1,1))

[Figure: sample ACF and PACF of deere2, each titled "Series deere2" (x-axis: Lag)]

eacf(deere2)

AR/MA
  0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x x x o o o x o o o o  o  o  o
1 o o x o o o o o o o o  o  o  o
2 x o o x o o o o o o o  o  o  o
3 x o o x o o o o o o o  o  o  o
4 x o x x o o o o o o o  o  o  o
5 x x o o o o o o o o o  o  o  o
6 o o x o o o o o o o o  o  o  o
7 x o o o o o o o o o o  o  o  o

A stationary model could be appropriate: the series looks approximately stationary over time.
So if we fit an ARMA model with arima(), I would guess the order might be (1,0,0).

3. Problem 6.36 - The data file robot contains a time series obtained from an industrial robot. The robot was put through a sequence of maneuvers, and the distance from a desired ending point was recorded in inches. This was repeated 324 times to form the time series.
a. Display the time series plot of the data. Based on this information, do these data appear to come from a stationary or nonstationary process?
b. Plot the sample ACF, PACF, and EACF for this time series, and suggest model orders for an ARMA model to model these data.
c. Difference the time series and repeat part (b). Do the models agree? Without actually fitting the models, which do you prefer based on the evidence from parts (a) and (b)?

data("robot")
# a) and b)
plot(robot)

[Figure: time series plot of robot (x-axis: Time; y-axis: robot)]

par(mfrow = c(2,2))
acf(robot)
pacf(robot)
par(mfrow = c(1,1))

[Figure: sample ACF and PACF of robot, each titled "Series robot" (x-axis: Lag)]

eacf(robot)

AR/MA
  0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x x x x x x x x x o x  x  x  x
1 x o o o o o o o o o o  o  o  o
2 x x o o o o o o o o o  o  o  o
3 x x o o o o o o o o o  o  o  o
4 x x x x o o o o o o o  o  x  o
5 x x x o o o o o o o o  o  x  o
6 x o o o o x o o o o o  o  o  o
7 x o o x o x x o o o o  o  o  o

# c)
plot(diff(robot))

[Figure: time series plot of diff(robot) (x-axis: Time; y-axis: diff(robot))]

par(mfrow = c(2,2))
acf(diff(robot))
pacf(diff(robot))
par(mfrow = c(1,1))

[Figure: sample ACF and PACF of diff(robot), each titled "Series diff(robot)" (x-axis: Lag)]

eacf(diff(robot))

AR/MA
  0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o o o o o o x o  o  o  o
1 x x o o o o o o o x o  o  o  o
2 x x x o o o o o o o o  o  o  o
3 x x x x o o o o o o o  o  o  o
4 x x x o o o o o o o o  o  o  o
5 x o o o o x o o o o o  o  o  o
6 x x o x o x x o o o o  o  o  o
7 x o o o o x x o o o o  o  o  o

a) From the plot, the series does not look stationary; there is a slight decreasing trend.
b) The ACF and PACF plots alone do not clearly point to a model, so based on the EACF I would start with ARMA(1,1).
c) No. After differencing, the series looks like an MA(1). Without actually fitting the models, I would prefer the second model, because an ARMA model assumes stationarity and the original series does not appear to be stationary.

4. Problem 6.37 - Look at the ACF, PACF, and EACF for the logarithms of the larain time series. Argue that the results confirm that the logs are white noise.

data("larain")
par(mfrow = c(2,2))
acf(log(larain))
pacf(log(larain))
par(mfrow = c(1,1))

[Figure: sample ACF and PACF of log(larain), each titled "Series log(larain)" (x-axis: Lag)]

eacf(log(larain))

AR/MA
  0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 o o o o o o o o o o o  o  o  o
1 o o o o o o o o o o o  o  o  o
2 x x o o o o o o o o o  o  o  o
3 x o o o o o o o o o o  o  o  o
4 x o o o o o o o o o o  o  o  o
5 x x x x x o o o o o o  o  o  o
6 x x o o x o o o o o o  o  o  o
7 x o x o o o o o o o o  o  o  o

In both the ACF and PACF plots, the sample values stay within the confidence bounds, which indicates that there is no significant autocorrelation in the logged series. The EACF suggests ARMA(0,0), which likewise indicates no remaining correlation. Thus we can conclude that the logs are white noise.

5. Problem 6.38 - Using the color dataset. . .
a. Using the ACF and PACF, suggest a model for this time series.
b. Using the EACF, do you suggest the same model or a different one?
Note: You will likely get an error message when you run the EACF. By default, the eacf function calculates the EACF for models up to q = 13 and p = 7, and it is running into problems in the higher orders. Fix this by setting the max number of AR and MA terms to smaller numbers (in this particular example, I believe they need to sum to less than 17).
data("color")
par(mfrow = c(2,2))
acf(color)
pacf(color)
par(mfrow = c(1,1))

[Figure: sample ACF and PACF of color, each titled "Series color" (x-axis: Lag)]

eacf(color, ar.max = 6, ma.max = 10)

AR/MA
  0 1 2 3 4 5 6 7 8 9 10
0 x o o o o o o o o o o
1 o o o o o o o o o o o
2 o o o o o o o o o o o
3 x o o o o o o o o o o
4 o o o o o o o o o o o
5 x o o o o o o o o o o
6 x o o o o o o o o o o

a) Using the ACF and PACF, I would suggest AR(1).
b) Using the EACF, I would suggest a different one, which is MA(1).

Chapter 8 Problems (Now you will need to fit models)

Note: If you need to work with residuals, you need to use arima() rather than sarima(). You will then have access to residuals in the usual way (i.e. model$residuals, residuals(model), or rstandard(model)).

6. Problem 8.4 - Simulate an AR(1) model with n=30 and φ = 0.5 (use set.seed(1) so we all have the same results).
a. Fit the correctly specified AR(1) model and look at the time series plot of the residuals. Does this plot support the AR(1) specification? Why or why not?
b. Display a normal QQ plot of the standardized residuals. Does the plot support the AR(1) specification?
c. Display the sample ACF of the residuals. Does the plot support the AR(1) specification?
d. Calculate the Ljung-Box statistic with K = 8. What are the null and alternative hypotheses being tested? Do the results support the AR(1) specification?
set.seed(1)
ar1 <- arima.sim(list(order = c(1,0,0), ar = .5), n = 30)

# a)
m_ar1 <- arima(ar1, order = c(1,0,0))
plot(m_ar1$residuals)

[Figure: time series plot of m_ar1$residuals (x-axis: Time)]

# b)
qqnorm(rstandard(m_ar1))
qqline(rstandard(m_ar1))

[Figure: Normal Q-Q Plot (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)]

shapiro.test(rstandard(m_ar1))

Shapiro-Wilk normality test
data: rstandard(m_ar1)
W = 0.92483, p-value = 0.03586

# c)
acf(m_ar1$residuals)

[Figure: sample ACF of the residuals, titled "Series m_ar1$residuals" (x-axis: Lag; y-axis: ACF)]

# d)
Box.test(m_ar1$residuals, lag = 8, type = 'Ljung-Box')

Box-Ljung test
data: m_ar1$residuals
X-squared = 4.621, df = 8, p-value = 0.7972

a) Yes. The residuals look random, and their mean is very close to 0.
b) Mostly yes. There are some outliers (probably related to our small sample size), but most points stay along the line, so the residuals look roughly normally distributed. Note, however, that the Shapiro-Wilk p-value of 0.03586 points to some departure from normality.
c) Yes. The ACF of the residuals stays within the confidence bounds.
d) The p-value is 0.7972.
H0: the ARMA(p,q) model is appropriate (the residuals are uncorrelated)
Ha: the ARMA(p,q) model is not appropriate
Since the p-value is large, we do not reject the null hypothesis, so there is no evidence that the model is inadequate. This result supports the AR(1) specification.

6. We discussed the process of “overfitting” a model. Please describe what this means, what you are doing at each iteration of the process, and what you hope to achieve. Your discussion should include talk about the model order (p and q), coefficient estimates and significance, residual diagnostics, and AIC values. Finally, if we determined an appropriate model order via ACF/PACF/EACF, why do we need to do all of this overfitting now?

Overfitting involves fitting a model more complicated than the one currently being considered. It involves examining the significance of the additional terms and the changes in the estimates from the assumed model.
When overfitting an ARMA(p,q) model, we fit ARMA(p + 1, q) and then ARMA(p, q + 1), and repeat these steps. We hope to see whether the additional term is significant, and we redo the residual analysis to see whether things improved (for example: do the standardized residuals look random; does the ACF of the residuals stay within the confidence bounds; are the residuals normally distributed; are the p-values for the Ljung-Box statistic above the boundary line). Even if we determined an appropriate model order via ACF/PACF/EACF, overfitting is still worthwhile as further evidence. If the additional term is insignificant, the more complicated model is not needed; if the additional term is significant, it is worth considering adding it to the model.

7. Using the CREF dataset - In previous lectures, we determined an AR(2) appeared appropriate (after differencing) for this time series. Fit the AR(2) model using the sarima() function.
a. Are the two AR coefficients significant? What do the residual plots suggest? What is the AIC value for this model?
b. Overfit the model by including an MA term. Is the MA term significant? What about the two AR terms? How do the residual plots look? What is the AIC value?
c. Remove the insignificant AR term from the model and rerun. Compare this model with the ARMA(2, 1) fit in part (b).
d. Finally, overfit the model by including yet another MA term. Is the additional MA term significant? What is the AIC value for this model?
e. Based on everything you’ve seen, which model do you prefer and why?
library(astsa)  # provides sarima()
data("CREF")
m_CREF <- sarima(CREF, 2,1,0)

[Figure: sarima diagnostic plots for Model (2,1,0): Standardized Residuals; ACF of Residuals; Normal Q-Q Plot of Std Residuals; p values for Ljung-Box statistic]

# a)
m_CREF$ttable

         Estimate     SE t.value p.value
ar1        0.0641 0.0448  1.4312  0.1530
ar2       -0.0757 0.0448 -1.6903  0.0916
constant   0.0950 0.0569  1.6716  0.0952

m_CREF$AIC
[1] 1.514609

# b)
m_CREF_2 <- sarima(CREF, 2,1,1)

[Figure: sarima diagnostic plots for Model (2,1,1): Standardized Residuals; ACF of Residuals; Normal Q-Q Plot of Std Residuals; p values for Ljung-Box statistic]

m_CREF_2$ttable

         Estimate     SE t.value p.value
ar1       -0.5793 0.2207 -2.6251  0.0089
ar2       -0.0405 0.0548 -0.7392  0.4601
ma1        0.6502 0.2170  2.9960  0.0029
constant   0.0952 0.0584  1.6287  0.1040

m_CREF_2$AIC
[1] 1.513671

# c)
m_CREF_3 <- sarima(CREF, 1,1,1)

[Figure: sarima diagnostic plots for Model (1,1,1): Standardized Residuals; ACF of Residuals; Normal Q-Q Plot of Std Residuals; p values for Ljung-Box statistic]

m_CREF_3$ttable

         Estimate     SE t.value p.value
ar1       -0.6486 0.1586 -4.0906  0.0001
ma1        0.7362 0.1396  5.2746  0.0000
constant   0.0954 0.0604  1.5781  0.1152

m_CREF_3$AIC
[1] 1.510735

# d)
m_CREF_4 <- sarima(CREF, 1,1,2)

[Figure: sarima diagnostic plots for Model (1,1,2): Standardized Residuals; ACF of Residuals; Normal Q-Q Plot of Std Residuals; p values for Ljung-Box statistic]

m_CREF_4$ttable

         Estimate     SE t.value p.value
ar1       -0.5241 0.2707 -1.9359  0.0534
ma1        0.5955 0.2721  2.1882  0.0291
ma2       -0.0443 0.0603 -0.7340  0.4633
constant   0.0952 0.0584  1.6312  0.1035

m_CREF_4$AIC
[1] 1.513698

a) Neither AR coefficient is significant at the 5% level. The standardized residual plot shows that the residuals look random; the ACF of the residuals stays within the confidence bounds; the Normal Q-Q plot shows that the residuals are approximately normally distributed. The AIC value is 1.514609.
b) The MA term is significant, and so is the first AR coefficient (ar1); the second AR coefficient (ar2) is insignificant. The standardized residual plot shows that the residuals look random; the ACF of the residuals stays within the confidence bounds; the Normal Q-Q plot shows that the residuals are approximately normally distributed. The AIC improved slightly, to 1.513671.
c) Now both the AR term and the MA term are significant. Comparing the Ljung-Box plots, some of the p-values became larger. The AIC improved slightly again, to 1.510735.
d) The additional MA term is not significant. The AIC is 1.513698.
e) Comparing all the models fit above, we should use the ARMA(1,1) model for the differenced series, because all of its AR and MA terms are significant, it has the lowest AIC value, and the Ljung-Box statistics also indicate that it is the most appropriate model.

Chapter 9 Problems

9. For an AR(1) model with $Y_t = 12.2$, $\phi = -0.5$, and $\mu = 10.8$ (think of $\mu$ as the $\beta_0$ you are used to seeing).
a. Write down the corresponding AR(1) model.

$Y_t = \mu + \phi(Y_{t-1} - \mu) + e_t$, so $Y_t = 10.8 - 0.5(Y_{t-1} - 10.8) + e_t$; with the observed value, $12.2 = 10.8 - 0.5(Y_{t-1} - 10.8) + e_t$.
b. Find $\hat{Y}_{t+1}$

$\hat{Y}_{t+1} = \mu + \phi(Y_t - \mu) = 10.8 - 0.5(12.2 - 10.8) = 10.1$

c. Find $\hat{Y}_{t+3}$

$\hat{Y}_{t+2} = \mu + \phi(\hat{Y}_{t+1} - \mu) = 10.8 - 0.5(10.1 - 10.8) = 11.15$

$\hat{Y}_{t+3} = \mu + \phi(\hat{Y}_{t+2} - \mu) = 10.8 - 0.5(11.15 - 10.8) = 10.625$
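
The recursion used in parts (b) and (c) can be checked with a few lines of R. This is a minimal sketch, assuming the parameters are known exactly (no estimation); the variable names mu, phi, y_t, fc, and fc_closed are illustrative. It also compares the recursion against the closed form $\hat{Y}_{t+l} = \mu + \phi^l (Y_t - \mu)$.

```r
# Sketch: l-step-ahead forecasts for an AR(1) model with known parameters.
mu  <- 10.8   # process mean
phi <- -0.5   # AR(1) coefficient
y_t <- 12.2   # last observed value

# Recursive form: each forecast feeds the next one.
h <- 3
fc <- numeric(h)
prev <- y_t
for (l in seq_len(h)) {
  fc[l] <- mu + phi * (prev - mu)
  prev <- fc[l]
}

# Closed form: mu + phi^l * (Y_t - mu); should agree with the recursion.
fc_closed <- mu + phi^(1:h) * (y_t - mu)

print(fc)
print(all.equal(fc, fc_closed))
```

Because the absolute value of phi is less than 1, the forecasts move back toward the mean of 10.8 as the lead time grows.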

10. Suppose that annual sales (in millions of dollars) of the Acme Corporation follow the AR(2) model
$Y_t = 5 + 1.1Y_{t-1} - 0.5Y_{t-2} + e_t$. If sales for 2005, 2006, and 2007 were $9 million, $11 million, and $10
million, respectively, forecast sales for 2008 and 2009.

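$\hat{Y}_{t+1} = 5 + 1.1Y_t - 0.5Y_{t-1}$

$\hat{Y}_{2008} = 5 + 1.1Y_{2007} - 0.5Y_{2006} = 5 + 1.1(10) - 0.5(11) = 10.5$

$\hat{Y}_{2009} = 5 + 1.1\hat{Y}_{2008} - 0.5Y_{2007} = 5 + 1.1(10.5) - 0.5(10) = 11.55$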
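
The same recursive idea for the AR(2) model can be sketched in R. The helper ar2_forecast and its argument names are hypothetical, written only for this illustration; future error terms are replaced by their mean of zero, and each forecast takes the place of the value that has not been observed yet.

```r
# Sketch: recursive forecasts for Y_t = 5 + 1.1*Y_{t-1} - 0.5*Y_{t-2} + e_t,
# with future errors replaced by their mean of zero.
sales <- c("2005" = 9, "2006" = 11, "2007" = 10)  # observed sales, $ millions

# ar2_forecast is a hypothetical helper, not a function from any package.
ar2_forecast <- function(y, h, const = 5, phi1 = 1.1, phi2 = -0.5) {
  y <- unname(y)
  n <- length(y)
  for (i in seq_len(h)) {
    # Use the two most recent values; forecasts stand in for unobserved values.
    y <- c(y, const + phi1 * y[n + i - 1] + phi2 * y[n + i - 2])
  }
  y[(n + 1):(n + h)]
}

fc <- ar2_forecast(sales, h = 2)
names(fc) <- c("2008", "2009")
print(fc)
```

Note that the 2009 forecast uses the 2008 forecast as its lag-1 value and the observed 2007 sales as its lag-2 value.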
