Homework 2 Solutions
MAS 640 – Time Series Analysis and Forecasting

Due Friday, 2/2 by Midnight

The assignment must be completed in R Markdown. One person per group should submit
.Rmd and PDF files to Blackboard. Late submissions will be penalized 10% per day.

All plots should be properly formatted (axis labels, units, title, etc.).

Problem 1

Generate three time series datasets, each of length n = 200, including (i) an AR(1) with φ = −0.6, (ii) an
MA(1) with θ = 0.8 and (iii) an ARMA(1,1) with φ = −0.6 and θ = 0.8. For each one,

Note that the simulation is random, so unless you set the same seed as I do, your answers may differ.
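Unlike the simulated series, the theoretical ACF and PACF of the three models do not depend on the seed, so they give a fixed reference for judging the sample plots. A minimal sketch using `ARMAacf()` from base R's stats package, assuming R's sign convention for the MA term (X_t = e_t + θ e_{t-1}), which is the convention `arima.sim()` uses; the variable names are illustrative only:

```r
# Theoretical ACF/PACF for the three models in Problem 1.
# Assumption: MA sign convention X_t = e_t + theta * e_{t-1} (as in arima.sim).
phi   <- -0.6
theta <-  0.8

# ACF at lags 0..5 (the first element is lag 0 and is always 1)
acf_ar1    <- ARMAacf(ar = phi,             lag.max = 5)
acf_ma1    <- ARMAacf(ma = theta,           lag.max = 5)
acf_arma11 <- ARMAacf(ar = phi, ma = theta, lag.max = 5)

# PACF at lags 1..5
pacf_ar1    <- ARMAacf(ar = phi,             lag.max = 5, pacf = TRUE)
pacf_ma1    <- ARMAacf(ma = theta,           lag.max = 5, pacf = TRUE)
pacf_arma11 <- ARMAacf(ar = phi, ma = theta, lag.max = 5, pacf = TRUE)

# One row per model, one column per lag
print(round(rbind(AR1 = acf_ar1, MA1 = acf_ma1, ARMA11 = acf_arma11), 3))
print(round(rbind(AR1 = pacf_ar1, MA1 = pacf_ma1, ARMA11 = pacf_arma11), 3))
```

Under this convention the AR(1) ACF is (−0.6)^k with a PACF that is zero after lag 1, and the MA(1) ACF is 0.8/1.64 ≈ 0.488 at lag 1 and zero afterwards. The ARMA(1,1) lag-1 autocorrelation is only about 0.153, because the factors (1 + 0.6B) and (1 + 0.8B) nearly cancel, which may make that model hard to identify from sample plots.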

a. plot the observed time series
library(astsa)  # provides acf2() and sarima(); assumed loaded in the original setup
library(TSA)    # provides eacf(); assumed loaded in the original setup
set.seed(1)
ar1 <- arima.sim(list(order=c(1,0,0), ar=-.6), n=200)
ma1 <- arima.sim(list(order=c(0,0,1), ma=.8), n=200)
arma11 <- arima.sim(list(order=c(1,0,1), ar=-.6, ma=.8), n=200)
plot(ar1)
plot(ma1)
plot(arma11)

b. plot the sample ACF, the sample PACF, and the sample EACF
acf2(ar1, details=F)
eacf(ar1)
acf2(ma1, details=F)
eacf(ma1)
acf2(arma11, details=F)
eacf(arma11)

c. Are the sample ACF, PACF, and EACF as you expected? Why or why not?
Solution: For the most part, yes. The ACF cut off after lag 1 for the MA(1) and tailed off for the AR(1) and
the ARMA(1,1). The PACF cut off after lag 1 for the AR(1) and tailed off for the MA(1) and ARMA(1,1). The
EACF, however, did not indicate an ARMA(1,1), but rather an MA(2). It’s never perfect!

d. Increase the sample size to n = 20,000 and repeat parts a-c.
ar1 <- arima.sim(list(order=c(1,0,0), ar=-.6), n=20000)
ma1 <- arima.sim(list(order=c(0,0,1), ma=.8), n=20000)
arma11 <- arima.sim(list(order=c(1,0,1), ar=-.6, ma=.8), n=20000)
plot(ar1)
plot(ma1)
plot(arma11)
acf2(ar1)
eacf(ar1)
acf2(ma1)
eacf(ma1)
acf2(arma11)
eacf(arma11)
Solution: For the AR(1) and MA(1) models, the ACF and PACF very clearly indicate the correct model and
behave according to the theory. The ARMA(1,1) is still not indicated by the EACF, which now suggests
perhaps an ARMA(1,3) or an ARMA(2,2).

Problem 2

The cars dataset gives daily rental car demand from January 2014 to August 2015 for a major rental car
company. They are interested in forecasting demand for the next 30 days so they can determine how to
increase/decrease the size of their fleet.

• The dataset contains the following variables:
  – DATE
  – n - representing demand, the time series of interest
  – WEEKDAY
  – MONTH
  – YEAR

a. Plot demand over time and comment on any trends.
plot(cars$n, ylab='Demand', type='l')
Comments: Clearly an increasing trend with seasonality. Variability appears constant for the most part.
Something appears to have happened in late 2014 or early 2015, as the steady trend appears to have almost
“reset”.

b. Add a time variable to the dataset, running from time=1 to time=590.
cars$time <- 1:590

c. Build a suitable regression model to the data. Plot the fitted values over the original time series plot.
m <- lm(n ~ time + WEEKDAY + MONTH, data=cars)
plot(cars$n, type='l', ylab='Demand', xlab='Time Period')
lines(y=m$fitted.values, x=cars$time, col='red')

d. Construct the time series plot of the residuals. Does the series appear stationary?
plot(m$residuals, type='l', xlab='Time Period', ylab='Residual')
Yes, it looks stationary.

e. Plot the ACF, PACF, and EACF of the residuals. After inspecting each of them, what AR and MA order (p
and q) might be a reasonable place to start modeling? Note - this is real data, and I don’t expect there to
be a clear answer here. It’s an art not a science.
acf2(m$residuals)
eacf(m$residuals)
The ACF and PACF both appear to tail off, so we’re looking at possibly an ARMA model. Given the results in
the EACF, I would probably start with an ARMA(1,3) and begin overfitting as needed from there.

Problem 3

Up until now, we have been fitting regression models to detrend a time series and then modeling the
residuals. In this problem, we look at an alternative approach. Rather than build a model for the trend, we
will simply difference the time series, and then model those differences. You will need the ibm data that is
posted to Blackboard for this problem, which contains the daily closing price of IBM stock over a year.

a. Plot the ibm time series. Do you think you could fit a good linear regression model to this data? Why or
why not? Note - redefine ibm = ts(ibm) to treat it as a time series, otherwise you will likely have to add a
time column to the dataset and use it as the x-variable in the time series plot.
ibm <- ts(ibm)
plot(ibm)

b. Create a new time series, ibm.diffs, that represent differences in the ibm series.
Plot the differenced data and comment on its appearance.
ibm.diffs <- diff(ibm)
plot(ibm.diffs)
Comments: The differenced data appear roughly stationary, with possibly non-constant variance.

c. Plot the sample ACF, PACF, and EACF for the differenced data. What do the findings suggest in terms of
model order?
acf2(ibm.diffs)
eacf(ibm.diffs)
Given the results of all three plots, the differences look like white noise, with no correlation left for
us to model (ARMA(0,0)).

d. Fit an ARMA(1,1) model to the differenced data using sarima(ibm.diffs, 1, 0, 1). Report φ̂ and θ̂ (the
estimated AR and MA coefficients) and the AIC value.
sarima(ibm.diffs, 1, 0, 1)
φ̂ = 0.0399, θ̂ = 0.046, AIC = 4.97. Both the AR and MA coefficient estimates are small and statistically
insignificant, providing further evidence that the differences are white noise.
NOTE: If you used arima() rather than sarima() you likely got different estimates. By default, they use
different algorithms for obtaining estimates (Maximum-Likelihood vs OLS vs some others).

e. Re-fit the ARMA(1,1) model, this time to the original ibm time series, using sarima(ibm, 1, 1, 1). Report
φ̂ and θ̂ (the estimated AR and MA coefficients) and the AIC value.
sarima(ibm, 1, 1, 1)
The AR and MA estimates, as well as the AIC value, are completely unchanged. We fit the exact same model.

f. After comparing parts (d) and (e), what does the “d” represent when specifying the model (i.e. in
sarima(x, p, d, q))?
The d simply tells the model to difference the data d times before fitting the corresponding ARMA(p, q)
model.
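The role of d in part (f) can also be checked without the ibm data. The sketch below is an illustration under stated assumptions, not a reproduction of parts (d) and (e): it uses a simulated series (the seed, length, and coefficients are hypothetical) and base R's `arima()` rather than `sarima()`, which applies its own defaults for the constant term. With `arima()`, the mean is dropped from the hand-differenced fit because a model with d = 1 does not estimate one.

```r
# Illustration on simulated data (hypothetical; this is not the ibm series).
set.seed(640)
x <- arima.sim(list(order = c(1, 1, 1), ar = 0.6, ma = 0.3), n = 500)

# (1) Difference by hand, then fit an ARMA(1,1) with no mean term
fit_by_hand <- arima(diff(x), order = c(1, 0, 1),
                     include.mean = FALSE, method = "ML")

# (2) Let arima() do the differencing by setting d = 1
fit_d1 <- arima(x, order = c(1, 1, 1), method = "ML")

# The two rows should agree up to numerical tolerance
print(round(rbind(by_hand = coef(fit_by_hand), d_equals_1 = coef(fit_d1)), 4))
```

Both fits estimate the same ARMA(1,1) on the first differences, so the AR and MA estimates should match closely; small discrepancies can come from the optimizer and from how the initial observation is handled.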