STAT 443: Forecasting Fall 2020 Assignment 3
When you upload to Crowdmark please have a separate page for each of the 12 questions. You should submit your work before 5:00pm (Waterloo time) on the 6th November.
You may work with others on this assignment, but if you do, you must submit a single joint group submission to Crowdmark. You can create your own groups; any member of the group can submit on behalf of the group, and a single group assessment will be graded. All students in the group will receive a copy of the graded feedback in their Crowdmark portfolio.
Question Set 1: This set of questions looks at the mathematical and statistical structure of the ARIMA family.
1. (5 marks) Suppose the process $\{X_t\}$ comes from an MA(2) model with parameters $\theta_1, \theta_2$ and where the innovations satisfy $Z_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$.
By computing the mean vector and the variance-covariance matrix of $(X_1, \dots, X_n)$ in terms of the parameters of the model, write down the likelihood function for $\theta_1, \theta_2, \sigma$.
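As a numerical companion to Question 1, the following is a minimal sketch of a zero-mean Gaussian log-likelihood built from a banded Toeplitz covariance matrix. It assumes the sign convention $X_t = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2}$, which may differ from the one in the notes; the function names `ma2_acvf` and `ma2_loglik` are hypothetical and are not part of the assignment.

```r
# Autocovariances at lags 0, 1, 2 for X_t = Z_t + theta1*Z_{t-1} + theta2*Z_{t-2}
# (assumed sign convention; all higher lags are zero).
ma2_acvf <- function(theta1, theta2, sigma) {
  sigma^2 * c(1 + theta1^2 + theta2^2, theta1 + theta1 * theta2, theta2)
}

# Log-likelihood of a zero-mean Gaussian vector x with MA(2) covariance.
ma2_loglik <- function(x, theta1, theta2, sigma) {
  x <- as.numeric(x)
  n <- length(x)
  gam <- numeric(n)                        # first row of the covariance matrix
  k <- min(n, 3)
  gam[seq_len(k)] <- ma2_acvf(theta1, theta2, sigma)[seq_len(k)]
  Sigma <- toeplitz(gam)                   # n x n variance-covariance matrix
  R <- chol(Sigma)                         # Sigma = t(R) %*% R
  z <- backsolve(R, x, transpose = TRUE)   # solves t(R) %*% z = x
  -0.5 * n * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z^2)
}

set.seed(443)                              # arbitrary seed, for reproducibility
x <- arima.sim(model = list(ma = c(0.5, 0.3)), n = 50, sd = 2)
ma2_loglik(x, theta1 = 0.5, theta2 = 0.3, sigma = 2)
```

The Cholesky factor is used only to avoid forming the inverse and determinant of the covariance matrix directly; for the written answer the matrix form of the likelihood is what matters.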
2. (5 marks) In the same MA(2) model show the process (X1,X2,X3) is causal by explicitly constructing an appropriate linear transformation from {Zt} which maps the innovation process to the observed process.
3. (5 marks) Consider the stationary AR(1) model $X_t = \phi X_{t-1} + Z_t$, where $Z_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. Use the results in Theorem 3.6.3 and Theorem 4.1.1 to compute $\mathrm{Pred}(X_3 \mid X_2, X_1)$, the best linear predictor of $X_3$ given $X_2, X_1$, in terms of $\phi$ and $\sigma^2$.
4. (5 marks) Consider the model
$$X_t = \beta_0 + \beta_1 t + \beta_2 t(t+1) + Z_t,$$
where $Z_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. Show that this is an ARIMA model.
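One way to explore Question 4 empirically is to difference a simulated path and look at what remains. The sketch below uses arbitrary illustrative values for $\beta_0, \beta_1, \beta_2$ and $\sigma$ (they are assumptions, not values from the assignment) and checks that the second difference has an approximately constant mean and the short-memory autocorrelation of a differenced white-noise sequence.

```r
set.seed(443)                       # arbitrary seed, for reproducibility
n     <- 500
tt    <- seq_len(n)                 # time index t = 1, ..., n
beta0 <- 1; beta1 <- 0.5; beta2 <- 0.02   # illustrative values only
x <- beta0 + beta1 * tt + beta2 * tt * (tt + 1) + rnorm(n, sd = 1)

d2 <- diff(x, differences = 2)      # second difference of the series

# The quadratic trend reduces to the constant 2 * beta2 after two differences.
mean(d2)
2 * beta2

# Sample autocorrelations at lags 1 and 2 of the differenced series;
# for twice-differenced white noise the theoretical values are -2/3 and 1/6.
rho <- acf(d2, lag.max = 3, plot = FALSE)$acf[2:3]
rho
```

This is only a sanity check on simulated data; the question itself asks for an algebraic argument.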
Question Set 2: This set of questions explores the ways that simulation experiments can help us understand the properties of statistical models.
5. (5 marks) We can use simulation to understand the properties of maximum likelihood estimates. The following code generates N.sim independent realisations of an MA(1) model, estimates its parameter values and saves them in a vector for analysis.
MLE.Monte.Carlo <- function(theta, sigma, N.sim, sample.size)
{
  out <- NULL
  theta.hat <- rep(0, length.out = N.sim)  # store MLE of theta
  sigma.hat <- rep(0, length.out = N.sim)  # store MLE of sigma
  model <- list(ma = c(theta))
  for (i in 1:N.sim)
  { # loop over simulated data sets
    dat <- arima.sim(model = model, n = sample.size, sd = sigma)  # generate data
    fit <- arima(dat, order = c(0, 0, 1), method = "ML", include.mean = FALSE)  # fit MA(1)
    theta.hat[i] <- fit$coef           # store result
    sigma.hat[i] <- sqrt(fit$sigma2)   # store result
  }
  out$theta.hat <- theta.hat
  out$sigma.hat <- sigma.hat
  out
}
Use this code to estimate the sampling distribution of the maximum likelihood estimate $\hat{\theta}$ when the true values are $\theta = 0.2$, $\sigma = 10$ and sample.size = 100. Do you think $\hat{\theta}$ is an unbiased estimate of $\theta$?
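A possible way to summarise such a simulation is sketched below. To keep the example self-contained it uses a compact helper, `sim_ma1_mle` (a hypothetical name), that follows the same simulate-then-fit logic as MLE.Monte.Carlo but returns only the estimates of $\theta$; the choice of 200 replications is an assumption made to keep the run short.

```r
set.seed(443)   # arbitrary seed, for reproducibility only

# Simulate n_sim MA(1) series and return the ML estimate of theta for each.
sim_ma1_mle <- function(theta, sigma, n_sim, sample_size) {
  replicate(n_sim, {
    dat <- arima.sim(model = list(ma = theta), n = sample_size, sd = sigma)
    fit <- arima(dat, order = c(0, 0, 1), method = "ML", include.mean = FALSE)
    unname(fit$coef["ma1"])
  })
}

theta_true <- 0.2
theta_hat  <- sim_ma1_mle(theta = theta_true, sigma = 10,
                          n_sim = 200, sample_size = 100)

# Histogram approximating the sampling distribution, with the true value marked.
hist(theta_hat, breaks = 30, xlab = "theta.hat",
     main = "Simulated sampling distribution of the MLE")
abline(v = theta_true, lty = 2)

# Estimated bias together with its Monte Carlo standard error.
bias_est <- mean(theta_hat) - theta_true
mc_se    <- sd(theta_hat) / sqrt(length(theta_hat))
c(bias = bias_est, mc.se = mc_se)
```

Comparing the estimated bias with its Monte Carlo standard error gives a rough indication of whether any apparent bias is distinguishable from simulation noise.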
6. (5 marks) Use the code to estimate the sampling distribution of the maximum likelihood estimate $\hat{\theta}$ when the true values are (a) $\theta = 5$, $\sigma = 2$ and sample.size = 100, (b) $\theta = 0.2$, $\sigma = 10$ and sample.size = 10. Comment on your results.
7. (5 marks) Here we show that the maximum likelihood estimate for an MA(1) model is not unique.
(a) Show that if we have two models with parameters $(\theta_1, \sigma_1)$ and $(\theta_2, \sigma_2)$, where $\theta_1 = 1/\theta_2$ and $\sigma_1^2 = \theta_2^2 \sigma_2^2$, then these two models have the same distribution and hence the same likelihood function. [You might want to review your answer to Question 1.]
(b) Hence show that if $(\hat{\theta}, \hat{\sigma})$ is a maximum likelihood estimate then so is $(\hat{\theta}^{-1}, \hat{\theta}\hat{\sigma})$.
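The claim in Question 7 can be checked numerically before proving it. The sketch below assumes the convention $X_t = Z_t + \theta Z_{t-1}$ and compares the covariance matrices implied by $(\theta, \sigma)$ and $(1/\theta, \theta\sigma)$; the helper name `ma1_cov` is hypothetical. Since both models are zero-mean Gaussian, equal covariance matrices imply equal distributions.

```r
# Covariance matrix of (X_1, ..., X_n) for X_t = Z_t + theta * Z_{t-1}
# (assumed sign convention).
ma1_cov <- function(theta, sigma, n) {
  gam <- numeric(n)
  gam[1] <- sigma^2 * (1 + theta^2)       # lag-0 autocovariance
  if (n > 1) gam[2] <- sigma^2 * theta    # lag-1 autocovariance
  toeplitz(gam)                           # higher lags are zero
}

theta <- 0.2
sigma <- 10
n     <- 5

S_a <- ma1_cov(theta, sigma, n)               # model (theta, sigma)
S_b <- ma1_cov(1 / theta, theta * sigma, n)   # model (1/theta, theta * sigma)

all.equal(S_a, S_b)
```

With these values the second parameter pair is $(5, 2)$, so the check also relates the settings used in Questions 5 and 6(a).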
8. (5 marks) Use the theoretical results of Question 7 to interpret the
simulation results of Questions 5 and 6.
9. (5 marks) Consider the model $X_t = \phi X_{t-1} + Z_t$, where $Z_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, $\phi = 3$ and $\sigma = 1$. From the theory in the notes, state whether this is (i) stationary and (ii) causal.
Show how to write in R a simple for-loop which can generate a realisation of this process and graphically demonstrate its stationary properties.
[Hint: Don't use the arima.sim function, and think carefully about what causality means.]
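One possible reading of the hint is sketched below: the recursion is rearranged as $X_t = (X_{t+1} - Z_{t+1})/\phi$ and run backwards in time from an arbitrary terminal value, so that each $X_t$ depends on future innovations. The terminal value of zero, the burn-in length and the sample size are all assumptions made for illustration.

```r
set.seed(443)                 # arbitrary seed, for reproducibility
phi   <- 3
sigma <- 1
n     <- 200                  # length of the series to keep
burn  <- 50                   # extra steps so the terminal value is forgotten
N     <- n + burn

z <- rnorm(N + 1, sd = sigma) # z[k] plays the role of Z_k
x <- numeric(N + 1)
x[N + 1] <- 0                 # arbitrary terminal value; its effect decays like phi^(-j)

# Backward recursion: X_k = (X_{k+1} - Z_{k+1}) / phi
for (k in N:1) {
  x[k] <- (x[k + 1] - z[k + 1]) / phi
}
x <- x[1:n]                   # discard the values closest to the terminal point

plot.ts(x, main = "Backward-generated AR(1), phi = 3")
acf(x)

# For contrast, the forward recursion from a fixed start grows like phi^k.
x_fwd <- numeric(20)
for (k in 2:20) {
  x_fwd[k] <- phi * x_fwd[k - 1] + rnorm(1, sd = sigma)
}
abs(x_fwd[20])
```

The backward path still satisfies $X_t = \phi X_{t-1} + Z_t$ at every step, and its sample variance can be compared with $\sigma^2/(\phi^2 - 1)$, the variance of the stationary solution under this construction.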
Question Set 3: The Box-Jenkins methodology is defined in Section 4.7.1 of the notes. In these questions we will apply it to simulated time series which have large sample sizes and where the only forms of non-stationarity can be dealt with using simple tools. From Learn download the files Assignment3Question10.csv and Assignment3Question11.csv and read them into an R session using:
dat1 <- read.csv(file="Assignment3Question10.csv", header=TRUE)
dat1 <- ts(as.vector(dat1), start = 2005, frequency = 52)
dat2 <- read.csv(file="Assignment3Question11.csv", header=TRUE)
dat2 <- ts(as.vector(dat2), frequency = 12, start=1916)
10. (5 marks) Using R, with dat1, show how to do the model identification, model fitting and model checking steps using the functions acf, pacf, qqnorm and arima. Explain clearly what your conclusions are.
[Note: You are not going to be evaluated on selecting the ‘correct model’, rather that you are using and explaining the correct methodology.]
11. (5 marks) Using R, with dat2, show how to do the model identification, model fitting and model checking steps using the functions acf, pacf, qqnorm and arima. Explain clearly what your conclusions are.
[Note: You are not going to be evaluated on selecting the ‘correct model’, rather that you are using and explaining the correct methodology.]
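The three steps named in Questions 10 and 11 might be organised along the following lines. Because the assignment data files are not reproduced here, the sketch uses a simulated ARIMA(1,1,0) series as a stand-in; the model order, the coefficient 0.6 and the series length are assumptions chosen only to make the example runnable, and the real data may call for different choices such as seasonal differencing.

```r
set.seed(443)   # arbitrary seed, for reproducibility

# Simulated stand-in for dat1 / dat2 (not the assignment data).
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 500)

# Step 1: model identification.
plot.ts(y)
acf(y)          # very slow decay points towards differencing
dy <- diff(y)
acf(dy)         # geometric decay after differencing
pacf(dy)        # a cut-off after lag 1 suggests an AR(1) for the differences

# Step 2: model fitting.
fit <- arima(y, order = c(1, 1, 0), method = "ML")
fit

# Step 3: model checking on the residuals.
res <- residuals(fit)
acf(res)        # should show no remaining autocorrelation
qqnorm(res)     # should be close to a straight line under normality
qqline(res)
```

In a written answer each plot would be accompanied by a sentence explaining what it suggests and how that leads to the next step.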
12. (5 marks) In the R library forecast you will find the function auto.arima. Explain what this function does and compare its results with the Model Identification step above for both dat1 and dat2.