STAT 443: Forecasting Fall 2020 Assignment 2
When you upload to Crowdmark, please use a separate page for each of the 12 questions. You should submit your work before 5:00pm (Waterloo time) on 23rd October.
You may work with others on this assignment, but if you do, you must make a joint group submission to Crowdmark. You can create your own groups; any member of the group can submit on behalf of the group, and a single group assessment will be graded. All students in the group will receive a copy of the graded feedback in their Crowdmark portfolio.
This assignment mostly looks at the theoretical, computational and mathematical aspects of Chapters 2 and 3 from the notes.
Set 1: For this set of questions you need to download the files Assigment2Set1.csv and Assigment2newdata.csv from Learn and read them into an R session using:
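library(MASS)   # provides stepAIC()
library(lars)   # provides lars() for the LASSO
# Read the data and make its columns (Y, X1, ..., X9) visible by name
set1 <- as.data.frame(read.csv(file = "Assigment2Set1.csv", header = TRUE))
attach(set1)
# Read the new explanatory values and keep the first column as x0
xnewset1 <- as.data.frame(read.csv(file = "Assigment2newdata.csv", header = TRUE))
xnewset1 <- xnewset1[, 1]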
1. (5 marks)
(a) (3 marks) In your R session explore the first and second moment structure of the variables Y, X1, ..., X9, and report what you have found using only simple English sentences.
[You should not submit code or numerical output]
(b) (2 marks) Fit the linear regression of Y on X1, ..., X9. Are any of the variables significant? What would you use from your analysis to assess the overall fit of the model to the data?
[Note: Since all variables are centred your model should not have an intercept term. Also you should not submit code]
2. (5 marks)
(a) (4 marks) Prove that if $\Sigma$ is a real symmetric matrix and $x_0^T x_0 = 1$ then the maximum value of $x_0^T \Sigma x_0$ equals $\lambda_{\max}$, which is the largest eigenvalue of $\Sigma$.
[Hint: Lagrange multipliers and constrained optimisation can be used here]
(b) (1 mark) Submit R code which allows us to find the maximum eigenvalue of $(X^T X)^{-1}$, where $X$ is the matrix whose $i$th column is $X_i$.
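The claim in part (a) can be checked numerically before it is proved. The sketch below is only an illustration on a simulated symmetric matrix (the matrix `Sigma`, the seed and the number of random directions are assumptions, not part of the assignment data). It compares the quadratic form at the leading eigenvector with its value at many random unit vectors.

```r
# Illustrative only: a simulated symmetric matrix, not the assignment data.
set.seed(1)
A <- matrix(rnorm(16), nrow = 4)
Sigma <- crossprod(A)            # t(A) %*% A is real and symmetric

# symmetric = TRUE returns real eigenvalues sorted in decreasing order
eig <- eigen(Sigma, symmetric = TRUE)
lambda.max <- eig$values[1]
v.max <- eig$vectors[, 1]        # unit-length eigenvector for lambda.max

# The quadratic form evaluated at the leading eigenvector
q.at.vmax <- drop(t(v.max) %*% Sigma %*% v.max)

# The quadratic form evaluated at 1000 random unit vectors
x <- matrix(rnorm(4 * 1000), nrow = 4)
x <- sweep(x, 2, sqrt(colSums(x^2)), "/")   # normalise each column
q.random <- colSums(x * (Sigma %*% x))      # x_j^T Sigma x_j for each column j

# Compare the two: no random direction should exceed lambda.max
c(lambda.max = lambda.max, at.vmax = q.at.vmax, best.random = max(q.random))
```

The same `eigen()` call can be applied to any symmetric matrix, for instance one built with `solve(crossprod(X))` for a design matrix `X`.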
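3. (5 marks) An α%-prediction interval for multiple regression is given by
$$\hat{\mu} \pm c_\alpha \hat{\sigma} \sqrt{1 + x_0^T (X^T X)^{-1} x_0}$$
where $c_\alpha$ is the $1 - \frac{\alpha}{2}$ quantile from the $t_{N-p}$-distribution and $\hat{\sigma}$ is defined in the centred case by
$$\hat{\sigma}^2 := \frac{1}{N-p} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2.$$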
(a) (1 mark) The result of a backward search using stepAIC, from the MASS library, is
lm(formula = Y ~ X1 + X3 + X6 - 1, data = set1)
Explain briefly what the stepAIC function has done here.
(b) (2 marks) Using the predict.lm() function, show how to compute the 95%-prediction intervals for both this smaller model and the full model when xnewset1 is the new value of the explanatory variates, i.e. $x_0$.
(c) (2 marks) The choice of xnewset1 was particularly hard for the full model to predict. By looking at the eigenvectors of $(X^T X)^{-1}$ explain why this is so.
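To see how the prediction interval formula at the start of this question relates to `predict.lm()`, the following sketch computes a 95% interval both ways on simulated, centred data. The data, the coefficients and the new point `x0` are all hypothetical stand-ins; the assignment files are not used.

```r
# Simulated, centred data standing in for the assignment files (assumption).
set.seed(2)
N <- 50
p <- 3
X <- scale(matrix(rnorm(N * p), N, p), scale = FALSE)   # centre the columns
colnames(X) <- paste0("X", 1:p)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(N)
y <- y - mean(y)
dat <- data.frame(Y = y, X)

fit <- lm(Y ~ . - 1, data = dat)              # no intercept: centred case
x0 <- data.frame(X1 = 0.3, X2 = -1, X3 = 2)   # hypothetical new point

# Interval from predict.lm()
pi.lm <- predict(fit, newdata = x0, interval = "prediction", level = 0.95)

# The same interval built directly from the formula
x0v <- unlist(x0)
mu.hat <- sum(x0v * coef(fit))
sigma.hat <- sqrt(sum(resid(fit)^2) / (N - p))
c.alpha <- qt(1 - 0.05 / 2, df = N - p)
quad <- drop(t(x0v) %*% solve(crossprod(X)) %*% x0v)
half <- c.alpha * sigma.hat * sqrt(1 + quad)
pi.manual <- c(fit = mu.hat, lwr = mu.hat - half, upr = mu.hat + half)

rbind(predict.lm = drop(pi.lm), formula = pi.manual)
```

The term `quad` is where the eigen-structure of $(X^T X)^{-1}$ enters: it is large when `x0` points along a direction with a large eigenvalue.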
4. (5 marks)
(a) (2 marks) Show how to code in R the penalty terms used by the LASSO using the output from
lm(formula = Y ~ X1 + X3 + X6 - 1, data = set1)
(b) (3 marks) We have seen that multicollinearity can generate large eigenvalues in the matrix $(X^T X)^{-1}$, which results in wide prediction intervals. The corresponding object with ridge regression is $(X^T X + \lambda 1_{p \times p})^{-1}$, where $\lambda$ is the tuning parameter. In R create a graph which shows how the largest eigenvalue depends on $\lambda$ for the full data.
[Post the plot and a short description of what you found only]
(c) (1 mark) If increasing $\lambda$ reduces the size of the prediction interval, should we make $\lambda$ as large as possible?
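One possible way to trace the largest eigenvalue of the ridge matrix over a grid of tuning parameters is sketched below. It uses a simulated, nearly collinear design rather than the assignment data, takes $1_{p \times p}$ to mean the $p \times p$ identity matrix (an assumption about the notation), and the grid `lambda` is an arbitrary choice.

```r
# Simulated collinear design standing in for the full data (assumption).
set.seed(3)
N <- 100
x1 <- rnorm(N)
x2 <- x1 + rnorm(N, sd = 0.01)   # nearly a copy of x1: strong collinearity
x3 <- rnorm(N)
X <- scale(cbind(x1, x2, x3), scale = FALSE)

XtX <- crossprod(X)
p <- ncol(X)
lambda <- seq(0, 5, by = 0.05)   # hypothetical grid of tuning parameters

# Largest eigenvalue of (X^T X + lambda * I)^{-1} for each lambda
max.eig <- sapply(lambda, function(l) {
  max(eigen(solve(XtX + l * diag(p)), symmetric = TRUE)$values)
})

plot(lambda, max.eig, type = "l",
     xlab = expression(lambda), ylab = "Largest eigenvalue")
```

Since the eigenvalues of the inverse are the reciprocals of those of $X^T X + \lambda 1_{p \times p}$, the curve could equally be computed without `solve()`, as one over the smallest eigenvalue of `XtX` plus `lambda`.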
Set 2: This set of questions looks at the best linear forecast method. Assume in this set of questions that $E(X_t) = 0$ for all $t$ and $\mathrm{Cov}(X_t, X_s) = \sigma^2_{ts} < \infty$.
5. (5 marks) Suppose we want to forecast $X_{n+h}$ using the information within the random vector $X = (X_1, X_2, \ldots, X_n)$. Consider the linear predictor $\sum_{t=1}^{n} \beta_t X_t$ of $X_{n+h}$ for $h > 0$.
[We are not assuming the sequence is stationary]
(a) (2 marks) Write down the MSE for the linear predictor as a function of $(\beta_1, \ldots, \beta_n)$.
(b) (3 marks) In the case $n = 2$ show how to find a set of linear equations which defines $\beta_t$ in terms of $\sigma^2_{ts}$.
[You do not need to solve the equations]
6. (5 marks) For the general case of Question 5, write the equations that define the best linear predictor as
$$\hat{\beta} \Sigma = a$$
where $\beta = (\beta_1, \ldots, \beta_n)$, $a = (\mathrm{Cov}(X_1, X_{n+h}), \ldots, \mathrm{Cov}(X_n, X_{n+h}))$ and $\Sigma$ is a real positive definite symmetric matrix. The smallest MSE is then
$$E\left[(X_{n+h} - \hat{\beta} X^T)(X_{n+h} - \hat{\beta} X^T)^T\right] = \mathrm{Var}(X_{n+h}) - a \Sigma^{-1} a^T.$$
(a) (3 marks) If the process is stationary, can the prediction error, as measured by MSE, become unboundedly large as $h \to \infty$?
(b) (2 marks) When $X_n = n Z_n$, where $Z_n \sim WN(0, 1)$, what happens to the MSE as $h \to \infty$?
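The equations in Question 6 can be solved numerically once a covariance structure is specified. The sketch below assumes, purely for illustration, a stationary AR(1) covariance with coefficient `phi` and unit innovation variance; the values of `phi`, `n` and `h` are arbitrary choices that do not come from the assignment.

```r
# Assumed covariance structure for illustration: a stationary AR(1) process
# with coefficient phi and unit innovation variance, so that
# Cov(X_t, X_s) = phi^|t - s| / (1 - phi^2).
phi <- 0.6
n <- 5
h <- 2
gamma <- function(k) phi^abs(k) / (1 - phi^2)

Sigma <- outer(1:n, 1:n, function(t, s) gamma(t - s))   # Cov(X_t, X_s)
a <- gamma((n + h) - (1:n))                             # Cov(X_t, X_{n+h})

# Solve beta %*% Sigma = a; Sigma is symmetric, so solve(Sigma, a) suffices
beta.hat <- solve(Sigma, a)

# Smallest MSE: Var(X_{n+h}) - a Sigma^{-1} a^T
mse <- gamma(0) - sum(a * beta.hat)

round(beta.hat, 6)
mse
```

For this particular covariance the known best linear predictor of $X_{n+h}$ is $\phi^h X_n$, so all of the weight should fall on the last observation. Rerunning with larger `h` shows how the MSE behaves as the horizon grows.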
7. For the general case of Question 5 of forecasting $X_{n+h}$, if we want to use the method we would need to estimate the terms $\sigma^2_{ts}$ from data.
(a) (3 marks) Explain how assuming stationarity reduces the number of different values of $\sigma^2_{ts}$ that need to be estimated from data.
(b) (2 marks) For the observed data $x_n, x_{n-1}, \ldots, x_1$ and assuming stationarity, how many observations can we use to estimate (a) $\sigma^2_{00}$ and (b) $\sigma^2_{n0}$?
Set 3: This set of questions looks at properties of the ARMA model
8. (5 marks) Consider an MA(3) model of the form
$$X_t = Z_t + \theta_1 Z_{t-1} + \theta_3 Z_{t-3}$$
where $Z_t \sim WN(0, \sigma^2)$.
(a) (1 mark) Compute $E(X_t)$ in terms of the parameters of the model.
(b) (3 marks) Compute $\mathrm{Cov}(X_{10}, X_{12})$ in terms of the parameters of the model.
(c) (1 mark) Do we get the same answer for $\mathrm{Cov}(X_{10+h}, X_{12+h})$ as for part (b)?
9. (5 marks) When the requirements of Theorem 3.8.4 are satisfied we have a solution to the AR(1) equation of the form
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}.$$
[In this question you can assume that all the infinite sums have the same properties as finite sums]
(a) (1 mark) Compute $E(X_t)$ in terms of its representation as an infinite sum.
(b) (2 marks) Show how to compute an expression for $\mathrm{Var}(X_t)$ in terms of the representation as a sum.
(c) (2 marks) Show that the process is variance stationary by direct calculation.
10. (5 marks) Consider the ARMA model
$$\left(1 + \tfrac{1}{2} B\right) X_t = (1 - 2B + B^2) Z_t \qquad (1)$$
where $Z_t \sim WN(0, 2^2)$.
(a) (3 marks) Formally expand this ARMA model, to represent it as an MA(∞) model, numerically computing the coefficients up to, and including, all order 4 terms.
(b) (2 marks) Using a theorem from the notes, explain whether the solution is stationary and causal.
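A hand expansion of model (1) can be checked numerically with `ARMAtoMA()` from the stats package. The sketch below is a cross-check rather than a substitute for the formal expansion. It assumes R's sign convention, in which the AR coefficients sit on the right-hand side of the equation, so that $(1 + \tfrac{1}{2}B)$ corresponds to `ar = -0.5`.

```r
# R's convention: X_t = ar[1] X_{t-1} + ... + Z_t + ma[1] Z_{t-1} + ...
# so (1 + 0.5 B) X_t = (1 - 2B + B^2) Z_t corresponds to
ar <- -0.5
ma <- c(-2, 1)

# psi_1, ..., psi_4 of the MA(infinity) representation (psi_0 = 1 is implicit)
psi <- ARMAtoMA(ar = ar, ma = ma, lag.max = 4)
psi

# Roots of the AR polynomial 1 + 0.5 z, and their moduli
ar.roots <- polyroot(c(1, 0.5))
Mod(ar.roots)
```

The moduli of the AR roots are the quantities that a causality condition stated in terms of the unit circle would examine.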
Set 4: This set of questions looks at using R. We can simulate ARMA models using the arima.sim() function, fit models using the arima() function and, for descriptive plots, acf() is very useful.
11. (5 marks) We can simulate data from Model 1 and then see if, just using the data, we can get good estimates of the ACF. Further, the ARMAacf() function computes the exact autocorrelation function for an ARMA model. Both methods are shown in Fig. 1, with the estimated values being the black vertical lines and the theoretical ACF the red dots.
[Figure 1: Estimated (black lines) and Theoretical ACF (red dots). Plot title: Estimate and True ACF; horizontal axis: Lag, 0 to 20.]
Starting your R code with
set.seed(2020)
n.sim = 500 #Sample size
show what code will reproduce Fig. 1.
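A plot of this kind can be built by overlaying `ARMAacf()` values on the output of `acf()`. The sketch below is one possible approach, not necessarily the code behind Fig. 1. The mapping of Model 1 to `ar = -0.5`, `ma = c(-2, 1)` and `sd = 2` follows R's sign convention and is an assumption, as are the plotting choices.

```r
set.seed(2020)
n.sim <- 500   # sample size

# Model (1) in R's sign convention (assumed mapping): ar = -1/2, ma = c(-2, 1),
# with innovation standard deviation 2
ar <- -0.5
ma <- c(-2, 1)
x <- arima.sim(model = list(ar = ar, ma = ma), n = n.sim, sd = 2)

# Sample ACF drawn as vertical lines
acf(x, lag.max = 20, main = "Estimate and True ACF")

# Theoretical ACF at lags 0, ..., 20, overlaid as red dots
true.acf <- ARMAacf(ar = ar, ma = ma, lag.max = 20)
points(0:20, true.acf, col = "red", pch = 19)
```

Whether this matches Fig. 1 exactly depends on details that are not stated, such as the order of the random-number calls after `set.seed(2020)`.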
12. (5 marks) We have a theorem which tells us when an ARMA model is stationary. However, there are models which mathematically satisfy this condition but are still ‘close’ to being non-stationary. Two different cases are shown in Figure 2.
(a) (1 mark) Which of the plots corresponds to a model which is close to an AR(1) model with $\phi$ close to $-1$? Informally justify your choice.
[Figure 2: Data sets from two different models (a) and (c) and their ACF (b) and (d) respectively. Panels (a) and (c) plot Data against Time (0 to 500); panels (b) and (d), titled Estimate and True ACF, plot ACF against Lag (0 to 20).]
(b) (1 mark) Describe what happens to the ACF plots when the model is close to being non-stationary.
(c) (1 mark) What happens to the theoretical ACF in the non-stationary case?
(d) (2 marks) Describe quantitatively the patterns you see in Panels (a) and (c).
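To explore how the theoretical ACF of an AR(1) model behaves near the stationarity boundary, `ARMAacf()` can be evaluated for coefficients close to $-1$ and $+1$. The values $\pm 0.95$ below are hypothetical choices for illustration and are not claimed to be the parameters behind Figure 2.

```r
# Hypothetical coefficients chosen to sit close to the boundary |phi| = 1
acf.neg <- ARMAacf(ar = -0.95, lag.max = 10)   # phi close to -1
acf.pos <- ARMAacf(ar = 0.95, lag.max = 10)    # phi close to +1

# For an AR(1), the theoretical ACF at lag k is phi^k
round(rbind(lag = 0:10, acf.neg, acf.pos), 3)
```

Moving the coefficient closer to the boundary and rerunning shows how the rate of decay changes.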