STAT 443: Forecasting Fall 2020 Assignment 1
When you upload to Crowdmark please have a separate page for each of the 12 questions. You should submit your work before 5:00pm (Waterloo time) on Friday 25th September.
You may work with others on this assignment, but if you do you must submit a joint group submission to Crowdmark. You can create your own groups, and any member of the group can submit on behalf of the group; a single group assessment will be graded. All students in the group will receive a copy of the graded feedback in their Crowdmark portfolio.
Question Sets
Set 1: In the notes we have seen that the R decompose( ) function is designed to find the components when a model can be written as
$X_t = m_t + s_t + Y_t$ (1)
where $E(Y_t) = 0$ and, for identification reasons, $\sum_{t=1}^{d} s_t = 0$.
1. (5 marks)
(a) (2 marks) Explain why we have the identification conditions $E(Y_t) = 0$ and $\sum_{t=1}^{d} s_t = 0$ in the decomposition.
(b) (3 marks) Consider a multiplicative decomposition
$X_t = m_t s_t Y_t$.
How, and under what conditions, can you transform this model to the form (1), and why would $\prod_{t=1}^{d} s_t = 1$ be the corresponding identification condition?
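For readers who have not yet used the function, the following is a minimal sketch of the decompose( ) interface. It assumes R's built-in co2 series as a stand-in, chosen only for illustration; it is not the assignment data, and the object names dec, recon and ok are hypothetical.

```r
# Illustrative only: co2 is a monthly series shipped with R (frequency 12),
# used here as a stand-in for a series of the form (1).
dec <- decompose(co2, type = "additive")

# The fitted object stores the estimated components of Equation (1).
head(dec$trend, 12)   # estimated trend (NA where the moving average is undefined)
dec$figure            # one estimated seasonal effect per month
sum(dec$figure)       # numerically zero, in line with the sum-to-zero condition

# Where the trend is defined, the three components add back up to the data.
recon <- dec$trend + dec$seasonal + dec$random
ok <- !is.na(recon)
all.equal(as.numeric(recon[ok]), as.numeric(co2[ok]))
```

The same call pattern should apply to any ts object with frequency greater than 1; only the input series changes.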
2. (5 marks) From the Learn site download the file Assignment1Data.csv and read it into an R session using
    dat <- read.csv(file="Assignment1Data.csv", header=TRUE)
    xt <- ts(dat["xt"], frequency = 12)
    mt <- ts(dat["mt"], frequency = 12)
    st <- ts(dat["st"], frequency = 12)
    yt <- ts(dat["yt"], frequency = 12)

The time series objects xt, mt, st and yt are observed versions of the correspondingly named variables in Equation (1). You can use the R function decompose( ) to get estimates of $m_t$, $s_t$ and the residual $y_t$ from the data in xt.
(a) (2 marks) Why, after using the function, does the estimated trend in R have 12 NA values?
(b) (2 marks) We now have an estimate of $m_t$ from the decompose( ) function and, from the downloaded file, the real value is mt. To measure the quality of the estimate of $m_t$ we can compute the sample mean squared error (MSE),
$\frac{1}{n} \sum_{t} (m_t - \hat{m}_t)^2$.
Compute the MSE in R, and submit the annotated code and the numerical value. [Hint: To do this in R you must delete NAs.]
(c) (1 mark) The function has estimated the seasonal component even though we can see that this is zero. Would this affect the MSE estimated above?

3. (5 marks)
(a) (2 marks) If, in Model (1), the variance of $Y_t$ was $\sigma^2$ for all $t$, what would the variance of the sampling distribution of $\hat{m}_t$ be with a 7 point moving average? [Hint: for simplicity here assume the $Y_t$ are i.i.d.]
(b) (2 marks) Give an example of a 'slowly varying' function $m_t$, as defined in the notes, where the estimate from the decompose( ) function would be biased.
(c) (1 mark) From 3(a) we see we can reduce the sampling variance by moving from a $d = 7$ to a $d = 14$ moving average. Would that also reduce the bias?

Set 2: In the notes some simple models for time series have been defined. These are models which define a set of random variables which are ordered in time. These include the simple Markov Chain and the AR(1) models. The Markov Chain model can be a useful way of thinking about how to forecast $X_{t+h}$ given we have observed $X_t, X_{t-1}, X_{t-2}, \ldots$.

4. (5 marks) One way of thinking about forecasting is in terms of probability, that is, compute
$\Pr(X_{t+h} = s_{t+h} \mid X_t = s_t, X_{t-1} = s_{t-1}, \ldots)$.
(a) (4 marks) Give a brief, high-level explanation of how the structure of a discrete state Markov chain allows us to compute this probability forecast if we know the one-step transition matrix
$P\{X_{t+1} = j \mid X_t = i\} = p_{ij}$,
where there are $K$ states $i = 1, \ldots, K$.
(b) (1 mark) State a result connecting the eigenvector of $(p_{ij})$ with the largest eigenvalue, and the stationary (equilibrium) distribution of the Markov chain.

5. (5 marks) We can explore this form of probability forecast numerically. Suppose the 1-step transition matrix is
\[
P = \begin{pmatrix} 0.8 & 0.1 & 0.1 \\ 0.1 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}.
\]
(a) (2 marks) Compute numerically, by using matrix algebra in R, what the equilibrium distribution is. Show the annotated code you used.
(b) (3 marks) In this case is there more uncertainty in predicting the far future than the medium future?

Set 3: Now let's look at a model which generates continuous random variables: the AR(1) model. Let $Z_t$, $t \in \mathbb{Z}$, be an i.i.d. sequence of $N(0, 1)$ random variables. The AR(1) model is defined by
$X_t = \phi X_{t-1} + Z_t$
for all $t \in \mathbb{Z}$. Let us consider a simple example, $(X_0, X_1, X_2, X_3)$, where the initial condition is $X_0 = Z_0$. By independence, the random vector $(Z_0, Z_1, Z_2, Z_3)$ has a four dimensional Multivariate Normal distribution with mean the zero vector and the identity matrix as the covariance matrix.

6. (5 marks) Show how to write $(X_0, X_1, X_2, X_3)$ as a linear transformation of $(Z_0, Z_1, Z_2, Z_3)$. [Hint: Use expressions of the form $X_1 = \phi X_0 + Z_1 = \phi Z_0 + Z_1$.]

7. (5 marks)
(a) (1 mark) Write the linear transform as a matrix, $M$.
(b) (3 marks) Show how we can use this matrix to compute the covariance matrix of $(X_0, X_1, X_2, X_3)$ as being
\[
\Sigma = \begin{pmatrix}
1 & \phi & \phi^2 & \phi^3 \\
\phi & \phi^2 + 1 & \phi^3 + \phi & \phi^4 + \phi^2 \\
\phi^2 & \phi^3 + \phi & \phi^4 + \phi^2 + 1 & \phi^5 + \phi^3 + \phi \\
\phi^3 & \phi^4 + \phi^2 & \phi^5 + \phi^3 + \phi & \phi^6 + \phi^4 + \phi^2 + 1
\end{pmatrix}.
\]
[Hint: Look at the Appendix to Chapter 2 in the Notes. Do not explicitly do the algebra.]
(c) (1 mark) Often, for forecasting or to compute likelihood estimates, we need to invert $\Sigma$. Is there an advantage to inverting $M$ first?

8. (5 marks) Consider two special cases: (i) $\phi = 1$, (ii) $\phi = -1$.
(a) (3 marks) What is the numerical value of the variances of $X_0$ and $X_3$ in both cases? What is the correlation between $X_0$ and $X_3$ in both cases? [Hint: use R to compute the numerical answer.]
(b) (2 marks) Instead of a finite number of observations we might consider the infinite sequence $X_t, X_{t-1}, X_{t-2}, \ldots$. Can you hypothesise what the variance of $X_t$ might be, and under what conditions it would be finite?

Set 4: The recent paper 'Forecasting the 2013-2014 Influenza Season Using Wikipedia'
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004239
looks at forecasting flu rates. We can also use the publicly available Centers for Disease Control and Prevention (CDC) data. The data Assignment1DataFlu.csv can be downloaded from the Learn site and loaded into R using

    ILI.file <- read.csv(file= "Assignment1DataFlu.csv", header=TRUE)
    ILI <- ILI.file[, 2]
    ILI <- ts(ILI, start=2008+39/52, frequency = 52)

[Figure 1: Using exponential smoothing on the Flu data. Plot title: Holt-Winters filtering; x-axis: Time (2009 to 2015); y-axis: Observed / Fitted.]

9. (5 marks)
(a) (1 mark) What does the abstract say about the reason for trying to understand influenza dynamics?
(b) (2 marks) What does the abstract say about the reason for using access logs from Wikipedia in their model?
(c) (2 marks) Write down the Model (1) they use, carefully defining terms.

10. (5 marks) Figure 1 comes from applying the simple exponential smoothing algorithm to the flu data and using the predict.HoltWinters function to construct a 95% prediction interval for one year ahead. Reproduce Fig. 1 and submit the annotated code which you used to produce it.

11. (5 marks) The Holt-Winters algorithm uses tuning parameters which R computes automatically and saves in the function output.
(a) (3 marks) What value for the tuning parameter $\alpha$ did R give you when reproducing the Figure? What is your interpretation of what this means the method is doing?
(b) (2 marks) Since the exponential smoothing method makes few assumptions on the probability structure, prediction intervals for exponential smoothing can only be approximate. R uses a prediction standard error related to
$\sigma \sqrt{1 + \alpha^2 (h - 1)}$,
see https://otexts.com/fpp2/ets-forecasting.html. What comments would you make about the quality of the prediction interval in Fig. 1?

12. (5 marks) We can try a method similar to the paper. Define

    y <- ILI[-261]
    x <- ILI[-1]

and then use linear regression to explain the variation of y by x.
(a) (2 marks) How well does the simple regression model, without the Wikipedia variates, do in explaining the variability?
(b) (3 marks) Describe briefly whether, in your opinion, the decompose method is a suitable method for explaining the patterns in the data.
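The HoltWinters interface referred to in Questions 10 and 11 can be sketched as follows. This is a minimal sketch that assumes R's built-in Nile series in place of the flu data, with a hypothetical 10-step horizon; the object names fit and pred are also hypothetical, and the horizon and series would need to be changed for the assignment.

```r
# Illustrative only: Nile is an annual series shipped with R, not the flu data.
# beta = FALSE and gamma = FALSE switch off the trend and seasonal terms,
# which leaves simple exponential smoothing.
fit <- HoltWinters(Nile, beta = FALSE, gamma = FALSE)

# Smoothing parameter selected automatically by R and stored in the output.
fit$alpha

# Forecast 10 steps ahead (an arbitrary horizon) with a 95% prediction interval.
pred <- predict(fit, n.ahead = 10, prediction.interval = TRUE, level = 0.95)
colnames(pred)   # point forecast plus upper and lower interval limits

# Observed and fitted values, with the forecast and its interval appended.
plot(fit, predicted.values = pred)
```

A value of fit$alpha close to 1 means the smoothed level follows the most recent observation closely, while a value close to 0 means it averages over a long history.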