STAT 443: Forecasting Fall 2020 Project: Part II
Introduction
The project for STAT 443 for Fall 2020 will be based on the theme of evaluating and comparing forecasting methods in a specific context. It is due at 5:00pm on December 7th via Crowdmark. Unlike the assignments, this is an individual piece of work.
The project will be released in two parts to match our progress through the course. This document is Part II, which is worth 60% of the final project mark. It involves a simulation study that compares the Holt-Winters method and the Kalman filter in a specific context: a ball-tracking algorithm.
You will need to download data from the Learn site. Since this is individual work, the data vary slightly between students. You will need the following files:
ProjectIIPilotObserved*.csv,
ProjectIIPilot*.csv and
ProjectIIData*.csv
where you replace * with the last digit of your UW student number. For example, if your student number is 20233110, you download ProjectIIData0.csv, and so on. You can load the files into R with the read.csv function.
Part II
In the notes we looked at a simple differential equation model that was used by the Kalman Filter (KF) to perform real-time filtering of a state variable. This was motivated by the idea of building a ball-tracking algorithm that can be used in real time. In the example in the notes, the differential equation model only looked at the height of the ball and assumed that only the force of gravity played a role. Here we build a more complex model. Assume the flight of the ball lies in a vertical plane with horizontal coordinate v and vertical coordinate w. The state variable is
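$$(v_t, \dot{v}_t, w_t, \dot{w}_t)^T,$$
a discrete-time vector giving both the horizontal and vertical position and velocity of the ball. In the underlying physical model we assume that gravity and air resistance are both important, and that air resistance is proportional to the velocity. This results in the set of equations

$$M\frac{d^2 v(t)}{dt^2} + K\frac{dv(t)}{dt} = 0 \qquad (1)$$

$$M\frac{d^2 w(t)}{dt^2} + K\frac{dw(t)}{dt} + Mg = 0 \qquad (2)$$

where M is the mass of the ball, g is the acceleration due to gravity and K is a constant depending on the size of the ball. There are initial conditions on $v(0)$, $\frac{dv}{dt}(0)$, $w(0)$ and $\frac{dw}{dt}(0)$.

We can turn these into discrete recurrence equations, where $\varepsilon$ is the small time step between consecutive discrete times:

$$v_{t+1} = v_t + \varepsilon \dot{v}_t + O(\varepsilon^2) \qquad (3)$$

$$\dot{v}_{t+1} = \left(1 - \frac{K\varepsilon}{M}\right)\dot{v}_t + O(\varepsilon^2) \qquad (4)$$

$$w_{t+1} = w_t + \varepsilon \dot{w}_t + O(\varepsilon^2) \qquad (5)$$

$$\dot{w}_{t+1} = \left(1 - \frac{K\varepsilon}{M}\right)\dot{w}_t - g\varepsilon + O(\varepsilon^2) \qquad (6)$$

There is a small error, $O(\varepsilon^2)$, due to moving from continuous time to discrete time. We model this error as a four-dimensional normal random vector with mean zero and covariance matrix Q.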
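Dropping the error terms, recurrences (3) to (6) are the forward Euler scheme for the differential equations with time step ε, where K is the air-resistance constant defined after Equation (2). The following is a minimal Python sketch that iterates this deterministic part, ignoring the O(ε²) terms and the process noise with covariance Q. The function names and the parameter values in the usage line are hypothetical and are not taken from the project data. The helper `landing_index` finds the first step after launch at which the height w_t is at or below zero, which is the time at which the Monte Carlo study compares the filters.

```python
def simulate_flight(v0, vdot0, w0, wdot0, M, K, g=9.81, eps=0.01, n_steps=1000):
    """Iterate the deterministic part of recurrences (3)-(6).

    The O(eps^2) terms and the process noise (covariance Q) are dropped,
    so this is the forward Euler scheme for equations (1) and (2).
    Returns a list of (v_t, vdot_t, w_t, wdot_t) tuples, one per time step.
    """
    decay = 1.0 - K * eps / M  # the factor (1 - K*eps/M) in (4) and (6)
    state = (v0, vdot0, w0, wdot0)
    path = [state]
    for _ in range(n_steps):
        v, vdot, w, wdot = state
        state = (
            v + eps * vdot,           # equation (3)
            decay * vdot,             # equation (4)
            w + eps * wdot,           # equation (5)
            decay * wdot - g * eps,   # equation (6)
        )
        path.append(state)
    return path


def landing_index(path):
    """First step after launch at which the height w_t is <= 0 (None if never)."""
    for i, (_, _, w, _) in enumerate(path[1:], start=1):
        if w <= 0:
            return i
    return None
```

For example, `path = simulate_flight(0.0, 20.0, 0.0, 15.0, M=0.45, K=0.05)` followed by `landing_index(path)` gives the step at which this hypothetical ball lands. As ε shrinks, the Euler path should approach the exact solution of (1) and (2).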
We do not observe the state directly, but we have noisy measurements of the ball's position calculated from a number of television cameras. In discrete time the position is recorded as $(x_t, y_t)$, where

$$(x_t, y_t)^T \sim N_2\left((v_t, w_t)^T, R\right) \qquad (7)$$

and R is the 2 × 2 covariance matrix of the measurement error.
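A sketch of the measurement model (7), assuming the true positions are already available (for example from a numerical solution of (1) and (2)) and ignoring the process noise with covariance Q. It draws N_2((0,0)^T, R) noise through a hand-written 2 × 2 Cholesky factor so that it needs only the Python standard library. The function name and the `every` argument, a simple way of keeping only a subsample of time points as in the pilot data, are hypothetical.

```python
import math
import random


def observe(positions, R, every=1, seed=None):
    """Simulate camera readings (x_t, y_t) from true positions (v_t, w_t).

    positions: list of (v, w) pairs; R: 2x2 covariance [[r11, r12], [r12, r22]].
    every: keep only every `every`-th time point (hypothetical subsampling rule).
    Returns a list of (t, x_t, y_t) tuples with N_2((0,0)^T, R) noise added.
    """
    rng = random.Random(seed)
    # Lower-triangular Cholesky factor L of R, so L z ~ N_2(0, R) for z ~ N_2(0, I).
    l11 = math.sqrt(R[0][0])
    l21 = R[1][0] / l11 if l11 > 0 else 0.0
    l22 = math.sqrt(max(R[1][1] - l21 ** 2, 0.0))
    readings = []
    for t in range(0, len(positions), every):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        v, w = positions[t]
        readings.append((t, v + l11 * z1, w + l21 * z1 + l22 * z2))
    return readings
```

Writing the Cholesky factor out by hand keeps the example dependency-free; with NumPy available, a multivariate normal sampler would do the same job.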
Question Set 3
12. (5 marks) Show, by taking limits as $\varepsilon \to 0$, that Equation (1) is the limit of Equation (4).
[You can assume $\lim_{\varepsilon \to 0} O(\varepsilon^2)/\varepsilon = 0$.]
13. (5 marks) Using the notation for the state space model from Definition 7.4.1 in the notes (chapter7update.pdf), write down the state and observation equations in terms of Equations (3) to (7) above.
14. (5 marks) As part of the calibration process, a pilot study was carried out. You can download some of its data from Learn.
The file ProjectIIPilot*.csv contains details of a numerical solution to Equations (1) and (2) above. In particular, it has the time t and the corresponding horizontal and vertical distances v(t) and w(t). The file ProjectIIPilotObserved*.csv has the observations of the same quantities at a sample of time points, i.e. $x_t$ and $y_t$.
Using graphical methods, investigate the behaviour of v(t) and w(t) and give a simple qualitative explanation of what you find.
15. (5 marks) Figure 1 shows a result, also from the pilot study, of fitting part of the state using a Kalman Filter. The red dots are the observed values, the blue curve is the fitted line and the black curve is the true state v(t).
Explain in real-world terms why the line reaches a constant, and discuss how well we might expect to estimate this constant value.
[Figure: "Flight of Ball", horizontal distance (0 to 120) against time (0 to 15)]
Figure 1: Observed process and KF fit
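Figure 1 shows a KF fit to the horizontal position. A minimal sketch of how such a filter could be written, restricted for brevity to the horizontal sub-state $(v_t, \dot{v}_t)$. This split is exact only if Q and R have no cross-covariance between the horizontal and vertical components; the vertical part would also need the known input $-g\varepsilon$ from Equation (6). The transition matrix comes from (3) and (4), the observation is $x_t = v_t$ plus noise with variance r (the (1,1) entry of R), and the 2 × 2 process covariance Q, r and the initial mean and covariance are user-supplied assumptions. This is an illustration, not the code used to produce Figure 1.

```python
def _matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def _transpose(A):
    """Transpose of a 2x2 matrix stored as nested lists."""
    return [[A[j][i] for j in range(2)] for i in range(2)]


def kalman_filter_horizontal(xs, eps, K, M, Q, r, s0, P0):
    """Kalman filter for the horizontal sub-state (v_t, vdot_t).

    xs: observations x_t, one per time step (None where there is no reading).
    Q: 2x2 process covariance for (v_t, vdot_t); r: variance of x_t.
    s0, P0: initial state mean (length-2 list) and 2x2 covariance.
    Returns the filtered estimates of v_t, one per time step.
    """
    F = [[1.0, eps], [0.0, 1.0 - K * eps / M]]  # from equations (3) and (4)
    s = list(s0)
    P = [row[:] for row in P0]
    filtered = []
    for x in xs:
        # Predict step: s <- F s, P <- F P F^T + Q
        s = [F[0][0] * s[0] + F[0][1] * s[1],
             F[1][0] * s[0] + F[1][1] * s[1]]
        FPFt = _matmul(_matmul(F, P), _transpose(F))
        P = [[FPFt[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
        if x is not None:
            # Update step with observation matrix H = [1, 0]
            S = P[0][0] + r                    # innovation variance
            g0, g1 = P[0][0] / S, P[1][0] / S  # Kalman gain P H^T / S
            innov = x - s[0]
            s = [s[0] + g0 * innov, s[1] + g1 * innov]
            # P <- (I - gain H) P
            P = [[P[0][0] - g0 * P[0][0], P[0][1] - g0 * P[0][1]],
                 [P[1][0] - g1 * P[0][0], P[1][1] - g1 * P[0][1]]]
        filtered.append(s[0])
    return filtered
```

Time steps with no camera reading are passed as `None`, so only the prediction step runs there; this is one way to handle observations recorded at a sample of time points.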
16. (5 marks) The $\{x_t\}$ data that you downloaded can be thought of as a time series. Show how to use the Box-Jenkins methodology to model the first 60 points in this time series. You should show the model identification, estimation and goodness-of-fit steps clearly.
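The identification step of Box-Jenkins usually starts from the sample ACF (and PACF) of the series and of its differences; in R this is typically done with acf() and pacf(), followed by arima() for estimation. The Python sketch below is only an illustration of the quantities involved: it computes the sample ACF by hand so that the estimator is explicit, together with lagged differences, since a slowly decaying ACF is the usual signal that the series should be differenced before an ARMA model is fitted.

```python
def difference(x, lag=1):
    """Lagged differences x_t - x_{t-lag}, used when the ACF decays slowly."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the series x.

    Uses r_k = c_k / c_0 with divisor n:
    c_k = (1/n) * sum_{t=1}^{n-k} (x_t - xbar)(x_{t+k} - xbar).
    """
    n = len(x)
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]
```

Comparing `sample_acf(x[:60], 20)` with `sample_acf(difference(x[:60]), 20)` is one way to judge whether differencing is needed; values outside roughly ±1.96/√n are the usual guide to which lags matter.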
17. (5 marks) Using your fitted model, predict the remaining 90 time points, quantify the uncertainty of prediction, and show your results graphically.
[Submit both annotated code and the figure it generates]
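To show how forecast uncertainty is quantified, suppose, purely as an assumption for illustration (the model you identify may well differ), that the fitted model were a stationary AR(1), $x_t - \mu = \phi(x_{t-1} - \mu) + a_t$ with $a_t \sim N(0, \sigma^2)$. Treating the parameters as known, the h-step forecast is $\mu + \phi^h(x_n - \mu)$ with variance $\sigma^2 \sum_{j=0}^{h-1} \phi^{2j}$, giving approximate 95% intervals of ±1.96 standard errors. In R, predict() applied to a fitted arima object with n.ahead set returns forecasts and their standard errors. A sketch of the AR(1) case:

```python
import math


def ar1_forecast(x_n, mu, phi, sigma2, horizon, z=1.96):
    """h-step forecasts and approximate 95% intervals for a stationary AR(1).

    Model (assumed for illustration): x_t - mu = phi * (x_{t-1} - mu) + a_t,
    a_t ~ N(0, sigma2), with the parameters treated as known.
    Returns a list of (h, forecast, lower, upper) for h = 1, ..., horizon.
    """
    out = []
    var = 0.0
    for h in range(1, horizon + 1):
        var += sigma2 * phi ** (2 * (h - 1))  # adds psi_{h-1}^2 * sigma2, psi_j = phi^j
        mean = mu + phi ** h * (x_n - mu)
        half = z * math.sqrt(var)
        out.append((h, mean, mean - half, mean + half))
    return out
```

These intervals ignore parameter estimation error. For a model with differencing the forecast variance keeps growing with h instead of levelling off at $\sigma^2/(1 - \phi^2)$.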
18. (5 marks) Comment on the quality of the Box-Jenkins forecast.
19. (5 marks) Using the $\{x_t\}$ data set again, use the two-parameter Holt-Winters (HW) method to smooth the data. Graphically compare the result to the corresponding state variable and report the fitted values of α and β.
[Hint: you should not have a seasonal component in your model.]
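The two-parameter (non-seasonal) Holt-Winters method keeps a level $\ell_t$ and a slope $b_t$, updated as $\ell_t = \alpha x_t + (1-\alpha)(\ell_{t-1} + b_{t-1})$ and $b_t = \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1}$. In R, HoltWinters(x, gamma = FALSE) fits this model and chooses α and β by minimising the squared one-step prediction errors. The Python sketch below shows the recursions. The start values $\ell = x_2$, $b = x_2 - x_1$ are an assumption, and the crude grid search in the hypothetical `fit_holt` stands in for R's numerical optimiser, so results may differ slightly from HoltWinters.

```python
def holt_linear(x, alpha, beta):
    """Non-seasonal (two-parameter) Holt-Winters smoothing of the series x.

    Start values (an assumption): level = x[1], slope = x[1] - x[0].
    Returns (fitted, levels), where fitted[i] is the one-step-ahead
    prediction of x[i + 2] and levels[i] is the smoothed level after x[i + 2].
    """
    level, slope = x[1], x[1] - x[0]
    fitted, levels = [], []
    for xt in x[2:]:
        pred = level + slope                                     # forecast before seeing xt
        new_level = alpha * xt + (1 - alpha) * pred              # level update
        slope = beta * (new_level - level) + (1 - beta) * slope  # slope update
        level = new_level
        fitted.append(pred)
        levels.append(level)
    return fitted, levels


def fit_holt(x, grid=None):
    """Choose (alpha, beta) on a grid by minimising the one-step squared errors."""
    grid = grid or [i / 20 for i in range(1, 20)]
    best = None
    for a in grid:
        for b in grid:
            fitted, _ = holt_linear(x, a, b)
            sse = sum((xt - p) ** 2 for xt, p in zip(x[2:], fitted))
            if best is None or sse < best[0]:
                best = (sse, a, b)
    return best[1], best[2]
```

The smoothed `levels` are the quantity to plot against the true state v(t) from the pilot file.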
Question Set 4
To compare the filtering behaviour of the KF and HW, we undertook a simulation study designed according to the principles of the paper by Morris et al. studied in Chapter 5.
(i) Aims: To investigate the accuracy of the two algorithms and to see how each depends on the choice of initial conditions and on the size of the measurement error R.
(ii) Data-generating mechanisms: We solve the differential equations numerically and then add noise to a subsample of the solution.
For the experimental conditions we have three levels of measurement error (R), Low, Medium and High, and three levels for the initial angle of the velocity, also Low, Medium and High.
The Monte Carlo experiment has a full factorial design.
(iii) Estimand: We compare the state variable $v_t$ at the time when the ball returns to $w_t = 0$ with the filtered values from both algorithms, and record the error of each.
(iv) Methods: We compare the Kalman Filter and the two-parameter Holt-Winters method.
(v) Performance measures: We record the size of the errors across a number of repeats of the random (Monte Carlo) experiment, for each of the experimental conditions.
(vi) Coding: This has been implemented using techniques similar to those in the notes.
(vii) Analysis: [See Question 20]
The data are stored in ProjectIIData*.csv, with Initial.Angle and Measurement.Error being the experimental conditions; each set of experimental conditions is repeated 50 times. The response variables are KF and HW, the errors in estimating the state for each of the methods.
(viii) Reporting: [See Question 21]
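The full factorial design has 3 × 3 conditions with 50 repeats each, i.e. 450 runs. A possible first numerical summary for the exploratory analysis is sketched below in Python. The level labels "Low", "Medium" and "High" and the exact column names in the CSV file are assumptions: R's read.csv may rewrite header names (for example replacing spaces with dots), so check the raw header before reusing these keys. Mean absolute error and root mean squared error are one reasonable choice of performance measure, not the only one.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical level labels; check how the factors are coded in the CSV file.
LEVELS = ("Low", "Medium", "High")


def design(n_reps=50):
    """Full factorial design: each (angle, error) pair repeated n_reps times."""
    return [(angle, err, rep)
            for angle, err in itertools.product(LEVELS, LEVELS)
            for rep in range(n_reps)]


def summarise(rows):
    """Mean absolute error and RMSE of the KF and HW errors per condition.

    rows: iterable of dicts with (assumed) keys 'Initial.Angle',
    'Measurement.Error', 'KF' and 'HW', e.g. from csv.DictReader.
    Returns {(angle, error): {method: {'MAE': ..., 'RMSE': ..., 'n': ...}}}.
    """
    groups = defaultdict(lambda: {"KF": [], "HW": []})
    for row in rows:
        key = (row["Initial.Angle"], row["Measurement.Error"])
        for method in ("KF", "HW"):
            groups[key][method].append(float(row[method]))
    return {
        key: {
            method: {
                "MAE": sum(abs(e) for e in errs) / len(errs),
                "RMSE": math.sqrt(sum(e * e for e in errs) / len(errs)),
                "n": len(errs),
            }
            for method, errs in by_method.items()
        }
        for key, by_method in groups.items()
    }
```

Usage, assuming the header matches and after `import csv`: `with open("ProjectIIData0.csv", newline="") as f: summary = summarise(csv.DictReader(f))`. Plotting the per-condition error distributions, not just these summaries, is what Question 20 asks for.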
20. (20 marks) From the output of the Monte Carlo experiment, show how you conducted an exploratory data analysis and a graphical exploration of the results. In particular, you are trying to answer the questions defined in the Aims of the experiment discussed above.
[You are limited to handing in two sides. You will be evaluated both on the data analysis and on the clarity of your writing.]
21. (30 marks) Write a 500-word report, targeted at a final-year Waterloo statistics student who has not taken this class, discussing the similarities and differences between the ways that the two methods try to reconstruct the state process in this context, and what you learned from the Monte Carlo experiment.
[You will be evaluated on the clarity of your writing and on how well you explain the important issues.]