
7 The Kalman Filter
7.1 Introduction
In this chapter and the next we take a brief look at a number of other topics that are useful in time series modelling. We will not be looking at these in great depth, but rather give pointers to where you can find more if you are interested.
One unifying theme is the idea of a state space: the observed data does not, by itself, define the underlying time series structure. Rather, perhaps for theoretical reasons, there is an underlying and hidden structure that generates the observations.
7.1.1 State space model
Definition 7.1.1. (State space model) Let $\{Y_t\}$ be an (observed) time series with (unobserved) state process $\{\theta_t\}$. In a state space model the dependence between these is defined by the graphical model shown in Fig. 7.1.

[Figure 7.1: State space structure. The states $\theta_0, \theta_1, \theta_2, \ldots, \theta_{t-1}, \theta_t$ form a chain, and each observation $Y_i$ is linked only to its own state $\theta_i$.]

This relationship defines the conditional independence relations: (i) $\{\theta_t\}$ is a Markov chain; (ii) conditionally on $\theta_s$ for $s = 0, 1, \ldots, t$, the $Y_t$ are independent, so that the joint distribution of $(Y_1, \ldots, Y_t) \mid (\theta_1, \ldots, \theta_t)$ is
\[
\prod_{i=1}^{t} f(y_i \mid \theta_i).
\]
State space models are also called hidden Markov models.
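To make this structure concrete, here is a minimal R sketch (not from the notes; the values of $\phi$, $\sigma_w$ and $\sigma_v$ are made up for illustration) that simulates a Gaussian state space model with a latent AR(1) state and conditionally independent observations:

```r
# Minimal sketch: simulate a Gaussian state space model in which the hidden
# state theta_t is an AR(1) Markov chain and Y_t depends only on theta_t.
set.seed(1)
n     <- 200
phi   <- 0.9    # state transition coefficient (illustrative value)
sig_w <- 0.5    # state noise standard deviation (illustrative)
sig_v <- 1.0    # observation noise standard deviation (illustrative)

theta    <- numeric(n)
theta[1] <- rnorm(1, 0, sig_w)
for (t in 2:n) theta[t] <- phi * theta[t - 1] + rnorm(1, 0, sig_w)
y <- theta + rnorm(n, 0, sig_v)  # Y_t | theta_t independent N(theta_t, sig_v^2)

plot.ts(y, col = "grey", ylab = "value")
lines(theta, col = "red")        # the hidden state we would like to recover
```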

When assuming a state space structure, suppose we have an (unobserved) state variable at time $s$ and a set of observations $y_1, \ldots, y_t$. We can subdivide problems by:
1. filtering is the case where $s = t$,
2. state prediction is the case when $s > t$, and
3. smoothing is the case $s < t$.

7.2 State space representations of ARIMA models

The ARIMA models we studied in Chapter 4 can be written in a state space form, and then the efficient prediction methods of the Kalman filter can be used directly. In fact the predict.Arima( ), predict( ) and arima( ) functions all use this method.

Example 7.2.1. (State space representation of AR(1) process) We can write an AR(1) model $X_t = \phi X_{t-1} + Z_t$, where $Z_t \sim N(0, \sigma^2)$, as a state space model trivially as
\[
Y_t = X_t, \qquad X_t = \phi X_{t-1} + Z_t.
\]

Example 7.2.2. (State space representation of AR(2) process) We can write an AR(2) model $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t$, where $Z_t \sim N(0, \sigma^2)$, as a state space model by having a two-dimensional state variable $\mathbf{X}_t = (X_t, X_{t-1})^T$ and
\[
Y_t = \begin{pmatrix} 1 & 0 \end{pmatrix} \mathbf{X}_t, \qquad
\mathbf{X}_t = \begin{pmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{pmatrix} \mathbf{X}_{t-1} + \begin{pmatrix} Z_t \\ 0 \end{pmatrix}.
\]

Definition 7.2.3. (makeARIMA( ) function) In R, if you want the state space representation of an ARIMA(p, d, q) model, you can use the makeARIMA( ) function. This will output a state space model with the notation
\[
y = Z a + \eta, \quad \eta \sim N(0, h),
\]
\[
a = T a + R e, \quad e \sim N(0, Q).
\]
It can then be used, for example, in the KalmanLike( ) function to compute a likelihood.
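As a quick illustration of Definition 7.2.3, here is a hedged sketch: the argument names below follow my reading of the stats package documentation, so check ?makeARIMA and ?KalmanLike in your R version.

```r
# Sketch: state space form of an AR(1) with phi = 0.7 (no MA part, d = 0)
# and its Gaussian likelihood evaluated via the Kalman filter.
set.seed(1)
y <- arima.sim(model = list(ar = 0.7), n = 200)

mod <- makeARIMA(phi = 0.7, theta = numeric(0), Delta = numeric(0))
str(mod[c("T", "Z", "h")])  # transition matrix T, observation vector Z, h

KalmanLike(y, mod)  # likelihood-related quantities; see ?KalmanLike
```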
7.3 History

We now look at the Kalman filter and its computational properties. The filter, taking its name from Kalman's paper (1960), had its origins in engineering control problems and found early applications in controlling spacecraft in the Apollo program. The relationship between forecasting and control is a strong one – indeed the point of many forecasting exercises is to ask the 'what if?' question, to see whether a system needs to be changed in some way. These types of questions about engineering control also motivated many of the ideas of Box and Jenkins, and their book (1976) is titled Time Series Analysis: Forecasting and Control, showing the importance of the link in their opinion.

In many of these applications the speed of computation was one of the most important features. With real time control problems you typically need to be able to compute predictions faster than you are gathering data. Think about controlling a moon landing, for example. This concern with speed is reflected in the use of many recursive computation methods. Although computational power has increased exponentially since the development of these ideas, there are still many current applications where speed of computation is critical. The so-called high-frequency trading in finance is one such case.

We note that the Kalman filter is also an efficient, recursive algorithm for computing the likelihood function for ARIMA models, recalling that the key computational issue involves calculating the determinant and inverse – or eigenvalues – of a large matrix.

7.4 The Kalman filter

Let us consider the case where we are interested in estimating the hidden state, and computing it and the associated uncertainty in a time which is faster than the rate at which new data arrives. With real time control problems you typically need to be able to compute predictions faster than you are gathering data: if you are computing even a small amount slower, because new information keeps coming in you will eventually fall further and further behind.

One problem with the literature about the Kalman filter is that there are many different forms of notation used for setting up the problem. Here I will use that found in Time Series Analysis and its Applications by Shumway and Stoffer.¹ We can now define the notation and structure.

¹This version of the book can be downloaded from Stoffer's site, https://www.stat.pitt.edu/stoffer/tsa4/, when on campus using Waterloo's SpringerLink agreement. The site also has lots of good R code and libraries.

Definition 7.4.1. (State space model) Consider a p-dimensional state variable $x_t$ which satisfies the state equation
\[
x_t = \Phi x_{t-1} + \Upsilon u_t + w_t \tag{7.1}
\]
where $u_t$ are a set of control or input variables and $w_t \overset{i.i.d.}{\sim} MVN_p(0, Q)$. The observed part of the system is defined via the observation equation
\[
y_t = A_t x_t + \Gamma u_t + v_t \tag{7.2}
\]
where $y_t$ is q-dimensional, $v_t \overset{i.i.d.}{\sim} MVN_q(0, R)$, and the two noise terms $w_t, v_t$ are independent of one another.

We treat the simplest case where the matrices $\Phi$, $\Upsilon$, $\Gamma$, $A_t$ and the variances $Q, R$ in (7.1) and (7.2) are considered known, while the state variable $x_t$ is the object of interest and is unobserved. We think of the observation variable, $y_t$, as a transformed noisy version of the state variable which we do observe. We are trying to smooth, filter or forecast the state variable based on the observation variable, and do this in real time. We may also want to use $u_t$ to control the hidden state variable, again in real time.

Since Equation (7.1) is recursive we require an initial condition $x_0$, which is $MVN_p(\mu_0, \Sigma_0)$. This is assumed to be independent of the noise terms and, again in the simplest example, we will consider $\mu_0$ and $\Sigma_0$ known.

The Kalman filter was originally introduced as a method primarily for use in aerospace-related research, including landing a spacecraft on the moon and radar tracking problems. The essential feature of these problems is that we need real time analysis. There are now many other applications, for example in economics, engineering and medicine.

[Figure 7.2: The ball tracking example and the Kalman filter. The KF makes a one step ahead prediction; as the next piece of new noisy information (red dots) comes in, the KF almost immediately updates this estimate of the (now) current value (blue). Panels show observed position and observed height against time.]
Example 7.4.2. (Ball tracking example) It is common to see ball-tracking algorithms when watching sports like baseball, cricket and tennis. These can be used to make decisions: in tennis, whether a ball was on the line or out; in cricket, to decide on an LBW appeal. They are also used to report statistics such as the distance a home run was hit in baseball. The tracking comes from camera information, and the underlying physics, and has to be done in real time, or very close to it. We will explore a simplified version of one of these algorithms where we ignore everything but gravity and initial conditions.

Suppose we have a falling body whose height at time $t$ – which to start with is continuous – is $z(t)$. This is the state and is what we want to know. We do know that
\[
\frac{d^2 z}{dt^2}(t) = -g,
\]
and we know the initial conditions at $t_0$: $z(t_0)$ and $\frac{dz}{dt}(t_0)$. Solving the differential equation by integrating twice gives²
\[
z(t) = z(t_0) + \frac{dz}{dt}(t_0)\,(t - t_0) - \frac{g}{2}(t - t_0)^2, \qquad
\frac{dz}{dt}(t) = \frac{dz}{dt}(t_0) - g\,(t - t_0).
\]

²We check this is correct by looking at the initial conditions: $z(t_0) = z(t_0) + \frac{dz}{dt}(t_0)(t_0 - t_0) - \frac{g}{2}(t_0 - t_0)^2 = z(t_0)$, which is correct.

We work in discrete time for $t = 0, 1, \cdots$ with each time increment being $\epsilon$ seconds. Let us define
\[
z_t := z(\epsilon t), \qquad \dot{z}_t := \frac{dz}{d\tau}(\epsilon t).
\]
Thus we have, from the solution above,
\[
z_{t+1} = z_t + \epsilon \dot{z}_t - \frac{\epsilon^2}{2} g, \qquad
\dot{z}_{t+1} = \dot{z}_t - g\epsilon.
\]
Let us therefore define the state vector $x_t := (z_t, \dot{z}_t)^T$, and so the physics has given us the state equation
\[
x_{t+1} = \begin{pmatrix} 1 & \epsilon \\ 0 & 1 \end{pmatrix} x_t - \begin{pmatrix} 0.5\epsilon^2 \\ \epsilon \end{pmatrix} g.
\]
The discretisation error in the above calculation can be thought of as an error term, giving the state equation
\[
x_{t+1} = \begin{pmatrix} 1 & \epsilon \\ 0 & 1 \end{pmatrix} x_t - \begin{pmatrix} 0.5\epsilon^2 \\ \epsilon \end{pmatrix} g + w_t.
\]
What we observe is a (noisy) measurement of $x_t$:
\[
y_t = z_t + v_t = \begin{pmatrix} 1 & 0 \end{pmatrix} x_t + v_t.
\]
The distributions of $w_t$ and $v_t$ are found by calibration experiments; in the example shown in Fig. 7.2,
\[
w_t \overset{i.i.d.}{\sim} N\left( (0,0)^T, \begin{pmatrix} 2 \times 10^{-5} & 0 \\ 0 & 10^{-2} \end{pmatrix} \right), \qquad
v_t \overset{i.i.d.}{\sim} N(0, 10^{-2}),
\]
and $\epsilon = 2 \times 10^{-3}$.

In the ball tracking example the laws of physics and calibration experiments give us information which is clearly highly informative and can be used for forecasting, filtering and control. In this case the model is coming from known physics and not an empirical model selection process like Box-Jenkins. Further, as we shall see, the Kalman filter behaves like exponential smoothing or Holt-Winters since it uses an error correcting approach. However, it also updates its estimate of uncertainty in real time, and it estimates the full state variable even though the state is of larger dimension than the observation.

7.5 Algorithm

The algorithm at each time period calculates a prediction which is then corrected as soon as new information becomes available, and the design is such that each step is a low dimensional calculation: i.e. there are no large matrix inversions, rather lots of small ones. The algorithm keeps track of, and updates, the forecast uncertainty. The terms $\Phi$ and $A$ come from the physics and from understanding the measurement system, denoted by $y_t = A x_t + v_t$. Let us define
\[
x_t^s := E(x_t \mid y_1, \ldots, y_s)
\]
to be the best mean square estimate of $x_t$ based on $Y_s := (y_1, \ldots, y_s)$, and
\[
P_{t_1, t_2}^s := E\left[ (x_{t_1} - x_{t_1}^s)(x_{t_2} - x_{t_2}^s)^T \right],
\]
with this written as $P_t^s$ when $t_1 = t_2 = t$.

Definition 7.5.1. (Kalman filter) For the state space model defined by (7.1) and (7.2) we have, from the conditional mean,
\[
x_t^{t-1} = \Phi x_{t-1}^{t-1} + \Upsilon u_t. \tag{7.3}
\]
This is the prediction step. It has an MSE which satisfies
\[
P_t^{t-1} = \Phi P_{t-1}^{t-1} \Phi^T + Q. \tag{7.4}
\]
We can then also predict the next observed value by
\[
y_t^{t-1} = A_t x_t^{t-1} + \Gamma u_t. \tag{7.5}
\]
The correction step or update is defined by
\[
x_t^t = x_t^{t-1} + K_t \left( y_t - A_t x_t^{t-1} - \Gamma u_t \right). \tag{7.6}
\]
It has an MSE/posterior variance which satisfies
\[
P_t^t = \left[ I - K_t A_t \right] P_t^{t-1}, \tag{7.7}
\]
where the term $K_t$ is called the Kalman gain and is defined by
\[
K_t := P_t^{t-1} A_t^T \left[ A_t P_t^{t-1} A_t^T + R \right]^{-1}.
\]
For a complete proof of the structure of the Kalman filter see Chapter 6 of Shumway and Stoffer. Here we will not go into all the details but will go through the key steps in the formulation by using a Bayesian formulation. In R we can implement the algorithm using the library astsa and the Kfilter1 function.
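As an alternative to Kfilter1, and to make the recursions concrete, here is a hand-rolled R sketch of (7.3)–(7.7) applied to the falling-ball model of Example 7.4.2. The system matrices and variances come from the example; the true initial state, prior mean and prior variance are made-up values.

```r
# Sketch: Kalman filter recursions (7.3)-(7.7) for the falling-ball example.
eps <- 2e-3; g <- 9.81
Phi <- matrix(c(1, 0, eps, 1), 2, 2)   # state transition matrix
u   <- -c(0.5 * eps^2, eps) * g        # known gravity input
A   <- matrix(c(1, 0), 1, 2)           # we observe the height component only
Q   <- diag(c(2e-5, 1e-2))             # state noise covariance (calibrated)
R   <- 1e-2                            # observation noise variance (calibrated)

n <- 500
x <- c(1.5, 0)                         # true initial state (made up): 1.5 m, at rest
xtrue <- matrix(0, n, 2); y <- numeric(n)
for (t in 1:n) {                       # simulate the system
  x <- as.vector(Phi %*% x) + u + rnorm(2, 0, sqrt(diag(Q)))
  y[t] <- sum(A * x) + rnorm(1, 0, sqrt(R))
  xtrue[t, ] <- x
}

mu <- c(1, 0); P <- diag(2)            # prior guess for x_0 (made up)
filt <- matrix(0, n, 2)
for (t in 1:n) {
  mu <- as.vector(Phi %*% mu) + u                    # prediction step, Eq. (7.3)
  P  <- Phi %*% P %*% t(Phi) + Q                     # prediction MSE,  Eq. (7.4)
  K  <- P %*% t(A) %*% solve(A %*% P %*% t(A) + R)   # Kalman gain
  mu <- mu + as.vector(K %*% (y[t] - A %*% mu))      # correction step, Eq. (7.6)
  P  <- (diag(2) - K %*% A) %*% P                    # updated MSE,     Eq. (7.7)
  filt[t, ] <- mu
}

matplot(cbind(y, xtrue[, 1], filt[, 1]), type = "l", lty = 1,
        col = c("grey", "red", "blue"), ylab = "height")
```

Note that each step only inverts a 1×1 matrix here, illustrating the "lots of small inversions" design mentioned above.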
7.6 The Kalman Filter and Bayesian Updating

One of the elegant aspects of Bayesian inference is the way that Bayes' theorem tells us exactly how to update our beliefs as we get new data. We note that the value of the Kalman filter is the clever way it updates using the prediction/correction method. In this section we explain that, in fact, these are the same thing.

Let us start with a simplified Bayesian example for the case $y \sim N(\theta, 1)$. If the prior distribution for $\theta$ is in the Normal family, i.e.
\[
\theta \sim N(\mu_{prior}, \sigma^2_{prior}) \propto \exp\left[ -\frac{(\theta - \mu_{prior})^2}{2\sigma^2_{prior}} \right],
\]
then the posterior, after seeing $y$, is
\[
Pr(\theta \mid y) \propto Lik(\theta) \times prior(\theta) = N\left( \frac{\mu_{prior} + y\,\sigma^2_{prior}}{1 + \sigma^2_{prior}},\; \frac{\sigma^2_{prior}}{1 + \sigma^2_{prior}} \right).
\]
This has defined the iterative update rule for the posterior mean:
\[
\mu_{prior} \xrightarrow{\text{more data}} \mu_{posterior} := \frac{\mu_{prior} + y\,\sigma^2_{prior}}{1 + \sigma^2_{prior}} \xrightarrow{\text{more data}} \cdots
\]

Recall we have the Kalman filter state space structure
\[
x_t = \Phi x_{t-1} + \Upsilon u_t + w_t \tag{7.8}
\]
\[
y_t = A_t x_t + \Gamma u_t + v_t \tag{7.9}
\]
where $u_t$ are a set of control or input variables, $w_t \overset{i.i.d.}{\sim} MVN_p(0, Q)$, $y_t$ is q-dimensional, $v_t \overset{i.i.d.}{\sim} MVN_q(0, R)$, and the two noise terms $w_t, v_t$ are independent of one another. We want to get insight into how the structure of Equations (7.8, 7.9) leads to the prediction-correction iterative cycle of the Kalman filter and its error correction structure.

We will greatly simplify notation to get to the essentials. First we drop the control terms $\Upsilon u_t$, $\Gamma u_t$, and work in the one-dimensional case. This is not needed but it makes the story simpler. Our estimates will be conditional expectations – i.e. posterior means – and we work with normality assumptions.

Let us define some notation: define $H_s = \{y_s, y_{s-1}, \cdots\}$. Our target is to compute $E(X_t \mid H_t)$ through an update rule:
1. Predict using $E(X_t \mid H_{t-1})$;
2. Correct when $y_t$ becomes available.

Now suppose that conditionally on $H_{t-1}$ we have 'priors and likelihood'
\[
X_t \mid H_{t-1} \sim N(x_t^{t-1}, P_t^{t-1}), \text{ denoted by } N(\mu_{t-1}, P_{t-1}), \tag{7.10}
\]
\[
Y_t \mid x_t \sim N(A x_t, R). \tag{7.11}
\]
Let us look at the Bayesian interpretation of Equations (7.10)–(7.11). Equation (7.10) describes how much we know about the state at time $t$ given the observations up to $t-1$. It is a full distribution, having the conditional expectation as point estimate and the conditional variance to measure uncertainty: it is our prior at time $t-1$. Equation (7.11) describes the uncertainty in the observation given we know the state at time $t$. We are going to combine them to get the posterior, which tells us what we know at time $t$.

One key assumption here is that of Normality. We will see that if Equation (7.10) is Normal at stage $t-1$ the corresponding term will also be Normal at stage $t$. This assumption can be analysed further, but if $t$ is large it is usually reasonable to assume normality.

We can then think of Equation (7.10) as being a prior for $X_t$, and Equation (7.11) as the likelihood. We are updating prior mean and prior variance to posterior mean and posterior variance.
In terms of the Bayesian setting and notation we have
\[
Lik(x_t) = P(y_t \mid x_t) = \frac{1}{\sqrt{2\pi R}} \exp\left[ -\frac{(y_t - A x_t)^2}{2R} \right],
\]
\[
Prior(x_t) = \frac{1}{\sqrt{2\pi P_{t-1}}} \exp\left[ -\frac{(x_t - \mu_{t-1})^2}{2P_{t-1}} \right].
\]
So we get the posterior
\[
P(x_t \mid y_t) \propto \exp\left[ -\frac{(y_t - A x_t)^2}{2R} \right] \exp\left[ -\frac{(x_t - \mu_{t-1})^2}{2P_{t-1}} \right]
\propto \exp\left\{ -\frac{1}{2}\left[ \left( \frac{A^2}{R} + \frac{1}{P_{t-1}} \right) x_t^2 - 2\left( \frac{\mu_{t-1}}{P_{t-1}} + \frac{y_t A}{R} \right) x_t \right] \right\}
:= N(\mu_t, P_t).
\]
So we have updated our knowledge:
\[
N(\mu_{t-1}, P_{t-1}) \xrightarrow{\;y\;} N(\mu_t, P_t).
\]
Direct calculation gives the posterior variance and mean as
\[
P_t = \left( \frac{A^2}{R} + \frac{1}{P_{t-1}} \right)^{-1} = \frac{R\,P_{t-1}}{A^2 P_{t-1} + R},
\]
\[
\mu_t = \frac{\dfrac{\mu_{t-1}}{P_{t-1}} + \dfrac{y_t A}{R}}{\dfrac{A^2}{R} + \dfrac{1}{P_{t-1}}} = \mu_{t-1} + K_t (y - A\mu_{t-1}), \qquad
K_t := \frac{A P_{t-1}}{A^2 P_{t-1} + R}.
\]
So we have the update rule, which takes the form
\[
\mu_{t-1} \to \mu_t := \mu_{t-1} + K_t (y - A\mu_{t-1}) = \mu_{t-1} + K_t \times (\text{error in prediction of } y),
\]
\[
P_{t-1} \to P_t := (I - K_t A) P_{t-1}.
\]
This form is the one-dimensional version of the general case. It shows directly the error correcting form. It defines $K_t$ – i.e. the gain – as the sample dependent, and therefore non-constant, version of the smoothing parameter in exponential smoothing. The generalisation to the general matrix form is the same and is simply (long) algebra.
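Finally, a quick numerical check (with made-up scalar values) that the direct posterior calculation and the gain form of the update agree:

```r
# Check: the direct Normal posterior equals the Kalman prediction/correction form.
A <- 0.8; R <- 0.5            # observation coefficient and noise variance (made up)
mu_prev <- 2; P_prev <- 1     # prior N(mu_{t-1}, P_{t-1}) (made up)
y <- 2.3                      # new observation (made up)

P_post  <- 1 / (A^2 / R + 1 / P_prev)        # = R * P_prev / (A^2 * P_prev + R)
mu_post <- P_post * (mu_prev / P_prev + y * A / R)

K <- A * P_prev / (A^2 * P_prev + R)         # scalar Kalman gain
c(mu_post, mu_prev + K * (y - A * mu_prev))  # identical posterior means
c(P_post, (1 - K * A) * P_prev)              # identical posterior variances
```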