MFIN 290 Application of Machine Learning in Finance: Lecture 7
Edward Sheng
8/7/2021
Agenda
1. Basic time series analysis, part I
2. Basic time series analysis, part II
3. Advanced time series analysis – State-Space Model and Kalman Filter
2
Section 1: Basic Time Series Analysis, Part I
3
What is time series data
Time series is a collection of data points that evolves over time
Time dependent and serially correlated: the basic assumption of OLS that observations
are independent does not hold
Seasonality: variation in the data that is specific to a particular time frame; e.g., higher sales
of woolen jackets in the winter season
Examples
Stock price, real estate price
GDP, interest rate
Seasonal sales
Time series is the most important data type in finance and investment
4
Stationarity
A Time Series {rt} is said to be strictly stationary if the joint distribution of
{rt1 , . . . , rtk} is identical to that of {rt1+t, . . . , rtk+t} for all t, where k is an arbitrary
positive integer and (t1, t2, . . . , tk) is a collection of positive integers
F(r_{t_1}, …, r_{t_k}) = F(r_{t_1+t}, …, r_{t_k+t})
The properties of the series are invariant to time shifts
This is a strong condition and hard to verify
5
Weak stationarity
A Time Series {rt} is said to be weakly stationary if both the mean and the
covariance are time-invariant
More specifically, {rt} is weakly stationary if E(rt) = μ and Cov(rt , rt−l) = γl, which
only depends on l (number of lags)
Constant and finite first and second moments imply weak stationarity
Weak stationarity does NOT imply strict stationarity, except for normally distributed
variables
6
Weak stationarity
Mean of the series should be constant and not a function of time
7
Weak stationarity
Variance of the series should be constant (homoscedasticity) and not a
function of time or of the level of the variable (heteroscedasticity)
8
Weak stationarity
Covariance of the t-th term and the (t − l)-th term should not be a function of time
9
Why care about stationarity
Most time-series models rely on the assumption of stationarity
Many traditional statistical results do not apply to nonstationary processes, such
as the law of large numbers and the central limit theorem
Non-stationary data may contain
Trend
Seasonality
Convert non-stationary data to stationary (see the sketch below)
Differencing: taking the difference with a particular time lag
Decomposition: modeling both trend and seasonality and removing them from the data
10
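A minimal sketch of both conversions with pandas and statsmodels; `prices` is a hypothetical monthly price series, and the seasonal period of 12 is an assumption.

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Differencing: first difference at lag 1 (use .diff(12) for a seasonal lag)
returns = prices.diff().dropna()

# Decomposition: estimate trend and seasonality, then remove them from the data
decomp = seasonal_decompose(prices, model="additive", period=12)
remainder = (prices - decomp.trend - decomp.seasonal).dropna()
```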
Autocorrelation
The autocorrelation coefficient for {rt} is defined as
ρ_l = Cov(r_t, r_{t−l}) / √(Var(r_t) Var(r_{t−l})) = Cov(r_t, r_{t−l}) / Var(r_t) = γ_l / γ_0
ρ_1, …, ρ_l are the 1st, …, l-th autocorrelations at lags 1, …, l
A weakly stationary series is not serially correlated if ρ_l = 0 for all l > 0
11
Autocorrelation in stock returns?
Maybe a little
12
Monthly log returns (VW-CRSP Index) 1925-2009
Autocorrelation in real estate returns?
13
Monthly log house price changes in AZ (Case-Shiller Index) 1987-2010
Autocorrelation function (ACF)
The sample autocorrelation function (ACF) of a time series is defined as ρ̂_1, ρ̂_2, …, ρ̂_l
14
ACF for monthly log return on VW-CRSP Index. Two standard error
bands around zero. 1926-2009
Autocorrelation function (ACF)
15
ACF for monthly log house price changes. Two standard error
bands around zero. 1987-2010
How to test autocorrelation – Ljung-Box test
Ljung-Box test is a statistical test with null hypothesis that data are
independently distributed with no autocorrelation
H_0: ρ_1 = ⋯ = ρ_m = 0
Test statistic
Q(m) = T (T + 2) Σ_{l=1}^{m} ρ̂_l² / (T − l)
Q(m) is asymptotically χ² with m degrees of freedom
Reject the null if Q(m) > χ²_α, where χ²_α denotes the (1 − α) × 100-th percentile of that
χ² distribution (see the sketch below)
16
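A minimal sketch of the test with statsmodels, assuming `returns` is a pandas Series of (log) returns; the choice of 12 lags is illustrative.

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

lb = acorr_ljungbox(returns, lags=[12], return_df=True)  # Q(12) and its p-value
print(lb)  # reject H0 of no autocorrelation up to lag 12 if lb_pvalue < 0.05
```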
White noise
A time series {rt} is white noise if {rt} is a sequence of independent and
identically distributed (i.i.d) variables
If {rt} is also normally distributed, it is called Gaussian white noise
White noise is stationary and has zero autocorrelation at all nonzero lags of the ACF
17
Random walk
A time series {pt} follows a random walk with normally distributed increments
when
p_t = p_{t−1} + a_t, where a_t ~ i.i.d. N(0, σ_a²)
Changes of pt are white noise
18
Markov property
A stochastic process has the Markov property if the conditional probability distribution
of future states depends only upon the present state, not on anything that precedes the
present state
No memory of history beyond the present state
Random walk has Markov property
19
Martingale property
A martingale is a sequence of random variables for which, at any particular time,
the conditional expectation of the next value, given all prior values, is equal to the
present value
E(p_{t+1} | p_t) = p_t
A martingale is a “fair game”: knowledge of the past is of no use in
predicting future winnings
A random walk has the martingale property
20
Random walk – non-stationary
p_t = p_{t−1} + a_t, where a_t ~ i.i.d. N(0, σ_a²)
Mean
E(p_t) = E(p_0 + a_1 + ⋯ + a_t) = E(p_0)
Variance
Var(p_t) = Var(p_0 + a_1 + ⋯ + a_t) = t σ_a²
Random walk has constant mean but increasing variance (also increasing
covariance); it is not a stationary process
Random walk has strong autocorrelation
21
No forecastability
Random walk has no forecastability; the best forecast is the current value (not
the unconditional mean)
1-step-ahead forecast
p̂_t(1) = E(p_{t+1} | p_t, p_{t−1}, …, p_0) = p_t
n-step-ahead forecast
p̂_t(n) = E(p_{t+n} | p_t, p_{t−1}, …, p_0) = p_t
However, forecasting uncertainty increases with the horizon: the variance of the n-step-ahead forecast error is
Var[e_t(n)] = n σ_a²
Forecast uncertainty diverges to ∞ as n → ∞
Random walk will be further discussed in unit root later
22
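A small illustration of these properties: simulate a random walk and apply the (augmented) Dickey-Fuller unit-root test from statsmodels. The seed and sample size are arbitrary.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=1000)   # i.i.d. N(0, sigma_a^2) shocks
p = np.cumsum(a)                      # p_t = p_{t-1} + a_t, with p_0 = 0

stat, pvalue, *_ = adfuller(p)        # H0: unit root (non-stationary)
print(stat, pvalue)                   # large p-value: cannot reject the unit root
```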
Autoregression (AR) model
Assume that the immediately lagged value of a series contains useful information
for predicting future values
Consider a model
r_t = φ_0 + φ_1 r_{t−1} + a_t
where a_t ~ i.i.d. N(0, σ_a²) is white noise
{a_t} represents the “news”
a_t = r_t − E_{t−1}(r_t)
a_t is what you learn about the process at time t that you did not know at t − 1
This model is referred to as AR(1)
23
AR(1) – properties
r_{t+1} = φ_0 + φ_1 r_t + a_{t+1}
Conditional mean and conditional variance
E(r_{t+1} | r_t) = φ_0 + φ_1 r_t
Var(r_{t+1} | r_t) = Var(a_{t+1}) = σ_a²
The forecast of r_{t+1} is related only to r_t, not to any earlier value
The rest of the history, from r_{t−1} to r_0, is irrelevant
AR(1) has the Markov property; it is obviously restrictive
24
AR(1) – unconditional mean
Assume that the series is weakly stationary
Unconditional mean
E(r_{t+1}) = φ_0 + φ_1 E(r_t)
Use stationarity: E(r_{t+1}) = E(r_t) = μ
μ = φ_0 + φ_1 μ
μ = φ_0 / (1 − φ_1)
Implications
Mean exists if and only if φ_1 ≠ 1
Mean is 0 if and only if φ_0 = 0
If φ_1 = 1: random walk, also called a unit root (explained later)
25
AR(1) – unconditional variance
r_{t+1} = φ_0 + φ_1 r_t + a_{t+1}
Take the variance and note that E(a_{t+1} r_t) = 0, as a_{t+1} does not depend on any past
information
Var(r_{t+1}) = φ_1² Var(r_t) + σ_a²
Use stationarity: Var(r_{t+1}) = Var(r_t)
Var(r_{t+1}) = σ_a² / (1 − φ_1²)
For the variance to be positive and bounded, φ_1² must be less than 1, so weak stationarity
implies
−1 < φ_1 < 1
26
AR(1) – autocorrelation
It implies that the ACF of AR(1) satisfies ρ_l = φ_1 ρ_{l−1}, l ≥ 1, and ρ_0 = 1
Combining the two equations implies that ρ_l = φ_1^l ρ_0 = φ_1^l
Autocorrelation decays exponentially at a rate of φ_1
27
AR(1) – ACF
28
ACF for AR(1). Left panel φ_1 = 0.8. Right panel φ_1 = −0.8
AR(1) – unit root
φ_1 = 0: no autocorrelation, stationary, white noise
φ_1 = 0.5: weak autocorrelation, stationary
29
AR(1) – unit root
φ_1 = 0.9: strong autocorrelation, stationary
φ_1 = 1: unit root, strong autocorrelation, non-stationary, random walk
30
AR(2)
r_t = φ_0 + φ_1 r_{t−1} + φ_2 r_{t−2} + a_t, where a_t ~ i.i.d. N(0, σ_a²) is white noise
Unconditional mean: E(r_t) = μ = φ_0 / (1 − φ_1 − φ_2), provided that φ_1 + φ_2 ≠ 1
ACF: ρ_1 = φ_1 / (1 − φ_2), and ρ_l = φ_1 ρ_{l−1} + φ_2 ρ_{l−2}, l ≥ 2
31
AR(2) – ACF
32
ACF for AR(2)
Characteristic roots
(1 − φ_1 B − φ_2 B²) ρ_l = 0, where B is the back-shift operator, B ρ_l = ρ_{l−1}
Characteristic equation for AR(2) and its solutions
1 − φ_1 x − φ_2 x² = 0, x_{1,2} = (φ_1 ± √(φ_1² + 4 φ_2)) / (−2 φ_2)
Characteristic roots: ω_1 = 1 / x_1, ω_2 = 1 / x_2
The process is stationary if the moduli of the roots ω_1, ω_2 (for real numbers, the absolute values) are all less than 1 (ACF converges to 0)
For AR(1), |ω| = |1 / x| = |φ_1| < 1 → −1 < φ_1 < 1 for stationarity
33
Unit root
If any root has modulus (for a real root, absolute value) equal to 1, it is a unit-root series
A unit-root series is a nonstationary time series; the best known example is the random walk (with or without drift)
A unit-root series has strong and persistent autocorrelation
Identifying a unit root/random walk
ACF close to 1 for small lags
ACF decays very slowly
34
Unit root – Dickey-Fuller test
p_t = φ_1 p_{t−1} + e_t  (a)
p_t = φ_0 + φ_1 p_{t−1} + e_t  (b)
Dickey-Fuller test: tests for the existence of a unit root (nonstationarity)
H_0: φ_1 = 1, H_1: φ_1 < 1
DF = t-ratio = (φ̂_1 − 1) / std(φ̂_1)
DF depends on the inclusion/exclusion of φ_0. When φ_0 is excluded (equation a)
DF = Σ_{t=1}^{T} p_{t−1} e_t / (σ̂_e √(Σ_{t=1}^{T} p_{t−1}²)), with σ̂_e² = Σ_{t=1}^{T} (p_t − φ̂_1 p_{t−1})² / (T − 1)
35
Unit root – Augmented Dickey-Fuller test
For the more complicated model
p_t = c_t + β p_{t−1} + Σ_{i=1}^{p} φ_i Δp_{t−i} + e_t, where e_t is the error term and c_t = ω_0 + ω_1 t
Hypothesis test: H_0: β = 1, H_1: β < 1
ADF = t-ratio = (β̂ − 1) / std(β̂)
ADF also depends on the inclusion/exclusion of the constant
36
AR(p)
Assume other lagged returns also contain useful information for predicting future values
r_t = φ_0 + φ_1 r_{t−1} + ⋯ + φ_p r_{t−p} + a_t, where a_t ~ i.i.d. N(0, σ_a²) is white noise
Conditional mean and conditional variance
E(r_{t+1} | r_t, …, r_{t−p+1}) = φ_0 + φ_1 r_t + ⋯ + φ_p r_{t−p+1}
Var(r_{t+1} | r_t, …, r_{t−p+1}) = Var(a_{t+1}) = σ_a²
Moments conditional on r_t, …, r_{t−p+1} are not correlated with any lags beyond p
For stationarity, the moduli of all characteristic roots (the inverses of the solutions of the characteristic equation) must be less than 1
37
Select order p
How to select the order p of an AR model
Partial autocorrelation function (PACF)
Information criteria
Check residuals
Estimate the AR model
Conditional least squares
Maximum likelihood estimation (MLE)
38
Partial autocorrelation function (PACF)
The PACF of a stationary series is defined as {φ_{l,l}}, l = 1, …, n, from the sequence of regressions
r_t = φ_{0,1} + φ_{1,1} r_{t−1} + a_{1t}
r_t = φ_{0,2} + φ_{1,2} r_{t−1} + φ_{2,2} r_{t−2} + a_{2t}
r_t = φ_{0,3} + φ_{1,3} r_{t−1} + φ_{2,3} r_{t−2} + φ_{3,3} r_{t−3} + a_{3t}
…
r_t = φ_{0,l} + φ_{1,l} r_{t−1} + φ_{2,l} r_{t−2} + ⋯ + φ_{l,l} r_{t−l} + a_{lt}
φ_{p,p} shows the incremental contribution of r_{t−p} to r_t over an AR(p − 1) model
39
PACF vs. ACF
Consider an AR(1) process r_t = φ_0 + φ_1 r_{t−1} + a_t
ACF between r_t and r_{t−2}: ρ_2 = Corr(r_t, r_{t−2}) = φ_1² > 0
r_t and r_{t−2} are correlated
However, r_t and r_{t−2} are not directly correlated; their correlation is indirect, since r_t is
correlated with r_{t−1} and r_{t−1} is correlated with r_{t−2}
The PACF considers only the direct correlation between r_t and r_{t−2}. For this AR(1) process
φ_{2,2} = 0
For an AR(p) series, PACF cuts off after lag p
40
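A minimal plotting sketch for this identification step, assuming `series` is a stationary pandas Series; the 20-lag window is arbitrary.

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(series, lags=20, ax=axes[0])    # gradual decay suggests an AR component
plot_pacf(series, lags=20, ax=axes[1])   # a cut-off after lag p suggests AR(p)
plt.show()
```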
Information criteria
Akaike information criterion (AIC)
AIC = (−2 / T) ln(likelihood) + (2 / T) × (number of parameters)
Schwarz-Bayesian information criterion (BIC)
BIC = (−2 / T) ln(likelihood) + (ln T / T) × (number of parameters)
Pick the p with the minimum AIC or BIC value (see the sketch below)
41
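A minimal sketch of order selection by information criteria with statsmodels, again assuming a stationary Series `series`; the search range of 1–6 is arbitrary. ARIMA(p, 0, 0) is an AR(p) model.

```python
from statsmodels.tsa.arima.model import ARIMA

results = {}
for p in range(1, 7):
    fit = ARIMA(series, order=(p, 0, 0)).fit()
    results[p] = (fit.aic, fit.bic)

best_p = min(results, key=lambda p: results[p][1])  # here: smallest BIC
```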
Check residuals
A good time series model should capture the pattern of a time series
It is very important to check residuals; residuals should behave as white noise,
as they represent what cannot be captured by the model
Check ACF of residuals; residuals should be uncorrelated (ACF = 0)
Perform Ljung-Box test on residuals (test on autocorrelation)
Residuals should follow a normal distribution with zero mean
42
Check residuals
43
Forecast – 1-step ahead
AR(p)
r_{t+1} = φ_0 + φ_1 r_t + ⋯ + φ_p r_{t−p+1} + a_{t+1}
where a_t ~ i.i.d. N(0, σ_a²) is white noise
1-step-ahead forecast is the same as the conditional mean
r̂_t(1) = E(r_{t+1} | F_t) = φ_0 + φ_1 r_t + ⋯ + φ_p r_{t−p+1}
1-step-ahead forecast error
e_t(1) = r_{t+1} − r̂_t(1) = a_{t+1}
Variance of 1-step-ahead forecast error
Var[e_t(1)] = σ_a²
If a_t is normally distributed, the 95% confidence interval is
±1.96 σ_a
44
Forecast – 2-step ahead
AR(p)
r_{t+2} = φ_0 + φ_1 r_{t+1} + ⋯ + φ_p r_{t−p+2} + a_{t+2}
Conditional expectation
r̂_t(2) = E(r_{t+2} | F_t) = φ_0 + φ_1 r̂_t(1) + ⋯ + φ_p r_{t−p+2}
2-step-ahead forecast error
e_t(2) = φ_1 e_t(1) + a_{t+2} = φ_1 a_{t+1} + a_{t+2}
Variance of forecast error
Var[e_t(2)] = σ_a² (1 + φ_1²)
Variance of 2-step-ahead forecast error is larger than variance of 1-step-ahead
forecast error
45
Forecast – n-step ahead
n-step-ahead forecast
r̂_t(n) = E(r_{t+n} | F_t) = φ_0 + Σ_{i=1}^{p} φ_i r̂_t(n − i)
where r̂_t(j) = r_{t+j} if j ≤ 0
The n-step-ahead forecast converges to the unconditional mean as n → ∞
E(r_t) = φ_0 / (1 − φ_1 − ⋯ − φ_p)
This is referred to as mean reversion
46
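A minimal forecasting sketch, reusing the hypothetical fitted AR(p) results object `fit` from the order-selection sketch above; the horizon of 12 and the 95% level are illustrative.

```python
forecast = fit.get_forecast(steps=12)
point = forecast.predicted_mean            # converges toward the unconditional mean
conf_int = forecast.conf_int(alpha=0.05)   # intervals widen with the horizon
```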
Section 2: Basic Time Series Analysis, Part II
47
Moving average (MA) model – AR(∞) → MA(1)
In theory, a series can be AR(∞)
r_t = φ_0 + φ_1 r_{t−1} + φ_2 r_{t−2} + ⋯ + a_t
However, this model is not realistic, as it has an infinite number of parameters
A simpler version is to constrain the parameters
r_t = φ_0 − θ_1 r_{t−1} − θ_1² r_{t−2} − θ_1³ r_{t−3} − ⋯ + a_t
where φ_i = −θ_1^i, i ≥ 1
If the time series is stationary, |θ_1| < 1 and θ_1^i → 0; otherwise, both θ_1^i and the series will explode
The contribution of r_{t−i} to r_t decays exponentially as i increases
48
AR(∞) → MA(1)
Rearrange
r_t + θ_1 r_{t−1} + θ_1² r_{t−2} + ⋯ = φ_0 + a_t  (a)
r_{t−1} + θ_1 r_{t−2} + θ_1² r_{t−3} + ⋯ = φ_0 + a_{t−1}  (b)
Multiply (b) by θ_1 and subtract it from (a)
r_t = φ_0 (1 − θ_1) + a_t − θ_1 a_{t−1}
Restated as the general form of MA(1)
r_t = c_0 + a_t − θ_1 a_{t−1}, or r_t = c_0 + (1 − θ_1 B) a_t
49
MA(q)
General form of MA(q)
r_t = c_0 + a_t − θ_1 a_{t−1} − ⋯ − θ_q a_{t−q}, or r_t = c_0 + (1 − θ_1 B − ⋯ − θ_q B^q) a_t, where q > 0
Any stationary AR(p) model can be converted to MA(∞)
For example, AR(1) → MA(∞)
r_t = φ_1 r_{t−1} + a_t = φ_1 (φ_1 r_{t−2} + a_{t−1}) + a_t
= φ_1² r_{t−2} + φ_1 a_{t−1} + a_t = φ_1³ r_{t−3} + φ_1² a_{t−2} + φ_1 a_{t−1} + a_t
= a_t + φ_1 a_{t−1} + φ_1² a_{t−2} + φ_1³ a_{t−3} + ⋯
Any invertible MA(q) model can be converted to AR(∞)
50
Stationarity
MA models are always weakly stationary
MA models are finite linear combinations of a white noise sequence
First two moments are time invariant
Unconditional mean
𝐸𝐸 𝑟𝑟𝑡𝑡 = 𝑐𝑐0
Unconditional variance
Var(r_t) = (1 + θ_1² + θ_2² + ⋯ + θ_q²) σ_a²
51
ACF
MA(1): assume c_0 = 0 and multiply the model by r_{t−l}
r_t r_{t−l} = a_t r_{t−l} − θ_1 a_{t−1} r_{t−l}
Note that E(a_{t−1} r_{t−1}) = E[a_{t−1} (a_{t−1} − θ_1 a_{t−2})] = E(a_{t−1} a_{t−1}) = σ_a²
Take expectations
γ_1 = −θ_1 σ_a², and γ_l = 0 for l ≥ 2
For MA(1), Var(r_t) = (1 + θ_1²) σ_a², so
ρ_0 = 1, ρ_1 = −θ_1 / (1 + θ_1²), ρ_l = 0 for l ≥ 2
For MA(2)
ρ_1 = (−θ_1 + θ_1 θ_2) / (1 + θ_1² + θ_2²), ρ_2 = −θ_2 / (1 + θ_1² + θ_2²), ρ_l = 0 for l ≥ 3
ACF cuts off at lag 1 for MA(1), cuts off at lag 2 for MA(2), cuts off at lag q for MA(q); MA models
have “finite-memory”
52
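A small simulation sketch of the MA(2) cut-off using statsmodels' ArmaProcess; the θ values are arbitrary, and the sign flip in the MA polynomial matches the slides' convention r_t = a_t − θ_1 a_{t−1} − θ_2 a_{t−2}.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

theta1, theta2 = 0.6, 0.3
ma2 = ArmaProcess(ar=[1.0], ma=[1.0, -theta1, -theta2])  # lag polynomials incl. lag 0
r = ma2.generate_sample(nsample=5000)
print(acf(r, nlags=5))   # lags 1 and 2 clearly nonzero, lags >= 3 close to zero
```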
Forecast – MA(1)
MA(1)
r_{t+1} = c_0 + a_{t+1} − θ_1 a_t
Take conditional expectations, noting that E(a_{t+1} | F_t) = 0 and E(a_t | F_t) = a_t
1-step-ahead forecast
r̂_t(1) = E(r_{t+1} | F_t) = c_0 − θ_1 a_t
e_t(1) = r_{t+1} − r̂_t(1) = a_{t+1}
Var[e_t(1)] = σ_a²
2-step-ahead forecasts and beyond are the same as the unconditional mean and variance
r̂_t(2) = E(r_{t+2} | F_t) = c_0
e_t(2) = r_{t+2} − r̂_t(2) = a_{t+2} − θ_1 a_{t+1}
Var[e_t(2)] = (1 + θ_1²) σ_a²
53
Forecast – MA(2)
Write MA(2) n steps forward
r_{t+n} = c_0 + a_{t+n} − θ_1 a_{t+n−1} − θ_2 a_{t+n−2}
Forecasts
r̂_t(1) = c_0 − θ_1 a_t − θ_2 a_{t−1}
r̂_t(2) = c_0 − θ_2 a_t
r̂_t(n) = c_0, n ≥ 3
Var[e_t(1)] = σ_a²
Var[e_t(2)] = (1 + θ_1²) σ_a²
Var[e_t(n)] = (1 + θ_1² + θ_2²) σ_a², n ≥ 3
3-step-ahead forecast and beyond are the same as unconditional mean and variance
For an MA(q) model, multistep-ahead forecasts go to unconditional mean and
variance after first q steps
54
AR(p) vs. MA(q)
55
ACF: AR(p) – gradual decline, no clear cut-off; MA(q) – cuts off at lag q, useful to determine q
PACF: AR(p) – cuts off at lag p, useful to determine p; MA(q) – gradual decline, no clear cut-off
Stationarity: AR(p) – moduli of characteristic roots must be < 1; MA(q) – always stationary
Forecast: AR(p) – history has a lasting effect on the forecast; MA(q) – mean-reverts quickly after q steps ahead
AR(p) vs. MA(q)
Choose AR(2) as PACF cuts off at lag 2
56
ACF PACF
AR(p) vs. MA(q)
Choose MA(2) as ACF cuts off at lag 2
57
ACF PACF
AR(p) vs. MA(q)
Left: AR has a lasting effect of shocks. Right: MA mean-reverts quickly
58
Autoregressive moving average (ARMA) model
Problem: certain processes can only be described by high-order AR or MA models with
many parameters to estimate
ARMA(p, q)
Use low order combination of AR and MA to describe high-order process
Reduce number of parameters to estimate (parsimony in parameterization)
General form of ARMA(p, q)
r_t = φ_0 + Σ_{i=1}^{p} φ_i r_{t−i} + a_t − Σ_{j=1}^{q} θ_j a_{t−j}
(1 − φ_1 B − ⋯ − φ_p B^p) r_t = φ_0 + (1 − θ_1 B − ⋯ − θ_q B^q) a_t
where a_t ~ i.i.d. N(0, σ_a²) is white noise
59
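A minimal sketch of fitting a low-order ARMA with statsmodels, assuming a stationary Series `series`; ARIMA(1, 0, 1) is ARMA(1, 1). Note that statsmodels reports the MA coefficient with the opposite sign convention to these slides.

```python
from statsmodels.tsa.arima.model import ARIMA

arma_fit = ARIMA(series, order=(1, 0, 1)).fit()
print(arma_fit.summary())   # phi_1 appears as ar.L1; ma.L1 is -theta_1 in the slides' notation
```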
ARMA(1, 1)
ARMA(1, 1)
r_t − φ_1 r_{t−1} = φ_0 + a_t − θ_1 a_{t−1}
The unconditional mean of ARMA(1, 1) is the same as AR(1)
E(r_t) − φ_1 E(r_{t−1}) = φ_0 + E(a_t) − θ_1 E(a_{t−1})
E(r_t) = μ = φ_0 / (1 − φ_1)
Unconditional variance: for convenience, remove φ_0 and note that E(a_{t−1} r_{t−1}) = σ_a²
r_t − μ = φ_1 (r_{t−1} − μ) + a_t − θ_1 a_{t−1}
Var(r_t) = φ_1² Var(r_{t−1}) + σ_a² + θ_1² σ_a² − 2 φ_1 θ_1 E[a_{t−1} (r_{t−1} − μ)]
Var(r_t) = σ_a² (1 + θ_1² − 2 φ_1 θ_1) / (1 − φ_1²)
For stationarity, φ_1² < 1, the same requirement as AR(1)
60
ARMA(1, 1) – ACF
To compute the autocovariance, multiply by (r_{t−l} − μ)
E[(r_t − μ)(r_{t−l} − μ)] = φ_1 E[(r_{t−1} − μ)(r_{t−l} − μ)] + E[a_t (r_{t−l} − μ)] − θ_1 E[a_{t−1} (r_{t−l} − μ)]
For l = 1
γ_1 = φ_1 γ_0 − θ_1 σ_a²
which is different from AR(1): γ_1 = φ_1 γ_0
For l ≥ 2, same as AR(1)
γ_l = φ_1 γ_{l−1}
The ACF of ARMA(1, 1) behaves very much like AR(1), except that the exponential decay
starts at lag 2 instead of lag 1
ρ_1 = φ_1 − θ_1 σ_a² / γ_0, and ρ_l = φ_1 ρ_{l−1}, l ≥ 2
61
ARMA(1, 1) – ACF
62
ACF for ARMA(1, 1)
ARMA(1, 1) – PACF
The PACF of an ARMA(1, 1) model does not cut off at any finite lag either
It behaves very much like MA(1), except that the exponential decay starts at lag
2 instead of lag 1
ARMA(1, 1) can model higher-order processes, as both its ACF and PACF
decay without a cut-off
63
ARMA(p, q)
Start with ARMA(1, 1) and select small numbers for p and q
Objective of ARMA modeling
Higher likelihood and goodness of fit
White noise residuals
64
Extension of ARMA model
Autoregressive integrated moving average, ARIMA(p, d, q)
Add integrated part I(d) to ARMA(p, q)
I(d) is the d-th order of differencing, useful to convert nonstationary data to stationary (see the sketch below)
ARIMA(p, 0, q) is ARMA(p, q)
ARIMA(0, 1, 0) is random walk
ARIMA(0, 0, 0) is white noise
Seasonal autoregressive integrated moving average, SARIMA
Add seasonal adjustment to ARIMA
65
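A minimal sketch of both extensions with statsmodels' SARIMAX, assuming a hypothetical nonstationary series `prices`; the orders and the seasonal period of 12 are placeholders, not recommendations.

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# ARIMA(1, 1, 1): first-difference the series, then fit ARMA(1, 1)
arima_fit = SARIMAX(prices, order=(1, 1, 1)).fit(disp=False)

# SARIMA(1, 1, 1)x(1, 1, 1, 12): add a seasonal ARIMA component at lag 12
sarima_fit = SARIMAX(prices, order=(1, 1, 1),
                     seasonal_order=(1, 1, 1, 12)).fit(disp=False)
```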
Time-series algorithm vs. machine learning algorithms
Time-series algorithms can extrapolate patterns and project the future outside the
domain of the training data, while many machine learning algorithms cannot
Time-series algorithms can derive confidence intervals, while many machine
learning models cannot, as they are not based on statistical distributions
66
Adapt machine learning algorithms to time series
Time dependence can be captured by adding the lagged response or lagged predictors as
additional predictors/features (see the sketch below)
For example, when predicting the real estate return, add lag 1 of the real estate return
as a feature
It is very important to check
Input stationarity before building the model (Dickey-Fuller test)
Unit-root constraints
White-noise residuals when assessing the model (ACF, Ljung-Box test, Q-Q plot)
67
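A minimal sketch of the lag-feature idea; `returns` is the same hypothetical return series as before, and the column names and number of lags are illustrative only.

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

df = pd.DataFrame({"ret": returns})          # hypothetical return series
for lag in (1, 2, 3):
    df[f"ret_lag{lag}"] = df["ret"].shift(lag)
df = df.dropna()

# Check stationarity of the input before fitting any model
print("ADF p-value:", adfuller(df["ret"])[1])

X, y = df.drop(columns="ret"), df["ret"]     # feed X, y to the ML model of choice
```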
Pitfall of using machine learning on time series
Use the lagged response together with other predictors as features
Model predictions look great, matching the real test values
68
Pitfall of using machine learning on time series
Out-of-sample R2 is amazing!
69
Pitfall of using machine learning on time series
A closer look
What the model is actually doing: when predicting the value at time t + 1, it simply
uses the value at time t as its prediction
The model simply uses the previous value as the prediction for the future
70
Cross-correlation between
prediction and real value
Pitfall of using machine learning on time series
The original data is obviously nonstationary
The true process is actually a random walk, which has no forecastability
The high R² is spurious
Solution: convert the nonstationary data to stationary by differencing
71
Pitfall of using machine learning on time series
After converting to stationary data, the model shows no predictability
72
Cross validation
Simple random cross validation does not respect the time ordering of a time series
Sliding-window or forward-chaining cross validation should be used to keep
temporal dependencies and prevent data leakage (see the sketch below)
73
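A minimal forward-chaining sketch with scikit-learn's TimeSeriesSplit, reusing the hypothetical `X`, `y` from the lag-feature sketch above; each fold trains only on observations that precede its test window.

```python
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    # fit the model on the earlier training window, evaluate on the later test window
```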
Nested forward chain cross validation
74
Nested forward chain cross validation
75
Machine learning algorithms for time series
Advanced machine learning and pattern-finding algorithms can take time series
directly, especially for exploring non-linear relationships in time series
Kalman filter and state-space model
Hidden Markov chain and regime-shift models
Long short-term memory network (LSTM)
76
Section 3: Advanced Time Series Analysis – State-Space Model and Kalman Filter
77
State-Space Model
A commonly used advanced model for dynamic systems (like dynamic time
series) with unobserved and time-varying state variables
Measurement equation: the relationship between the observed variables (y) and the
unobserved state variables (x)
y_t = H_t x_t + e_t
Transition equation: describes the dynamics of the state variables
x_t = μ̃ + F x_{t−1} + ν_t
The state-space model can capture various complicated relationships in time
series data
78
State-space representation of ARMA models
ARMA models are special cases of the state-space model
Example: ARMA(2, 1), y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + ω_t + θ ω_{t−1}
Convert to the state-space representation
Measurement equation
y_t = [1  θ] [x_{1,t}, x_{2,t}]′
Transition equation
[x_{1,t}, x_{2,t}]′ = [[φ_1, φ_2], [1, 0]] [x_{1,t−1}, x_{2,t−1}]′ + [ω_t, 0]′
79
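The ARMA(2, 1) state-space matrices from this slide written out in numpy as a small sketch; the parameter values phi1, phi2, theta are illustrative assumptions.

```python
import numpy as np

phi1, phi2, theta = 0.5, 0.3, 0.4
H = np.array([[1.0, theta]])        # measurement: y_t = H x_t
F = np.array([[phi1, phi2],
              [1.0,  0.0]])         # transition: x_t = F x_{t-1} + [w_t, 0]'
```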
How to find unobserved variables – Kalman Filter
An optimal estimation algorithm
One of the very first applications was in Project Apollo
Widely used in guidance/navigation systems, computer vision and signal
processing, and later brought to finance
Not a coffee filter
80
• Information processing
• Transformation
Kalman Filter – an example
A moving robot with two state variables (x): position
(p) and velocity (v)
The model (the command sent to the robot) may predict the next
position and velocity, but it is subject to error: wind,
wheel slippage, bumpy terrain, etc.
Measurement: a GPS sensor may identify the next
position and velocity, but it is an indirect measure with
some uncertainty or inaccuracy
Goal: can we get a better estimate of the next position
and velocity?
81
A happy robot
Mission failed
http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/
Kalman Filter example – variable distribution
82
State vector: x⃗ = (p⃗, v⃗)
Left: variable distribution space, uncorrelated. Right: variable distribution space, correlated
Kalman Filter example – prediction from model
83
Left: model prediction with F. Right: also need to incorporate the uncertainty Q
Kalman Filter example – measurement
84
Measurement from indirect sensor, represented by H
Kalman Filter example – measurement
85
Also need to consider sensor noise, R
Kalman Filter example – combine model and measurement
86
Pink: prediction from model
Green: Measurement from sensor
Estimation from combination of the
two (orange space) is more accurate
Kalman Filter – workflow
Prediction (these formulas are for the problem setup on page 78; a different problem setup
will have similar but slightly different Kalman filter formulas)
x_{t|t−1} = μ̃ + F x_{t−1|t−1}
P_{t|t−1} = F P_{t−1|t−1} F′ + Q
η_{t|t−1} = y_t − y_{t|t−1} = y_t − H_t x_{t|t−1}
f_{t|t−1} = H_t P_{t|t−1} H_t′ + R
87
Model projection of state
Projection of covariance of x (P) with error covariance Q
Compare with measurement from observation yt
(η is prediction error)
Conditional variance of prediction error (f) with error covariance R
Kalman Filter – workflow
Update
K_t = P_{t|t−1} H_t′ f_{t|t−1}^{−1}
x_{t|t} = x_{t|t−1} + K_t η_{t|t−1}
P_{t|t} = P_{t|t−1} − K_t H_t P_{t|t−1}
Iterate from beginning (t = 1) to end (t = T)
88
Kalman Gain, weight assigned to new information
about xt contained in prediction error
Update state variables
Update covariance of state variables
Kalman Filter – Kalman Gain
The Kalman gain determines the weight placed on x_{t|t−1} versus the new information
contained in the prediction error η_{t|t−1}
A further transformation of the Kalman gain
K_t = P_{t|t−1} H_t′ (H_t P_{t|t−1} H_t′ + R)^{−1}
P_{t|t−1} H_t′ is the portion of the prediction error variance due to uncertainty in x_{t|t−1}; R is
the portion of the prediction error variance due to the random shock e_t
Uncertainty in x_{t|t−1} ↑ (P_{t|t−1} ↑) → weight on η_{t|t−1} ↑
Uncertainty in e_t ↑ (R ↑) → weight on η_{t|t−1} ↓
The Kalman gain puts more weight on the portion with more accurate information (see the sketch below)
89
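A minimal numpy sketch of the predict/update recursion on pages 87–88, under the assumption that the system matrices F, H, Q, R, the drift μ̃, the initial state x0, P0, and the observation series `y` are all given.

```python
import numpy as np

def kalman_filter(y, F, H, Q, R, mu, x0, P0):
    """Run the Kalman filter recursion and return the filtered states x_{t|t}."""
    x, P = x0, P0
    filtered = []
    for y_t in y:
        # Prediction step
        x_pred = mu + F @ x                   # x_{t|t-1}
        P_pred = F @ P @ F.T + Q              # P_{t|t-1}
        eta = y_t - H @ x_pred                # prediction error eta_{t|t-1}
        f = H @ P_pred @ H.T + R              # its conditional variance f_{t|t-1}
        # Update step
        K = P_pred @ H.T @ np.linalg.inv(f)   # Kalman gain
        x = x_pred + K @ eta                  # x_{t|t}
        P = P_pred - K @ H @ P_pred           # P_{t|t}
        filtered.append(x)
    return np.array(filtered)
```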
90
MFIN 290 Application of Machine Learning in Finance: Lecture 7