MFIN 290 Application of Machine Learning in Finance: Lecture 7

Edward Sheng

8/7/2021

Agenda

1. Basic time series analysis, part I
2. Basic time series analysis, part II
3. Advanced time series analysis – State-Space Model and Kalman Filter

Section 1: Basic Time Series Analysis, Part I

What is time series data
A time series is a collection of data points that evolves over time

Time dependent and serially correlated: the basic OLS assumption that observations
are independent does not hold
Seasonality: variation in the data that is specific to a particular time frame, e.g., higher sales
of woolen jackets in the winter season

Examples
Stock price, real estate price
GDP, interest rate
Seasonal sales

Time series is the most important data type in finance and investment


Stationarity
A time series {r_t} is said to be strictly stationary if the joint distribution of
(r_{t_1}, ..., r_{t_k}) is identical to that of (r_{t_1+t}, ..., r_{t_k+t}) for all t, where k is an
arbitrary positive integer and (t_1, t_2, ..., t_k) is a collection of positive integers

F(r_{t_1}, \ldots, r_{t_k}) = F(r_{t_1+t}, \ldots, r_{t_k+t})

The properties of the series are invariant to time shifts

This is a strong condition and hard to verify


Weak stationarity
A time series {r_t} is said to be weakly stationary if both the mean and the
autocovariances are time-invariant

More specifically, {r_t} is weakly stationary if E(r_t) = \mu and Cov(r_t, r_{t-l}) = \gamma_l, which
depends only on l (the number of lags)

Constant and finite first and second moments imply weak stationarity

Weak stationarity does NOT imply strict stationarity, except for normally distributed
variables


Weak stationarity
Mean of the series should be constant and not a function of time


Weak stationarity
Variance of the series should be constant (homoscedasticity), not a function of time or of the
level of the series (which would be heteroscedasticity)

Weak stationarity
Covariance of the tth term and the t-lth term should not be a function of time

9

Why care about stationarity
Most time-series models rely on the assumption of stationarity

Many traditional statistical results do not apply to nonstationary processes, such
as the law of large numbers and the central limit theorem

Non-stationary data may contain
Trend
Seasonality

Convert non-stationary to stationary
Differencing: taking the difference with a particular time lag
Decomposition: modeling both trend and seasonality and removing them from data

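As a small illustration of the differencing idea above, a minimal Python sketch (the series, dates, and seasonal pattern are all made up for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical nonstationary monthly series: random walk with drift plus a seasonal cycle.
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-31", periods=240, freq="M")
trend = np.cumsum(0.2 + rng.normal(size=240))            # stochastic trend
seasonal = 5 * np.sin(2 * np.pi * np.arange(240) / 12)   # 12-month seasonality
y = pd.Series(trend + seasonal, index=idx)

diff1 = y.diff().dropna()      # first difference: removes the stochastic trend
diff12 = y.diff(12).dropna()   # lag-12 difference: removes the monthly seasonality
```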

Autocorrelation
The lag-l autocorrelation coefficient of {r_t} is defined as

\rho_l = \frac{Cov(r_t, r_{t-l})}{\sqrt{Var(r_t)\,Var(r_{t-l})}} = \frac{Cov(r_t, r_{t-l})}{Var(r_t)} = \frac{\gamma_l}{\gamma_0}

\rho_1, ..., \rho_l are the 1st, ..., l-th autocorrelations, at lags 1, ..., l

A weakly stationary series is not serially correlated if \rho_l = 0 for all l > 0

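For reference, the sample autocorrelations \hat{\rho}_l can be computed directly with statsmodels; a sketch on simulated data standing in for a return series:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
r = rng.normal(size=1000)            # stand-in for a (roughly uncorrelated) return series
rho = acf(r, nlags=12, fft=True)     # rho_0, rho_1, ..., rho_12
print(rho[1:4])                      # sample autocorrelations at lags 1-3, near zero here
```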

Autocorrelation in stock returns?
Maybe a little


Monthly log returns (VW-CRSP Index) 1925-2009

Autocorrelation in real estate returns?


Monthly log house price changes in AZ (Case-Shiller Index) 1987-2010

Autocorrelation function (ACF)
The sample autocorrelation function (ACF) of a time series is defined as \hat{\rho}_1,
\hat{\rho}_2, ..., \hat{\rho}_l


ACF for monthly log return on VW-CRSP Index. Two standard error
bands around zero. 1926-2009

Autocorrelation function (ACF)


ACF for monthly log house price changes. Two standard error
bands around zero. 1987-2010

How to test autocorrelation – Ljung-Box test
The Ljung-Box test is a statistical test with the null hypothesis that the data are
independently distributed, with no autocorrelation

H_0: \rho_1 = \cdots = \rho_m = 0

Test statistic

Q(m) = T(T + 2) \sum_{l=1}^{m} \frac{\hat{\rho}_l^2}{T - l}

Q(m) is asymptotically \chi^2 with m degrees of freedom

Reject the null if Q(m) > \chi^2_\alpha, where \chi^2_\alpha denotes the (1 - \alpha) \times 100-th percentile of the
\chi^2 distribution with m degrees of freedom
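A minimal sketch of the Ljung-Box test with statsmodels (the simulated series is just a stand-in for monthly log returns):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(2)
r = rng.normal(size=500)              # stand-in for monthly log returns
print(acorr_ljungbox(r, lags=[12]))   # Q(12) and its p-value; reject H0 if the p-value is small
```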

White noise
A time series {rt} is white noise if {rt} is a sequence of independent and
identically distributed (i.i.d) variables

If {rt} is also normally distributed, it is called Gaussian white noise

White noise is stationary and has zero autocorrelation at all nonzero lags of the ACF


Random walk
A time series {p_t} follows a random walk with normally distributed increments
when

p_t = p_{t-1} + a_t, \qquad a_t \overset{iid}{\sim} N(0, \sigma_a^2)

The changes of p_t are white noise


Markov property
A stochastic process has the Markov property if the conditional probability distribution
of future states depends only upon the present state, not on anything that precedes the
present state

No memory of the history except the present state

Random walk has Markov property


Martingale property
A martingale is a sequence of random variables for which, at a particular time,
the conditional expectation of the next value, given all prior values, is equal to the
present value

E(p_{t+1} \mid p_t) = p_t

A martingale is a "fair game": knowledge of the past will be of no use in
predicting future winnings

A random walk has the martingale property


Random walk – non-stationary
p_t = p_{t-1} + a_t, \qquad a_t \overset{iid}{\sim} N(0, \sigma_a^2)

Mean

E(p_t) = E(p_0 + a_1 + \cdots + a_t) = E(p_0)

Variance

Var(p_t) = Var(p_0 + a_1 + \cdots + a_t) = t\sigma_a^2

A random walk has a constant mean but increasing variance (and increasing
covariance); it is not a stationary process

A random walk has strong autocorrelation

No forecastability
Random walk has no forecastability; the best forecast is the current value (not
the unconditional mean)

1-step-ahead forecast

\hat{p}_t(1) = E(p_{t+1} \mid p_t, p_{t-1}, \ldots, p_0) = p_t

n-step-ahead forecast

\hat{p}_t(n) = E(p_{t+n} \mid p_t, p_{t-1}, \ldots, p_0) = p_t

However, the forecasting uncertainty increases with the horizon

Var[e_t(n)] = n\sigma_a^2

where e_t(n) = p_{t+n} - \hat{p}_t(n) is the forecast error; the forecast uncertainty diverges to \infty as n \to \infty

Random walk will be further discussed in unit root later

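A quick simulation sketch of these random-walk properties (parameter values are arbitrary):

```python
import numpy as np

# Simulate many random walks p_t = p_{t-1} + a_t and check that the cross-sectional
# variance grows roughly linearly in t, Var(p_t) ~ t * sigma_a^2.
rng = np.random.default_rng(3)
sigma_a = 1.0
paths = np.cumsum(rng.normal(scale=sigma_a, size=(10_000, 500)), axis=1)
print(paths[:, 99].var(), paths[:, 499].var())   # roughly 100 and 500
```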

Autoregression (AR) model
Assume that the immediately lagged value of a series contains useful information
for predicting future values

Consider the model

r_t = \phi_0 + \phi_1 r_{t-1} + a_t

where a_t \overset{iid}{\sim} N(0, \sigma_a^2) is white noise

{a_t} represents the "news"

a_t = r_t - E_{t-1}(r_t)

a_t is what you know about the process at time t but not at t - 1

This model is referred to as AR(1)


AR(1) – properties
r_{t+1} = \phi_0 + \phi_1 r_t + a_{t+1}

Conditional mean and conditional variance

E(r_{t+1} \mid r_t) = \phi_0 + \phi_1 r_t

Var(r_{t+1} \mid r_t) = Var(a_{t+1}) = \sigma_a^2

The forecast of r_{t+1} is related only to r_t, not to any earlier value

The rest of the history, from r_{t-1} back to r_0, is irrelevant

AR(1) has the Markov property; this is obviously restrictive
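A simulation sketch (with arbitrary coefficients) that can be used to check the AR(1) properties above and the unconditional mean and variance derived on the following slides:

```python
import numpy as np

rng = np.random.default_rng(4)
phi0, phi1, sigma_a, T = 0.5, 0.8, 1.0, 100_000
r = np.empty(T)
r[0] = phi0 / (1 - phi1)                       # start at the unconditional mean
for t in range(1, T):
    r[t] = phi0 + phi1 * r[t - 1] + rng.normal(scale=sigma_a)

print(r.mean(), phi0 / (1 - phi1))             # sample mean vs. phi0 / (1 - phi1) = 2.5
print(r.var(), sigma_a**2 / (1 - phi1**2))     # sample variance vs. sigma_a^2 / (1 - phi1^2)
```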

AR(1) – unconditional mean
Assume that the series is weakly stationary
Unconditional mean

E(r_{t+1}) = \phi_0 + \phi_1 E(r_t)

Use stationarity: E(r_{t+1}) = E(r_t) = \mu

\mu = \phi_0 + \phi_1 \mu

\mu = \frac{\phi_0}{1 - \phi_1}

Implications
The mean exists if and only if \phi_1 \ne 1
The mean is 0 if and only if \phi_0 = 0
If \phi_1 = 1: random walk, also called a unit root (explained later)

AR(1) – unconditional variance
r_{t+1} = \phi_0 + \phi_1 r_t + a_{t+1}

Take the variance and note that E(a_{t+1} r_t) = 0, as a_{t+1} does not depend on any past
information

Var(r_{t+1}) = \phi_1^2 Var(r_t) + \sigma_a^2

Use stationarity: Var(r_{t+1}) = Var(r_t)

Var(r_{t+1}) = \frac{\sigma_a^2}{1 - \phi_1^2}

For the variance to be positive and bounded, \phi_1^2 must be less than 1, so weak stationarity
implies

-1 < \phi_1 < 1

AR(1) – autocorrelation
The AR(1) model implies that its ACF satisfies

\rho_l = \phi_1 \rho_{l-1}, \quad l \ge 1, \qquad \rho_0 = 1

Combining the two equations implies that

\rho_l = \phi_1^l \rho_0 = \phi_1^l

The autocorrelation decays exponentially at a rate of \phi_1

AR(1) – ACF

ACF for AR(1). Left panel \phi_1 = 0.8. Right panel \phi_1 = -0.8

AR(1) – unit root
\phi_1 = 0: no autocorrelation, stationary, white noise
\phi_1 = 0.5: weak autocorrelation, stationary

AR(1) – unit root
\phi_1 = 0.9: strong autocorrelation, stationary
\phi_1 = 1: unit root, strong autocorrelation, non-stationary, random walk

AR(2)
r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + a_t

where a_t \overset{iid}{\sim} N(0, \sigma_a^2) is white noise

Unconditional mean

E(r_t) = \mu = \frac{\phi_0}{1 - \phi_1 - \phi_2}

provided that \phi_1 + \phi_2 \ne 1

ACF

\rho_1 = \frac{\phi_1}{1 - \phi_2}, \qquad \rho_l = \phi_1 \rho_{l-1} + \phi_2 \rho_{l-2}, \quad l \ge 2

AR(2) – ACF

ACF for AR(2)

Characteristic roots
(1 - \phi_1 B - \phi_2 B^2)\rho_l = 0

where B is the back-shift operator, B\rho_l = \rho_{l-1}

Characteristic equation for AR(2) and its solutions

1 - \phi_1 x - \phi_2 x^2 = 0, \qquad x_{1,2} = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{-2\phi_2}

Characteristic roots

\omega_1 = \frac{1}{x_1}, \qquad \omega_2 = \frac{1}{x_2}

The process is stationary if the moduli of the roots \omega_1, \omega_2 (for a real number, the absolute
value) are all less than 1 (the ACF converges to 0)

For AR(1), \omega = 1/x = \phi_1, so |\phi_1| < 1, i.e., -1 < \phi_1 < 1, for stationarity

Unit root
If any root has modulus (for a real root, absolute value) equal to 1, the series is a unit-root series

A unit-root series is a nonstationary time series; the best-known example is the random walk
(with or without drift)

A unit-root series has strong and persistent autocorrelation

Identifying a unit root / random walk
ACF close to 1 for small lags
ACF decays very slowly

Unit root – Dickey-Fuller test
p_t = \phi_1 p_{t-1} + e_t \quad (a)
p_t = \phi_0 + \phi_1 p_{t-1} + e_t \quad (b)

Dickey-Fuller test: tests for the existence of a unit root (nonstationarity)

H_0: \phi_1 = 1, \qquad H_1: \phi_1 < 1

DF = t\ \text{ratio} = \frac{\hat{\phi}_1 - 1}{\text{std}(\hat{\phi}_1)}

DF depends on the inclusion/exclusion of \phi_0. When \phi_0 is excluded (equation a)

DF = \frac{\sum_{t=1}^{T} p_{t-1} e_t}{\hat{\sigma}_e \sqrt{\sum_{t=1}^{T} p_{t-1}^2}}, \qquad \hat{\sigma}_e^2 = \frac{\sum_{t=1}^{T} (p_t - \hat{\phi}_1 p_{t-1})^2}{T - 1}

Unit root – Augmented Dickey-Fuller test
For a more complicated model

p_t = c_t + \beta p_{t-1} + \sum_{i=1}^{p} \phi_i \Delta p_{t-i} + e_t

where e_t is the error term and c_t = \omega_0 + \omega_1 t

Hypothesis test

H_0: \beta = 1, \qquad H_1: \beta < 1

ADF = t\ \text{ratio} = \frac{\hat{\beta} - 1}{\text{std}(\hat{\beta})}

ADF also depends on the inclusion/exclusion of the constant

AR(p)
Assume that other lagged returns also contain useful information for predicting future values

r_t = \phi_0 + \phi_1 r_{t-1} + \cdots + \phi_p r_{t-p} + a_t

where a_t \overset{iid}{\sim} N(0, \sigma_a^2) is white noise

Conditional mean and conditional variance

E(r_{t+1} \mid r_t, \ldots, r_{t-p+1}) = \phi_0 + \phi_1 r_t + \cdots + \phi_p r_{t-p+1}

Var(r_{t+1} \mid r_t, \ldots, r_{t-p+1}) = Var(a_{t+1}) = \sigma_a^2

Conditional on r_t, ..., r_{t-p+1}, the moments do not depend on lags further back than p

For stationarity, the moduli of all characteristic roots (the inverses of the solutions of the
characteristic equation) must be less than 1

Select order p
How to select the order p of an AR model
Partial autocorrelation function (PACF)
Information criteria
Check residuals

How to estimate an AR model
Conditional least squares
Maximum likelihood estimation (MLE)

Partial autocorrelation function (PACF)
The PACF of a stationary series is defined as {\phi_{l,l}}, l = 1, ..., n

r_t = \phi_{0,1} + \phi_{1,1} r_{t-1} + a_{1t}
r_t = \phi_{0,2} + \phi_{1,2} r_{t-1} + \phi_{2,2} r_{t-2} + a_{2t}
r_t = \phi_{0,3} + \phi_{1,3} r_{t-1} + \phi_{2,3} r_{t-2} + \phi_{3,3} r_{t-3} + a_{3t}
...
r_t = \phi_{0,l} + \phi_{1,l} r_{t-1} + \phi_{2,l} r_{t-2} + \cdots + \phi_{l,l} r_{t-l} + a_{lt}

\phi_{p,p} shows the incremental contribution of r_{t-p} to r_t over an AR(p - 1) model
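To see the ACF decay and the PACF cut-off in practice, the plots can be drawn with statsmodels; a sketch on a simulated AR(2) with made-up coefficients:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# For an AR(2), the ACF decays gradually while the PACF should cut off after lag 2.
rng = np.random.default_rng(6)
r = np.zeros(2000)
for t in range(2, 2000):
    r[t] = 0.6 * r[t - 1] + 0.2 * r[t - 2] + rng.normal()

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(r, lags=20, ax=axes[0])
plot_pacf(r, lags=20, ax=axes[1])
plt.show()
```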
PACF vs. ACF
Consider an AR(1) process

r_t = \phi_0 + \phi_1 r_{t-1} + a_t

The ACF between r_t and r_{t-2} is

\rho_2 = Corr(r_t, r_{t-2}) = \phi_1^2 > 0

so r_t and r_{t-2} are correlated

However, r_t and r_{t-2} are not directly correlated; their correlation is indirect, since r_t is
correlated with r_{t-1} and r_{t-1} is correlated with r_{t-2}

The PACF considers only the direct correlation between r_t and r_{t-2}. For this AR(1) process

\phi_{2,2} = 0

For an AR(p) series, the PACF cuts off after lag p
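Before selecting an order it is also common to check for a unit root with the (augmented) Dickey-Fuller test from the earlier slides; a minimal sketch with statsmodels on simulated data:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)
random_walk = np.cumsum(rng.normal(size=1000))       # unit-root series
ar1 = np.zeros(1000)
for t in range(1, 1000):
    ar1[t] = 0.5 * ar1[t - 1] + rng.normal()         # stationary AR(1)

for name, series in [("random walk", random_walk), ("AR(1)", ar1)]:
    stat, pvalue, *_ = adfuller(series)
    print(name, round(stat, 2), round(pvalue, 3))    # small p-value -> reject the unit-root null
```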

Information criteria
Akaike information criterion (AIC)

AIC = \frac{-2}{T} \ln(\text{likelihood}) + \frac{2}{T} \times \text{number of parameters}

Schwarz-Bayesian information criterion (BIC)

BIC = \frac{-2}{T} \ln(\text{likelihood}) + \frac{\ln T}{T} \times \text{number of parameters}

Pick the p that has the minimum AIC or BIC value

Check residuals
A good time series model should capture the pattern of a time series

It is very important to check residuals; residuals should behave as white noise,
as they represent what cannot be captured by the model

Check ACF of residuals; residuals should be uncorrelated (ACF = 0)
Perform Ljung-Box test on residuals (test on autocorrelation)
Residuals should follow a normal distribution with zero mean


Check residuals

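A combined sketch of the order selection and residual checks above, using statsmodels' AutoReg on a simulated AR(2) (coefficients are made up):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(7)
r = np.zeros(1000)
for t in range(2, 1000):
    r[t] = 0.6 * r[t - 1] + 0.2 * r[t - 2] + rng.normal()

# Fit AR(p) for several orders and pick the one with the smallest AIC.
fits = {p: AutoReg(r, lags=p).fit() for p in range(1, 7)}
best_p = min(fits, key=lambda p: fits[p].aic)
print("selected p:", best_p)

# The residuals of the chosen model should look like white noise.
print(acorr_ljungbox(fits[best_p].resid, lags=[10]))
```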

Forecast – 1-step ahead
AR(p)

r_{t+1} = \phi_0 + \phi_1 r_t + \cdots + \phi_p r_{t-p+1} + a_{t+1}

where a_t \overset{iid}{\sim} N(0, \sigma_a^2) is white noise

The 1-step-ahead forecast is the same as the conditional mean

\hat{r}_t(1) = E(r_{t+1} \mid \mathcal{F}_t) = \phi_0 + \phi_1 r_t + \cdots + \phi_p r_{t-p+1}

1-step-ahead forecast error

e_t(1) = r_{t+1} - \hat{r}_t(1) = a_{t+1}

Variance of the 1-step-ahead forecast error

Var[e_t(1)] = \sigma_a^2

If a_t is normally distributed, the 95% confidence interval is \pm 1.96\sigma_a


Forecast – 2-step ahead
AR(p)

r_{t+2} = \phi_0 + \phi_1 r_{t+1} + \cdots + \phi_p r_{t-p+2} + a_{t+2}

Conditional expectation

\hat{r}_t(2) = E(r_{t+2} \mid \mathcal{F}_t) = \phi_0 + \phi_1 \hat{r}_t(1) + \cdots + \phi_p r_{t-p+2}

2-step-ahead forecast error

e_t(2) = \phi_1 e_t(1) + a_{t+2} = \phi_1 a_{t+1} + a_{t+2}

Variance of the forecast error

Var[e_t(2)] = \sigma_a^2 (1 + \phi_1^2)

The variance of the 2-step-ahead forecast error is larger than the variance of the 1-step-ahead
forecast error


Forecast – n-step ahead
n-step-ahead forecast

\hat{r}_t(n) = E(r_{t+n} \mid \mathcal{F}_t) = \phi_0 + \sum_{i=1}^{p} \phi_i \hat{r}_t(n - i)

where \hat{r}_t(j) = r_{t+j} if j \le 0

The n-step-ahead forecast converges to the unconditional mean as n \to \infty

E(r_t) = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p}
This is referred to as mean reversion

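A forecasting sketch with statsmodels (an AR(2) is simulated with arbitrary coefficients), showing the mean reversion and the widening forecast intervals described above:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
r = np.zeros(1000)
for t in range(2, 1000):
    r[t] = 1.0 + 0.6 * r[t - 1] + 0.2 * r[t - 2] + rng.normal()

res = ARIMA(r, order=(2, 0, 0)).fit()    # AR(2) with a constant
fc = res.get_forecast(steps=12)
print(fc.predicted_mean)                 # drifts toward the unconditional mean (about 5 here)
print(fc.conf_int()[:3])                 # 95% intervals that widen with the horizon
```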

Section 2: Basic Time Series Analysis, Part II


Moving average (MA) model – AR(∞) → MA(1)
In theory, a series can be AR(∞)

r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + \cdots + a_t

However, this model is not realistic, as it has an infinite number of parameters

A simpler version is to constrain the parameters

r_t = \phi_0 - \theta_1 r_{t-1} - \theta_1^2 r_{t-2} - \theta_1^3 r_{t-3} - \cdots + a_t

where \phi_i = -\theta_1^i, i \ge 1

If the time series is stationary, |\theta_1| < 1 and \theta_1^i \to 0; otherwise, both \theta_1^i and the series will explode

The contribution of r_{t-i} to r_t decays exponentially as i increases

AR(∞) → MA(1)
Rearrange

r_t + \theta_1 r_{t-1} + \theta_1^2 r_{t-2} + \cdots = \phi_0 + a_t \quad (a)

r_{t-1} + \theta_1 r_{t-2} + \theta_1^2 r_{t-3} + \cdots = \phi_0 + a_{t-1} \quad (b)

Multiply (b) by \theta_1 and subtract it from (a)

r_t = \phi_0 (1 - \theta_1) + a_t - \theta_1 a_{t-1}

Restated in the general form of MA(1)

r_t = c_0 + a_t - \theta_1 a_{t-1} \quad \text{or} \quad r_t = c_0 + (1 - \theta_1 B) a_t

MA(q)
General form of MA(q)

r_t = c_0 + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \quad \text{or} \quad r_t = c_0 + (1 - \theta_1 B - \cdots - \theta_q B^q) a_t

where q > 0

Any stationary AR(p) model can be converted to MA(∞)
For example, AR(1) → MA(∞)

r_t = \phi_1 r_{t-1} + a_t = \phi_1(\phi_1 r_{t-2} + a_{t-1}) + a_t
    = \phi_1^2 r_{t-2} + \phi_1 a_{t-1} + a_t = \phi_1^3 r_{t-3} + \phi_1^2 a_{t-2} + \phi_1 a_{t-1} + a_t
    = a_t + \phi_1 a_{t-1} + \phi_1^2 a_{t-2} + \phi_1^3 a_{t-3} + \cdots

Any invertible MA(q) model can be converted to AR(∞)

Stationarity
MA models are always weakly stationary

MA models are finite linear combinations of a white noise sequence
First two moments are time invariant

Unconditional mean

E(r_t) = c_0

Unconditional variance

Var(r_t) = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma_a^2


ACF
MA(1): assume c_0 = 0 and multiply the model by r_{t-l}

r_t r_{t-l} = a_t r_{t-l} - \theta_1 a_{t-1} r_{t-l}

Note that E(a_{t-1} r_{t-1}) = E[a_{t-1}(a_{t-1} - \theta_1 a_{t-2})] = E(a_{t-1} a_{t-1}) = \sigma_a^2

Take expectations

\gamma_1 = -\theta_1 \sigma_a^2, \qquad \gamma_l = 0 \ \text{for} \ l \ge 2

For MA(1), Var(r_t) = (1 + \theta_1^2)\sigma_a^2, so

\rho_0 = 1, \qquad \rho_1 = \frac{-\theta_1}{1 + \theta_1^2}, \qquad \rho_l = 0 \ \text{for} \ l \ge 2

For MA(2)

\rho_1 = \frac{-\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_l = 0 \ \text{for} \ l \ge 3

The ACF cuts off at lag 1 for MA(1), at lag 2 for MA(2), and at lag q for MA(q); MA models
have "finite memory"

Forecast – MA(1)
MA(1)

r_{t+1} = c_0 + a_{t+1} - \theta_1 a_t

Take conditional expectations, noting that E(a_{t+1} \mid \mathcal{F}_t) = 0 and E(a_t \mid \mathcal{F}_t) = a_t

1-step-ahead forecast

\hat{r}_t(1) = E(r_{t+1} \mid \mathcal{F}_t) = c_0 - \theta_1 a_t

e_t(1) = r_{t+1} - \hat{r}_t(1) = a_{t+1}

Var[e_t(1)] = \sigma_a^2

2-step-ahead forecasts and beyond are the same as the unconditional mean and variance

\hat{r}_t(2) = E(r_{t+2} \mid \mathcal{F}_t) = c_0

e_t(2) = r_{t+2} - \hat{r}_t(2) = a_{t+2} - \theta_1 a_{t+1}

Var[e_t(2)] = (1 + \theta_1^2)\sigma_a^2

Forecast – MA(2)
n steps forward for MA(2)

r_{t+n} = c_0 + a_{t+n} - \theta_1 a_{t+n-1} - \theta_2 a_{t+n-2}

Forecasts

\hat{r}_t(1) = c_0 - \theta_1 a_t - \theta_2 a_{t-1}

\hat{r}_t(2) = c_0 - \theta_2 a_t

\hat{r}_t(n) = c_0, \quad n \ge 3

Var[e_t(1)] = \sigma_a^2

Var[e_t(2)] = (1 + \theta_1^2)\sigma_a^2

Var[e_t(n)] = (1 + \theta_1^2 + \theta_2^2)\sigma_a^2, \quad n \ge 3

3-step-ahead forecasts and beyond are the same as the unconditional mean and variance

For an MA(q) model, multistep-ahead forecasts go to the unconditional mean and
variance after the first q steps

AR(p) vs. MA(q)

ACF: AR(p) – gradual decline, no clear cut-off; MA(q) – cuts off at lag q, useful to determine q
PACF: AR(p) – cuts off at lag p, useful to determine p; MA(q) – gradual decline, no clear cut-off
Stationarity: AR(p) – moduli of the characteristic roots must be less than 1 (as explained earlier); MA(q) – always stationary
Forecast: AR(p) – history has a lasting effect on the forecast; MA(q) – mean-reverts quickly after q steps ahead

AR(p) vs. MA(q)
Choose AR(2), as the PACF cuts off at lag 2

ACF and PACF plots for the series

AR(p) vs. MA(q)
Choose MA(2), as the ACF cuts off at lag 2

ACF and PACF plots for the series

AR(p) vs. MA(q)
AR has a lasting effect from a shock
MA mean-reverts quickly

Autoregressive moving average (ARMA) model
Problem: certain processes can only be described by high-order AR or MA models with many
parameters to estimate

ARMA(p, q)
Uses a low-order combination of AR and MA to describe a high-order process
Reduces the number of parameters to estimate (parsimony in parameterization)

General form of ARMA(p, q)

r_t = \phi_0 + \sum_{i=1}^{p} \phi_i r_{t-i} + a_t - \sum_{j=1}^{q} \theta_j a_{t-j}

(1 - \phi_1 B - \cdots - \phi_p B^p) r_t = \phi_0 + (1 - \theta_1 B - \cdots - \theta_q B^q) a_t

where a_t \overset{iid}{\sim} N(0, \sigma_a^2) is white noise

ARMA(1, 1)

r_t - \phi_1 r_{t-1} = \phi_0 + a_t - \theta_1 a_{t-1}

The unconditional mean of ARMA(1, 1) is the same as that of AR(1)

E(r_t) - \phi_1 E(r_{t-1}) = \phi_0 + E(a_t) - \theta_1 E(a_{t-1})

E(r_t) = \mu = \frac{\phi_0}{1 - \phi_1}

Unconditional variance: for convenience, remove \phi_0 and note that E(a_{t-1} r_{t-1}) = \sigma_a^2

r_t - \mu = \phi_1 (r_{t-1} - \mu) + a_t - \theta_1 a_{t-1}

Var(r_t) = \phi_1^2 Var(r_{t-1}) + \sigma_a^2 + \theta_1^2 \sigma_a^2 - 2\phi_1\theta_1 E[a_{t-1}(r_{t-1} - \mu)]

Var(r_t) = \frac{(1 + \theta_1^2 - 2\phi_1\theta_1)\sigma_a^2}{1 - \phi_1^2}

For stationarity, \phi_1^2 < 1, the same requirement as for AR(1)

ARMA(1, 1) – ACF
To compute the autocovariances, multiply by (r_{t-l} - \mu) and take expectations

E[(r_t - \mu)(r_{t-l} - \mu)] = \phi_1 E[(r_{t-1} - \mu)(r_{t-l} - \mu)] + E[a_t (r_{t-l} - \mu)] - \theta_1 E[a_{t-1} (r_{t-l} - \mu)]

For l = 1

\gamma_1 = \phi_1 \gamma_0 - \theta_1 \sigma_a^2

which is different from AR(1), where \gamma_1 = \phi_1 \gamma_0

For l \ge 2, same as AR(1)

\gamma_l = \phi_1 \gamma_{l-1}

The ACF of ARMA(1, 1) behaves very much like that of AR(1), except that the exponential decay
starts at lag 2 instead of lag 1

\rho_1 = \phi_1 - \frac{\theta_1 \sigma_a^2}{\gamma_0}, \qquad \rho_l = \phi_1 \rho_{l-1}, \quad l \ge 2

ARMA(1, 1) – ACF

ACF for ARMA(1, 1)

ARMA(1, 1) – PACF
The PACF of an ARMA(1, 1) model does not cut off at any finite lag either

It behaves very much like that of MA(1), except that the exponential decay starts at lag 2
instead of lag 1

ARMA(1, 1) can model higher-order processes, as both its ACF and PACF decay without a cut-off

ARMA(p, q)
Start with ARMA(1, 1) and select small values for p and q

Objectives of ARMA modeling
Higher likelihood and goodness of fit
White-noise residuals

Extension of the ARMA model
Autoregressive integrated moving average, ARIMA(p, d, q)
Adds an integrated part I(d) to ARMA(p, q)
I(d) is the d-th order of differencing, useful for converting nonstationary data to stationary
ARIMA(p, 0, q) is ARMA(p, q)
ARIMA(0, 1, 0) is a random walk
ARIMA(0, 0, 0) is white noise

Seasonal autoregressive integrated moving average, SARIMA
Adds seasonal adjustment to ARIMA
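A minimal ARIMA sketch: with d = 1 the model differences the (nonstationary) input once internally; for seasonal data, statsmodels' SARIMAX class plays the same role. The simulated series and the order (1, 1, 1) are arbitrary choices for illustration:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(9)
level = np.cumsum(0.05 + rng.normal(size=1000))   # random walk with drift (unit root)

res = ARIMA(level, order=(1, 1, 1)).fit()         # ARMA(1, 1) fitted to the first difference
print(res.params)
print(res.forecast(steps=5))                      # forecasts of the original, undifferenced series
```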
Time-series algorithms vs. machine learning algorithms
Time-series algorithms can extrapolate patterns and project the future outside the domain of the
training data, while many machine learning algorithms cannot

Time-series algorithms can derive confidence intervals, while many machine learning models cannot,
as they are not based on statistical distributions

Adapting machine learning algorithms to time series
Time-series structure can be captured by adding lagged responses or predictors as additional
predictors/features

For example, when predicting the real estate return, add lag 1 of the real estate return as a feature

It is very important to check
Input stationarity before building the model (Dickey-Fuller)
The unit-root constraint
White-noise residuals while assessing the model (ACF, Ljung-Box, Q-Q plot)

Pitfall of using machine learning on time series
Use the lagged response together with other predictors as features
The model prediction looks great, matching the real testing values

Pitfall of using machine learning on time series
The out-of-sample R^2 is amazing!

Pitfall of using machine learning on time series
A closer look: when predicting the value at time t + 1, the model simply uses the value at time t
as its prediction

The model simply uses the previous value as the prediction for the future

Cross-correlation between prediction and real value

Pitfall of using machine learning on time series
The original data is obviously nonstationary
The true process is actually a random walk, which has no forecastability
The high R^2 is fake
Solution: convert nonstationary data to stationary by differencing

Pitfall of using machine learning on time series
After converting to stationary data, the model shows no predictability

Cross validation
Simple random cross validation does not respect the time ordering of a time series

Sliding-window or forward-chaining cross validation should be used to keep temporal dependencies
and prevent data leakage

Nested forward chain cross validation
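A sketch combining the two ideas from the slides above — lagged features and forward-chaining cross validation — using pandas and scikit-learn (the toy target series and the lag choices are arbitrary):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(10)
y = pd.Series(rng.normal(size=500)).rolling(3).mean().dropna()  # toy serially correlated target

# Add lagged values of the response as features.
df = pd.DataFrame({"y": y})
for lag in (1, 2, 3):
    df[f"lag_{lag}"] = df["y"].shift(lag)
df = df.dropna()

X, target = df.drop(columns="y").to_numpy(), df["y"].to_numpy()

# Forward-chaining (expanding-window) splits: training data always precedes test data in time.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = LinearRegression().fit(X[train_idx], target[train_idx])
    print(round(model.score(X[test_idx], target[test_idx]), 3))  # out-of-sample R^2 per fold
```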
Machine learning algorithms for time series
Advanced machine learning and pattern-finding algorithms can take time series directly, especially
for exploring non-linear relationships in time series
Kalman filter and state-space model
Hidden Markov chain and regime-shift model
Long short-term memory neural network (LSTM)

Section 3: Advanced Time Series Analysis – State-Space Model and Kalman Filter

State-Space Model
A commonly used advanced model for dynamic systems (like dynamic time series) with unobserved and
time-varying state variables

Measurement equation: the relationship between the observed variables (y) and the unobserved state
variables (x)

y_t = H_t x_t + e_t

Transition equation: describes the dynamics of the state variables

x_t = \tilde{\mu} + F x_{t-1} + \upsilon_t

The state-space model can capture various complicated relationships in time-series data

State-space representation of ARMA models
ARMA models are special cases of the state-space model

Example: ARMA(2, 1)

y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \omega_t + \theta \omega_{t-1}

Convert to the state-space representation

Measurement equation

y_t = \begin{bmatrix} 1 & \theta \end{bmatrix} \begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}

Transition equation

\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} \omega_t \\ 0 \end{bmatrix}

How to find unobserved variables – Kalman Filter
An optimal estimation algorithm

One of its very first applications was Project Apollo

Widely used in guidance/navigation systems, computer vision systems, and signal processing, and
later brought to finance

Not a coffee filter

Kalman Filter – an example
A moving robot with two state variables (x): position (p) and velocity (v)

The model (the command sent to the robot) may predict the next position and velocity, but it is
subject to error: wind, wheel slippage, bumpy terrain, etc.

Measurement: a GPS sensor may identify the next position and velocity, but these are indirect
measurements with some uncertainty or inaccuracy

Goal: can we get a better estimate of the next position and velocity?

http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/

Kalman Filter example – variable distribution

\vec{x} = \begin{bmatrix} \vec{p} \\ \vec{v} \end{bmatrix}

Variable distribution space: uncorrelated vs. correlated

Kalman Filter example – prediction from model
Model prediction with F
Also need to incorporate the uncertainty, Q

Kalman Filter example – measurement
Measurement from an indirect sensor, represented by H

Kalman Filter example – measurement
Also need to consider the sensor noise, R

Kalman Filter example – combine model and measurement
Pink: prediction from the model
Green: measurement from the sensor
The estimate from the combination of the two (orange region) is more accurate

Kalman Filter – workflow
Prediction (these formulas are for the state-space setup above; a different problem setup will have
similar but slightly different Kalman filter formulas)

x_{t|t-1} = \tilde{\mu} + F x_{t-1|t-1}   (model projection of the state)

P_{t|t-1} = F P_{t-1|t-1} F' + Q   (projection of the covariance of x, P, with error covariance Q)

\eta_{t|t-1} = y_t - y_{t|t-1} = y_t - H_t x_{t|t-1}   (prediction error relative to the observation y_t)

f_{t|t-1} = H_t P_{t|t-1} H_t' + R   (conditional variance of the prediction error, with error covariance R)

Kalman Filter – workflow
Update

K_t = P_{t|t-1} H_t' f_{t|t-1}^{-1}   (Kalman gain, the weight assigned to the new information about x_t contained in the prediction error)

x_{t|t} = x_{t|t-1} + K_t \eta_{t|t-1}   (update the state variables)

P_{t|t} = P_{t|t-1} - K_t H_t P_{t|t-1}   (update the covariance of the state variables)

Iterate from the beginning (t = 1) to the end (t = T)

Kalman Filter – Kalman Gain
The Kalman gain determines the weights given to x_{t|t-1} and to the new information contained in
the prediction error \eta_{t|t-1}

A further transformation of the Kalman gain

K_t = P_{t|t-1} H_t' (H_t P_{t|t-1} H_t' + R)^{-1}

P_{t|t-1} H_t' is the portion of the prediction-error variance due to uncertainty in x_{t|t-1}; R is
the portion of the prediction-error variance due to the random shock e_t

Uncertainty in x_{t|t-1} ↑ (P_{t|t-1} ↑): weight on \eta_{t|t-1} ↑
Uncertainty in e_t ↑ (R ↑): weight on \eta_{t|t-1} ↓

The Kalman gain puts more weight on the portion with more accurate information
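A minimal sketch of the prediction/update recursion above in plain numpy, specialized to a scalar local-level model (F = H = 1; the variances Q, R and the simulated data are made up for illustration). Statsmodels' state-space classes provide a full multivariate implementation:

```python
import numpy as np

def kalman_filter(y, F=1.0, H=1.0, Q=0.01, R=1.0, x0=0.0, P0=1.0):
    """Scalar Kalman filter following the prediction/update equations above."""
    x, P = x0, P0
    filtered = []
    for obs in y:
        # Prediction step
        x_pred = F * x                 # x_{t|t-1} = F x_{t-1|t-1}
        P_pred = F * P * F + Q         # P_{t|t-1} = F P_{t-1|t-1} F' + Q
        eta = obs - H * x_pred         # prediction error
        f = H * P_pred * H + R         # variance of the prediction error
        # Update step
        K = P_pred * H / f             # Kalman gain
        x = x_pred + K * eta           # x_{t|t}
        P = P_pred - K * H * P_pred    # P_{t|t}
        filtered.append(x)
    return np.array(filtered)

# Noisy measurements of a slowly moving unobserved level.
rng = np.random.default_rng(11)
true_state = np.cumsum(rng.normal(scale=0.1, size=200))
y = true_state + rng.normal(scale=1.0, size=200)

estimate = kalman_filter(y)
print(np.mean((estimate - true_state) ** 2),    # filtered estimate error ...
      np.mean((y - true_state) ** 2))           # ... is much smaller than the raw measurement error
```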