Lecture 1: Introduction to Forecasting
UCSD, January 9 2017

Allan Timmermann, UC San Diego

Timmermann (UCSD) Forecasting Winter, 2017 1 / 64

1 Course objectives

2 Challenges facing forecasters

3 Forecast Objectives: the Loss Function

4 Common Assumptions on Loss

5 Specific Types of Loss Functions

6 Multivariate loss

7 Does the loss function matter?

8 Informal Evaluation Methods

9 Out-of-Sample Forecast Evaluation

10 Some easy and hard to predict variables

11 Weak predictability but large economic gains

Course objectives: Develop

Skills in analyzing, modeling and working with time series data from
finance and economics

Ability to construct forecasting models and generate forecasts

formulating a class of models – using information intelligently
model selection
estimation – making best use of historical data

Develop creativity in posing forecasting questions, collecting and
using often incomplete data

which data help me build a better forecasting model?

Ability to critically evaluate and compare forecasts

reasonable (simple) benchmarks
skill or luck? Overfitting (data mining)
Compete or combine?

Ranking forecasters: Mexican inflation

Forecast situations

Forecasts are used to guide current decisions that affect the future
welfare of a decision maker (forecast user)

Predicting my grade – updating information on the likely grade as the
course progresses
Choosing between a fixed-rate mortgage (interest rate fixed for 20
years) versus a floating-rate (variable) mortgage

Depends on interest rate and inflation forecast

Political or sports outcomes – prediction markets
Investing in the stock market. How volatile will the stock market be?
Predicting Chinese property prices. Supply and demand considerations,
economic growth

Structural versus reduced-form approaches
Depends on the forecast horizon: 1 month vs 10 years

Forecasting and decisions

Credit card company deciding which transactions are potentially
fraudulent and should be denied (in real time)

requires fitting a model to past credit card transactions
binary data (zero-one)

Central Bank predicting the state of the economy – timing issues

Predicting which fund manager (if any) or asset class will outperform

Forecasting the outcome of the World Cup:
http://www.goldmansachs.com/our-thinking/outlook/world-cup-sections/world-cup-book-2014-statistical-model.html

Forecasting the outcome of the world cup

Key issues

Decision maker’s actions depend on predicted future outcomes

Trade off relative costs of over- or underpredicting outcomes
Actions and forecasts are inextricably linked

good forecasts are expected to lead to good decisions
bad forecasts are expected to lead to poor decisions

Forecast is an intermediate input in a decision process, rather than an
end product of separate interest

Loss function weighs the cost of possible forecast errors – like a utility
function uses preferences to weigh different outcomes

Loss functions

Forecasts play an important role in almost all decision problems where
a decision maker’s utility or wealth is affected by his current and
future actions and depends on unknown future events

Central Banks

Forecast inflation, unemployment, GDP growth
Action: interest rate; monetary policy
Trade off cost of over- vs. under-predictions

Firms

Forecast sales
Action: production level, new product launch
Trade off inventory vs. stock-out/goodwill costs

Money managers

Forecast returns (mean, variance, density)
Action: portfolio weights/trading strategy
Trade off Risk vs. return

Ways to generate forecasts

Rule of thumb. Simple decision rule that is not optimal, but may be
robust

Judgmental/subjective forecast, e.g., expert opinion

Combine with other information/forecasts

Quantitative models

“… an estimated forecasting model provides a characterization of what
we expect in the present, conditional upon the past, from which we
infer what to expect in the future, conditional upon the present and the
past. Quite simply, we use the estimated forecasting model to
extrapolate the observed historical data.” (Frank Diebold, Elements of
Forecasting).

Combine different types of forecasts

Forecasts: key considerations

Forecasting models are simplified approximations to a complex reality

How do we make the right shortcuts?
Which methods seem to work in general or in specific situations?

Economic theory may suggest relevant predictor variables, but is silent
about functional form, dynamics of forecasting model

combine art (judgment) and science
how much can we learn from the past?

Forecast object – what are we trying to forecast?

Event outcome: predict if a certain event will happen

Will a bank or hedge fund close?
Will oil prices fall below $40/barrel in 2017?
Will Europe experience deflation in 2017?

Event timing: it is known that an event will happen, but unknown
when it will occur

When will US stocks enter a “bear” market (Dow drops by 10%)?

Time-series: forecasting future values of a continuous variable by
means of current and past data

Predicting the level of the Dow Jones Index on March 15, 2017

Forecast statement

Point forecast

Single number summarizing “best guess”. No information on how
certain or precise the point forecast is. Random shocks affect all
time-series so a non-zero forecast error is to be expected even from a
very good forecast
Ex: US GDP growth for 2017 is expected to be 2.5%

Interval forecast

Lower and upper bound on outcome. Gives a range of values inside
which we expect the outcome will fall with some probability (e.g., 50%
or 95%). Confidence interval for the predicted variable. Length of
interval conveys information about forecast uncertainty.
Ex: 90% chance US GDP growth will fall between 1% and 4%

Density or probability forecast

Entire probability distribution of the future outcome
Ex: US GDP growth for 2017 is Normally distributed N(2.5,1)
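
The three forecast statements are linked: a density forecast implies both an interval and a point forecast. A minimal Python sketch, assuming the slide's N(2.5, 1) density and using the standard library's `statistics.NormalDist`; the equal-tailed interval and the helper name `interval_forecast` are illustrative choices, not part of the lecture:

```python
from statistics import NormalDist

# Density forecast from the slide's example: GDP growth ~ N(2.5, 1).
density = NormalDist(mu=2.5, sigma=1.0)

# Point forecast: here taken to be the mean of the density (an assumption;
# for a Normal distribution the median and mode coincide with it).
point_forecast = density.mean


def interval_forecast(dist, coverage):
    """Equal-tailed interval that contains the outcome with probability `coverage`."""
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


lower, upper = interval_forecast(density, 0.90)
prob_below_1 = density.cdf(1.0)  # event probability read off the same density

print(f"point forecast:  {point_forecast:.2f}%")
print(f"90% interval:    [{lower:.2f}%, {upper:.2f}%]")
print(f"P(growth < 1%):  {prob_below_1:.3f}")
```

Under these assumptions the 90% interval is roughly 0.86% to 4.14%, close to the 1% to 4% range quoted in the interval-forecast example.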

Forecast horizon

The best forecasting model is likely to depend on whether we are
forecasting 1 minute, 1 day, 1 month or 1 year ahead

We refer to an h-step-ahead forecast, where h (short for “horizon”)
is the number of time periods ahead that we predict

Often you hear the argument that “fundamentals matter in the long
run, psychological factors are more important in the short run”

Information set

Do we simply use past values of a series itself or do we include a
larger information set?
Suppose we wish to forecast some outcome $y$ for period $T + 1$ and
have historical data on this variable from $t = 1, \dots, T$. The univariate
information set consists of the series itself up to time $T$:

$I_T^{\text{univariate}} = \{y_1, \dots, y_T\}$

If data on other series, $z_t$ (typically an $N \times 1$ vector), are available,
we have a multivariate information set

$I_T^{\text{multivariate}} = \{y_1, \dots, y_T, z_1, \dots, z_T\}$

It is often important to establish whether a forecast can benefit from
using such additional information

Loss function: notations

Outcome: Y

Forecast: f

Forecast error: e = Y − f
Observed data: Z

Loss function: $L(f, Y) \to \mathbb{R}$
maps inputs $f, Y$ to the real number line $\mathbb{R}$
yields a complete ordering of forecasts
describes in relative terms how costly it is to make forecast errors

Loss Function Considerations

Choice of loss function that appropriately measures trade-offs is
important for every facet of the forecasting exercise and affects

which forecasting models are preferred
how parameters are estimated
how forecasts are evaluated and compared

Loss function reflects the economics of the decision problem

Financial analysts’ forecasts: Hong and Kubik (2003), Lim (2001)

Analysts tend to bias their earnings forecasts (walk-down effect)

Sometimes a forecast is best viewed as a signal in a strategic game
that explicitly accounts for the forecast provider’s incentives

Constructing a loss function

For profit maximizing investors the natural choice of loss is the
function relating payoffs (through trading rule) to the forecast and
realized returns

Link between loss and utility functions: both are used to minimize risk
arising from economic decisions

Loss is sometimes viewed as the negative of utility

$U(f, Y) \approx -L(f, Y)$

The majority of forecasting papers use simple ‘off the shelf’ statistical loss
functions such as Mean Squared Error (MSE)

Common Assumptions on Loss

Granger (1999) proposes three ‘required’ properties for error loss
functions, $L(f, y) = L(y - f) = L(e)$:

A1. $L(0) = 0$ (minimal loss of zero for perfect forecast);
A2. $L(e) \geq 0$ for all $e$;
A3. $L(e)$ is monotonically non-decreasing in $|e|$:

$L(e_1) \geq L(e_2)$ if $e_1 > e_2 > 0$
$L(e_1) \geq L(e_2)$ if $e_1 < e_2 < 0$

A1: normalization
A2: imperfect forecasts are more costly than perfect ones
A3: regularity condition - bigger forecast mistakes are (weakly) costlier than
smaller mistakes (of same sign)

Additional Assumptions on Loss

Symmetry: $L(y - c, y) = L(y + c, y)$ for all $c$, i.e. $L(-e) = L(e)$ for error loss

Granger and Newbold (1986, p. 125): “.. an assumption of symmetry about
the conditional mean ... is likely to be an easy one to accept ... an
assumption of symmetry for the cost function is much less acceptable.”

Homogeneity: for some positive function $h(a)$: $L(ae) = h(a)L(e)$

scaling doesn’t matter

Differentiability of loss with respect to the forecast (regularity condition)

Squared Error (MSE) Loss

$L(e) = a e^2, \quad a > 0$

Satisfies the three Granger properties
Homogeneous, symmetric, differentiable everywhere
Convex: penalizes large forecast errors at an increasing rate
Optimal forecast:

$f^* = \arg\min_f \int (y - f)^2 p_Y(y)\, dy$

First order condition

$f^* = \int y\, p_Y(y)\, dy = E(y)$

The optimal forecast under MSE loss is the conditional mean
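
The result can be checked numerically. The Python sketch below replaces the integral with a sample average over hypothetical log-normal draws (an assumption chosen only to make the distribution skewed) and searches a grid of candidate forecasts; the loss-minimising value should coincide with the sample mean up to the grid spacing:

```python
import random
import statistics

random.seed(0)
# Hypothetical skewed outcomes (log-normal draws) standing in for p_Y.
outcomes = [random.lognormvariate(0.0, 0.5) for _ in range(500)]


def average_squared_loss(f, ys):
    """Sample analogue of the expected loss, the integral of (y - f)^2 p_Y(y) dy."""
    return sum((y - f) ** 2 for y in ys) / len(ys)


# Grid search over candidate forecasts f in [0, 3] with step 0.001.
grid = [i / 1000 for i in range(3001)]
best_f = min(grid, key=lambda f: average_squared_loss(f, outcomes))
sample_mean = statistics.fmean(outcomes)

print(f"loss-minimising forecast on the grid: {best_f:.3f}")
print(f"sample mean of the outcomes:          {sample_mean:.3f}")
```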

Piece-wise Linear (lin-lin) Loss

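$L(e) = (1 - \alpha)\, e\, \mathbf{1}_{e > 0} - \alpha\, e\, \mathbf{1}_{e \leq 0}, \quad 0 < \alpha < 1$

$\mathbf{1}_{e > 0} = 1$ if $e > 0$, otherwise $\mathbf{1}_{e > 0} = 0$ (indicator variable)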

Weight on positive forecast errors: (1− α)
Weight on negative forecast errors: α

Lin-lin loss satisfies the three Granger properties and is homogeneous
and differentiable everywhere with regard to f, except at e = 0

Lin-lin loss does not penalize large errors as much as MSE loss

Mean absolute error (MAE) loss arises, up to a scale factor, if α = 1/2:

$L(e) = \tfrac{1}{2}|e| \propto |e|$

MSE vs. piece-wise Linear (lin-lin) Loss

[Figure: MSE and lin-lin loss, L(e), plotted against the forecast error e from -3 to 3. Three panels: α = 0.25; α = 0.5 (MAE loss); α = 0.75.]

Optimal forecast under lin-lin Loss

Expected loss under lin-lin loss:

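$E_Y[L(Y - f)] = (1 - \alpha)\, E[(Y - f)\mathbf{1}_{Y > f}] - \alpha\, E[(Y - f)\mathbf{1}_{Y \leq f}]$

First order condition: $-(1 - \alpha)(1 - P_Y(f^*)) + \alpha P_Y(f^*) = 0$, so
$f^* = P_Y^{-1}(1 - \alpha)$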

PY : CDF of Y
The optimal forecast is the (1− α) quantile of Y
α = 1/2 : optimal forecast is the median of Y
As α increases towards one, the optimal forecast moves further into the
left tail of the predicted outcome distribution
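
A numerical check of the quantile result, as a sketch: the Python code below draws a hypothetical N(0,1) sample, minimises the average lin-lin loss over candidate forecasts, and compares the minimiser with the theoretical (1 − α) quantile. The function names and the sample size are illustrative assumptions.

```python
import random
from statistics import NormalDist


def linlin_loss(e, alpha):
    """Lin-lin loss: weight (1 - alpha) on positive errors, alpha on negative ones."""
    return (1.0 - alpha) * e if e > 0 else -alpha * e


def average_loss(f, ys, alpha):
    """Sample analogue of the expected lin-lin loss of forecast f."""
    return sum(linlin_loss(y - f, alpha) for y in ys) / len(ys)


random.seed(1)
ys = [random.gauss(0.0, 1.0) for _ in range(401)]

optimal = {}
for alpha in (0.25, 0.5, 0.75):
    # The average loss is piecewise linear and convex in f, so its minimum
    # is attained at one of the sample points; search over those.
    optimal[alpha] = min(ys, key=lambda f: average_loss(f, ys, alpha))
    theory = NormalDist().inv_cdf(1.0 - alpha)
    print(f"alpha={alpha:.2f}: sample minimiser {optimal[alpha]:+.3f}, "
          f"N(0,1) quantile {theory:+.3f}")
```

The sample minimiser falls as α rises, in line with the statement that a larger α pushes the optimal forecast towards the left tail.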

Optimal forecast of N(0,1) variable under lin-lin loss

[Figure: optimal forecast f* (axis from -2.5 to 2.5) plotted against α (axis from 0 to 1).]

Linex Loss

$L(e) = \exp(a_2 e) - a_2 e - 1, \quad a_2 \neq 0$

Differentiable everywhere

Asymmetric: $a_2$ controls both the degree and direction of asymmetry

$a_2 > 0$: loss is approximately linear for $e < 0$ and approximately exponential for $e > 0$

Large underpredictions are very costly ($f < y$, so $e = y - f > 0$)

Converse is true when $a_2 < 0$

MSE versus Linex Loss

[Figure: MSE and linex loss, L(e), plotted against e from -3 to 3. Two panels: right-skewed linex loss with $a_2 = 1$; left-skewed linex loss with $a_2 = -1$.]

Linex Loss

Suppose $Y \sim N(\mu_Y, \sigma_Y^2)$. Then

$E[L(e)] = \exp\left(a_2(\mu_Y - f) + \frac{a_2^2}{2}\sigma_Y^2\right) - a_2(\mu_Y - f) - 1$

Optimal forecast:

$f^* = \mu_Y + \frac{a_2}{2}\sigma_Y^2$

Under linex loss, the optimal forecast depends on both the mean and variance of Y ($\mu_Y$ and $\sigma_Y^2$) as well as on the curvature parameter of the loss function, $a_2$

Optimal bias under Linex Loss for N(0,1) variable

[Figure: three panels plotted against e from -3 to 3: MSE loss; linex loss with $a_2 = 1$; linex loss with $a_2 = -1$.]

Multivariate Loss Functions

Multivariate MSE loss with n errors $e = (e_1, \dots, e_n)'$:

$MSE(A) = E[e'Ae]$

A is a nonnegative and positive definite $n \times n$ matrix

This satisfies the basic assumptions for a loss function

When $A = I_n$, covariances can be ignored and the loss function simplifies to $MSE(I_n) = E[e'e] = \sum_{i=1}^{n} E[e_i^2]$, i.e., the sum of the individual mean squared errors

Does the loss function matter?
Cenesizoglu and Timmermann (2012) compare statistical and economic measures of forecasting performance across a large set of stock return prediction models with time-varying mean and volatility

Economic performance is measured through the certainty equivalent return (CER), i.e., the risk-adjusted return

Statistical performance is measured through mean squared error (MSE)

Performance is measured relative to that of a constant expected return (prevailing mean) benchmark

Common for forecast models to produce worse mean squared error (MSE) but better return performance than the benchmark

Relation between statistical and economic measures of forecasting performance can be weak

Does loss function matter? Cenesizoglu and Timmermann

Percentage of models with worse statistical but better economic performance than prevailing mean (CT, 2012)

CER is certainty equivalent return
Sharpe is the Sharpe ratio
RAR is risk-adjusted return
RMSE is root mean squared (forecast) error

Example: Directional Trading system

Consider the decisions of a risk-neutral ‘market timer’ whose utility is linear in the return on the market portfolio (y)

$U(\delta(f), y) = \delta(f)\, y$

Investor’s decision rule, $\delta(f)$: go ‘long’ one unit in the risky asset if a positive return is predicted ($f > 0$), otherwise go short one unit:

$\delta(f) = \begin{cases} 1 & \text{if } f > 0 \\ -1 & \text{if } f \leq 0 \end{cases}$

Let sign(x) = 1 if x > 0, otherwise sign(x) = 0. Payoff:

$U(\delta(f), y) = (2\,\text{sign}(f) - 1)\, y$

Sign and magnitude of y and sign of f matter to trader’s utility
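
The payoff can be sketched in a few lines of Python. The return and forecast numbers below are made up for illustration, and the code adopts the convention that the trader goes long only when f > 0. The example shows how forecasts with a larger squared error can still earn the higher trading payoff when they get the sign right:

```python
def position(f):
    """Decision rule delta(f): long one unit if a positive return is predicted, else short."""
    return 1 if f > 0 else -1


def total_payoff(forecasts, outcomes):
    """Sum of delta(f) * y over all periods for a risk-neutral market timer."""
    return sum(position(f) * y for f, y in zip(forecasts, outcomes))


def mse(forecasts, outcomes):
    """Mean squared forecast error."""
    return sum((y - f) ** 2 for f, y in zip(forecasts, outcomes)) / len(outcomes)


# Hypothetical returns (in percent) and two hypothetical forecasters.
returns = [2.0, -1.5, 0.5, -3.0, 1.0]
cautious = [-0.1, 0.1, -0.1, 0.1, -0.1]  # small forecasts, always the wrong sign
bold = [5.0, -5.0, 5.0, -5.0, 5.0]       # always the right sign, badly scaled

for name, forecasts in (("cautious", cautious), ("bold", bold)):
    print(f"{name:8s} MSE = {mse(forecasts, returns):6.2f}, "
          f"payoff = {total_payoff(forecasts, returns):+.1f}")
```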

Example: Directional Trading system (cont.)

Which forecast approach is best under the directional trading rule?

Since the trader ignores information about the magnitude of the
forecast, an approach that focuses on predicting only the sign of the
excess return could make sense

Leitch and Tanner (1991) studied forecasts of T-bill futures:

Professional forecasters reported predictions with higher mean squared
error (MSE) than those from simple time-series models

Puzzling since the time-series models incorporate far less information
than the professional forecasts

When measured by their ability to generate profits or correctly forecast
the direction of future interest rate movements the professional
forecasters did better than the time-series models
Professional forecasters’ objectives are poorly approximated by MSE
loss – closer to directional or ‘sign’ loss

Common estimates of forecasting performance

Define the forecast error $e_{t+h|t} = y_{t+h} - f_{t+h|t}$. Then

$MSE = T^{-1} \sum_{t=1}^{T} e_{t+h|t}^2$

$RMSE = \sqrt{T^{-1} \sum_{t=1}^{T} e_{t+h|t}^2}$

$MAE = T^{-1} \sum_{t=1}^{T} |e_{t+h|t}|$

Directional accuracy (DA): let $I_{x_{t+1} > 0} = 1$ if $x_{t+1} > 0$, otherwise
$I_{x_{t+1} > 0} = 0$. Then an estimate of DA is

$DA = T^{-1} \sum_{t=1}^{T} I_{y_{t+h} \times f_{t+h|t} > 0}$
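
These sample measures translate directly into code. A minimal Python sketch, with hypothetical outcome and forecast values; `forecast_metrics` is an illustrative helper name:

```python
import math


def forecast_metrics(outcomes, forecasts):
    """Return MSE, RMSE, MAE and directional accuracy for paired outcomes and forecasts."""
    errors = [y - f for y, f in zip(outcomes, forecasts)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    # DA counts the periods in which outcome and forecast have the same sign.
    hits = sum(1 for y, f in zip(outcomes, forecasts) if y * f > 0)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "DA": hits / n,
    }


# Hypothetical data for illustration only.
y = [1.0, -2.0, 0.5, 3.0]
f = [0.5, 1.0, 1.0, 2.0]
metrics = forecast_metrics(y, f)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```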

Forecast evaluation

$f_{t+h|t}$: forecast of $y_{t+h}$ given information available at time $t$
Given a sequence of forecasts, $f_{t+h|t}$, and outcomes, $y_{t+h}$,
$t = 1, \dots, T$, it is natural to ask if the forecast was “optimal” or
obviously deficient

Questions posed by forecast evaluation are related to the
measurement of predictive accuracy

Absolute performance measures the accuracy of an individual
forecast relative to the outcome, using either an economic
(loss-based) or a statistical metric

Relative performance compares the performance of one or several
forecasts against some benchmark

Forecast evaluation (cont.)

Forecast evaluation amounts to understanding if the loss from a given
forecast is “small enough”

Informal methods – graphical plots, decompositions
Formal methods – distribution of test statistic for sample averages of
loss estimates can depend on how the forecasts were constructed, e.g.
which estimation method was used

The method (not only the model) used to construct the forecast
matters – expanding vs. rolling estimation window

Formal evaluation of an individual forecast requires testing whether
the forecast is optimal with respect to some loss function and a
specific information set

Rejection of forecast optimality suggests that the forecast can be
improved

Efficient Forecast: Definition

A forecast is efficient (optimal) if no other forecast using the available
data, $x_t \in I_t$, can be used to generate a smaller expected loss
Under MSE loss:

$\hat{f}^*_{t+h|t} = \arg\min_{\hat{f}(x_t)} E\left[(y_{t+h} - \hat{f}(x_t))^2\right]$

If we can use information in $I_t$ to produce a more accurate forecast,
then the original forecast would be suboptimal

Efficiency is conditional on the information set

weak form forecast efficiency tests include only past forecasts and
past outcomes $I_t = \{y_t, y_{t-1}, \dots, \hat{f}_{t|t-1}, e_{t|t-1}, \dots\}$
strong form efficiency tests extend this to include all other variables
$x_t \in I_t$

Optimality under MSE loss

First order condition for an optimal forecast under MSE loss:

$E\left[\frac{\partial (y_{t+h} - f_{t+h|t})^2}{\partial f_{t+h|t}}\right] = -2E[y_{t+h} - f_{t+h|t}] = -2E[e_{t+h|t}] = 0$

Similarly, conditional on information at time $t$, $I_t$:
$E[e_{t+h|t} | I_t] = 0$

The expected value of the forecast error must equal zero given
current information, $I_t$
Test $E[e_{t+h|t}\, x_t] = 0$ for all variables $x_t \in I_t$ known at time $t$
If the forecast is optimal, no variable known at time $t$ can predict its
future forecast error $e_{t+h|t}$. Otherwise the forecast wouldn’t be
optimal
If I can predict that my forecast will be too low, I should increase my
forecast
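
One simple way to examine the orthogonality condition is to regress the forecast errors on a variable known at time t and check whether the slope differs from zero. The Python sketch below does this on simulated data; the data-generating process, the helper `regress_on_predictor`, and the use of conventional (homoskedastic) standard errors are all assumptions made for illustration, and for h > 1 overlapping forecast errors would call for different standard errors.

```python
import math
import random


def regress_on_predictor(errors, x):
    """OLS of errors on a constant and x; returns (intercept, slope, t-stat of slope)."""
    n = len(x)
    x_bar = sum(x) / n
    e_bar = sum(errors) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxe = sum((xi - x_bar) * (ei - e_bar) for xi, ei in zip(x, errors))
    slope = sxe / sxx
    intercept = e_bar - slope * x_bar
    residuals = [ei - intercept - slope * xi for xi, ei in zip(x, errors)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    return intercept, slope, slope / math.sqrt(s2 / sxx)


random.seed(7)
T = 500
x = [random.gauss(0.0, 1.0) for _ in range(T)]                # x_t, known at time t
y_next = [0.5 * xt + random.gauss(0.0, 1.0) for xt in x]      # hypothetical y_{t+1}

efficient_errors = [y - 0.5 * xt for y, xt in zip(y_next, x)]  # forecast uses x_t
naive_errors = list(y_next)                                    # forecast of zero ignores x_t

efficient_fit = regress_on_predictor(efficient_errors, x)
naive_fit = regress_on_predictor(naive_errors, x)
print("efficient forecast: slope %.3f, t-stat %.2f" % efficient_fit[1:])
print("naive forecast:     slope %.3f, t-stat %.2f" % naive_fit[1:])
```

A large t-statistic for the naive forecast signals that x_t predicts its errors, so that forecast could be improved by using x_t.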

Optimality properties under Squared Error Loss

1 Optimal forecasts are unbiased: the forecast error $e_{t+h|t}$ has zero
mean, both conditionally and unconditionally:

$E[e_{t+h|t}] = E[e_{t+h|t} | I_t] = 0$

2 h-period forecast errors ($e_{t+h|t}$) are uncorrelated with information
available at the time the forecast was computed ($I_t$). In particular,
single-period forecast errors, $e_{t+1|t}$, are serially uncorrelated:

$E[e_{t+1|t}\, e_{t|t-1}] = 0$

3 The variance of the forecast error ($e_{t+h|t}$) increases (weakly) in the
forecast horizon, $h$:

$Var(e_{t+h+1|t}) \geq Var(e_{t+h|t})$ for all $h \geq 1$

Optimality properties under Squared Error Loss (cont.)

Forecasts should be unbiased. Why? If they were biased, we could
improve the forecast simply by correcting for the bias

Suppose $f_{t+1|t}$ is biased:

$y_{t+1} = 1 + f_{t+1|t} + \varepsilon_{t+1}, \quad \varepsilon_{t+1} \sim WN(0, \sigma^2)$

The bias-corrected forecast:

$f^*_{t+1|t} = 1 + f_{t+1|t}$

is more accurate than $f_{t+1|t}$
Forecast errors should be unpredictable:

Suppose $y_{t+1} - f_{t+1|t} = e_{t+1} = 0.5 e_t + \varepsilon_{t+1}$ so the one-step forecast
error is serially correlated
Adding back $0.5 e_t$ to the original forecast yields a more accurate
forecast: $f^*_{t+1|t} = f_{t+1|t} + 0.5 e_t$ is better than $f_{t+1|t}$

Variance of forecast error increases in the forecast horizon
We learn more information as we get closer to the forecast “target”
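
The bias and serial-correlation arguments can be illustrated by simulation. In the Python sketch below the forecasts, the noise level σ = 0.5, and the sample size are hypothetical; the bias of 1 and the error autocorrelation of 0.5 follow the two examples above:

```python
import random

random.seed(42)
T = 5000
sigma = 0.5  # hypothetical standard deviation of the white-noise shock


def mse(errors):
    return sum(e * e for e in errors) / len(errors)


# Case 1: biased forecast, y_{t+1} = 1 + f_{t+1|t} + eps_{t+1}.
raw_errors, corrected_errors = [], []
for _ in range(T):
    f = random.gauss(0.0, 1.0)
    y = 1.0 + f + random.gauss(0.0, sigma)
    raw_errors.append(y - f)                # error of the biased forecast
    corrected_errors.append(y - (1.0 + f))  # error after adding back the bias

# Case 2: serially correlated errors, e_{t+1} = 0.5 e_t + eps_{t+1}.
ar_errors, adjusted_errors = [], []
e_prev = 0.0
for _ in range(T):
    e_next = 0.5 * e_prev + random.gauss(0.0, sigma)
    ar_errors.append(e_next)                       # error of the original forecast
    adjusted_errors.append(e_next - 0.5 * e_prev)  # error of f + 0.5 e_t
    e_prev = e_next

print(f"biased forecast MSE:     {mse(raw_errors):.3f}")
print(f"bias-corrected MSE:      {mse(corrected_errors):.3f}")
print(f"autocorrelated-error MSE: {mse(ar_errors):.3f}")
print(f"after adding 0.5 e_t:     {mse(adjusted_errors):.3f}")
```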

Informal evaluation methods (Greenbook forecasts)

Time-series graph of forecasts and outcomes $\{f_{t+h|t}, y_{t+h}\}_{t=1}^{T}$

[Figure: two time-series panels, 1965 to 2010, each showing Actual and Forecast series as annualized changes: GDP growth; inflation rate.]

Informal evaluation methods (Greenbook forecasts)

Scatterplots of $\{f_{t+h|t}, y_{t+h}\}_{t=1}^{T}$

[Figure: two scatterplots with axes labelled forecast and actual: GDP growth; inflation rate.]

Informal evaluation methods (Greenbook Forecasts)

Plots of $f_{t+h|t} - y_t$ against $y_{t+h} - y_t$: directional accuracy

[Figure: two panels with axes labelled forecast and actual: GDP growth; inflation rate.]

Informal evaluation methods (Greenbook forecasts)

Plot of forecast errors $e_{t+h} = y_{t+h} - f_{t+h|t}$

[Figure: forecast errors over time, 1965 to 2010, in two panels: GDP growth; inflation rate.]

Informal evaluation methods

Theil (1961) suggested the following decomposition:

$E[(y - f)^2] = E[((y - Ey) - (f - Ef) + (Ey - Ef))^2]$

$= (Ey - Ef)^2 + (\sigma_y - \sigma_f)^2 + 2\sigma_y\sigma_f(1 - \rho)$

MSE depends on

squared bias $(Ey - Ef)^2$
squared differences in standard deviations $(\sigma_y - \sigma_f)^2$
correlation between the forecast and outcome $\rho$
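
The decomposition also holds exactly in a sample when means, standard deviations and the correlation are computed with divisor T. A short Python check on simulated, purely hypothetical data:

```python
import math
import random


def theil_decomposition(outcomes, forecasts):
    """Split the sample MSE into squared bias, sd-difference and correlation terms."""
    n = len(outcomes)
    mean_y = sum(outcomes) / n
    mean_f = sum(forecasts) / n
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in outcomes) / n)
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecasts) / n)
    cov = sum((y - mean_y) * (f - mean_f) for y, f in zip(outcomes, forecasts)) / n
    rho = cov / (sd_y * sd_f)
    return {
        "squared_bias": (mean_y - mean_f) ** 2,
        "sd_difference": (sd_y - sd_f) ** 2,
        "correlation_term": 2.0 * sd_y * sd_f * (1.0 - rho),
    }


random.seed(3)
forecasts = [random.gauss(2.0, 1.0) for _ in range(200)]
outcomes = [0.5 + 0.8 * f + random.gauss(0.0, 1.0) for f in forecasts]

parts = theil_decomposition(outcomes, forecasts)
mse = sum((y - f) ** 2 for y, f in zip(outcomes, forecasts)) / len(outcomes)
print(f"MSE = {mse:.4f}, sum of components = {sum(parts.values()):.4f}")
for name, value in parts.items():
    print(f"  {name}: {value:.4f}")
```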

Pseudo out-of-sample Forecasts

Simulated (“pseudo”) out-of-sample (OoS) forecasts seek to mimic
the “real time” updating underlying most forecasts

What would a forecaster have done (historically) at a given point in
time?

Method splits data into an initial estimation sample (in-sample
period) and a subsequent evaluation sample (OoS period)

Forecasts are based on parameter estimates that use data only up to
the date when the forecast is computed

As the sample expands, the model parameters get updated, resulting
in a sequence of forecasts

Why do out-of-sample forecasting?

control for data mining – harder to “game”
feasible in real time (less “look-ahead” bias)

Pseudo out-of-sample forecasts (cont.)

Out-of-sample (OoS) forecasts impose the constraint that the
parameter estimates of the forecasting model only use information
available at the time the forecast was computed

Only information known at time $t$ can be used to estimate and select
the forecasting model and generate forecasts $f_{t+h|t}$
Many variants of OoS forecast estimation methods exist. These can
be illustrated for the linear regression model

$y_{t+1} = \beta' x_t + \varepsilon_{t+1}$

$\hat{f}_{t+1|t} = \hat{\beta}_t' x_t$

$\hat{\beta}_t = \left(\sum_{s=1}^{t} \omega(s,t)\, x_{s-1} x_{s-1}'\right)^{-1} \left(\sum_{s=1}^{t} \omega(s,t)\, x_{s-1} y_s\right)$

Different methods use different weighting functions ω(s, t)
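
For a single predictor the weighted estimator reduces to a ratio of two weighted sums. The Python sketch below assumes a scalar x_t with no intercept and simulated data with a hypothetical slope of 0.8; the weighting function is passed in as an argument so that different schemes can be plugged in:

```python
import random


def weighted_beta(x, y, t, weight):
    """Weighted least squares estimate beta_hat_t for a scalar predictor.

    x[s] holds x_s for s = 0..T and y[s] holds y_s for s = 1..T (y[0] is unused).
    """
    num = sum(weight(s, t) * x[s - 1] * y[s] for s in range(1, t + 1))
    den = sum(weight(s, t) * x[s - 1] ** 2 for s in range(1, t + 1))
    return num / den


def one_step_forecast(x, y, t, weight):
    """f_hat_{t+1|t} = beta_hat_t * x_t, using only data dated t or earlier."""
    return weighted_beta(x, y, t, weight) * x[t]


def equal_weights(s, t):
    """Weight of one on every observation (illustrative choice)."""
    return 1.0


random.seed(11)
T, true_beta = 500, 0.8  # hypothetical sample size and slope
x = [random.gauss(0.0, 1.0) for _ in range(T + 1)]
y = [None] + [true_beta * x[s - 1] + random.gauss(0.0, 1.0) for s in range(1, T + 1)]

beta_hat = weighted_beta(x, y, T, equal_weights)
forecast = one_step_forecast(x, y, T, equal_weights)
print(f"beta_hat_T = {beta_hat:.3f}, forecast of y_(T+1) = {forecast:.3f}")
```

Multiplying all weights by a constant leaves the estimate unchanged, so only the relative weights across observations matter.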

Expanding window

Expanding or recursive estimation windows put equal weight on all
observations s = 1, …, t to estimate the parameters of the model:

$\omega(s,t) = \begin{cases} 1 & 1 \leq s \leq t \\ 0 & \text{otherwise} \end{cases}$

As time progresses, the estimation sample grows larger, $I_t \subseteq I_{t+1}$
If the parameters of the model do not change (“stationarity”), the
expanding window approach makes efficient use of the data and leads
to consistent parameter estimates

If model parameters are subject to change, the approach leads to
biased forecasts

The approach works well empirically due to its use of all available
data which reduces the effect of estimation error on the forecasts

[Figure: expanding estimation window shown on a timeline marked 1, t, t+1, t+2, T-1.]

Rolling window

Rolling window uses an equal-weighted kernel of the most recent $\bar{\omega}$
observations to estimate the parameters of the forecasting model

$\omega(s,t) = \begin{cases} 1 & t - \bar{\omega} + 1 \leq s \leq t \\ 0 & \text{otherwise} \end{cases}$

Only one ‘design’ parameter: $\bar{\omega}$ (length of window)

Practical way to account for slowly-moving changes to the data
generating process

Does this address “breaks”?

window too long immediately after breaks
window too short further away
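
The trade-off between expanding and rolling windows can be seen in a small pseudo out-of-sample experiment. The Python sketch below assumes a scalar predictor, a hypothetical one-off break in the slope halfway through the sample, and a window length of 60; none of these choices come from the lecture:

```python
import random


def expanding(s, t):
    """Expanding window: equal weight on all observations 1..t."""
    return 1.0 if 1 <= s <= t else 0.0


def rolling(window):
    """Rolling window: equal weight on the most recent `window` observations."""
    def weight(s, t):
        return 1.0 if t - window + 1 <= s <= t else 0.0
    return weight


def beta_hat(x, y, t, weight):
    """Weighted least squares slope using observations dated t or earlier."""
    num = sum(weight(s, t) * x[s - 1] * y[s] for s in range(1, t + 1))
    den = sum(weight(s, t) * x[s - 1] ** 2 for s in range(1, t + 1))
    return num / den


def pseudo_oos_mse(x, y, weight, first_forecast_date):
    """Average squared error of forecasts f_{t+1|t}, t = first_forecast_date..T-1."""
    T = len(y) - 1
    errors = []
    for t in range(first_forecast_date, T):
        forecast = beta_hat(x, y, t, weight) * x[t]  # re-estimated at every date t
        errors.append(y[t + 1] - forecast)
    return sum(e * e for e in errors) / len(errors)


random.seed(5)
T, break_date = 600, 300
x = [random.gauss(0.0, 1.0) for _ in range(T + 1)]
y = [None]  # y[0] is unused so that y[s] lines up with period s
for s in range(1, T + 1):
    beta = 1.0 if s <= break_date else -1.0  # hypothetical break in the slope
    y.append(beta * x[s - 1] + random.gauss(0.0, 0.5))

mse_expanding = pseudo_oos_mse(x, y, expanding, first_forecast_date=100)
mse_rolling = pseudo_oos_mse(x, y, rolling(60), first_forecast_date=100)
print(f"expanding-window OoS MSE: {mse_expanding:.3f}")
print(f"rolling-window OoS MSE:   {mse_rolling:.3f}")
```

In this particular design the rolling window should produce the smaller out-of-sample MSE, because the expanding window keeps averaging over pre-break data. With stable parameters the ranking would be expected to reverse.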

[Figure: rolling estimation window shown on a timeline marked t-w+1, t-w+2, t, t+1, t+2, T-1.]

Fixed window

Fixed window uses only the first $\bar{\omega}_0$ observations to once and for all
estimate the parameters of the forecasting model

$\omega(s,t) = \begin{cases} 1 & 1 \leq s \leq \bar{\omega}_0 \\ 0 & \text{otherwise} \end{cases}$

This method is typically employed when the costs of estimation are
very high, so re-estimating the model with new data is prohibitively
expensive or impractical in real time

The method also makes analytical results easier

[Figure: fixed estimation window shown on a timeline marked 1, w, t, t+1, t+2, T-1.]

Exponentially declining weights

In the presence of model instability, it is common to discount past
observations using weights that get smaller, the older the data

Exponentially declining weights take the following form:

$\omega(s,t) = \begin{cases} \lambda^{t-s} & 1 \leq s \leq t \\ 0 & \text{otherwise} \end{cases}$

$0 < \lambda < 1$. This method is sometimes called discounted least squares as the discount factor, $\lambda$, puts less weight on past observations

Comparisons

Expanding estimation window: number of observations available for estimating model parameters increases with the sample size

Effect of estimation error gets reduced

Fixed/rolling/discounted window: parameter estimation error continues to affect the forecasts even as the sample grows large

estimates of the model parameters are inconsistent

Forecasts vary more under the short (fixed and rolling) estimation windows than under the expanding window

[Figures: US stock index; monthly US stock returns; monthly inflation; US T-bill rate; US stock market volatility.]

Example: Portfolio Choice under Mean-Variance Utility

T-bills with known payoff $r_f$ vs stocks with uncertain return $r^s_{t+1}$ and excess return $r_{t+1} = r^s_{t+1} - r_f$

$W_t = \$1$: initial wealth
$\omega_t$: portion of portfolio held in stocks at time t
$(1 - \omega_t)$: portion of portfolio held in T-bills
$W_{t+1}$: future wealth

$W_{t+1} = (1 - \omega_t) r_f + \omega_t (r_{t+1} + r_f) = r_f + \omega_t r_{t+1}$

Investor chooses $\omega_t$ to maximize mean-variance utility:

$E_t[U(W_{t+1})] = E_t[W_{t+1}] - \frac{A}{2} Var_t(W_{t+1})$

$E_t[W_{t+1}]$ and $Var_t(W_{t+1})$: conditional mean and variance of $W_{t+1}$

Portfolio Choice under Mean-Variance Utility (cont.)

Suppose stock returns follow the process

$r_{t+1} = \mu + x_t + \varepsilon_{t+1}$
$x_t \sim (0, \sigma_x^2), \quad \varepsilon_{t+1} \sim (0, \sigma_\varepsilon^2), \quad cov(x_t, \varepsilon_{t+1}) = 0$

$x_t$: predictable component given information at t
$\varepsilon_{t+1}$: unpredictable innovation (shock)

Uninformed investor’s (no information on $x_t$) stock holding:

$\omega_t^* = \arg\max_{\omega_t} \left\{ \omega_t \mu + r_f - \frac{A}{2}\omega_t^2(\sigma_x^2 + \sigma_\varepsilon^2) \right\} = \frac{\mu}{A(\sigma_x^2 + \sigma_\varepsilon^2)}$

$E[U(W_{t+1}(\omega_t^*))] = r_f + \frac{\mu^2}{2A(\sigma_x^2 + \sigma_\varepsilon^2)} = r_f + \frac{S^2}{2A}$

$S = \mu / \sqrt{\sigma_x^2 + \sigma_\varepsilon^2}$: unconditional Sharpe ratio

Portfolio Choice under Mean-Variance Utility (cont.)

Informed investor knows $x_t$. His stock holdings are

$\omega_t^* = \frac{\mu + x_t}{A\sigma_\varepsilon^2}$

$E_t[U(W_{t+1}(\omega_t^*))] = r_f + \frac{(\mu + x_t)^2}{2A\sigma_\varepsilon^2}$

Average (unconditional expectation) value of this is

$E[E_t[U(W_{t+1}(\omega_t^*))]] = r_f + \frac{\mu^2 + \sigma_x^2}{2A\sigma_\varepsilon^2}$

Increase in expected utility due to knowing the predictor variable:

$E[U_{\text{inf}}] - E[U_{\text{uninf}}] = \frac{\sigma_x^2}{2A\sigma_\varepsilon^2}(1 + S^2) \approx \frac{\sigma_x^2}{2A\sigma_\varepsilon^2} = \frac{R^2}{2A(1 - R^2)}$

where $R^2 = \sigma_x^2 / (\sigma_x^2 + \sigma_\varepsilon^2)$ and the approximation holds when the squared Sharpe ratio $S^2$ is small

Plausible empirical numbers, i.e., $R^2 = 0.005$ and $A = 3$, give an annualized certainty equivalent return of about 1% (if $R^2$ refers to monthly returns: $0.005 / (2 \times 3 \times 0.995) \approx 0.084\%$ per month, or roughly 1% per year)

Lecture 2: Univariate Forecasting Models
UCSD, January 18 2017

Allan Timmermann, UC San Diego

Timmermann (UCSD) ARMA Winter, 2017 1 / 59

1 Introduction to ARMA models

2 Covariance Stationarity and Wold Representation Theorem

3 Forecasting with ARMA models

4 Estimation and Lag Selection for ARMA Models
Choice of Lag Order

5 Random walk model

6 Trend and Seasonal Components
Seasonal components
Trended Variables

Introduction: ARMA models

When building a forecasting model for an economic or financial variable, the variable’s own past time series is often the first thing that comes to mind

Many time series are persistent

Effect of past and current shocks takes time to evolve

Auto Regressive Moving Average (ARMA) models

Work hors