Lecture 1: Introduction to Forecasting
UCSD, January 9 2017
Allan Timmermann
UC San Diego
Timmermann (UCSD) Forecasting Winter, 2017 1 / 64
1 Course objectives
2 Challenges facing forecasters
3 Forecast Objectives: the Loss Function
4 Common Assumptions on Loss
5 Specific Types of Loss Functions
6 Multivariate loss
7 Does the loss function matter?
8 Informal Evaluation Methods
9 Out-of-Sample Forecast Evaluation
10 Some easy and hard to predict variables
11 Weak predictability but large economic gains
Course objectives: Develop
Skills in analyzing, modeling and working with time series data from
finance and economics
Ability to construct forecasting models and generate forecasts
formulating a class of models – using information intelligently
model selection
estimation – making best use of historical data
Develop creativity in posing forecasting questions, collecting and
using often incomplete data
which data help me build a better forecasting model?
Ability to critically evaluate and compare forecasts
reasonable (simple) benchmarks
skill or luck? Overfitting (data mining)
Compete or combine?
Ranking forecasters: Mexican inflation
Forecast situations
Forecasts are used to guide current decisions that affect the future
welfare of a decision maker (forecast user)
Predicting my grade – updating information on the likely grade as the
course progresses
Choosing between a fixed-rate mortgage (interest rate fixed for 20
years) versus a floating-rate (variable) mortgage
Depends on interest rate and inflation forecast
Political or sports outcomes – prediction markets
Investing in the stock market. How volatile will the stock market be?
Predicting Chinese property prices. Supply and demand considerations,
economic growth
Structural versus reduced-form approaches
Depends on the forecast horizon: 1 month vs 10 years
Forecasting and decisions
Credit card company deciding which transactions are potentially
fraudulent and should be denied (in real time)
requires fitting a model to past credit card transactions
binary data (zero-one)
Central Bank predicting the state of the economy – timing issues
Predicting which fund manager (if any) or asset class will outperform
Forecasting the outcome of the world cup:
http://www.goldmansachs.com/our-thinking/outlook/world-cup-sections/world-cup-book-2014-statistical-model.html
Forecasting the outcome of the world cup
Key issues
Decision maker’s actions depend on predicted future outcomes
Trade off relative costs of over- or underpredicting outcomes
Actions and forecasts are inextricably linked
good forecasts are expected to lead to good decisions
bad forecasts are expected to lead to poor decisions
Forecast is an intermediate input in a decision process, rather than an
end product of separate interest
Loss function weighs the cost of possible forecast errors – like a utility
function uses preferences to weigh different outcomes
Loss functions
Forecasts play an important role in almost all decision problems where
a decision maker’s utility or wealth is affected by current and
future actions and depends on unknown future events
Central Banks
Forecast inflation, unemployment, GDP growth
Action: interest rate; monetary policy
Trade off cost of over- vs. under-predictions
Firms
Forecast sales
Action: production level, new product launch
Trade off inventory vs. stock-out/goodwill costs
Money managers
Forecast returns (mean, variance, density)
Action: portfolio weights/trading strategy
Trade off Risk vs. return
Ways to generate forecasts
Rule of thumb. Simple decision rule that is not optimal, but may be
robust
Judgmental/subjective forecast, e.g., expert opinion
Combine with other information/forecasts
Quantitative models
“… an estimated forecasting model provides a characterization of what
we expect in the present, conditional upon the past, from which we
infer what to expect in the future, conditional upon the present and the
past. Quite simply, we use the estimated forecasting model to
extrapolate the observed historical data.” (Frank Diebold, Elements of
Forecasting).
Combine different types of forecasts
Forecasts: key considerations
Forecasting models are simplified approximations to a complex reality
How do we make the right shortcuts?
Which methods seem to work in general or in specific situations?
Economic theory may suggest relevant predictor variables, but is silent
about functional form, dynamics of forecasting model
combine art (judgment) and science
how much can we learn from the past?
Forecast object – what are we trying to forecast?
Event outcome: predict if a certain event will happen
Will a bank or hedge fund close?
Will oil prices fall below $40/barrel in 2017?
Will Europe experience deflation in 2017?
Event timing: it is known that an event will happen, but unknown
when it will occur
When will US stocks enter a “bear” market (Dow drops by 10%)?
Time-series: forecasting future values of a continuous variable by
means of current and past data
Predicting the level of the Dow Jones Index on March 15, 2017
Forecast statement
Point forecast
Single number summarizing “best guess”. No information on how
certain or precise the point forecast is. Random shocks affect all
time-series so a non-zero forecast error is to be expected even from a
very good forecast
Ex: US GDP growth for 2017 is expected to be 2.5%
Interval forecast
Lower and upper bound on outcome. Gives a range of values inside
which we expect the outcome will fall with some probability (e.g., 50%
or 95%). Confidence interval for the predicted variable. Length of
interval conveys information about forecast uncertainty.
Ex: 90% chance US GDP growth will fall between 1% and 4%
Density or probability forecast
Entire probability distribution of the future outcome
Ex: US GDP growth for 2017 is Normally distributed N(2.5,1)
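The three forecast statements are linked: a density forecast implies both a point forecast and an interval forecast. The sketch below takes the N(2.5, 1) density from the example above and derives a point forecast (its mean) and an equal-tailed 90% interval. The helper name `central_interval` is hypothetical, and the interval it produces need not match the separate 1% to 4% illustration given for interval forecasts.

```python
from statistics import NormalDist

# Density forecast from the slide's example: GDP growth ~ N(2.5, 1), in percent.
density = NormalDist(mu=2.5, sigma=1.0)

# Point forecast: one number summarizing the density (here, its mean).
point_forecast = density.mean


def central_interval(dist, coverage):
    """Equal-tailed interval containing the outcome with probability `coverage`."""
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


lower, upper = central_interval(density, 0.90)
print(f"point forecast: {point_forecast:.2f}%")
print(f"90% interval: [{lower:.2f}%, {upper:.2f}%]")
```

A wider coverage level lengthens the interval, which is how the interval length conveys forecast uncertainty.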
Forecast horizon
The best forecasting model is likely to depend on whether we are
forecasting 1 minute, 1 day, 1 month or 1 year ahead
We refer to an h−step-ahead forecast, where h (short for “horizon”)
is the number of time periods ahead that we predict
Often you hear the argument that “fundamentals matter in the long
run, psychological factors are more important in the short run”
Information set
Do we simply use past values of a series itself or do we include a
larger information set?
Suppose we wish to forecast some outcome y for period T + 1 and
have historical data on this variable from t = 1, ..., T. The univariate
information set consists of the series itself up to time T:
$I_T^{\text{univariate}} = \{y_1, \dots, y_T\}$
If data on other series, $z_t$ (typically an N × 1 vector), are available,
we have a multivariate information set
$I_T^{\text{multivariate}} = \{y_1, \dots, y_T, z_1, \dots, z_T\}$
It is often important to establish whether a forecast can benefit from
using such additional information
Loss function: notations
Outcome: Y
Forecast: f
Forecast error: e = Y − f
Observed data: Z
Loss function: $L(f, Y) \to \mathbb{R}$
maps inputs f, Y to the real number line $\mathbb{R}$
yields a complete ordering of forecasts
describes in relative terms how costly it is to make forecast errors
Loss Function Considerations
Choice of loss function that appropriately measures trade-offs is
important for every facet of the forecasting exercise and affects
which forecasting models are preferred
how parameters are estimated
how forecasts are evaluated and compared
Loss function reflects the economics of the decision problem
Financial analysts’ forecasts: Hong and Kubik (2003), Lim (2001)
Analysts tend to bias their earnings forecasts (walk-down effect)
Sometimes a forecast is best viewed as a signal in a strategic game
that explicitly accounts for the forecast provider’s incentives
Constructing a loss function
For profit maximizing investors the natural choice of loss is the
function relating payoffs (through trading rule) to the forecast and
realized returns
Link between loss and utility functions: both are used to minimize risk
arising from economic decisions
Loss is sometimes viewed as the negative of utility
$U(f, Y) \approx -L(f, Y)$
The majority of forecasting papers use simple ‘off the shelf’ statistical loss
functions such as Mean Squared Error (MSE)
Common Assumptions on Loss
Granger (1999) proposes three ‘required’ properties for error loss
functions, $L(f, y) = L(y - f) = L(e)$:
A1. $L(0) = 0$ (minimal loss of zero for perfect forecast);
A2. $L(e) \ge 0$ for all e;
A3. $L(e)$ is monotonically non-decreasing in $|e|$:
$L(e_1) \ge L(e_2)$ if $e_1 > e_2 > 0$
$L(e_1) \ge L(e_2)$ if $e_1 < e_2 < 0$
A1: normalization
A2: imperfect forecasts are at least as costly as perfect ones
A3: regularity condition: bigger forecast mistakes are (weakly)
costlier than smaller mistakes (of same sign)
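The three properties can be checked numerically for a candidate loss function. The sketch below evaluates A1 to A3 on a finite grid of errors for squared error, absolute error, and a deliberately invalid "signed error" loss. All function names are hypothetical, and a grid check is only a sanity check, not a proof.

```python
def squared_error(e):
    return e * e


def absolute_error(e):
    return abs(e)


def signed_error(e):
    # Deliberately invalid "loss": negative errors would be rewarded.
    return e


def satisfies_granger_properties(loss, errors):
    """Check A1-A3 on a finite grid of errors (a numerical check, not a proof)."""
    a1 = loss(0.0) == 0.0
    a2 = all(loss(e) >= 0.0 for e in errors)
    positive = sorted(e for e in errors if e > 0)                  # moving right from zero
    negative = sorted((e for e in errors if e < 0), reverse=True)  # moving left from zero
    a3 = all(
        loss(near) <= loss(far)
        for side in (positive, negative)
        for near, far in zip(side, side[1:])
    )
    return a1 and a2 and a3


grid = [i / 10 for i in range(-30, 31)]
results = {
    f.__name__: satisfies_granger_properties(f, grid)
    for f in (squared_error, absolute_error, signed_error)
}
print(results)
```

The signed-error function fails A2 because it assigns negative loss to negative errors, which is why it cannot serve as a loss function under these assumptions.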
Additional Assumptions on Loss
Symmetry:
$L(y - e, y) = L(y + e, y)$, i.e., $L(e) = L(-e)$ for error-based loss
Granger and Newbold (1986, p. 125): “.. an assumption of symmetry
about the conditional mean ... is likely to be an easy one to accept ...
an assumption of symmetry for the cost function is much less
acceptable.”
Homogeneity: for some positive function h(a) :
L(ae) = h(a)L(e)
rescaling the forecast errors (e.g., changing units) does not change the ranking of forecasts
Differentiability of loss with respect to the forecast (regularity
condition)
Squared Error (MSE) Loss
$L(e) = a e^2$, $a > 0$
Satisfies the three Granger properties
Homogeneous, symmetric, differentiable everywhere
Convex: penalizes large forecast errors at an increasing rate
Optimal forecast:
$f^* = \arg\min_f \int (y - f)^2 p_Y(y)\, dy$
First order condition:
$f^* = \int y\, p_Y(y)\, dy = E(y)$
The optimal forecast under MSE loss is the conditional mean
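A quick numerical check of this result: the sketch below draws from a skewed distribution (an exponential with mean 2, chosen only for illustration) and searches a grid of candidate forecasts for the one with the smallest average squared error. There is no conditioning information here, so the relevant mean is simply the sample mean. The seed, sample size, and grid are arbitrary assumptions.

```python
import random
from statistics import fmean

random.seed(0)
# Hypothetical skewed outcome distribution, so that the mean differs from the median.
draws = [random.expovariate(0.5) for _ in range(2000)]


def average_squared_loss(f, outcomes):
    """Sample analogue of E[(y - f)^2] for a candidate forecast f."""
    return fmean((y - f) ** 2 for y in outcomes)


sample_mean = fmean(draws)
grid = [i / 100 for i in range(501)]  # candidate forecasts 0.00, 0.01, ..., 5.00
best_forecast = min(grid, key=lambda f: average_squared_loss(f, draws))

print(f"sample mean:            {sample_mean:.3f}")
print(f"loss-minimizing f*:     {best_forecast:.2f}")
```

The grid minimizer agrees with the sample mean up to the grid spacing, as the first order condition predicts.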
Piece-wise Linear (lin-lin) Loss
$L(e) = (1 - \alpha)\, e\, 1_{e > 0} - \alpha\, e\, 1_{e \le 0}$, $0 < \alpha < 1$
Indicator variable: $1_{e > 0} = 1$ if $e > 0$, otherwise $1_{e > 0} = 0$
Weight on positive forecast errors: (1 − α)
Weight on negative forecast errors: α
Lin-lin loss satisfies the three Granger properties and is homogeneous
and differentiable everywhere with regard to f, except at e = 0
Lin-lin loss does not penalize large errors as much as MSE loss
Mean absolute error (MAE) loss arises, up to scale, if α = 1/2:
$L(e) = |e|/2$
MSE vs. piece-wise Linear (lin-lin) Loss
[Figure: three panels plotting L(e) against e over [−3, 3], comparing MSE and lin-lin loss for α = 0.25, α = 0.5 (MAE loss), and α = 0.75]
Optimal forecast under lin-lin Loss
Expected loss under lin-lin loss:
$E_Y[L(Y - f)] = (1 - \alpha) E[(Y - f)\, 1_{Y > f}] - \alpha E[(Y - f)\, 1_{Y \le f}]$
First order condition: $-(1 - \alpha)(1 - P_Y(f^*)) + \alpha P_Y(f^*) = 0$, so
$f^* = P_Y^{-1}(1 - \alpha)$
$P_Y$: CDF of Y
The optimal forecast is the (1 − α) quantile of Y
α = 1/2: optimal forecast is the median of Y
As α increases towards one, the optimal forecast moves further into the
left tail of the predicted outcome distribution
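The quantile result can be checked numerically for a standard normal outcome. The sketch below represents N(0, 1) by an evenly spaced set of its quantiles (a deterministic stand-in for simulation, chosen for reproducibility), minimizes the average lin-lin loss over a grid of forecasts, and compares the result with the (1 − α) quantile. Grid spacing and sample size are arbitrary assumptions.

```python
from statistics import NormalDist


def linlin_loss(e, alpha):
    """Lin-lin loss: weight (1 - alpha) on positive errors, alpha on negative errors."""
    return (1 - alpha) * e if e > 0 else -alpha * e


std_normal = NormalDist(0.0, 1.0)
n = 2000
# Deterministic "sample": n evenly spaced quantiles of N(0, 1).
outcomes = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
grid = [i / 20 for i in range(-60, 61)]  # candidate forecasts -3.00, -2.95, ..., 3.00


def average_loss(f, alpha):
    return sum(linlin_loss(y - f, alpha) for y in outcomes) / n


results = {}
for alpha in (0.25, 0.5, 0.75):
    numerical = min(grid, key=lambda f: average_loss(f, alpha))
    theoretical = std_normal.inv_cdf(1 - alpha)
    results[alpha] = (numerical, theoretical)
    print(f"alpha={alpha:.2f}: grid minimizer {numerical:+.2f}, quantile {theoretical:+.4f}")
```

With α = 0.25 positive errors (underpredictions) carry the larger weight, so the minimizer sits above the median; with α = 0.75 it sits below.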
Optimal forecast of N(0,1) variable under lin-lin loss
[Figure: optimal forecast f* (vertical axis, −2.5 to 2.5) plotted against α (horizontal axis, 0 to 1)]
Linex Loss
$L(e) = \exp(a_2 e) - a_2 e - 1$, $a_2 \neq 0$
Differentiable everywhere
Asymmetric: $a_2$ controls both the degree and direction of asymmetry
$a_2 > 0$: loss is approximately linear for e < 0 and approximately exponential for e > 0
Large underpredictions are very costly (f < y, so e = y − f > 0)
Converse is true when $a_2 < 0$
MSE versus Linex Loss
[Figure: two panels plotting L(e) against e over [−3, 3], comparing MSE and linex loss: right-skewed linex loss with $a_2 = 1$; left-skewed linex loss with $a_2 = -1$]
Linex Loss
Suppose $Y \sim N(\mu_Y, \sigma_Y^2)$. Then
$E[L(e)] = \exp\left(a_2(\mu_Y - f) + \frac{a_2^2}{2}\sigma_Y^2\right) - a_2(\mu_Y - f) - 1$
Optimal forecast:
$f^* = \mu_Y + \frac{a_2}{2}\sigma_Y^2$
Under linex loss, the optimal forecast depends on both the mean and variance of Y ($\mu_Y$ and $\sigma_Y^2$) as well as on the curvature parameter of the loss function, $a_2$
Optimal bias under Linex Loss for N(0,1) variable
[Figure: three panels over e from −3 to 3: MSE loss; linex loss with $a_2 = 1$; linex loss with $a_2 = -1$]
Multivariate Loss Functions
Multivariate MSE loss with n errors $e = (e_1, \dots, e_n)'$:
$MSE(A) = E[e' A e]$
A is a nonnegative and positive definite n × n matrix
This satisfies the basic assumptions for a loss function
When $A = I_n$, covariances can be ignored and the loss function simplifies to
$MSE(I_n) = E[e'e] = \sum_{i=1}^{n} E[e_i^2]$, i.e., the sum of the individual mean squared errors
Does the loss function matter?
Cenesizoglu and Timmermann (2012) compare statistical and economic measures of forecasting performance across a large set of stock return prediction models with time-varying mean and volatility
Economic performance is measured through the certainty equivalent return (CER), i.e., the risk-adjusted return
Statistical performance is measured through mean squared error (MSE)
Performance is measured relative to that of a constant expected return (prevailing mean) benchmark
Common for forecast models to produce worse mean squared error (MSE) but better return performance than the benchmark
Relation between statistical and economic measures of forecasting performance can be weak
Does loss function matter? Cenesizoglu and Timmermann
Percentage of models with worse statistical but better economic performance than prevailing mean (CT, 2012)
CER is certainty equivalent return
Sharpe is the Sharpe ratio
RAR is risk-adjusted return
RMSE is root mean squared (forecast) error
Example: Directional Trading system
Consider the decisions of a risk-neutral ‘market timer’ whose utility is linear in the return on the market portfolio (y)
$U(\delta(f), y) = \delta(f)\, y$
Investor’s decision rule, δ(f): go ‘long’ one unit in the risky asset if a positive return is predicted (f > 0), otherwise go short one unit:
$\delta(f) = 1$ if $f > 0$, $\delta(f) = -1$ if $f \le 0$
Let sign(y) = 1 if y > 0, otherwise sign(y) = 0. Payoff:
$U(\delta(f), y) = (2\,\mathrm{sign}(f) - 1)\, y$
Sign and magnitude of y and sign of f matter to trader’s utility
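To make the point concrete, the sketch below applies the directional rule to two made-up forecast series. All numbers are hypothetical and chosen only so that the forecast with the larger squared errors gets every sign right; a forecast of exactly zero is treated as a short position, which is an assumption.

```python
def position(f):
    """Decision rule delta(f): long one unit if the forecast is positive, else short."""
    return 1 if f > 0 else -1  # a forecast of exactly zero is treated as 'short' (assumption)


def mean_payoff(forecasts, returns):
    """Average trading payoff delta(f) * y."""
    return sum(position(f) * y for f, y in zip(forecasts, returns)) / len(returns)


def mse(forecasts, returns):
    """Average squared forecast error."""
    return sum((y - f) ** 2 for f, y in zip(forecasts, returns)) / len(returns)


returns = [0.02, -0.01, 0.03, -0.02]          # hypothetical realized returns
cautious = [-0.001, -0.001, -0.001, -0.001]   # small errors, but always predicts 'down'
bold = [0.06, -0.05, 0.07, -0.06]             # right sign every period, magnitude overshoots

for name, forecasts in (("cautious", cautious), ("bold", bold)):
    print(f"{name}: MSE = {mse(forecasts, returns):.6f}, "
          f"mean payoff = {mean_payoff(forecasts, returns):+.4f}")
```

In this constructed example the bold forecast has the higher MSE but the higher trading payoff, because only the sign of the forecast enters the decision rule.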
Example: Directional Trading system (cont.)
Which forecast approach is best under the directional trading rule?
Since the trader ignores information about the magnitude of the
forecast, an approach that focuses on predicting only the sign of the
excess return could make sense
Leitch and Tanner (1991) studied forecasts of T-bill futures:
Professional forecasters reported predictions with higher mean squared
error (MSE) than those from simple time-series models
Puzzling since the time-series models incorporate far less information
than the professional forecasts
When measured by their ability to generate profits or correctly forecast
the direction of future interest rate movements the professional
forecasters did better than the time-series models
Professional forecasters’ objectives are poorly approximated by MSE
loss – closer to directional or ‘sign’ loss
Common estimates of forecasting performance
Define the forecast error $e_{t+h|t} = y_{t+h} - f_{t+h|t}$. Then
$MSE = T^{-1} \sum_{t=1}^{T} e_{t+h|t}^2$
$RMSE = \sqrt{T^{-1} \sum_{t=1}^{T} e_{t+h|t}^2}$
$MAE = T^{-1} \sum_{t=1}^{T} |e_{t+h|t}|$
Directional accuracy (DA): let $I_{x_{t+1} > 0} = 1$ if $x_{t+1} > 0$, otherwise
$I_{x_{t+1} > 0} = 0$. Then an estimate of DA is
$DA = T^{-1} \sum_{t=1}^{T} I_{y_{t+h} \times f_{t+h|t} > 0}$
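These four estimates translate directly into code. The sketch below computes them for paired lists of outcomes and forecasts; the function name `forecast_metrics` and the example numbers are hypothetical.

```python
import math


def forecast_metrics(outcomes, forecasts):
    """Sample MSE, RMSE, MAE and directional accuracy for paired outcomes and forecasts."""
    errors = [y - f for y, f in zip(outcomes, forecasts)]
    T = len(errors)
    mse = sum(e * e for e in errors) / T
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / T,
        # Share of periods in which forecast and outcome have the same sign.
        "DA": sum(1 for y, f in zip(outcomes, forecasts) if y * f > 0) / T,
    }


outcomes = [1.0, -2.0, 3.0, -1.0]    # hypothetical realizations
forecasts = [0.5, -1.0, -1.0, -2.0]  # hypothetical forecasts
metrics = forecast_metrics(outcomes, forecasts)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this example one large miss (the third period) dominates MSE, while DA only records that the sign was wrong in one period out of four.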
Forecast evaluation
$f_{t+h|t}$: forecast of $y_{t+h}$ given information available at time t
Given a sequence of forecasts, $f_{t+h|t}$, and outcomes, $y_{t+h}$,
t = 1, …, T, it is natural to ask if the forecast was “optimal” or
obviously deficient
Questions posed by forecast evaluation are related to the
measurement of predictive accuracy
Absolute performance measures the accuracy of an individual
forecast relative to the outcome, using either an economic
(loss-based) or a statistical metric
Relative performance compares the performance of one or several
forecasts against some benchmark
Forecast evaluation (cont.)
Forecast evaluation amounts to understanding if the loss from a given
forecast is “small enough”
Informal methods – graphical plots, decompositions
Formal methods – distribution of test statistic for sample averages of
loss estimates can depend on how the forecasts were constructed, e.g.
which estimation method was used
The method (not only the model) used to construct the forecast
matters – expanding vs. rolling estimation window
Formal evaluation of an individual forecast requires testing whether
the forecast is optimal with respect to some loss function and a
specific information set
Rejection of forecast optimality suggests that the forecast can be
improved
Efficient Forecast: Definition
A forecast is efficient (optimal) if no other forecast using the available
data, $x_t \in I_t$, can be used to generate a smaller expected loss
Under MSE loss:
$\hat{f}^*_{t+h|t} = \arg\min_{\hat{f}(x_t)} E\left[(y_{t+h} - \hat{f}(x_t))^2\right]$
If we can use information in $I_t$ to produce a more accurate forecast,
then the original forecast would be suboptimal
Efficiency is conditional on the information set
weak form forecast efficiency tests include only past forecasts and
past outcomes $I_t = \{y_t, y_{t-1}, \dots, \hat{f}_{t|t-1}, e_{t|t-1}, \dots\}$
strong form efficiency tests extend this to include all other variables
$x_t \in I_t$
Optimality under MSE loss
First order condition for an optimal forecast under MSE loss:
$E\left[\frac{\partial (y_{t+h} - f_{t+h|t})^2}{\partial f_{t+h|t}}\right] = -2E[y_{t+h} - f_{t+h|t}] = -2E[e_{t+h|t}] = 0$
Similarly, conditional on information at time t, $I_t$:
$E[e_{t+h|t} \mid I_t] = 0$
The expected value of the forecast error must equal zero given
current information, $I_t$
Test $E[e_{t+h|t}\, x_t] = 0$ for all variables $x_t \in I_t$ known at time t
If the forecast is optimal, no variable known at time t can predict its
future forecast error $e_{t+h|t}$. Otherwise the forecast wouldn’t be
optimal
If I can predict that my forecast will be too low, I should increase my
forecast
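One simple way to examine the orthogonality condition is a t-statistic on the sample mean of the product of the forecast error and a predictor. The sketch below simulates a hypothetical data generating process, y = 0.5 x + noise, and compares a forecast that uses x with one that ignores it. The helper name, the DGP, and the assumption that the products are serially uncorrelated (reasonable for one-step forecasts in this simulation) are all illustrative choices rather than a prescribed test.

```python
import math
import random
from statistics import fmean, stdev


def orthogonality_t_stat(errors, predictor):
    """t-statistic for H0: E[e * x] = 0, treating the products e_t * x_t as serially uncorrelated."""
    z = [e * x for e, x in zip(errors, predictor)]
    return fmean(z) / (stdev(z) / math.sqrt(len(z)))


random.seed(42)
T = 2000
x = [random.gauss(0, 1) for _ in range(T)]
# Hypothetical DGP: next-period outcome depends on the current predictor.
y_next = [0.5 * xt + random.gauss(0, 1) for xt in x]

errors_optimal = [y - 0.5 * xt for y, xt in zip(y_next, x)]  # forecast uses x_t
errors_naive = list(y_next)                                   # forecast of zero ignores x_t

t_optimal = orthogonality_t_stat(errors_optimal, x)
t_naive = orthogonality_t_stat(errors_naive, x)
print(f"t-stat, forecast using x_t:   {t_optimal:+.2f}")
print(f"t-stat, forecast ignoring x_t: {t_naive:+.2f}")
```

A large t-statistic for the naive forecast signals that x predicts its errors, so the forecast could be improved by using x.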
Optimality properties under Squared Error Loss
1 Optimal forecasts are unbiased: the forecast error $e_{t+h|t}$ has zero
mean, both conditionally and unconditionally:
$E[e_{t+h|t}] = E[e_{t+h|t} \mid I_t] = 0$
2 h-period forecast errors ($e_{t+h|t}$) are uncorrelated with information
available at the time the forecast was computed ($I_t$). In particular,
single-period forecast errors, $e_{t+1|t}$, are serially uncorrelated:
$E[e_{t+1|t}\, e_{t|t-1}] = 0$
3 The variance of the forecast error ($e_{t+h|t}$) increases (weakly) in the
forecast horizon, h:
$Var(e_{t+h+1|t}) \ge Var(e_{t+h|t})$ for all h ≥ 1
Optimality properties under Squared Error Loss (cont.)
Forecasts should be unbiased. Why? If they were biased, we could
improve the forecast simply by correcting for the bias
Suppose $f_{t+1|t}$ is biased:
$y_{t+1} = 1 + f_{t+1|t} + \varepsilon_{t+1}$, $\varepsilon_{t+1} \sim WN(0, \sigma^2)$
The bias-corrected forecast:
$f^*_{t+1|t} = 1 + f_{t+1|t}$
is more accurate than $f_{t+1|t}$
Forecast errors should be unpredictable:
Suppose $y_{t+1} - f_{t+1|t} = e_{t+1} = 0.5 e_t + \varepsilon_{t+1}$ so the one-step forecast
error is serially correlated
Adding back $0.5 e_t$ to the original forecast yields a more accurate
forecast: $f^*_{t+1|t} = f_{t+1|t} + 0.5 e_t$ is better than $f_{t+1|t}$
Variance of forecast error increases in the forecast horizon
We learn more information as we get closer to the forecast “target”
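Both corrections can be illustrated by simulation. The sketch below generates forecast errors with a known bias of 1 and, separately, errors that follow the AR(1) process with coefficient 0.5, then compares mean squared errors before and after correction. It assumes the bias and the AR coefficient are known, unit-variance Gaussian shocks, and a start value of zero; in practice both parameters would have to be estimated.

```python
import random
from statistics import fmean

random.seed(7)
T = 5000
shocks = [random.gauss(0, 1) for _ in range(T)]  # white noise with sigma = 1 (assumption)


def mean_squared(values):
    return fmean(v * v for v in values)


# Case 1: y_{t+1} = 1 + f_{t+1|t} + eps_{t+1}, so the raw forecast error is 1 + eps.
raw_errors = [1.0 + eps for eps in shocks]
bias_corrected_errors = [e - 1.0 for e in raw_errors]  # error of f* = 1 + f

# Case 2: e_{t+1} = 0.5 * e_t + eps_{t+1}, starting from e_0 = 0 (assumption).
ar_errors = [0.0]
for eps in shocks:
    ar_errors.append(0.5 * ar_errors[-1] + eps)
# Error of the corrected forecast f* = f + 0.5 * e_t is e_{t+1} - 0.5 * e_t.
ar_corrected_errors = [ar_errors[t + 1] - 0.5 * ar_errors[t] for t in range(T)]

print(f"bias case:   MSE {mean_squared(raw_errors):.3f} -> {mean_squared(bias_corrected_errors):.3f}")
print(f"AR(1) case:  MSE {mean_squared(ar_errors[1:]):.3f} -> {mean_squared(ar_corrected_errors):.3f}")
```

In both cases the corrected forecast leaves only the unpredictable shock as its error, which is the lowest MSE attainable in this setup.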
Informal evaluation methods (Greenbook forecasts)
Time-series graph of forecasts and outcomes $\{f_{t+h|t}, y_{t+h}\}_{t=1}^{T}$
[Figure: two time-series panels, 1965 to 2010, plotting actual and forecast values (annualized change): GDP growth; inflation rate]
Informal evaluation methods (Greenbook forecasts)
Scatterplots of $\{f_{t+h|t}, y_{t+h}\}_{t=1}^{T}$
[Figure: two scatterplots of actual (vertical axis) against forecast (horizontal axis): GDP growth; inflation rate]
Informal evaluation methods (Greenbook Forecasts)
Plots of $f_{t+h|t} - y_t$ against $y_{t+h} - y_t$: directional accuracy
[Figure: two scatterplots of actual against forecast changes: GDP growth; inflation rate]
Informal evaluation methods (Greenbook forecasts)
Plot of forecast errors $e_{t+h|t} = y_{t+h} - f_{t+h|t}$
[Figure: two time-series panels, 1965 to 2010, of the forecast error: GDP growth; inflation rate]
Informal evaluation methods
Theil (1961) suggested the following decomposition:
$E[y - f]^2 = E[(y - Ey) - (f - Ef) + (Ey - Ef)]^2$
$= (Ey - Ef)^2 + (\sigma_y - \sigma_f)^2 + 2\sigma_y \sigma_f (1 - \rho)$
MSE depends on
squared bias $(Ey - Ef)^2$
squared differences in standard deviations $(\sigma_y - \sigma_f)^2$
correlation between the forecast and outcome ρ
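The decomposition holds exactly in a sample when population-style moments (dividing by T) are used. The sketch below computes the three components for hypothetical outcome and forecast series and checks that they sum to the sample MSE; the function name and data are illustrative.

```python
import math
from statistics import fmean, pstdev


def theil_decomposition(outcomes, forecasts):
    """Split the sample MSE into bias, standard-deviation and correlation components."""
    mean_y, mean_f = fmean(outcomes), fmean(forecasts)
    sd_y, sd_f = pstdev(outcomes), pstdev(forecasts)  # divide-by-T standard deviations
    cov = fmean((y - mean_y) * (f - mean_f) for y, f in zip(outcomes, forecasts))
    rho = cov / (sd_y * sd_f)
    return {
        "bias_sq": (mean_y - mean_f) ** 2,
        "sd_diff_sq": (sd_y - sd_f) ** 2,
        "correlation_term": 2 * sd_y * sd_f * (1 - rho),
    }


outcomes = [2.1, 0.5, 3.2, -1.0, 1.7]   # hypothetical realizations
forecasts = [1.5, 1.0, 2.5, 0.2, 1.9]   # hypothetical forecasts
parts = theil_decomposition(outcomes, forecasts)
mse = fmean((y - f) ** 2 for y, f in zip(outcomes, forecasts))

for name, value in parts.items():
    print(f"{name}: {value:.4f}")
print(f"sum of parts: {sum(parts.values()):.4f}, MSE: {mse:.4f}")
```

Reading the components separately shows whether a high MSE comes mainly from bias, from forecasts that are too smooth or too volatile, or from low correlation with the outcome.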
Pseudo out-of-sample Forecasts
Simulated (“pseudo”) out-of-sample (OoS) forecasts seek to mimic
the “real time” updating underlying most forecasts
What would a forecaster have done (historically) at a given point in
time?
Method splits data into an initial estimation sample (in-sample
period) and a subsequent evaluation sample (OoS period)
Forecasts are based on parameter estimates that use data only up to
the date when the forecast is computed
As the sample expands, the model parameters get updated, resulting
in a sequence of forecasts
Why do out-of-sample forecasting?
control for data mining – harder to “game”
feasible in real time (less “look-ahead” bias)
Pseudo out-of-sample forecasts (cont.)
Out-of-sample (OoS) forecasts impose the constraint that the
parameter estimates of the forecasting model only use information
available at the time the forecast was computed
Only information known at time t can be used to estimate and select
the forecasting model and generate forecasts ft+h|t
Many variants of OoS forecast estimation methods exist. These can
be illustrated for the linear regression model
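$y_{t+1} = \beta' x_t + \varepsilon_{t+1}$
$\hat{f}_{t+1|t} = \hat{\beta}_t' x_t$
$\hat{\beta}_t = \left(\sum_{s=1}^{t} \omega(s,t)\, x_{s-1} x_{s-1}'\right)^{-1} \left(\sum_{s=1}^{t} \omega(s,t)\, x_{s-1} y_s\right)$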
Different methods use different weighting functions ω(s, t)
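The weighted estimator can be sketched for the special case of a single regressor without an intercept, which is an assumption made here to keep the example free of matrix algebra. The weighting function is passed in as an argument, and the simplest choice, a weight of one on every observation up to time t, is used for illustration. All names and the noise-free data are hypothetical.

```python
def weighted_beta(x, y, t, weight):
    """Scalar version of beta_hat_t: sum of w*x_{s-1}*y_s over sum of w*x_{s-1}^2, s = 1..t."""
    numerator = sum(weight(s, t) * x[s - 1] * y[s] for s in range(1, t + 1))
    denominator = sum(weight(s, t) * x[s - 1] ** 2 for s in range(1, t + 1))
    return numerator / denominator


def equal_weights(s, t):
    """Weight of one on every observation s = 1, ..., t."""
    return 1.0 if 1 <= s <= t else 0.0


# Index i of each list holds the time-i observation; y[0] is never used.
x = [1.0, 2.0, -1.0, 0.5, 3.0]
y = [None, 2.0, 4.0, -2.0, 1.0]  # noise-free y_s = 2 * x_{s-1} (hypothetical data)

t = 4
beta_hat = weighted_beta(x, y, t, equal_weights)
forecast = beta_hat * x[t]  # f_{t+1|t} = beta_hat_t * x_t
print(f"beta_hat_{t} = {beta_hat:.3f}, forecast for period {t + 1}: {forecast:.3f}")
```

Only data dated t or earlier enter the estimate and the forecast, which is the out-of-sample constraint described above. Swapping in a different `weight` function changes the estimation scheme without touching the estimator.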
Expanding window
Expanding or recursive estimation windows put equal weight on all
observations s = 1, …, t to estimate the parameters of the model:
$\omega(s,t) = 1$ if $1 \le s \le t$, and 0 otherwise
As time progresses, the estimation sample grows larger, $I_t \subseteq I_{t+1}$
If the parameters of the model do not change (“stationarity”), the
expanding window approach makes efficient use of the data and leads
to consistent parameter estimates
If model parameters are subject to change, the approach leads to
biased forecasts
The approach often works well empirically because it uses all available
data, which reduces the effect of estimation error on the forecasts
[Figure: expanding estimation window on a time axis marked 1, t, t+1, t+2, T-1]
Rolling window
Rolling window uses an equal-weighted kernel of the most recent $\bar{\omega}$
observations to estimate the parameters of the forecasting model
$\omega(s,t) = 1$ if $t - \bar{\omega} + 1 \le s \le t$, and 0 otherwise
Only one ‘design’ parameter: $\bar{\omega}$ (length of window)
Practical way to account for slowly-moving changes to the data
generating process
Does this address “breaks”?
window too long immediately after breaks
window too short further away
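The trade-off between expanding and rolling windows can be illustrated with a simulated break. The sketch below assumes a single-regressor model whose slope jumps from 1 to 3 halfway through the sample and compares one-step out-of-sample MSE for the two schemes over a post-break evaluation period. The break size, window length of 40, noise level, and evaluation period are all arbitrary assumptions, so the result speaks only to this setup.

```python
import random

random.seed(3)
T, break_date, window = 400, 200, 40
x = [random.gauss(0, 1) for _ in range(T + 1)]
# y[s] = beta_s * x[s-1] + noise, with the slope jumping from 1 to 3 after the break.
y = [None] + [
    (1.0 if s <= break_date else 3.0) * x[s - 1] + random.gauss(0, 0.5)
    for s in range(1, T + 1)
]


def beta_hat(t, first):
    """Least-squares slope using observations s = first..t (scalar regressor, no intercept)."""
    num = sum(x[s - 1] * y[s] for s in range(first, t + 1))
    den = sum(x[s - 1] ** 2 for s in range(first, t + 1))
    return num / den


def oos_mse(first_obs):
    """Mean squared one-step forecast error over the evaluation period t = 250..T-1."""
    errors = []
    for t in range(250, T):
        forecast = beta_hat(t, first_obs(t)) * x[t]  # uses data up to time t only
        errors.append(y[t + 1] - forecast)
    return sum(e * e for e in errors) / len(errors)


mse_expanding = oos_mse(lambda t: 1)               # all observations 1..t
mse_rolling = oos_mse(lambda t: t - window + 1)    # most recent `window` observations
print(f"expanding window MSE: {mse_expanding:.3f}")
print(f"rolling window MSE:   {mse_rolling:.3f}")
```

Here the rolling window does better because its sample is entirely post-break during the evaluation period, while the expanding window keeps averaging over the old regime. Without a break the ranking would typically reverse, since the rolling window discards useful data.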
[Figure: rolling estimation window on a time axis marked t-w+1, t-w+2, t, t+1, t+2, T-1]
Fixed window
Fixed window uses only the first $\bar{\omega}_0$ observations to estimate,
once and for all, the parameters of the forecasting model
$\omega(s,t) = 1$ if $1 \le s \le \bar{\omega}_0$, and 0 otherwise
This method is typically employed when the costs of estimation are
very high, so re-estimating the model with new data is prohibitively
expensive or impractical in real time
The method also makes analytical results easier
[Figure: fixed estimation window on a time axis marked 1, w, t, t+1, t+2, T-1]
Exponentially declining weights
In the presence of model instability, it is common to discount past
observations using weights that get smaller, the older the data
Exponentially declining weights take the following form:
$\omega(s,t) = \lambda^{t-s}$ if $1 \le s \le t$, and 0 otherwise
0 < λ < 1. This method is sometimes called discounted least squares as the discount factor, λ, puts less weight on past observations
Comparisons
Expanding estimation window: number of observations available for estimating model parameters increases with the sample size
Effect of estimation error gets reduced
Fixed/rolling/discounted window: parameter estimation error continues to affect the forecasts even as the sample grows large
model parameter estimates are inconsistent
Forecasts vary more under the short (fixed and rolling) estimation windows than under the expanding window
[Figure: US stock index]
[Figure: Monthly US stock returns]
[Figure: Monthly inflation]
[Figure: US T-bill rate]
[Figure: US Stock market volatility]
Example: Portfolio Choice under Mean-Variance Utility
T-bills with known payoff $r_f$ vs stocks with uncertain return $r^s_{t+1}$ and excess return $r_{t+1} = r^s_{t+1} - r_f$
$W_t = \$1$: initial wealth
$\omega_t$: portion of portfolio held in stocks at time t
$(1 - \omega_t)$: portion of portfolio held in T-bills
$W_{t+1}$: future wealth
$W_{t+1} = (1 - \omega_t) r_f + \omega_t (r_{t+1} + r_f) = r_f + \omega_t r_{t+1}$
Investor chooses $\omega_t$ to maximize mean-variance utility:
$E_t[U(W_{t+1})] = E_t[W_{t+1}] - \frac{A}{2} Var_t(W_{t+1})$
$E_t[W_{t+1}]$ and $Var_t(W_{t+1})$: conditional mean and variance of $W_{t+1}$
Portfolio Choice under Mean-Variance Utility (cont.)
Suppose stock returns follow the process
$r_{t+1} = \mu + x_t + \varepsilon_{t+1}$
$x_t \sim (0, \sigma_x^2)$, $\varepsilon_{t+1} \sim (0, \sigma_\varepsilon^2)$, $cov(x_t, \varepsilon_{t+1}) = 0$
$x_t$: predictable component given information at t
$\varepsilon_{t+1}$: unpredictable innovation (shock)
Uninformed investor’s (no information on $x_t$) stock holding:
$\omega_t^* = \arg\max_{\omega_t} \left\{ \omega_t \mu + r_f - \frac{A}{2}\omega_t^2 (\sigma_x^2 + \sigma_\varepsilon^2) \right\} = \frac{\mu}{A(\sigma_x^2 + \sigma_\varepsilon^2)}$
$E[U(W_{t+1}(\omega_t^*))] = r_f + \frac{\mu^2}{2A(\sigma_x^2 + \sigma_\varepsilon^2)} = r_f + \frac{S^2}{2A}$
$S = \mu / \sqrt{\sigma_x^2 + \sigma_\varepsilon^2}$: unconditional Sharpe ratio
Portfolio Choice under Mean-Variance Utility (cont.)
Informed investor knows $x_t$. His stock holdings are
$\omega_t^* = \frac{\mu + x_t}{A\sigma_\varepsilon^2}$
$E_t[U(W_{t+1}(\omega_t^*))] = r_f + \frac{(\mu + x_t)^2}{2A\sigma_\varepsilon^2}$
Average (unconditional expectation) value of this is
$E[E_t[U(W_{t+1}(\omega_t^*))]] = r_f + \frac{\mu^2 + \sigma_x^2}{2A\sigma_\varepsilon^2}$
Increase in expected utility due to knowing the predictor variable:
$E[U^{inf}] - E[U^{uninf}] = \frac{\sigma_x^2}{2A\sigma_\varepsilon^2}(1 + S^2) \approx \frac{\sigma_x^2}{2A\sigma_\varepsilon^2} = \frac{R^2}{2A(1 - R^2)}$
where $R^2 = \sigma_x^2 / (\sigma_x^2 + \sigma_\varepsilon^2)$ and the approximation drops the small $S^2$ term
Plausible empirical numbers, e.g., $R^2 = 0.005$ and A = 3, give an annualized certainty equivalent return of about 1% (assuming monthly data: 0.005/(2 × 3 × 0.995) ≈ 0.084% per month, times 12)
Lecture 2: Univariate Forecasting Models
UCSD, January 18 2017
Allan Timmermann
UC San Diego
Timmermann (UCSD) ARMA Winter, 2017 1 / 59
1 Introduction to ARMA models
2 Covariance Stationarity and Wold Representation Theorem
3 Forecasting with ARMA models
4 Estimation and Lag Selection for ARMA Models
Choice of Lag Order
5 Random walk model
6 Trend and Seasonal Components
Seasonal components
Trended Variables
Introduction: ARMA models
When building a forecasting model for an economic or financial variable, the variable’s own past time series is often the first thing that comes to mind
Many time series are persistent
Effect of past and current shocks takes time to evolve
Auto Regressive Moving Average (ARMA) models
Work hors