Economics 430
Lecture 2 Modeling and Forecasting Trend
Today’s Class 1 of 2
• Loss Function
• Modeling Trend
  – Linear
  – Quadratic
  – Log-linear
  – Exponential
  – Cyclical or Seasonal (Cosine)
• Model Selection
• R Example
Today’s Class 2 of 2
• Forecasting Challenges
• Forecasting Environments
• Model Selection
  – MSE
  – AIC
  – SIC
• Trend Fitting via Periodic Functions
• Trend Fitting via Holt-Winters Filtering
• R Example
Loss Function 1 of 4
• Good forecasts lead to good decisions! There is a strong link between forecasts and decisions.
• Example: You have started a firm and need to decide (now) how much inventory to hold going into the next sales period.
Strategy:
  – Demand is high → build inventory
  – Demand is low → reduce inventory
Loss Function 2 of 4
Loss                Demand High   Demand Low
Build Inventory     0             $10,000
Reduce Inventory    $10,000       0

Symmetric Loss Structure: Both bad outcomes have the same loss.

Loss                Demand High   Demand Low
Build Inventory     0             $10,000
Reduce Inventory    $20,000       0

Asymmetric Loss Structure: Outcomes have different losses.
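To see how the loss structure drives the decision, here is a minimal Python sketch built on the asymmetric table above. The probability of high demand, `p_high`, and the helper names are hypothetical additions for illustration; the sketch simply picks the decision with the smallest expected loss.

```python
# Asymmetric loss table from the slide: LOSS[decision][demand outcome], in dollars
LOSS = {
    "build inventory": {"high": 0, "low": 10_000},
    "reduce inventory": {"high": 20_000, "low": 0},
}


def expected_loss(decision: str, p_high: float) -> float:
    """Expected loss of a decision when demand is high with probability p_high."""
    row = LOSS[decision]
    return p_high * row["high"] + (1 - p_high) * row["low"]


def best_decision(p_high: float) -> str:
    """Hypothetical helper: the decision with the smallest expected loss."""
    return min(LOSS, key=lambda d: expected_loss(d, p_high))


if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75):
        print(p, best_decision(p), {d: expected_loss(d, p) for d in LOSS})
```

Under this table, building inventory has the lower expected loss whenever p_high > 1/3 (where 20,000 p = 10,000 (1 − p)), rather than p_high > 1/2 as in the symmetric case: the costlier mistake shifts the decision threshold.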
Loss Function 3 of 4
• For every decision-making problem, there is an associated loss structure: for each decision/outcome pair, there is an associated loss.
  – Correct decision: loss = 0
  – Incorrect decision: loss > 0
• We could also forecast the sales!

Loss                     High Actual Sales   Low Actual Sales
High Forecasted Sales    0                   $10,000
Low Forecasted Sales     $10,000             0

Forecasting with Symmetric Loss: Both bad forecasts have the same loss.
Loss Function 4 of 4
• Forecast Error (e): The difference between the realization (y) and the previously made forecast (ŷ): $e = y - \hat{y}$
• Loss Function (L(e)): The loss associated with a forecast error. It must satisfy: (1) $L(0) = 0$, (2) $L(e)$ is continuous, and (3) $L(e)$ is increasing on each side of the origin.
Examples of loss functions:
  – Quadratic Loss Function: $L(e) = e^2$
  – Absolute Loss Function: $L(e) = |e|$
[Figure: loss plotted against error for the quadratic and absolute loss functions]
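Both example loss functions satisfy L(0) = 0, are continuous, and increase on each side of the origin. A short Python sketch (the actual and forecast values are made up for illustration) shows that quadratic loss penalizes large errors much more heavily than absolute loss:

```python
def forecast_errors(actual, forecast):
    """Forecast errors e = y - y_hat, element by element."""
    return [y - y_hat for y, y_hat in zip(actual, forecast)]


def quadratic_loss(e: float) -> float:
    """L(e) = e^2."""
    return e * e


def absolute_loss(e: float) -> float:
    """L(e) = |e|."""
    return abs(e)


if __name__ == "__main__":
    actual = [10.0, 12.0, 9.0]
    forecast = [12.0, 11.5, 6.0]
    errors = forecast_errors(actual, forecast)    # [-2.0, 0.5, 3.0]
    print([quadratic_loss(e) for e in errors])   # [4.0, 0.25, 9.0]
    print([absolute_loss(e) for e in errors])    # [2.0, 0.5, 3.0]
```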
Modeling Trend
• Trend: The slow, long-run evolution in the variables that we want to model and forecast.
• Deterministic Trend: A trend that evolves in a perfectly predictable way.
• To characterize a particular trend, we need a model. For example, in the case of linear regression, the model is: $y_t = \beta_0 + \beta_1 x_t$
• Given the broad range of time scales encountered in time series, it is often convenient to adopt one common time variable (time dummy or time trend) such that $\mathrm{TIME} = (1, 2, \ldots, T)$*, where TIME = 1 is the first period of the sample, and so on.
*The notation convention is $\mathrm{TIME}_t = t$.
Modeling Trend (Linear)
http://stateofworkingamerica.org/charts/labor-force-participation-rate-of-population-age-16-and-older-by-gender/
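[Figure: labor force participation rate by gender, one series with an increasing trend and one with a decreasing trend]
Model: $T_t = \beta_0 + \beta_1 \mathrm{TIME}_t$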
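A minimal Python sketch of fitting the linear trend by least squares, using the time variable $\mathrm{TIME}_t = t$. The function names are hypothetical; `r_squared` mirrors the fit statistic reported on the next slides.

```python
def fit_linear_trend(y):
    """Least-squares fit of T_t = b0 + b1 * TIME_t with TIME_t = t = 1, ..., T."""
    T = len(y)
    time = range(1, T + 1)
    t_bar = (T + 1) / 2                 # mean of 1, ..., T
    y_bar = sum(y) / T
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(time, y))
    sxx = sum((t - t_bar) ** 2 for t in time)
    b1 = sxy / sxx
    b0 = y_bar - b1 * t_bar
    return b0, b1


def r_squared(y, fitted):
    """R^2 = 1 - SSR / total sum of squares."""
    y_bar = sum(y) / len(y)
    ssr = sum((v - f) ** 2 for v, f in zip(y, fitted))
    sst = sum((v - y_bar) ** 2 for v in y)
    return 1 - ssr / sst


if __name__ == "__main__":
    y = [50.1, 50.9, 52.2, 52.8, 54.1, 55.0]   # hypothetical data
    b0, b1 = fit_linear_trend(y)
    fitted = [b0 + b1 * t for t in range(1, len(y) + 1)]
    print(b0, b1, r_squared(y, fitted))
```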
Modeling Trend Linear
[Figure: female labor force participation rate, actual vs. fitted linear trend, 1948-1990]
$R^2 = 0.97$. Excellent!
Do you see anything wrong?
Modeling Trend Linear
[Figure: male labor force participation rate, actual vs. fitted linear trend, 1948-1990]
$R^2 = 0.95$. Excellent!
Do you see anything wrong?
Modeling Trend Linear
[Figure: linear trend fit with U.S. recession bands shaded]
Model: $T_t = \beta_0 + \beta_1 \mathrm{TIME}_t$
Modeling Trend Quadratic
Model: $T_t = \beta_0 + \beta_1 \mathrm{TIME}_t + \beta_2 \mathrm{TIME}_t^2$
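The quadratic trend is still linear in its parameters, so ordinary least squares applies with regressors 1, TIME, and TIME². A dependency-free Python sketch that solves the normal equations $X'X\beta = X'y$ (helper names are hypothetical; R's `lm()` uses a QR decomposition instead, which is numerically more robust for long series):

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def fit_quadratic_trend(y):
    """Fit T_t = b0 + b1*TIME_t + b2*TIME_t^2 with TIME_t = 1, ..., T."""
    X = [[1.0, t, t * t] for t in range(1, len(y) + 1)]
    return ols(X, y)
```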
Modeling Trend Log-Linear
Model: $\log(T_t) = \beta_0 + \beta_1 \mathrm{TIME}_t$
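The log-linear trend is fitted by regressing $\log(y_t)$ on $\mathrm{TIME}_t$; $\beta_1$ is then (approximately) the constant per-period growth rate. A minimal Python sketch with a hypothetical function name, assuming strictly positive data:

```python
import math


def fit_log_linear_trend(y):
    """Regress log(y_t) on TIME_t = t; returns (b0, b1) of log(T_t) = b0 + b1*TIME_t."""
    if any(v <= 0 for v in y):
        raise ValueError("log-linear trend requires strictly positive data")
    T = len(y)
    log_y = [math.log(v) for v in y]
    t_bar = (T + 1) / 2
    l_bar = sum(log_y) / T
    sxy = sum((t - t_bar) * (v - l_bar) for t, v in zip(range(1, T + 1), log_y))
    sxx = sum((t - t_bar) ** 2 for t in range(1, T + 1))
    b1 = sxy / sxx
    return l_bar - b1 * t_bar, b1


if __name__ == "__main__":
    y = [100 * math.exp(0.03 * t) for t in range(1, 21)]   # hypothetical 3% growth
    b0, b1 = fit_log_linear_trend(y)
    print(math.exp(b0), b1)   # level at TIME = 0 and per-period growth rate
```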
Modeling Trend Exponential
Model: $T_t = \beta_0 e^{\beta_1 \mathrm{TIME}_t}$
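Assuming the exponential trend takes the usual form $T_t = \beta_0 e^{\beta_1 \mathrm{TIME}_t}$, taking logs gives the log-linear model, but fitting the exponential trend directly to the levels requires nonlinear least squares. The sketch below uses Gauss-Newton iterations (our choice of algorithm, started from the log-linear estimates); the function names are hypothetical.

```python
import math


def fit_exponential_trend(y, max_iter=100, tol=1e-12):
    """Nonlinear least squares for T_t = b0 * exp(b1 * t), t = 1, ..., T (Gauss-Newton)."""
    T = len(y)
    time = list(range(1, T + 1))
    # Starting values: log-linear regression of log(y_t) on t
    log_y = [math.log(v) for v in y]
    t_bar, l_bar = (T + 1) / 2, sum(log_y) / T
    b1 = sum((t - t_bar) * (v - l_bar) for t, v in zip(time, log_y)) / sum(
        (t - t_bar) ** 2 for t in time
    )
    b0 = math.exp(l_bar - b1 * t_bar)
    for _ in range(max_iter):
        growth = [math.exp(b1 * t) for t in time]
        resid = [v - b0 * g for v, g in zip(y, growth)]
        j0 = growth                                        # d f / d b0
        j1 = [b0 * t * g for t, g in zip(time, growth)]    # d f / d b1
        # Gauss-Newton step: solve (J'J) delta = J' resid (2 x 2, Cramer's rule)
        a00 = sum(x * x for x in j0)
        a01 = sum(x * z for x, z in zip(j0, j1))
        a11 = sum(z * z for z in j1)
        g0 = sum(x * r for x, r in zip(j0, resid))
        g1 = sum(z * r for z, r in zip(j1, resid))
        det = a00 * a11 - a01 * a01
        d0 = (a11 * g0 - a01 * g1) / det
        d1 = (a00 * g1 - a01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def ssr_levels(y, b0, b1):
    """Sum of squared residuals of the exponential trend, measured in levels."""
    return sum((v - b0 * math.exp(b1 * t)) ** 2 for t, v in enumerate(y, start=1))
```

Because the two fits minimize squared errors on different scales (logs vs. levels), they generally give slightly different estimates of $\beta_0$ and $\beta_1$.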
Model Selection via AIC and BIC
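Model          df   AIC          BIC
Linear         3    1229.0482    1241.7865
Quadratic      4    808.3932     825.3776
Log-Linear     3    -3361.0625   -3348.3242
Exponential    3    1160.3758    1173.1141

The smaller the AIC/BIC value, the better the model.
The log-linear model is fitted to $\log(T_t)$, so its AIC/BIC are on a different scale and cannot be compared directly with those of the models fitted in levels. Among the models fitted in levels, both AIC and BIC select the quadratic fit as the preferred model.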
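The df column equals the number of regression coefficients plus one for the error variance (e.g., 3 for the linear trend). A Python sketch of the criteria in the form R's `AIC()`/`BIC()` use for Gaussian least-squares fits (our reading of that convention), plus the Jacobian term that puts a model for $\log(y_t)$ on the same footing as models for $y_t$. Function names are hypothetical.

```python
import math


def gaussian_loglik(ssr: float, T: int) -> float:
    """Maximized Gaussian log-likelihood of a least-squares fit (sigma^2 estimated as ssr/T)."""
    return -0.5 * T * (math.log(2 * math.pi) + math.log(ssr / T) + 1)


def aic(ssr: float, T: int, df: int) -> float:
    """AIC = -2 logL + 2 df, where df counts the coefficients plus the error variance."""
    return -2 * gaussian_loglik(ssr, T) + 2 * df


def bic(ssr: float, T: int, df: int) -> float:
    """BIC = -2 logL + df * log(T)."""
    return -2 * gaussian_loglik(ssr, T) + df * math.log(T)


def to_level_scale(criterion_log_model: float, y) -> float:
    """Add the Jacobian term 2 * sum(log y_t) so that a criterion computed for a model
    of log(y_t) can be compared with criteria of models fitted to y_t itself."""
    return criterion_log_model + 2 * sum(math.log(v) for v in y)
```

The Jacobian term follows from the change of variables: if $z_t = \log y_t$, the density of $y_t$ is the density of $z_t$ divided by $y_t$, so $-2\log L$ rises by $2\sum_t \log y_t$.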
Modeling Trend
Random Walk with Linear Time Trend
Modeling Trend Cyclical or Seasonal Trends
Modeling Trend Cyclical or Seasonal Trends
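Model: $Y_t = \mu_t + X_t$, where $Y_t$ represents the series and $E[X_t] = 0$ for all $t$.
Seasonal Means: $\mu_t = \beta_j$ whenever observation $t$ falls in month $j$, $j = 1, \ldots, 12$; that is, twelve constant parameters $\beta_1, \ldots, \beta_{12}$ giving the expected average temperature for each of the 12 months.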
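With one dummy variable per month and no intercept, the least-squares estimate of each seasonal mean is simply the average of the observations that fall in that month. A minimal Python sketch, assuming the series starts in the first month of the cycle and covers at least one full cycle (the function name is hypothetical):

```python
def seasonal_means(y, period=12):
    """Least-squares estimates of the seasonal-means model mu_t = beta_j (j = season of t).

    Assumes y[0] belongs to the first season (e.g., January).
    """
    sums = [0.0] * period
    counts = [0] * period
    for t, value in enumerate(y):
        sums[t % period] += value
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]
```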
Modeling Trend Cyclical or Seasonal Trends
Modeling Trend Cosine Trends
Model: $\mu_t = \beta \cos(2\pi f t + \Phi)$, with amplitude $\beta$, frequency $f$, and phase $\Phi$.
Difficult to estimate because the model is nonlinear in the parameters $\beta$, $f$, and $\Phi$.
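Modeling Trend Cosine Trends
Easier model to estimate: $\beta \cos(2\pi f t + \Phi) = \beta_1 \cos(2\pi f t) + \beta_2 \sin(2\pi f t)$, with $\beta_1 = \beta \cos\Phi$ and $\beta_2 = -\beta \sin\Phi$. For a fixed frequency $f$, this is linear in $\beta_1$ and $\beta_2$, which can be estimated by least squares.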
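Rewriting $\beta\cos(2\pi f t + \Phi)$ as $\beta_1\cos(2\pi f t) + \beta_2\sin(2\pi f t)$ turns the cosine trend into an ordinary regression once $f$ is fixed. The Python sketch below assumes monthly data ($f = 1/12$) and adds an intercept $\beta_0$ (a common choice, not shown in the formula above); it recovers the amplitude as $\sqrt{\beta_1^2 + \beta_2^2}$ and the phase from $\beta_1 = \beta\cos\Phi$, $\beta_2 = -\beta\sin\Phi$. Names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_cosine_trend(y, f=1 / 12):
    """Fit mu_t = b0 + b1*cos(2*pi*f*t) + b2*sin(2*pi*f*t), t = 1, ..., T, by least squares.

    Returns (b0, amplitude, phase) using b1 = beta*cos(Phi) and b2 = -beta*sin(Phi).
    """
    X = [[1.0, math.cos(2 * math.pi * f * t), math.sin(2 * math.pi * f * t)]
         for t in range(1, len(y) + 1)]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(3)]
    b0, b1, b2 = solve(XtX, Xty)
    amplitude = math.hypot(b1, b2)   # beta = sqrt(b1^2 + b2^2)
    phase = math.atan2(-b2, b1)      # cos(Phi) has the sign of b1, sin(Phi) the sign of -b2
    return b0, amplitude, phase
```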
Modeling Trend Cosine Trends
Modeling Trend Cosine Trends
Source: Time Series Analysis, W. S. Wei
Forecasting Trend 1 of 2
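• Example (Point Forecast): We are at time T and want to use a trend model to forecast the h-step-ahead value.
• Assume a linear trend: $y_t = \beta_0 + \beta_1 \mathrm{TIME}_t + \varepsilon_t$
• At time T+h: $y_{T+h} = \beta_0 + \beta_1 \mathrm{TIME}_{T+h} + \varepsilon_{T+h}$
• Point forecast: $y_{T+h,T} = \beta_0 + \beta_1 \mathrm{TIME}_{T+h}$, since $\varepsilon_{T+h}$ is zero-mean random noise (its forecast is 0). The subscript $T+h,T$ means the forecast is for $t = T+h$ but based on information available at $t = T$.
• However, $\beta_0$ and $\beta_1$ are unknown. Solution: replace them with their least-squares (LS) estimates $\hat{\beta}_0$ and $\hat{\beta}_1$.
• Point Forecast: $\hat{y}_{T+h,T} = \hat{\beta}_0 + \hat{\beta}_1 \mathrm{TIME}_{T+h}$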
Forecasting Trend 2 of 2
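• Example (Interval Forecast): Same idea as before. Assume the trend regression disturbance is normally distributed, $\varepsilon_t \sim N(0, \sigma^2)$. Then:
• Interval Forecast (95%): $y_{T+h,T} \pm 1.96\,\sigma$
• In practice, use: $\hat{y}_{T+h,T} \pm 1.96\,\hat{\sigma}$
• Example (Density Forecast): Same idea, yet again!
• Density Forecast: $N(y_{T+h,T}, \sigma^2)$
• In practice, use: $N(\hat{y}_{T+h,T}, \hat{\sigma}^2)$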
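The three forecasts can be computed together from one least-squares fit. A Python sketch of the "in practice" versions, assuming $\hat{\sigma}$ is the degrees-of-freedom-adjusted residual standard deviation (one common choice) and ignoring parameter-estimation uncertainty, as the formulas do; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def linear_trend_forecast(y, h, level=0.95):
    """h-step-ahead point, interval, and density forecasts from an LS linear trend."""
    T = len(y)
    time = range(1, T + 1)
    t_bar, y_bar = (T + 1) / 2, sum(y) / T
    sxx = sum((t - t_bar) ** 2 for t in time)
    b1 = sum((t - t_bar) * (v - y_bar) for t, v in zip(time, y)) / sxx
    b0 = y_bar - b1 * t_bar
    resid = [v - (b0 + b1 * t) for t, v in zip(time, y)]
    sigma_hat = math.sqrt(sum(e * e for e in resid) / (T - 2))   # df-adjusted estimate
    point = b0 + b1 * (T + h)                                   # TIME_{T+h} = T + h
    z = NormalDist().inv_cdf(0.5 + level / 2)                   # about 1.96 for level = 0.95
    interval = (point - z * sigma_hat, point + z * sigma_hat)
    density = NormalDist(mu=point, sigma=sigma_hat)             # N(y_hat, sigma_hat^2)
    return point, interval, density
```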
Forecasting Trend (Example)
[Figure: linear trend forecast with a 95% confidence interval and a 95% prediction interval]
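The figure contrasts two intervals: a confidence interval covers uncertainty about the fitted trend line only, while a prediction interval also includes the variance of the new disturbance, so it is always wider. For a linear trend in TIME, the standard errors are $\hat{\sigma}\sqrt{1/T + (t_0 - \bar{t})^2/S_{tt}}$ and $\hat{\sigma}\sqrt{1 + 1/T + (t_0 - \bar{t})^2/S_{tt}}$. R's `predict()` for `lm` fits (with `interval = "confidence"` or `"prediction"`) uses Student-t quantiles; this Python sketch substitutes the normal quantile, an approximation that is close for moderate T.

```python
import math
from statistics import NormalDist


def trend_intervals(y, h, level=0.95):
    """Confidence interval for the trend line and prediction interval for y_{T+h}."""
    T = len(y)
    time = range(1, T + 1)
    t_bar, y_bar = (T + 1) / 2, sum(y) / T
    sxx = sum((t - t_bar) ** 2 for t in time)
    b1 = sum((t - t_bar) * (v - y_bar) for t, v in zip(time, y)) / sxx
    b0 = y_bar - b1 * t_bar
    s = math.sqrt(sum((v - b0 - b1 * t) ** 2 for t, v in zip(time, y)) / (T - 2))
    t0 = T + h
    point = b0 + b1 * t0
    leverage = 1 / T + (t0 - t_bar) ** 2 / sxx
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal approximation to the t quantile
    se_mean = s * math.sqrt(leverage)           # uncertainty about the trend line only
    se_pred = s * math.sqrt(1 + leverage)       # adds the variance of the new disturbance
    confidence = (point - z * se_mean, point + z * se_mean)
    prediction = (point - z * se_pred, point + z * se_pred)
    return point, confidence, prediction
```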
Forecasting Environments
• The data sample is divided into two parts: usually 2/3 are used for estimation and 1/3 for prediction.
• Def: Estimation Sample: This sample is used for estimating the model and its parameters.
• Def: Prediction Sample: This sample is used to assess the accuracy of the forecast.
• Forecasting Methods:
  – Recursive
  – Rolling
  – Fixed
Forecasting Challenges
• Lack of understanding of the phenomenon
• Lack of statistical methods
• High uncertainty
• Lack of integration of skills
One-step-ahead prediction at time t: Recursive Scheme
• Estimation sample: observations up to t (t observations); prediction sample: t+1 to T. Forecast $f_{t,1}$ of $Y_{t+1}$, with error $e_{t,1}$.
• Estimation sample: observations up to t+1 (t+1 observations). Forecast $f_{t+1,1}$ of $Y_{t+2}$, with error $e_{t+1,1}$.
• …
• Estimation sample: observations up to t+j (t+j observations). Forecast $f_{t+j,1}$ of $Y_{t+j+1}$, with error $e_{t+j,1}$.
Model parameters are updated one observation at a time.
One-step-ahead prediction at time t: Rolling Scheme
• Estimation sample: the window of t observations ending at t; prediction sample: t+1 to T. Forecast $f_{t,1}$ of $Y_{t+1}$, with error $e_{t,1}$.
• Estimation sample: the window of t observations ending at t+1 (starting at 1). Forecast $f_{t+1,1}$ of $Y_{t+2}$, with error $e_{t+1,1}$.
• …
• Estimation sample j: the window of t observations from j to t+j. Forecast $f_{t+j,1}$ of $Y_{t+j+1}$, with error $e_{t+j,1}$.
Model parameters are updated using a fixed 'window' of observations.
One-step-ahead prediction at time t: Fixed Scheme
• Estimation sample: observations up to t (t observations); prediction sample: t+1 to T. Forecast $f_{t,1}$ of $Y_{t+1}$, with error $e_{t,1}$.
• The estimation sample is unchanged; only the information set is updated with observation t+1. Forecast $f_{t+1,1}$ of $Y_{t+2}$, with error $e_{t+1,1}$.
• …
• Information set updated with observations t+1, …, t+j (estimation sample still ends at t). Forecast $f_{t+j,1}$ of $Y_{t+j+1}$, with error $e_{t+j,1}$.
Model parameters are computed only once.
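The three schemes differ only in which observations are used to re-estimate the model before each one-step-ahead forecast. A Python sketch using a linear trend as the forecasting model (an illustrative choice; any model could be substituted); `one_step_errors` and its arguments are hypothetical names:

```python
def fit_linear_trend(times, values):
    """LS fit of y = b0 + b1 * time on the given estimation sample."""
    n = len(times)
    t_bar, y_bar = sum(times) / n, sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    b1 = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, values)) / sxx
    return y_bar - b1 * t_bar, b1


def one_step_errors(y, t_est, scheme):
    """One-step-ahead forecast errors over the prediction sample y[t_est:].

    y[:t_est] is the initial estimation sample; scheme is 'recursive', 'rolling', or 'fixed'.
    """
    times = list(range(1, len(y) + 1))
    b0, b1 = fit_linear_trend(times[:t_est], y[:t_est])   # fixed scheme: estimated once
    errors = []
    for origin in range(t_est, len(y)):                  # forecast y[origin] from data before it
        if scheme == "recursive":                        # all data up to the origin
            b0, b1 = fit_linear_trend(times[:origin], y[:origin])
        elif scheme == "rolling":                        # window of the t_est latest points
            start = origin - t_est
            b0, b1 = fit_linear_trend(times[start:origin], y[start:origin])
        elif scheme != "fixed":
            raise ValueError(f"unknown scheme: {scheme}")
        forecast = b0 + b1 * times[origin]
        errors.append(y[origin] - forecast)
    return errors
```

For example, with a 2/3 to 1/3 split, `t_est = 2 * len(y) // 3`; averaging the squared errors for each scheme gives its out-of-sample MSE. When the trend changes inside the prediction sample, the rolling scheme adapts fastest and the fixed scheme not at all.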
Model Selection 1 of 9
• Among the various model fits, how do we select the best one?
• Need a measure of “best fit model”.
• There are many metrics used for model selection, e.g., MSE, AIC, SIC, and Mallows' Cp.
• Depending on the forecasting problem at hand, certain metrics will be better suited than others for choosing an optimal model.
Model Selection 3 of 9
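• Mean Squared Error (MSE): $\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T} e_t^2$, where $e_t = y_t - \hat{y}_t$ and $\hat{y}_t$ is the fitted value.
• The model with the smallest MSE is also the model with the smallest sum of squared residuals, and therefore the largest $R^2$:
$R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$, where the denominator is the total sum of squares.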
Model Selection 4 of 9
• As the number of parameters increases, the in-sample MSE cannot increase, even when the extra parameters only fit noise (overfitting)!
• The out-of-sample forecast will not necessarily improve, even though the model's fit to the historical data does.
• In-sample MSE is a downward-biased estimator of the out-of-sample 1-step-ahead prediction error variance, and the size of the bias increases with the number of variables in the model.
Need to include a penalty for including more degrees of freedom (variables)!
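A small simulation illustrates the point. The Python sketch below generates a linear trend plus noise (simulated data, not from the lecture), fits polynomial trends of increasing degree on the first 2/3 of the sample, and reports in-sample and out-of-sample MSE. In-sample MSE never rises as terms are added, while out-of-sample MSE typically does not improve. Time is rescaled by the estimation-sample length to keep the normal equations well conditioned; names are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def polynomial_mse(y_est, y_out, degree):
    """In-sample and out-of-sample MSE of a polynomial trend fitted to y_est."""
    T_est = len(y_est)

    def row(t):
        # powers of rescaled time t / T_est
        return [(t / T_est) ** p for p in range(degree + 1)]

    X = [row(t) for t in range(1, T_est + 1)]
    k = degree + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(X, y_est)) for i in range(k)]
    b = solve(XtX, Xty)

    def fitted(t):
        return sum(c * x for c, x in zip(b, row(t)))

    mse_in = sum((v - fitted(t)) ** 2 for t, v in enumerate(y_est, start=1)) / T_est
    mse_out = sum((v - fitted(t)) ** 2
                  for t, v in enumerate(y_out, start=T_est + 1)) / len(y_out)
    return mse_in, mse_out


if __name__ == "__main__":
    rng = random.Random(0)
    y = [10 + 0.3 * t + rng.gauss(0, 1) for t in range(1, 61)]  # simulated trend + noise
    y_est, y_out = y[:40], y[40:]                                # 2/3 estimation, 1/3 prediction
    for degree in range(1, 5):
        print(degree, polynomial_mse(y_est, y_out, degree))
```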
Model Selection 5 of 9
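• MSE (adjusted for df): $s^2 = \frac{\sum_{t=1}^{T} e_t^2}{T - k}$, where k is the number of degrees of freedom (df) used in model fitting.
• Adjusted $R^2$: $\bar{R}^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2 / (T - k)}{\sum_{t=1}^{T} (y_t - \bar{y})^2 / (T - 1)}$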
Model Selection 6 of 9
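• Since: $s^2 = \left(\frac{T}{T - k}\right) \frac{\sum_{t=1}^{T} e_t^2}{T}$, the factor $\frac{T}{T - k}$ acts as a penalty factor on the MSE.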
Model Selection 7 of 9
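Two popular model selection metrics are:
• Akaike Information Criterion: $\mathrm{AIC} = e^{2k/T} \frac{\sum_{t=1}^{T} e_t^2}{T}$
• Schwarz Information Criterion: $\mathrm{SIC} = T^{k/T} \frac{\sum_{t=1}^{T} e_t^2}{T}$
Note: SIC is more commonly known as the Bayesian Information Criterion (BIC).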
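In the multiplicative form used, for example, in Diebold's Elements of Forecasting (an assumption about the exact formulas intended here), each criterion is the MSE times a penalty factor: $T/(T-k)$ for $s^2$, $e^{2k/T}$ for AIC, and $T^{k/T}$ for SIC. A Python sketch computing all of them from a fit's residuals (the function name is hypothetical):

```python
import math


def selection_criteria(residuals, y, k):
    """In-sample model selection statistics for a fit with k estimated parameters."""
    T = len(residuals)
    ssr = sum(e * e for e in residuals)
    y_bar = sum(y) / T
    sst = sum((v - y_bar) ** 2 for v in y)      # total sum of squares
    mse = ssr / T
    return {
        "MSE": mse,
        "s2": ssr / (T - k),                     # = (T / (T - k)) * MSE
        "R2": 1 - ssr / sst,
        "adj_R2": 1 - (ssr / (T - k)) / (sst / (T - 1)),
        "AIC": math.exp(2 * k / T) * mse,
        "SIC": T ** (k / T) * mse,
    }
```

Since $\log(\mathrm{AIC}) = \log(\mathrm{MSE}) + 2k/T$, this form ranks Gaussian regression models the same way as the $-2\log L + 2k$ form reported by R, provided k is counted consistently across models.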
Model Selection 8 of 9
• Consistency: A model selection criterion is consistent if
(a) when the data-generating process (DGP) is among the models considered, the probability of selecting the true DGP approaches 1 as the sample size increases; and
(b) when the DGP is not among the models considered, the probability of selecting the best approximation to the true DGP approaches 1 as the sample size increases.
  – MSE: inconsistent
  – AIC: biased towards overparameterized models
  – SIC: consistent
• Asymptotic Efficiency: A criterion is asymptotically efficient if, as the sample size grows, the one-step-ahead forecast error variance of the models it selects approaches that of the true model with known parameters at the fastest possible rate.
  – AIC: asymptotically efficient
  – SIC: not asymptotically efficient
Example: Modeling and Forecasting Trend 1 of 10
Monthly Beer Production in Australia, Jan 1956 – Aug 1995
http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf
Example: Modeling and Forecasting Trend 2 of 10
Model 1 (Quadratic)
Example: Modeling and Forecasting Trend 3 of 10
Model 1 (Quadratic): Summary
Example: Modeling and Forecasting Trend 4 of 10
Model 2 (Quadratic + Periodic): add a periodic term.
Example: Modeling and Forecasting Trend 5 of 10
Model 2 (Quadratic + Periodic): Summary
Example: Modeling and Forecasting Trend 6 of 10
Model 1 vs. Model 2

AIC(m1, m2)
           df   AIC
Model 1    4    -509.3847
Model 2    6    -673.7203
Model 2 is better.

BIC(m1, m2)
           df   BIC
Model 1    4    -492.7230
Model 2    6    -648.7278
Model 2 is better.

The smaller the value returned from AIC and BIC, the better the model.
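The same comparison can be reproduced in miniature. The Python sketch below simulates ten years of monthly data with a quadratic trend plus an annual cycle (simulated numbers, not the beer data), fits Model 1 (quadratic) and Model 2 (quadratic + sine and cosine terms with a one-year period), and computes the Gaussian AIC with df = coefficients + 1, matching the df column above. The simulation settings and names are assumptions for illustration.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def least_squares_ssr(X, y):
    """Fit y on the columns of X via the normal equations; return the residual sum of squares."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    return sum((v - sum(c * x for c, x in zip(b, r))) ** 2 for r, v in zip(X, y))


def gaussian_aic(ssr, T, df):
    """AIC = -2 logL + 2 df for a Gaussian regression (df includes the error variance)."""
    return T * (math.log(2 * math.pi) + math.log(ssr / T) + 1) + 2 * df


def compare_quadratic_models(seed=1, years=10):
    """Simulate monthly data and return (AIC of Model 1, AIC of Model 2)."""
    rng = random.Random(seed)
    times = [t / 12 for t in range(1, 12 * years + 1)]   # time measured in years
    y = [100 + 5 * s - 0.3 * s * s + 8 * math.cos(2 * math.pi * s) + rng.gauss(0, 2)
         for s in times]
    m1 = [[1.0, s, s * s] for s in times]                               # quadratic
    m2 = [r + [math.sin(2 * math.pi * s), math.cos(2 * math.pi * s)]    # + periodic term
          for r, s in zip(m1, times)]
    T = len(y)
    return (gaussian_aic(least_squares_ssr(m1, y), T, df=4),
            gaussian_aic(least_squares_ssr(m2, y), T, df=6))


if __name__ == "__main__":
    aic_m1, aic_m2 = compare_quadratic_models()
    print(aic_m1, aic_m2)   # the smaller AIC is preferred
```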
Example: Modeling and Forecasting Trend 7 of 10
Holt-Winters Filter: Considerably better model!
Example: Modeling and Forecasting Trend 8 of 10
Holt-Winters Prediction/Forecast for the next 4 years
[Figure: Holt-Winters fitted values and forecast for the next 4 years]
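The additive Holt-Winters filter updates a level, a slope, and a set of seasonal effects with smoothing weights α, β, γ, and forecasts h steps ahead as level + h × slope + the matching seasonal effect. A Python sketch with fixed smoothing parameters and simple start values (both assumptions: R's `HoltWinters()` chooses start values from a decomposition of the first periods and, by default, estimates α, β, γ by minimizing squared one-step prediction errors). For monthly data, "the next 4 years" corresponds to `period=12, horizon=48`.

```python
def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters filter with fixed smoothing parameters; returns h-step forecasts.

    Start values (an illustrative choice): level and slope from the means of the first two
    seasons, seasonal effects from the first season's deviations around that line.
    """
    m = period
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    mean1 = sum(y[:m]) / m
    mean2 = sum(y[m:2 * m]) / m
    trend = (mean2 - mean1) / m
    level = mean1 + trend * (m - 1) / 2               # level at time m - 1 (0-based)
    season = [y[i] - (mean1 + trend * (i - (m - 1) / 2)) for i in range(m)]
    for t in range(m, len(y)):
        s_old = season[t - m]                         # seasonal effect one cycle ago
        prev_level = level
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season.append(gamma * (y[t] - level) + (1 - gamma) * s_old)
    n = len(y)
    # Most recent seasonal effect in the same position of the cycle as time n - 1 + h
    return [level + h * trend + season[n - m + (h - 1) % m] for h in range(1, horizon + 1)]
```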
Example: Modeling and Forecasting Trend 9 of 10
Holt-Winters Point and Interval Forecast for the next 4 years
[Figure: Holt-Winters point forecast with 80% and 95% prediction intervals]
Example: Modeling and Forecasting Trend 10 of 10
Trend + Seasonal Components Decoupled
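One common way to separate a series into trend and seasonal components is classical additive decomposition, similar in spirit to R's `decompose()`: a centered moving average over one full cycle estimates the trend, and averaging the detrended values at each position in the cycle estimates the seasonal pattern. A Python sketch for an even period (e.g., 12 for monthly data); the function name is hypothetical.

```python
def decompose_additive(y, period):
    """Classical additive decomposition y_t = trend_t + seasonal_t + remainder_t (even period).

    Trend: centered 2 x period moving average (None at the ends, where it is undefined).
    Seasonal: average detrended value at each position in the cycle, centered to sum to zero.
    """
    m = period
    if m % 2:
        raise ValueError("this sketch handles even periods only")
    half = m // 2
    n = len(y)
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]                    # m + 1 points
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / m
    sums, counts = [0.0] * m, [0] * m
    for t in range(n):
        if trend[t] is not None:
            sums[t % m] += y[t] - trend[t]
            counts[t % m] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    mean_raw = sum(raw) / m
    pattern = [r - mean_raw for r in raw]                    # seasonal effects sum to zero
    seasonal = [pattern[t % m] for t in range(n)]
    remainder = [None if tr is None else v - tr - s for v, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder
```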