Regression From a Forecasting Perspective
Zhenhao Gong, University of Connecticut
Welcome
This course is designed to be:
1. Introductory
2. Driven by interesting questions and applications
3. Less math, useful, and fun!
Most important:
Feel free to ask any questions!
Enjoy!
Regression in time series data
Basic regression model:
yt = β0 + β1xt + εt
εt ∼ iid (0, σ²), t = 1,··· ,T,

where β0, β1, and σ² are called the model's parameters. The index t keeps track of time.
Conditional expectation
If the regression model postulated above holds true, then the expected value of y conditional upon x∗ is,
E(y|x∗) = β0 + β1x∗,
so the regression function is the conditional expectation of y. In fact, as we will see later, the expectation of future y conditional upon available information is a particularly good forecast.
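A quick numerical illustration (a sketch with simulated data and made-up parameter values, using numpy): the conditional-mean forecast E(y|x) achieves a much smaller mean squared error than a forecast that ignores x.

```python
import numpy as np

# Simulate from y = b0 + b1*x + eps with illustrative parameter values.
rng = np.random.default_rng(0)
b0, b1 = 1.0, 2.0
x = rng.normal(size=100_000)
y = b0 + b1 * x + rng.normal(size=100_000)

cond_mean = b0 + b1 * x                     # E(y | x): the regression function
print(np.mean((y - cond_mean) ** 2))        # approx. Var(eps) = 1
print(np.mean((y - np.mean(y)) ** 2))       # unconditional-mean forecast: much larger MSE
```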
OLS Estimation
We assume the model sketched above is true in the population and estimate the unknown parameters by solving the problem

min_{β0,β1} Σ_{t=1}^T ε²t  (the sum of squared error terms)
= min_{β0,β1} Σ_{t=1}^T [yt − β0 − β1xt]².

We denote the set of estimated parameters by βˆ, and its elements by βˆ0 and βˆ1.
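A minimal sketch of this minimization (simulated data with illustrative parameter values; numpy's least-squares solver does the optimization):

```python
import numpy as np

# Simulate a small sample from y_t = 1 + 2*x_t + eps_t (illustrative values).
rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=T)

# OLS: minimize the sum of squared errors over (beta0, beta1).
X = np.column_stack([np.ones(T), x])        # design matrix with an intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                             # (beta0_hat, beta1_hat)
```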
Fitted values
The fitted values, or in-sample forecasts, are

yˆt = βˆ0 + βˆ1xt, t = 1,··· ,T.

The corresponding residuals, or in-sample forecast errors, are

et = yt − yˆt, t = 1,··· ,T.
Remark: Systematic patterns in forecast errors indicate that the forecasting model is inadequate; forecast errors from a good forecasting model must be unforecastable!
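A sketch of one such pattern check (simulated data; names are illustrative): compute the residuals and their lag-1 autocorrelation, which should be near zero for an adequate model.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=T)

X = np.column_stack([np.ones(T), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fit = X @ beta_hat                        # in-sample forecasts
e = y - y_fit                               # in-sample forecast errors

# Lag-1 autocorrelation of the residuals: near zero if errors are unforecastable.
print(np.corrcoef(e[:-1], e[1:])[0, 1])
```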
In-sample vs out-of-sample forecast
In-sample forecasting: forecasting for an observation that was part of the data sample.
Out-of-sample forecasting: forecasting for an observation that was not part of the data sample.
Example: if you use data from 1990-2013 to fit the model and then forecast for 2011-2013, it is an in-sample forecast. But if you use only 1990-2010 to fit the model and then forecast for 2011-2013, it is an out-of-sample forecast.
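The distinction in code (a sketch with simulated data; the 100/20 split is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 120
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=T)

split = 100                                 # fit on the first 100 obs only
X = np.column_stack([np.ones(T), x])
beta_hat, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)

in_sample = X[:split] @ beta_hat            # forecasts for observations used in fitting
out_of_sample = X[split:] @ beta_hat        # forecasts for held-out observations
print(np.mean((y[split:] - out_of_sample) ** 2))   # out-of-sample MSE
```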
Multiple regressors
Multiple linear regression model:
yt = β0 + β1xt + β2zt + εt
εt ∼ iid (0, σ²), t = 1,··· ,T.

We estimate the coefficients by OLS. The fitted values are yˆt = βˆ0 + βˆ1xt + βˆ2zt, and the corresponding residuals are

et = yt − yˆt, t = 1,··· ,T.
Remark: Each estimated coefficient gives the weight put on the corresponding variable in forming the best linear forecast of y.
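A sketch of the two-regressor case (simulated data, illustrative coefficients), including a forecast formed as the weighted combination the remark describes:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200
x, z = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 2.0 * x - 0.5 * z + rng.normal(scale=0.5, size=T)

X = np.column_stack([np.ones(T), x, z])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Forecast of y at new values x* = 0.5, z* = -1.0: the estimated weights
# applied to the corresponding regressors (plus the constant).
print(beta_hat @ np.array([1.0, 0.5, -1.0]))
```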
Forecasting results: EViews format
Std. Error

Std. Error:
indicates the sampling variability, and hence the reliability, of each estimated coefficient.
95% confidence interval: [βˆ − 1.96 SE(βˆ), βˆ + 1.96 SE(βˆ)].
large coefficient standard errors translate into wide confidence intervals ⇒ imprecise estimation.

t-statistics: test the hypothesis of variable irrelevance
(β = 0 ⇒ the variable contributes nothing and can be dropped).
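A sketch of how these quantities are computed from an OLS fit (simulated data; the usual formulas, with s² = SSR/(T − k)):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=T)

X = np.column_stack([np.ones(T), x])
k = X.shape[1]
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

s2 = e @ e / (T - k)                        # estimate of sigma^2
cov = s2 * np.linalg.inv(X.T @ X)           # covariance matrix of beta_hat
se = np.sqrt(np.diag(cov))                  # standard errors

print(beta_hat / se)                        # t-statistics for beta_j = 0
print(np.column_stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se]))  # 95% CIs
```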
Prob.
Probability value (P-value):
the probability of getting a value of the t statistic at least as large in absolute value as the one actually obtained, assuming that the irrelevance hypothesis is true.
the smaller the probability value, the stronger the evidence against irrelevance.
probability values below 0.05 are viewed as very strong evidence against irrelevance.
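Computing a two-sided p-value from a t-statistic (a sketch; the t-statistic and degrees of freedom are made-up values, and scipy is assumed available):

```python
from scipy import stats

t_stat, dof = 2.4, 198                      # illustrative values
p_value = 2 * stats.t.sf(abs(t_stat), dof)  # P(|t| >= observed) under irrelevance
print(p_value)
```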
Sum squared residuals

Sum squared residuals: records the minimized value of the sum of squared residuals,

SSR = Σ_{t=1}^T e²t.
It serves as an input to other diagnostics that we’ll discuss shortly.
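In code, given the residual vector (values are illustrative):

```python
import numpy as np

e = np.array([0.3, -0.1, 0.2, -0.4, 0.1])   # illustrative residuals
print(e @ e)                                # SSR: sum of squared residuals
```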
Log likelihood

Log likelihood: The likelihood function is the joint density function of the data, viewed as a function of the model parameters. Maximizing it yields maximum likelihood estimation (MLE), which is equivalent to minimizing the sum of squared residuals in the case of normally distributed regression disturbances.
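A sketch of the Gaussian log likelihood evaluated at a set of residuals (illustrative values; with normal disturbances the MLE of σ² is SSR/T):

```python
import numpy as np

def gaussian_loglik(e, sigma2):
    # log of the joint normal density of the disturbances, as a function of sigma^2
    T = len(e)
    return -0.5 * T * np.log(2 * np.pi * sigma2) - (e @ e) / (2 * sigma2)

e = np.array([0.3, -0.1, 0.2, -0.4])        # illustrative residuals
print(gaussian_loglik(e, sigma2=e @ e / len(e)))
```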
F-statistic

F-statistic: tests the hypothesis that the coefficients of all variables in the regression except the intercept are jointly zero. The formula is

F = [(SSRres − SSR)/(k − 1)] / [SSR/(T − k)],

where SSRres is the sum of squared residuals from a restricted regression that contains only an intercept. The statistic examines how much the SSR increases when all the variables except the constant are dropped. If it increases by a great deal, there is evidence that at least one of the variables has predictive content.
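A sketch of the F-statistic computed from the restricted and unrestricted SSRs (simulated data; scipy supplies the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T = 200
x, z = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 2.0 * x - 0.5 * z + rng.normal(scale=0.5, size=T)

X = np.column_stack([np.ones(T), x, z])
k = X.shape[1]
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr = np.sum((y - X @ beta_hat) ** 2)       # unrestricted SSR
ssr_res = np.sum((y - y.mean()) ** 2)       # restricted SSR: intercept only

F = ((ssr_res - ssr) / (k - 1)) / (ssr / (T - k))
print(F, stats.f.sf(F, k - 1, T - k))       # statistic and its p-value
```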
SER

S.E. of regression (SER):

SER = √s², where s² = Σ_{t=1}^T e²t / (T − k).

s² is the sample variance of the observed residuals, et. It is a natural estimator of σ², the population variance of the unobserved disturbances, εt.
s² is used to assess the goodness of fit of the model, as well as the likely magnitude of the forecast errors.
The larger s² is, the worse the model's fit and the larger the forecast errors. (A rough benchmark: SER ≤ 10 or 15% of ȳ.)
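In code (residual values are illustrative):

```python
import numpy as np

def ser(e, k):
    # standard error of the regression: sqrt(SSR / (T - k))
    T = len(e)
    return np.sqrt(e @ e / (T - k))

e = np.array([0.3, -0.1, 0.2, -0.4, 0.1])   # illustrative residuals
print(ser(e, k=2))
```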
R-squared

R-squared (R²):

R² = 1 − SSres/SStot = 1 − Σ_{t=1}^T e²t / Σ_{t=1}^T (yt − ȳ)²,

the percent of the variance of y explained by the variables included in the regression.
SSres, the residual sum of squares: Σ_{t=1}^T (yt − yˆt)².
SStot, the total sum of squares: Σ_{t=1}^T (yt − ȳ)².
R-squared must be between zero and one.
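A direct translation of the formula (data values are illustrative):

```python
import numpy as np

def r_squared(y, y_fit):
    ss_res = np.sum((y - y_fit) ** 2)       # SSres: residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # SStot: total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
y_fit = np.array([1.1, 1.9, 3.2, 3.8])      # illustrative fitted values
print(r_squared(y, y_fit))
```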
Adjusted R-squared

Adjusted R-squared (R̄²):

R̄² = 1 − [Σ_{t=1}^T e²t / (T − k)] / [Σ_{t=1}^T (yt − ȳ)² / (T − 1)],
where k is the number of right-hand-side variables, including the constant term.
incorporates adjustments for degrees of freedom used in fitting the model.
a more trustworthy goodness-of-fit measure than R² in multiple regression models.
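The same computation with the degrees-of-freedom adjustment (a sketch; k counts the right-hand-side variables including the constant):

```python
import numpy as np

def adjusted_r_squared(y, y_fit, k):
    T = len(y)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    # penalize the degrees of freedom used in fitting the model
    return 1.0 - (ss_res / (T - k)) / (ss_tot / (T - 1))

y = np.array([1.0, 2.0, 3.0, 4.0])
y_fit = np.array([1.1, 1.9, 3.2, 3.8])      # illustrative fitted values
print(adjusted_r_squared(y, y_fit, k=2))
```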
AIC

Akaike info criterion (AIC):

AIC = e^(2k/T) · Σ_{t=1}^T e²t / T.

An estimate of the out-of-sample forecast error variance, as is s², but it penalizes degrees of freedom more harshly. It is used to select among competing forecasting models.
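The formula above in code (residuals are illustrative):

```python
import numpy as np

def aic(e, k):
    # exp(2k/T) * SSR / T: error-variance estimate with a dof penalty
    T = len(e)
    return np.exp(2 * k / T) * (e @ e) / T

e = np.array([0.3, -0.1, 0.2, -0.4, 0.1])   # illustrative residuals
print(aic(e, k=2))
```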
SIC/BIC

Schwarz/Bayesian information criterion (SIC/BIC):

SIC = T^(k/T) · Σ_{t=1}^T e²t / T.
An alternative to the AIC with the same interpretation, but a still harsher degrees-of-freedom penalty.
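And the SIC, with its harsher penalty (same illustrative residuals):

```python
import numpy as np

def sic(e, k):
    # T**(k/T) * SSR / T: like the AIC but with a harsher dof penalty
    T = len(e)
    return T ** (k / T) * (e @ e) / T

e = np.array([0.3, -0.1, 0.2, -0.4, 0.1])   # illustrative residuals
print(sic(e, k=2))
```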
Durbin-Watson Statistic

The Durbin-Watson (DW) statistic tests for serial correlation in the regression disturbances. It works within the context of the model

yt = β0 + β1xt + β2zt + εt
εt = φεt−1 + vt
vt ∼ iid (0, σ²), t = 1,··· ,T.

The regression disturbance is serially correlated when φ ≠ 0 (εt follows an AR(1) process). The hypothesis of interest is that φ = 0.
The formula for the DW test is
DW = Σ_{t=2}^T (et − et−1)² / Σ_{t=1}^T e²t.
DW takes values in the interval [0, 4]. A value near 2 indicates non-autocorrelation; a value toward 0 indicates positive autocorrelation; a value toward 4 indicates negative autocorrelation.
As a rough rule of thumb, if DW is less than 1.5, there may be cause for alarm.
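The statistic in code (white-noise residuals are simulated for illustration, so DW should come out near 2):

```python
import numpy as np

def durbin_watson(e):
    # sum of squared first differences of e over the sum of squared e
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(6)
e = rng.normal(size=200)                    # illustrative white-noise residuals
print(durbin_watson(e))
```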
Residual plot
Plot the actual data (yt’s), the fitted values (yˆt’s), and the residuals (et = yt − yˆt) in a single graph to assess the adequacy of the model.
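A sketch of such a graph with matplotlib (simulated data; parameter values are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
T = 100
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=T)

X = np.column_stack([np.ones(T), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_fit = X @ beta_hat

fig, ax = plt.subplots()
ax.plot(y, label="actual")                  # y_t
ax.plot(y_fit, label="fitted")              # y_hat_t
ax.plot(y - y_fit, label="residual")        # e_t
ax.legend()
plt.show()
```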