Week 10: Regression Models
Some of the slides are adapted from the lecture notes provided by Prof. Antoine Saure and Prof. Rob Hyndman
Business Forecasting Analytics
ADM 4307 – Fall 2021
Regression Models (cont’d)
Ahmet Kandakoglu, PhD
15 November, 2021
Outline
• Review of Last Lecture
• The Linear Model with Time Series
• Simple Linear Regression
• Multiple Linear Regression
• Least Squares Estimation
• Evaluating the Regression Model
• Some Useful Predictors
• Selecting Predictors
• Forecasting with Regression
• Nonlinear Regression
• Correlation, Causation and Forecasting
ADM 4307 Business Forecasting Analytics, Fall 2021
Simple Linear Regression
• The basic concept is that we forecast the variable 𝑦 assuming it has a linear relationship with the predictor variable 𝑥:
𝑦𝑡 = 𝛽0 + 𝛽1𝑥𝑡 + 𝜀𝑡
• The model is called simple regression as we only allow one predictor variable 𝑥. The forecast
variable 𝑦 is sometimes also called the dependent or explained variable. The predictor
variable 𝑥 is sometimes also called the independent or explanatory variable.
• The parameters 𝛽0 and 𝛽1 determine the intercept and the slope of the line respectively. The
intercept 𝛽0 represents the predicted value of 𝑦 when 𝑥 = 0. The slope 𝛽1 represents the
average predicted change in 𝑦 resulting from a one unit increase in 𝑥.
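As a minimal sketch with simulated data (the later examples in these slides use TSLM() from the fpp3 package; base R's lm() fits the same least-squares line, and all numbers here are made up):

```r
# Simulated example: true intercept 3, true slope 0.5
set.seed(1)
x <- 1:20
y <- 3 + 0.5 * x + rnorm(20, sd = 0.2)

fit <- lm(y ~ x)     # estimates beta0 and beta1 by least squares
coef(fit)            # estimates are close to the true values
```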
Multiple Linear Regression
• The general form of a multiple regression is
𝑦𝑡 = 𝛽0 + 𝛽1𝑥1,𝑡 + 𝛽2𝑥2,𝑡 +⋯+ 𝛽𝑘 𝑥𝑘,𝑡 + 𝜀𝑡
where 𝑦𝑡 is the variable to be forecast and 𝑥1,𝑡 , … , 𝑥𝑘,𝑡 are the predictor variables.
• The coefficients 𝛽1, … , 𝛽𝑘 measure the effect of each predictor after taking account of
the effect of all other predictors in the model.
• Thus, the coefficients measure the marginal effects of the predictor variables.
Assumptions
• For forecasting, we make the following assumptions about the errors
(𝜀1, … , 𝜀𝑇):
• they have mean zero
• they are not autocorrelated
• they are unrelated to the predictor variables
• It is also useful to have the errors normally distributed with constant variance
in order to produce prediction intervals, but this is not necessary for
forecasting.
Least Squares Estimation
• The values of the coefficients 𝛽0, 𝛽1, … , 𝛽𝑘 are obtained by finding the minimum sum of squares
of the errors. That is, we find the values of 𝛽0, 𝛽1, … , 𝛽𝑘 which minimize
∑ₜ₌₁ᵀ 𝜀𝑡² = ∑ₜ₌₁ᵀ (𝑦𝑡 − 𝛽0 − 𝛽1𝑥1,𝑡 − 𝛽2𝑥2,𝑡 − ⋯ − 𝛽𝑘𝑥𝑘,𝑡)²
• This is called least squares estimation because it gives
the least value of the sum of squared errors.
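The least squares solution can also be written in closed form via the normal equations, (XᵀX)β̂ = Xᵀy. A base-R sketch on simulated data (coefficients are made up), checked against lm():

```r
# Simulated data with two predictors
set.seed(2)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)

X <- cbind(1, x1, x2)                       # design matrix, intercept column first
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # solve the normal equations
fit <- lm(y ~ x1 + x2)

all.equal(as.numeric(beta_hat), unname(coef(fit)))  # same estimates
```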
Goodness-of-Fit
• A common way to summarize how well a linear regression model fits the data is via
the coefficient of determination, or 𝑅².
𝑅² = ∑(ŷ𝑖 − ȳ)² ⁄ ∑(𝑦𝑖 − ȳ)²
where the summations are over all observations.
• Thus, it reflects the proportion of variation in the forecast variable that is accounted for (or explained) by the regression model.
• In all cases, 0 ≤ 𝑅² ≤ 1.
• If the predictions are close to the actual values, we would expect 𝑅² to be close to 1.
On the other hand, if the predictions are unrelated to the actual values, then 𝑅² = 0.
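A quick base-R check of this definition, using the built-in cars data set (chosen only for illustration; any fitted lm model works the same way):

```r
fit <- lm(dist ~ speed, data = cars)   # stopping distance vs speed
y     <- cars$dist
y_hat <- fitted(fit)

R2 <- sum((y_hat - mean(y))^2) / sum((y - mean(y))^2)
all.equal(R2, summary(fit)$r.squared)  # matches the reported R-squared
```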
Evaluating the Regression Model
• After selecting the regression variables and fitting a regression model, it is necessary
to plot the residuals to check that the assumptions of the model have been satisfied.
• There are a series of plots that should be produced in order to check different aspects
of the fitted model and the underlying assumptions (whether the linear model was
appropriate).
• ACF plot of residuals
• Histogram of residuals
• Residual plots against predictors
• Residual plots against fitted values
Some Useful Predictors
• There are several useful predictors that occur frequently when using regression for
time series data:
• Trend
• Dummy variables
• Seasonal dummy variables
• Intervention variables
• Trading days
• Distributed lags
• Holidays
• Fourier series
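As a sketch of how two of these predictors enter the model, a linear trend and seasonal dummy variables are simply extra columns in the regression (simulated quarterly data with made-up coefficients; in the fable package, trend() and season() inside TSLM() build the equivalent columns automatically):

```r
# Simulated quarterly series: linear trend plus a quarterly pattern
n <- 40
trend   <- 1:n
quarter <- factor(rep(1:4, length.out = n))   # expands to 3 dummy variables
set.seed(3)
y <- 10 + 0.3 * trend + c(0, 2, -1, 1)[quarter] + rnorm(n, sd = 0.2)

fit <- lm(y ~ trend + quarter)   # Q1 is the baseline level
coef(fit)                        # intercept, trend, and 3 seasonal dummies
```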
Selecting Predictors
• When there are many predictors, how should we choose which ones to use?
• We need a way of comparing two competing models.
• What not to do!
• Plot 𝑦 against a particular predictor (𝑥𝑖) and if it shows no noticeable relationship, drop it.
• Do a multiple linear regression on all the predictors and disregard all variables whose 𝑝
values are greater than 0.05.
• Maximize 𝑅² or minimize MSE.
• Instead, we will use five measures of predictive accuracy (CV, AIC, AICc, BIC and adjusted 𝑅², written R̄²).
• They can be shown using the glance() function.
Adjusted 𝑅² (R̄²)
• Computer output for regression will always give the 𝑅² value
• This is a useful summary of the model
• However
• 𝑅² does not allow for “degrees of freedom”.
• Adding any variable tends to increase the value of 𝑅², even if that variable is irrelevant.
• To overcome this problem, we can use the adjusted 𝑅² (R̄²).
• Using this measure, the best model will be the one with the largest value of R̄².
• Maximizing R̄² is equivalent to minimizing the standard error of the regression.
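The adjustment is a direct function of 𝑅², the number of observations 𝑇 and the number of predictors 𝑘: R̄² = 1 − (1 − 𝑅²)(𝑇 − 1)/(𝑇 − 𝑘 − 1). A base-R check on the built-in cars data set (an illustrative choice):

```r
fit <- lm(dist ~ speed, data = cars)
T_ <- nrow(cars)
k  <- 1                                  # one predictor
R2 <- summary(fit)$r.squared

adj_R2 <- 1 - (1 - R2) * (T_ - 1) / (T_ - k - 1)
all.equal(adj_R2, summary(fit)$adj.r.squared)  # matches R's reported value
```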
Cross-validation (CV)
• Time series cross-validation is a general tool for determining the predictive ability of a model.
• For regression models, it is also possible to use classical leave-one-out cross-validation to select predictors.
• This is faster and makes more efficient use of the data.
• With many predictors, however, it can be a time-consuming procedure.
• The best model is the one with the smallest value of CV.
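For linear regression there is a well-known shortcut: leave-one-out CV can be computed from a single fit using the leverages (hat values), with no refitting. A base-R sketch on the built-in cars data set (an illustrative choice):

```r
fit <- lm(dist ~ speed, data = cars)
e <- residuals(fit)
h <- hatvalues(fit)             # leverage of each observation

CV <- mean((e / (1 - h))^2)     # equals leave-one-out CV, without T refits
CV
```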
Akaike’s Information Criterion (AIC)
• A closely-related method is Akaike’s Information Criterion.
• This is a penalized likelihood approach.
• Minimizing the AIC gives the best model for prediction.
• AIC penalizes terms more heavily than R̄².
• When the number of observations is small, the AIC tends to select too many
predictors, and so a bias-corrected version of the AIC (the AICc) has been developed.
• As with the AIC, the AICc should be minimized.
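The correction can be applied by hand: AICc = AIC + 2𝑘(𝑘 + 1)/(𝑇 − 𝑘 − 1), where 𝑘 is the number of estimated parameters. A base-R sketch on the built-in cars data set (an illustrative choice; here 𝑘 counts the error variance as a parameter, matching R's AIC()):

```r
fit <- lm(dist ~ speed, data = cars)
T_ <- nrow(cars)
k  <- length(coef(fit)) + 1     # intercept, slope, and the error variance

aic  <- AIC(fit)
aicc <- aic + 2 * k * (k + 1) / (T_ - k - 1)   # small-sample correction
c(AIC = aic, AICc = aicc)       # AICc is always slightly larger
```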
Bayesian Information Criterion (BIC)
• A related measure is Schwarz’s Bayesian Information Criterion (BIC).
• As with the AIC, minimizing the BIC is intended to give the best model.
• BIC penalizes terms more heavily than AIC.
• Also called SBIC and SC.
Which Measure Should We Use?
• While R̄² is widely used, its tendency to select too many predictor variables makes it
less suitable for forecasting.
• BIC tries to find the true underlying model among the set of candidates. However, in
reality, this is very challenging.
• Consequently, any of the AICc, AIC or CV can be used, each of which has
forecasting as its objective.
• In this course, we use the AICc value to select the forecasting model.
Example: US Consumption
• In this example for forecasting US consumption we considered four predictors.
• With four predictors, there are 2⁴ = 16 possible models.
• Now we can check if all four predictors are actually useful, or whether we can
drop one or more of them.
• All 16 models were fitted and the results are summarized.
• A “⬤” indicates that the predictor was included in the model.
• The results have been sorted according to the AICc.
Example: US Consumption
Income Production Savings Unemployment AdjR² CV AIC AICc BIC
⬤ ⬤ ⬤ ⬤ 0.763 0.104 -456.6 -456.1 -436.9
⬤ ⬤ ⬤ 0.761 0.105 -455.2 -454.9 -438.7
⬤ ⬤ ⬤ 0.760 0.104 -454.4 -454.1 -437.9
⬤ ⬤ 0.735 0.114 -435.7 -435.5 -422.6
⬤ ⬤ ⬤ 0.366 0.271 -262.3 -262.0 -245.8
⬤ ⬤ ⬤ 0.349 0.279 -257.1 -256.8 -240.7
⬤ ⬤ 0.345 0.276 -256.9 -256.6 -243.7
⬤ ⬤ 0.336 0.282 -254.2 -254.0 -241.0
⬤ ⬤ 0.324 0.287 -250.7 -250.5 -237.5
⬤ ⬤ 0.311 0.291 -246.9 -246.7 -233.7
⬤ ⬤ 0.308 0.293 -246.1 -245.9 -232.9
⬤ 0.276 0.304 -238.1 -238.0 -228.2
⬤ 0.274 0.303 -237.4 -237.3 -227.5
⬤ 0.143 0.356 -204.6 -204.5 -194.7
⬤ 0.061 0.388 -186.5 -186.4 -176.7
0.000 0.409 -175.1 -175.0 -168.5
The best model contains all four predictors.
Choosing Regression Variables
• Best subsets regression
• Fit all possible regression models using one or more of the predictors.
• Choose the best model based on one of the measures of predictive ability (CV,
AIC, AICc).
• Warning!
• If there are a large number of predictors, it is not possible to fit all possible
models.
• For example, 40 predictors leads to 2⁴⁰ > 1 trillion possible models!
• Consequently, a strategy is required to limit the number of models to be explored.
Choosing Regression Variables
• Backwards stepwise regression
• Start with a model containing all variables.
• Try subtracting one variable at a time. Keep the model if it improves the measure
of predictive accuracy.
• Iterate until no further improvement.
• Warning!
• If the number of potential predictors is too large, then the backwards stepwise
regression will not work and forward stepwise regression can be used instead.
• The stepwise approach is not guaranteed to lead to the best possible model, but it
almost always leads to a good model.
Choosing Regression Variables
• Forward stepwise regression
• Start with a model that includes only the intercept.
• Try adding one variable at a time. Keep the model if it improves the measure of
predictive accuracy.
• Iterate until no further improvement.
• This is also not guaranteed to lead to the best possible model
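Base R's step() implements this search, using AIC as the measure of predictive accuracy. A sketch on the built-in mtcars data set (the candidate predictors are an illustrative choice):

```r
# Forward stepwise: start from the intercept-only model, add one variable at a time
null_fit <- lm(mpg ~ 1, data = mtcars)
best <- step(null_fit,
             scope = ~ cyl + disp + hp + wt,   # candidate predictors
             direction = "forward",
             trace = 0)                        # suppress the step-by-step log
formula(best)   # the selected model
```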
Forecasting with Regression
• Different types of forecasts can be produced, depending on what is assumed to
be known.
• Ex-ante versus ex-post forecasts
• Ex-ante forecasts are those that are made using only the information that is
available in advance.
• require forecasts of predictors
• Ex-post forecasts are those that are made using later information on the
predictors (assume knowledge of the 𝑥 predictor variables)
• useful for studying behavior of forecasting models
• Trend, seasonal and calendar variables are all known in advance, so these don’t
need to be forecast.
Example: Australian Beer Production
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)

fit_beer <- recent_production %>%
  model(TSLM(Beer ~ trend() + season()))

fc_beer <- forecast(fit_beer)

fc_beer %>%
  autoplot(recent_production) +
  labs(title = "Forecasts of beer production using regression",
       y = "megalitres")
Scenario Based Forecasting
• Scenario-based forecasting assumes specific scenarios for the predictor variables that are of interest.
• Prediction intervals for scenario-based forecasts do not include the uncertainty
associated with the future values of the predictor variables.
Example: US Consumption
fit_consBest <- us_change %>%
  model(
    lm = TSLM(Consumption ~ Income + Savings + Unemployment)
  )

future_scenarios <- scenarios(
  Increase = new_data(us_change, 4) %>%
    mutate(Income = 1, Savings = 0.5, Unemployment = 0),
  Decrease = new_data(us_change, 4) %>%
    mutate(Income = -1, Savings = -0.5, Unemployment = 0),
  names_to = "Scenario"
)
> future_scenarios
$Increase
# A tsibble: 4 x 4 [1Q]
Quarter Income Savings Unemployment
1 2019 Q3 1 0.5 0
2 2019 Q4 1 0.5 0
3 2020 Q1 1 0.5 0
4 2020 Q2 1 0.5 0
$Decrease
# A tsibble: 4 x 4 [1Q]
Quarter Income Savings Unemployment
1 2019 Q3 -1 -0.5 0
2 2019 Q4 -1 -0.5 0
3 2020 Q1 -1 -0.5 0
4 2020 Q2 -1 -0.5 0
Example: US Consumption
fc <- forecast(fit_consBest, new_data = future_scenarios)

us_change %>%
  autoplot(Consumption) +
  autolayer(fc) +
  labs(title = "US consumption", y = "% change")
Building a Predictive Regression Model
• The great advantage of regression models is that they can be used to capture
important relationships between the forecast variable of interest and the
predictor variables.
• A major challenge however, is that in order to generate ex-ante forecasts, the
model requires future values of each predictor.
• An alternative formulation is to use lagged values of the predictors.
• If scenario based forecasting is of interest then these models are extremely
useful.
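A base-R sketch of the lagged-predictor idea on simulated data (the coefficient 0.8 is made up): because 𝑦𝑡 is modelled from 𝑥𝑡₋₁, the predictor needed for a one-step-ahead forecast has already been observed.

```r
set.seed(4)
x <- rnorm(100)
x_lag <- c(NA, head(x, -1))                  # x lagged by one period
y <- 5 + 0.8 * x_lag + rnorm(100, sd = 0.1)  # y_t depends on x_{t-1}

fit <- lm(y ~ x_lag)   # lm() drops the initial NA row
coef(fit)["x_lag"]     # close to the true value 0.8
```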
Nonlinear Regression
• There are many cases in which a nonlinear functional form is more suitable
• The simplest way of modelling a nonlinear relationship is to transform the forecast
variable 𝑦 and/or the predictor variable 𝑥 before estimating a regression model.
• While this provides a non-linear functional form, the model is still linear in the
parameters.
log 𝑦 = 𝛽0 + 𝛽1log 𝑥 + 𝜀
• The other way is to make the function 𝑓, where 𝑦 = 𝑓(𝑥) + 𝜀, piecewise linear. That means
introducing points where the slope of 𝑓 can change. These points are called knots.
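Each knot adds a “hinge” regressor max(𝑡 − knot, 0), which is zero before the knot and grows linearly after it, so the fitted slope is allowed to change there. A base-R sketch on simulated data (knot position and coefficients are made up):

```r
t_ <- 1:100
x1 <- t_
x2 <- pmax(t_ - 60, 0)          # hinge term: slope change at t = 60
set.seed(5)
y <- 2 + 0.5 * x1 - 0.4 * x2 + rnorm(100, sd = 0.3)  # slope 0.5, then 0.1

fit <- lm(y ~ x1 + x2)
coef(fit)["x2"]                 # estimated slope change, close to -0.4
```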
Example: Boston Marathon
Winning times (in minutes) for the Boston Marathon Men’s Open Division.
marathon <- boston_marathon %>%
  filter(Event == "Men's open division") %>%
  select(-Event) %>%
  mutate(Minutes = as.numeric(Time) / 60)

marathon %>%
  autoplot(Minutes) +
  labs(y = "Winning times in minutes")
The plot of winning times reveals three different periods. There is a lot of volatility in the winning times up to about 1940, with the winning times barely declining. After 1940 there is a clear decrease in times, followed by a flattening out after the 1980s.
Example: Boston Marathon
fit_trends <- marathon %>%
model(
# Linear trend
linear = TSLM(Minutes ~ trend()),
# Exponential trend
exponential = TSLM(log(Minutes) ~ trend()),
# Piecewise linear trend
piecewise = TSLM(Minutes ~ trend(knots = c(1940, 1980)))
)
fit_trends
# A mable: 1 x 3
   linear exponential piecewise
1  <TSLM>      <TSLM>    <TSLM>

Identification of knots is subjective. Here we specify the years 1940 and 1980 as knots.
Example: Boston Marathon
fc_trends <- fit_trends %>% forecast(h = 10)

marathon %>%
  autoplot(Minutes) +
  geom_line(data = fitted(fit_trends),
            aes(y = .fitted, colour = .model)) +
  autolayer(fc_trends, alpha = 0.5, level = 95) +
  labs(y = "Minutes", title = "Boston marathon winning times")
Example: Boston Marathon
The best forecasts appear to come from the piecewise linear trend.
glance(fit_trends) %>% select(.model, adj_r_squared, CV, AIC, AICc, BIC)
# A tibble: 3 x 6
.model adj_r_squared CV AIC AICc BIC
1 linear 0.726 39.1 452. 452. 460.
2 exponential 0.742 0.00176 -779. -779. -771.
3 piecewise 0.761 34.8 437. 438. 451.
Example: Boston Marathon
fit_trends %>% select(piecewise) %>% gg_tsresiduals()
Correlation is not Causation
• It is important not to confuse
• correlation with causation, or
• causation with forecasting.
• When 𝑥 is useful for predicting 𝑦, it is not necessarily causing 𝑦.
• Correlations are useful for forecasting, even when there is no causality.
• Better models usually involve causal relationships.
Multicollinearity
• In regression analysis, multicollinearity occurs when:
• Two predictors are highly correlated (i.e., the correlation between them is close to
±1).
• A linear combination of some of the predictors is highly correlated with another
predictor.
• A linear combination of one subset of predictors is highly correlated with a linear
combination of another subset of predictors.
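A simulated illustration of the first case: when one predictor is nearly a copy of another, its variance inflation factor (VIF = 1/(1 − 𝑅ⱼ²), computed here from first principles rather than with a package function) becomes very large:

```r
set.seed(6)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.05)   # almost a duplicate of x1
y  <- 1 + x1 + x2 + rnorm(200)
fit <- lm(y ~ x1 + x2)

# VIF of x1: regress x1 on the other predictor(s)
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)
vif_x1   # far above the common rule-of-thumb threshold of 10
```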
Multicollinearity
• If multicollinearity exists…
• the numerical estimates of coefficients may be wrong (worse in Excel than in a
statistics package)
• don’t rely on the p-values to determine significance
• there is no problem with model predictions provided the predictors used for
forecasting are within the range used for fitting
• omitting variables can help.
• combining variables can help.