Assignment 1
Part I. Constructing prediction models for different variables
The file time_series_data_2017.xlsx in the Assignment 1 folder on TED contains
monthly data on US consumer prices (CPIAUCSL), gold prices (GOLD prices), the
unemployment rate (UNRATE) and the S&P500 index. You are asked to analyze each of
the four time series and explore if they are stationary as well as predictable.
1. Produce a time-series plot for each variable. Comment on whether or not each one is
trending over time. If a variable is trending, compute the log first-difference,
$\Delta\log(y_t) = \log(y_t) - \log(y_{t-1})$, plot the transformed series, and use it
in the further analysis. The log first-difference is the inflation rate for the
CPIAUCSL series. For gold prices and stock prices, the log first-difference is the
continuously compounded rate of return. [Hint: As an option to test whether a variable is
stationary or is non-stationary and needs to be first-differenced, you can also
compute an (augmented) Dickey-Fuller test.]
2. Using the raw or transformed series, report the first 10 autocorrelations. Is the
variable persistent? Is the serial correlation statistically significant? [hint: use the
autocorr command in matlab]
3. For each of the variables, estimate an ARMA forecasting model. Experiment with
different lag orders for the autoregressive and moving average terms. Which
model seems to best fit each variable? [Hint: use the arima command in matlab.
To select the best model you can look at the t-statistics of the individual
coefficients. You can also use information criteria (aicbic in matlab) and look for
serial correlation (persistence) in the model residuals. The residuals from a good
forecasting model should not be serially correlated.]
4. Plot the fitted values from (3) against the actual (realized) values. What do you
conclude about the goodness of fit of your forecasting model?
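Question 1 can be illustrated with a short NumPy sketch (the assignment points to MATLAB; this is only a translation of the idea, and the function names `log_diff` and `dickey_fuller_t` are hypothetical). The Dickey-Fuller helper is the simple, non-augmented version with a constant, and its t-ratio must be compared with Dickey-Fuller critical values (roughly -2.86 at the 5% level), not with ±1.96. Simulated prices stand in for the Excel data.

```python
import numpy as np

# Illustrative sketch: hypothetical function names, simulated data.


def log_diff(y):
    """Log first-difference: dlog(y_t) = log(y_t) - log(y_{t-1}); one observation shorter."""
    y = np.asarray(y, dtype=float)
    return np.diff(np.log(y))


def dickey_fuller_t(y):
    """t-ratio on rho in dy_t = c + rho * y_{t-1} + e_t (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - X.shape[1])       # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)                 # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])


# Simulated stand-in for a trending price series (assumption)
rng = np.random.default_rng(0)
price = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal(300)))
ret = log_diff(price)
print(dickey_fuller_t(np.log(price)), dickey_fuller_t(ret))
```

A t-ratio below the critical value rejects a unit root; a value above it is consistent with a non-stationary series that should be differenced.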
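For question 2, the sample autocorrelations reported by MATLAB's `autocorr` can be reproduced as below, together with a Ljung-Box Q statistic that jointly tests the first 10 autocorrelations (MATLAB's `lbqtest` performs a comparable test). This is a hedged NumPy sketch with hypothetical function names and simulated input; an individual autocorrelation outside roughly ±1.96/√T is significant at about the 5% level.

```python
import numpy as np

# Illustrative sketch: hypothetical function names, simulated data.


def sample_acf(x, nlags=10):
    """Sample autocorrelations rho_1, ..., rho_nlags."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = xc @ xc
    return np.array([xc[k:] @ xc[:-k] / denom for k in range(1, nlags + 1)])


def ljung_box(x, nlags=10):
    """Ljung-Box Q; approximately chi-square(nlags) if there is no serial correlation."""
    T = len(x)
    rho = sample_acf(x, nlags)
    k = np.arange(1, nlags + 1)
    return T * (T + 2) * np.sum(rho**2 / (T - k))


rng = np.random.default_rng(0)
x = rng.standard_normal(400)
band = 1.96 / np.sqrt(len(x))   # approximate 95% band for single autocorrelations
print(sample_acf(x), band, ljung_box(x))  # compare Q with 18.31, the 5% chi-square(10) value
```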
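Question 3 asks for ARMA models. The sketch below covers only the pure autoregressive special case, estimated by OLS, to stay short and dependency-free; models with MA terms need maximum likelihood (as in MATLAB's `arima` with `estimate`). Every AR order is fitted on a common sample so that AIC and BIC are comparable. Function names are hypothetical and the data are simulated from an AR(1).

```python
import numpy as np

# Illustrative sketch: pure AR(p) by OLS only; hypothetical names, simulated data.


def fit_ar(y, p, start):
    """OLS fit of y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t for t >= start.

    Using the same start (the largest lag considered) for every p keeps the
    estimation sample, and hence the information criteria, comparable.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    Y = y[start:]
    lags = [y[start - j:n - j] for j in range(1, p + 1)]   # column j holds y_{t-j}
    X = np.column_stack([np.ones(len(Y))] + lags)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    T, k = X.shape
    sigma2 = resid @ resid / T
    aic = np.log(sigma2) + 2 * k / T
    bic = np.log(sigma2) + k * np.log(T) / T
    return beta, resid, aic, bic


def select_ar_order(y, max_p=6):
    """Return the AR order in 0..max_p with the lowest BIC."""
    bics = [fit_ar(y, p, max_p)[3] for p in range(max_p + 1)]
    return int(np.argmin(bics))


rng = np.random.default_rng(0)
y = np.zeros(2000)
for t in range(1, len(y)):                 # simulated AR(1) with phi = 0.7
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()
p_best = select_ar_order(y)
beta, resid, aic, bic = fit_ar(y, p_best, 6)
fitted = y[6:] - resid                     # fitted values, e.g. for the plot in question 4
```

The residuals can then be checked for remaining autocorrelation; a well-specified model should leave none.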
Part II. Modeling Trend and Seasonality
This assignment uses data on CO2 concentrations from the file Keeling_CO2data_2017.xlsx
available in the Assignment 1 folder on TED. The data was collected by Dave Keeling,
who worked at the Scripps Institution of Oceanography for many years. This is a famous
data set of monthly CO2 measurements taken in Hawaii from 1958 through
2016:10.
1. Plot the CO2 time series in column E. Briefly summarize the evidence of
seasonal effects and trends in the time-series plot.
2. Using data up to 2005, estimate a linear trend model for the CO2
measurements. Is the trend significant? Is the linear trend a good specification
that yields reliable forecasts?
3. Using data up to 2005, develop a better trend specification. Report your trend
estimates and explain what your specification is and what makes it better.
4. Include seasonal effects and report results for your preferred model that
includes both trend and seasonal effects. Again, use data only up to 2005.
5. Using data up to 2005m12, forecast future CO2 for the period 2006m01 to
2016m10. Evaluate how good the forecasts from this model are. Are there any
obvious problems with your forecasts [hint: are the forecast errors
unpredictable]?
6. Include one additional forecasting variable in your model. Argue why you
think it may help forecast CO2 and test if your intuition is right. [NB this is a
variable of your own choosing and so you need to add it to the data set. To do
so, add data covering the same period as the CO2 data (monthly from 1958-
2016:10)]
7. Produce a forecast of CO2 for December 2020. How reliable do you think
your forecast is? Be as specific as you can in discussing this point.
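One way to organize questions 2 to 5 is a single regression on a polynomial time trend plus monthly dummies, estimated through 2005m12 and then extrapolated. The NumPy sketch below uses a quadratic trend purely as an example of a specification that can capture curvature; it is not the required answer. The arguments `first_month` (calendar month of the first observation) and `n_train` (number of observations through 2005m12) are assumptions that must be read from the data file, the function names are hypothetical, and the example uses simulated values.

```python
import numpy as np

# Illustrative sketch: hypothetical names; quadratic trend chosen only as an example.


def trend_season_design(n_obs, first_month, trend_degree=2):
    """Regressors: intercept, t, ..., t^trend_degree, and 11 monthly dummies (January is the base)."""
    t = np.arange(1, n_obs + 1, dtype=float)
    cols = [np.ones(n_obs)] + [t**d for d in range(1, trend_degree + 1)]
    month = (np.arange(n_obs) + first_month - 1) % 12 + 1       # calendar month of each row
    cols += [(month == m).astype(float) for m in range(2, 13)]   # February..December dummies
    return np.column_stack(cols)


def fit_and_forecast(co2, n_train, first_month, trend_degree=2):
    """Estimate on the first n_train months, then forecast all remaining months."""
    co2 = np.asarray(co2, dtype=float)
    X = trend_season_design(len(co2), first_month, trend_degree)
    beta, *_ = np.linalg.lstsq(X[:n_train], co2[:n_train], rcond=None)
    forecast = X[n_train:] @ beta
    errors = co2[n_train:] - forecast
    rmse = np.sqrt(np.mean(errors**2))
    ec = errors - errors.mean()
    rho1 = (ec[1:] @ ec[:-1]) / (ec @ ec)   # lag-1 autocorrelation of forecast errors
    return beta, forecast, rmse, rho1


# Simulated stand-in for column E (assumption): quadratic trend + seasonal pattern + noise
rng = np.random.default_rng(0)
true_beta = np.r_[315.0, 0.06, 1e-4, np.linspace(0.5, -0.5, 11)]
co2 = trend_season_design(600, first_month=3) @ true_beta + 0.1 * rng.standard_normal(600)
beta, forecast, rmse, rho1 = fit_and_forecast(co2, n_train=500, first_month=3)
print(rmse, rho1)
```

A lag-1 autocorrelation of the forecast errors far from zero indicates predictable errors (question 5). An additional forecasting variable (question 6) amounts to appending one more column to the design matrix.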
Assignment 2
Part I. Model Selection
This assignment uses the file Goyal_Welch_data.xlsx in the Assignment 2 folder on
TED. The data is downloaded from Amit Goyal’s web site and is an extended version of
the data used by Goyal and Welch (Review of Financial Studies, 2008). It contains
monthly information on US stock returns as well as on a range of predictor variables
proposed in the literature. We have also provided matlab and R code for you to use in the
analysis (available in the Assignment 2 folder).
You are asked to estimate forecasting models and simulate their out-of-sample
performance. To do so, use data from 1927m1 to 1969m12 to estimate each forecasting model,
then generate a return forecast for 1970m1. Then add the data for 1970m1 to the
estimation sample and produce a forecast for 1970m2. Repeat this process until the end of
the sample in 2015m12. This recursive (expanding-window) procedure is called backtesting.
Each forecasting model has the excess stock return as the dependent variable. The excess
stock return is listed in column E (as a rate of return per month). As predictors you can
choose from a list of 10 variables: the dividend-price ratio D/P (col F), earnings-price
ratio E/P (col G), book-to-market ratio b/m (col H), T-bill rate tbl (col I), default-spread
def_spread (col J), long-term yield lty (col K), net equity issues ntis (col L), inflation rate
infl (col M), long-term returns ltr (col N), and stock variance svar (col O). How these
variables are defined and constructed is explained in Goyal and Welch (2008).
Estimate linear regression models of the form (where $y_{t+1}$ is the excess return in month $t+1$ from col. E)
$y_{t+1} = \beta_0 + \beta_1 x_{1t} + \dots + \beta_k x_{kt} + \varepsilon_{t+1}$
Make sure to use one-month lagged values of the predictors. Univariate models have only
a single predictor x1. Multivariate models have two or more predictors x1, x2,…. All
models include an intercept term.
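The expanding-window scheme described above can be sketched in NumPy as follows. This is an illustration, not the MATLAB/R code provided with the assignment; the function names and the index convention (row t of `X` holds predictor values known at the end of month t) are assumptions. Passing all 10 predictor columns gives the kitchen sink model, a single column gives a univariate model, and a zero-column array gives the prevailing mean.

```python
import numpy as np

# Illustrative sketch: hypothetical names and index convention, simulated data.


def recursive_forecasts(y, X, first_target):
    """Expanding-window OLS forecasts of y[t+1] from the predictors X[t].

    At each origin t the model is re-estimated on the pairs (X[s], y[s+1]), s < t.
    With zero predictor columns this is the prevailing mean (on the same sample,
    so the very first return is not used). Returns forecasts of y[first_target:].
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    preds = []
    for t in range(first_target - 1, len(y) - 1):            # forecast origin t, target t+1
        Z = np.column_stack([np.ones(t), X[:t]])             # regressors dated 0..t-1
        b, *_ = np.linalg.lstsq(Z, y[1:t + 1], rcond=None)   # targets dated 1..t
        preds.append(np.r_[1.0, X[t]] @ b)
    return np.array(preds)


def rmsfe(actual, forecast):
    """Root mean squared forecast error."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2))


# Simulated stand-in (assumption): one predictor with a weak effect on next month's return
rng = np.random.default_rng(0)
n = 240
X = rng.standard_normal((n, 1))
y = np.r_[0.0, 0.005 + 0.01 * X[:-1, 0]] + 0.04 * rng.standard_normal(n)
first_target = 120          # in the real data: the row index of 1970m1
f_uni = recursive_forecasts(y, X, first_target)
f_pm = recursive_forecasts(y, np.empty((n, 0)), first_target)
print(rmsfe(y[first_target:], f_uni), rmsfe(y[first_target:], f_pm))
```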
1. At each point in time where you generate a forecast (1969m12, 1970m1, …,
2015m11), find a way to select a preferred forecasting model. You can do this by
using stepwise selection methods (general-to-specific or specific-to-general), or
you can conduct an exhaustive model selection search over all $2^{10} = 1024$
possible regression models, selecting the model by AIC or SIC. Alternatively, you can use
another method such as LASSO. Plot the forecasts against the actual values and
report the root mean squared forecast error associated with the forecasts.
2. How often does your preferred model selection approach include the different
predictor variables? [hint: for each predictor, compute the fraction of months from 1970
through 2015 in which it gets selected.]
3. Next, repeat the exercise when you use only a constant in the forecasting model.
This is the prevailing mean (pm) model of Goyal and Welch and corresponds to a
constant expected excess return. Again, compute the out-of-sample forecasts and
the root mean squared forecast error from this model. Which produces the most
accurate forecasts, your model in (1) or the prevailing mean model?
4. Repeat the exercise when you use the kitchen sink model that includes all 10
predictors (plus a constant). Plot the forecasts from this model and report the root
mean squared forecast error.
5. Suppose that instead of using root mean squared forecast errors, you are using an
economic loss function. If you predict a negative excess return, then you hold T-
bills and earn the monthly risk-free rate in column Q in the excel file. Otherwise
you go long in the market and earn the stock return in column P. Compute the
mean return on this market timing strategy using the forecasts in (1), (3), and (4).
Which model generates the highest mean return and the highest Sharpe ratio
(mean return divided by the standard deviation of returns)?
6. Discuss your findings – what are your conclusions regarding how easy it is to
predict stock returns?
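For question 1, an exhaustive search picks, at each forecast origin, the subset of the 10 predictors with the lowest SIC (BIC). A minimal NumPy sketch follows (hypothetical function name, simulated data). With 10 predictors it fits 1,024 regressions per origin, which is still manageable for 552 monthly origins. Recording which columns are returned at each origin and averaging those 0/1 indicators gives the selection frequencies for question 2.

```python
import itertools

import numpy as np

# Illustrative sketch: hypothetical function name, simulated data.


def best_subset_bic(Z, target):
    """Search all 2**k predictor subsets (each with an intercept); return the BIC-minimizing one.

    Z: (T, k) lagged predictors; target: (T,) next-month excess returns.
    Returns a tuple of selected column indices (empty tuple = prevailing mean).
    """
    T, k = Z.shape
    best, best_bic = (), np.inf
    for r in range(k + 1):
        for subset in itertools.combinations(range(k), r):
            X = np.column_stack([np.ones(T), Z[:, np.array(subset, dtype=int)]])
            b, *_ = np.linalg.lstsq(X, target, rcond=None)
            e = target - X @ b
            bic = np.log(e @ e / T) + X.shape[1] * np.log(T) / T
            if bic < best_bic:
                best, best_bic = subset, bic
    return best


# Simulated example (assumption): only column 1 carries signal
rng = np.random.default_rng(0)
Z = rng.standard_normal((300, 4))
target = 0.5 + 1.0 * Z[:, 1] + 0.5 * rng.standard_normal(300)
print(best_subset_bic(Z, target))

# Inside an expanding-window loop at origin t (hypothetical bookkeeping for question 2):
#   subset = best_subset_bic(X[:t], y[1:t + 1])
#   selected[i, list(subset)] = 1.0        # one row per forecast origin
# selected.mean(axis=0) then gives each predictor's selection frequency.
```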
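The market-timing rule in question 5 can be sketched as below (hypothetical function name; the three inputs are assumed to be already aligned by target month). It follows the assignment's definition of the Sharpe ratio as mean return divided by the standard deviation of returns; a more common variant subtracts the risk-free rate from returns first.

```python
import numpy as np

# Illustrative sketch: hypothetical function name; inputs assumed aligned by target month.


def timing_strategy(forecast, market_ret, rf):
    """Hold T-bills when the excess-return forecast is negative, otherwise the market."""
    forecast, market_ret, rf = (np.asarray(a, dtype=float) for a in (forecast, market_ret, rf))
    ret = np.where(forecast < 0, rf, market_ret)
    mean = ret.mean()
    sharpe = mean / ret.std(ddof=1)     # assignment's definition: mean / std of returns
    return ret, mean, sharpe


ret, mean, sharpe = timing_strategy([0.1, -0.1], [0.05, -0.02], [0.001, 0.002])
print(ret, mean, sharpe)
```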
Assignment 3
Forecasting with multivariate information
This assignment uses the file HongKong_property_index_combined.xlsx in the
Assignment 3 folder on Triton ED. The data is downloaded from the Global Financial
Data database. It contains monthly information on the Hong Kong Hang Seng property price
index (HKPI, column B), Hong Kong consumer price index (CPI, column C), Hong
Kong 5-year government bond yields (column D), the Hong Kong unemployment rate
(column E) and the Hong Kong-US dollar exchange rate (column F).
You are asked to estimate both univariate and multivariate forecasting models and see if
using multivariate information helps predict changes to the Hong Kong property price
index. To do so, use the longest common sample, 09/30/1994-12/31/2016.
To prepare the data, first take the log first-difference of the property price index, i.e.,
$\Delta HKPI_t = \ln(HKPI_t/HKPI_{t-1})$. This is the variable we are interested in forecasting. Similarly,
define inflation as $infl_t = \ln(CPI_t/CPI_{t-1})$. In each case use the longest available data
sample to answer the following questions.
1. Estimate a univariate ARMA forecasting model for $\Delta HKPI_t$. Present plots of the
fitted versus the actual values. Do you find significant evidence that ΔHKPI is
predictable? Make sure to present any evidence such as R² values, t-statistics,
information criteria, etc.
2. Next, estimate a multivariate regression model where you add lagged values of
any of the four predictor variables in columns C-F to the ARMA model from
point (1). Do you find evidence that any of these variables help predict $\Delta HKPI_t$?
Present any results that support your argument.
Finally, estimate a vector autoregression (VAR) that includes ΔHKPI, CPI, the 5-year
yield, the unemployment rate, and the HK-USD exchange rate.
3. Using the BIC, which model is better, a VAR(1) or a VAR(2)?
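The VAR(1) versus VAR(2) comparison can be sketched with equation-by-equation OLS and the multivariate BIC, $\ln|\hat\Sigma| + (\text{number of coefficients})\ln(T)/T$. This is an illustrative NumPy version with a hypothetical function name; both models are estimated on the same sample (`start=2`) so that their BIC values are comparable, and a simulated two-variable VAR(1) replaces the five Hong Kong series.

```python
import numpy as np

# Illustrative sketch: hypothetical function name, simulated data.


def fit_var(Y, p, start):
    """OLS estimation of a VAR(p) with intercept.

    Y: (T, K) array, one column per variable. Observations before `start` serve
    only as lags, so models with different p share one estimation sample.
    Returns B with rows [const, lag-1 block, ..., lag-p block], residuals, and BIC.
    """
    Y = np.asarray(Y, dtype=float)
    T = Y.shape[0]
    X = np.column_stack([np.ones(T - start)] + [Y[start - j:T - j] for j in range(1, p + 1)])
    Yt = Y[start:]
    B, *_ = np.linalg.lstsq(X, Yt, rcond=None)
    U = Yt - X @ B
    n = Yt.shape[0]
    Sigma = U.T @ U / n                                  # ML estimate of error covariance
    bic = np.log(np.linalg.det(Sigma)) + B.size * np.log(n) / n
    return B, U, bic


rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.3]])
Y = np.zeros((1000, 2))
for t in range(1, 1000):                                 # simulated VAR(1)
    Y[t] = A @ Y[t - 1] + rng.standard_normal(2)
B1, U1, bic1 = fit_var(Y, 1, start=2)
B2, U2, bic2 = fit_var(Y, 2, start=2)
print("VAR(1)" if bic1 < bic2 else "VAR(2)")
```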
Use the preferred 5-variable VAR model to answer the following questions:
4. Using impulse response analysis, do you find evidence that a shock to 5-year
interest rates affects the growth in Hong Kong property prices?
5. Generate a forecast of the growth in Hong Kong property prices, ΔHKPI, for the
24 months following the end of the sample (12/31/2016). Present your forecast in a
graph and comment on what your prediction is for the growth in Hong Kong
property prices.
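For question 4, orthogonalized impulse responses can be computed from the VAR coefficient matrices and a Cholesky factor of the residual covariance. The NumPy sketch below assumes coefficients stored as produced by an OLS regression on `[1, y_{t-1}, ..., y_{t-p}]`; the function name and the example coefficients are hypothetical. Because the Cholesky factor depends on the ordering of the variables, the response of ΔHKPI to a 5-year yield shock can change with the ordering, so the chosen ordering should be reported. Confidence bands are not included.

```python
import numpy as np

# Illustrative sketch: hypothetical function name and assumed coefficients.


def orthogonalized_irf(B, Sigma, p, horizons=24):
    """Cholesky-orthogonalized impulse responses of a VAR(p).

    B: (1 + K*p, K) with rows [const, lag-1 block, ..., lag-p block], so the lag-j
    matrix is A_j = B[1+(j-1)K : 1+jK].T. Returns irf[h, i, j], the response of
    variable i, h periods after a one-standard-deviation shock to variable j.
    """
    K = Sigma.shape[0]
    A = [B[1 + (j - 1) * K:1 + j * K].T for j in range(1, p + 1)]
    P = np.linalg.cholesky(Sigma)          # lower triangular: variable ordering matters
    Psi = [np.eye(K)]                      # MA coefficients: Psi_h = sum_j A_j Psi_{h-j}
    for h in range(1, horizons + 1):
        Psi.append(sum(A[j - 1] @ Psi[h - j] for j in range(1, min(h, p) + 1)))
    return np.array([M @ P for M in Psi])


# Illustrative VAR(1) with K = 2 (assumed coefficients, not estimates)
B = np.array([[0.0, 0.0],     # intercepts
              [0.5, 0.0],     # coefficients on y1_{t-1} in equations 1 and 2
              [0.1, 0.3]])    # coefficients on y2_{t-1} in equations 1 and 2
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
irf = orthogonalized_irf(B, Sigma, p=1, horizons=24)
response_1_to_shock_2 = irf[:, 0, 1]
```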
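The 24-month forecast in question 5 can be produced by iterating the estimated VAR forward and feeding each forecast back in as a lag. A minimal NumPy sketch, assuming coefficients stored with rows [intercept, lag-1 block, ..., lag-p block]; the function name and the example numbers are hypothetical. If ΔHKPI is the first variable, its forecast path is the first column of the output.

```python
import numpy as np

# Illustrative sketch: hypothetical function name and assumed coefficients.


def var_forecast(B, Y, p, steps=24):
    """Iterated point forecasts from a VAR(p) with intercept; the last p rows of Y are the start."""
    Y = np.asarray(Y, dtype=float)
    hist = [Y[-j] for j in range(1, p + 1)]      # most recent observation first
    out = []
    for _ in range(steps):
        x = np.concatenate([[1.0]] + hist)       # [1, y_T, y_{T-1}, ..., y_{T-p+1}]
        y_next = x @ B
        out.append(y_next)
        hist = [y_next] + hist[:-1]              # roll the lag window forward
    return np.array(out)


# Univariate AR(1) example: y_t = 1 + 0.5 y_{t-1}, starting from y_T = 0
path = var_forecast(np.array([[1.0], [0.5]]), np.array([[0.0]]), p=1, steps=24)
print(path[:3, 0])   # converges toward the unconditional mean 1 / (1 - 0.5) = 2
```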
Assignment 4
Part I. Cointegration analysis
Download the file gold_silver_price_combined.xlsx from the Assignment 4 folder on
Triton Ed. This file contains monthly gold and silver prices over the period 1970-2017.
Use this data to answer the following questions.
1. Are gold and silver prices cointegrated?
2. Can you use past information on silver prices to predict future changes in gold
prices?
3. Can you use past information on gold prices to predict future changes in silver
prices?
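Question 1 can be addressed with the two-step Engle-Granger test, sketched below in NumPy under the assumption that the test is run on log prices (function name hypothetical, data simulated). The residual-based t-ratio does not follow the standard Dickey-Fuller distribution: with two variables and a constant, the 5% critical value is approximately -3.34 (MacKinnon). Lagged differences are omitted for brevity, so this is the non-augmented version.

```python
import numpy as np

# Illustrative sketch: hypothetical function name, simulated data.


def engle_granger(y, x):
    """Step 1: OLS of y on [1, x]. Step 2: DF regression of the residual change on its lag.

    Returns the step-2 t-ratio (no constant, no augmentation lags) and step-1 coefficients.
    """
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b                               # candidate cointegration residual
    du, ulag = np.diff(u), u[:-1]
    rho = (ulag @ du) / (ulag @ ulag)
    e = du - rho * ulag
    se = np.sqrt((e @ e) / (len(du) - 1) / (ulag @ ulag))
    return rho / se, b


# Simulated cointegrated pair (assumption): y tracks a random walk x
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(1000))
y = 2 + 0.5 * x + rng.standard_normal(1000)
t_stat, coefs = engle_granger(y, x)
print(t_stat, coefs)
```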
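Questions 2 and 3 are Granger-causality questions. A plain version in first differences of log prices is sketched below: it compares regressions with and without p lags of the other series using an F-test. This is a simplification under stated assumptions with hypothetical names; if the prices turn out to be cointegrated, the lagged cointegration residual (an error-correction term) should also be included in both regressions.

```python
import numpy as np

# Illustrative sketch: hypothetical names, simulated data, no error-correction term.


def lag_block(x, p, start):
    """Columns x_{t-1}, ..., x_{t-p} for t = start, ..., len(x) - 1."""
    n = len(x)
    return np.column_stack([x[start - j:n - j] for j in range(1, p + 1)])


def granger_f(dy, dx, p):
    """F-test that p lags of dx add no predictive power for dy beyond p lags of dy."""
    dy, dx = np.asarray(dy, dtype=float), np.asarray(dx, dtype=float)
    Y = dy[p:]
    Xr = np.column_stack([np.ones(len(Y)), lag_block(dy, p, p)])   # restricted model
    Xu = np.column_stack([Xr, lag_block(dx, p, p)])                # adds lags of dx

    def ssr(X):
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return np.sum((Y - X @ b) ** 2)

    ssr_r, ssr_u = ssr(Xr), ssr(Xu)
    df2 = len(Y) - Xu.shape[1]
    return ((ssr_r - ssr_u) / p) / (ssr_u / df2)


# Simulated example (assumption): dx leads dy by one period
rng = np.random.default_rng(0)
dx = rng.standard_normal(500)
dy = np.zeros(500)
dy[1:] = 0.8 * dx[:-1] + rng.standard_normal(499)
print(granger_f(dy, dx, 2))   # compare with the F(2, T - 5) critical value
```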
Part II. Volatility Forecasting
This assignment uses the file Shanghai_SE_composite_price.xlsx in the Assignment 4
folder on Triton Ed. The data is downloaded from the Global Financial Data database. It
contains daily closing values of the Shanghai SE Composite price index, SSECD from
1991 to 2017. We are interested in seeing whether there is evidence of volatility
clustering in Chinese stock prices.
First compute the log first-difference to get the (continuously compounded) daily stock
returns: $r_t = \ln(SSECD_t/SSECD_{t-1})$.
1. Looking at the autocorrelations, do you find evidence that daily stock returns, $r_t$,
are serially correlated?
2. Looking at the autocorrelations, do you find evidence that daily squared returns,
$r_t^2$, are serially correlated?
3. Conduct a test for ARCH effects in the return series $r_t$. Do you find evidence
of ARCH effects?
4. Estimate a GARCH model for daily stock returns, $r_t$. Report the estimates and
comment on how persistent the return volatility is. You can model the mean of
returns either as a constant or as a simple ARMA process.
5. Does a regular GARCH(1,1), an EGARCH(1,1) or a GJR(1,1) GARCH model
best fit the Shanghai stock exchange returns series? Make sure to present any
statistical evidence used to support your answer.
6. Save the residuals from your GARCH(1,1) model, $\varepsilon_t$. Then generate normalized
residuals, $\varepsilon_t/\sigma_{t|t-1}$, where $\sigma^2_{t|t-1}$ is the GARCH(1,1) estimate of the conditional
variance at time t, given information at time t-1. If the GARCH(1,1) model is
correctly specified, these normalized residuals should follow a standard normal
N(0,1) distribution. Evaluate whether this holds here.
7. Using the volatility forecasts from part 5, generate 50% and 95% interval
forecasts for returns on the Shanghai stock index for the period 1/3/2017-
2/10/2017. Plot the time-series of actual returns against these interval forecasts
and comment on what you find.
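For question 3, Engle's LM test regresses squared demeaned returns on their own lags; $T R^2$ is approximately $\chi^2(q)$ under the null of no ARCH effects (about 11.07 at the 5% level for q = 5). MATLAB's `archtest` implements this test; the NumPy sketch below, with a hypothetical name and simulated data, only illustrates the computation.

```python
import numpy as np

# Illustrative sketch: hypothetical function name, simulated data.


def arch_lm(r, q=5):
    """Engle's ARCH LM statistic T*R^2 from regressing e_t^2 on a constant and q lags."""
    r = np.asarray(r, dtype=float)
    e2 = (r - r.mean()) ** 2
    Y = e2[q:]
    X = np.column_stack([np.ones(len(Y))] + [e2[q - j:len(e2) - j] for j in range(1, q + 1)])
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ b
    r2 = 1 - (resid @ resid) / np.sum((Y - Y.mean()) ** 2)
    return len(Y) * r2


rng = np.random.default_rng(0)
iid = rng.standard_normal(2000)            # no ARCH effects
z = rng.standard_normal(2000)
arch = np.zeros(2000)
for t in range(1, 2000):                   # simulated ARCH(1): sigma2_t = 0.2 + 0.5 e_{t-1}^2
    arch[t] = np.sqrt(0.2 + 0.5 * arch[t - 1] ** 2) * z[t]
print(arch_lm(iid), arch_lm(arch))
```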
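Question 4 requires maximizing the GARCH(1,1) likelihood. The sketch below shows the variance recursion and the Gaussian log-likelihood that a numerical optimizer would maximize (in MATLAB, `estimate` applied to a `garch(1,1)` model does this, and `egarch` and `gjr` models cover question 5). The start-up value, function names, and parameter checks are assumptions. Persistence is measured by α + β, and the half-life of a volatility shock is ln(0.5)/ln(α + β).

```python
import numpy as np

# Illustrative sketch: hypothetical names; start-up value is an assumed convention.


def garch11_variance(eps, omega, alpha, beta):
    """sigma2[t] = omega + alpha*eps[t-1]^2 + beta*sigma2[t-1], started at the sample variance."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty(len(eps))
    sigma2[0] = eps.var()
    for t in range(1, len(eps)):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


def garch11_loglik(params, eps):
    """Gaussian log-likelihood; -inf outside the positivity/stationarity region."""
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return -np.inf
    eps = np.asarray(eps, dtype=float)
    s2 = garch11_variance(eps, omega, alpha, beta)
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + eps**2 / s2)


def volatility_half_life(alpha, beta):
    """Periods until a variance shock has decayed by half."""
    return np.log(0.5) / np.log(alpha + beta)


rng = np.random.default_rng(0)
eps = rng.standard_normal(500)             # stand-in for demeaned daily returns
print(garch11_loglik((0.1, 0.05, 0.85), eps), volatility_half_life(0.05, 0.85))
```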
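For question 6, the normalized residuals can be compared with N(0,1) by checking their mean, variance, skewness, and kurtosis, and with a Jarque-Bera statistic (approximately $\chi^2(2)$, 5% critical value 5.99). A NumPy sketch with hypothetical names follows; autocorrelations of the normalized residuals and of their squares are a useful complementary check.

```python
import numpy as np

# Illustrative sketch: hypothetical function names.


def standardized_residuals(eps, sigma2):
    """eps_t / sigma_{t|t-1}."""
    return np.asarray(eps, dtype=float) / np.sqrt(np.asarray(sigma2, dtype=float))


def jarque_bera(z):
    """Jarque-Bera statistic, sample skewness, and sample kurtosis (3 under normality)."""
    z = np.asarray(z, dtype=float)
    zc = z - z.mean()
    m2 = np.mean(zc**2)
    skew = np.mean(zc**3) / m2**1.5
    kurt = np.mean(zc**4) / m2**2
    jb = len(z) / 6 * (skew**2 + (kurt - 3) ** 2 / 4)
    return jb, skew, kurt


jb, skew, kurt = jarque_bera(np.random.default_rng(0).standard_normal(1000))
print(jb, skew, kurt)
```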
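Question 7 combines a mean forecast with GARCH variance forecasts. Under GARCH(1,1), the h-step variance forecast reverts geometrically to the long-run level $\bar\sigma^2 = \omega/(1-\alpha-\beta)$, so that $\sigma^2_{T+h|T} = \bar\sigma^2 + (\alpha+\beta)^{h-1}(\sigma^2_{T+1|T} - \bar\sigma^2)$. The sketch below assumes normally distributed returns, so the 50% and 95% intervals use the quantiles 0.6745 and 1.96; function names are hypothetical. If the intervals are well calibrated, roughly 50% and 95% of actual returns should fall inside them.

```python
import numpy as np

# Illustrative sketch: hypothetical names; normality of returns is an assumption.


def garch11_variance_path(sigma2_next, omega, alpha, beta, steps):
    """h-step variance forecasts for h = 1..steps, given the one-step forecast sigma2_next."""
    persistence = alpha + beta
    long_run = omega / (1 - persistence)
    h = np.arange(steps)                  # exponent h-1 for h = 1..steps
    return long_run + persistence**h * (sigma2_next - long_run)


def interval_forecasts(mu, sigma2_path):
    """Central 50% and 95% intervals for returns under normality."""
    sd = np.sqrt(np.asarray(sigma2_path, dtype=float))
    z50, z95 = 0.6745, 1.96
    return (mu - z50 * sd, mu + z50 * sd), (mu - z95 * sd, mu + z95 * sd)


def empirical_coverage(actual, lower, upper):
    """Share of realized returns that fall inside the interval."""
    actual = np.asarray(actual, dtype=float)
    return np.mean((actual >= lower) & (actual <= upper))


path = garch11_variance_path(2.0e-4, 1.0e-6, 0.08, 0.90, steps=28)
(lo50, hi50), (lo95, hi95) = interval_forecasts(0.0, path)
```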
Assignment 5
Part I: Evaluating Analysts’ Forecasts
The data set Alcoa.xlsx in the Assignment 5 folder on Triton Ed contains Analysts’
consensus (mean) forecast of earnings per share (EPS) for Alcoa for fiscal years 1977-
2014. The worksheet “forecasts” shows analysts’ forecasts of EPS for 12 monthly
horizons, running from h = 1 through h = 12 (columns B-M) as well as the “actual” value
announced subsequently. You can think of h = 1 as corresponding to the December
forecast for the current fiscal year’s EPS, while h = 12 corresponds to the January
forecast, when less is known about the firm’s earnings. The worksheet “forecast errors”
contains the difference between the actual values and the forecasts.
You are asked to evaluate how precise the forecasts are and whether they get better as
analysts accumulate more information about EPS during the year.
1. Is there evidence that the forecasts of EPS are biased at different horizons? To
answer this question, for each horizon (each value of h) test whether the mean of the
forecast error is different from zero – you can do this by regressing the time series
of forecast errors (t = 1977, …, 2014) on an intercept and using a t-test. Explain
what you find.
2. Do forecasts get more accurate as the forecast horizon shrinks? Plot the mean
forecast errors as a function of ‘h’ for h = 1,…, 12, and explain what you find.
3. Are analysts’ December forecasts (h = 1) more accurate than their forecasts in
January (h = 12)? To answer this question, run a Diebold-Mariano regression:
regress the time series of differences between the squared December and January
forecast errors on an intercept. Report what you find.
4. Are analysts any better at forecasting EPS than a model that simply uses the past
year’s EPS to predict the current EPS (i.e., an annual regression of EPSt+1 on an
intercept and EPSt)?
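For question 1, regressing the forecast errors on an intercept alone gives the same coefficient and conventional t-statistic as the sample mean divided by its standard error, as in the NumPy sketch below (hypothetical name; errors assumed to be actual minus forecast, matching the worksheet). With a years × 12 array of errors, the function is applied column by column, one column per horizon.

```python
import numpy as np

# Illustrative sketch: hypothetical function name.


def mean_bias_test(errors):
    """Mean forecast error and its t-statistic (same as an intercept-only OLS regression)."""
    e = np.asarray(errors, dtype=float)
    e = e[~np.isnan(e)]                       # drop missing years, if any
    return e.mean(), e.mean() / (e.std(ddof=1) / np.sqrt(len(e)))


# Hypothetical usage with a (years x 12) array `err`, one column per horizon h = 1..12:
# results = [mean_bias_test(err[:, h]) for h in range(12)]
print(mean_bias_test([1.0, 2.0, 3.0]))
```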
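The Diebold-Mariano regression in question 3 regresses the loss differential $d_t = e_{1,t}^2 - e_{2,t}^2$ on a constant. The sketch below returns the mean differential and its t-statistic, with optional Newey-West (Bartlett) weights for serially correlated differentials; with one observation per fiscal year, `lags=0` is a reasonable default. The function name and interface are assumptions. A significantly negative mean indicates that the first set of forecasts is more accurate.

```python
import numpy as np

# Illustrative sketch: hypothetical function name and interface.


def diebold_mariano(e1, e2, lags=0):
    """Mean of d_t = e1_t^2 - e2_t^2 and its t-statistic with Newey-West variance."""
    d = np.asarray(e1, dtype=float) ** 2 - np.asarray(e2, dtype=float) ** 2
    T = len(d)
    dc = d - d.mean()
    lrv = dc @ dc / T                                    # long-run variance of d_t
    for j in range(1, lags + 1):
        lrv += 2 * (1 - j / (lags + 1)) * (dc[j:] @ dc[:-j]) / T
    return d.mean(), d.mean() / np.sqrt(lrv / T)


# Hypothetical usage: December (h = 1) versus January (h = 12) forecast errors
print(diebold_mariano([1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 3.0]))
```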
Part II: Evaluation of Stock Return Forecasts
This problem uses the Goyal_Welch_Data.xlsx file in the Assignment 5 folder on Ted.
Use the forecastEvaluation.m file on Triton Ed to generate 11 univariate out-of-sample
forecasts of stock excess returns over the period 1970M1 – 2015M12. Then compute an
equal-weighted average of these forecasts. Also create the prevailing mean forecast and
forecasts using all 11 predictor variables (Kitchen sink model). For each of these three
forecasts, do the following:
1. Compute the Root Mean Squared Forecast Error (RMSFE).
2. Compute the Mincer-Zarnowitz test for unbiasedness of these forecasts.
3. Compute a directional accuracy test based on forecasts of positive or negative
excess returns.
4. Using the DM test, which forecast is better: the equal-weighted combination or
the prevailing mean? Can you tell with much confidence?
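The Mincer-Zarnowitz test in question 2 regresses realized excess returns on a constant and the forecast, then tests the joint hypothesis that the intercept is 0 and the slope is 1. A NumPy sketch with conventional (homoskedastic) standard errors follows; the name is hypothetical, and heteroskedasticity-robust standard errors would be a reasonable refinement for stock returns. The F statistic is compared with an F(2, T-2) critical value (about 3.0 for a sample of this size). A nearly constant forecast, such as the prevailing mean, leaves the slope poorly determined.

```python
import numpy as np

# Illustrative sketch: hypothetical function name, homoskedastic standard errors.


def mincer_zarnowitz(actual, forecast):
    """OLS of actual on [1, forecast]; F-statistic for H0: intercept = 0 and slope = 1."""
    y, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    X = np.column_stack([np.ones(len(f)), f])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (len(y) - 2)
    V = s2 * np.linalg.inv(X.T @ X)              # coefficient covariance
    diff = b - np.array([0.0, 1.0])
    F = diff @ np.linalg.solve(V, diff) / 2      # Wald statistic divided by 2 restrictions
    return b, F


f = np.arange(100.0)
y = 1 + 2 * f + 0.01 * (-1.0) ** np.arange(100)  # deliberately biased forecasts
b, F = mincer_zarnowitz(y, f)
print(b, F)
```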
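For question 3, the Pesaran-Timmermann statistic compares the hit rate of sign forecasts with the rate expected if forecasts and outcomes were independent; it is approximately N(0,1) under that null. In the sketch below (hypothetical name), the statistic is undefined when the forecast never changes sign, which can happen for the prevailing-mean forecast if the historical mean excess return stays positive throughout; the function returns NaN in that case.

```python
import numpy as np

# Illustrative sketch: hypothetical function name.


def pesaran_timmermann(actual, forecast):
    """Directional accuracy statistic; approximately N(0, 1) under independence."""
    y = np.asarray(actual, dtype=float) > 0
    x = np.asarray(forecast, dtype=float) > 0
    T = len(y)
    p_hat = np.mean(y == x)                       # observed share of correct signs
    py, px = y.mean(), x.mean()
    p_star = py * px + (1 - py) * (1 - px)        # expected share under independence
    v_hat = p_star * (1 - p_star) / T
    v_star = ((2 * py - 1) ** 2 * px * (1 - px) / T
              + (2 * px - 1) ** 2 * py * (1 - py) / T
              + 4 * py * px * (1 - py) * (1 - px) / T**2)
    denom = v_hat - v_star
    if denom <= 1e-12:
        return np.nan                             # undefined, e.g. forecast sign never changes
    return (p_hat - p_star) / np.sqrt(denom)


signs = [1.0, -1.0] * 50
print(pesaran_timmermann(signs, signs))
```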