University of Toronto Scarborough
Department of Computer and Mathematical Sciences
Introduction to Machine Learning and Data Mining
CSCC11H3, Fall 2021
Dr. Masoud Ataei
Take-home Final Exam
12/12/2021 – 12/21/2021, 11:59 pm
Forces Driving the Market Volatility
In the first part of this case study, you investigated the number of regimes that underlie the stock market’s
volatility. Our main hypothesis was that the rare fluctuations that have occurred in the stock market could
potentially be the outcome of the market changing from one regime to another. To this end, you were
asked to make inferences on the existing regime(s) of the market by analyzing the behaviour of the VIX index over
time. Even though that analysis provided valuable information on many aspects pertaining to the volatility of
stock markets, there still remains the question of what driving forces are behind the regime-switching
phenomenon of stock markets.
There are several different metrics to quantify market volatility; however, the best-known and most widely
followed metrics are realized and implied volatility. Realized volatility gauges the fluctuations of
underlying securities or indices by measuring price changes over predetermined periods, while implied
volatility is a forward-looking metric representing future expectations of the market’s uncertainty. The
most important member of the latter family is the Chicago Board Options Exchange’s (CBOE) VIX index,
which can be considered an estimator of the equity market’s implied volatility.
In this part of the final exam, your task is to build a regression model to predict the one-step-ahead value
of the VIX using publicly available economic and financial information that is obtained using text mining
algorithms. Two predominant daily indexes of this kind are the economic policy uncertainty (EPU) index
developed in (Baker, Bloom, and Davis, 2016) and the equity market volatility (EMV) tracker presented
in (Baker, Bloom, Davis, and Kost, 2019). More specifically, the former index is constructed so that
it reflects the frequency of articles in 10 leading U.S. newspapers that include the following trio of terms:
economy or economic; uncertain or uncertainty; and one or more of the category of terms containing
Congress, deficit, Federal Reserve, legislation, regulation and the White House.
On the other hand, EMV is constructed by obtaining daily counts of articles containing at least one term
in each of the categories economy or economic; uncertain or uncertainty; and equity
market, equity price, stock market or stock price. Note that the U.S.-related articles used in the
construction of EMV are drawn from more than 1000 newspapers and are retrieved from Access World News’ NewsBank
service.
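The term-matching rule described above for EPU and EMV can be illustrated with a minimal sketch. This is not the authors' actual pipeline: the function names, the whole-word matching rule, and the four sample sentences are assumptions made up for illustration. An article is counted only if it contains at least one term from each of the three categories.

```python
import re

# Term categories as listed in the text (lower-cased for matching).
EPU_CATEGORIES = [
    ["economy", "economic"],
    ["uncertain", "uncertainty"],
    ["congress", "deficit", "federal reserve",
     "legislation", "regulation", "white house"],
]
EMV_CATEGORIES = [
    ["economy", "economic"],
    ["uncertain", "uncertainty"],
    ["equity market", "equity price", "stock market", "stock price"],
]


def contains_term(text, term):
    """True if `term` occurs in `text` as a whole word or phrase."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text.lower()) is not None


def matches_all(text, categories):
    """True if `text` has at least one term from every category."""
    return all(
        any(contains_term(text, term) for term in terms)
        for terms in categories
    )


def count_matching(articles, categories):
    """Number of articles that satisfy the rule for all categories."""
    return sum(matches_all(article, categories) for article in articles)


# Hypothetical sample sentences, invented for this illustration only.
SAMPLE_ARTICLES = [
    "Economic outlook uncertain as Congress debates the deficit.",
    "Stock market slides amid uncertainty about the economy.",
    "Local team wins championship.",
    "The Federal Reserve signals uncertainty over the economy "
    "as the stock market falls.",
]

epu_count = count_matching(SAMPLE_ARTICLES, EPU_CATEGORIES)
emv_count = count_matching(SAMPLE_ARTICLES, EMV_CATEGORIES)
```

The published indexes also involve steps such as scaling the raw counts, which are not described in this handout and are therefore not sketched here.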
In the following, we will only consider the monthly category-specific EMV uncertainty indexes for the
purpose of identifying the economic and financial factors that trigger regime switches in the stock market.
Note that EMV and its various categories found in MEMV.pkl are constructed similarly to EPU. The
main difference consists of more restrictive criteria imposed on the terms related to economy, policy, and
uncertainty.
Steps
– Load the file MEMV.pkl containing monthly values of MVIX and all categories of monthly EMV (MEMV)
trackers from January 1990 to December 2019, and develop a regression model. The purpose is not only
to predict future values of MVIX; more importantly, the results of the regression analysis
should possibly lead to new information by revealing the economic, financial and political factors that
affect the behaviour of VIX over time. For implementation purposes, you are free to use your
previous program code, the provided GA solver, or any other open-source packages. Also,
note that the last column corresponds to the values of monthly VIX at time t, whereas the covariates in each
row pertain to time t − 1.
– Solve this problem using OLS, LASSO, Ridge Regression and Elastic Net Regression, and compare their
performance by providing a thorough interpretation of their results. For each method, you may discuss its
advantages and drawbacks (such as its behaviour in the presence of multicollinearity, outliers, etc.), and then
elaborate on other aspects like feature selection, sparsity, etc.
– Can you provide a coherent picture of the mechanisms underlying the sudden volatility changes in stock
markets?
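As a starting point for the steps above, the four regressions could be set up along the following lines. This is only a sketch under stated assumptions, not a reference solution: it assumes MEMV.pkl unpickles to a pandas DataFrame whose last column is the monthly VIX at time t and whose other columns are the covariates at t − 1, and it uses scikit-learn rather than the provided GA solver. The helper names, the 80/20 chronological split, and the hyperparameter grids are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import (ElasticNetCV, LassoCV,
                                  LinearRegression, RidgeCV)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def load_memv(path="MEMV.pkl"):
    """Load the data set; assumes the pickle holds a pandas DataFrame."""
    return pd.read_pickle(path)


def split_chronologically(df, test_fraction=0.2):
    """Hold out the most recent rows as a test set (no shuffling)."""
    X = df.iloc[:, :-1].to_numpy(dtype=float)  # covariates at t - 1
    y = df.iloc[:, -1].to_numpy(dtype=float)   # monthly VIX at t
    n_train = int(len(df) * (1 - test_fraction))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def compare_models(df, test_fraction=0.2):
    """Fit OLS, LASSO, Ridge and Elastic Net; report test RMSE and sparsity."""
    X_tr, X_te, y_tr, y_te = split_chronologically(df, test_fraction)
    cv = TimeSeriesSplit(n_splits=5)  # validation folds respect time order
    models = {
        "OLS": LinearRegression(),
        "LASSO": LassoCV(cv=cv, max_iter=50_000),
        "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 25), cv=cv),
        "ElasticNet": ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=cv,
                                   max_iter=50_000),
    }
    rows = []
    for name, estimator in models.items():
        # Standardize covariates so the penalties treat them equally;
        # the scaler is fitted on the training rows only.
        pipe = make_pipeline(StandardScaler(), estimator)
        pipe.fit(X_tr, y_tr)
        pred = pipe.predict(X_te)
        coef = pipe[-1].coef_
        rows.append({
            "model": name,
            "test_rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
            "n_nonzero": int(np.sum(np.abs(coef) > 1e-8)),
        })
    return pd.DataFrame(rows).set_index("model")


# Usage, once MEMV.pkl is in the working directory:
#     results = compare_models(load_memv())
#     print(results)
```

The `n_nonzero` column gives a first look at sparsity: LASSO and Elastic Net can shrink some coefficients exactly to zero, whereas OLS and Ridge generally keep all covariates. Inspecting which EMV categories survive the L1 penalty is one possible route into the interpretation asked for above.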