University of Toronto Scarborough
Department of Computer and Mathematical Sciences
Introduction to Machine Learning and Data Mining
CSCC11H3, Fall 2021
Dr. Masoud Ataei
Take-home Final Exam
12/12/2021 – 12/21/2021, 11:59 pm
1 Stock Market Regimes
Preliminaries
Let C1, C2, . . . , Ck be k mutually exclusive and exhaustive events, and let πi be the probability that the
outcome of a random experiment belongs to Ci, i = 1, 2, . . . , k, where the random experiment is repeated
independently n times. Define the random variable Zi to be equal to the number of outcomes that are
elements of Ci, and note that each one of the events {ω : Zi(ω) = zi} occurs with one and only one of the
mutually disjoint sets defined above; that is,
Zi = Zi ∩ (C1 ∪ C2 ∪ · · · ∪ Ck)
hence
Zi = (Zi ∩ C1) ∪ (Zi ∩ C2) ∪ · · · ∪ (Zi ∩ Ck) .
Then, the probability that exactly zi terminations of the random experiment are in Ci, i = 1, 2, . . . , k, is
given by the multinomial distribution
$$P(Z_1 = z_1, Z_2 = z_2, \ldots, Z_k = z_k) = \frac{n!}{z_1!\, z_2! \cdots z_k!}\, \pi_1^{z_1} \pi_2^{z_2} \cdots \pi_k^{z_k}$$
when $\sum_{i=1}^{k} z_i = n$ and $\sum_{i=1}^{k} \pi_i = 1$. The situation frequently encountered in statistics is that a complex
set of sample points is given for which the underlying distribution does not take the form of simple
distributions. In this case, a practical approach to model the data would be to resort to mixture models
where it is assumed that the sampling distribution of the data contains several components such that each
component of the mixture model follows a simple parametric distribution.
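The multinomial probability stated earlier can be evaluated directly from its formula. The following is a minimal sketch using only the Python standard library; the function name `multinomial_pmf` and the example counts and probabilities are illustrative assumptions, not part of the exam material.

```python
from math import factorial, prod

def multinomial_pmf(z, pi):
    """P(Z_1 = z_1, ..., Z_k = z_k) for counts z and category probabilities pi."""
    if len(z) != len(pi):
        raise ValueError("z and pi must have the same length")
    if abs(sum(pi) - 1.0) > 1e-12:
        raise ValueError("the probabilities pi must sum to 1")
    n = sum(z)                       # the counts must add up to the number of trials
    coeff = factorial(n)
    for z_i in z:
        coeff //= factorial(z_i)     # n! / (z_1! z_2! ... z_k!), exact in integers
    return coeff * prod(p ** z_i for p, z_i in zip(pi, z))

# Made-up example: n = 4 trials spread over k = 3 categories.
print(multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2]))
```

With k = 2 the same function reduces to the binomial probability, which is a quick way to sanity-check it.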
For instance, let Z denote a multinomial random variable whose support contains k discrete categories.
Furthermore, let X denote a random variable with n realized real values x = (x1, x2, . . . , xn) . The main
idea behind mixture models is to assume that the data are generated by first sampling Z, and thereafter the
data points x are realized under a sampling distribution which depends on Z.
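The two-stage sampling idea described above (draw the component label Z first, then draw x from the distribution attached to that label) can be sketched as follows. This is only an illustration: it assumes Gaussian components, and the function name `sample_mixture` and all parameter values are made up for the example.

```python
import random

def sample_mixture(n, pi, mu, sigma, seed=0):
    """Draw n points by first sampling a label Z, then x given Z (Gaussian components assumed)."""
    rng = random.Random(seed)
    labels, xs = [], []
    for _ in range(n):
        # Stage 1: pick component i with probability pi[i].
        z = rng.choices(range(len(pi)), weights=pi)[0]
        # Stage 2: draw the observation from the chosen component.
        labels.append(z)
        xs.append(rng.gauss(mu[z], sigma[z]))
    return labels, xs

# Illustrative parameters: a frequent "calm" component and a rarer, wider one.
labels, xs = sample_mixture(1000, pi=[0.7, 0.3], mu=[0.0, 5.0], sigma=[1.0, 2.0])
print(sum(1 for z in labels if z == 0) / len(labels))  # should be close to 0.7
```

In a real data set only `xs` is observed; the labels are latent, which is why the regime of each observation has to be inferred afterwards.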
As before, let $C_1, C_2, \ldots, C_k$ be $k$ mutually exclusive and exhaustive events, and let $\pi_i$ be the probability
that the outcome of a random experiment belongs to $C_i$, $i = 1, 2, \ldots, k$, where the random experiment
is repeated independently $n$ times and $\sum_{i=1}^{k} \pi_i = 1$. Note that, in the context of mixture distributions, the
probabilities $\pi_i$ are referred to as mixing probabilities.
Furthermore, each of the disjoint sets $C_1, C_2, \ldots, C_k$ provides a support for its respective pdf $f_i(x)$,
$i = 1, 2, \ldots, k$, such that each pdf has mean $\mu_i$ and finite variance $\sigma_i^2$. Then, the following function
$$f(x) = \sum_{i=1}^{k} P(Z = i)\, f_i(x \mid z = i) = \sum_{i=1}^{k} \pi_i f_i(x)$$
is a valid pdf for some continuous-type random variable $X$. If every $f_i(x)$, $i = 1, 2, \ldots, k$, is a
normal density, then the mixture distribution presented above is called a Gaussian mixture
distribution.
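The claim that the weighted sum of component densities is itself a valid pdf can be checked numerically. The sketch below evaluates a Gaussian mixture density and integrates it with a simple trapezoid rule; the helper names and the parameter values are assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def mixture_pdf(x, pi, mu, sigma):
    """f(x) = sum_i pi_i * f_i(x) with normal components."""
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(pi, mu, sigma))

def trapezoid(f, a, b, steps=20000):
    """Plain trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Illustrative two-component mixture.
pi, mu, sigma = [0.6, 0.4], [-1.0, 2.0], [0.5, 1.5]
area = trapezoid(lambda x: mixture_pdf(x, pi, mu, sigma), -20.0, 20.0)
print(area)  # should be very close to 1
```

The integral stays at one because each component integrates to one and the mixing probabilities sum to one.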
Now, let Xt be a random variable representing the monthly log-returns of VIX, i.e.,
$$X_t = \log\left(\frac{\mathrm{VIX}_t}{\mathrm{VIX}_{t-1}}\right).$$
The CBOE volatility index (VIX) is a prominent measure for the market’s expected volatility implied by
S&P500 options.
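The log-return transformation defined above is a one-liner once the monthly index levels are available as a sequence. The sketch below uses made-up index levels; how the levels are stored in the data file is not specified here, so the plain list input is an assumption.

```python
import math

def log_returns(levels):
    """Return [log(v_t / v_{t-1})] for consecutive pairs of positive index levels."""
    if any(v <= 0 for v in levels):
        raise ValueError("index levels must be positive")
    return [math.log(cur / prev) for prev, cur in zip(levels, levels[1:])]

# Made-up monthly levels, for illustration only.
vix_levels = [20.0, 22.0, 18.0, 18.0]
returns = log_returns(vix_levels)
print(returns)
```

A series of T monthly levels yields T - 1 log-returns, and the log-returns sum to the log of the last level divided by the first.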
Given the following density function for the mixture of Normal distributions
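$$f_{X_t}(x_t; \theta) = \sum_{i=1}^{k} \pi_i \left(\frac{1}{\sqrt{2\pi}\,\sigma_i}\right) \exp\left\{-\frac{(x_t - \mu_i)^2}{2\sigma_i^2}\right\}$$
where
$$\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_k \end{bmatrix} = \begin{bmatrix} \mu_1 & \sigma_1 & \pi_1 \\ \mu_2 & \sigma_2 & \pi_2 \\ \vdots & \vdots & \vdots \\ \mu_k & \sigma_k & \pi_k \end{bmatrix},$$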
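our goal would be to maximize the log-likelihood function by solving the following constrained optimization problem
$$\begin{aligned}
\min\quad & l(\theta) = -2n \sum_{j=1}^{m} \hat{p}_j \log\left(\frac{p_j(\theta)}{\hat{p}_j}\right) \\
\text{s.t.}\quad & \sum_{i=1}^{k} \pi_i = 1, \\
& \pi_i \in [0, 1], \quad i = 1, \ldots, k, \\
& \mu_i \in (-\infty, \infty), \quad i = 1, \ldots, k, \\
& \sigma_i \in (0, \infty), \quad i = 1, \ldots, k,
\end{aligned}$$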
where
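$$\begin{aligned}
p_j(\theta) &= P\left[b_j \le X_t \le b_j + s \mid \hat{p}\right] \\
&= \int_{b_j}^{b_j + s} \sum_{i=1}^{k} \pi_i \left(\frac{1}{\sqrt{2\pi}\,\sigma_i}\right) \exp\left\{-\frac{(x_t - \mu_i)^2}{2\sigma_i^2}\right\} dx_t \\
&= \sum_{i=1}^{k} \pi_i \int_{b_j}^{b_j + s} \left(\frac{1}{\sqrt{2\pi}\,\sigma_i}\right) \exp\left\{-\frac{(x_t - \mu_i)^2}{2\sigma_i^2}\right\} dx_t.
\end{aligned}$$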
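Because each integral in the last line is a difference of normal CDF values, both the bin probabilities and the objective l(θ) can be computed without numerical integration. The sketch below rests on several assumptions that the text does not spell out: the m bins have equal width s and start at a lower edge called `b0` here, the bins cover every observation, and p̂_j is taken to be the observed fraction of data points in bin j. All function names are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_probs(edges, pi, mu, sigma):
    """p_j(theta): mixture probability of each bin [edges[j], edges[j+1]]."""
    return [
        sum(p * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
            for p, m, s in zip(pi, mu, sigma))
        for lo, hi in zip(edges, edges[1:])
    ]

def objective(xs, b0, s, m, pi, mu, sigma):
    """l(theta) = -2n * sum_j p_hat_j * log(p_j(theta) / p_hat_j) over m equal-width bins."""
    n = len(xs)
    counts = [0] * m
    for x in xs:
        if not b0 <= x <= b0 + m * s:
            raise ValueError("the bins are assumed to cover every observation")
        j = min(int((x - b0) // s), m - 1)   # the right-most edge goes into the last bin
        counts[j] += 1
    edges = [b0 + j * s for j in range(m + 1)]
    total = 0.0
    for c, p_j in zip(counts, bin_probs(edges, pi, mu, sigma)):
        if c > 0:                            # empty bins contribute nothing to the sum
            p_hat = c / n
            total += p_hat * math.log(p_j / p_hat)
    return -2.0 * n * total

# Made-up data and a single-component candidate theta, for illustration only.
data = [-1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 1.5]
print(objective(data, b0=-4.0, s=1.0, m=8, pi=[1.0], mu=[0.0], sigma=[1.0]))
```

A derivative-free optimizer would call `objective` repeatedly with candidate values of θ; the constraints on π, μ and σ would have to be enforced by the optimizer or by a reparameterization.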
The null hypothesis that the model based on k regimes fits the data sufficiently well is accepted if the
p-value of the corresponding χ2 test is greater than the 0.05 level of significance.
Recall that in practice the actual value of k is unknown, and it is usually identified experimentally.
For this purpose, you should perform a goodness-of-fit test for each regime hypothesis
by taking advantage of the asymptotic distribution of the objective function l(θ), which is a χ2 distribution
with (m − 3k − 1) degrees of freedom.
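The decision rule above amounts to computing the upper-tail probability of a χ2 distribution at the minimized value of l(θ). The following sketch does this with the standard library only, using the power series for the regularized lower incomplete gamma function; in practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead. The function names and the example inputs are assumptions, and the degrees of freedom follow the formula given in the text.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom (series-based)."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    # Series: sum_{n>=0} y^n / (a (a+1) ... (a+n)).
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)
        total += term
    # Regularized lower incomplete gamma P(a, y) = exp(-y) * y^a / Gamma(a) * series.
    log_cdf = a * math.log(y) - y - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_cdf))

def regime_fit(l_value, m, k, alpha=0.05):
    """Return (p_value, accepted) for a k-regime model fitted on m bins."""
    df = m - 3 * k - 1               # degrees of freedom as stated in the text
    if df <= 0:
        raise ValueError("too few bins for this number of regimes")
    p_value = chi2_sf(l_value, df)
    return p_value, p_value > alpha

# Made-up objective value for a two-regime model on nine bins.
print(regime_fit(2.0, m=9, k=2))
```

The series loses relative accuracy for extremely small p-values, which does not matter when the only question is whether the p-value exceeds 0.05.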
In addition, in order to determine which regime was primarily dominant at any given month, we can define
a random variable St denoting the state of Xt in month t. Then, the probability that regime
St = i has generated a particular observation xt is obtained by
$$P(S_t = i \mid x_t; \theta) = \frac{f_i(x_t; \theta_i)}{f(x_t; \theta)}$$
where f (xt;θ) is the value (i.e. height) of the unconditional density function of random variable Xt at
observation xt and fi (xt;θi) is the height of the regime i conditional density function at observation xt.
The regime present at any given month is that with the maximum conditional probability (mcp)
$$\max_{i}\, \left\{P(S_t = i \mid x_t; \theta)\right\}.$$
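The regime assignment described above can be sketched in a few lines. One interpretive assumption is made here and should be checked against the course material: since θ_i contains π_i, the regime density f_i(x_t; θ_i) is taken to be the weighted component π_i N(x_t; μ_i, σ_i²), which makes the conditional probabilities sum to one over the regimes. Function names and parameter values are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def regime_posteriors(x, pi, mu, sigma):
    """P(S_t = i | x_t; theta) for every regime i (weighted components assumed)."""
    weighted = [p * normal_pdf(x, m, s) for p, m, s in zip(pi, mu, sigma)]
    total = sum(weighted)            # height of the unconditional mixture density at x
    return [w / total for w in weighted]

def mcp_regime(x, pi, mu, sigma):
    """Index of the regime with the maximum conditional probability at x."""
    post = regime_posteriors(x, pi, mu, sigma)
    return max(range(len(post)), key=post.__getitem__)

# Illustrative parameters: a narrow regime around 0 and a wide one around 0.3.
pi, mu, sigma = [0.8, 0.2], [0.0, 0.3], [0.1, 0.4]
for x in (0.02, 0.9):
    print(x, regime_posteriors(x, pi, mu, sigma), mcp_regime(x, pi, mu, sigma))
```

Applying `mcp_regime` to every monthly observation gives a sequence of regime labels, one per month.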
Questions
1- Load the file MVIX.pkl containing monthly values of VIX from January 1990 to December 2019, and use
the derivative-free optimization GA solver to learn the parameters of the mixture of Gaussians (MoG). The
optimal number of components will be one of k = 1, 2, 3, 4, 5.
2- For each component of the mixture distribution, compute and report the estimates of its parameters (you
may report them in a table).
3- How did you decide on the correct number of components (regimes) of the mixture distribution,
denoted by k*?
4- Find the mcp at each observation and then compute a k* × k* probability transition matrix, where each
element is the probability that the market switches from its current regime to another one, or
else remains in the same regime.
5- Why do you believe the regime-switching phenomenon occurs in the stock market?
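For the transition matrix mentioned in question 4, one common approach is to count the month-to-month regime changes in the sequence of mcp labels and normalize each row. The sketch below assumes the labels are integers from 0 to k − 1; the function name and the label sequence are made up, and this is one possible estimator rather than a prescribed solution.

```python
def transition_matrix(states, k):
    """Row-normalized counts of transitions between consecutive regime labels."""
    counts = [[0] * k for _ in range(k)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A regime that is never left (no outgoing transitions) gets a row of zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Made-up label sequence with two regimes.
labels = [0, 0, 1, 1, 1, 0]
for row in transition_matrix(labels, k=2):
    print(row)
```

Entry (i, j) estimates the probability of moving from regime i in one month to regime j in the next, so each non-empty row sums to one and the diagonal holds the probabilities of staying in the same regime.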