Vadim Elenev Computational Finance 2021 Fall II
Computational Finance – HW 3
1. Generate 5,000 random rolls of a fair six-sided die, i.e., one on which each number 1
through 6 is equally likely. Fix your random number generator seed at 200 for
reproducibility.
Write your code in a flexible way that allows you to change how many sides the die has,
or to make it loaded (i.e. some sides have higher probabilities than others), with no
more than one additional or different line of code.
a. Plot and label a histogram showing how many rolls produced each number.
b. Plot and label a histogram showing how many of the first 50 rolls produced
each number.
c. If you did not know whether the die was fair or not, what would you conclude?
d. I offer you the opportunity to bet 20 cents on any number to win $1 if you roll
that number, and $0 otherwise. You are risk-neutral. Suppose you did not know that
the die is fair. Based only on what you observed in the first 50 rolls of the die,
do you take the bet? If yes, which number do you bet on?
e. If you decided to bet on a particular number in part (d), how does your bet do
over the next 50 rolls of the die, i.e., how much do you make or lose?
f. What does this exercise tell you about learning from small samples? Connect
your answer to the Law of Large Numbers.
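
One possible way to set up Question 1 is sketched below in Python with NumPy. The assignment does not name a language, so that choice is an assumption, as are the helper names `roll_die` and `bet_profit`. The sketch also assumes that the 20-cent stake is paid on every roll and that the $1 is the gross payout on a win. The die is described entirely by the `probs` vector, so changing the number of sides or loading the die means editing that one line.

```python
import numpy as np

def roll_die(n_rolls, probs, seed=200):
    """Draw n_rolls from a die whose k-th side has probability probs[k-1]."""
    probs = np.asarray(probs, dtype=float)
    rng = np.random.default_rng(seed)        # fixed seed for reproducibility
    sides = np.arange(1, probs.size + 1)     # faces are labeled 1..K
    return rng.choice(sides, size=n_rolls, p=probs)

# The only line to change for a different die, e.g. np.full(8, 1 / 8)
# for eight sides or [0.1, 0.1, 0.1, 0.1, 0.1, 0.5] for a loaded die.
probs = np.full(6, 1 / 6)

rolls = roll_die(5_000, probs)

# Number of times each face came up, for all rolls and for the first 50.
counts_all = np.bincount(rolls, minlength=probs.size + 1)[1:]
counts_50 = np.bincount(rolls[:50], minlength=probs.size + 1)[1:]

def bet_profit(rolls, number, stake=0.20, prize=1.00):
    """Net profit from betting `stake` on `number` on every roll in `rolls`.

    Assumption: a win pays `prize` gross and the stake is always paid.
    """
    rolls = np.asarray(rolls)
    wins = np.count_nonzero(rolls == number)
    return wins * prize - rolls.size * stake
```

Under these assumptions, `counts_all` and `counts_50` can be passed directly to a bar chart for parts (a) and (b), and `bet_profit(rolls[50:100], number)` gives the outcome asked for in part (e).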
2. One common distribution we often encounter in real-world data is the Pareto
distribution. For instance, it approximately describes the upper end of distributions of
income and wealth in a population, the size of cities in a country, or the size of
companies. Pareto-distributed data has a “fat tail” – extremely high positive values are
much more likely than what would have been predicted by some of the other
distributions we know (e.g. normal).
The CDF of the Pareto distribution is

$$F(x) = \begin{cases} 1 - \left(\dfrac{x_m}{x}\right)^{\alpha}, & x \ge x_m \\ 0, & x < x_m \end{cases}$$

where $x_m$ is the smallest value the Pareto-distributed random variable can take and $\alpha$
is the rate at which the probability decreases for large values.
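
As a worked step, the CDF can be inverted by hand, which is what the inverse transform method requires. The notation here is an assumption: $x_m$ denotes the minimum value, $\alpha$ the tail parameter, and $u$ a draw from the uniform distribution on $[0, 1)$. Setting $u = F(x)$ on the branch $x \ge x_m$ and solving for $x$:

$$u = 1 - \left(\frac{x_m}{x}\right)^{\alpha} \;\Longrightarrow\; \left(\frac{x_m}{x}\right)^{\alpha} = 1 - u \;\Longrightarrow\; x = F^{-1}(u) = x_m \,(1 - u)^{-1/\alpha}$$

Because $1 - u$ is itself uniformly distributed, $x = x_m\, u^{-1/\alpha}$ would produce the same distribution, provided $u = 0$ is excluded.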
In this question, we will approximate the distribution of market capitalizations of
companies in the S&P 500 index as of 12/31/2020.
a. Import market cap data from market_cap.xlsx and use it to estimate the minimum value $x_m$.
b. Sample a 500 x 100,000 matrix of uniformly distributed random numbers.
c. Use the inverse transform method to transform them into a sample of Pareto-
distributed random numbers, assuming 𝛼 = 1.5.
d. Sort each column of the matrix using the sort() function so that its elements are
ordered largest to smallest, then average across columns (i.e. take the mean of each
row) to get a 500x1 vector. What does each element in this vector represent?
e. Use a scatterplot of actual vs. simulated market caps to evaluate your
simulations. To make it easier to gauge results from the graph, I recommend (i)
plotting both market caps in logs, and (ii) also drawing a 45-degree (i.e. y=x)
line.
f. Repeat steps (c) through (e) for 𝛼 = 2.0 and 1.0. Which simulation seems to
describe the data best? Why?
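
Steps (b) through (d) could be sketched as follows in Python with NumPy. The language and the function names `pareto_inverse_cdf` and `expected_ranked_sizes` are assumptions, not part of the assignment. The value `x_m=1.0` is only a placeholder and is not estimated from market_cap.xlsx, and the number of simulations is reduced from 100,000 to keep the example light on memory.

```python
import numpy as np

def pareto_inverse_cdf(u, x_m, alpha):
    """Inverse of F(x) = 1 - (x_m / x)**alpha, valid for u in [0, 1)."""
    return x_m * (1.0 - u) ** (-1.0 / alpha)

def expected_ranked_sizes(n_firms, n_sims, x_m, alpha, seed=0):
    """Average size of the k-th largest draw across simulated samples.

    Each column is one simulated sample of n_firms Pareto draws.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((n_firms, n_sims))        # uniform draws on [0, 1)
    x = pareto_inverse_cdf(u, x_m, alpha)    # inverse transform method
    x_sorted = np.sort(x, axis=0)[::-1]      # each column, largest first
    return x_sorted.mean(axis=1)             # mean of each row, length n_firms

# Placeholder inputs: x_m is illustrative and n_sims is reduced for speed.
ranked = expected_ranked_sizes(n_firms=500, n_sims=1_000, x_m=1.0, alpha=1.5)
```

For part (e), `np.log(ranked)` could then be plotted against the log of the actual market caps sorted largest to smallest, and calling the function again with `alpha=2.0` and `alpha=1.0` covers part (f).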