---
title: "Monte Carlo"
output: html_document
---

```{r,echo=F}
knitr::opts_chunk$set(cache=T)
```
[link to source](L03_monte-carlo.Rmd)
Monte Carlo is a class of simulation methods that use random number generation to solve a wide range of problems from prediction to estimation to testing. They have been used as early as the 1930s, and are now more powerful than ever due to modern technological advancements.
## Origin of name
Stanisław Ulam was working on top secret nuclear weapons projects in the 1940s at Los Alamos National Laboratory when he invented one of the first modern Monte Carlo methods. John von Neumann, one of the fathers of computing, helped him write the first computer program to run the algorithm.
Due to the secret nature of the research, they needed a codename for it. Their colleague suggested the name *Monte Carlo* after the Monte Carlo Casino in Monaco, due to the inherent randomness of the method and due to Ulam's uncle frequently borrowing gambling money because he "just had to go to Monte Carlo". ^[Metropolis, N. (1987). "The beginning of the Monte Carlo method". Los Alamos Science: 125–130.]
Despite being limited by the computational power available at the time, this method proved very powerful and is widely used today in many fields of research.
## Example 0: estimating $\pi$
As an initial motivating example, suppose we don't know **any** digits of $\pi$, but we do know it's the ratio of the area of a circle to the square of its radius. How can we use random numbers to estimate $\pi$?
One thing we can do is generate uniformly distributed random points in the square below:
If we then count how many of the points are inside the circle, we can estimate the percentage of the square that is covered by the circle, and use that to estimate $\pi$.
```{r}
# choose N (simulation size)
N = 100000   # (value assumed here; any large N works)

# use this to count how many points are in circle
in.circ = 0

# repeat steps N times
for(i in 1:N){
  # for each point, generate 2 coordinates (x,y) randomly between -1 and 1
  point = runif(n=2, min=-1, max=1)
  # to be inside circle, must satisfy x^2 + y^2 < 1
  if(point[1]^2 + point[2]^2 < 1){
    # if inside, add to count
    in.circ = in.circ+1
  }
}

# to get proportion of square covered, take in.circ/N
prop = in.circ/N

# to get estimate of pi, take prop * 4
pi.mc = prop * 4

# what are our estimate and percentage error?
cat(sprintf("estimate: %.4f\n %% error: %.2f%%", pi.mc, abs(pi.mc-pi)/pi*100))
```

## Some theory

It should make sense intuitively why this works, but it can also be shown mathematically to be correct.

Let $X$ be the RV of whether a uniformly randomly sampled point in the square lies inside the circle. So, $X=1$ with probability $\dfrac\pi4$ and $X=0$ with probability $1-\dfrac\pi4$. Let $X_i$ represent repeated [I.I.D.](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables) samples of this RV.

Now, let's think carefully about how we got our previous estimate. First, we randomly and uniformly sampled $N$ points inside the square; this can be represented by sampling $X_1,X_2,...,X_N$. Then, we counted the proportion of them that lie inside the circle; this can be represented by summing the $X_i$ in our sample and dividing by $N$. This proportion estimates the fraction of the square's area that is covered by the circle, so to get our final estimate for $\pi$, we just need to multiply by $4$.

Thus, our estimator can be represented as

$$\hat\pi:=4\cdot\frac1N\sum_{i=1}^NX_i$$

We can easily derive the expectation and variance of this estimator:

$$\begin{align}
E(\hat\pi)&=E\left(\frac4N\sum_{i=1}^NX_i\right)\\
&=\frac4NE\left(\sum_{i=1}^NX_i\right)\\
&=\frac4N\sum_{i=1}^NE(X_i)\\
&=\frac4N\sum_{i=1}^N\left[(1)(\tfrac\pi4)+(0)(1-\tfrac\pi4)\right]\\
&=\frac4N\sum_{i=1}^N\frac\pi4\\
&=\frac4N\left(N\cdot\frac\pi4\right)\\
&=\pi
\end{align}$$

For the variance, note the $X_i$ are independent, so the variance of the sum is the sum of the variances:

$$\begin{align}
Var(\hat\pi)&=Var\left(\frac4N\sum_{i=1}^NX_i\right)\\
&=\left(\frac4N\right)^2Var\left(\sum_{i=1}^NX_i\right)\\
&=\frac{16}{N^2}\sum_{i=1}^NVar(X_i)\\
&=\frac{16}{N^2}\sum_{i=1}^N\left[(1-\tfrac\pi4)^2(\tfrac\pi4)+(0-\tfrac\pi4)^2(1-\tfrac\pi4)\right]\\
&=\frac{16}{N^2}\sum_{i=1}^N\frac\pi4\left(1-\frac\pi4\right)\\
&=\frac{16}{N^2}\left(N\cdot\frac\pi4\left(1-\frac\pi4\right)\right)\\
&=\frac{\pi(4-\pi)}N
\end{align}$$

_**bonus exercise**: derive the same equations for $E$ and $Var$ by treating $\sum X_i$ as a binomial variable and applying the binomial RV equations for $E$ and $Var$._

We can easily verify these formulae by simulating multiple "runs" of this experiment and looking at the distribution of estimates we obtain.

```{r}
# choose M (number of times to repeat MC experiment)
M = 500   # (value assumed here)

# create vector to save results in
mc.est = rep(NA,M)

# for each experiment, do all the steps done before, get an estimate, and save it
for(j in 1:M){
  # these lines are copied exactly from above
  in.circ = 0
  for(i in 1:N){
    point = runif(n=2, min=-1, max=1)
    if(point[1]^2 + point[2]^2 < 1){
      in.circ = in.circ+1
    }
  }
  prop = in.circ/N
  pi.mc = prop * 4
  # save result in vector
  mc.est[j] = pi.mc
}
```

```{r}
# what do the estimates look like?
options(max.print=500)
mean(mc.est)
var(mc.est)
var.theory = pi*(4-pi)/N
var.theory
# deviation of our mean and variance from theory:
cat(sprintf("%% deviation from E  : %.3f%% \n%% deviation from Var: %.3f%%",
            abs(mean(mc.est)-pi)/pi*100, abs(var(mc.est)-var.theory)/var.theory*100))
```

## Important notes:

- the estimator is **unbiased**, since $E(\hat\pi)-\pi=0$, which means there's no systematic error
- the estimator variance is $\propto1/N$, so we can increase our precision by simulating more points
- note the standard deviation is $\propto1/\sqrt N$, so if you want to lower the error by a factor of $\frac12$ you need to increase the simulation size by a factor of $4$

## Generalization

We can generalize this a little so you can see a bigger picture of what we're doing (without going into too much detail).

Suppose we want to find the following integral

$$I=\int_Df(x)\,dx$$

where the integral is taken over some arbitrary region $D$. If we can write

$$f(x)=g(x)\cdot p(x)$$

for some function $g$ and some probability density $p$, then we have

$$I=\int_Dg(x)p(x)\,dx=E\big(g(X)\big)$$

where $X$ is a random variable with density $p$. Then, we just need to draw a large sample $x_1,...,x_N$ from $p$ and take the sample mean of the $g(x_i)$ to estimate the integral. This is all analogous to what we did before in Example 0.
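To make this concrete, here's a minimal sketch of the recipe in action (an illustrative addition; the integrand and variable names below are not from the notes). To estimate $I=\int_0^1e^x\,dx$, take $p$ to be the uniform density on $[0,1]$ (so $p(x)=1$ and $g(x)=e^x$), sample from $p$, and average $g$ over the sample:

```{r}
# minimal sketch: estimate I = integral of e^x over [0,1], whose true value is e-1
# here p = uniform density on [0,1] and g(x) = e^x
n = 100000
x = runif(n)
I.mc = mean(exp(x))
cat(sprintf("estimate: %.5f\n    true: %.5f\n", I.mc, exp(1)-1))
```

As in Example 0, the error of this estimate shrinks like $1/\sqrt n$.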
## Example 1: Buffon's needle

Asked by Georges-Louis Leclerc, Comte de Buffon in the 1700s:

> Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips? ^[https://en.wikipedia.org/wiki/Buffon%27s_needle_problem]
The answer to this question is actually not that hard to derive analytically, but we can very easily estimate it using MC!
Suppose we again take the same square from the previous example, with corners at $(\pm1,\pm1)$, and divide it into 2 halves: the top half is one strip of wood, and the bottom half is another strip of wood.
```{r,fig.height=4,fig.width=4}
library(ggplot2)
ggplot() + geom_rect(aes(xmin=-1,xmax=1,ymin=0,ymax=1),fill="grey30",color=NA) +
  geom_rect(aes(xmin=-1,xmax=1,ymin=-1,ymax=0),fill="grey80",color=NA) +
  scale_x_continuous(expand=c(0,0)) + scale_y_continuous(expand=c(0,0))
```
We start by randomly sampling a point somewhere in this square (the center of the needle), then we randomly pick an angle and extend the ends of the needle 0.5 units of length each way, so the needle has length 1 (equal to the width of a strip). Finally, we check if the needle crosses any of the lines $y=0,\pm1$.
Let’s start by plotting our function to make sure it’s doing what we want.
```{r}
# define needle dropping function
needle = function(){
  # randomly pick center of needle
  xy = runif(2,-1,1)
  # randomly pick angle
  angle = runif(1,0,pi)
  # calculate delta x and delta y of ends from center
  # we are defining the angle using the standard definition
  # (going counterclockwise from positive x axis)
  dx = cos(angle)*0.5
  dy = sin(angle)*0.5
  # calculate coordinates of end points
  # note in this way, our dy is always positive, so p1 is always higher than p2
  # this will be useful later for checking crossing
  p1 = setNames(xy + c(dx,dy), c("p1x","p1y"))
  p2 = setNames(xy - c(dx,dy), c("p2x","p2y"))
  # return endpoints
  return(c(p1,p2))
}
```
```{r}
# plot a few needle drops
# create dummy-argument wrapper for ldply
library(plyr)
ndl = function(i) needle()
needles = ldply(1:10,ndl)
ggplot() + geom_rect(aes(xmin=-1,xmax=1,ymin=0,ymax=1),fill="grey30",color=NA) +
  geom_rect(aes(xmin=-1,xmax=1,ymin=-1,ymax=0),fill="grey80",color=NA) +
  geom_segment(data=needles,mapping=aes(x=p1x,y=p1y,xend=p2x,yend=p2y), color="red",size=1.5)
```
Now, we run our function a large number of times and check how many cross the lines.
```{r, cache=T}
# set size
N = 1000000

# create vector to count crossings
mc.cross = 0

# run function
for(i in 1:N){
  # drop a needle
  points = needle()
  p1y = points['p1y']
  p2y = points['p2y']
  # check if p1 and p2 are on opposite sides of a line
  if( (p1y > -1 & p2y < -1) || (p1y > 0 & p2y < 0) || (p1y > 1 & p2y < 1) ){
    mc.cross = mc.cross+1
  }
}
```
Now, we have the solution to our problem.

```{r}
mc.cross/N
```
We can compare with the correct answer and see how close we are.
```{r}
true.cross = 2/pi
cat(sprintf(" true value: %.5f \n  our value: %.5f \n%% deviation: %.3f%%",
            true.cross, mc.cross/N, abs(true.cross-mc.cross/N)/true.cross*100))
```
_**exercise**: Write a function that takes 3 arguments: `L` for length of needle, `T` for width of floor, and `N` for number of replicates; performs the above process; and returns the probability of a crossing._
## Random v. pseudorandom
For MC, it's important to have a good source of random numbers whose distribution is precisely known. This is a **surprisingly difficult** problem. There are ways of generating what are (as far as we can tell) almost perfectly uniformly random numbers, such as measuring [atmospheric noise](https://www.random.org/randomness/), measuring [radioactive decay](https://www.techrepublic.com/article/computer-scientists-derive-truly-random-numbers-using-two-source-extractors/), or even [lava lamps](https://en.wikipedia.org/wiki/Lavarand) ([used by Cloudflare](https://www.cloudflare.com/learning/ssl/lava-lamp-encryption/)). These sources are generally considered capable of producing the most **truly random** numbers.
Your computer (unless it's attached to a Geiger counter or a wall of lava lamps) is only capable of producing **pseudorandom** numbers. These are made by running a [pseudorandom number generator algorithm](https://en.wikipedia.org/wiki/Pseudorandom_number_generator), which is deterministic (i.e. same output for same input), such as the Mersenne Twister (used by default in R). Even though these generators are deterministic, they pass tests of **statistical randomness**, which means that, as far as we can tell, there are no discernible patterns in their output, and they are usually close enough to random for computational purposes.
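As a quick illustration of this determinism (a small added sketch, not part of the original notes), reusing a seed reproduces the exact same "random" numbers, and `RNGkind( )` reports which generator R is using:

```{r}
# same seed in, same "random" numbers out
set.seed(42)
runif(3)
set.seed(42)
runif(3)   # identical to the previous three draws
# check which pseudorandom generator R is using (default is the Mersenne-Twister)
RNGkind()[1]
```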
Some resources if you want to learn more:
## Example 2: sampling from arbitrary CDF
Onto more examples! Here’s a much more interesting one: given an arbitrary CDF (or PDF, which you can obviously just integrate to get the associated CDF), how can you draw samples from it?
It can be shown that a random variable transformed by its own CDF follows the standard uniform distribution (the [probability integral transform](https://en.wikipedia.org/wiki/Probability_integral_transform)). What this means is that for **any** arbitrary continuous distribution, if you draw many samples from it, and then plot a histogram of the *percentile* of each observation (i.e. what proportion of samples in the population are less than your observation), it would always appear uniform.
```{r}
N = 100000
options(repr.plot.width=16)
par(mfrow=c(2,4))
hist(rnorm(N)) ; hist(rchisq(N,10)) ; hist(rexp(N,1)) ; hist(rbeta(N,.5,.5))
hist(pnorm(rnorm(N))) ; hist(pchisq(rchisq(N,10),10)) ; hist(pexp(rexp(N,1),1)) ; hist(pbeta(rbeta(N,.5,.5),.5,.5))
```
We can use this fact to generate observations from any arbitrary distribution.
Let $F_X(x)$ be the CDF of some arbitrary random variable $X$, and let $Y_i$ be uniformly distributed on $[0,1]$. Then, we have that $F_X^{-1}(Y_i)\simeq X$, where $F_X^{-1}$ is the inverse of the CDF (recall a CDF is always [nondecreasing and right-continuous](https://en.wikipedia.org/wiki/Cumulative_distribution_function#Properties), so such an inverse can always be defined), and $\simeq$ means having the same distribution.
In other words, randomly sampling from the uniform distribution and then applying the inverse of the given CDF function gives the desired target distribution.
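For instance, here is a minimal sketch of this recipe in a case where the inverse CDF has a simple closed form (an added example, not from the original notes): for the exponential distribution, $F^{-1}(u)=-\ln(1-u)/\lambda$.

```{r}
# inverse-CDF sampling from Exponential(rate) by hand
rate = 2
u = runif(10000)              # step 1: sample from the uniform distribution
x.exp.mc = -log(1-u)/rate     # step 2: apply the closed-form inverse CDF
# compare with R's built-in generator
par(mfrow=c(1,2))
hist(x.exp.mc)
hist(rexp(10000,rate))
```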
We can also demonstrate this with R's built-in inverse CDF functions, sampling from a chi-square distribution with 12 degrees of freedom:
```{r}
# reset print width
options(repr.plot.width=7)
# generate N uniform observations between 0,1
x.unif = runif(10000)
# apply built in inverse CDF function for chi-square
x.chi.mc = qchisq(x.unif,12)
# plot results
hist(x.chi.mc)
# compare with built in R method for random generation
hist(rchisq(10000,12))
# plot our sample quantiles against theoretical quantiles for comparison
plot(sort(x.chi.mc),qchisq(ppoints(10000,1),12))
```
As you can see, the above method produces essentially the same distribution as the built-in `rchisq( )` generator.
## Example 3: hot hands
For the next example, let’s do something slightly more complicated.
A certain professional basketball player believes he has “[hot hands](https://en.wikipedia.org/wiki/Hot_hand)” when shooting 3-point shots (i.e. if he makes a shot, he’s more likely to also make the next shot). His friend doesn’t believe him, so they make a wager and hire you, a statistician, to settle the bet.
As a sample, you observe the next morning as the player takes the same 3-point shot 200 times in a row (assume he is well rested, in good physical shape, and doesn't feel significantly more tired after the experiment, so his level of mental focus doesn't change during the experiment). You obtain the following results, where Y denotes a success and N denotes a miss:
YNNNNYYNNNYNNNYYYNNNNNYNNNNNNNNNYNNNNNYYNYYNNNYNNNNYNNYYYYNNYYNNNNNNNNNNNNNNNYYYNNNYYYYNNNNNYNYYNNNNYNNNNNNYNNNYNNYNNNNNYNYYYNNYYYNYNNNNYNNNNNNNYYNNYYNNNNNNYNNNYNNNNNNNNYNNNYNNNNNYYNNNNNNYYYYYYNYYNNYN
Note that the existence of a "hot hands" effect means the shots are not independent. Also note that there's a third possibility: that the player is more likely to "[choke](https://en.wikipedia.org/wiki/Choke_(sports))" and miss the next shot if he scored the previous one (e.g. maybe scoring a shot makes him feel more nervous because he feels like he's under pressure).
### Attempt 1: run length
Since the existence of a hot hands effect tends to increase the run lengths of `Y`s compared to if the shots were independent, we can use the longest run length as a way of comparing independence vs hot hands (note if the player is a choker, they will tend to have shorter runs of `Y`s than if they were independent, so you can simply ignore this case for now and compare hot hands v. independence for simplicity).
Now, how exactly do you compare these two situations and determine which is a better fit for the data?
One thing that’s worth noting is that ***if a sequence of repeated experiments is independent, then it shouldn’t matter what order the results are in***. This should be fairly easy to understand and agree with.
Let’s ***assume that the throws are totally independent***. Recall we also assume he doesn’t get tired so his baseline shot-making ability doesn’t change over the course of the experiment. Therefore, we should be able to (under these assumptions) ***arbitrarily reorder his shots without affecting any statistical properties of his shot sequence***. So let’s do that!
We begin by parsing the throws into a vector of `Y` and `N`.
```{r}
# the sequence of throws is broken up into 4 chunks for readability, then
# paste0 is used to merge them into a single sequence, then
# strsplit("YN...N",split="") is used to split the string at every "", so
# we get a vector of each character, and finally
# [[1]] is used to get the vector itself (strsplit actually outputs a list
# with the vector as the first element; [[1]] removes the list wrapper)
# for more info about the strsplit function, see
# https://www.journaldev.com/43001/strsplit-function-in-r
throws = strsplit(
  paste0("YNNNNYYNNNYNNNYYYNNNNNYNNNNNNNNNYNNNNNYYNYYNNNYNNN",
         "NYNNYYYYNNYYNNNNNNNNNNNNNNNYYYNNNYYYYNNNNNYNYYNNNN",
         "YNNNNNNYNNNYNNYNNNNNYNYYYNNYYYNYNNNNYNNNNNNNYYNNYY",
         "NNNNNNYNNNYNNNNNNNNYNNNYNNNNNYYNNNNNNYYYYYYNYYNNYN"), split="")[[1]]
```
Next, we write a function to get the longest run of `Y`s in the throw sequence. Here we use a convenient function called `rle( )` which is short for [run length encoding](https://en.wikipedia.org/wiki/Run-length_encoding), which turns our sequence of throws into sequences of runs (e.g. YNNNNYYNNNY becomes something like “1 `Y`, 4 `N`s, 2 `Y`s, 3 `N`s, and 1 `Y`”). We can then simply take the longest of the `Y` runs.
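To see this concretely, here is what `rle( )` returns for that example sequence (a small added illustration):

```{r}
# run length encoding of YNNNNYYNNNY: 1 Y, 4 Ns, 2 Ys, 3 Ns, 1 Y
rle(c("Y","N","N","N","N","Y","Y","N","N","N","Y"))
```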
```{r}
longestRun = function(x, target = 'Y'){
  max(0, with(rle(x), lengths[values==target]))
}
longestRun(throws)
```
Now, we randomly shuffle the sequence of throws many times and see what the longest `Y` runs look like for these shuffled sequences.
```{r}
# set number of reps to use
N = 10000   # (value assumed here)

# create vector to save results in
mc.runs = rep(NA,N)

# for each rep, randomize sequence and find longest run of Y
for(i in 1:N){
  mc.runs[i] = longestRun(sample(throws))
}

options(max.print=500)
hist(mc.runs)
```
Compared to the shuffled (independent-order) sequences, our observed longest run doesn't seem that unlikely. Therefore, this method seems inconclusive.
Can we find an even better “statistic” to use?
### Attempt 2: running odds ratio
Consider **every pair of consecutive throws** and make a table of the outcomes. For example, the first 8 throws in the sequence are YNNNNYYN. Breaking this into consecutive pairs, we have YN, NN, NN, NN, NY, YY, YN. This gives the table:
| NN | NY | YN | YY |
|:–:|:–:|:–:|:–:|
| 3 | 1 | 2 | 1 |
Suppose we do this for the entire sequence of 200 throws (note this gives you 199 pairs). If we **divide the number of NY by the number of NN**, we get an estimate of **the odds that he makes the next shot _given he missed his last shot_**.

Similarly, we can **divide the number of YY by the number of YN** to get an estimate of **the odds that he makes the next shot _given he scored his last shot_**.
Now, note that **if the “hot hands” effect really exists** in the data, then **YY/YN should be larger than NY/NN** in a large enough sample. We use this fact to define the following quantity:
$$R=\frac{(\text{# of YY})/(\text{# of YN})}{(\text{# of NY})/(\text{# of NN})}$$
The ratio $R$ represents, in some sense, **how much more likely** the player is to **make the next shot** if he **made the previous shot _vs_ if he didn’t make the previous shot** (note the **_vs_**). This is exactly what we’re trying to investigate!
If there is a "hot hands" effect, the numerator should be greater than the denominator and we should have $R>1$. If the throws are independent and do not affect each other, then in theory we should have $R=1$. If the player is actually a "choker", we should expect $R<1$.
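As a sketch of how $R$ can be computed for the observed sequence (an added example built only on the definitions above; the variable names are mine):

```{r}
# tally all 199 consecutive pairs of throws
pairs = paste0(head(throws,-1), tail(throws,-1))
counts = table(factor(pairs, levels=c("NN","NY","YN","YY")))
counts

# running odds ratio R as defined above
R = (counts["YY"]/counts["YN"]) / (counts["NY"]/counts["NN"])
unname(R)
```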