---
title: "STAT340: Discussion 2: Monte Carlo"
documentclass: article
classoption: letterpaper
output:
  html_document:
    highlight: tango
    fig_caption: false
---

<style>
table{width:50%!important;margin-left:auto!important;margin-right:auto!important;}
ol[style*="decimal"]>li{margin-top:40px!important;}
</style>
```{r setup, include=FALSE}
# check packages installed
if(!require(pacman)) install.packages("pacman")
pacman::p_load(knitr,tidyverse)
knitr::opts_chunk$set(tidy=FALSE,strip.white=FALSE,fig.align="center",comment=" #")
options(width=120)
```
[link to source](ds02.Rmd)
## XKCD comic
## Exercises
You can do these exercises individually, but we recommend you work on them in a small group. **Choose 2** of the following exercises to complete.
1. Estimating [Robbins constant](https://mathworld.wolfram.com/CubeLinePicking.html) (the mean distance between two random points in the unit cube).
a. Randomly generate 2 points $(x_1,y_1,z_1)$, $(x_2,y_2,z_2)$ **uniformly** in the unit cube a total of $N$ times (at least $1000$, but the more the better!)
– hint: you can easily generate all the coordinates you need at once with `runif(6*N)`, then [reshape](https://stackoverflow.com/questions/17752830/r-reshape-a-vector-into-multiple-columns) the result as an $N\times 6$ matrix (one column for each coordinate component, with each row representing a pair of points) and then perform the arithmetic in the next step using vectorized operations on the columns (i.e. using each column all at once) to improve computational efficiency. A minimal sketch of this approach is given after the exercise.
– if you are having difficulties with the above, you can always use the naive approach of running a for loop $N$ times, where in each step of the loop you generate 2 points (6 coordinates total) and then perform the arithmetic described in the next step.
b. Next, compute the standard [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance#Higher_dimensions) between each pair of points and find the mean distance. (Bonus: plot the distribution of these distances!)
c. Calculate your [percentage error](https://www.mathsisfun.com/numbers/percentage-error.html) from the [true value](https://mathworld.wolfram.com/CubeLinePicking.html) of Robbins constant.
d. **Bonus**: can you increase the accuracy of your estimation by using more points? How low can you get the error?
e. **Super bonus**: Repeat the above for another 2D or 3D object of your choice (how about a triangle or a sphere?)
```{r}
# Do exercise 1 here
```
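If you go the vectorized route from the hint in part (a), the whole estimate fits in a few lines. Below is a minimal sketch of one possible approach; the names (`pts`, `dists`), the choice of $N=10^5$ pairs, and the hard-coded reference value (taken from the MathWorld page linked above) are illustrative assumptions, not requirements.

```{r, eval=FALSE}
# minimal sketch: estimate Robbins constant by Monte Carlo
N = 1e5                                # number of point pairs (more pairs = better estimate)

# generate all 6*N coordinates at once, reshaped so each row holds one pair of points:
# columns 1:3 are (x1,y1,z1), columns 4:6 are (x2,y2,z2)
pts = matrix(runif(6*N), ncol=6)

# Euclidean distance between the two points in each row (vectorized over rows)
dists = sqrt((pts[,1]-pts[,4])^2 + (pts[,2]-pts[,5])^2 + (pts[,3]-pts[,6])^2)

est  = mean(dists)                     # Monte Carlo estimate
true = 0.66170718                      # approximate true value, from the MathWorld page
100 * abs(est - true) / true           # percentage error
```

Plotting `dists` (e.g. with `hist()` or `geom_histogram()`) covers the bonus in part (b).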
2. How many heads in a row should you see in $N$ flips of a fair coin?
a. Start by randomly flipping a fair coin ($p=0.5$) a total of $N=10$ times (hint: use either the `rbernoulli` function from `purrr` or `rbinom` with `n=10` and `size=1`) and record how many heads (defined as a value of $1$) in a row you observe (this has been implemented for you in the `longestHeadRun` function below).
b. Repeat the above step $M$ times (at least $1000$ times, but this time, don’t use an extremely large $M$, since we will repeat the previous step for other values of $N$). What is the mean length of the largest run of heads in $10$ flips?
– **NOTE**: $N$ here is the _**size of each experiment**_ (i.e. each experiment consists of $N$ flips), whereas $M$ is _**how many experiments**_ are performed. It is common in Monte Carlo methods to have two types of parameters, one type for the properties of each experiment, and one type that determines how many experiments are done. Increasing $N$ (number of flips in each experiment) will increase the mean-run-length, whereas increasing $M$ (number of experiments) will increase the precision of your mean-run-length estimate for a particular number of flips.
c. Now, repeat the above (you may use the same $M$) for **at least 3** other values of $N$ (again, feel free to do more if you wish!). Display your results in a table.
– **NOTE** this step should be easy if you’ve written your code with good style. I recommend writing a function that does all the above for any given $N$ and $M$ and maybe $p$, e.g. `findMeanRun = function(N,M,p=0.5){...}`. Then, for different values of $N$ and $M$ you can simply change the arguments given to the function, e.g. `findMeanRun(10,1000)` or `findMeanRun(20,1000)`, etc., then put the results in a data frame. A minimal sketch of such a function is given at the end of this exercise.
– **ALSO NOTE** the function syntax above sets `N` and `M` as arguments without default values, but sets `0.5` as the default value of the argument `p`. For a different example, [see this](https://www.javatpoint.com/r-function-default-arguments).
d. Validate your results against other people’s results (for example, [this post](https://math.stackexchange.com/a/1409539)). Are your results consistent with others?
e. **Bonus**: run a few more values of $N$ and plot the results, showing the mean run length vs number of flips $N$. (bonus²: what happens if you increase $M$?)
f. **Super bonus** if you still want MORE: Like [the post referenced above](https://math.stackexchange.com/questions/1409372/what-is-the-expected-length-of-the-largest-run-of-heads-if-we-make-1-000-flips/1409539#1409539), can you fit a smooth curve through the points?
```{r, results='hide'}
# given the output of rbernoulli or rbinom (a vector of 0's and 1's),
# compute the length of the longest continuous run of 1's
longestHeadRun = function(trials){
  with(rle(trials),max(c(0,lengths[values==1])))
}

# demo (output hidden for brevity)
longestHeadRun(c(0,0,0,0,0,0,0,0,0,0,0,0)) # returns 0
longestHeadRun(c(1,0,1,1,0,1,1,1,0,0,0,0)) # returns 3
```
```{r}
# Do exercise 2 here
```
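For part (c), a helper function along the lines of the `findMeanRun` named in the note above keeps the repetition painless. Here is a minimal sketch under that naming assumption; the use of `replicate` and `sapply`, and the particular values of $N$ shown, are just illustrative choices.

```{r, eval=FALSE}
# minimal sketch: N = flips per experiment, M = number of experiments, p = P(heads)
findMeanRun = function(N, M, p=0.5){
  # each experiment: flip the coin N times, record the longest run of heads
  runs = replicate(M, longestHeadRun(rbinom(n=N, size=1, prob=p)))
  mean(runs)
}

findMeanRun(10, 1000)   # mean longest run in N=10 flips, estimated from M=1000 experiments

# a small table of results for several values of N (feel free to choose your own)
Ns = c(10, 20, 50, 100)
data.frame(N = Ns, meanRun = sapply(Ns, findMeanRun, M=1000))
```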
3. Estimating a $t$-distribution with $N-1$ degrees of freedom.
a. Choose an arbitrary $\mu$ and $\sigma>0$ to **use for the rest of the problem** (you may choose the standard normal $N(0,1)$ if you _really_ wish, but where’s the fun in that?).
b. Start by sampling $N=2$ values from the normal distribution with mean $\mu$ and standard deviation $\sigma$ (note this counts as $1$ experiment) and calculate the $t$-statistic of your sample. Recall the $t$-statistic for a sample $X$ is defined as
$$t=\frac{\overline{X}-\mu}{s/\sqrt{N}}~,~~~s=\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i-\overline{X})^2}$$ where $\overline{X}$ is the sample mean and $s$ is the [**sample standard deviation**](https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/variance-standard-deviation-sample/a/population-and-sample-standard-deviation-review)
– **NOTE**: Make sure you’re actually computing the $s$ for this sample, NOT just using $\sigma$ here!
– You can use the built-in `mean( )` and `sd( )`, but if you _really_ want to do a completely manual Monte Carlo, feel free to compute the $t$-statistic yourself.
– **Also NOTE**: Similar to the note in exercise 2.b., $N$ here is the _**size of each experiment**_ and $M$ is _**how many experiments**_ are performed. Increasing $N$ gives a $t$-distribution with a different number of degrees of freedom (namely, $N-1$), whereas increasing $M$ gives a more accurate estimate of the distribution for a particular number of degrees of freedom.
c. Repeat the above step $M$ times (as in exercise 2.b., use at least $1000$, but don’t use an extremely large $M$ since we will repeat this for other values of $N$).
d. You’ve just simulated drawing from a $t$-distribution with $N-1=1$ degree of freedom! Now plot the resultant values in a [density](https://www.r-graph-gallery.com/21-distribution-plot-using-ggplot2) plot.
e. For comparison, plot the theoretical distribution with $1$ degree of freedom ([this page](https://t-redactyl.io/blog/2016/03/creating-plots-in-r-using-ggplot2-part-9-function-plots.html) may be helpful). For best results, overlay this on top of the previous plot, but if you’re having trouble with this, you can also plot them side-by-side.
f. Repeat the above steps for **at least 3** other values of $N$ (for example 3, 6, 11, but feel free to choose your own or choose more than 3!). For each $N$, plot both your simulated distribution and the theoretical distribution.
– **NOTE**: again, like the note in exercise 2.c., this should be easy if you used a function! A minimal sketch is given after this exercise.
g. **Bonus**: What do you notice about these distributions? What do they converge to and why?
```{r}
# Do exercise 3 here
```
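For reference, here is a minimal sketch of how the whole simulation could be organized. The function name `simulateT`, the arbitrary choices $\mu=5$ and $\sigma=2$, and the ggplot2-based overlay are all illustrative assumptions; any equivalent approach is fine.

```{r, eval=FALSE}
library(ggplot2)

# minimal sketch: simulate M t-statistics, each computed from N normal draws
simulateT = function(N, M, mu, sigma){
  replicate(M, {
    x = rnorm(N, mean=mu, sd=sigma)      # one experiment: N draws from N(mu, sigma^2)
    (mean(x) - mu) / (sd(x) / sqrt(N))   # t-statistic, using the SAMPLE sd, not sigma
  })
}

# e.g. N=2 should give a t-distribution with N-1 = 1 degree of freedom
tvals = simulateT(N=2, M=1000, mu=5, sigma=2)   # mu=5, sigma=2 are arbitrary choices

# overlay the simulated density on the theoretical t density with N-1 df
ggplot(data.frame(t=tvals), aes(x=t)) +
  geom_density() +
  stat_function(fun=dt, args=list(df=1), color="red") +
  coord_cartesian(xlim=c(-5,5))   # 1 df has very heavy tails; zoom in for readability
```

Wrapping the plotting step inside the function as well (or looping over a vector of $N$ values) makes part (f) a one-liner.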