11/17/2019 https://stodden.net/DataScience/HW/HW5_IS457.R
# Do not remove any of the comments. These are marked by #
# HW 5 – Due Monday, Nov 18th, 2019, hardcopy in class.
# Naming: IS457_HW5_YOURCLASSID
#*******************ONLY with your class ID*****************************
#***********************************************************************
#Part 1: Simulation (25 points)
#***********************************************************************
## Let X and Y be two random variables following a normal distribution.
## We will use simulation techniques to find the distribution of X+Y.
##Q1
# As we will generate random numbers, to ensure reproducibility, please set the seed to 457.
## Your code here
##Q2
# Generate 1000 samples from a normal distribution with mean=10, standard deviation=2 as X and
# the other 1000 samples from a normal distribution with mean=5, standard deviation=3 as Y.
# Then find the mean and standard deviation of X+Y.
## Your code here
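## Sketch (an illustration under stated assumptions, not the official solution):
## one possible approach to Q1 and Q2. The names x, y and s are introduced here
## for illustration and do not come from the assignment.
set.seed(457)                        # Q1: fix the seed so results are reproducible
x <- rnorm(1000, mean = 10, sd = 2)  # 1000 draws for X
y <- rnorm(1000, mean = 5, sd = 3)   # 1000 draws for Y, independent of X
s <- x + y
mean(s)  # theory for independent normals: 10 + 5 = 15
sd(s)    # theory: sqrt(2^2 + 3^2) = sqrt(13), roughly 3.61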
##Q3
# Now use the simulation to estimate the distribution of X+Y and create confidence intervals.
##(a)
## Form a set of Xs and Ys by repeating the individual experiment B = 2000 times, where
## each experiment has n = 1000 samples. You may want to write a for loop and create two
## matrices “sample_X” and “sample_Y” (one row per experiment) to save those values.
## Your code here
##(b)
## Calculate the mean of X+Y for each experiment and save it
## to a vector which has a length of B, and plot a histogram of these means.
## Your code here
##(c)
## Now as we have a simulated sampling distribution of X+Y,
## calculate the 95% confidence interval for mean of X+Y (this can be done empirically).
## Your code here
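## Sketch for Q3 (a)-(c), offered as one possible reading rather than the expected answer.
## Assumptions: the matrices are filled in a single call instead of the suggested for loop
## (one row per experiment), and sum_means and ci_95 are names introduced here.
set.seed(457)
B <- 2000   # number of experiments
n <- 1000   # samples per experiment
sample_X <- matrix(rnorm(B * n, mean = 10, sd = 2), nrow = B, ncol = n)
sample_Y <- matrix(rnorm(B * n, mean = 5, sd = 3), nrow = B, ncol = n)
sum_means <- rowMeans(sample_X + sample_Y)   # mean of X+Y in each experiment
hist(sum_means, main = "Simulated means of X+Y", xlab = "mean(X+Y)")
# Empirical 95% interval: the 2.5% and 97.5% quantiles of the simulated means
ci_95 <- quantile(sum_means, probs = c(0.025, 0.975))
ci_95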
#(d)
# In the above example, we have fixed the sample size n and number of experiments B.
# Next, we want to change B and n, and see how the confidence interval will change.
# Please write a function to create confidence intervals for any B and n.
## Your code here
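## Sketch for (d): a hypothetical helper, simulate_ci(), that wraps the percentile
## interval for any B and n. The function name and the `level` argument are
## assumptions made for this illustration.
simulate_ci <- function(B, n, level = 0.95) {
  # One mean of X+Y per experiment, repeated B times
  means <- replicate(B, mean(rnorm(n, mean = 10, sd = 2) + rnorm(n, mean = 5, sd = 3)))
  alpha <- 1 - level
  quantile(means, probs = c(alpha / 2, 1 - alpha / 2))
}
set.seed(457)
simulate_ci(B = 2000, n = 1000)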
##(e)
# Suppose the sample size n varies (100, 200, 300, ..., 1000) (fix B=2000) and
# the number of experiments B varies (1000, 2000, ..., 10000) (fix n=500).
# Plot your confidence intervals to compare the effect of changing the sample size n with
# the effect of changing the number of simulation replications B.
# What do you conclude?
# (Hint: Check function errbar() in Hmisc package for plot)
library(Hmisc)
## Your code here
# fix n, B varies
# fix B, n varies
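## Sketch for (e), assuming the Hmisc package is installed. ci_for() is a compact
## stand-in helper defined here so the snippet runs on its own; all object names
## are illustrative. It may take several seconds to run.
ci_for <- function(B, n) {
  means <- replicate(B, mean(rnorm(n, mean = 10, sd = 2) + rnorm(n, mean = 5, sd = 3)))
  quantile(means, probs = c(0.025, 0.975), names = FALSE)
}
set.seed(457)
n_values <- seq(100, 1000, by = 100)
B_values <- seq(1000, 10000, by = 1000)
ci_by_n <- sapply(n_values, function(n) ci_for(B = 2000, n = n))  # 2 x 10: lower, upper
ci_by_B <- sapply(B_values, function(B) ci_for(B = B, n = 500))
old_par <- par(mfrow = c(1, 2))
Hmisc::errbar(n_values, colMeans(ci_by_n), yplus = ci_by_n[2, ], yminus = ci_by_n[1, ],
              xlab = "n (B = 2000)", ylab = "95% interval for mean of X+Y")
Hmisc::errbar(B_values, colMeans(ci_by_B), yplus = ci_by_B[2, ], yminus = ci_by_B[1, ],
              xlab = "B (n = 500)", ylab = "95% interval for mean of X+Y")
par(old_par)
## What one would expect to see: the interval narrows roughly like 1/sqrt(n) as n grows,
## while increasing B mostly makes the estimated endpoints more stable.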
#******************************************************************************
# Part 2: Dice Simulation (20 pts.)
#******************************************************************************
## First we will try to find out how to compute the probability of simple events using simulation.
## Q1. Create a vector called “die” that contains the numbers 1-6.
# What is the average value you would expect per die roll?
### Your code here
## Q2. Create a function called roll_die that simulates rolling a fair six-sided die.
## It should take only one argument, the number of times to roll the die, and return
## a vector of the results of the rolls.
## Call the function to roll a die 5 times.
### Your code here
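## Sketch for Q1 and Q2 (illustrative only). The argument name n_rolls is an
## assumption; the assignment only fixes the names die and roll_die.
die <- 1:6
mean(die)  # expected value of one fair roll: (1+2+...+6)/6 = 3.5
roll_die <- function(n_rolls) {
  # Sampling with replacement makes every roll independent of the others
  sample(die, size = n_rolls, replace = TRUE)
}
roll_die(5)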
## Q3. Roll five fair dice 100 times, and create a 100-element vector called five_dice_100
## holding the sum of the five dice for each roll. Make a histogram of five_dice_100.
## Set the breakpoints to be all the possible sums of the five dice rolls.
### Your code here
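## Sketch for Q3 (one possible reading). Design choice made here, not stated in the
## assignment: the breaks sit at half-integers (4.5, 5.5, ..., 30.5) so that each
## possible sum from 5 to 30 gets its own bar.
set.seed(457)
five_dice_100 <- replicate(100, sum(sample(1:6, size = 5, replace = TRUE)))
hist(five_dice_100, breaks = seq(4.5, 30.5, by = 1),
     main = "Sum of five dice, 100 rolls", xlab = "Sum")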
## Q4. What if we roll the dice 10000 times? What would the histogram look like?
## Compare the two histograms side by side, that is, two plots on one graphics device.
## Superimpose a density line on each plot to help interpretation.
### Your code here
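## Sketch for Q4 (illustrative). roll_sums() and five_dice_10000 are names assumed
## here; freq = FALSE puts the histogram on the density scale so that the
## kernel density line from density() is comparable to the bars.
set.seed(457)
roll_sums <- function(times) replicate(times, sum(sample(1:6, size = 5, replace = TRUE)))
five_dice_100 <- roll_sums(100)
five_dice_10000 <- roll_sums(10000)
old_par <- par(mfrow = c(1, 2))   # two plots on one device
hist(five_dice_100, breaks = seq(4.5, 30.5, by = 1), freq = FALSE,
     main = "100 rolls", xlab = "Sum of five dice")
lines(density(five_dice_100))
hist(five_dice_10000, breaks = seq(4.5, 30.5, by = 1), freq = FALSE,
     main = "10000 rolls", xlab = "Sum of five dice")
lines(density(five_dice_10000))
par(old_par)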
## Q5. What are some noticeable differences between the two graphs above, and why?
### Your answer here
## Q6. Let’s roll ten fair dice. Calculate the chance that the sum of their results is
## larger than 30 by simulating 100 throws using the replicate() function.
## Hint: compute the proportion of throws with a sum larger than 30.
### Your code here
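## Sketch for Q6 (illustrative; sums_ten is a name assumed here).
set.seed(457)
sums_ten <- replicate(100, sum(sample(1:6, size = 10, replace = TRUE)))
# Taking the mean of a logical vector gives the proportion of TRUE values
mean(sums_ten > 30)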
## Q7. A die is not necessarily fair, in which case the probabilities of the 6 sides differ.
## We will look at a way to simulate unfair dice rolls in R.
## First simulate the rolling of some unfair dice and return logical values indicating
## whether the sum is larger than a threshold value (this is an event).
## Then calculate the proportion of events with a sum larger than the threshold using the logicals above.
## Your function should have 4 parameters: a threshold value, the number of dice, the number of
## events, and the probabilities of the 6 unfair sides. It should return a probability.
### Your code here
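## Sketch for Q7 (one possible design, not the required one). The function name
## unfair_dice_prob and its argument names are assumptions made for illustration.
unfair_dice_prob <- function(threshold, n_dice, n_events, side_probs) {
  stopifnot(length(side_probs) == 6, abs(sum(side_probs) - 1) < 1e-8)
  # One logical per event: did the sum of the n_dice rolls exceed the threshold?
  exceeds <- replicate(
    n_events,
    sum(sample(1:6, size = n_dice, replace = TRUE, prob = side_probs)) > threshold
  )
  mean(exceeds)  # proportion of events above the threshold
}
set.seed(457)
unfair_dice_prob(threshold = 30, n_dice = 10, n_events = 100,
                 side_probs = c(rep(1/7, 5), 2/7))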
## Q8.
## Draw independently from 1 die with probability 2/7 for a six and 1/7 for the others 30 times,
## computing the fraction of those trials whose sum is at least 4;
## Draw independently from 1 die with probability 2/7 for a six and 1/7 for the others 100 times,
## computing the fraction of those trials whose sum is at least 4;
## Draw independently from 2 dice with probability 3/14 for a five or a six, and 1/7 for the others, 100 times,
## computing the fraction of those trials whose sum is at least 7;
## What is your expectation for each trial? Do the results differ from the trials above? Why?
### Your code here
### Your answer here
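## Sketch for Q8: exact probabilities to compare the simulated fractions against.
## This is a direct calculation rather than a simulation, and the object names are
## assumptions. Note that "at least k" means the sum is larger than k - 1, which
## matters if the function from Q7 tests "larger than" a threshold.
p_one <- c(rep(1/7, 5), 2/7)            # one die: a six has probability 2/7
exact_one <- sum(p_one[4:6])            # P(roll >= 4) = 1/7 + 1/7 + 2/7 = 4/7
p_two <- c(rep(1/7, 4), 3/14, 3/14)     # each of two dice: five and six have 3/14
joint <- outer(p_two, p_two)            # joint probabilities of two independent dice
totals <- outer(1:6, 1:6, "+")          # sum for every pair of faces
exact_two <- sum(joint[totals >= 7])    # P(sum >= 7) = 33/49
c(one_die = exact_one, two_dice = exact_two)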
#************************************************************************************
# Part 3: Calculation of pi (15 pts.)
#************************************************************************************
## Q1. First, create a dataframe called polygon_data
## that contains the points necessary to build a polygon.
## The first column includes “0” and 1000 values evenly distributed in [0,1], called x_val;
## The second column includes “0” and 1000 values that are sqrt(1-x^2), called y_val.
### Your code here
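## Sketch for Q1 (illustrative). x_grid is a helper name assumed here. The leading 0
## in each column adds the origin (0, 0), which closes the quarter circle into a polygon.
x_grid <- seq(0, 1, length.out = 1000)
polygon_data <- data.frame(
  x_val = c(0, x_grid),
  y_val = c(0, sqrt(1 - x_grid^2))
)
dim(polygon_data)  # 1001 rows, 2 columns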
## Then, run the code below to create the shaded area for the quarter circle with ggplot2 package.
library(ggplot2) #Load ggplot2
plot_pi <- ggplot() + geom_polygon(data=polygon_data,aes(x=x_val,y=y_val),alpha=0.1) + theme_bw()
par(mfrow=c(1,1))
plot_pi
## Q2. Now, let’s randomly put dots on the unit square (i.e. square with side length of 1).
## Create a dataframe called dot_data that contains 25 random points by declaring dot positions
## with 2 random uniformly distributed values for x and y.
## Hint: The dimension of dot_data is (25,2)
### Your code here
## Q3. Then define them “in” or “out” depending on whether they are within the circle area or not.
## Add a column called in_or_out to dot_data to define if the point is in the polygon;
## indicate being "in" with 1 and "out" with 0.
## Hint: calculate in/out with the Euclidean distance from the origin.
## Print the head of dot_data.
### Your code here
## Run the code below to see the distribution of your points.
plot_pi + geom_point(data=dot_data,aes(x=x_val,y=y_val,color=in_or_out)) + theme(legend.position="none")
## Q4. Estimate the area of the unit circle (pi) using the dots.
## Hints: The ratio of the number of dots within the circle area to the total number of dots
## will give us the approximate ratio of the area of the quarter unit circle to the area of the
## unit square. Note that we are only plotting one quadrant here, assuming the dot distribution
## is the same in all 4 quadrants.
### Your code here
## Q5. You may find your simulated value of pi is not very satisfactory.
## Give a solution to make your simulated value closer to the true value of pi.
## Show code and result.
### Your code here
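## Sketch for Q2-Q5 (one possible approach, not the official solution). estimate_pi()
## is a hypothetical helper introduced here; it rebuilds dot_data internally for a
## given number of dots so the effect of the sample size can be compared.
estimate_pi <- function(n_dots) {
  dot_data <- data.frame(x_val = runif(n_dots), y_val = runif(n_dots))
  # A dot is "in" (1) when its Euclidean distance from the origin is at most 1
  dot_data$in_or_out <- as.integer(sqrt(dot_data$x_val^2 + dot_data$y_val^2) <= 1)
  # Fraction inside approximates (pi / 4) / 1, so multiply by 4
  4 * mean(dot_data$in_or_out)
}
set.seed(457)
estimate_pi(25)    # coarse: only multiples of 4/25 are possible
estimate_pi(1e6)   # many more dots shrink the Monte Carlo error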