R语言数据统计代写: SS2864B, Assignment #5

SS2864B, 2018

Assignment #5 due to April 6, 2018

Instructions Submit an electronic version (pdf, words) of your solutions (appropriately annotated with comments, plots, and explanations; notice that neatness counts) to owl. Save all your R codes in one script (or markdown) file with proper comments and submit it as well to owl.

  1. Letf(x)=(cosx)2 for0<x<2π.
    1. (a)  Graph the function f(x).
    2. (b)  Use Monte Carlo integration with uniform[0, 2π] sample size 1,000,000 to find the area under f(x) on the range 0 < x < 2π, and to find a 95% confidence interval for the area.
    3. (c)  Use trigonometry or calculus to find the same area exactly. Did the confidence interval cover the true value?
    4. (d)  Write a function called rcos2 which generate random values from the density f(x)/k,0 < x < 2π, where k is the area found in (c). The function should take a single argument specifying how many samples are required, e.g., rcos2(10) would return a vector of 10 samples from this distribution. Use the rejection method to draw the samples. Plot a histogram based on 1000 samples and overlay with the true density f(x)/k.
    5. (e)  Use your function rcos2 to draw a sample of 1,000,000 samples, and calculate a 95% confidence interval for the mean. Did your confidence interval cover the true value?
  2. In this question, you will do some resampling and show results in graphics. This is related to bootstrap technique. The population distribution is a normal with μ = 5 and σ2 = 4. The statistic is the sample mean. Hence in theory we know exactly what the density function of the sample mean is.
    1. (a)  Simulate a sample, say x, with sample size n=100. Report its mean, sd, min, and max.
    2. (b)  Use R functions sample and replicate to resample x 50000 times with replacement. The statistic is the sample mean and the output is booted.data. Find the mean, sd, min, and max of booted.data.
    3. (c)  Plot the histogram of booted.data. Please double the cells of histogram since the default one is too small. Please plot as a density plot since the theoretical density will be added in the next step. Comment the shape and center of this distribution.
    4. (d)  Plot the histogram of booted.data-mean(x) with twice number of default cells. Please plot as a density plot. Add the theoretical density function of the X ̄ − μ to the histogram with different line type and color. Comment out your findings.
    5. (e)  Repeat the procedures from (a) to (d) two additional times to check consistency.
  3. For an odd number of data x = (x1, . . . , xn), the minimizer of the object function

is the sample median of x.

n f(θ|x)=􏰍|xi −θ|

i=1

1

  1. (a)  Build an object function my.obj with arguments theta and x. The return value is 􏰌ni=1 |xi − theta|. Try to vectorize your codes without any looping.
  2. (b)  Use the R function optimize to find the min value of the object function f(θ|x) for a given data x. Please implement it as an R function with arguments x and interval=(min(x), max(x)). The return value is the theta that minimizes the f. Test your function with the dataset 3, 7, 9, 12, 15, 18, 21.
  3. (c)  Use the R function nlminb to find the min value of the object function f(θ|x) for a given data x. Please implement it as an R function with arguments x and start=mean(x). The return value is the theta that minimizes the f. Test your function with the dataset 3, 7, 9, 12, 15, 18, 21.
  4. (d)  Test your functions from (b) and (c) with the dataset 1, 3, 7, 9, 12, 15, 18, 21. With the function from (b), use three different wider intervals. With the function from (c), use three different start values. What are your findings? For such a dataset, plot f with theta from min(x) and max(x) and conclude your findings.

4. In this question you will use the R function nlminb to fit the sunspot.year with AR(3) time series model. The time series sunspot.year records the yearly numbers of sunspots from 1700 to 1988 (rounded to one digit). Use the following steps to carry out your study.

  1. (a)  Compute the mean, sd, min, max, and median of sunspot.year. Do boxplot and time series plot and comment your findings.
  2. (b)  Since we will use the LS method to fit data with AR(3) model, we need to generate the swift matrix (similar to the question 4 in midterm). Write an R function called shift with arguments x and p=3, where x is a data vector and p is an positive integer. The return object will be a matrix of

     xp xp−1 ··· x1 

     xp+1 xp ··· x2  , ············

xn−1 xn−2 · · · xn−p
where n is the length of x. for, while, or repeat looping is not allowed.

(c) The AR(3) model is defined as
Xt =φ1Xt−1 +φ2Xt−2 +φ3Xt−3 +εt,t=3+1,3+2,…,n.

The parameters φi,i = 1,2,3 can be estimated by many methods but we use the LS method instead. Write the object function my.ls with arguments par, x, and p=3. In the function body, you use the swift function from (b) to generate the required matrix and the return value should be the sum of square errors (for, while, or repeat looping is not allowed). Then use nlminb to find the LS estimator par with x=sunspot.year- mean(sunspot.year). Make sure to choose a proper start point.

(d) Based on the estimator par obtained in (c), compute the predicted values as Xˆt =φˆ1Xt−1 +φˆ2Xt−2 +φˆ3Xt−3,t=3+1,3+2,…,n.

Then do a time series plot on x=sunspot.year-mean(sunspot.year) and add the pre- dicted values into it with different color. Comment your findings.

2