代写 R algorithm math graph statistic MATH4/68091 Statistical Computing Coursework 1 (Oct 2019)

MATH4/68091 Statistical Computing Coursework 1 (Oct 2019)
To be submitted by 7.00 pm on Sunday 27 October 2019
This coursework is worth 40 marks all together.
Late Submission of Work: Any student’s work that is submitted after the given deadline will be classed as late, unless an extension has already been agreed via mitigating circumstances or a DASS extension. The following rules for the application of penalties for late submission are quoted from the University guidance on late submission document, version 1.3 (dated July 2019).
”Any work submitted at any time within the first 24 hours following the published submission deadline will receive a penalty of 10% of the maximum amount of marks available. Any work submitted at any time between 24 hours and up to 48 hours late will receive a deduction of 20% of the marks available, and so on, at the rate of an additional 10% of available marks deducted per 24 hours, until the assignment is submitted or no marks remain.”
Your submitted solutions should all be in one document. You are strongly advised to produce this document using LaTex, but if you do feel that you have to use Word, then this will not be penalised.
For each part of the three questions you should provide explanations as to how you completed what is required, show your working and also comment on computational results, where applicable.
When you include a plot be sure to give it a title and label the axes correctly.
When you have written or used R code to answer any of the parts, then you should list this R code after the particular written answer to which it applies. This may be the R code for a function you have written and/or code you have used to produce numerical results and plots.
If you use Latex to produce your report, you can include R code and output from R using the verbatim environment. ie. type
\begin{verbatim}
copy and paste lines of text from R here \end{verbatim}
Your file should be submitted through the module site on Blackboard to the Turnitin assessment entitled ”SC Coursework 1” by the above time and date. The work will be marked anonymously on Blackboard so please ensure that your filename is clear but that it does not contain your name and student id number. Similarly, do not include your name and id number in the document itself.
Turnitin will generate a similarity report for your submitted document and indicate matches to other sources, including billions of internet documents (both live and archived), a subscription repository of periodicals, journals and publications, as well as submissions from other students . Please ensure that the document you upload represents your own work and is written in your own words. The Turnitin report will be available for you to see shortly after the due date.
This coursework should hopefully help to reinforce some of the methodology you have been studying, as well as the skills in R you have been developing in the module so far.
1

1. A sequence of n pseudo-randomized observations on the interval [0,1] can be obtained using the linear congruential generator which works as follows:
yi=(ayi−1+c)(modm), ui= yi , i=1,…,n m−1
for suitable choices of the modulus m, the multiplier a, the increment c and the seed 0 ≤ y0 < m. (See Section 1.8 (Week 1) of the course notes.) (i) Write an R function with the five arguments n, m, a, c and initial value y0 to generate a sequence of n random observations on [0,1] which are output from the function in a single vector of length n. Run your function with the parameter values m = 231 − 1, a = 48271, c = 0 and y0 = 7948 to generate a sample of size n = 1000. Please list the first six values u1 , . . . , u6 that you simulate, but do not print out all 1000 values in your report! [3 marks] (ii) Manually produce (ie. write the code to do it yourself, rather than relying on any generic R functions to directly compute the desired results) a probability-probability (P-P) plot (ex- plained below) of your simulated data (u1,...,u1000) to obtain a graphical impression of whether they can be regarded as being a random sample from a U(0,1) distribution. [The general procedure for producing a P-P plot for a sample of data to check whether it looks plausible that it may be drawn from a particular hypothesised distribution is as follows: If we have a random sample which is denoted by x1, . . . , xn then the empirical distribution function of the data is given by Fn(x) = n where the indicator function IA is defined to be equal to 1 if the event A occurs and equal to 0 otherwise. Hence, Fˆn(x) = number of sample observations ≤ x n Let the hypothesised distribution from which the sample is assumed to be drawn from be denoted by F0(x). To obtain the probability plot for your data, construct a scatterplot of the pairs (Fˆn(x1), F0(x1)), . . . , (Fˆn(xn), F0(xn)) and look for how well they agree. The reasoning is that we are comparing the nonparametric Fˆn(x), which can be shown to be an unbiased estimator of the true distribution function of the random sample, with the hypothesised parametric distribution function F0(x). If our ˆ 1􏰄n I(xi≤x) i=1 2 hypothesis is correct, then the two should match well when they are evaluated at each of the sample values.] Can you graphically enhance your probability plot by providing an aid for subjectively judging whether the hypothesis of a U(0,1) distribution is plausible? Calculate an appropriate summary statistic to quantify the strength of agreement in some sense between Fˆn(x) and F0(x). Give your view as to the suitability of your chosen statistic and comment on its numerical value. [Nb. You do not have to write your own code here to manually calculate your choice of summary statistic; you may use a function already available in R.] [7 marks] [10 marks for Q1] 3 2. It is proposed to use the inverse cdf method to simulate random data from the standard double exponential distribution with pdf f(x) = 1e−|x|, −∞ < x < ∞. ie. 2 􏰃 1ex,x<0 f(x) = 2 1e−x, x≥0 2 (i) Find the cdf of the standard double exponential distribution. (ii) Describe in detail the steps of the inverse cdf method to obtain a random sample of size n from f(x). [3 marks] (iii) Write a function to implement the steps you have described in part (ii) which will output a random sample of size n from f(x). Run your function written for part (iii) to generate a random sample of size n = 5000 from f(x). (Note that you can use the R function runif to generate the required random Uniform data here.) [4 marks] (iv) Construct a histogram of the data generated in part(ii), superimpose the pdf f(x) and comment on the goodness-of-fit. [3 marks] [12 marks for Q2] 4 [2 marks] 3. Suppose now that we want to simulate random data from the standard Normal distribution whose pdf is given by 1 −x2 f(x)=√ e 2 , −∞