Instructions
Assignment 2
Feb 9, 2022
1) Please submit your solutions to this assignment in one PDF file in Brightspace. Only one file will be accepted.


2) You can submit a PDF file more than once. However, only the last submission will be saved. If you want to modify your submitted assignment, that is fine as long as it is before the deadline.
3) Late submissions of the assignment are not going to be marked.
4) Please use R markdown for your assignment, unless you are using another language such as Mathematica, Maple, etc.
5) You can submit handwritten solutions for the mathematical parts of the assignment, but please combine images of your handwritten solutions with the PDF produced with R markdown into one PDF. (See https://imagetopdf.com/ for one way to combine images into a single PDF.) Alternatively, you can insert your images in the R markdown file.
6) You can work in groups of up to 4 members. Only one member of the group should submit the assignment, and the name and student number of each group member should be on the assignment.
7) It is not necessary to submit the questions with your assignment. We only need your answers.
8) Deadline: Before 11:59 pm on Wednesday, Feb 16
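1. (a) Using the eigen-decomposition method, generate 1000 observations from a multivariate normal distribution with mean vector μ = E[X] = (0, 1, 2) and covariance matrix
$$\operatorname{Cov}(X) = \begin{pmatrix} 1 & -.5 & .5 \\ -.5 & 1 & .5 \\ \ast & \ast & \ast \end{pmatrix}$$
(the third row of the matrix is missing in this copy).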
Use the pairs.panels plot to graph an array of scatter plots for each pair of variables. For each
pair of variables, check that the correlations approximately agree with the theoretical parameters.
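The eigen-decomposition method can be sketched as follows: if Σ = PΛP′, then Σ^(1/2) = PΛ^(1/2)P′, and X = ZΣ^(1/2) + μ has the required distribution when the rows of Z are independent standard normal vectors. In this sketch the helper name `rmvn_eigen` is hypothetical, and the matrix `Sigma` is an illustrative positive-definite matrix chosen for the example (an assumption, not necessarily the matrix of the assignment).

```r
# Sketch: multivariate normal generation via the spectral (eigen) decomposition.
# NOTE: Sigma below is an illustrative positive-definite matrix (an assumption);
# substitute the covariance matrix given in the assignment.
rmvn_eigen <- function(n, mu, Sigma) {
  d  <- length(mu)
  ev <- eigen(Sigma, symmetric = TRUE)
  # Symmetric square root: Sigma^(1/2) = P diag(sqrt(lambda)) P'
  root <- ev$vectors %*% diag(sqrt(ev$values), d) %*% t(ev$vectors)
  Z <- matrix(rnorm(n * d), nrow = n, ncol = d)   # iid N(0, 1) entries
  sweep(Z %*% root, 2, mu, "+")                   # add mu[j] to column j
}

set.seed(1)
mu    <- c(0, 1, 2)
Sigma <- matrix(c( 1.0, -0.5,  0.5,
                  -0.5,  1.0, -0.5,
                   0.5, -0.5,  1.0), nrow = 3, byrow = TRUE)
X <- rmvn_eigen(1000, mu, Sigma)

round(colMeans(X), 2)   # compare with mu
round(cor(X), 2)        # compare with cov2cor(Sigma)

# pairs.panels() is in the psych package; fall back to base pairs() without it
if (requireNamespace("psych", quietly = TRUE)) psych::pairs.panels(X) else pairs(X)
```

The sample correlations should be compared with `cov2cor(Sigma)`, since the theoretical correlations equal the covariances only when all variances are 1.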
(b) For each observation x ∈ ℝ³ in your sample from part (a), compute its Mahalanobis distance from the mean, i.e. compute
$$D = (x - \mu)' \Sigma^{-1} (x - \mu),$$
where μ is the vector of means and Σ is the variance-covariance matrix. Give a density histogram of the 1000 Mahalanobis distances, and overlay the pdf of a χ²(3) distribution. What does this plot suggest?
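A minimal sketch of this computation, assuming a sample generated as in part (a). The matrix `Sigma` is again an illustrative positive-definite choice (an assumption), and `stats::mahalanobis()` is used because it returns the quadratic form (x − μ)′Σ⁻¹(x − μ) for each row.

```r
# Sketch: Mahalanobis distances D = (x - mu)' Sigma^{-1} (x - mu) with a
# chi-square(3) density overlay.
# NOTE: the sample is simulated with an illustrative Sigma (an assumption).
set.seed(2)
mu    <- c(0, 1, 2)
Sigma <- matrix(c( 1.0, -0.5,  0.5,
                  -0.5,  1.0, -0.5,
                   0.5, -0.5,  1.0), nrow = 3, byrow = TRUE)
ev   <- eigen(Sigma, symmetric = TRUE)
root <- ev$vectors %*% diag(sqrt(ev$values)) %*% t(ev$vectors)
X    <- sweep(matrix(rnorm(3000), ncol = 3) %*% root, 2, mu, "+")

# One distance per row of X
D <- mahalanobis(X, center = mu, cov = Sigma)

hist(D, freq = FALSE, breaks = 30, main = "Mahalanobis distances", xlab = "D")
curve(dchisq(x, df = 3), add = TRUE, lwd = 2)   # theoretical chi-square(3) pdf

mean(D)   # a chi-square(3) variable has mean 3
```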

2. Let X have a Poisson(μ) distribution, where μ > 0. Its p.m.f. is
$$p(x) = \frac{e^{-\mu} \mu^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
It is often used as a model for counts because of its links with the Poisson process. Sometimes the Poisson distribution is too restrictive, since we must have E[X] = μ = Var[X]. But for many count variables, we observe that Var[X] > μ. This is called overdispersion. One way to model this overdispersion is to use X that has a negative binomial Neg-B(μ, k) distribution with parameters (μ, k), where μ > 0 and k > 0. Its p.m.f. is
$$p(x) = \frac{\Gamma(x+k)}{x!\,\Gamma(k)} \left(\frac{\mu}{\mu+k}\right)^{x} \left(\frac{k}{\mu+k}\right)^{k}, \quad x = 0, 1, 2, \ldots$$
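Note that it can be shown that
$$\lim_{k\to\infty} \frac{\Gamma(x+k)}{x!\,\Gamma(k)} \left(\frac{\mu}{\mu+k}\right)^{x} \left(\frac{k}{\mu+k}\right)^{k} = \frac{e^{-\mu} \mu^{x}}{x!}.$$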
So we can consider the Poisson distribution as a limiting case of a negative binomial distribution.
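The limiting behaviour can be checked numerically. The sketch below compares the two p.m.f.s for increasing k, using `dnbinom(x, size = k, mu = mu)`, which follows the same (μ, k) parameterization. The value μ = 10 and the grid of k values are arbitrary choices for illustration.

```r
# Sketch: numerical check that Neg-B(mu, k) approaches Poisson(mu) as k grows.
mu <- 10
x  <- 0:40
ks <- c(1, 10, 100, 1e4)

# Largest absolute difference between the two p.m.f.s over x = 0, ..., 40
gaps <- sapply(ks, function(k) {
  max(abs(dnbinom(x, size = k, mu = mu) - dpois(x, lambda = mu)))
})
print(data.frame(k = ks, max_gap = gaps))
```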
(a) The negative binomial distribution does not have a closed-form quantile function, and its support is countably infinite. So to implement the inverse transformation technique to generate values from this distribution, we can use a recursive search. Find the factor c(x), a function of x, such that
$$p(x) = c(x)\, p(x-1), \quad x = 1, 2, \ldots$$
Note: Γ(x) = (x − 1)Γ(x − 1) for x > 1.
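One way to obtain the factor, sketched under the assumption that p(x) is the Neg-B(μ, k) p.m.f. given above: divide consecutive probabilities, so that Γ(k) and the factor (k/(μ+k))^k cancel, then apply the gamma recursion to Γ(x+k).

```latex
\begin{aligned}
\frac{p(x)}{p(x-1)}
  &= \frac{\Gamma(x+k)}{\Gamma(x+k-1)} \cdot \frac{(x-1)!}{x!} \cdot \frac{\mu}{\mu+k} \\
  &= \frac{x+k-1}{x} \cdot \frac{\mu}{\mu+k}, \qquad x = 1, 2, \ldots
\end{aligned}
```

The recursion starts from p(0) = (k/(μ+k))^k.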
(b) Using the recursive equation from (a), generate n = 10,000 values from a Neg-B(μ = 10, k = 5) distribution. Produce a bar graph of the observed frequencies and superimpose the expected frequencies.
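A minimal sketch of the inverse transformation with a sequential search, assuming the recursion p(x) = [(x + k − 1)/x]·[μ/(μ + k)]·p(x − 1) with p(0) = (k/(μ + k))^k. The helper name `rnegb_recursive` is hypothetical, and the plotting choices are only one possibility.

```r
# Sketch: inverse-transform sampling from Neg-B(mu, k) by sequential search.
# rnegb_recursive is a hypothetical helper name.
rnegb_recursive <- function(n, mu, k) {
  p0    <- (k / (mu + k))^k          # p(0)
  ratio <- mu / (mu + k)
  out   <- integer(n)
  for (i in seq_len(n)) {
    u   <- runif(1)
    x   <- 0L
    p   <- p0
    cdf <- p
    while (u > cdf) {                # walk up the support until F(x) >= u
      x   <- x + 1L
      p   <- p * (x + k - 1) / x * ratio   # p(x) = c(x) p(x - 1)
      cdf <- cdf + p
    }
    out[i] <- x
  }
  out
}

set.seed(3)
n <- 10000; mu <- 10; k <- 5
x <- rnegb_recursive(n, mu, k)

xs       <- 0:max(x)
observed <- as.vector(table(factor(x, levels = xs)))
expected <- n * dnbinom(xs, size = k, mu = mu)

mids <- barplot(observed, names.arg = xs, xlab = "x", ylab = "Frequency")
points(mids, expected, pch = 19)     # expected frequencies on top of the bars
```

The expected number of loop iterations per draw is about 1 + μ, which is why the cost grows with μ.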
(c) The problem with the above inverse transformation technique is that it becomes more computationally expensive as μ increases. A very interesting (and useful) fact about the negative binomial distribution is that it can be written as a mixture of gamma and Poisson distributions. In other words, it is a compound distribution. Consider a Poisson model with a gamma-distributed mean:
X | λ ∼ Poisson(λ),
and λ ∼ gamma(shape = k, scale = μ/k). Then X ∼ Neg-B(μ, k). Use this result to generate n = 10,000 values from a Neg-B(μ = 10, k = 5) distribution. Produce a bar graph of the observed frequencies and superimpose the expected frequencies. (You can use the functions rgamma and rpois.)
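The compound representation can be sketched in two vectorized steps: draw the gamma-distributed Poisson means, then draw one Poisson value per mean. The helper name `rnegb_mixture` is hypothetical.

```r
# Sketch: Neg-B(mu, k) as a gamma mixture of Poisson distributions.
# rnegb_mixture is a hypothetical helper name.
rnegb_mixture <- function(n, mu, k) {
  lambda <- rgamma(n, shape = k, scale = mu / k)  # random Poisson means, E = mu
  rpois(n, lambda)                                # one Poisson draw per mean
}

set.seed(4)
n <- 10000; mu <- 10; k <- 5
x <- rnegb_mixture(n, mu, k)

xs       <- 0:max(x)
observed <- as.vector(table(factor(x, levels = xs)))
expected <- n * dnbinom(xs, size = k, mu = mu)

mids <- barplot(observed, names.arg = xs, xlab = "x", ylab = "Frequency")
points(mids, expected, pch = 19)   # expected Neg-B frequencies

c(mean = mean(x), var = var(x))    # theory: mean mu, variance mu + mu^2 / k
```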
(d) Use the system.time function in R to compare the computational work of the algorithms in (b) and (c). Try different values of μ = 5, 10, 20, 30. You can keep k = 2. What do you notice as μ increases?
Display the results using a matrix, where μ = (5, 10, 20, 30) is in the first column, the times for the recursive algorithm from (b) are in the second column, and the times for the algorithm using the compound distribution from (c) are in the third column.
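One possible layout for the timing comparison is sketched below. The helper names `rnegb_recursive` and `rnegb_mixture` are hypothetical, the sample size n = 10,000 is an assumption carried over from parts (b) and (c), and the elapsed times depend on the machine.

```r
# Sketch: timing the recursive search against the gamma-Poisson mixture.
# Both helper names are hypothetical; n = 10000 is an assumption.
rnegb_recursive <- function(n, mu, k) {
  p0 <- (k / (mu + k))^k
  ratio <- mu / (mu + k)
  out <- integer(n)
  for (i in seq_len(n)) {
    u <- runif(1); x <- 0L; p <- p0; cdf <- p
    while (u > cdf) {
      x <- x + 1L
      p <- p * (x + k - 1) / x * ratio
      cdf <- cdf + p
    }
    out[i] <- x
  }
  out
}
rnegb_mixture <- function(n, mu, k) {
  rpois(n, rgamma(n, shape = k, scale = mu / k))
}

set.seed(5)
mus <- c(5, 10, 20, 30); k <- 2; n <- 10000

# One row per value of mu: mu, elapsed seconds for (b), elapsed seconds for (c)
times <- t(sapply(mus, function(mu) {
  c(mu        = mu,
    recursive = system.time(rnegb_recursive(n, mu, k))[["elapsed"]],
    mixture   = system.time(rnegb_mixture(n, mu, k))[["elapsed"]])
}))
times
```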

3. Let X be a count variable with mean μ and variance σ². Its index of dispersion is D = σ²/μ. If the count variables have a common Poisson distribution, then D = 1. However, if D > 1, the distribution is over-dispersed compared to the assumption of a common Poisson distribution. R.A. Fisher proposed a test for Poisson homogeneity that D = 1, based on
$$(n-1)\hat{D} = (n-1)\frac{S^{2}}{\bar{X}} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^{2}}{\bar{X}},$$
where S² is the sample variance, and X̄ is the sample mean. We consider a large estimated index of dispersion D̂ as evidence against Poisson homogeneity in favour of over-dispersion. It can be shown that if X1, …, Xn are i.i.d. Poisson(μ), then
$$X^{2} = (n-1)\hat{D} \longrightarrow \chi^{2}(n-1), \quad \text{as } n \to \infty.$$
Let us call the test based on the estimated index of dispersion “The index of dispersion test”. The larger the value of X², the stronger the evidence against homogeneity in favour of overdispersion.
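A minimal sketch of the index of dispersion test, assuming the statistic (n − 1)S²/X̄ is referred to the upper tail of a χ²(n − 1) distribution. The helper name `dispersion_test` is hypothetical, and the two simulated samples are only illustrations.

```r
# Sketch: index of dispersion test with an upper-tail chi-square p-value.
# dispersion_test is a hypothetical helper name.
dispersion_test <- function(x) {
  n    <- length(x)
  stat <- (n - 1) * var(x) / mean(x)   # equals sum((x - mean(x))^2) / mean(x)
  c(statistic = stat,
    p.value   = pchisq(stat, df = n - 1, lower.tail = FALSE))
}

set.seed(6)
dispersion_test(rpois(50, lambda = 3))           # Poisson sample
dispersion_test(rnbinom(50, size = 1, mu = 3))   # over-dispersed sample
```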
Katz (1963) proposed the following test statistic:
$$K = \sqrt{\frac{n}{2}} \left(\frac{S^{2} - \bar{X}}{\bar{X}}\right),$$
which has an approximate N(0,1) distribution for large n. We will call the test based on this statistic the “Katz test”. The larger the value of K, the stronger the evidence against homogeneity in favour of overdispersion (i.e. it is a one-tailed test).
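A minimal sketch of the Katz test, assuming the statistic K = √(n/2)·(S² − X̄)/X̄ with a one-sided p-value from the standard normal upper tail. The helper name `katz_test` is hypothetical.

```r
# Sketch: Katz test, one-tailed, with a standard normal reference distribution.
# katz_test is a hypothetical helper name.
katz_test <- function(x) {
  n    <- length(x)
  stat <- sqrt(n / 2) * (var(x) - mean(x)) / mean(x)
  c(statistic = stat,
    p.value   = pnorm(stat, lower.tail = FALSE))   # large K favours overdispersion
}

set.seed(7)
katz_test(rpois(50, lambda = 3))           # Poisson sample
katz_test(rnbinom(50, size = 1, mu = 3))   # over-dispersed sample
```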
(a) Use a simulation to estimate the empirical size of the index of dispersion test and of the Katz test. Consider the following cases: μ = 1, 3, 5, 10, and n = 10, 20, 50, 100. Discuss your results.
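The size simulation can be sketched as below: generate Poisson samples under the null hypothesis and record how often each test rejects. The nominal level α = 0.05 and the number of replications `nsim` = 2000 are assumptions, as is the choice to count an all-zero sample (where both statistics are undefined) as a non-rejection. The helper name `reject_rates` is hypothetical.

```r
# Sketch: empirical size of the index of dispersion test and the Katz test.
# alpha, nsim, and the handling of all-zero samples are assumptions.
set.seed(8)
alpha <- 0.05
nsim  <- 2000
grid  <- expand.grid(mu = c(1, 3, 5, 10), n = c(10, 20, 50, 100))

reject_rates <- function(mu, n, nsim, alpha) {
  rej <- replicate(nsim, {
    x <- rpois(n, mu)
    m <- mean(x)
    if (m == 0) return(c(FALSE, FALSE))   # statistics undefined: no rejection
    s2 <- var(x)
    c((n - 1) * s2 / m > qchisq(1 - alpha, df = n - 1),    # dispersion test
      sqrt(n / 2) * (s2 - m) / m > qnorm(1 - alpha))       # Katz test
  })
  rowMeans(rej)   # proportion of rejections for each test
}

size <- t(mapply(reject_rates, grid$mu, grid$n,
                 MoreArgs = list(nsim = nsim, alpha = alpha)))
result <- cbind(grid, dispersion = size[, 1], katz = size[, 2])
result
```

Each estimated size has a Monte Carlo standard error of roughly √(0.05 × 0.95 / nsim), which is worth keeping in mind when comparing the estimates with the nominal level.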
(b) We will use a negative-binomial(μ, k) distribution to estimate the power of these tests to identify overdispersion. Use a simulation to estimate the power of the index of dispersion test and of the Katz test. Consider the cases where we keep μ = 2, but we consider different values for the dispersion parameter k = 1, 10, 100, 1000. You can use the function rnbinom to generate values from a negative binomial distribution, where k is the argument size, and μ is the argument mu. Discuss your results.
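A sketch of the power simulation under stated assumptions: the sample size n = 50, the level α = 0.05, and `nsim` = 2000 replications are all choices made for this example, since the question does not fix them. Note that `rnbinom` takes the mean through its `mu` argument.

```r
# Sketch: empirical power against Neg-B(mu = 2, k) alternatives.
# n = 50, alpha = 0.05 and nsim = 2000 are assumptions made for this example.
set.seed(9)
alpha <- 0.05; nsim <- 2000; n <- 50; mu <- 2
ks <- c(1, 10, 100, 1000)

power <- t(sapply(ks, function(k) {
  rej <- replicate(nsim, {
    x <- rnbinom(n, size = k, mu = mu)
    m <- mean(x)
    if (m == 0) return(c(FALSE, FALSE))   # statistics undefined: no rejection
    s2 <- var(x)
    c((n - 1) * s2 / m > qchisq(1 - alpha, df = n - 1),    # dispersion test
      sqrt(n / 2) * (s2 - m) / m > qnorm(1 - alpha))       # Katz test
  })
  rowMeans(rej)
}))
colnames(power) <- c("dispersion", "katz")
result <- cbind(k = ks, power)
result
```

As k grows, the negative binomial approaches the Poisson distribution, so the rejection rates should fall toward the size of each test.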
