
Assignment 4
STAT221
To be submitted on Learn by 3pm on Friday 16 October 2020
Where you are asked to do something in R, include ALL of the code used to produce the result in your assignment submission so that it can be reproduced and checked. Do not include any code that is not required (i.e. be concise, just as you would be expected to be when writing essays or any other maths/stats work). Your code should include comments at the main steps to explain what it is doing. Part of the assessment in this assignment is for well-written R code. Use the examples in the lectures and labs as a guide to writing clear and concise programs. When asked to "explain" or discuss a particular result, you are expected to write one or two sentences, e.g. where relevant to explain "why" a result occurred or "how" to do something.
Graphs are expected to have relevant axis labels and titles, but at this stage legends (for when there are multiple things displayed on one graph) are not expected.
This assignment covers three topics:
1. permutation and exact hypothesis tests
2. the power of hypothesis tests
3. bootstrapping.
Q1. Pain reduction for arthritis patients?
A study was conducted to investigate the effect of a new drug treatment for rheumatoid arthritis on pain levels.
24 rheumatoid arthritis patients who attended a local clinic over a one-week period were randomly assigned to either a treatment group (nT = 12) or a control group (nC = 12). At the end of the study, each patient was assessed for their level of pain, with the pain level measured on a scale from 0 (pain-free) to 100 (severe pain).
Does the treatment reduce pain for arthritis patients?
1. Read the data in the file arthritis.csv into R using the function read.csv().
2. Define and state a set of hypotheses to answer the research question (assuming a one-sided alternative).
3. Perform a one-sided t-test. Report the test statistic and the p-value.
4. Perform a one-sided permutation test (with 10,000 permutations).
5. How many unique combinations are possible?
6. Perform an exact test using the unique combinations.
7. Compare the results of these different hypothesis tests.
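As a rough illustration of how the three tests in this question relate to one another, the following sketch runs a one-sided t-test, an approximate permutation test and an exact test on a small made-up data set. The vectors `treatment` and `control` and the helper `diff_means()` are hypothetical: they are not the contents of arthritis.csv, and the groups are deliberately smaller than in the study so that every allocation can be enumerated quickly. The choice of the difference in means as the test statistic is also an assumption.

```r
# Hypothetical pain scores (made-up values, NOT the arthritis.csv data)
treatment <- c(38, 45, 52, 41, 47)
control   <- c(55, 49, 61, 58, 44)

pooled <- c(treatment, control)
n_t <- length(treatment)   # size of the treatment group
n   <- length(pooled)      # total number of patients

# One-sided t-test: H1 is that the treatment mean is lower than the control mean
t_res <- t.test(treatment, control, alternative = "less")

# Test statistic for the resampling tests: mean(treatment) - mean(control),
# where idx holds the positions in `pooled` that are labelled "treatment"
diff_means <- function(idx) mean(pooled[idx]) - mean(pooled[-idx])
observed <- diff_means(seq_len(n_t))

# Approximate permutation test: randomly relabel the pooled scores many times
set.seed(221)
n_perm <- 10000
perm_stats <- replicate(n_perm, diff_means(sample(n, n_t)))
p_perm <- mean(perm_stats <= observed)

# Exact test: enumerate every unique way of choosing the treatment group
combos <- combn(n, n_t)                    # one column per allocation
exact_stats <- apply(combos, 2, diff_means)
p_exact <- mean(exact_stats <= observed)

# The number of enumerated allocations equals the binomial coefficient
n_unique <- choose(n, n_t)
```

With these hypothetical group sizes `combn()` returns `choose(10, 5)` columns. The permutation p-value is a Monte Carlo estimate of the exact p-value, so the two should agree up to simulation error.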

Q2. Sample size estimation
We now wish to plan a new study where we use the data in question 1 as a pilot. Assume that the sample estimates are representative of the true population parameters, and that the pain scores are normally distributed.
For the new study we will fix the type-I error rate at α = 0.01 and we wish to achieve a power of at least 80%.
1. Write down the set of hypotheses for this new study.
2. Estimate the mean and standard deviation of the pain levels in the treatment and control groups, using the data in the file arthritis.csv.
3. For both the t-test and the permutation test (using 1000 permutations), with a total sample size of 24 patients (where nT = nC = 12), calculate the power of the test to reject the null hypothesis. Explain your results.
4. For the t-test only, calculate the power for a range of sample sizes, assuming the same sample size in each of the treatment and the control group. What is the minimum (total) sample size required in order to achieve a power of 80%?
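One possible way to approach a power calculation of this kind is by simulation: generate many data sets under the alternative, test each one, and record the proportion of rejections. The sketch below does this for the one-sided t-test only and cross-checks the result against `power.t.test()`. The planning values `mu_t`, `mu_c` and `sigma` are hypothetical placeholders (not estimates from arthritis.csv), the helper `sim_power()` is illustrative, and a common standard deviation in both groups is assumed for simplicity.

```r
# Hypothetical planning values (NOT estimated from arthritis.csv)
mu_t  <- 45     # assumed mean pain score under treatment
mu_c  <- 55     # assumed mean pain score under control
sigma <- 10     # assumed common standard deviation
alpha <- 0.01   # type-I error rate fixed in the question

# Simulated power of the one-sided t-test with n patients per group
sim_power <- function(n, n_sim = 2000) {
  rejections <- replicate(n_sim, {
    x_t <- rnorm(n, mu_t, sigma)
    x_c <- rnorm(n, mu_c, sigma)
    t.test(x_t, x_c, alternative = "less", var.equal = TRUE)$p.value < alpha
  })
  mean(rejections)   # proportion of simulated studies that reject H0
}

set.seed(221)
power_12 <- sim_power(12)

# Analytic cross-check under the same assumptions
analytic_12 <- power.t.test(n = 12, delta = mu_c - mu_t, sd = sigma,
                            sig.level = alpha, type = "two.sample",
                            alternative = "one.sided")$power

# Smallest per-group sample size that reaches 80% power
n_needed <- ceiling(power.t.test(delta = mu_c - mu_t, sd = sigma,
                                 sig.level = alpha, power = 0.80,
                                 type = "two.sample",
                                 alternative = "one.sided")$n)
total_needed <- 2 * n_needed
```

The power of a permutation test can be simulated in the same way by replacing the `t.test()` call with a permutation p-value computed inside the loop, at the cost of a nested simulation.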
Q3. At the movies
The file movies.csv contains a list of 28 Academy Award-winning movies, their length in minutes (Length) and their income at the box office in millions of US dollars (Box_office).
1. Plot a density histogram for the length of the movies and add a suitable kernel density estimator (by choosing an appropriate kernel function and bandwidth).
2. Calculate a 95% confidence interval for the average movie length based on a large sample approximation.
3. Calculate a 95% confidence interval for the average movie length using a nonparametric bootstrap. Explain why there is or why there is not a difference between the bootstrap confidence interval and the interval found using the large sample approximation.
4. Calculate a 95% confidence interval for the mean and the median of the box office income using a nonparametric bootstrap procedure. Compare the bootstrap distribution of the mean and the median using density histograms.
5. Calculate a 95% confidence interval for the Pearson correlation coefficient between the length and the income of a movie using a nonparametric bootstrap procedure. Plot this bootstrap distribution of the correlation coefficient in a density histogram.
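The nonparametric bootstrap used in this question can be sketched as follows: resample the rows of the data with replacement, recompute the statistic on each resample, and take quantiles of the resulting bootstrap distribution. The data frame below is simulated purely for illustration and is not the contents of movies.csv; only the column names `Length` and `Box_office` come from the question. The percentile interval is one of several bootstrap intervals and is an assumption here.

```r
# Simulated stand-in for movies.csv (28 hypothetical rows, NOT the real data)
set.seed(221)
n <- 28
movies <- data.frame(Length = round(rnorm(n, mean = 130, sd = 25)))
movies$Box_office <- round(exp(rnorm(n, mean = 4, sd = 1)) +
                             0.5 * movies$Length, 1)

B <- 5000                  # number of bootstrap resamples
boot_mean <- numeric(B)    # bootstrap distribution of the mean length
boot_cor  <- numeric(B)    # bootstrap distribution of the correlation

for (b in seq_len(B)) {
  idx <- sample(n, replace = TRUE)   # resample whole rows with replacement
  boot_mean[b] <- mean(movies$Length[idx])
  boot_cor[b]  <- cor(movies$Length[idx], movies$Box_office[idx])
}

# 95% percentile bootstrap intervals
ci_mean_boot <- quantile(boot_mean, c(0.025, 0.975))
ci_cor_boot  <- quantile(boot_cor,  c(0.025, 0.975))

# 95% large-sample (normal approximation) interval for the mean length
ci_mean_norm <- mean(movies$Length) +
  c(-1, 1) * qnorm(0.975) * sd(movies$Length) / sqrt(n)
```

Resampling row indices, rather than each column separately, keeps each movie's length paired with its own income, which is what the correlation needs. A density histogram of a bootstrap distribution can then be drawn with `hist(boot_cor, freq = FALSE)`.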