MATH 185 – Take-Home Exam 1
Due Sunday, May 5th, by 11:59 PM
AGREEMENT
By taking this exam, you agree not to discuss the exam with anyone, starting now: neither with a classmate nor with anyone else, neither in person nor through other means, including electronic. Please do not post questions on Piazza. Unless otherwise specified, it is acceptable to copy-paste from the lecture or homework solution code.
Problem 1. (Student vs Wilcoxon) Suppose we have a numerical sample of size n which we assume was generated iid from an unknown underlying distribution F with a well-defined mean μ.
• Student’s t-test is a test about the mean: Is μ equal to a given value μ0?
• Wilcoxon’s signed-rank test is a test for symmetry: Is F symmetric about a given μ0?
That being said, the t-test can be used to test whether F is symmetric about μ0, based on the fact that ‘symmetric about μ0’ implies that ‘the mean is equal to μ0’. However, the two are not equivalent, so the t-test is not consistent against all alternatives. Conversely, for the signed-rank test to be useful as a test about the mean, we need to assume that F is symmetric about its mean. With this additional (and nontrivial) assumption on F, testing for symmetry about μ0 is equivalent to testing whether the mean is equal to μ0. (Convince yourself of that.) In what follows, we place ourselves in that situation, so that we can directly compare the two tests. There is some theory on this comparison. For example, it is known that when F is a normal distribution, in which case the t-test achieves the most power asymptotically (meaning in the large-sample limit), the signed-rank test performs almost as well. We want to evaluate that with simulations.
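The implication used above can be checked in one line. A sketch, writing X for a draw from F and assuming only that the mean of X exists:

\[
X - \mu_0 \overset{d}{=} \mu_0 - X
\;\Longrightarrow\;
\mathbb{E}[X] - \mu_0 = \mu_0 - \mathbb{E}[X]
\;\Longrightarrow\;
\mathbb{E}[X] = \mu_0 .
\]

So symmetry about μ0 forces μ = μ0. The converse fails in general: the exponential distribution with rate 1 has mean 1 but is not symmetric about 1. If F is assumed symmetric about its mean μ, the same line shows that μ is the only possible center of symmetry, so F is symmetric about μ0 exactly when μ = μ0.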
Since both tests are scale-free, we may take F to be the normal distribution with mean μ and variance 1. We consider the two-sided setting where we test μ = 0 versus μ ≠ 0. For each n ∈ {10, 20, 50, 100, 200, 500} do the following. For each μ in a grid of your choice, denoted M and of size 10, generate X1,…,Xn ∼ N(μ,1) and apply the t-test and the signed-rank test, both set at level α = 0.10. Record whether they reject or not. Repeat this B = 1,000 times and compute the fraction of times each test rejects. This estimates the power of each test against the alternative μ. The end result is a plot in which the estimated power curves of the two tests are overlaid. Use colors and a legend to identify the two curves. Make sure to choose M so that we can see the power go from about α to about 1, zooming in on the action.
Note. When this problem is completed, you will have generated 6 plots altogether, each with the estimated power curves for the two tests.
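The simulation loop described above can be sketched as follows. This is only an illustration under stated assumptions, not a reference solution. It is written in Python with the standard library alone, so the t critical value is obtained numerically. The signed-rank test uses its large-sample normal approximation without continuity correction, which is an assumption of this sketch. The grid M is taken proportional to 1/√n, which is one possible choice. All function names are hypothetical, and plotting is left out.

```python
import math
import random
from statistics import NormalDist


def t_cdf(x, df, steps=400):
    """Student t CDF at x >= 0: 0.5 plus Simpson's rule on the density over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def dens(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    h = x / steps
    total = dens(0.0) + dens(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * dens(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile of Student's t for 0.5 < p < 1, by bisection on [0, 20]."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_test_rejects(x, t_crit):
    """Two-sided one-sample t-test of mu = 0."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    return abs(mean) / math.sqrt(var / n) > t_crit


def signed_rank_rejects(x, z_crit):
    """Two-sided signed-rank test of symmetry about 0 (normal approximation,
    an assumption of this sketch; ties are ignored since the data are continuous)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: abs(x[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if x[i] > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return abs(w_plus - mean) / sd > z_crit


def default_grid(n, size=10):
    """One possible grid M: mu * sqrt(n) runs from 0 to 4 (a choice, not prescribed)."""
    return [4 * k / ((size - 1) * math.sqrt(n)) for k in range(size)]


def power_curves(n, grid, B=1000, alpha=0.10, seed=0):
    """Return a list of (mu, t-test rejection rate, signed-rank rejection rate)."""
    rng = random.Random(seed)
    t_crit = t_quantile(1 - alpha / 2, n - 1)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rows = []
    for mu in grid:
        t_rej = w_rej = 0
        for _ in range(B):
            x = [rng.gauss(mu, 1.0) for _ in range(n)]
            t_rej += t_test_rejects(x, t_crit)
            w_rej += signed_rank_rejects(x, z_crit)
        rows.append((mu, t_rej / B, w_rej / B))
    return rows
```

Calling power_curves(n, default_grid(n)) for each n in {10, 20, 50, 100, 200, 500} gives the numbers behind the six plots. Scaling the grid by 1/√n keeps the rise from about α to about 1 in view for every sample size, since the power of both tests depends on μ mainly through μ√n.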
Problem 2. (Fungi in brassica plants) Consider the following article about how different brassica plants are affected by different types of Rhizoctonia fungi.1 Read enough of the article to understand the premise and the main findings. Beyond that, we will focus on the data given in Table 6 on how different brassica species are affected by different types of Rhizoctonia fungi.
A. Write a function tableObsExp(dat) taking in a two-column data frame, with each column representing a factor, and outputting a table of observed and expected (under no association) counts, similar to what Table 6 in that article looks like.
B. Enter the observed counts from Table 6 (likely by hand, as the data do not seem directly downloadable) and apply your function to recover a similar table.
C. Continuing with the same dataset, produce a couple of plots using functions from the ggplot2 package.
D. Finally, ask a question and formalize it into a hypothesis testing problem. Perform a test and offer some brief comments.
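The computation behind part A, and one possible test statistic for part D, can be sketched as follows. This is a hedged illustration in standard-library Python, not the expected submission. The names table_obs_exp and pearson_statistic are hypothetical, a list of pairs stands in for the two-column data frame, and the toy data are made up and are NOT the counts from Table 6. Under no association, the expected count of a cell is (row total × column total) / N. Pearson's chi-squared statistic for independence is only one way to formalize a question about association.

```python
from collections import Counter


def table_obs_exp(pairs):
    """Observed and expected counts for a two-factor sample.

    `pairs` is a list of (level_of_factor_1, level_of_factor_2) tuples, a
    plain-Python stand-in for a two-column data frame. Returns
    {row_level: {col_level: (observed, expected)}}.
    """
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    return {
        a: {b: (observed[(a, b)], row_totals[a] * col_totals[b] / n)
            for b in sorted(col_totals)}
        for a in sorted(row_totals)
    }


def pearson_statistic(table):
    """Sum of (O - E)^2 / E over all cells, with degrees of freedom (r-1)(c-1)."""
    stat = sum((o - e) ** 2 / e
               for row in table.values() for o, e in row.values())
    n_rows = len(table)
    n_cols = len(next(iter(table.values())))
    return stat, (n_rows - 1) * (n_cols - 1)


# HYPOTHETICAL toy data, not the counts from Table 6.
toy = ([("species_A", "fungus_1")] * 3 + [("species_A", "fungus_2")] * 1
       + [("species_B", "fungus_1")] * 1 + [("species_B", "fungus_2")] * 3)
toy_table = table_obs_exp(toy)
toy_stat, toy_df = pearson_statistic(toy_table)
print(toy_table)
print(toy_stat, toy_df)
```

To turn the statistic into a test, it would be compared with a chi-squared distribution with the returned degrees of freedom. That approximation is usually considered reliable only when the expected counts are not too small.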
1 The article was published in the scientific journal PLOS ONE and is available online at https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111750