STAT 341: Assignment 3 – Winter 2022
Instructor:
Due: Friday, March 18 at 11:59 pm ET
Instructions
You must upload your solutions to Crowdmark by the deadline, as one PDF file for each part of each question. The response to Question 3 is a video, to be submitted on LEARN by the same due date. Your instructor will NOT accommodate mistakes such as submitting the PDF file for one question under another question. Submissions by email will not be accepted. Your PDF solution files must be generated by R Markdown. Additionally:
• For mathematical questions: your solutions must be typeset in LaTeX (from within R Markdown), unless the question explicitly states that handwritten solutions are accepted, in which case you may import a clear image of your handwritten solution. Without such a note, screenshots and scanned or photographed handwritten solutions receive zero points.
• For computational questions: R code must always be included in your solution (via code chunks in R Markdown). If you do not provide your R code, you will receive zero points for the corresponding question.
• For interpretation questions: plain text (within R Markdown) is required. Text responses embedded as comments within code chunks will not be accepted.
• Alternative accommodations, including but not limited to email submission and/or extensions due to R Markdown failures or problems compiling to PDF, will not be granted.
• The formatting requirement will be taken seriously. Screenshots of your solutions and/or R code, even if the original file is R Markdown-generated, will receive 0 marks. The file you submit to Crowdmark must be compiled directly from R Markdown.
Organization and formatting are part of a full solution. Consequently, points will be deducted for solutions that are disorganized or incomprehensible. A disorganized solution that is difficult to understand, or in which parts are difficult to find, will not receive full marks.
Academic Integrity:
• While you may discuss the questions with your classmates on Piazza, consulting another student’s solution is prohibited, and submitted solutions may not be copied from any source. Apart from Piazza, you may not talk to any other individual about the questions in this assignment. The instructor will hold online office hours during which he will answer clarification questions. You also have access to Piazza, where you can ask questions.
• You may not use or search the internet (except for LEARN and Piazza) to answer the questions in this assignment. However, you may search the internet for R syntax.
• If a question you would like to post on Piazza reveals part of your solution, you must make it a private post.
• In short, you can treat this assignment like an open-book exam, where you are only allowed to use the course material provided to you during lectures and on Piazza and/or LEARN as well as books you may find at the library.
• Any violation of the academic integrity regulations outlined here and in the course syllabus (make sure to read the course outline again!) will be counted as cheating and will be reported to the Dean’s Office.
• The instructor reserves the right to conduct an online interview with you during which you will be asked questions about your solutions and the details of how you arrived at these responses. Should such an interview take place and you are unable to explain and defend your solutions, your grade for this assignment, and consequently your course grade, will be affected.
Question One – 24 Marks
A problem of interest related to potentially deadly diseases is modelling the probability of death due to the disease as a function of the number of infected people in the population. The data-set Infected.csv includes information about an infectious disease in 100 randomly chosen communities/towns/cities in country A. The variables in the data-set are
• Infected: the number of infected individuals per million population
• Deceased.Prop: the proportion of people who have died due to the infectious disease in each community/town/city in the sample
a) [3 marks] Generate the scatter-plot of the data and comment on the pattern that you observe. Imagine that the public does not have access to this plot and/or the data. Communicate your points and important trends through your comments.
b) [4 marks] Non-linear regression is a class of models in which the functional form relating the response to the explanatory variate(s) is a non-linear combination of the model parameters. Since the model is non-linear, numerical methods may be used to fit it. Denoting the proportion of infected people who have died of the infectious disease by Y and the number of infected people per million by X, a non-linear model appropriate for this problem is
$$Y_u = 1 - \frac{1}{\alpha + \beta X_u} + \varepsilon_u,$$
where $\varepsilon_u$ is the error term. We would like to estimate the parameter vector $\theta = (\alpha, \beta)$ using the least squares method, i.e., by minimizing the function
$$\rho(\alpha, \beta) = \sum_{u=1}^{100} \left( y_u - \left( 1 - \frac{1}{\alpha + \beta x_u} \right) \right)^2.$$
Generate the 3D plot of the function ρ(α, β) as well as its heat-map with the contour plot superimposed on the heat-map for α ∈ (0.5, 4) and β ∈ (0.1, 2).
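A minimal R sketch of the objective in part (b), assuming the observed proportions and infection counts are supplied as numeric vectors `y` and `x` (for example, `Deceased.Prop` and `Infected` from Infected.csv). The helper names `rho` and `rho_grid` and the grid size are hypothetical choices for illustration. `outer()` returns a matrix of ρ values over the (α, β) grid, which is the shape that base R's `persp()`, `image()` and `contour()` accept.

```r
# Least-squares objective rho(alpha, beta) for the model
# Y_u = 1 - 1/(alpha + beta * x_u) + error
rho <- function(alpha, beta, x, y) {
  fitted <- 1 - 1 / (alpha + beta * x)
  sum((y - fitted)^2)
}

# Evaluate rho on an (alpha, beta) grid. Vectorize() lets outer() pass
# vectors of alpha and beta values to a function written for scalars.
rho_grid <- function(alphas, betas, x, y) {
  f <- Vectorize(function(a, b) rho(a, b, x, y))
  outer(alphas, betas, f)  # rows index alpha, columns index beta
}

# Possible usage with the real data (hypothetical, left commented out):
# dat    <- read.csv("Infected.csv")
# alphas <- seq(0.5, 4, length.out = 100)
# betas  <- seq(0.1, 2, length.out = 100)
# z <- rho_grid(alphas, betas, dat$Infected, dat$Deceased.Prop)
# persp(alphas, betas, z, theta = 30, phi = 30, xlab = "alpha", ylab = "beta")
# image(alphas, betas, z, xlab = "alpha", ylab = "beta")
# contour(alphas, betas, z, add = TRUE)
```

The same `rho_grid()` call with narrower `alphas` and `betas` ranges could be reused for part (c).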
c) [2 marks] Narrow the ranges of both α and β and regenerate the plots in part (b) so that the neighbourhood of the minimum of the function ρ(α, β) is more visible in the heat-map with the contour plot.
d) [8 marks] Calculate the gradient function of ρ(α, β) (you must provide the mathematical form of the gradient). Then code the gradient in R and use the Newton-Raphson method, with a maximum of 200 iterations, to find $(\hat\alpha, \hat\beta) = \arg\min_{(\alpha, \beta) \in \mathbb{R}^2} \rho(\alpha, \beta)$.
• i) Use the initial values $(\alpha_0, \beta_0) = (2, 3)$.
• ii) Use the initial values $(\alpha_0, \beta_0) = (3, 0.2)$.
• iii) Make an educated guess of the minimum from the plots in part (c) and use it as the initial values $(\alpha_0, \beta_0)$.
If the parameter estimates $(\hat\alpha, \hat\beta)$ differ across the three cases above, evaluate ρ at each set of estimates to choose the most appropriate initial values among the three options.
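A generic Newton-Raphson sketch in R for part (d). Applying Newton-Raphson to the equation ∇ρ(θ) = 0 also requires the Hessian (the Jacobian of the gradient), so this sketch assumes the user supplies both `grad` and `hess` as functions of θ; a numerical approximation of the Hessian would be an alternative. The function name `newton_raphson`, the tolerance `tol`, and the step-size stopping rule are illustrative assumptions. It is checked on a toy quadratic rather than on ρ, whose gradient part (d) asks you to derive.

```r
# Newton-Raphson for minimization: solve grad(theta) = 0 by iterating
# theta_new = theta - H(theta)^{-1} grad(theta).
newton_raphson <- function(theta0, grad, hess, maxit = 200, tol = 1e-8) {
  theta <- theta0
  for (iter in seq_len(maxit)) {
    # solve(H, g) computes H^{-1} g without forming the inverse explicitly
    step <- solve(hess(theta), grad(theta))
    theta <- theta - step
    # Stop when the Newton step becomes negligibly small
    if (sqrt(sum(step^2)) < tol) {
      return(list(estimate = theta, iterations = iter, converged = TRUE))
    }
  }
  list(estimate = theta, iterations = maxit, converged = FALSE)
}

# Toy check: f(a, b) = (a - 1)^2 + 2 * (b + 3)^2 is minimized at (1, -3).
toy_grad <- function(th) c(2 * (th[1] - 1), 4 * (th[2] + 3))
toy_hess <- function(th) matrix(c(2, 0, 0, 4), nrow = 2)
fit <- newton_raphson(c(2, 3), toy_grad, toy_hess)
```

Because Newton-Raphson only looks for a stationary point, it can converge to a saddle point or a local minimum, or diverge, depending on the starting values; this is why comparing ρ at the estimates obtained from different starting values is informative.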
e) [3 marks] The fitted values of the regression are $\hat{y}_u = 1 - 1/(\hat\alpha + \hat\beta x_u)$, and the three sets of initial values in part (d) may or may not have resulted in different parameter estimates $(\hat\alpha, \hat\beta)$, and hence different fitted values.
Generate the plot of the observed $y_u$ versus the fitted $\hat{y}_u$ for each of the three cases in part (d). What do you learn from these plots?
f) [4 marks] Find an appropriate transformation of the data so that you can use the lm function in R to fit the model in part (b) to the original data. Use the lm function to fit this model.
• i) Provide the summary of the fitted model using the summary function.
• ii) Provide the plot of the observed proportions $y_u$ vs. the fitted values $\hat{y}_u$ and comment on the fit of the model. Note that the fitted values from the lm fit must be transformed back to the original scale of the data.
Question Two – 20 Marks
Using the data in Question One, we are interested in estimating the total number of individuals in country A who died because of the infectious disease, using the Horvitz-Thompson estimator. The census data suggest that the population of country A was 12,500,000 when the infectious disease started. In parts (a) and (c) of this question, the Horvitz-Thompson estimate of the attribute, the standard error of the estimate, and a 95% confidence interval for the attribute of interest must be computed. The standard error is the estimate of the standard deviation of the estimator.
a) [7 marks] The data in Question One were collected from a population of 486 communities/towns/cities in country A through a simple random sampling with replacement protocol. Provide an estimate of the total number of individuals who died in country A because of the infectious disease. Provide the standard error of your estimate as well as a 95% confidence interval for the number of individuals who died of the infectious disease. You may assume that the HT estimator follows a Normal distribution. You may also assume that if the value of Infected is the same for two members of the sample, they are the same community/town/city (remember that a sampling with replacement protocol was used).
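A minimal R sketch of the Horvitz-Thompson estimate of a total, its estimated standard error, and a Normal-approximation confidence interval. It assumes you have already computed, for the distinct sampled units, the attribute values `y`, the inclusion probabilities `pi_u`, and the matrix of joint inclusion probabilities `pi_uv` (with `pi_uv[u, u] = pi_u[u]`); deriving these quantities for each sampling design is part of the question and is not shown. The function name `ht_total` is hypothetical. The variance estimator used is the standard unbiased HT form, the double sum over sampled units u, v of ((π_uv − π_u π_v)/π_uv)(y_u/π_u)(y_v/π_v).

```r
# Horvitz-Thompson estimate of a population total from the distinct
# sampled units, with an estimated standard error and a Normal-based CI.
#   y     : attribute values of the distinct sampled units
#   pi_u  : inclusion probabilities of those units
#   pi_uv : matrix of joint inclusion probabilities, diag(pi_uv) = pi_u
ht_total <- function(y, pi_u, pi_uv, level = 0.95) {
  z <- y / pi_u
  estimate <- sum(z)
  delta <- pi_uv - outer(pi_u, pi_u)            # pi_uv - pi_u * pi_v
  variance <- sum((delta / pi_uv) * outer(z, z))
  se <- sqrt(variance)
  q <- qnorm(1 - (1 - level) / 2)               # Normal approximation
  list(estimate = estimate, se = se,
       ci = c(lower = estimate - q * se, upper = estimate + q * se))
}
```

With sampling with replacement, only distinct units enter these sums; under the assumption stated in part (a), repeated rows could be dropped with something like `dat[!duplicated(dat$Infected), ]`. In some designs this variance estimate can be negative, in which case `sqrt()` returns `NaN` with a warning.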
b) [5 marks] A modified version of simple random sampling with replacement is its weighted version, i.e., weighted simple random sampling with replacement (WSRSWR). In this sampling protocol, units are selected with replacement, but instead of with equal probabilities, they are selected with unequal probabilities based on weights $w_u$, where $w_u > 0$ and $\sum_{u=1}^{N} w_u = 1$. The weights are usually constructed using auxiliary information that one might have on each unit. Determine the inclusion probabilities and joint inclusion probabilities for WSRSWR.
c) [5 marks] Now, suppose that the 100 communities in the data-set were collected based on a WSRSWR protocol, where the weight of each unit is the relative size of the community, i.e., $w_u = N_u/N$, where $N_u$ is the population of the community and $N$ is the population of the country. The weights are provided below and are NOT to be used in parts (a) and (b) of this question. Assuming the WSRSWR sampling protocol, answer the questions in part (a).
w = c(0.0027, 0.016, 0.0069, 0.0011, 0.0066, 0.0108, 0.003, 0.0043,
0.0142, 0.0016, 0.0122, 4e-04, 0.0047, 0.014, 0.0086, 0.0169,
0.0165, 0.0118, 0.0043, 2e-04, 0.0142, 0.0092, 0.0162, 0.0106,
0.0588, 0.0135, 0.0025, 0.0011, 0.0109, 0.0085, 0.0027, 0.0112,
0.0127, 5e-04, 0.0082, 0.0085, 0.0066, 0.0125, 0.012, 0.0116,
0.0089, 0.016, 0.0108, 9e-04, 0.0088, 0.0066, 0.0588, 0.0044,
8e-04, 0.002, 0.01, 0.0101, 0.0012, 0.0135, 0.0103, 0.0058,
0.004, 0.0088, 0.0057, 0.0049, 0.0111, 0.0117, 0.0081, 0.014,
0.0079, 0.0134, 0.0149, 0.0042, 0.0109, 0.0072, 0.0109, 0.0082,
5e-04, 0.002, 0.0588, 0.0025, 0.0018, 0.0105, 0.015, 0.0148,
0.0042, 0.0025, 0.0061, 0.0111, 0.017, 0.015, 0.0056, 0.0011,
0.0072, 0.007, 0.015, 0.0081, 0.016, 0.0057, 0.0029, 0.0012,
0.0588, 8e-04, 2e-04, 0.005)
d) [3 marks] Based on the results in (a), (b), and (c), can you decide which sampling protocol (the original one in this question vs. the WSRSWR of parts (b) and (c)) is better? Why?
Question Three – 6 Marks
Create a short video (maximum 2 minutes) explaining the concept and the mechanism/logic behind a test of significance. You can create a presentation file from R Markdown (some instructions at http://rmarkdown.rstudio.com/lesson-11.html) or use any other software/package such as HTML, Beamer, LaTeX, or PowerPoint to make a presentation. While having a presentation file is not mandatory, it is highly encouraged and recommended, as it will help you communicate your ideas more clearly and effectively. You may even include an example in your presentation file. To submit your video file, a separate video assignment has been created on LEARN/Bongo. You will be required to turn on your camera and share your screen. You may record your response as many times as you want and submit only once you are happy with your recording. To avoid submission problems and lateness penalties, do not start recording just a few minutes before the deadline.
Important Notes:
• To record and submit your video, go to LEARN > Content > Assignments > Assignment #3 > Video Classroom – Question 3.
• You will need to turn on both your camera and microphone. Also, you will be required to show your WatCard at the beginning of your recording. Receiving any marks for this question is contingent on showing your WatCard (clear and NOT blurry) at the very beginning of the recording.
• If your computer does not allow you to share your screen, you need to change the security settings on your computer.
• Once you click submit, you cannot retake the assignment any more. All your retakes have to be done prior to submission.
• You must click on submit once you are done with your recording. Failure to do so will result in no submission, hence a 0% mark on the question.
• Additional help about completing this type of assignment on LEARN/Bongo is available at this link: https://bongolearn.zendesk.com/hc/en-us/articles/360005037594-How-to-Complete-Q-A
Note: Comments for Marking
• Clear explanations and use of proper language within the context of the problem: [2 marks]
• Making correct statements and appropriateness of the material: [2 marks] (this includes any potential
• Not reading from notes, and talking about the subject in a “presentation” manner: [2 marks]
• If the recording cuts off before you are done, i.e., if you are cut off mid-sentence or in the middle of your presentation, 2 additional marks will be deducted (the minimum grade for this question is 0, not negative!)