STAT 473/873 (Winter 2020) Assignment 2
The assignment is due on Feb. 14 (Friday) at 4:30pm.
Please submit a hard copy of your assignment to the mailbox of W. Jiang at Jeffery 412.
If you leave early for the reading week, please make sure you submit your assignment before you go. Extensions will not be possible because of the midterm exam right after the reading week.
Please print your name and student number clearly on the first page of your assignment. Please clearly indicate whether you are in Stat 473 or Stat 873.
If the use of R or SAS is needed in a problem, please attach the code and raw output at the end of the assignment as evidence of your independent work. Please answer all data analysis problems in sufficient detail: describe the analysis methods and procedures, specify the models, present figures and tables, and summarize and interpret your results. For a data analysis problem, you may lose up to 90% of the marks if you only provide R or SAS output but fail to give descriptions and discussions.
1. (a) For the 2 × k contingency table of Section 2.1.2 (disease status by groups), the following Pearson’s χ2 test statistic is used for a test of H0 : no association between disease and group,

$$P = \sum_{i=1}^{k} \frac{(y_i - n_i\hat{\pi})^2}{n_i\hat{\pi}(1-\hat{\pi})},$$

where $\hat{\pi} = \sum_{i=1}^{k} y_i \big/ \sum_{i=1}^{k} n_i$ is the estimated probability of disease under H0. Show that an equivalent expression for the Pearson’s test statistic is

$$P = \sum_{i=1}^{k} \sum_{j=1}^{2} \frac{(y_{ij} - e_{ij})^2}{e_{ij}},$$

where $y_{i1} = y_i$, $y_{i2} = n_i - y_i$ are observed cell frequencies, and $e_{i1} = n_i\hat{\pi}$, $e_{i2} = n_i(1-\hat{\pi})$ are expected cell frequencies under H0.
(b) The following study and data are from Smoking and the Cancer Controversy by R. A. Fisher published in 1959. Seventy-one pairs of twins were examined with respect to their smoking habits. For each pair, it was ascertained whether they were identical twins or fraternal twins, and whether their smoking habits were alike or unlike. The results are shown in the following table. Analyze the data using the following three approaches.

                   Like habits   Unlike habits   Total
Identical twins         44              9          53
Fraternal twins          9              9          18
i. Test if the probability of like habits is the same for both identical and fraternal twins (at level α = 0.05) using Pearson’s Chi-square test. Clearly set up your null and alternative hypotheses. Calculate the test using the formula in (a), and also conduct the test using R.
ii. Carry out the test using Fisher’s Exact test in R.
Please present and interpret your analysis and results carefully, and clearly state your conclusions.
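As a minimal sketch of the computations in part (b), assuming the counts in Fisher's table (44 of 53 identical pairs and 9 of 18 fraternal pairs with like habits), the R code below evaluates both forms of the statistic from (a) and compares them with `chisq.test` and `fisher.test`. The object names (`y`, `n`, `obs`, `expd`, `P1`, `P2`) are illustrative and not part of the assignment.

```r
# Twin data: y = pairs with like habits, n = pairs per twin type
y <- c(identical = 44, fraternal = 9)
n <- c(identical = 53, fraternal = 18)

# Pooled estimate of P(like habits) under H0
pi_hat <- sum(y) / sum(n)

# First form of Pearson's statistic from (a)
P1 <- sum((y - n * pi_hat)^2 / (n * pi_hat * (1 - pi_hat)))

# Second form: sum over all cells of (observed - expected)^2 / expected
obs  <- cbind(like = y, unlike = n - y)
expd <- cbind(like = n * pi_hat, unlike = n * (1 - pi_hat))
P2 <- sum((obs - expd)^2 / expd)

# p-value from the chi-square distribution with (2 - 1)(2 - 1) = 1 df
p_value <- pchisq(P1, df = 1, lower.tail = FALSE)

# Built-in tests; correct = FALSE switches off Yates' continuity correction
pearson <- chisq.test(obs, correct = FALSE)
exact   <- fisher.test(obs)

c(P1 = P1, P2 = P2, chisq.test = unname(pearson$statistic))
c(by_hand = p_value, chisq.test = pearson$p.value, fisher = exact$p.value)
```

Because one expected count (18 × 18/71 ≈ 4.6) is below 5, R may warn that the chi-square approximation could be inaccurate, which is one reason to report Fisher's exact test alongside it.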
2. A clinical trial was carried out to investigate the effect of a new drug in reducing operative mortality following major abdominal surgery. Patients were randomized to treatment or control groups within each of three categories of surgical risk: high, medium and low. The results were as follows:
                          Surgical Risk
Treatment   Outcome    Low   Medium   High
Control     Dead         3      7       6
            Alive       12      6       3
Treated     Dead         2      3       2
            Alive       10     10       8
Denote by π the probability of death following surgery. Let x1 = 1 for subjects on treatment, with x1 = 0 otherwise. Let x2 = 1 for subjects at medium risk, with x2 = 0 otherwise. Let x3 = 1 for subjects at high risk, with x3 = 0 otherwise. Let x12 = x1x2 and x13 = x1x3 where x12 and x13 are the interaction terms for treatment and risk group covariates.
(a) Express each of the following situations as a logistic regression model for π in terms of the appropriate linear predictor.
Model 1: The log odds ratios associating treatment and outcome are constant across categories of surgical risk.
Model 2: There is no association between treatment and outcome in any category of surgical risk.
For questions (b)–(d), express the probabilities or odds ratios as an analytic function of the appropriate covariates and regression coefficients. No estimation is involved.
(b) Based on Model 1, what is the probability of death following surgery for patients at the low surgical risk who are assigned to treatment? Give the expression only, no model fitting is needed.
(c) Based on Model 1, in the treatment group, what is the odds ratio of death following surgery for high versus low surgical risk patients? Give the expression only, no model fitting is needed.
(d) A model that incorporates the interaction between treatment and surgical risk will be
Model 3:

$$\log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_{12} + \beta_5 x_{13}.$$
Give a clear interpretation for the parameter β5: explain what β5 measures in terms of odds ratios or changes in odds ratios for specific patient subsets.
(e) Analyze the data using R. Briefly describe your models, assess model fit by residual plots, explain your findings and clearly state your conclusions.
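One possible starting point for part (e) is sketched below. It assumes one reading of the table in this problem (control: 3 of 15, 7 of 13 and 6 of 9 deaths; treated: 2 of 12, 3 of 13 and 2 of 10 deaths at low, medium and high risk), and the names `surg`, `m1`, `m2`, `m3` are illustrative. It is a sketch of the model setup, not a complete analysis.

```r
# Grouped data: one row per treatment-by-risk cell
surg <- data.frame(
  treat = rep(c(0, 1), each = 3),  # x1: 1 = treated, 0 = control
  risk  = factor(rep(c("low", "medium", "high"), times = 2),
                 levels = c("low", "medium", "high")),  # low risk is the baseline
  dead  = c(3, 7, 6, 2, 3, 2),
  alive = c(12, 6, 3, 10, 10, 8)
)

# Model 2: risk only (no treatment effect in any risk category)
m2 <- glm(cbind(dead, alive) ~ risk, family = binomial, data = surg)
# Model 1: a common treatment log odds ratio across risk categories
m1 <- glm(cbind(dead, alive) ~ treat + risk, family = binomial, data = surg)
# Model 3: treatment-by-risk interaction (saturated for these six cells)
m3 <- glm(cbind(dead, alive) ~ treat * risk, family = binomial, data = surg)

# Likelihood ratio tests between the nested models
anova(m2, m1, m3, test = "Chisq")

# The expressions asked for in (b) and (c), evaluated at the Model 1 estimates
b <- coef(m1)
p_low_treated  <- plogis(b["(Intercept)"] + b["treat"])  # expit(beta0 + beta1)
or_high_vs_low <- exp(b["riskhigh"])                     # exp(beta3)

# Deviance residuals and fitted probabilities for a residual plot
diag_m1 <- data.frame(fitted = fitted(m1),
                      dev_resid = residuals(m1, type = "deviance"))
diag_m1
```

Calling `plot(diag_m1$fitted, diag_m1$dev_resid)` would draw the residual plot. With only six cells the plot carries limited information, so the residual deviances and the likelihood ratio tests deserve equal attention.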
3. Consider a study of the occurrence of infection following birth by Caesarian section. Let y = 1 denote the occurrence of an infection, and y = 0 denote the absence of an infection. The investigators are interested in learning about risk factors for infection and have identified three potential explanatory variables: i) xi1 = 1 if the Caesarian was planned and xi1 = 0
otherwise. ii) xi2 = 1 if any risk factors were present (such as diabetic mother, obese mother, etc.), and xi2 = 0 otherwise, and iii) xi3 = 1 if any antibiotics were given as a preventative measure, and xi3 = 0 otherwise. The data are summarized in the following table.
                                Caesarian Planned           Caesarian Not Planned
                             Infection   No Infection     Infection   No Infection
Antibiotics
  Risk Factors                   1           17               11           87
  No Risk Factors                0            2                0            0
No Antibiotics
  Risk Factors                  28           30               23            3
  No Risk Factors                8           32                0            9
(a) Fit a logistic regression model containing only the main effect terms for the three explanatory variables. Describe the model (see model descriptions in Problem 1 as examples), report and comment on the fit.
(b) Based on the fitted model for (a), construct a 95% confidence interval for the odds ratio of infection for mothers with planned Caesarian, no risk factors and no antibiotics versus mothers with unplanned Caesarian, risk factors present and no antibiotics prescribed.
(c) Fit a complementary log-log regression model (try "glm(…, family=binomial(link=cloglog), …)") with only the main effect terms of the 3 covariates. Report and comment on the fit. For mothers with unplanned Caesarian, risk factors present, and antibiotics prescribed, estimate the probability of infection, and find a 95% confidence interval for the probability.
(d) Plot deviance residuals versus fitted values for both models in (a) and (c). Which model fits the data better?
Recommendation: When you create the data set, please use Caesarian not planned, no risk factors, and no antibiotics as the baselines for the 3 covariates respectively.
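The following sketch shows one way the data set and the two fits might be set up in R with the recommended baselines. It assumes the cell counts from the table in this problem, and the names `caes`, `fit_logit`, `fit_cll`, `cvec` and `inv_cll` are illustrative. The intervals are Wald intervals built on the linear-predictor scale, which is one common choice among several.

```r
# One row per covariate pattern; 0 is the baseline for each covariate:
# not planned, no risk factors, no antibiotics
caes <- data.frame(
  planned  = c(1, 1, 1, 1, 0, 0, 0, 0),
  risk     = c(1, 0, 1, 0, 1, 0, 1, 0),
  antib    = c(1, 1, 0, 0, 1, 1, 0, 0),
  infect   = c(1, 0, 28, 8, 11, 0, 23, 0),
  noinfect = c(17, 2, 30, 32, 87, 0, 3, 9)
)
# Drop the empty cell (not planned, antibiotics, no risk factors)
caes <- subset(caes, infect + noinfect > 0)

fit_logit <- glm(cbind(infect, noinfect) ~ planned + risk + antib,
                 family = binomial, data = caes)
fit_cll   <- glm(cbind(infect, noinfect) ~ planned + risk + antib,
                 family = binomial(link = "cloglog"), data = caes)

# (b) Both groups have antib = 0, so the log odds ratio is
#     beta_planned - beta_risk; its variance comes from vcov()
cvec  <- c(0, 1, -1, 0)  # order: intercept, planned, risk, antib
est   <- sum(cvec * coef(fit_logit))
se    <- sqrt(drop(t(cvec) %*% vcov(fit_logit) %*% cvec))
or_ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

# (c) Interval on the linear-predictor scale, then back-transformed
new     <- data.frame(planned = 0, risk = 1, antib = 1)
eta     <- predict(fit_cll, newdata = new, type = "link", se.fit = TRUE)
inv_cll <- function(x) 1 - exp(-exp(x))  # inverse of the cloglog link
p_hat   <- inv_cll(eta$fit)
p_ci    <- inv_cll(eta$fit + c(-1, 1) * qnorm(0.975) * eta$se.fit)

# (d) Residual deviances of the two fits (same covariates, different link)
c(logit = deviance(fit_logit), cloglog = deviance(fit_cll))
```

For the plots in (d), `plot(fitted(fit_logit), residuals(fit_logit, type = "deviance"))` and the same call with `fit_cll` give deviance residuals against fitted values.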
4. Null deviance for logistic regression. Let $Y_i$, $i = 1, \ldots, n$, be independent response variables. Each $Y_i \sim \mathrm{Binomial}(m_i, \pi_i)$ counts the number of disease present individuals in sample i of size $m_i$, with probability of disease present being $\pi_i$.
(a) For the saturated model with parameters $(\pi_1, \ldots, \pi_n)$, show that the maximum likelihood estimates of the parameters are $\tilde{\pi}_i = y_i/m_i$, $i = 1, \ldots, n$.
(b) Consider a logistic regression model with only an intercept, that is,

$$\log\frac{\pi_i}{1-\pi_i} = \beta_0.$$

Find the maximum likelihood estimate for $\beta_0$, and estimate $(\pi_1, \ldots, \pi_n)$ based on this model.
(c) For the model in (b), find its deviance statistic (which is called the null deviance in R), and specify its large sample distribution.
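A derivation for this problem can be checked numerically. The sketch below uses hypothetical counts (`y`, `m`) chosen only for illustration and compares the intercept-only `glm` fit with the pooled proportion and with a hand-computed deviance. It is a sanity check under those assumptions and does not replace the algebra the problem asks for.

```r
# Hypothetical grouped data, for illustration only
y <- c(3, 7, 6, 2)     # number with disease present in each sample
m <- c(15, 13, 9, 12)  # sample sizes

# Intercept-only logistic regression
fit0 <- glm(cbind(y, m - y) ~ 1, family = binomial)

# With a single common probability, the natural candidate for the MLE
# is the pooled proportion; compare it with the glm estimate
pi_pooled    <- sum(y) / sum(m)
beta0_closed <- log(pi_pooled / (1 - pi_pooled))

# Deviance: twice the log-likelihood gap between the saturated fit
# (pi_i = y_i / m_i) and the intercept-only fit.
# This form assumes 0 < y_i < m_i for every i.
D <- 2 * sum(y * log(y / (m * pi_pooled)) +
             (m - y) * log((m - y) / (m * (1 - pi_pooled))))

c(glm = unname(coef(fit0)), closed_form = beta0_closed)
c(glm = fit0$null.deviance, by_hand = D)
```

If some y_i equals 0 or m_i, the corresponding term y log(y/·) or (m − y) log((m − y)/·) is taken to be 0, which the vectorized expression above does not handle.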
5. (This problem is for STAT 873 students only.) The following data are from a study of the effects of viruses on chicken eggs. Eggs were injected with various dilutions of a virus and were monitored daily up to day 18 after injection. At the end of the study, the eggs were classified into three groups: i) those that died, ii) those which are alive and deformed, and iii) those which are alive and normal.

Dilution    No. of Eggs   Dead   Alive, Deformed   Alive, Not Deformed
18.8            16          4           1                  11
232.5           19          8           8                   3
3468.0          17         10           6                   1
51680.0         19         17           2                   0

Define “dose” as “log10(dilution)” of the virus and write the data in the form

Dose   No. at risk   Dead     Deformed   Not Deformed
x_1    m_1           y_{11}   y_{12}     y_{13}
…      …             …        …          …
x_i    m_i           y_{i1}   y_{i2}     y_{i3}
…      …             …        …          …
x_n    m_n           y_{n1}   y_{n2}     y_{n3}

Assume that for the ith dose, there are probabilities $\pi_{i1}, \pi_{i2}, \pi_{i3}$ with $\pi_{i1} + \pi_{i2} + \pi_{i3} = 1$ such that the numbers $(Y_{i1}, Y_{i2}, Y_{i3})$ follow the multinomial distribution with parameters $m_i$ and $(\pi_{i1}, \pi_{i2}, \pi_{i3})$. Consider a model which has two parts:

$$\log\frac{\pi_{i1}}{\pi_{i2} + \pi_{i3}} = \alpha_1 + \beta x_i, \qquad \log\frac{\pi_{i2}}{\pi_{i3}} = \alpha_2 + \beta x_i.$$

(a) Explain what each part of the above model is fitting.
(b) Show that the likelihood for the data based on the multinomial distribution and the above model is equivalent to a likelihood based on 2 binomial distributions. Given this, explain how one could fit the model in R using the “glm” function for logistic regression.
(c) Fit the above model to these data using R and comment on the fit of the model, and the significance of any terms in the model. Please interpret the results.
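One way to fit the two-part model with a shared slope in R is to stack the two binomial parts into a single data set, as sketched below. The assignment of count rows to dilutions is an assumption about how the flattened table should be read (each row was matched so that dead + deformed + not deformed equals the number of eggs), and the names `eggs`, `stacked`, `fit` and `fit_sep` are illustrative.

```r
# Counts as read from the egg table; the row order is an assumption
eggs <- data.frame(
  dose     = log10(c(18.8, 232.5, 3468.0, 51680.0)),
  dead     = c(4, 8, 10, 17),
  deformed = c(1, 8, 6, 2),
  normal   = c(11, 3, 1, 0)
)

# Stack the two binomial parts:
#   part 1: dead vs alive, out of all eggs at that dose
#   part 2: deformed vs not deformed, out of the eggs that survived
stacked <- data.frame(
  part    = factor(rep(c("dead_vs_alive", "deformed_vs_normal"),
                       each = nrow(eggs))),
  dose    = rep(eggs$dose, times = 2),
  success = c(eggs$dead, eggs$deformed),
  failure = c(eggs$deformed + eggs$normal, eggs$normal)
)

# Separate intercepts (alpha1, alpha2) and one common slope (beta)
fit <- glm(cbind(success, failure) ~ 0 + part + dose,
           family = binomial, data = stacked)

# Comparison model with a separate slope in each part,
# used here only to examine the common-slope assumption
fit_sep <- glm(cbind(success, failure) ~ 0 + part + part:dose,
               family = binomial, data = stacked)

summary(fit)$coefficients
anova(fit, fit_sep, test = "Chisq")
```

The stacking works because the multinomial likelihood factors into a binomial for death and a conditional binomial for deformity among survivors, so a single `glm` call with a part-specific intercept reproduces the joint fit.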