Regression Modelling – Assignment 2
Total of 100 Marks
Due on 11/10/2019 23:59
Question 1 [40 marks]
Marks for parts (a) to (f), in order: [6] [8] [6] [6] [8] [6]
We consider the Study on the Efficacy of Nosocomial Infection Control (SENIC Project), in which data were collected to determine whether infection surveillance and control programs have reduced the rates of nosocomial (hospital-acquired) infection in United States hospitals. This data set consists of a random sample of 113 hospitals selected from the original 338 hospitals surveyed.
Each line of the data set has an identification number and provides information on 11 other variables for a single hospital. The data presented here are for the 1975-76 study period. The 12 variables are: Identification number [Id]; Average length of stay of all patients (in days) [Length]; Average age of patients (in years) [Age]; Average estimated probability of acquiring infection in hospital (in percent) [Risk]; Ratio of number of cultures performed to number of patients without signs or symptoms of hospital-acquired infections, times 100 [Culture]; Ratio of number of X-rays performed to number of patients without signs or symptoms of pneumonia, times 100 [Xray]; Average number of beds in hospital during study period [Beds]; Medical school affiliation (1=Yes, 2=No) [Affiliation]; Geographic region, where: 1=NE, 2=NC, 3=S, 4=W [Region]; Average number of patients in hospital per day during study period [Patients]; Average number of full-time equivalent registered and licensed practical nurses during study period (number of full time plus one half the number of part time) [Nurses]; Percent of 35 potential facilities and services that are provided by the hospital [Facilities].
(a) Fit a multiple linear regression (MLR) model with Risk as the response variable and all other covariates (excluding Id) as predictors. Is the regression model significant?
(b) What are the estimated coefficients of the MLR model in part (a) and the standard errors associated with these coefficients? Interpret the values of these estimated coefficients with regard to model specification.
(c) There is a t-test associated with each of these coefficients. Briefly explain what these tests can and cannot be used for. In your answer, be sure to mention the appropriate hypotheses that can be assessed using these t-tests.
(d) Construct an appropriate test of the hypothesis that Age and Beds are not significant contributors to the model. That is, test β_Age = β_Beds = 0.
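A hypothesis that sets several coefficients to zero at once is commonly assessed with a partial F-test that compares the full model with the reduced model omitting those predictors. The following is a minimal sketch of that comparison, not a worked answer: the data are synthetic placeholders rather than the SENIC data, and the helper names `rss` and `partial_f` are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X (X includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f(X_full, X_reduced, y):
    """F statistic for H0: the coefficients dropped from X_full are all zero."""
    n, p_full = X_full.shape
    q = p_full - X_reduced.shape[1]            # number of coefficients set to zero
    rss_full = rss(X_full, y)
    rss_reduced = rss(X_reduced, y)
    f_stat = ((rss_reduced - rss_full) / q) / (rss_full / (n - p_full))
    return f_stat, q, n - p_full

# Synthetic placeholder data (NOT the SENIC data): 113 rows, intercept + 4 predictors.
rng = np.random.default_rng(0)
n = 113
Z = rng.normal(size=(n, 4))
X_full = np.column_stack([np.ones(n), Z])
y = 1.0 + 0.5 * Z[:, 0] + 0.3 * Z[:, 1] + rng.normal(size=n)

# Drop the last two predictors, mimicking a test of two coefficients at once.
X_reduced = X_full[:, :3]
f_stat, df1, df2 = partial_f(X_full, X_reduced, y)
print(f"F = {f_stat:.3f} on {df1} and {df2} degrees of freedom")
```

Under the null hypothesis and the usual normal-error assumptions, the statistic is compared with an F distribution on `df1` and `df2` degrees of freedom.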
(e) Imagine you are doing this work as a data scientist at a hospital and a (statistically untrained) colleague suggests that a model with coefficients β_Length = 0.25, β_Culture = 0.05, and β_Region = 0.3 may be a better model. How would you fit such a model, and what would be the estimate of the intercept term with these coefficients? What criticisms do you have of this suggested model?
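One possible reading of the suggestion, assuming the model contains only these three predictors with their coefficients held fixed, is a regression with a known offset in which only the intercept is estimated. Under that assumption, minimising the residual sum of squares over the intercept alone gives the mean of the offset-adjusted response:

```latex
\hat{\beta}_0
  = \frac{1}{n}\sum_{i=1}^{n}
    \bigl(\text{Risk}_i - 0.25\,\text{Length}_i - 0.05\,\text{Culture}_i - 0.3\,\text{Region}_i\bigr),
  \qquad n = 113.
```

If other predictors are also kept in the model and estimated freely, the intercept instead comes from regressing the adjusted response on those predictors.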
(f) The Asheville, N.C.-based Mission Hospital is making progress on a 12-story surgery tower that will house more than 400 beds. They would like a prediction of the expected risk of infection within this new extension if Length=11, Age=45, Culture=18, Xray=100, Beds=400, Region=2, Patients=400, Nurses=340, Facilities=52, Affiliation=1.
What do you predict the risk of infection to be? Find a 99% interval for this prediction.
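As a sketch under the usual normal-error assumptions, the two standard 99% intervals around a fitted value are given below. The notation is assumed here rather than taken from the assignment: x_0 is the new covariate vector with a leading 1, p is the number of estimated coefficients, and s^2 is the residual mean square.

```latex
\begin{align*}
\hat{y}_0 &= x_0^{\top}\hat{\beta} \\
\text{mean response:}\quad
  & \hat{y}_0 \pm t_{n-p,\,0.995}\; s\,\sqrt{x_0^{\top}(X^{\top}X)^{-1}x_0} \\
\text{new observation:}\quad
  & \hat{y}_0 \pm t_{n-p,\,0.995}\; s\,\sqrt{1 + x_0^{\top}(X^{\top}X)^{-1}x_0}
\end{align*}
```

The second interval is wider because it also accounts for the error variance of a single new observation. Which of the two is appropriate depends on whether the target is the mean risk or the risk at one new site.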
Dale Roberts – Australian National University
STAT2008/STAT4038/STAT6014/STAT6038 Assignment 2 Page 1 of 2
Last updated: September 27, 2019

Question 2 [60 marks]
Marks for parts (a) to (f), in order: [10] [6] [6] [8] [20] [10]
Company executives from a large packaged foods manufacturer wished to determine which factors influenced the market share of one of its products. Data were collected from a national database (Nielsen) for 36 consecutive months. Each line of the data set has an identification number and provides information on 7 other variables for each month. The variables are: Identification number [Id]; Average monthly market share for product (percent) [Share]; Average monthly price of product (dollars) [Price]; An index of the amount of advertising exposure that the product received [Exposure]; Presence or absence of discount price during period: 1 if discounted, 0 otherwise [Discounted]; Presence or absence of package promotion during period: 1 if promotion present, 0 otherwise [Promoted]; Month [Month]; Year [Year]. The data were collected during September 1999 (Id = 1), October 1999 (Id = 2), …, August 2002 (Id = 36).
(a) Fit a multiple linear regression (MLR) model with Share as the response variable and all other covariates as predictors (excluding Id). Is the regression model significant? Interpret the coefficients for the categorical variables in this model. Does the coefficient for Discounted support the expectation that discounting the price increases market share?
(b) The executives are interested to know if discounting and package promotions have an effect on market share. Conduct a formal test of the hypothesis that
β_Discounted = β_Promoted = 0
using an appropriate ANOVA table. Evaluate the F-statistic and the corresponding p-value.
(c) Assuming that the other variables remain fixed, the company executives would like to know your predicted difference in market share for a month if they discount the price and also promote their product. Base your answer on the model fitted in part (a).
(d) One executive suggests that, in his opinion, discounting the product and promoting the product have a similar effect on market share so the company should pursue the strategy that costs the least. Test whether the coefficients of Discounted and Promoted are the same. Construct an appropriate model to test this hypothesis.
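One common way to test the equality of two coefficients is to fit a reduced model in which the two predictors share a single coefficient, which is the same as regressing on their sum, and to compare it with the full model by an F-test with one numerator degree of freedom. The sketch below illustrates this on synthetic placeholder data, not the market share data. The helper `ols_rss` is hypothetical, and the model is cut down to three predictors for brevity.

```python
import numpy as np

def ols_rss(X, y):
    """Return OLS coefficients and residual sum of squares (X includes the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

# Synthetic placeholder data (NOT the market share data).
rng = np.random.default_rng(1)
n = 36
price = rng.normal(2.5, 0.2, size=n)
discounted = rng.integers(0, 2, size=n).astype(float)
promoted = rng.integers(0, 2, size=n).astype(float)
share = 3.0 - 0.3 * price + 0.4 * discounted + 0.1 * promoted + rng.normal(0, 0.2, size=n)

# Full model: separate coefficients for Discounted and Promoted.
X_full = np.column_stack([np.ones(n), price, discounted, promoted])
# Reduced model under H0: one common coefficient on (Discounted + Promoted).
X_equal = np.column_stack([np.ones(n), price, discounted + promoted])

_, rss_full = ols_rss(X_full, share)
_, rss_equal = ols_rss(X_equal, share)
df_resid = n - X_full.shape[1]
f_stat = (rss_equal - rss_full) / (rss_full / df_resid)   # one restriction, so q = 1
print(f"F = {f_stat:.3f} on 1 and {df_resid} degrees of freedom")
```

An equivalent formulation keeps Discounted + Promoted and adds Promoted on its own; the t-test on that extra term then tests the same hypothesis.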
(e) Produce the appropriate diagnostic plots for the model fitted in part (a) and assess the model assumptions. Produce the relevant influence diagnostics for this model. Which data points appear to be influential in the analysis, and in what sense would you consider them influential? Also, do any points appear to be outliers? If so, to which months do these observations correspond?
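The usual influence diagnostics (leverage, studentized residuals, and Cook's distance) can all be computed from the design matrix and the residuals. The sketch below uses synthetic placeholder data, not the market share data. The function name `influence_measures` is hypothetical, and the 2p/n leverage cutoff is a rule of thumb rather than a requirement of the question.

```python
import numpy as np

def influence_measures(X, y):
    """Leverage, internally studentized residuals and Cook's distance for an OLS fit.

    X must already contain the intercept column.
    """
    n, p = X.shape
    Q, _ = np.linalg.qr(X)                  # reduced QR: hat matrix H = Q Q^T
    leverage = np.sum(Q**2, axis=1)         # diagonal entries h_ii of H
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)            # residual mean square
    student = resid / np.sqrt(s2 * (1.0 - leverage))
    cooks = student**2 * leverage / (p * (1.0 - leverage))
    return leverage, student, cooks

# Synthetic placeholder data (NOT the market share data): 36 rows, intercept + 3 predictors.
rng = np.random.default_rng(2)
n = 36
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, -0.5, 0.3, 0.1]) + rng.normal(scale=0.2, size=n)

leverage, student, cooks = influence_measures(X, y)
high_leverage = np.flatnonzero(leverage > 2 * X.shape[1] / n)   # rule-of-thumb cutoff 2p/n
print("High-leverage rows (0-based):", high_leverage)
print("Largest Cook's distance at row:", int(np.argmax(cooks)))
```

Row indices here are 0-based, so a flagged row k corresponds to Id = k + 1 when the rows are in Id order.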
(f) Refit the model in part (a), after adding all second-order terms involving only the quantitative predictors. Test whether or not all quadratic and interaction terms can be dropped from the regression model. State the alternatives, decision rule, and conclusion.
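For a set of quantitative predictors, the second-order terms are the squares of each predictor and the products of each pair. The sketch below builds them from centred predictors using a hypothetical helper, `second_order_terms`. The demo data are synthetic, and treating Price and Exposure as the quantitative predictors is an assumption made only for illustration.

```python
from itertools import combinations
import numpy as np

def second_order_terms(columns):
    """Return squares and pairwise products of centred predictors.

    `columns` maps a predictor name to a 1-D array. Centring is a common
    choice to reduce collinearity with the first-order terms; it is an
    assumption of this sketch, not a requirement stated in the question.
    """
    centred = {name: np.asarray(x, dtype=float) - np.mean(x) for name, x in columns.items()}
    terms = {}
    for name, x in centred.items():
        terms[f"{name}^2"] = x**2                      # quadratic term
    for (a, xa), (b, xb) in combinations(centred.items(), 2):
        terms[f"{a}:{b}"] = xa * xb                    # interaction term
    return terms

# Synthetic placeholder columns (NOT the market share data).
rng = np.random.default_rng(3)
demo = {"Price": rng.normal(2.5, 0.2, 36), "Exposure": rng.normal(500, 50, 36)}
extra = second_order_terms(demo)
print(list(extra))
```

With k quantitative predictors this yields k squares and k(k-1)/2 interactions. Whether all of them can be dropped is then a joint test of that many coefficients, comparing the augmented model with the first-order model.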