The University of Nottingham SCHOOL OF MATHEMATICAL SCIENCES
A LEVEL 3 MODULE, SPRING SEMESTER 2019-2020
APPLIED STATISTICAL MODELLING
Suggested time to complete: TWO Hours THIRTY Minutes
Answer ALL questions
Your solutions should be written on white paper using dark ink (not pencil), on a tablet, or typeset. Do not write close to the margins. Your solutions should include complete explanations and all intermediate derivations. Your solutions should be based on the material covered in the module and its prerequisites only. Any notation used should be consistent with that in the Lecture Notes.
Guidance on the Alternative Assessment Arrangements can be found on the Faculty of Science Moodle page: https://moodle.nottingham.ac.uk/course/view.php?id=99154#section-2
Submit your answers as a single PDF with each page in the correct orientation, to the appropriate dropbox on the module’s Moodle page. Use the standard naming convention for your document: [StudentID]_[ModuleCode].pdf. Please check the box indicated on Moodle to confirm that you have read and understood the statement on academic integrity: https://moodle.nottingham.ac.uk/pluginfile.php/6288943/mod_tabbedcontent/tabcontent/8496/FoS%20Statement%20on%20Academic%20Integrity.pdf
A scan of handwritten notes is completely acceptable. Make sure your PDF is easily readable and does not require magnification. Text which is not in focus or is not legible for any other reason will be ignored. If your scan is larger than 20Mb, please see if it can easily be reduced in size (e.g. scan in black & white, use a lower dpi — but not so low that readability is compromised).
Staff are not permitted to answer assessment or teaching queries during the assessment period. If you spot what you think may be an error on the exam paper, note this in your submission but answer the question as written. Where necessary, minor clarifications or general guidance may be posted on Moodle for all students to access.
Students with approved accommodations are permitted an extension of 3 days.
The standard University of Nottingham penalty of 5% deduction per working day will apply to any late submission.
MATH3029-E1

Academic Integrity in Alternative Assessments
The alternative assessment tasks for summer 2020 are to replace exams that would have assessed your individual performance. You will work remotely on your alternative assessment tasks and they will all be undertaken in “open book” conditions. Work submitted for assessment should be entirely your own work. You must not collude with others or employ the services of others to work on your assessment. As with all assessments, you also need to avoid plagiarism. Plagiarism, collusion and false authorship are all examples of academic misconduct. They are defined in the University Academic Misconduct Policy at: https://www.nottingham.ac.uk/academicservices/qualitymanual/assessmentandawards/academic-misconduct.aspx
Plagiarism: representing another person’s work or ideas as your own. You could do this by failing to correctly acknowledge others’ ideas and work as sources of information in an assignment or neglecting to use quotation marks. This also applies to the use of graphical material, calculations etc. in that plagiarism is not limited to text-based sources. There is further guidance about avoiding plagiarism on the University of Nottingham website.
False Authorship: where you are not the author of the work you submit. This may include submitting the work of another student or submitting work that has been produced (in whole or in part) by a third party such as through an essay mill website. As it is the authorship of an assignment that is contested, there is no requirement to prove that the assignment has been purchased for this to be classed as false authorship.
Collusion: cooperation in order to gain an unpermitted advantage. This may occur where you have consciously collaborated on a piece of work, in part or whole, and passed it off as your own individual effort or where you authorise another student to use your work, in part or whole, and to submit it as their own. Note that working with one or more other students to plan your assignment would be classed as collusion, even if you go on to complete your assignment independently after this preparatory work. Allowing someone else to copy your work and submit it as their own is also a form of collusion.
Statement of Academic Integrity
By submitting a piece of work for assessment you are agreeing to the following statements:
1. I confirm that I have read and understood the definitions of plagiarism, false authorship and collusion.
2. I confirm that this assessment is my own work and is not copied from any other person’s work (published or unpublished).
3. I confirm that I have not worked with others to complete this work.
4. I understand that plagiarism, false authorship, and collusion are academic offences and I may be referred to the Academic Misconduct Committee if plagiarism, false authorship or collusion is suspected.
Submission instructions
• Release and submission times are with respect to British Summer Time (BST). Please plan accordingly.
• Please take time to write clearly and neatly. This is especially important since you will be handing in scanned documents. If I can’t read your writing clearly, I will not be able to mark appropriately.
• In accordance with University guidelines for this assessment, please write your name and student ID on the first page of your submitted document.
• It is your responsibility to ensure that the requirements for a valid submission on Moodle are met (e.g. file size; invalidity of ‘draft’ submissions). Please try to submit ahead of time to avoid complications close to the deadline.
1. (a) Consider the one-way ANOVA model
$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i = 1, 2, 3; \; j = 1, 2.$
Assume that the $\epsilon_{ij}$ are IID Normal random variables with $E(\epsilon_{ij}) = 0$ and $\mathrm{Var}(\epsilon_{ij}) = \sigma^2 > 0$ for all $i, j$.
i) Suppose the model is used to determine the efficacy of three drugs A, B and C on the cholesterol levels of patients. Interpret, within this context, each term in the model above and the corresponding assumptions.
ii) For the model above, construct the corresponding design matrix $\boldsymbol{X}$, the vector of responses $\boldsymbol{y}$, the vector of regression coefficients $\boldsymbol{\beta}$, and the error vector $\boldsymbol{\epsilon}$. Justify why the least squares estimator $(\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T \boldsymbol{y}$ of $\boldsymbol{\beta}$ cannot be computed without further constraints.
[15 marks]
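The rank deficiency that part ii) asks about can be checked numerically. The sketch below is illustrative only. It assumes the parameter ordering $\boldsymbol{\beta} = (\mu, \alpha_1, \alpha_2, \alpha_3)^T$ and the observation ordering $y_{11}, y_{12}, y_{21}, \ldots, y_{32}$; the question fixes neither.

```python
# Design matrix for y_ij = mu + alpha_i + eps_ij with 3 groups and 2 replicates.
# Column order (an assumption): intercept, then one indicator per group.
GROUPS, REPS = 3, 2

X = [
    [1] + [1 if k == i else 0 for k in range(GROUPS)]
    for i in range(GROUPS)
    for _ in range(REPS)
]

# The intercept column equals the sum of the three indicator columns,
# so the columns of X are linearly dependent.
intercept_is_sum = all(row[0] == sum(row[1:]) for row in X)

# Form X^T X and apply it to v = (1, -1, -1, -1): a zero result means
# X^T X has a non-trivial null space and therefore no inverse.
p = GROUPS + 1
XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
v = [1, -1, -1, -1]
XtX_v = [sum(XtX[a][b] * v[b] for b in range(p)) for a in range(p)]

print(intercept_is_sum)
print(XtX)
print(XtX_v)
```

A constraint such as $\alpha_1 = 0$ or $\sum_i \alpha_i = 0$ removes one column's worth of redundancy, which is the kind of "further constraint" the question refers to.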
(b) A farmer wanted to compare four types of wheat to find which gives greatest yield. Since he suspected growing conditions might vary across his field, he divided the field into four plots and performed experiments which led to the following data on yield (in tonnes).
         Wheat 1  Wheat 2  Wheat 3  Wheat 4    Sum
Plot 1       6.5      6.6      6.3      5.9   25.3
Plot 2       7.2      6.4      6.4      6.2   26.2
Plot 3       6.3      6.1      5.9      5.9   24.2
Plot 4       6.4      6.4      6.3      6.1   25.2
Sum         26.4     25.5     24.9     24.1  100.9
Note: $\sum_{i=1}^{4} \sum_{j=1}^{4} y_{ij}^2 = 637.85$.
i) What type of design has been used by the farmer?
ii) Explain how you would ensure this design is randomised.
iii) Write down an appropriate model for this experiment, clearly defining your notation and explaining any assumptions you make.
iv) Calculate the ANOVA table for this data.
v) Test for the significance of wheat type and comment on your findings.
[25 marks]
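The sums of squares needed in parts iv) and v) can be cross-checked with a short script. This is a minimal sketch, not a model answer. It assumes the rows of the data are plots (blocks) and the columns are wheat types, which is how the flattened table appears to read, and the function name `block_anova` is made up for illustration.

```python
# Yields in tonnes. Rows = plots (blocks), columns = wheat types.
# This orientation is an assumption about the original table layout.
yields = [
    [6.5, 6.6, 6.3, 5.9],
    [7.2, 6.4, 6.4, 6.2],
    [6.3, 6.1, 5.9, 5.9],
    [6.4, 6.4, 6.3, 6.1],
]


def block_anova(y):
    """Return {source: (SS, df, MS, F)} for a two-way layout without replication."""
    n_blocks, n_treat = len(y), len(y[0])
    n = n_blocks * n_treat
    grand_total = sum(sum(row) for row in y)
    correction = grand_total ** 2 / n  # (grand total)^2 / n

    ss_total = sum(v * v for row in y for v in row) - correction
    ss_blocks = sum(sum(row) ** 2 for row in y) / n_treat - correction
    col_totals = [sum(row[j] for row in y) for j in range(n_treat)]
    ss_treat = sum(t ** 2 for t in col_totals) / n_blocks - correction
    ss_resid = ss_total - ss_blocks - ss_treat

    df_blocks, df_treat = n_blocks - 1, n_treat - 1
    df_resid = df_blocks * df_treat
    ms_blocks = ss_blocks / df_blocks
    ms_treat = ss_treat / df_treat
    ms_resid = ss_resid / df_resid

    return {
        "wheat": (ss_treat, df_treat, ms_treat, ms_treat / ms_resid),
        "plot": (ss_blocks, df_blocks, ms_blocks, ms_blocks / ms_resid),
        "residual": (ss_resid, df_resid, ms_resid, None),
        "total": (ss_total, n - 1, None, None),
    }


table = block_anova(yields)
for source, (ss, df, ms, f) in table.items():
    ms_txt = "" if ms is None else f"{ms:.4f}"
    f_txt = "" if f is None else f"{f:.2f}"
    print(f"{source:<9}{ss:>8.4f}{df:>4}{ms_txt:>9}{f_txt:>7}")
```

The F statistic for wheat would then be compared with the upper 5% point of the $F_{3,9}$ distribution, which standard tables give as roughly 3.86.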
2. (a) Show that the pdf of a normal distribution with mean $\mu \in \mathbb{R}$ and variance 1 belongs to the one-parameter GLM family. Clearly identify $\theta$, $b(\cdot)$, $c(\cdot, \cdot)$, $\phi$ and $a(\cdot)$.
[5 marks]
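(b) Suppose $Y_i$, $i = 1, \ldots, n$, are IID $N(0, 1)$ random variables. Denote by $\phi$ and $\Phi$ their pdf and cdf (cumulative distribution function), respectively. For real numbers $t_i$, define $Z_i = 1$ if $Y_i \le t_i$ and $Z_i = 0$ otherwise.
i) For fixed $t_i$, write down the joint distribution of the $Z_i$.
ii) Consider $t_i = \beta_1 + \beta_2 x_i$ with $i = 1, \ldots, n$, where the $x_i$ are real-valued. Using the $Z_i$, write down the log-likelihood function $l(\beta_1, \beta_2)$. Also show that the score statistic $\boldsymbol{U} = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}$, where $U_i = \partial l / \partial \beta_i$, $i = 1, 2$, is:
$U_1 = \sum_{i=1}^{n} \left[ \frac{Z_i \, \phi(\beta_1 + \beta_2 x_i)}{\Phi(\beta_1 + \beta_2 x_i)} - \frac{(1 - Z_i) \, \phi(\beta_1 + \beta_2 x_i)}{1 - \Phi(\beta_1 + \beta_2 x_i)} \right]$
$U_2 = \sum_{i=1}^{n} \left[ \frac{Z_i x_i \, \phi(\beta_1 + \beta_2 x_i)}{\Phi(\beta_1 + \beta_2 x_i)} - \frac{(1 - Z_i) x_i \, \phi(\beta_1 + \beta_2 x_i)}{1 - \Phi(\beta_1 + \beta_2 x_i)} \right]$
iii) Verify that $E(\boldsymbol{U}) = \boldsymbol{0}$.
iv) Why is $\Phi^{-1} : [0, 1] \to \mathbb{R}$ a valid link function for linking $E(Z_i)$ with $x_i$?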
[20 marks]
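The score expressions in part (b) ii) can be sanity-checked numerically before attempting the algebra. The sketch below is a minimal illustration under stated assumptions: the covariates `x`, the indicators `z` and the parameter values are made-up numbers, and the function names are hypothetical. It compares the closed-form score with a central finite difference of the log-likelihood, and evaluates the score with each $Z_i$ replaced by $E(Z_i) = \Phi(t_i)$ as a numerical counterpart of part iii).

```python
import math


def phi(t):
    """Standard normal pdf."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard normal cdf, written via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def log_lik(b1, b2, x, z):
    """Bernoulli log-likelihood with P(Z_i = 1) = Phi(b1 + b2 * x_i)."""
    total = 0.0
    for xi, zi in zip(x, z):
        p = Phi(b1 + b2 * xi)
        total += zi * math.log(p) + (1 - zi) * math.log(1.0 - p)
    return total


def score(b1, b2, x, z):
    """Closed-form score (U1, U2) as stated in the question."""
    u1 = u2 = 0.0
    for xi, zi in zip(x, z):
        t = b1 + b2 * xi
        term = zi * phi(t) / Phi(t) - (1 - zi) * phi(t) / (1.0 - Phi(t))
        u1 += term
        u2 += xi * term
    return u1, u2


# Made-up data and parameter values, for illustration only.
x = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
z = [0, 1, 0, 1, 1, 1]
b1, b2 = 0.2, 0.7

# Central finite differences of the log-likelihood.
h = 1e-6
fd1 = (log_lik(b1 + h, b2, x, z) - log_lik(b1 - h, b2, x, z)) / (2 * h)
fd2 = (log_lik(b1, b2 + h, x, z) - log_lik(b1, b2 - h, x, z)) / (2 * h)
u1, u2 = score(b1, b2, x, z)

# The score is linear in Z_i, so E(U) is obtained by plugging in E(Z_i).
expected_u = score(b1, b2, x, [Phi(b1 + b2 * xi) for xi in x])

print(u1, fd1)
print(u2, fd2)
print(expected_u)
```

Agreement between the two columns of output supports the formula but does not replace the derivation the question asks for.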
(c) In a study examining the relationship between Alzheimer’s disease (yes=1 and no=0) and Age in 98 people, a binary logistic regression model was used. Output from R is given below.
i) Using Output1: (1) interpret, in the context of the problem, the estimate of the Age parameter, and (2) explain the values obtained for the degrees of freedom.
ii) Using Output1 explain, using the GLM form of a Bernoulli distribution, the statement: ‘Dispersion parameter for binomial family taken to be 1’.
iii) Information on economic status (‘Lower’, ‘Middle’, ‘Higher’) of each person was added to the model containing Age. Using Output1 and Output2 perform a Deviance test to ascertain if economic status has a significant relationship with the chances of being diagnosed with Alzheimer’s.
iv) Using Output2 predict the probability of being diagnosed with Alzheimer’s for a person aged 48 and classified as having a ‘Lower’ economic status.
[15 marks]
Output 1:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.62437    0.40575  -4.003 6.25e-05 ***
Age          0.03183    0.01204   2.644  0.00819 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 114.91 on 96 degrees of freedom
Output 2:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.49037    0.52223  -2.854  0.00432 **
Age          0.03127    0.01247   2.507  0.01216 *
Lower       -0.70309    0.56145  -1.252  0.21047
Middle       0.37988    0.55692   0.682  0.49517
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 111.50 on 94 degrees of freedom
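The arithmetic behind parts (c) iii) and iv) can be sketched as follows, using only the deviances and estimates printed in the two outputs. Two assumptions are made here and are not stated in the paper: that ‘Higher’ is the reference level of economic status, so that ‘Lower’ and ‘Middle’ enter as 0/1 indicators, and that the usual chi-squared approximation to the deviance difference applies.

```python
import math

# Residual deviances and degrees of freedom, copied from Output 1 and Output 2.
dev_age_only, df_age_only = 114.91, 96
dev_with_status, df_with_status = 111.50, 94

dev_diff = dev_age_only - dev_with_status
df_diff = df_age_only - df_with_status

# For a chi-squared variable with exactly 2 degrees of freedom the upper tail
# probability is exp(-x / 2); this shortcut is NOT valid for other df.
if df_diff != 2:
    raise ValueError("closed-form tail probability only holds for 2 df")
p_value = math.exp(-dev_diff / 2)

# Estimates from Output 2. 'Higher' is ASSUMED to be the baseline level.
intercept, b_age, b_lower = -1.49037, 0.03127, -0.70309
age = 48
eta = intercept + b_age * age + b_lower * 1  # linear predictor, 'Lower' status
prob = 1.0 / (1.0 + math.exp(-eta))          # inverse logit

print(f"deviance difference = {dev_diff:.2f} on {df_diff} df, p = {p_value:.3f}")
print(f"linear predictor = {eta:.4f}, predicted probability = {prob:.3f}")
```

The deviance difference would be compared with the upper 5% point of the $\chi^2_2$ distribution, $-2\ln(0.05) \approx 5.99$.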

3. (a) i) Give an example of an offset in a Poisson GLM.
ii) How would you test for the significance of an offset variable in a Poisson GLM?
iii) Suppose $Y_i$ are independent Poisson random variables with mean $\mu_i$, offset $n_i$, and rate $\theta_i$ for $i = 1, \ldots, N$. With $Y_i$ as responses, consider a Poisson GLM with log link function consisting of a single real-valued predictor $x_i$ with regression coefficient $\beta$. Show that, for each $i = 1, \ldots, N$, the rate parameter $\theta_i$ changes by a factor of $e^{\beta}$ when $x_i$ increases by one unit.
[15 marks]
(b) The data below are the monthly accident counts on a major US highway for each of the 12 months of 1970, then for each of the 12 months of 1971, and finally for the first 9 months of 1972.
Month   1   2   3   4   5   6   7   8   9  10  11  12
1970   52  37  49  29  31  32  28  34  32  39  50  63
1971   35  22  27  27  34  23  42  30  36  56  48  40
1972   33  26  31  25  23  20  25  20  36
Output from R showing results from fitting a GLM modelling number of accidents with appropriately defined predictors year and month is provided below.
Call:
glm(formula = y ~ year + month, family = poisson)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.81969    0.09896  38.600  < 2e-16 ***
Year1971    -0.12516    0.06694  -1.870 0.061521 .
Year1972    -0.28794    0.08267  -3.483 0.000496 ***
month2      -0.34484    0.14176  -2.433 0.014994 *
month3      -0.11466    0.13296  -0.862 0.388459
month4      -0.39304    0.14380  -2.733 0.006271 **
month5      -0.31015    0.14034  -2.210 0.027108 *
month6      -0.47000    0.14719  -3.193 0.001408 **
month7      -0.23361    0.13732  -1.701 0.088889 .
month8      -0.35667    0.14226  -2.507 0.012168 *
month9      -0.14310    0.13397  -1.068 0.285444
month10      0.10167    0.13903   0.731 0.464628
month11      0.13276    0.13788   0.963 0.335639
month12      0.18252    0.13607   1.341 0.179812
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 101.143 on 32 degrees of freedom
Residual deviance: 27.273 on 19 degrees of freedom
Number of Fisher Scoring iterations: 3
i) Write down the mathematical model fitted along with assumptions.
ii) Based on the output, is it fair to state that the average number of accidents appears to have decreased from 1970 to 1972? Justify your answer.
iii) The Transport Authority wishes to check if the number of accidents tends to be higher from September to December when compared to January. What would be your recommendation? Justify accordingly.
iv) Construct a 95% confidence interval for the coefficient of Year1972 in the model in i), and corroborate the conclusion obtained from the p-value corresponding to Year1972 in the output.
v) What is your prediction for the number of accidents in October 1972?
[25 marks]
MATH3029-E1 END
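For parts iv) and v) of Question 3(b), the calculations reduce to a few lines once the relevant estimates are read off the R output. The sketch below is one way to organise them and is not a model answer. It assumes a Wald interval based on the normal quantile 1.96, and that January 1970 is the baseline combination absorbed into the intercept, which is what the absence of `Year1970` and `month1` rows suggests.

```python
import math

# Estimates copied from the R output; only the terms needed here are listed.
coef = {"(Intercept)": 3.81969, "Year1972": -0.28794, "month10": 0.10167}
se_year1972 = 0.08267

# Wald 95% interval: estimate +/- 1.96 * standard error (normal approximation).
z_975 = 1.96
lower = coef["Year1972"] - z_975 * se_year1972
upper = coef["Year1972"] + z_975 * se_year1972

# On the original scale, exp(coefficient) is the multiplicative change in the
# expected count for 1972 relative to the ASSUMED baseline year 1970.
rate_ratio_1972 = math.exp(coef["Year1972"])

# Predicted mean count for October 1972 under the log link.
eta_oct_1972 = coef["(Intercept)"] + coef["Year1972"] + coef["month10"]
pred_oct_1972 = math.exp(eta_oct_1972)

print(f"95% CI for Year1972: ({lower:.4f}, {upper:.4f})")
print(f"rate ratio 1972 vs 1970: {rate_ratio_1972:.3f}")
print(f"predicted accidents, October 1972: {pred_oct_1972:.2f}")
```

An interval that lies entirely below zero agrees with the small p-value reported for Year1972.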