Assignment STAT-6108, 2021-22 Analysis of Hierarchical Data
Your coursework must be submitted using the TurnItIn link provided in Blackboard before 4pm on Friday May 6 2022. For more information about the submission process, please check section 3 d) of the Module Outline.
Remember that the University places the highest importance on maintaining academic integrity and expects all students to do the same. Please make sure you are familiar with the Regulations governing Academic Integrity, which are available at http://www.calendar.soton.ac.uk/sectionIV/academic-integrity-regs.html.
General information
This assignment comprises two tasks divided into questions. Please provide your answers in a document organised according to this structure, clearly stating the task and question you are answering. The marks allocated to each question are shown in parentheses. Your document should be written using a minimum font size of 12, line spacing of 1.5, and left and right margins of 2.5. Your document should not exceed 12 pages or 4000 words. Additional output can be included in an appendix of up to 3 pages. Anything beyond the word or page limit will not be marked.
Task 1 doesn’t require you to fit any models (no dataset is provided). All the information you need to answer the questions is included in the outputs presented. For Task 2, you can use R, MLwiN or a combination of the two to do your analyses.
Up to 5 marks will be allocated to the general presentation of your report.
Task 1 [35 marks]
Below you are given the MLwiN output of four models fitted to the data of a household survey researching liberal attitudes. The variables in the dataset are:
LibAtt: Individual score on an index of liberal attitudes (continuous).
Age: Centred age of the individuals (continuous)
SES: Socioeconomic level of the household (0=Average or lower; 1=Above average).
In the models below, Const denotes a variable of 1’s used to obtain the intercept. Subindex j corresponds to households and i to individuals.
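Model 1
$\text{LibAtt}_{ij} \sim N(XB, \Omega)$
$\text{LibAtt}_{ij} = \beta_{0ij}\,\text{Const}$
$\beta_{0ij} = 5.119\,(0.067) + u_{0j} + e_{0ij}$
$[u_{0j}] \sim N(0, \Omega_u): \Omega_u = [0.638\,(0.102)]$
$[e_{0ij}] \sim N(0, \Omega_e): \Omega_e = [1.813\,(0.091)]$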
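As a hedged illustration of how estimates such as those of Model 1 are used, the sketch below computes a predicted (shrunken) random intercept for a single household. It assumes the standard empirical Bayes formula for a two-level random intercept model, in which the household's mean raw residual is multiplied by the shrinkage factor $\sigma^2_u / (\sigma^2_u + \sigma^2_e / n_j)$. The function names are hypothetical, and the scores are those quoted for household 7 in question 1 of this task.

```python
def shrinkage_factor(var_u, var_e, n_j):
    """Shrinkage factor for a cluster of size n_j in a random intercept model."""
    return var_u / (var_u + var_e / n_j)


def predicted_random_intercept(scores, intercept, var_u, var_e):
    """Shrunken estimate of u_0j from the observed responses of one cluster."""
    n_j = len(scores)
    # Mean raw residual: cluster mean minus the estimated overall intercept.
    mean_raw_residual = sum(scores) / n_j - intercept
    return shrinkage_factor(var_u, var_e, n_j) * mean_raw_residual


# Model 1 estimates as reported in the output above.
BETA0, VAR_U, VAR_E = 5.119, 0.638, 1.813

household_7 = [3.497, 5.655, 7.219]
u_hat_3 = predicted_random_intercept(household_7, BETA0, VAR_U, VAR_E)

# Same mean raw residual, but a hypothetical household of 10 members.
mean_raw_residual = sum(household_7) / len(household_7) - BETA0
u_hat_10 = shrinkage_factor(VAR_U, VAR_E, 10) * mean_raw_residual

print(f"n = 3:  shrinkage = {shrinkage_factor(VAR_U, VAR_E, 3):.4f}, u_hat = {u_hat_3:.4f}")
print(f"n = 10: shrinkage = {shrinkage_factor(VAR_U, VAR_E, 10):.4f}, u_hat = {u_hat_10:.4f}")
```

Because the level-1 variance is divided by the household size, the shrinkage factor moves towards 1 as the household grows, so the prediction is pulled less strongly towards zero.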
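Model 2
$\text{LibAtt}_{ij} \sim N(XB, \Omega)$
$\text{LibAtt}_{ij} = \beta_{0ij}\,\text{Const} - 0.516\,(0.041)\,\text{Age}_{ij} + 0.604\,(0.065)\,\text{SES}_j$
$\beta_{0ij} = 3.934\,(0.137) + u_{0j} + e_{0ij}$
$[u_{0j}] \sim N(0, \Omega_u): \Omega_u = [0.348\,(0.2)]$
$[e_{0ij}] \sim N(0, \Omega_e): \Omega_e = [1.547\,(0.078)]$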
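Model 3
$\text{LibAtt}_{ij} \sim N(XB, \Omega)$
$\text{LibAtt}_{ij} = \beta_{0ij}\,\text{Const} - 0.527\,(0.031)\,\text{Age}_{ij} + 0.608\,(0.014)\,\text{SES}_j$
$\beta_{0ij} = 3.933\,(0.097) + e_{0ij}$
$[e_{0ij}] \sim N(0, \Omega_e): \Omega_e = [1.895\,(0.064)]$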
Model 4
$\text{LibAtt}_{ij} \sim N(XB, \Omega)$
$\text{LibAtt}_{ij} = \beta_{0ij}\,\text{Const} + \beta_{1j}\,\text{Age}_{ij} + 0.601\,(0.060)\,\text{SES}_j$
$\beta_{0ij} = 3.928\,(0.127) + u_{0j} + e_{0ij}$
$\beta_{1j} = -0.504\,(0.032) + u_{1j}$
$\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N(0, \Omega_u): \Omega_u = \begin{bmatrix} 0.328\,(0.052) & \\ -0.017\,(0.005) & 0.025\,(0.001) \end{bmatrix}$
$[e_{0ij}] \sim N(0, \Omega_e): \Omega_e = [1.538\,(0.072)]$
1) Using Model 1, calculate the predicted random intercept for household number 7, which has 3 members with observed Liberal Attitudes scores: 3.497, 5.655 and 7.219. Comment on how you would expect the predicted random intercept to change if, instead of 3 members, that household had 10, keeping the average level-1 residual constant.
2) State the assumptions underpinning Models 2 and 3. Which one of them do you think is more appropriate for this data? Justify your choice (no formal testing is required).
3) Write down the equation(s) of the fixed cluster effects model that is the counterpart of Model 2. Comment on its advantages and disadvantages compared to Model 2 in this particular example.
4) State the assumptions underpinning Model 4. How does the assumed relationship between the predictors and response change compared to that of Model 2?
5) Consider two individuals living in the same household, with centred ages 1 and 1.5. Calculate the estimated correlation between their Liberal Attitudes scores under Models 2, 3 and 4.
6) Write down the equation(s) of the marginal model that is the counterpart of Model 2. Comment on its advantages and disadvantages compared to Model 2 in this particular example.
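The within-household correlation asked about in question 5 follows from the variance components alone. The sketch below is one way to compute it, assuming the usual covariance structure of a two-level model with a random intercept and an optional random slope on Age. The function name is hypothetical. For Model 4, the matrix entries are read as intercept variance 0.328, intercept-slope covariance -0.017 and slope variance 0.025, with level-1 variance 1.538; that reading of the output layout is an assumption.

```python
import math


def within_household_corr(x1, x2, var_u0, var_e, cov_u01=0.0, var_u1=0.0):
    """Correlation between two members of the same household.

    x1, x2 are the values of the covariate that carries the random slope
    (here centred age). With cov_u01 = var_u1 = 0 this reduces to the
    intraclass correlation of a random intercept model.
    """
    # Covariance induced by the shared household effects u_0j and u_1j.
    cov = var_u0 + cov_u01 * (x1 + x2) + var_u1 * x1 * x2
    # Total variance of each response: level-2 part plus level-1 part.
    v1 = var_u0 + 2 * cov_u01 * x1 + var_u1 * x1 ** 2 + var_e
    v2 = var_u0 + 2 * cov_u01 * x2 + var_u1 * x2 ** 2 + var_e
    return cov / math.sqrt(v1 * v2)


AGE_1, AGE_2 = 1.0, 1.5

# Model 2: random intercept only.
corr_m2 = within_household_corr(AGE_1, AGE_2, var_u0=0.348, var_e=1.547)
# Model 3: single-level model, no household effect.
corr_m3 = within_household_corr(AGE_1, AGE_2, var_u0=0.0, var_e=1.895)
# Model 4: random intercept and random slope for Age.
corr_m4 = within_household_corr(
    AGE_1, AGE_2, var_u0=0.328, var_e=1.538, cov_u01=-0.017, var_u1=0.025
)

for name, value in [("Model 2", corr_m2), ("Model 3", corr_m3), ("Model 4", corr_m4)]:
    print(f"{name}: {value:.4f}")
```

Under a random slope model the correlation depends on the ages of the two individuals, whereas under a random intercept model it is the same for every pair in a household.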
Task 2 [60 marks]
Research has suggested that participation in youth organisations has a positive effect on the well-being of teenagers. To gather evidence for the formulation of relevant public policy, the Department for Education has commissioned a survey with the aim of understanding the relationship between a teenager’s well-being and the number of hours he or she allocates to those activities, as well as finding whether differences can be found between types of organisation (Sport, Arts or Volunteering organisations).
The data collected are presented in the file yorg.wsz. A sample of 87 youth associations was selected and all their members between the ages of 12 and 16 were interviewed. Teenagers who are members of more than one association, or who do not participate actively (that is, who did not participate at least four hours per week during the last month), will not be taken into consideration. For the remaining individuals, the following variables are available:
ID.org: Organisation identifier
ID.indiv: Individual identifier
WB.index: Standardised index of well-being, measured on a continuous scale from 0 to 5. Higher values indicate better perceived well-being.
Age: In completed years
H.week: Average number of hours per week spent on association activities during the last month
Type: 1 = Sports; 2 = Arts; 3 = Volunteering
1) Describe how you would proceed to analyse this dataset using an aggregated (group analysis) approach. Discuss in your own words the main potential issue (if any) of using this tool to answer the research question of interest (Max. 200 words).
2) Fit a random intercepts model to study the relationship between an individual’s well-being and his/her number of hours allocated per week.
a) Use an appropriate statistical test to decide, at the 5% significance level, whether to include a quadratic term in your model. Clearly state the null and alternative hypotheses of the test and your conclusion.
b) Write the fitted equations of the model (with or without quadratic term depending on the results of your test) in the notation used in the lectures.
Please do not simply copy the software output. Use plots, predicted values or the estimates of the regression coefficients to explain in simple words the relationship between those two variables.
c) Write the equation you would use to predict the expected well-being index for an individual attending the organisation with identifier number 4.
3) Use appropriate statistical tests to assess, at the 5% significance level, whether to include the variables Age and Type in your current model. Make sure to state your hypotheses and conclusions. Write the fitted equations of your final model, including the fixed and random parts.
4) Check the residuals of your final model in 3) and comment on the validity of the model assumptions. Use plots and tests to decide whether to include a contextual variable of the "group mean" type in your model. If you decide to do so, include it and write the equations of your final model.
[12 marks]
5) Use an appropriate statistical test to assess, at the 10% significance level, whether it is necessary to include a random slope for Age in your final model in 4).
6) Using your final model in 4) as a starting point, use an appropriate statistical test to assess, at the 5% significance level, whether the relationship between hours and well-being varies depending on the type of organisation. Explain your findings and summarize the relationship between hours, type of organisation and well-being.
7) Summarize the conclusions of your analysis in non-technical language (Max. 200 words).
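Several of the Task 2 questions above ask for a statistical test at a stated significance level. The sketch below shows, using only the Python standard library, how p-values might be obtained for a Wald test of a single fixed coefficient and for a likelihood ratio (deviance) test between nested models. All function names are hypothetical. The closed-form chi-square tail probabilities are only implemented for 1 and 2 degrees of freedom, and the 50:50 chi-square mixture sometimes used when testing a random slope is an assumption, not necessarily the convention followed in the lectures.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def wald_pvalue(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0, using a normal reference."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


def deviance_test_pvalue(deviance_null, deviance_alt, df):
    """Likelihood ratio test from two -2*loglikelihood values of nested models."""
    return chi2_sf(deviance_null - deviance_alt, df)


def mixture_pvalue(stat):
    """50:50 mixture of chi-square(1) and chi-square(2), an assumed convention
    for testing a random slope variance together with its covariance."""
    return 0.5 * chi2_sf(stat, 1) + 0.5 * chi2_sf(stat, 2)


# Example with a coefficient reported in Task 1 (Age in Model 2).
p_age = wald_pvalue(-0.516, 0.041)
print(f"Wald p-value for Age in Model 2: {p_age:.3g}")

# Sanity check against the familiar 5% critical values.
print(f"chi2_sf(3.841, 1) = {chi2_sf(3.841, 1):.4f}")
print(f"chi2_sf(5.991, 2) = {chi2_sf(5.991, 2):.4f}")
```

With R or MLwiN, the deviances and standard errors would come from the fitted model output; the functions above only turn those numbers into p-values.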
Presentation of the report.
[10 marks]
End of the coursework