SOST70011 Introduction to Statistical Modelling, 2020
Assessment (100% of course grade):
Lack of physical activity and depression
Major clinical depression can be a severely debilitating illness, but even mild depressive episodes can substantially reduce quality of life. Roshanaei-Moghaddam et al. (2009) reviewed the relationship between physical activity and depression. There is certainly a negative correlation between physical activity and depression, such that lower levels of activity are associated with higher levels of depression, but it is unclear what proportion of the relationship is causal, as opposed to spurious. For example, there may be confounding factors, such as attributes of individuals that make them both more likely to engage in physical activity and less likely to suffer depression.
In an effort to disentangle spurious from causal effects, you have decided to analyse data from the English Longitudinal Study of Ageing (ELSA). This study surveyed a sample of English householders aged 50 or over on multiple occasions, two years apart. The first occasion, or wave, was in 2002 and the second wave was in 2004. Figure 1 shows the hypothesised relationships between depression (cesd) and lack of activity (notact) across the two waves, plus some key additional variables. The central causal relationships of interest are shown with dashed arrows.
[Figure 1: DAG over the nodes male1, age1, cesd1, notact1, illness1, cesd2, notact2]
Figure 1: Directed Acyclic Graph of the hypothesised causal relationships between lack of physical activity (notact1 & notact2) and depression (cesd1 & cesd2) across two waves of ELSA.
Data
The variables shown in Figure 1 were measured in ELSA and are described in Tables 1 and 2, below. Table 1 describes the data in the “wide” format file “elsa12wide.csv” (one row per respondent). Table 2 describes the data in the “long” format file “elsa12long.csv” (one row per respondent per wave).
Table 1: Variables from the DAG in Figure 1, as measured in the English Longitudinal Study of Ageing (ELSA), in the “wide” format file “elsa12wide.csv”.
Variable | Description | Coding
0. id | Respondent identifier | Arbitrary numerical label
1. age1 | Chronological age in wave 1 (2002) | Years since birth, in 2002
2. male1 | Biological sex in wave 1 | Male = 1, female = 0
3. illness1 | Limiting longstanding illness, self-declared in wave 1 | Has illness = 1, no illness = 0
4. cesd1 | Center for Epidemiologic Studies Depression (CES-D) scale score, wave 1 | Score from 0 to 8 (higher score = more depressed)
5. cesd2 | As above, in wave 2 (2004) | As above
6. notact1 | Self-declared lack of moderate physical activity at least once a week, in wave 1 | 1 = no moderate physical activity at least once a week; 0 = at least some moderate physical activity
7. notact2 | As above, in wave 2 (2004) | As above
Table 2: Variables from the DAG in Figure 1, as measured in the English Longitudinal Study of Ageing (ELSA), in the “long” format file “elsa12long.csv”.
Variable | Description | Coding
0. id | Respondent identifier | Arbitrary numerical label
1. age1 | Chronological age in wave 1 (2002) | Years since birth, in 2002
2. male1 | Biological sex in wave 1 | Male = 1, female = 0
3. illness1 | Limiting longstanding illness, self-declared in wave 1 | Has illness = 1, no illness = 0
4. cesd | Center for Epidemiologic Studies Depression (CES-D) scale score, for waves 1 and 2 | Score from 0 to 8 (higher score = more depressed)
5. notact | Self-declared lack of moderate physical activity at least once a week, for waves 1 and 2 | 1 = no moderate physical activity at least once a week; 0 = at least some moderate physical activity
6. wave | ELSA wave identifier | 1 = wave 1 (2002), 2 = wave 2 (2004)
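The two files hold the same information arranged differently: in the wide file each wave's cesd and notact are separate columns (cesd1, cesd2, notact1, notact2), while the long file stacks the two waves and adds the wave identifier. A minimal sketch of that mapping follows. It is written in Python only to keep it self-contained (in R, base reshape() or tidyr's pivot_longer() are common choices). The function name wide_to_long and the two example respondents are hypothetical, and the sketch assumes every respondent has values recorded for both waves.

```python
# Minimal sketch (assumed layout): reshape "wide" records (one row per
# respondent) into "long" records (one row per respondent per wave),
# using the variable names from Tables 1 and 2. Example values are invented.

TIME_INVARIANT = ["id", "age1", "male1", "illness1"]  # measured once, in wave 1
TIME_VARYING = ["cesd", "notact"]  # cesd1/cesd2 and notact1/notact2 in the wide file


def wide_to_long(wide_rows, waves=(1, 2)):
    """Return one long-format dict per respondent per wave."""
    long_rows = []
    for row in wide_rows:
        for w in waves:
            rec = {k: row[k] for k in TIME_INVARIANT}  # repeated in every wave's row
            for v in TIME_VARYING:
                rec[v] = row[f"{v}{w}"]  # e.g. cesd1 -> cesd when w == 1
            rec["wave"] = w
            long_rows.append(rec)
    return long_rows


# Two invented respondents, for illustration only (not ELSA data).
wide_data = [
    {"id": 1, "age1": 62, "male1": 1, "illness1": 0,
     "cesd1": 2, "cesd2": 1, "notact1": 0, "notact2": 0},
    {"id": 2, "age1": 75, "male1": 0, "illness1": 1,
     "cesd1": 5, "cesd2": 6, "notact1": 1, "notact2": 1},
]

long_data = wide_to_long(wide_data)
for rec in long_data:
    print(rec)
```

Note that age1, male1 and illness1 keep their wave-1 values in both long rows, which is why Table 2 still names them with the suffix 1.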
Questions to answer
Use R and the data contained in the two files to carry out the tasks and answer the questions below:
1. Fit a single-level model using the wide format data for the unconditional effect of notact1 -> cesd2 (i.e. a model with cesd2 as the outcome and notact1 as the sole predictor, and no additional “third” variables).
   a. Discuss the statistical and substantive meaning of the estimated model results (e.g. b parameters & SEs, p-values, R-square). [8 marks]
   b. Discuss the assumptions of this model (causal and statistical), and how they affect what you can conclude about the relationship between physical activity and depression in the study population. [4 marks]
2. Based upon the DAG in Figure 1, specify and fit a single-level model using the wide format data to evaluate the total causal effect of notact1 -> cesd2.
   a. Explain the reasoning behind your choice of model specification, i.e. which variables, if any, you chose to include as additional predictors in the model. [8 marks]
   b. Discuss the changes in the estimated model results (e.g. b parameters, SEs, p-values, R-square), and their statistical and substantive meanings. [12 marks]
   c. Discuss the assumptions of this model (causal and statistical), and how they affect what you can conclude about the relationship between physical activity and depression in the study population. [8 marks]
3. Fit a single-level model using the wide format data for the unconditional effect of cesd1 -> notact2 (i.e. a model with notact2 as the outcome and cesd1 as the sole predictor, and no additional “third” variables).
   a. Discuss the statistical and substantive meaning of the estimated model results (e.g. b parameters, SEs, p-values, R-square). [10 marks]
   b. Discuss the assumptions of this model (causal and statistical), and how they affect what you can conclude about the relationship between depression and physical activity in the study population. [5 marks]
4. Based upon the DAG in Figure 1, specify and fit a single-level model using the wide format data to evaluate the total causal effect of cesd1 -> notact2.
   a. Explain the reasoning behind your choice of model specification, i.e. which variables, if any, you chose to include as additional predictors in the model. [10 marks]
   b. Discuss the changes in the estimated model results (e.g. b parameters, SEs, p-values, R-square), and their statistical and substantive meaning. [15 marks]
   c. Discuss the assumptions of this model (causal and statistical), and how they affect what you can conclude about the relationship between physical activity and depression in the study population. [10 marks]
5. Consider together the models from questions 2. and 4. Did you use the same predictor variables in each model or different ones, and why? When viewed together, what do the results from these models tell us about the relationship between physical activity and depression in the study population? [10 marks]
6. Fit a model using the long format data to evaluate the proportion of variance in depression scores (cesd) between and within individuals.
   a. What is the proportion of variance in depression scores (cesd) attributable to variation between individuals in average depression levels over the two years? [5 marks]
   b. What is the proportion of variance in depression scores (cesd) attributable to changes within individuals over the two years? [5 marks]
7. Fit a model using the long format data to evaluate the proportion of variance in depression scores (cesd) across the two years that is attributable to differences in the age of participants in the first wave (age1).
   a. Comment on the size of the variation attributable to age differences, the substantive meaning of the differences, and whether these sample differences can be generalized to the population. [20 marks]
8. Fit a model using the long format data to evaluate the proportion of variance in depression scores (cesd) across the two years that is attributable to differences in the sex of participants in the first wave (male1).
   a. Comment on the size of the variation attributable to sex differences, the substantive meaning of the differences, and whether these sample differences can be generalized to the population. [20 marks]
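The single-level models in questions 1 to 4 would normally be fitted in R with lm() and read off summary(). As a hedged illustration of where the b parameters, SEs and R-square come from, the Python sketch below does ordinary least squares by hand on simulated toy data (nothing here is ELSA data; the helper names invert, ols and mean are hypothetical). It shows that with a single 0/1 predictor such as notact1, the intercept is the mean outcome in the notact1 = 0 group and the slope is the difference in group means. illness1 is added only to show how estimates shift when a covariate enters; it is not a statement about which adjustment set the DAG in Figure 1 implies, which is what questions 2.a. and 4.a. ask you to decide. For the binary notact2 outcome in questions 3 and 4, the same arithmetic would amount to a linear probability model.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (e.g. a predictor has no variation)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(y, predictors):
    """OLS of y on an intercept plus each column in `predictors`.

    Returns (b, se, r2): coefficients [intercept, slopes...], conventional
    (homoskedastic) standard errors, and R-squared."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    b = [sum(inv[a][c] * xty[c] for c in range(k)) for a in range(k)]
    fitted = [sum(row[a] * b[a] for a in range(k)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = rss / (n - k)  # residual variance on n - k degrees of freedom
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return b, se, 1.0 - rss / tss


def mean(v):
    return sum(v) / len(v)


# Simulated toy data (NOT ELSA): illness1 raises both inactivity and depression,
# so it confounds notact1 -> cesd2 in this simulation only. cesd2 is kept in 0..8.
random.seed(1)
n = 500
illness1 = [1 if random.random() < 0.3 else 0 for _ in range(n)]
notact1 = [1 if random.random() < 0.2 + 0.3 * ill else 0 for ill in illness1]
cesd2 = [min(8, max(0, round(1 + 1.5 * ill + 0.5 * na + random.gauss(0, 1.5))))
         for ill, na in zip(illness1, notact1)]

b_u, se_u, r2_u = ols(cesd2, [notact1])            # unconditional model
b_a, se_a, r2_a = ols(cesd2, [notact1, illness1])  # adds one covariate

# With a single 0/1 predictor: intercept = mean when notact1 = 0,
# slope = difference in group means.
active = [c for c, na in zip(cesd2, notact1) if na == 0]
inactive = [c for c, na in zip(cesd2, notact1) if na == 1]
print("group means:", round(mean(active), 3), round(mean(inactive), 3))
print("unconditional b, SE, R2:", [round(x, 3) for x in b_u], [round(s, 3) for s in se_u], round(r2_u, 3))
print("adjusted b, SE, R2:", [round(x, 3) for x in b_a], [round(s, 3) for s in se_a], round(r2_a, 3))
print("t for notact1:", round(b_u[1] / se_u[1], 2), round(b_a[1] / se_a[1], 2))
```

Comparing the notact1 slope, its SE and R-square across the two fits is the same kind of comparison questions 2.b. and 4.b. ask for, although the sizes of any changes in the real data will of course differ from this simulation.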
Copy/paste the R script that you used to run the models for the questions above into an appendix of your submission. [You will not be graded on the correctness of this appendix. Marks will neither be awarded nor penalized for your R code. Rather, it will be used to help in understanding your models and results. Failure to include an R appendix will result in a penalty of 5 marks.]
Guidance notes
• Your submission should answer each of the 8 questions above. There are 150 marks available in total across the 8 questions.
• You should use the models we discussed in class to answer the questions. Different questions will require different models, so be sure to mention briefly which model you chose and why.
• Good answers are those that clearly address all parts of the question.
• Use descriptive statistics and/or derived quantities (such as model-predicted values) to support and justify your answers, where this will help you to answer the question clearly.
• You may re-code, centre, and / or derive new variables if it will help you answer the questions. Be sure to clearly describe and justify any such manipulation of the data.
• If your answer requires that you compute additional information not contained in the model output, be sure to show your working out.
• Your answers to questions 2.b. and 4.b. will be marked without regard to whether you answered questions 2.a. and 4.a. correctly, i.e. even if you got questions 2.a. or 4.a. wrong, it is still possible to get full marks for questions 2.b. and 4.b.
• Supporting and justifying your answers with references to books and articles in the course reading lists, and other scholarly material, is encouraged and will help contribute to a higher mark. Provide a full reference for any work that you do cite. Include no more than 7 citations in your answer.
• The word limit for your submission is 3,000 words. Note that this is a limit, not a target; you are permitted to use fewer than 3,000 words, but not more.
• The word limit includes all of the words in the main text of your answer, including headings, but does not include references or the R appendix. (You do not need to repeat the full text of the questions above in your answer. Simply putting e.g. Q1.a., Q1.b., will be fine.)
• Submit your report via the Turnitin portal on Blackboard – it’s in the “Assessment” section. Deadline for submission is 2 PM, Friday 29th of January 2021.
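As an example of a derived quantity that needs working shown, the proportions asked for in question 6 are not printed directly by a random-intercept model (e.g. lme4's lmer(cesd ~ 1 + (1 | id)) in R, which reports the two variance components). They are computed as VPC = between / (between + within) for 6.a., and within / (between + within) = 1 - VPC for 6.b. The Python sketch below swaps in the simpler ANOVA (method-of-moments) estimator of the two components, which is only valid for a balanced design, i.e. if every respondent contributes both waves; with unbalanced data, use the model-based estimates instead. The function name variance_partition and the simulated scores (continuous, not restricted to 0 to 8) are hypothetical.

```python
import random


def variance_partition(rows, outcome="cesd", group="id"):
    """ANOVA (method-of-moments) estimates for a balanced one-way random-effects
    layout, i.e. every individual has the same number of waves.

    Returns (between, within, vpc), where vpc = between / (between + within)."""
    by_person = {}
    for r in rows:
        by_person.setdefault(r[group], []).append(r[outcome])
    sizes = {len(v) for v in by_person.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes a balanced design")
    m = sizes.pop()     # observations per individual (2 waves here)
    g = len(by_person)  # number of individuals
    means = [sum(v) / m for v in by_person.values()]
    grand = sum(means) / g
    ssb = m * sum((mu - grand) ** 2 for mu in means)
    ssw = sum(sum((x - mu) ** 2 for x in v) for v, mu in zip(by_person.values(), means))
    msb = ssb / (g - 1)
    msw = ssw / (g * (m - 1))
    within = msw
    between = max(0.0, (msb - msw) / m)  # truncate a negative estimate at zero
    return between, within, between / (between + within)


# Simulated long-format data (NOT ELSA): a stable person-level deviation with
# variance 1.44 plus occasion-level noise with variance 0.64, so the true VPC is
# 1.44 / 2.08, roughly 0.69.
random.seed(2)
sim = []
for i in range(1000):
    u = random.gauss(0, 1.2)
    for wave in (1, 2):
        sim.append({"id": i, "wave": wave, "cesd": 2 + u + random.gauss(0, 0.8)})

between, within, vpc = variance_partition(sim)
print("between:", round(between, 3), "within:", round(within, 3))
print("share between (VPC):", round(vpc, 3), "share within:", round(1 - vpc, 3))
```

Writing out the two variance components and the division that produces each proportion, as the sketch prints them, is the kind of working the guidance above asks for.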
Reference
Roshanaei-Moghaddam, B., Katon, W.J. and Russo, J. (2009). The longitudinal effects of depression on physical activity. General Hospital Psychiatry, 31(4), 306-315. https://doi.org/10.1016/j.genhosppsych.2009.04.002