ENVS450 SOCIAL SURVEY ANALYSIS
Assignment 2: Correlation and Regression
(1) Choose ONE of the following two datasets:
2011 Census.Rdata
QLFS 2012 (Assignment 2).Rdata
(2) Identify an outcome variable of interest to you
(3) Briefly explore the variation in this variable
(4) Examine how strongly correlated your outcome variable is with the other variables in the dataset
(5) Based on your preliminary data exploration, and on previous literature, select FIVE potential predictor variables. The selection should include at least ONE categorical and ONE continuous predictor variable.
(6) Explicitly adopt one of the following regression model fitting strategies, justifying your choice: ‘Narrative’; Backward elimination; Backward stepwise; All subsets
(7) Use your chosen strategy to identify an initial ‘best’ model
(8) Evaluate the potential utility of adding an interaction effect to your ‘best’ model
(9) Evaluate the potential utility of ONE of the following techniques to enhance your initial ‘best’ model: data transformation; polynomials; piecewise regression
(10) Based on the outcomes of (8) and (9) identify your final preferred ‘best’ model
(11) For your final preferred ‘best’ model:
(a) Assess the model fit
(b) Use model diagnostics to demonstrate the extent to which this ‘best’ model satisfies key regression assumptions, and suggest how any breaches of these assumptions could be addressed
(c) Assess the relative importance of the predictors in the model
(d) Offer an interpretation of what the model reveals about the world we live in
(12) Include adequately documented code in your submission: either as an appendix to your report if using Word (or similar); or as an integral part of the report if using R Notebook or R Markdown (or similar).
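As an illustration of step (8), one way to judge whether an interaction effect earns its place is to compare adjusted R² for the model with and without the interaction term. The sketch below is a minimal illustration in plain Python, not a prescribed method: the data are simulated, and the names `age`, `female`, `pay`, `solve` and `fit_ols` are hypothetical, not variables or functions from either dataset. In R the same comparison would normally be made with `lm()`.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Fit ordinary least squares via the normal equations.

    Each row holds the predictor values for one case; an intercept column
    is added here. Returns (coefficients, adjusted R-squared).
    """
    x = [[1.0] + list(r) for r in rows]
    n, k = len(y), len(x[0])  # k counts the intercept as well
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    adj_r2 = 1 - (ss_res / (n - k)) / (ss_tot / (n - 1))
    return beta, adj_r2


# Simulated (hypothetical) data: the effect of age on pay differs by sex.
random.seed(450)
n_cases = 200
age = [random.uniform(20, 60) for _ in range(n_cases)]
female = [random.choice([0, 1]) for _ in range(n_cases)]
pay = [10 + 0.5 * a - 2 * f + 0.3 * a * f + random.gauss(0, 2)
       for a, f in zip(age, female)]

# Main-effects model versus the same model plus an age x female interaction.
main_rows = [(a, f) for a, f in zip(age, female)]
inter_rows = [(a, f, a * f) for a, f in zip(age, female)]
beta_main, adj_main = fit_ols(main_rows, pay)
beta_inter, adj_inter = fit_ols(inter_rows, pay)

print(f"Adjusted R-squared, main effects only: {adj_main:.3f}")
print(f"Adjusted R-squared, with interaction:  {adj_inter:.3f}")
print(f"Estimated interaction coefficient:     {beta_inter[3]:.3f}")
```

Adjusted R² is only one possible criterion. A partial F-test or an information criterion such as AIC would be equally defensible, provided the choice is justified in the write-up.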
TIPS
(1) Make sure the type of correlation and regression you use matches the types of outcome and predictor variables being analysed
(2) Structure your write-up as much like a journal article as possible [see Assignment Appendix], subject to the following caveats:
– Explicitly address all of the points above in your write-up.
– Include adequately documented code either in the body of the report or as an appendix.
– There is no need for an extensive literature review: the focus of this assignment is on demonstrating mastery of correlation and regression. However, a good/very good answer would include a few citations to relevant ideas/theories at relevant points, based on a quick Google search.
– Because the emphasis of the assignment is on mastery of technique, more extensive (but still not exhaustive!) coverage of data preparation and model diagnostics than that found in a conventional academic article would be appropriate.
(3) The guideline word count for this assignment is up to c. 3,000 words, excluding any tables, bibliography or code. I will not be counting, but recommend you bear this guideline in mind when planning your effort. I am NOT asking for a full-blown 8-10k word journal article!
(4) Deadline: 2pm, Monday 25th January (Monday of Week 2 of Assessment Period)
(5) Marking criteria: standard school ‘report’ marking criteria apply, but with less emphasis on evidence of reading.
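To illustrate Tip (1): Pearson's correlation suits two continuous variables with a roughly linear relationship, whereas a rank-based measure such as Spearman's is the usual choice when one variable is ordinal or the relationship is monotonic but non-linear. The sketch below, in plain Python, uses made-up values; `health` (a 5-point ordinal rating) and `income` are hypothetical names, not variables from either dataset. In R, both measures are available through `cor()` and its `method` argument.

```python
def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up illustration: an ordinal rating against a skewed continuous variable.
health = [1, 1, 2, 2, 3, 4, 4, 5]
income = [8, 9, 11, 12, 15, 30, 45, 90]

print(f"Pearson:  {pearson(health, income):.3f}")
print(f"Spearman: {spearman(health, income):.3f}")
```

With these values the rank-based measure is the higher of the two, because the relationship is monotonic but far from linear. For a categorical predictor with no natural order, neither measure is appropriate, and a comparison of group means or a similar technique would be needed instead.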
Appendix: The structure of the typical journal article
1. Introduction
– aim (to answer a specified research question)
– why topical/interesting?
– structure of article
2. Literature review
– what do we already know about this subject?
– rationale for including certain predictor variables in the model
– what knowledge gap remains that this article will address? (This can include ‘not studied before in this area/year’.)
3. Methodology
– a brief introduction to the dataset being analysed (who collected it? when? how many responses?)
– a brief description and justification of the statistical techniques used in the subsequent analysis
EITHER
4. Results
– description and interpretation of results, including relevance to existing literature
– selective illustrations (graphs and tables)
5. Conclusion
– summary of main findings
– limitations of study (self-critique)
– implications of the findings of the study for others
OR
4. Results
– brief description of results
– selective illustrations (graphs and tables)
5. Discussion
– interpretation of results, including relevance to existing literature
– selective illustrations (graphs and tables)
6. Conclusion
– summary of main findings
– limitations of study (self-critique)
– implications of the findings of the study for others