---
title: "Untitled"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### Submission instructions
This is an R Markdown document, an example of *literate programming*: an approach that lets users interweave text, statistical output and the code that produces that output.
***
### Overview
In this assignment you are provided with data on maternal characteristics and birth outcomes. Your aim is to assess the causal effect of maternal smoking during pregnancy on the child’s birthweight.
#### Here is a brief description of the available variables:
* **momid** identification number of the mother
* **idx** index number of a mother’s birth
* **mage** age of mother (in years)
* **gestat** length of gestation (in weeks)
* **birwt** birthweight (in grams)
* **smoke** indicator variable for smoking status (1=smoker, 0=nonsmoker)
* **male** sex of the baby (1=male, 0=female)
* **collgrad** college-graduate indicator (1=graduate, 0=non-graduate)
* **black** indicator variable for black race (1=black, 0=white)
These data are contained in your assignment repo and can be read as follows:
```{r read-data}
dat <- read.csv("smoking_data_first_born.csv", header=TRUE)
head(dat)
```
***
### Assessment questions
(1) There are nine variables in the dataset. Select which variables you will consider for matching and justify your decision (10%)
(2) Using your chosen variables, match smokers to non-smokers using the matching specifications in (2a) and (2b). Briefly summarise the balance in both cases, drawing on appropriate numerical and graphical summaries.
(2a) 3:1 nearest-neighbour matching using the propensity score distance metric (20%)
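```{r}
library("MatchIt")
# 3:1 nearest-neighbour matching on a logistic-regression propensity score.
# Covariates below are an illustrative pre-treatment choice: birwt is the
# outcome and gestat is measured after exposure, so neither is matched on.
match1 <- matchit(formula = smoke ~ mage + collgrad + black, data = dat,
                  method = "nearest",
                  distance = "logit",
                  ratio = 3,
                  discard = "none")
match1
```
(2b) Coarsened Exact Matching (20%)
```{r}
# Coarsened exact matching on the same covariates as in (2a)
match2 <- matchit(smoke ~ mage + collgrad + black, data = dat, method = "cem")
summary(match2)
```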
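The numerical balance summaries asked for in question 2 are usually reported as standardised mean differences (SMDs). As a minimal sketch of what such a summary computes, the helper below calculates a weighted SMD in base R. The function name `std_mean_diff` is hypothetical, the toy numbers are made up, and dividing by the treated-group standard deviation is one common convention rather than the only one. Calling `summary()` on a `matchit` object reports comparable figures.

```{r}
# Hypothetical helper (not part of MatchIt): weighted standardised mean
# difference of covariate x between treated (treat == 1) and control units.
std_mean_diff <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  # Assumption: standardise by the unweighted SD of the treated group
  (m1 - m0) / sd(x[treat == 1])
}

# Toy illustration with made-up numbers (not the assignment data)
toy_x     <- c(1, 2, 3, 4, 5, 6)
toy_treat <- c(1, 1, 1, 0, 0, 0)
std_mean_diff(toy_x, toy_treat)

# Applied to a matched sample once the matching chunks above have run:
# md <- match.data(match1)
# std_mean_diff(md$mage, md$smoke, md$weights)
```

Values close to zero indicate that the covariate has a similar mean in both groups after weighting.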
(3) Fit the model birwt ~ smoke in
(3a) the raw data (5%)
```{r}
# Naive model on the raw (unmatched) data; birwt must stay numeric for lm()
fit_raw <- lm(birwt ~ smoke, data = dat)
summary(fit_raw)
```
(3b) the matched data, using your preferred match from 2a/2b (5%)
```{r}
```
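Question 3b refits the same model on the matched sample. A minimal sketch, assuming the match objects above were created with MatchIt: `match.data()` returns the matched rows together with a `weights` column, and passing those weights to `lm()` accounts for 3:1 or stratum-based matching. The helper name `fit_matched` is hypothetical, and the toy data frame is made up purely to show the call.

```{r}
# Hypothetical helper: weighted regression of birthweight on smoking status.
# `md` is expected to have columns birwt, smoke and weights, as produced by
# MatchIt::match.data().
fit_matched <- function(md) {
  # Guard: recover numeric birthweight if it was stored as a factor
  md$birwt <- as.numeric(as.character(md$birwt))
  lm(birwt ~ smoke, data = md, weights = weights)
}

# Toy illustration with made-up numbers (not the assignment data)
toy_md <- data.frame(
  birwt   = c(3000, 3200, 3400, 3600),
  smoke   = c(1, 1, 0, 0),
  weights = c(1, 1, 3, 1)
)
coef(fit_matched(toy_md))

# With the assignment data, after running the matching chunks:
# fit_match <- fit_matched(match.data(match1))
# summary(fit_match)
```

The `smoke` coefficient is the weighted difference in mean birthweight between smokers and non-smokers. The standard errors from plain `lm()` do not account for the matching step.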
(4) Briefly describe the model results. How does the estimated effect for smoking change and why? (10%)
(5) Briefly argue whether or not the assumption of exchangeability is likely to hold in the implicit causal model underpinning 3b. (10%)
(6) These data come from an observational study. Imagine the **exact same data** arose from a randomised controlled trial (RCT) in which mothers were randomised to smoke or not smoke during pregnancy (ignore the obvious ethical issues of such a trial for the sake of this question).
(6a) What can you observe about the randomisation process? (5%)
(6b) How would the role of background variables like mother's race and education change in the analysis, given that the data came from an RCT? (15%)
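One way to reason about 6b is with a small simulation. The sketch below uses entirely synthetic data: the variable names, effect sizes and sample size are assumptions chosen for illustration, not estimates from the assignment data. Treatment is assigned by coin flip, so the background covariate cannot confound the comparison, yet adjusting for it can still change the standard error of the treatment estimate.

```{r}
set.seed(2024)
n <- 2000

# Synthetic trial: all values below are invented for illustration
covar <- rbinom(n, 1, 0.3)            # background characteristic
treat <- rbinom(n, 1, 0.5)            # randomised, independent of covar
y     <- 3400 - 200 * treat - 600 * covar + rnorm(n, sd = 300)

fit_unadj <- lm(y ~ treat)            # no covariate adjustment
fit_adj   <- lm(y ~ treat + covar)    # covariate included for precision only

# Compare the treatment estimate and its standard error in both models
rbind(
  unadjusted = summary(fit_unadj)$coefficients["treat", 1:2],
  adjusted   = summary(fit_adj)$coefficients["treat", 1:2]
)
```

Under these assumptions both models target the same treatment effect; the adjusted model differs only in how much outcome variance it explains.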
***