Homework 5
Homework 5
BEGIN ASSIGNMENT
generate: true
Logistic regression for pesticide toxicity data.
A environmental agency is testing the effects of a pesticide that can cause acute poisoning in bees, the world’s
most important pollinator of food crops. The environmental agency collects data on exposure to different
levels of the pesticide in parts per million (ppm). The agency also identifies collapsed beehives, which they
expect could be due to acute pesticide poisoning. In the data they collect, each observation is pair (xi, yi),
where xi represents the dosage of the pollutant and yi represents whether or not the hive survived. Take
yi = 1 means that the beehive has collapsed from poisoning and yi = 0 means the beehive survived. The
agency collects data at several different sites, each of which was exposed to a different dosages. The resulting
data can be seen below:
x <- c(1.06, 1.41, 1.85, 1.5, 0.46, 1.21, 1.25, 1.09, 1.76, 1.75, 1.47, 1.03, 1.1, 1.41, 1.83, 1.17, 1.5, 1.64, 1.34, 1.31) y <- c(0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0) n <- length(y) Assume that beehiv collapse, yi, given pollutant exposure level xi, is Yi ∼ Bernoulli(θ(xi)), where θ(xi) is the probability of death given dosage xi. We will assume that logit(θi(xi)) = α + βxi where logit(θ) is defined as log(θ/(1−θ)). This model is known as logistic regression and is one of the most common methods for modeling probabilities of binary events. 1a. Solve for θi(xi) as a function of α and β by inverting the logit function. If you haven’t seen logistic regression before (it is covered in more detail in PSTAT 127 and PSTAT131), it is essentially a generalization of linear regression for binary outcomes. The inverse-logit function maps the linear part, α + βxi, which can be any real-valued number into the interval [0, 1] (since we are modeling probabilities of binary outcome, we need the mean outcome to be confined to this range). Type your answer here, replacing this text. 1b The dose at which there is a 50% chance of beehvive collapse, θ(xi) = 0.5, is known as LD50 (“letha dose 50%”), and is often of interest in toxicology studies. Solve for LD50 as a function of α and β. Type your answer here, replacing this text. 1c Implement the logistic regression model in stan by reproducing the stan model described here: https: //mc-stan.org/docs/2_18/stan-users-guide/logistic-probit-regression-section.html. Run the stan model on the beehive data to get Monte Carlo samples. Compute Monte Carlo samples of the LD50 by applying the function derived in the previous part to your α and β samples. Report and estimate of the posterior mean of the LD50 by computing the sample average of all Monte Carlo samples of LD50. 1 https://mc-stan.org/docs/2_18/stan-users-guide/logistic-probit-regression-section.html https://mc-stan.org/docs/2_18/stan-users-guide/logistic-probit-regression-section.html # YOUR CODE HERE 1d. Make a plot showing both 50% and 95% confidence band for the probability of a hive collapse as a function of pollutant exposure, Pr(y = 1 | α, β, x). Plot your data on a grid of x-values from x = 0 to 2. # YOUR CODE HERE 2 Logistic regression for pesticide toxicity data.