Johanna G. Nešlehová Mc , Winter Term 2022
Generalized Linear Models MATH 523 Assignment 2 due on March 25 at noon.
Q1 Lecture 9a
Consider a binomial GLM with an arbitrary link function g and n responses that have been entered in a grouped format. Using the same notation as in the lecture notes, show that:
(1) The maximum likelihood estimates of β do not depend on whether the data have been entered in a grouped or ungrouped format.
(2) The Fisher information matrix does not depend on whether the data have been entered in a grouped or ungrouped format. Conclude that the asymptotic covariance matrix of $\hat{\beta}$ (and consequently the standard errors of $\hat{\beta}_j$, $j = 1, \dots, p$) does not depend on the data entry format. Hint: It is easiest if you verify the entry at position $(j,k)$ of the Fisher information for arbitrary $j,k$, rather than doing the matrix multiplication.
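As a numerical sanity check of the two claims (not a substitute for the proof), the following minimal sketch fits the same binomial GLM to simulated data entered in both formats. The probit link, the sample size, and all variable names are assumptions chosen for illustration only.

```r
# Numerical illustration on simulated data; all names here are hypothetical.
set.seed(523)

# Ungrouped (binary) entry: 100 Bernoulli responses at 4 distinct covariate values
x <- rep(c(0, 1, 2, 3), each = 25)
y <- rbinom(length(x), size = 1, prob = pnorm(-0.5 + 0.4 * x))

# Tight convergence settings so both fits stop very close to the same maximum
ctrl <- glm.control(epsilon = 1e-12, maxit = 100)
fit_ungrouped <- glm(y ~ x, family = binomial(link = "probit"), control = ctrl)

# Grouped entry: one row per distinct covariate value
x_grouped <- sort(unique(x))
successes <- as.vector(tapply(y, x, sum))
trials    <- as.vector(tapply(y, x, length))
fit_grouped <- glm(cbind(successes, trials - successes) ~ x_grouped,
                   family = binomial(link = "probit"), control = ctrl)

# Compare estimates and estimated covariance matrices (names differ, so drop them)
coef_match <- all.equal(unname(coef(fit_ungrouped)), unname(coef(fit_grouped)),
                        tolerance = 1e-5)
vcov_match <- all.equal(unname(vcov(fit_ungrouped)), unname(vcov(fit_grouped)),
                        tolerance = 1e-5)
print(coef_match)
print(vcov_match)
```

The deviances of the two fits differ in general, so only the coefficients and their covariance matrix are compared here.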
Q2 Suppose that $m_i Y_i$ is binomial $(m_i, \pi_i)$, where $g(\pi_i) = X_i \beta$ and $i = 1, \dots, n$. Consider the null model, for which $\pi_1 = \dots = \pi_n$. Show that
$$\hat{\pi} = \frac{\sum_{i=1}^{n} m_i y_i}{\sum_{i=1}^{n} m_i}.$$
When $m_i = 1$ for all $i \in \{1, \dots, n\}$, show that the $X^2$ statistic, which is defined as the sum of the squared Pearson residuals, equals $n$. Decide whether or not $X^2$ is useful for testing whether a binomial GLM fits the data well when the response is binary.
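The identity in the binary case can be checked numerically before proving it. The sketch below uses simulated binary responses (the sample size and success probability are arbitrary assumptions) and computes the statistic both directly and from an intercept-only fit.

```r
# Simulated binary responses (m_i = 1 for all i); values are illustrative only
set.seed(1)
n <- 50
y <- rbinom(n, size = 1, prob = 0.3)

# Under the null model with m_i = 1, the displayed formula reduces to the sample mean
pi_hat <- mean(y)

# Pearson residuals and the X^2 statistic as their sum of squares
pearson_res <- (y - pi_hat) / sqrt(pi_hat * (1 - pi_hat))
X2 <- sum(pearson_res^2)

# Cross-check against the intercept-only binomial GLM
fit0 <- glm(y ~ 1, family = binomial)
X2_glm <- sum(residuals(fit0, type = "pearson")^2)

print(c(n = n, X2 = X2, X2_glm = X2_glm))
```

Changing the seed or the success probability is a quick way to see whether the value of the statistic carries any information about the data.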
Q3 R exercise
Consider the following data on home-well contamination in 3020 households in Araihazar upazila, Bangladesh. The response variable is switch (a binary variable indicating whether or not the household switched to another well from an unsafe well). Other variables collected for each household were arsenic (the level of arsenic contamination in the household’s original well, in hundreds of micrograms per liter), dist100 (distance in 100-meter units to the closest known safe well), educ (years of education of the head of the household) and assoc (whether or not any members of the household participated in any community organizations: no or yes). The data is available in MyCourses under Datasets. Load the data and compute dist100 as follows.
wells <- read.table("../Datasets/wells.dat")
attach(wells)
dist100 <- dist/100
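Part (2) below asks for a binned display of empirical logits. As a rough sketch of the binning step only, the following code works on simulated stand-in data, not on the wells data. The names arsenic_sim and switch_sim, the simulation parameters, and the 0.5 adjustment in the empirical logit are all assumptions made for illustration.

```r
# Simulated stand-in data; NOT the wells data set
set.seed(2022)
n_sim <- 3000
arsenic_sim <- 0.5 + rexp(n_sim, rate = 1)
switch_sim <- rbinom(n_sim, size = 1, prob = plogis(-0.3 + 0.4 * arsenic_sim))

fit_sim <- glm(switch_sim ~ arsenic_sim, family = binomial)

# 30 bins with approximately equal counts, using empirical quantiles as cut points
cut_points <- quantile(arsenic_sim, probs = seq(0, 1, length.out = 31))
bin <- cut(arsenic_sim, breaks = cut_points, include.lowest = TRUE)

m_k <- as.vector(table(bin))                     # number of households per bin
s_k <- as.vector(tapply(switch_sim, bin, sum))   # number of switches per bin
x_k <- as.vector(tapply(arsenic_sim, bin, mean)) # representative x value per bin

# Empirical logit with a 0.5 adjustment to avoid log(0)
emp_logit <- log((s_k + 0.5) / (m_k - s_k + 0.5))

# Empirical logits against bin means, with the fitted line on the logit scale
plot(x_k, emp_logit, xlab = "arsenic (bin mean)", ylab = "empirical logit")
abline(a = coef(fit_sim)[1], b = coef(fit_sim)[2])
```

Using the bin mean as the plotting position is one choice among several; bin midpoints or medians would serve equally well.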
(1) Report whether the data have been entered in a grouped or ungrouped form, and which explanatory variables are continuous and which are factors.
(2) Fit a logistic regression model with the intercept and arsenic. Assess the fit of this model graphically as follows: divide arsenic into 30 approximately equally filled categories, group the data accordingly, and display the empirical logits of switching to a safe well for each category together with the fitted regression line. Do you think the model is adequate? Perform an approximate goodness-of-fit test of the model using the above binning and Pearson’s $X^2$ statistic; conclude at the 5% level.
(3) Find the most appropriate logistic regression model for the data. Use the deviance, but also consider practical significance by looking at the AIC and the size of the effect of the predictors.
(4) Try to simplify the model you found in part (3) by replacing educ by a binary factor predictor feduc, constructed as follows:
feduc <- numeric(3020)
for(i in 1:3020){
  if(educ[i] < 9){feduc[i] <- 0}
  if(educ[i] > 8){feduc[i] <- 1}
}
This predictor feduc records whether the person has at most a primary education (i.e. 8 years or fewer, coded 0) or a secondary education and above (i.e. 9 years or more, coded 1).
(5) Compare the final models from parts (3) and (4) using AIC and ROC curves. Which one do you prefer and why? Interpret the final model you selected.
Q4 R exercise
Consider a study on the duration of unemployment (1: short-term unemployment, less than 6 months; 0: long-term unemployment) with explanatory variables gender (1: male, 0: female) and level of education (0: lower, 1: higher). The data are summarized in the table below.
Gender   Education Level   Short Term Unemployment   Long Term Unemployment
1        0                 313                       126
1        1                 90                        41
0        0                 196                       132
(1) Analyze these contingency table data with logistic regression using the duration of unemployment as a response.
(2) Describe the dependence relationship between the explanatory variables and the response (conditional independence, homogeneous association etc.) in the model selected in part (1).
(3) For the model selected in part (1), calculate the relevant odds ratios that describe the effect of the explanatory variable(s) on the response along with a 95% confidence interval.
(4) Calculate the expected counts from the model selected in part (1) and compare them to the observed counts using Pearson’s $X^2$ statistic. Test goodness of fit using an appropriate $\chi^2$ null distribution and conclude at the 5% level.
(5) Interpret the final model in one or two sentences in layman's terms that a non-statistician can understand (no formulas).
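The mechanics of parts (1), (3) and (4) might look roughly like the sketch below. The counts are hypothetical placeholders, not the table from the assignment, and the main-effects model is used only as an example; which model is appropriate is what part (1) asks you to decide. The Wald-type interval is likewise just one possible choice of confidence interval.

```r
# Hypothetical counts for illustration only; NOT the assignment data
tab <- data.frame(
  gender    = c(1, 1, 0, 0),
  education = c(0, 1, 0, 1),
  short     = c(120, 45, 80, 30),
  long      = c(60, 20, 70, 25)
)

# Grouped logistic regression: response is (short-term count, long-term count)
fit_main <- glm(cbind(short, long) ~ gender + education,
                family = binomial, data = tab)

# Odds ratios with Wald-type 95% confidence intervals
est <- coef(summary(fit_main))
z <- qnorm(0.975)
or <- exp(est[, "Estimate"])
ci <- exp(cbind(lower = est[, "Estimate"] - z * est[, "Std. Error"],
                upper = est[, "Estimate"] + z * est[, "Std. Error"]))
print(cbind(OR = or, ci))

# Expected counts under the fitted model and Pearson's X^2 over all 8 cells
m <- tab$short + tab$long
exp_short <- m * fitted(fit_main)
exp_long  <- m - exp_short
X2 <- sum((tab$short - exp_short)^2 / exp_short +
          (tab$long  - exp_long)^2  / exp_long)

# Residual degrees of freedom: 4 covariate patterns minus 3 parameters
df <- df.residual(fit_main)
p_value <- pchisq(X2, df = df, lower.tail = FALSE)
print(c(X2 = X2, df = df, p_value = p_value))
```

Summing over both the short-term and long-term cells gives the same value as the sum of squared Pearson residuals of the grouped fit, which is a convenient cross-check.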
Due on March 25 at noon.