Survival Analysis – Winter 2022
Assignment 2
Description of WHAS data set (for questions 1 and 2):
The SAS data set for this assignment is a modified subset of data from the Worcester Heart Attack Study. The outcome is long-term survival following acute myocardial infarction (MI or heart attack).
All patients who were admitted to the hospital in the last couple of days of 1998 or during 1999 (n=188) for treatment of their heart attack were followed from their admission date until their death date, or were censored at the study closure date of December 31, 2002. Their vital status at last follow up was recorded along with the date of last follow up. Additional demographic and clinical covariates are included in the data set.
Variables:
ID: Identification Code, numeric, 1 – 188
admitdate: Hospital admission date, SAS date variable with date9. format (DDMONYYYY)
disdate: Hospital discharge date, SAS date variable with date9. format (DDMONYYYY)
fdate: Date of last follow up, SAS date variable with date9. format (DDMONYYYY)
fstat: Vital status at last follow up, numeric, 0 = Alive, 1 = Dead
age: Age at hospital admission, numeric, years
gender: Gender, numeric, 0 = Male, 1 = Female
BMI: Body mass index, numeric, kg/m^2
CVD: History of cardiovascular disease, numeric, 0 = No, 1 = Yes
CHF: Congestive heart failure complications, numeric, 0 = No, 1 = Yes
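The outcome implied by admitdate, fdate and fstat can be sketched as follows. This is only an illustration in Python rather than SAS: the four records are hypothetical (they are not taken from the WHAS data set), and dividing days by 365.25 is an assumed convention for converting to years.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed convention for converting days to years
STUDY_CLOSE = date(2002, 12, 31)  # study closure date stated in the description

# Hypothetical records for illustration only: (admitdate, fdate, fstat)
records = [
    (date(1999, 1, 14), date(2002, 12, 31), 0),
    (date(1999, 3, 2), date(2000, 3, 1), 1),
    (date(1998, 12, 30), date(1999, 6, 30), 1),
    (date(1999, 7, 20), date(2002, 12, 31), 0),
]


def years_between(start, end):
    """Follow-up time in years from admission to last follow up."""
    return (end - start).days / DAYS_PER_YEAR


times = [years_between(admit, last, ) for admit, last, _ in records]
events = [fstat for _, _, fstat in records]  # 1 = died, 0 = censored

# Validity checks: no negative times, no follow up after study closure
all_valid = all(t >= 0 for t in times) and all(
    last <= STUDY_CLOSE for _, last, _ in records
)

# Crude proportion who died, not adjusted for length of follow up
prop_died = sum(events) / len(events)

print("times (years):", [round(t, 3) for t in times])
print("all valid:", all_valid)
print("proportion died:", prop_died)
```

In SAS the same subtraction works directly on the date variables, since SAS dates are stored as day counts.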
Question 1:
a. Calculate the outcome time to death in years and check that all values are valid. Patients alive at their date of last follow up are censored. What proportion of these patients died (using the vital status at last follow up indicator)? [Note: this proportion is not adjusted for length of follow up.]
b. Explore the baseline covariates gender, age, CVD, CHF, BMI. Describe the patients’ clinical characteristics using descriptive summaries of each of these covariates. What are the relationships amongst the covariates?
c. Create the Kaplan-Meier plot for time to death. Develop exponential, Weibull, log-logistic, and log-normal models for the outcome time to death, unadjusted for covariates (i.e., overall survival). Also develop the gamma model and use it, together with the AIC fit statistics and likelihood ratio tests, to help decide between the four previous models.
d. Select one of the four parametric models and explain why this is the best model for the data set.
e. Develop a multivariate model using the chosen parametric model and some or all of the covariates described above (e.g., look at each covariate separately prior to finalizing a multivariate model). A few patients have missing data. Explain the choices you made to handle the missing data. Explain any transformation of the covariate data.
f. Assess the goodness of fit of the final model.
g. Interpret the results of the final model and identify which groups of patients have the best and worst survival. Include a description of the hazard function.
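The model comparison in parts c and d rests on two small calculations, sketched below in Python for illustration. The log-likelihood values are hypothetical placeholders, not results from the WHAS data. The sketch assumes the usual generalized gamma parameterization, in which the Weibull (shape = 1) and log-normal (shape = 0) models are each nested in the gamma model with one fewer parameter, and the exponential is nested in the Weibull (scale = 1). The log-logistic model is not nested in the gamma, so it can only be compared through AIC.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: smaller is better."""
    return -2.0 * loglik + 2.0 * n_params


def lrt_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). For a chi-square with 1 df the upper
    tail probability equals erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical (log-likelihood, number of parameters) pairs, illustration only
fits = {
    "exponential": (-250.0, 1),
    "weibull": (-240.0, 2),
    "lognormal": (-238.5, 2),
    "loglogistic": (-239.0, 2),
    "gamma": (-238.0, 3),
}

aics = {name: aic(ll, k) for name, (ll, k) in fits.items()}
for name, value in sorted(aics.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} AIC = {value:.1f}")

# Nested comparisons, each with 1 degree of freedom
comparisons = [
    ("exponential", "weibull"),
    ("weibull", "gamma"),
    ("lognormal", "gamma"),
]
for reduced, full in comparisons:
    stat, p = lrt_1df(fits[reduced][0], fits[full][0])
    print(f"{reduced} vs {full}: LR = {stat:.2f}, p = {p:.4f}")
```

A small p-value indicates that the reduced model fits significantly worse than the larger model it is nested in; a large p-value means the simpler model is adequate by this test.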
Question 2:
a. Fit a log-logistic survival model with the gender covariate (0=Male, 1=Female). [Note: this is not necessarily the correct model for question 1].
b. Produce two survival curves from this log-logistic model (i.e., one curve for males and one for females).
c. What is the estimated time ratio and odds ratio for survival and their 95% confidence intervals using the model parameter estimates (for females compared to males)?
d. Demonstrate the time ratio after calculating the estimated median survival for each group (using the median survival formula). Are the estimated medians observed time points in the data? Using a ‘strata gender;’ statement, calculate the time ratio using the 25th percentile from the quartile estimates in the SAS output. Are the two time ratios similar?
e. Demonstrate the odds ratio using the estimated proportion surviving 6 months (0.5 years) or more.
f. How would you describe the time ratio and odds ratio to a member of the study team who does not have a background in statistics?
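The relationships that Question 2 asks you to demonstrate can be checked numerically with a short Python sketch. The parameter values b0, b1, sigma and se_b1 are hypothetical, not estimates from the WHAS data. The sketch assumes the accelerated failure time form log T = b0 + b1*gender + sigma*e with a logistic error term, so that S(t) = 1 / (1 + (t * exp(-mu))^(1/sigma)) with mu = b0 + b1*gender, the median is exp(mu), the time ratio is exp(b1), and the odds ratio for survival is exp(b1/sigma).

```python
import math

# Hypothetical estimates for illustration only (not from the WHAS data)
b0 = 1.5      # intercept
b1 = -0.6     # coefficient for gender (1 = Female, 0 = Male)
sigma = 0.9   # scale parameter
se_b1 = 0.25  # standard error of b1


def survival(t, gender):
    """Log-logistic survival function under the assumed AFT form."""
    mu = b0 + b1 * gender
    return 1.0 / (1.0 + (t * math.exp(-mu)) ** (1.0 / sigma))


def median(gender):
    """Median survival time: the t at which S(t) = 0.5."""
    return math.exp(b0 + b1 * gender)


def odds(p):
    return p / (1.0 - p)


# Time ratio (females vs males) and a Wald 95% confidence interval
time_ratio = math.exp(b1)
time_ratio_ci = (math.exp(b1 - 1.96 * se_b1), math.exp(b1 + 1.96 * se_b1))

# Odds ratio for survival (females vs males); constant over t in this model
odds_ratio = math.exp(b1 / sigma)

# Demonstrations: ratio of medians, and ratio of survival odds at 0.5 years
median_ratio = median(1) / median(0)
odds_ratio_at_half_year = odds(survival(0.5, 1)) / odds(survival(0.5, 0))

print(f"time ratio       = {time_ratio:.4f}, 95% CI = "
      f"({time_ratio_ci[0]:.4f}, {time_ratio_ci[1]:.4f})")
print(f"ratio of medians = {median_ratio:.4f}")
print(f"odds ratio       = {odds_ratio:.4f}")
print(f"odds ratio at 0.5 years = {odds_ratio_at_half_year:.4f}")
```

A confidence interval for the odds ratio is left out of the sketch because it depends on the uncertainty in both b1 and sigma, which needs more than the single standard error assumed here.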