
Contingency table analysis

This lab is about using the \(\chi^2\) test of independence for hypotheses about associations between categorical variables. It focuses on contingency tables, which show the observed counts of all possible combinations of the levels of two or more categorical variables. The lab also shows how to use Fisher’s exact test when the \(\chi^2\) test sample size assumption is not met.

Aspirin use to prevent heart attacks

A large pharmaceutical company wants to test the idea that taking a low dose of aspirin daily leads to a lower risk of heart attack. They conducted a large study in which each patient was given either aspirin or a placebo. Follow-ups determined whether each patient experienced a cardiac episode. A total of 7702 patients were included in the study.

The data is summarized in the file ‘patients_demo.csv’.

patients <- read.csv("./patients_demo.csv", header = TRUE)

From these data, we can get the number of patients who had cardiac episodes in both the aspirin and placebo groups.

tab <- table(patients)
tab

##          Follow.up
## Treatment Heart attack No heart attack
##   Aspirin           37            3753
##   placebo           72            3840

Now we can plot the relative proportion of each group in the contingency table with a mosaic plot.

mosaicplot(tab)

Remember to label your axes and give your plot an informative title.

The \(\chi^2\) test of independence

We will use a test of independence to detect any effect of aspirin on the probability of having a heart attack. A standard choice is the \(\chi^2\) test of independence. Much like the \(\chi^2\) test of goodness of fit, it sums the squared differences between the observed counts and the counts expected under the null hypothesis, each scaled by the expected count. To find out what the expected values are, we must first calculate all the marginal totals for the contingency table.

# Calculate expected values.
# First save the sum of each row and column:
aspirin <- sum(tab[1, ])
placebo <- sum(tab[2, ])
h.attack <- sum(tab[, 1])
no.h.attack <- sum(tab[, 2])
N <- sum(tab)

# Expected value for each cell: (row total * column total) / grand total.
exp.asp.ha <- (aspirin * h.attack) / N
exp.p.ha <- (placebo * h.attack) / N
exp.asp.nha <- (aspirin * no.h.attack) / N
exp.p.nha <- (placebo * no.h.attack) / N

# Organize the expected values in the same format as the observed values.
exp <- tab
exp[1, 1] <- exp.asp.ha
exp[1, 2] <- exp.asp.nha
exp[2, 1] <- exp.p.ha
exp[2, 2] <- exp.p.nha
exp

##          Follow.up
## Treatment Heart attack No heart attack
##   Aspirin     53.63672      3736.36328
##   placebo     55.36328      3856.63672

Here, all expected values are above 5, so Cochran's rules are respected, and the \(\chi^2\) approximation is appropriate.
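As a cross-check, the whole expected-value calculation can be done in one step with outer(), and chisq.test computes the same matrix internally. The sketch below rebuilds the observed table from the counts shown above, so it runs without the CSV file:

```r
# Observed counts from the lab, rebuilt so this snippet is self-contained.
tab <- matrix(c(37, 72, 3753, 3840), nrow = 2,
              dimnames = list(Treatment = c("Aspirin", "placebo"),
                              Follow.up = c("Heart attack", "No heart attack")))

# Expected counts: (row total x column total) / grand total, all cells at once.
exp <- outer(rowSums(tab), colSums(tab)) / sum(tab)
round(exp, 5)

# chisq.test builds the same matrix of expected counts:
max(abs(exp - chisq.test(tab, correct = FALSE)$expected))  # effectively zero
```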
We can run the test:

x2.num <- (tab - exp)^2
x2 <- sum(x2.num / exp)
# Degrees of freedom = (rows - 1) * (columns - 1) = 1
# p value
p <- 1 - pchisq(x2, 1)

At a significance level of 0.05, we reject the null hypothesis that a daily aspirin dose doesn't affect the risk of having a heart attack (\(\chi^2\) = 10.3054719, df = 1, p = 0.0013).

We can also do this more efficiently using the chisq.test function in R:

chisq.test(tab, correct = FALSE)
# correct = FALSE tells R not to use Yates's continuity correction

##
##  Pearson's Chi-squared test
##
## data:  tab
## X-squared = 10.305, df = 1, p-value = 0.001326

Fisher's exact test

What if the expected values had not satisfied Cochran's rules? We would need to run a test of independence that doesn't rely on the \(\chi^2\) approximation. We can use Fisher's exact test:

fisher.test(tab)

##
##  Fisher's Exact Test for Count Data
##
## data:  tab
## p-value = 0.00138
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.3428983 0.7944959
## sample estimates:
## odds ratio
##   0.525847

At a significance level of 0.05, we reject the null hypothesis that a daily aspirin dose doesn't affect the risk of having a heart attack (Fisher's exact test: odds ratio = 0.526, p = 0.00138).

Odds ratio

Notice the statistic reported by Fisher's exact test. The odds ratio is a measure of relative odds, ranging from 0 to infinity. It compares the odds of falling into one level of one variable between the levels of the other variable. Here, the odds ratio of 0.526 means that the odds of a heart attack for a patient in the aspirin group are about half the odds for a patient in the placebo group (and since heart attacks are rare in both groups, roughly the same holds for the probabilities). Conversely, a patient in the placebo group has about twice the odds of a heart attack. This statistic is affected by the order of the groups in the contingency table.
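The sample odds ratio can also be computed directly from the table by comparing the odds of a heart attack in each treatment group. A minimal sketch, rebuilding the table from the counts above so it stands alone:

```r
# Observed counts from the lab, rebuilt so this snippet is self-contained.
tab <- matrix(c(37, 72, 3753, 3840), nrow = 2,
              dimnames = list(Treatment = c("Aspirin", "placebo"),
                              Follow.up = c("Heart attack", "No heart attack")))

# Sample odds ratio:
# (odds of heart attack on aspirin) / (odds of heart attack on placebo).
odds.aspirin <- tab[1, 1] / tab[1, 2]  # 37 / 3753
odds.placebo <- tab[2, 1] / tab[2, 2]  # 72 / 3840
odds.aspirin / odds.placebo            # about 0.526
```

Note that fisher.test reports a conditional maximum-likelihood estimate of the odds ratio (0.525847 here), which is close to, but not in general identical to, this simple cross-product ratio.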
Fisher's exact test as run here is two-tailed, so its p-value does not depend on how the rows and columns are ordered; the odds ratio, however, does. Given the same data, we would have found an odds ratio of about 1.9011407 (roughly the reciprocal of 0.526) if we had input our table with the groups in the opposite order.
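One way to see this order-dependence is to run fisher.test on the same table with its rows swapped. In this sketch (again rebuilding the table so it runs on its own), the p-value is unchanged while the odds ratio estimate inverts:

```r
tab <- matrix(c(37, 72, 3753, 3840), nrow = 2,
              dimnames = list(Treatment = c("Aspirin", "placebo"),
                              Follow.up = c("Heart attack", "No heart attack")))

f1 <- fisher.test(tab)
f2 <- fisher.test(tab[2:1, ])  # same data, placebo row first

f1$p.value   # identical p-values for both orderings
f2$p.value
f1$estimate  # about 0.53
f2$estimate  # about 1.9, roughly the reciprocal
```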