
Contingency Tables 2
Billie Anderson, Mark Newman
2020-08-25
Agenda
▪ Prior Homework
▪ Contingency Tables
▪ Next Homework
Tests of Association for Two-Way Tables
This module is concerned with measures for understanding relationships (or dependencies) of variables in \(r\) x \(c\) contingency tables.
We are going to continue our discussion of testing the relationships among variables in a contingency table, but now we will use another statistic, chi-square, \(\chi^2\).
\(\chi^2\) is a very versatile statistic. It ‘pops up’ a lot in traditional statistical analysis. We already used it in the last module as a goodness-of-fit statistic, when we assessed whether certain data sets followed a Binomial or Poisson distribution.
\(\chi^2\) can also be used as a test of association for categorical variables in contingency tables. It is a competing measure to the odds ratio.
Recall, from last week, we were interested in testing
\(H_0:\) two variables are independent of each other (there is no relationship)
\(H_1\): two variables are not independent of each other (there is some relationship)
We tested the above hypothesis test with an odds ratio. We can perform the test above with a chi-square statistic.
Note: An odds ratio can only be used when you have variables in the contingency table that are binary. So, odds ratios can only be applied when the contingency table is 2×2. The chi-square statistic can be applied if the categorical variables in the table have more than 2 levels; chi-square tests of associations can be applied to \(r\) x \(c\) contingency tables.
Tests of Association Using the Chi-square statistic
In general, you can think of any chi-square statistic as
\[\sum\frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}\]
By ‘expected frequency’, we mean the counts we would expect to observe if \(H_0\) is true, that is, if there is no relationship among the variables. Think about what the value means when you compute \(\chi^2\)!
Two possibilities:
1. What if \(\chi^2\) is small? What does this mean in terms of what you are testing?
If \(\chi^2\) is small, this means there is a small (not much) difference between the observed and expected counts; so we do not have statistical support to reject \(H_0\).
2. What if \(\chi^2\) is large? What does this mean in terms of what you are testing?
If \(\chi^2\) is large, this means there is a big difference between the observed and expected counts; so we have statistical support to reject \(H_0\) in favor of \(H_1\): support that the variables are not independent and that some relationship exists among them.
The p-value is what tells you whether the value is small or large.
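To make the formula concrete, here is a minimal sketch in Python (the deck's own code is R; the helper name `chi_square_stat` is ours, not part of the course materials). It shows that observed counts close to the expected counts give a small statistic, while distant ones give a large statistic.

```python
# Minimal sketch: chi-square statistic as a sum over cells of (O - E)^2 / E.
def chi_square_stat(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed counts close to the expected counts -> small statistic
# (little evidence against H0).
print(round(chi_square_stat([10, 20, 30], [11, 19, 30]), 3))  # 0.144

# Observed counts far from the expected counts -> large statistic
# (strong evidence against H0).
print(round(chi_square_stat([10, 20, 30], [30, 20, 10]), 3))  # 53.333
```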
Larger tables: Overall Analysis
Example: Arthritis Treatment
Want to examine if there is a relationship between the Improvement status of patients and whether they took a placebo or an experimental arthritis treatment.
data("Arthritis", package = "vcd")
(Art <- xtabs(~Treatment + Improved, data = Arthritis))
## Improved
## Treatment None Some Marked
## Placebo 29 7 7
## Treated 13 7 21
(ArtPlus <- addmargins(Art))
## Improved
## Treatment None Some Marked Sum
## Placebo 29 7 7 43
## Treated 13 7 21 41
## Sum 42 14 28 84
Larger tables (cont.)
Using the table above, we want to test:
\(H_0\): Treatment and Improved are independent
\(H_1\): Treatment and Improved are not independent
OR
\(H_0\): no relationship between the type of treatment that the patient received and their reported improvement level
\(H_1\): relationship exists between the type of treatment that the patient received and their reported improvement level
library(vcd)
assocstats(Art)
## X^2 df P(> X^2)
## Likelihood Ratio 13.530 2 0.0011536
## Pearson 13.055 2 0.0014626
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.367
## Cramer’s V : 0.394
Since p-value \(\le\alpha\), we reject \(H_0\) and accept \(H_1\) and conclude that there is a relationship between type of treatment (placebo vs. actual treatment) and patient-reported improvement level for this arthritis medicine.
Larger tables (cont.)
Let’s dig deeper! Make sure you understand how the value of \(X^2\) was obtained.
## Improved
## Treatment None Some Marked Sum
## Placebo 29 7 7 43
## Treated 13 7 21 41
## Sum 42 14 28 84
\(H_0\): no relationship between the type of treatment that the patient received and their reported improvement level
\(H_1\): relationship exists between the type of treatment that the patient received and their reported improvement level
\[\chi^2 = \sum\frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}\]
\[\chi^2 = \sum\frac{(O - E)^2}{E}\]
\(E\): expected frequency if \(H_0\) is true.
Larger tables (cont.)
## Improved
## Treatment None Some Marked Sum
## Placebo 29 7 7 43
## Treated 13 7 21 41
## Sum 42 14 28 84
\[E = \frac{\text{Row Total}*\text{Column Total}}{\text{Grand Total}}\]
For \(\text{Cell}_{1,1}\): \(E = \frac{43 * 42}{84} = 21.5\)
## Improved
## Treatment None Some Marked Sum
## Placebo 29(21.5) 7 7 43
## Treated 13 7 21 41
## Sum 42 14 28 84
Larger tables (cont.)
For \(\text{Cell}_{1,2}\): \(E = \frac{43 * 14}{84} = 7.17\)
For \(\text{Cell}_{1,3}\): \(E = \frac{43 * 28}{84} = 14.33\)
For \(\text{Cell}_{2,1}\): \(E = \frac{41 * 42}{84} = 20.5\)
For \(\text{Cell}_{2,2}\): \(E = \frac{41 * 14}{84} = 6.83\)
For \(\text{Cell}_{2,3}\): \(E = \frac{41 * 28}{84} = 13.67\)
## Improved
## Treatment None Some Marked Sum
## Placebo 29(21.5) 7(7.17) 7(14.33) 43
## Treated 13(20.5) 7(6.83) 21(13.67) 41
## Sum 42 14 28 84
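The per-cell calculations above are easy to reproduce; this is a hedged Python sketch (the slide's tables come from R) that uses only the observed counts from the table.

```python
# Expected counts under H0: E = row total * column total / grand total.
# Observed Arthritis counts (rows: Placebo, Treated; cols: None, Some, Marked).
observed = [[29, 7, 7],
            [13, 7, 21]]

row_totals = [sum(row) for row in observed]        # [43, 41]
col_totals = [sum(col) for col in zip(*observed)]  # [42, 14, 28]
grand_total = sum(row_totals)                      # 84

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
for row in expected:
    print([round(e, 2) for e in row])
# [21.5, 7.17, 14.33]
# [20.5, 6.83, 13.67]
```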
Larger tables (cont.)
## Improved
## Treatment None Some Marked Sum
## Placebo 29(21.5) 7(7.17) 7(14.33) 43
## Treated 13(20.5) 7(6.83) 21(13.67) 41
## Sum 42 14 28 84
\[\chi^2 = \sum\frac{(O - E)^2}{E}\]
\[\chi^2 = \frac{(29 - 21.5)^2}{21.5} + \frac{(7 - 7.17)^2}{7.17} + \frac{(7 - 14.33)^2}{14.33} + \frac{(13 - 20.5)^2}{20.5} + \frac{(7 - 6.83)^2}{6.83} + \frac{(21 - 13.67)^2}{13.67}\]
\[\chi^2 = 2.62 + 0.00 + 3.75 + 2.74 + 0.00 + 3.93 = 13.04\]
assocstats(Art)
## X^2 df P(> X^2)
## Likelihood Ratio 13.530 2 0.0011536
## Pearson 13.055 2 0.0014626
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.367
## Cramer’s V : 0.394
Pretty close! 13.04 ≈ 13.055 (the small gap comes from rounding the expected counts to two decimals).
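Carrying the expected counts to full precision closes the rounding gap; a Python sketch (our own helper, not the course code) recovers the Pearson value reported by assocstats:

```python
# Recompute the Pearson chi-square statistic from the table margins,
# keeping the expected counts at full precision.
observed = [[29, 7, 7],
            [13, 7, 21]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(3)
)
print(round(chi2, 3))  # 13.055, matching the Pearson line in assocstats(Art)
```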
Larger tables (cont.)
p-value = probability of getting \(\chi^2\) stat or bigger
pchisq(q, df, lower.tail = T)
▪ q = 13.04
▪ df = degrees of freedom = (row count – 1) * (column count – 1) = 1*2 = 2
▪ lower.tail = T means P[X ≤ x]. We want the “stat or bigger” version so lower.tail = F
pchisq(13.04, 2, lower.tail = F)
## [1] 0.001473669
assocstats(Art)
## X^2 df P(> X^2)
## Likelihood Ratio 13.530 2 0.0011536
## Pearson 13.055 2 0.0014626
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.367
## Cramer’s V : 0.394
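As a cross-check that does not need R: for 2 degrees of freedom the chi-square distribution reduces to an Exponential distribution with rate 1/2, so the upper-tail probability is simply \(e^{-x/2}\). A small Python sketch (this shortcut holds only for df = 2):

```python
import math

# For df = 2 the chi-square upper tail is P(X >= x) = exp(-x / 2),
# which reproduces pchisq(x, 2, lower.tail = FALSE) from R.
def chisq_upper_tail_df2(x):
    return math.exp(-x / 2)

print(chisq_upper_tail_df2(13.04))   # ~0.0014737 (hand-rounded statistic)
print(chisq_upper_tail_df2(13.055))  # ~0.0014627 (full-precision Pearson statistic)
```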
Larger tables (cont.)
Interpretation
There is a 0.147% chance (p ≈ 0.00147) of observing a \(\chi^2\) statistic this large or larger if \(H_0\) is true.
Assuming \(\alpha = .05\)
Since \(\text{p-value} \le .05\) we can reject \(H_0\) and say there is support for \(H_1\) (a relationship between the variables).
Fisher’s Exact Test
Fisher’s exact test is a test of association among variables and tests the same hypothesis test as \(\chi^2\), but it should be used when your sample size is small.
How small?
Rule of thumb: use Fisher’s exact test when more than 20% of the cells in the contingency table have an expected frequency (count) less than 5.
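The rule of thumb is easy to automate; this hedged Python sketch (the helper name `needs_fisher` is ours) computes the fraction of cells whose expected count falls below 5:

```python
# Rule of thumb: prefer Fisher's exact test when more than 20% of cells
# have an expected count below 5.
def needs_fisher(observed, threshold=5, max_fraction=0.20):
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    small = sum(1 for e in expected if e < threshold)
    return small / len(expected) > max_fraction

# Full Arthritis table: smallest expected count is 6.83, so chi-square is fine.
print(needs_fisher([[29, 7, 7], [13, 7, 21]]))  # False
# Male-only stratum from the stratified example later in this deck:
# four of six expected counts fall below 5, so Fisher's test is safer.
print(needs_fisher([[10, 0, 1], [7, 2, 5]]))    # True
```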
Cramer’s V
There are some issues with the \(\chi^2\) test.
▪ determines whether an association (relationship) exists
▪ does not measure the strength of the association
▪ depends on and reflects the sample size
\[\sum\frac{\text{(observed frequency – expected frequency)}^2}{\text{expected frequency}}\]
The p-value for the chi-square test only indicates how confident you can be that the null hypothesis of no association is false. It does not tell you the magnitude of an association.
The value of the chi-square statistic also does not tell you the magnitude of the association. If you double the size of your sample by duplicating each observation, you double the value of the chi-square statistic, even though the strength of the association does not change.
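The sample-size dependence can be demonstrated directly; a Python sketch (our own helper, mirroring the Pearson formula) doubles every cell of the Arthritis table:

```python
# Duplicating every observation doubles the chi-square statistic even though
# the strength of the association is unchanged.
def pearson_chi2(observed):
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(observed)) for j in range(len(observed[0]))
    )

table = [[29, 7, 7], [13, 7, 21]]
doubled = [[2 * x for x in row] for row in table]
print(round(pearson_chi2(table), 3))    # 13.055
print(round(pearson_chi2(doubled), 3))  # 26.11, exactly twice as large
```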
Cramer’s V (cont.)

One measure of the strength of the association between two nominal variables is Cramer’s V statistic.
It has a range of -1 to 1 for 2×2 tables and 0 to 1 for larger tables. Values farther from 0 indicate stronger association. Cramer’s V statistic is derived from the Pearson chi-square statistic.
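Cramer’s V can be recovered from numbers already on the slide; assuming the standard formula \(V = \sqrt{\chi^2 / (n\,(\min(r,c)-1))}\), a quick Python check:

```python
import math

# Cramer's V from the Pearson chi-square: V = sqrt(chi2 / (n * (min(r, c) - 1))).
# Values from the Arthritis output above: chi2 = 13.055, n = 84, 2 x 3 table.
chi2, n, r, c = 13.055, 84, 2, 3
v = math.sqrt(chi2 / (n * (min(r, c) - 1)))
print(round(v, 3))  # 0.394, matching the Cramer's V line from assocstats
```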
Test for Ordinal Variables
If the variables in an \(r\) x \(c\) contingency table are ordinal and you want to perform a test of association, the Cochran-Mantel-Haenszel (CMH) test is appropriate.
Let’s practice with a problem from the Chapter 4 exercises. We will be running a CMH test on some hospital data regarding mental patients.
The following descriptions below will be useful when interpreting the output from this test.
Example
library(vcd)
library(vcdExtra)
data <- vcd::Hospital
data
## Length of stay
## Visit frequency 2-9 10-19 20+
## Regular 43 16 3
## Less than monthly 6 11 10
## Never 9 18 16
Example (cont.)
assocstats(data)
## X^2 df P(> X^2)
## Likelihood Ratio 38.353 4 9.4755e-08
## Pearson 35.171 4 4.2842e-07
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.459
## Cramer’s V : 0.365
CMHtest(data)
## Cochran-Mantel-Haenszel Statistics for Visit frequency by Length of stay
##
## AltHypothesis Chisq Df Prob
## cor Nonzero correlation 29.138 1 6.7393e-08
## rmeans Row mean scores differ 34.391 2 3.4044e-08
## cmeans Col mean scores differ 29.607 2 3.7233e-07
## general General association 34.905 4 4.8596e-07
Example (cont.)
## Cochran-Mantel-Haenszel Statistics for Visit frequency by Length of stay
##
## AltHypothesis Chisq Df Prob
## cor Nonzero correlation 29.138 1 6.7393e-08
## rmeans Row mean scores differ 34.391 2 3.4044e-08
## cmeans Col mean scores differ 29.607 2 3.7233e-07
## general General association 34.905 4 4.8596e-07
cor: Tests to see if there is a non-zero correlation. Assumes both variables are Ordinal.
\(H_0\): variables are independent (no relationship among the variables)
\(H_1\): variables are not independent (relationship exists among the variables)
rmeans: Row Mean Scores Differ: Do the means differ over the rows in the table? Assumes the columns are Ordinal.
\(H_0\): means are the same over the rows
\(H_1\): means are different over the rows
Example (cont.)
## Cochran-Mantel-Haenszel Statistics for Visit frequency by Length of stay
##
## AltHypothesis Chisq Df Prob
## cor Nonzero correlation 29.138 1 6.7393e-08
## rmeans Row mean scores differ 34.391 2 3.4044e-08
## cmeans Col mean scores differ 29.607 2 3.7233e-07
## general General association 34.905 4 4.8596e-07
cmeans: Column Mean Scores Differ: Do the means differ over the columns in the table? Assumes the rows are Ordinal.
\(H_0\): means are the same over the columns
\(H_1\): means are different over the columns
general: tests the same association as the chi-square test for nominal variables: is there some association between the variables?
\(H_0\): variables are independent (no relationship among the variables)
\(H_1\): variables are not independent (relationship exists among the variables)
Stratified tables
Example: Arthritis Treatment
Want to examine if there is a relationship between the Improvement status of patients and whether they took a placebo or an experimental arthritis treatment, while blocking over sex.
data <- vcd::Arthritis
head(data)
## ID Treatment Sex Age Improved
## 1 57 Treated Male 27 Some
## 2 46 Treated Male 29 None
## 3 77 Treated Male 30 None
## 4 17 Treated Male 32 Marked
## 5 36 Treated Male 46 Marked
## 6 23 Treated Male 58 Marked
Stratified tables (cont.)
(a <- xtabs(~Treatment + Improved, data = data))
## Improved
## Treatment None Some Marked
## Placebo 29 7 7
## Treated 13 7 21
(b <- xtabs(~Treatment + Improved + Sex, data = data))
## , , Sex = Female
##
## Improved
## Treatment None Some Marked
## Placebo 19 7 6
## Treated 6 5 16
##
## , , Sex = Male
##
## Improved
## Treatment None Some Marked
## Placebo 10 0 1
## Treated 7 2 5
Stratified tables (cont.)
assocstats(a)
## X^2 df P(> X^2)
## Likelihood Ratio 13.530 2 0.0011536
## Pearson 13.055 2 0.0014626
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.367
## Cramer’s V : 0.394
Stratified tables (cont.)
assocstats(b)
## $`Sex:Female`
## X^2 df P(> X^2)
## Likelihood Ratio 11.731 2 0.0028362
## Pearson 11.296 2 0.0035242
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.401
## Cramer’s V : 0.438
##
## $`Sex:Male`
## X^2 df P(> X^2)
## Likelihood Ratio 5.8549 2 0.053532
## Pearson 4.9067 2 0.086003
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.405
## Cramer’s V : 0.443
All the significance is in Sex:Female. What could this mean?
Stratified tables (cont.)
There are more than twice as many females as males in the study. Remember the rule of thumb from Fisher’s Exact Test: more than 20% of the cells have expected counts less than 5.
addmargins(b)
## , , Sex = Female
##
## Improved
## Treatment None Some Marked Sum
## Placebo 19 7 6 32
## Treated 6 5 16 27
## Sum 25 12 22 59
##
## , , Sex = Male
##
## Improved
## Treatment None Some Marked Sum
## Placebo 10 0 1 11
## Treated 7 2 5 14
## Sum 17 2 6 25
##
## , , Sex = Sum
##
## Improved
## Treatment None Some Marked Sum
## Placebo 29 7 7 43
## Treated 13 7 21 41
## Sum 42 14 28 84
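The expected counts for the male stratum show why the chi-square test is shaky there; a hedged Python sketch using only the male table above:

```python
# Expected counts for the male-only stratum: E = row total * column total / n.
male = [[10, 0, 1],
        [7, 2, 5]]
row_totals = [sum(r) for r in male]        # [11, 14]
col_totals = [sum(c) for c in zip(*male)]  # [17, 2, 6]
n = sum(row_totals)                        # 25

expected = [[r * c / n for c in col_totals] for r in row_totals]
for row in expected:
    print([round(e, 2) for e in row])
# [7.48, 0.88, 2.64]
# [9.52, 1.12, 3.36]

# Four of the six expected counts fall below 5 (67% > 20%),
# so Fisher's exact test is the safer choice for this stratum.
small = sum(1 for row in expected for e in row if e < 5)
print(small)  # 4
```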
Stratified tables (cont.)
justmale <- b[,, 2]
fisher.test(justmale, simulate.p.value = T)
##
## Fisher's Exact Test for Count Data with simulated p-value (based on
## 2000 replicates)
##
## data: justmale
## p-value = 0.09545
## alternative hypothesis: two.sided
chisq.test(justmale, simulate.p.value = T)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: justmale
## X-squared = 4.9067, df = NA, p-value = 0.08546
Stratified tables (cont.)
Interpretation
There is statistical support (\(\chi^2 = 13.055\), p < 0.01) to suggest that there is a relationship between the treatment and the treatment outcome. However, when blocking over Sex, the effect vanishes in Males (p > 0.05), possibly due to the low male enrollment in the study.
Next Homework
▪ Review Expectations