Subject 8: The Chi-Square Test of Independence

STAT 205 L01-02, Fall 2021

Instructor: Dr. Bingrui (Cindy) Sun

Department of Mathematics and Statistics

University of Calgary

Outline of Topics and Where to Find Them in the Pearson eText

Topic                                       Pearson eText

Chi-Square Test for Two-Way r × c Tables    Chapter 10

Learning Outcomes of Subject 8

Understand the chi-square distribution and apply the chi-square test
to determine whether two categorical variables are independent.

Chi-Square Test for Two-Way r × c Tables

In this subject, we study how to compare two or more populations
when the response variable has two or more categories, and how to
test whether two categorical variables are independent.

The chi-square (χ²) test in this section answers questions such as:

Are men and women equally likely to suffer lingering fear symptoms
after watching scary movies like Jaws and Poltergeist at a young age?

Is there an association between texting while driving and automobile
accidents?

Example 8.1 Literature demonstrates that early exposure to frightening
movies is associated with lingering fright symptoms. As part of a class
on media effects, college students were asked to write narrative
accounts of their exposure to frightening movies before the age of 13.
The following table breaks down the results by gender.

Ongoing fright symptoms    Men          Women        Total

Yes                        7 (18%)      29 (37%)     36 (31%)
No                         31 (82%)     50 (63%)     81 (69%)

Total                      38 (100%)    79 (100%)    117 (100%)

We use the term “r × c table” to describe a two-way table of counts
with r rows and c columns. The two categorical variables in this
example are “Ongoing fright symptoms” with values “Yes” and “No”,
and “Gender” with values “Men” and “Women”. We view “Gender”
as an explanatory variable, and designate it as the column variable.
The row variable “Ongoing fright symptoms” is a categorical response
variable.

The null hypothesis H0 of interest in a two-way table is that there is
no association between the row variable and the column variable.
In Example 8.1, this null hypothesis says that gender and having
ongoing fright symptoms are not related. The alternative hypothesis
Ha is that there is an association between these two variables, in any
direction.

In Example 8.1, the null hypothesis says that the distributions of the

ongoing fright symptoms variable are the same across the genders.

For a general two-way r × c table, where the columns correspond
to independent samples from c distinct populations, there are c
distributions for the row variable, one for each population. The null
hypothesis says that the c distributions of the row variable are
identical. The alternative hypothesis is that the distributions are not
all the same.

To test the null hypothesis in r × c tables, we compare the observed
cell counts with expected cell counts calculated under the assumption

that the null hypothesis is true.

\[
\text{expected cell count} = \frac{\text{row total} \times \text{column total}}{n}
\]

where n is the total number of observations in the table.
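The expected-count formula can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the course materials (the course uses StatCrunch); the function name `expected_counts` is hypothetical, and the data are the observed counts from Example 8.1.

```python
def expected_counts(observed):
    """Expected cell counts under H0: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from Example 8.1 (rows: Yes, No; columns: Men, Women)
observed = [[7, 29], [31, 50]]
expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
# prints [11.69, 24.31] then [26.31, 54.69]
```

Note that the expected counts keep the same row and column totals as the observed counts.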

Ongoing fright symptoms    Men (Expected)                 Women (Expected)    Total

Yes                        7 (38 × 36 / 117 = 11.69)      29 (24.31)          36 (36)
No                         31 (38 × 81 / 117 = 26.31)     50 (54.69)          81 (81)

Total                      38 (38)                        79 (79)             117 (117)

If there is no association between gender and having ongoing fright
symptoms, how likely is it that a sample would show a difference as
large or larger than that displayed in the table above? We discuss
the significance test to examine this question.

The chi-square statistic is a measure of how much the observed cell
counts in a two-way table diverge from the expected cell counts. The
formula for the statistic is

\[
X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}
\]

where the sum is over all r × c cells in the table.
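As a rough illustration of the formula, the sketch below computes the per-cell terms and their sum in Python. The function name `chi_square_contributions` is hypothetical; the inputs are the observed counts of Example 8.1 and the expected counts rounded to two decimals, as in the table above.

```python
def chi_square_contributions(observed, expected):
    """Per-cell terms (observed - expected)^2 / expected."""
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


# Example 8.1 (rows: Yes, No; columns: Men, Women)
observed = [[7, 29], [31, 50]]
expected = [[11.69, 24.31], [26.31, 54.69]]  # rounded expected counts

contributions = chi_square_contributions(observed, expected)
x2 = sum(sum(row) for row in contributions)
print(round(x2, 2))  # prints 4.02
```

Looking at the individual terms shows which cells drive the statistic; here the largest term comes from the "Yes, Men" cell.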

Image courtesy: en.wikibooks.org; onlinecourses.science.psu.edu.

Under the assumption that H0 is true, that is, that there is no
association between the row and column variables in a two-way table,
the chi-square statistic X² has approximately a χ² distribution with
(r − 1)(c − 1) degrees of freedom.

If the expected counts and the observed counts are very different,
a large value of X² will result. The chi-square test for a two-way
table always uses the upper tail of the χ² distribution, because any
deviation from the null hypothesis makes the statistic larger.

The P-value for the chi-square test for a two-way table is
P(χ² ≥ X²). We reject H0 if the calculated X² value is larger than the
critical value of the χ² distribution with right-tail probability α and
(r − 1)(c − 1) degrees of freedom, or equivalently if the P-value is
smaller than α, where α is the pre-specified significance level.

Selected χ² critical values can be found in Table 5 (Appendix, page
A-8) of the Pearson eText.
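The two rejection rules (critical value and P-value) can be sketched in Python using only the standard library. This is a hedged illustration, not the course's method: `chi2_sf` and `reject_h0` are hypothetical helper names, the upper-tail probability is built from the standard recurrence for integer degrees of freedom, and the values 3.84, 7.815, and 9.49 are the usual tabled 5% critical values for 1, 3, and 4 degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df >= x)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        prob = math.erfc(math.sqrt(half))  # exact tail for df = 1
        a = 0.5
    else:
        prob = math.exp(-half)             # exact tail for df = 2
        a = 1.0
    # Step up two degrees of freedom at a time until df is reached
    while 2 * a < df:
        prob += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return prob


def reject_h0(x2, df, alpha=0.05):
    """Reject H0 when the P-value is smaller than alpha."""
    return chi2_sf(x2, df) < alpha


# The tail probability at a tabled 5% critical value should be close to 0.05
print(chi2_sf(3.84, 1), chi2_sf(9.49, 4))
print(reject_h0(4.02, 1))
```

Because the tail probability decreases as the statistic grows, a statistic above the critical value always has a P-value below α, so the two rules give the same decision.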

The approximation of the distribution of X² by χ² becomes more
accurate as the cell counts increase. Moreover, it is more accurate
for tables larger than 2 × 2. For 2 × 2 tables, we require that
all four expected cell counts be 5 or more.
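A quick way to check the expected-count requirement is to compute the smallest expected count in the table. The sketch below does this for the 2 × 2 table of Example 8.1; the function name `smallest_expected_count` is hypothetical.

```python
def smallest_expected_count(observed):
    """Smallest expected cell count (row total * column total / n)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return min(r * c / n for r in row_totals for c in col_totals)


# Example 8.1 (rows: Yes, No; columns: Men, Women)
observed = [[7, 29], [31, 50]]
smallest = smallest_expected_count(observed)
print(smallest >= 5)  # prints True: the smallest expected count is about 11.69
```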

Solutions to Example 8.1:

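\[
\begin{aligned}
X^2 &= \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} \\
    &= \frac{(7 - 11.69)^2}{11.69} + \frac{(29 - 24.31)^2}{24.31}
     + \frac{(31 - 26.31)^2}{26.31} + \frac{(50 - 54.69)^2}{54.69} \\
    &= 4.02
\end{aligned}
\]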

StatCrunch: Stat > Calculators > Chi-Square

The critical value at significance level α = 0.05 with
(2 − 1)(2 − 1) = 1 degree of freedom is χ²_{0.05,1} = 3.84, which is
smaller than the calculated test statistic value of 4.02. We can
therefore reject the null hypothesis that there is no association
between gender and having ongoing fright symptoms.

The p-value is P(χ²_1 > 4.02) = 0.04496, which is smaller than
α = 0.05, so we can reject the null hypothesis.
However, this test does not provide insight into the nature of the

relationship between the row and column variables. It is up to us to

see that the data show that women are more likely to have lingering

fright symptoms. You can accompany a chi-square test by percents

and by a description of the nature of the relationship.
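The numbers in the solution to Example 8.1, together with the percents that describe the direction of the relationship, can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: it uses unrounded expected counts, and the P-value line relies on the identity P(χ²_1 > x) = erfc(√(x/2)), which holds for 1 degree of freedom only.

```python
import math

# Observed counts from Example 8.1 (rows: Yes, No; columns: Men, Women)
observed = [[7, 29], [31, 50]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic with unrounded expected counts
x2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        x2 += (obs - expected) ** 2 / expected

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(x2 / 2))

# Percent reporting ongoing fright symptoms within each gender
pct_yes = [100 * observed[0][j] / col_totals[j] for j in range(len(col_totals))]

print(f"X^2 = {x2:.2f}, p-value = {p_value:.4f}")
print(f"Yes: men {pct_yes[0]:.0f}%, women {pct_yes[1]:.0f}%")
```

With unrounded expected counts the statistic comes out near 4.03 rather than the hand-computed 4.02. This is a rounding difference only, and the conclusion at α = 0.05 is unchanged.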

StatCrunch: Stat > Tables > Contingency > With Summary

Example 8.2 Physical activity generally declines when students leave

high school and enroll in college. This suggests that college is an ideal

setting to promote physical activity. One study examined the level

of physical activity and other health-related behaviors in a sample

of 1184 college students. The data for physical activity (column

variable) and consumption of fruits (row variable) are summarized in

the table below. Expected cell counts are given in brackets.

Rows: fruit consumption (FC). Columns: physical activity (PA).

FC \ PA    Low                               Moderate        Vigorous        Total

Low        69 (569 × 108 / 1184 = 51.90)     206 (212.89)    294 (304.20)    569
Medium     25 (321 × 108 / 1184 = 29.28)     126 (120.10)    170 (171.62)    321
High       14 (294 × 108 / 1184 = 26.82)     111 (110.00)    169 (157.18)    294

Total      108                               443             633             1184

Solutions to Example 8.2:

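\[
\begin{aligned}
X^2 &= \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} \\
    &= \frac{(69 - 51.90)^2}{51.90} + \frac{(206 - 212.89)^2}{212.89} + \frac{(294 - 304.20)^2}{304.20} \\
    &\quad + \frac{(25 - 29.28)^2}{29.28} + \frac{(126 - 120.10)^2}{120.10} + \frac{(170 - 171.62)^2}{171.62} \\
    &\quad + \frac{(14 - 26.82)^2}{26.82} + \frac{(111 - 110.00)^2}{110.00} + \frac{(169 - 157.18)^2}{157.18} \\
    &= 14.15
\end{aligned}
\]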

StatCrunch: Stat > Calculators > Chi-Square

The critical value at significance level α = 0.05 is
χ²_{0.05,4} = 9.49, which is smaller than the calculated test statistic
value of 14.15, and the p-value is P(χ²_4 > 14.15) = 0.006831, which is
smaller than α = 0.05. There is strong evidence to reject H0 and
conclude that there is an association between physical activity and
fruit consumption.
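The calculation for Example 8.2 can be checked with the Python sketch below. It is an illustration only: the variable names are hypothetical, and the P-value line uses the closed form P(χ²_4 > x) = e^{−x/2}(1 + x/2), which is valid for 4 degrees of freedom only. The sketch also records each cell's contribution, which helps describe where the association comes from.

```python
import math

# Observed counts from Example 8.2
# rows: fruit consumption Low, Medium, High
# columns: physical activity Low, Moderate, Vigorous
observed = [[69, 206, 294], [25, 126, 170], [14, 111, 169]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Per-cell contributions (observed - expected)^2 / expected
contributions = {}
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        contributions[(i, j)] = (obs - expected) ** 2 / expected

x2 = sum(contributions.values())
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form upper tail, valid only when df == 4
p_value = math.exp(-x2 / 2) * (1 + x2 / 2)

largest_cell = max(contributions, key=contributions.get)
print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.4f}")
print("largest contribution at (row, column) index", largest_cell)
```

In this table most of the statistic comes from the low physical activity column, in particular the cells for low and high fruit consumption.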

StatCrunch: Stat > Tables > Contingency > With Summary

Example 8.3 The operations manager of a company that manufactures
tires wants to determine whether there are any differences in the
quality of workmanship among the three daily shifts. She randomly
selects 632 tires and carefully inspects them. Each tire is classified
as perfect, satisfactory, or defective, and the shift that produced it
is also recorded. The two categorical variables of interest are shift
and condition of the tire produced. The data are summarized in the
following table. Do these data provide sufficient evidence at the 5%
significance level to infer that there are differences in quality among
the three shifts?

           Perfect    Satisfactory    Defective    Total

Shift 1    106        124             6            236
Shift 2    88         115             8            211
Shift 3    74         102             9            185

Total      268        341             23           632

Solutions will be discussed in the lecture video.
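As a way to check your own work on Example 8.3, the steps used in Examples 8.1 and 8.2 can be wrapped in one function. This is a sketch under stated assumptions and not the solution presented in the lecture: `chi_square_statistic` is a hypothetical name, and 9.49 is the tabled critical value for α = 0.05 with 4 degrees of freedom quoted in Example 8.2.

```python
def chi_square_statistic(observed):
    """Return (X^2, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df


# Example 8.3 (rows: Shift 1, 2, 3; columns: Perfect, Satisfactory, Defective)
tires = [[106, 124, 6], [88, 115, 8], [74, 102, 9]]
x2, df = chi_square_statistic(tires)

critical_value = 9.49  # tabled value for alpha = 0.05, df = 4
print(f"X^2 = {x2:.2f}, df = {df}")
print("reject H0" if x2 > critical_value else "do not reject H0")
```

Comparing the printed statistic with the critical value gives the decision at the 5% significance level.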