DSME5110F: Statistical Analysis
Hypothesis Test

Introduction

• Quite often, an analyst has a particular theory, or hypothesis, that he or she would like to test.
– Alternative hypothesis: The hypothesis that the analyst is attempting to prove is called the alternative hypothesis (denoted $H_1$). It is also frequently called the research hypothesis.
– Null hypothesis: The opposite of the alternative hypothesis is called the null hypothesis (denoted $H_0$). It usually represents the current thinking or status quo.
– That is, the null hypothesis is usually the accepted theory that the analyst is trying to disprove.

Concepts in Hypothesis Testing
• The null and alternative hypotheses divide all possibilities into two non-overlapping sets, exactly one of which must be true.
• To reject or not to reject:
– Traditionally, hypothesis testing has been phrased as a decision-making problem, where an analyst decides either to reject the null hypothesis or not to reject it, based on the sample evidence. Not rejecting is often loosely called "accepting" the null hypothesis, although failing to reject does not prove that it is true.
• When sample information is used to test the hypotheses, the benefit of the doubt is given to $H_0$ and the burden of proof is on $H_1$.
• In other words, $H_0$ usually will not be rejected (and $H_1$ accepted) unless the sample evidence is strongly against $H_0$.

How Do We Decide Whether or Not to Reject a Null Hypothesis?
• In order to decide whether or not the null hypothesis should be rejected, we ask the following question:
– “If the null hypothesis is true, how likely would it be to get such a sample or a more extreme sample?”
• If it is not too unlikely, the sample evidence is considered not strong enough for us to reject the null hypothesis.
• On the other hand, if it is very unlikely, then this suggests that the null hypothesis is more likely to be untrue and therefore should be rejected.
• Test statistic: the value computed from the sample data that is used to decide whether or not to reject the null hypothesis.
• Rejection region (RR): the range of values of the test statistic that leads to rejection of the null hypothesis.

The $p$-value and $\alpha$
• We have the following simple rule to decide whether $H_0$ should be rejected:
– Reject $H_0$ if $p$-value $< \alpha$; otherwise, do not reject $H_0$.
– If the $p$-value of the test statistic (for example, in Example 7.5) is 0.02% and we want to test $H_0$ at the 5% significance level, then $\alpha = 5\%$. Since $p$-value $< 5\%$, we say that $H_0$ can be rejected at the 5% significance level.

• Hypothesis Test for a Single Parameter
– Hypothesis Test about Mean $\mu$
  • $\sigma$ is known
  • $\sigma$ is unknown
– Hypothesis Test about Proportion $p$
• Comparisons of Means and Proportions
– Difference between Means ($\mu_1$ and $\mu_2$)
  • Independent samples
  • Paired samples
– Difference between Proportions ($p_1$ and $p_2$)
  • Independent samples

Steps in Hypothesis Testing
1. State the null and the alternative hypotheses, $H_0$ and $H_1$, as either a two-tailed, right-tailed, or left-tailed test.
2. Set the level of significance ($\alpha$) and determine the sample size ($n$).
3. Collect the data and calculate the test statistic.
4. Calculate the $p$-value of the statistic:
a) If $p$-value $< \alpha$, then reject $H_0$;
b) otherwise, do not reject $H_0$.
• Alternatively, we can also calculate the rejection region based on the value of $\alpha$ chosen.
a) If the value of the test statistic falls in the rejection region, then reject $H_0$;
b) otherwise, do not reject $H_0$.

Hypothesis Test About $\mu$ When $\sigma$ is Known
• The test statistic used is (where $\mu_0$ is the hypothesized mean value)
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
If $\mu = \mu_0$, then the above is close to the standard normal distribution (by the central limit theorem).
– R command for $P(Z \le z)$: pnorm(z)
• Decision rule by $p$-value method:
– Reject $H_0$ if $p$-value $< \alpha$; do not reject $H_0$ otherwise.
– For a two-tailed test, $p$-value $= 2 \times P(Z \ge z)$ if the value of the test statistic is positive, or $p$-value $= 2 \times P(Z \le z)$ if the value of the test statistic is negative. For a right-tailed test, $p$-value $= P(Z \ge z)$, and for a left-tailed test, $p$-value $= P(Z \le z)$.
• Decision rule by rejection/acceptance region method:
– Two-tailed test ($H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$): Reject $H_0$ if $z > z_{\alpha/2}$ or if $z < -z_{\alpha/2}$; do not reject $H_0$ otherwise.
– Right-tailed test ($H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$): Reject $H_0$ if $z > z_{\alpha}$; do not reject $H_0$ otherwise.
– Left-tailed test ($H_0: \mu \ge \mu_0$ vs. $H_1: \mu < \mu_0$): Reject $H_0$ if $z < -z_{\alpha}$; do not reject $H_0$ otherwise.
Note: $z_{\alpha}$ is the critical value such that $P(Z > z_{\alpha}) = \alpha$, and $z_{\alpha/2}$ is defined similarly. $z_{\alpha}$ can be found using the R command qnorm($\alpha$, lower.tail = FALSE) or qnorm(1 − $\alpha$). Note also that, due to symmetry, qnorm($\alpha$) = −qnorm(1 − $\alpha$).
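The two decision rules above can be sketched in R as a single helper. This is only an illustration: the function name z_test(), its argument names, and the input values are assumptions made here, not part of base R or of the course material. It returns the decision from both the $p$-value method and the rejection region method so that they can be compared.

```r
# Sketch of a one-sample z-test (sigma known).
# z_test() and its argument names are illustrative, not a base R function.
z_test <- function(xbar, mu0, sigma, n,
                   alternative = c("two.sided", "less", "greater"),
                   alpha = 0.05) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu0) / (sigma / sqrt(n))          # test statistic

  # p-value method
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),               # both tails
    less      = pnorm(z),                         # left tail,  P(Z <= z)
    greater   = pnorm(z, lower.tail = FALSE)      # right tail, P(Z >= z)
  )

  # rejection region method: alpha/2 per tail for a two-tailed test
  tail_area <- if (alternative == "two.sided") alpha / 2 else alpha
  critical <- qnorm(tail_area, lower.tail = FALSE)
  reject_by_region <- switch(alternative,
    two.sided = abs(z) > critical,
    less      = z < -critical,
    greater   = z > critical
  )

  list(z = z, p_value = p_value, critical = critical,
       reject = p_value < alpha, reject_by_region = reject_by_region)
}

# Hypothetical inputs, chosen only to exercise the function
res <- z_test(xbar = 103, mu0 = 100, sigma = 15, n = 50,
              alternative = "greater")
res
```

Both methods always give the same decision, because the $p$-value is below $\alpha$ exactly when the test statistic falls in the rejection region.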

Example 8.1: Breakfast Cereals
• Agrico produces and sells ready-to-eat breakfast cereals. The advertised weight for each cereal package is 375 grams. However, due to the random nature of the filling process, the weight actually placed in each package is rarely exactly 375 grams. Sometimes it is more, sometimes it is less. So, their quality control group draws daily samples to make sure their packages are not being inadvertently underfilled.
• Suppose that the most recent sample of size $n = 30$ has a mean of $\bar{x} = 365$ grams. In previous tests when the process was known to be in adjustment, $\sigma = 22.5$ grams. Test the hypothesis at the $\alpha = 5\%$ significance level that the mean weight of cereal placed in each package is at least 375 grams.

Example 8.1: Solution
• This is a left-tailed test and the hypotheses to be tested are:
– $H_0: \mu \ge 375$
– $H_1: \mu < 375$
• Given that $n = 30$, $\bar{x} = 365$ grams, and $\sigma = 22.5$, the value of the test statistic $z$ is:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{365 - 375}{22.5/\sqrt{30}} = -2.4343$$
• The $p$-value of the test statistic is 0.0075:
– R command: pnorm(-2.4343)
• Since the $p$-value $= 0.0075 < \alpha = 0.05$, we should reject $H_0$. In other words, there is evidence that the process has gotten out of control and is underfilling the packages.
• Alternatively, we can also calculate the critical $z$ value which separates the rejection region from the acceptance region by using the R command:
– qnorm(1-0.05).
– The critical value $z_{\alpha} = 1.6449$.
– Since $z = -2.4343 < -z_{\alpha} = -1.6449$ and this is a left-tailed test, we should reject $H_0$.

Example 8.2: Annual Income
• To test the claim that the mean annual household income of a country is $50,000, a sample of 100 households is collected, giving a sample mean of $52,000. Assuming the standard deviation of household income $\sigma$ is known to be $12,600, test the following hypotheses at the $\alpha = 0.05$ significance level:
– $H_0: \mu = 50000$
– $H_1: \mu \ne 50000$

Example 8.2: Solution
• This is a two-tailed test.
• With $n = 100$, $\bar{x} = 52000$, and $\sigma = 12600$, the value of the test statistic is:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{52000 - 50000}{12600/\sqrt{100}} = 1.5873$$
• The $p$-value of the test statistic is 0.1124:
– R command: 2*(1-pnorm(1.5873))
• Since the $p$-value $= 0.1124 > \alpha = 0.05$, we cannot reject $H_0$. Hence, there is no evidence that the mean annual household income is significantly different from $50,000.
• Alternatively, we can also calculate the critical $z$ value which separates the rejection region from the acceptance region by using the R command:
– qnorm(1-0.025).
– The critical value $z_{\alpha/2} = 1.96$.
– Since $z = 1.5873$ lies within $[-z_{\alpha/2}, z_{\alpha/2}] = [-1.96, 1.96]$, we cannot reject $H_0$.
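The arithmetic in Examples 8.1 and 8.2 can be checked with a few lines of R. This is a sketch that uses only the numbers stated in the two examples. The variable names are chosen here for illustration.

```r
# Example 8.1: left-tailed test, H0: mu >= 375
z1    <- (365 - 375) / (22.5 / sqrt(30))   # test statistic
p1    <- pnorm(z1)                         # left-tail p-value, P(Z <= z)
crit1 <- -qnorm(1 - 0.05)                  # left-tail critical value, -z_alpha

# Example 8.2: two-tailed test, H0: mu = 50000
z2    <- (52000 - 50000) / (12600 / sqrt(100))
p2    <- 2 * pnorm(abs(z2), lower.tail = FALSE)  # two-tailed p-value
crit2 <- qnorm(1 - 0.05 / 2)                     # z_{alpha/2}

round(c(z1 = z1, p1 = p1, crit1 = crit1), 4)
round(c(z2 = z2, p2 = p2, crit2 = crit2), 4)
```

Using pnorm(..., lower.tail = FALSE) instead of 1 - pnorm(...) gives the same value here and is slightly more accurate for very small tail probabilities.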

Hypothesis Test About $\mu$ When $\sigma$ Is Unknown
• In practice, it is usually unrealistic to assume that $\sigma$ is known. So, the more realistic case is that $\sigma$ is unknown.
• When $\sigma$ is unknown, the test statistic used is (where $\mu_0$ is the hypothesized mean value and $s$ is the sample standard deviation)
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
If $\mu = \mu_0$, then the above follows a t-distribution with $n - 1$ degrees of freedom.
• Decision rule by $p$-value method:
– Reject $H_0$ if $p$-value $< \alpha$; do not reject $H_0$ otherwise.
– For a two-tailed test, $p$-value $= 2 \times P(t_{n-1} \ge t)$ if the value of the test statistic is positive, or $p$-value $= 2 \times P(t_{n-1} \le t)$ if the value of the test statistic is negative. For a right-tailed test, $p$-value $= P(t_{n-1} \ge t)$, and for a left-tailed test, $p$-value $= P(t_{n-1} \le t)$.
– R command for $P(t_{n-1} \le t)$: pt(t, df), where df = n − 1 is the degrees of freedom of $t$.
• Decision rule by rejection/acceptance region method:
– Two-tailed test ($H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$): Reject $H_0$ if $t > t_{n-1,\alpha/2}$ or if $t < -t_{n-1,\alpha/2}$; do not reject $H_0$ otherwise.
– Right-tailed test ($H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$): Reject $H_0$ if $t > t_{n-1,\alpha}$; do not reject $H_0$ otherwise.
– Left-tailed test ($H_0: \mu \ge \mu_0$ vs. $H_1: \mu < \mu_0$): Reject $H_0$ if $t < -t_{n-1,\alpha}$; do not reject $H_0$ otherwise.
Note: $t_{n-1,\alpha}$ is the critical value such that $P(t_{n-1} > t_{n-1,\alpha}) = \alpha$, and $t_{n-1,\alpha/2}$ is defined similarly. $t_{n-1,\alpha}$ can be found using the R command qt($\alpha$, df, lower.tail = FALSE) or qt(1 − $\alpha$, df). Note also that, due to symmetry, qt($\alpha$, df) = −qt(1 − $\alpha$, df). If $n$ is sufficiently large, we can use $z_{\alpha}$ to replace $t_{n-1,\alpha}$.
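As a minimal sketch of the rules above, the following R helper runs the test from summary statistics. The name t_test_summary() and the input values are hypothetical and chosen only for illustration. Note that the upper-tail critical value requires lower.tail = FALSE (or 1 − $\alpha$), since qt($\alpha$, df) on its own returns the negative, lower-tail quantile.

```r
# Sketch of a one-sample t-test from summary statistics (sigma unknown).
# t_test_summary() is an illustrative helper, not a base R function.
t_test_summary <- function(xbar, s, n, mu0,
                           alternative = c("two.sided", "less", "greater"),
                           alpha = 0.05) {
  alternative <- match.arg(alternative)
  df <- n - 1
  t_stat <- (xbar - mu0) / (s / sqrt(n))           # test statistic

  p_value <- switch(alternative,
    two.sided = 2 * pt(-abs(t_stat), df),           # both tails
    less      = pt(t_stat, df),                     # P(t_{n-1} <= t)
    greater   = pt(t_stat, df, lower.tail = FALSE)  # P(t_{n-1} >= t)
  )

  # Upper-tail critical value: alpha/2 per tail for a two-tailed test
  tail_area <- if (alternative == "two.sided") alpha / 2 else alpha
  critical <- qt(tail_area, df, lower.tail = FALSE)

  list(t = t_stat, df = df, p_value = p_value, critical = critical,
       reject = p_value < alpha)
}

# Hypothetical summary statistics, for illustration only
res_t <- t_test_summary(xbar = 52, s = 8, n = 16, mu0 = 50)
res_t

# Symmetry of the t-distribution quantiles
all.equal(qt(0.05, df = 15), -qt(0.95, df = 15))
```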

Example 8.3: Lake City Housing Price
• Recall the Lake City housing price data used in several previous examples. Suppose that the overall average housing price of Lake City is $250,000. We want to test whether the average housing price of Neighborhood A is significantly different. Use the sample data in lake_city.csv to test the following hypotheses at the 1% significance level:
– $H_0: \mu = 250000$
– $H_1: \mu \ne 250000$
– Here, $\mu$ is the mean housing price of Neighborhood A.

Example 8.3: Solution Using t.test()
• We can apply the t.test() function to the raw sample data to analyze this problem directly, without using the formula in Slide 14.
• The R code and the results are shown below. Since the $p$-value is very small (almost 0), we can reject $H_0$ and conclude that the average price of Neighborhood A is significantly different from the city average.
• The 99% confidence interval also confirms the conclusion above because the interval lies completely below 250000, suggesting that we are 99% confident that the mean price of Neighborhood A is below the city average.

Example 8.3: Solution
• Just for comparison, I also did the test with the formula in Slide 14.
• The code and results are given below, and the result is the same.
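The original code and output are not reproduced in these notes, so the following is only a sketch of how the two approaches can be compared. It assumes simulated prices in place of the Neighborhood A column of lake_city.csv, because the file's column names are not shown here. The vector price_a and its parameters are therefore hypothetical, and the numbers it produces are not the results of Example 8.3.

```r
# Synthetic stand-in for the Neighborhood A prices (assumption: the real
# values would be read from lake_city.csv, whose columns are not shown here).
set.seed(1)
price_a <- rnorm(40, mean = 230000, sd = 30000)

mu0 <- 250000

# Two-tailed test at the 1% level using t.test()
fit <- t.test(price_a, mu = mu0, alternative = "two.sided",
              conf.level = 0.99)

# Same test from the formula t = (xbar - mu0) / (s / sqrt(n))
n        <- length(price_a)
t_manual <- (mean(price_a) - mu0) / (sd(price_a) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

c(t_test = unname(fit$statistic), manual = t_manual)
c(t_test = fit$p.value, manual = p_manual)
fit$conf.int   # 99% confidence interval for the mean
```

With the real data, price_a would be replaced by the prices of the Neighborhood A houses, and the rest of the code would stay the same.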

Example 8.4: Lake City Housing Price
• Continue with the Lake City housing data example. People in Lake City always feel that houses in Neighborhood C are relatively more expensive. To confirm this, use the sample data in lake_city.csv to test the following hypotheses at the 5% significance level:
– $H_0: \mu \le 250000$
– $H_1: \mu > 250000$
– Here, $\mu$ is the mean housing price of Neighborhood C.

Example 8.4: Solution Using t.test()
• Using the t.test() function, the R code and the results are shown below. Since the $p$-value is very small (almost 0), we can reject $H_0$ and conclude that the average price of Neighborhood C is significantly higher than the city average.
• The 95% confidence interval also confirms the conclusion above because the interval lies completely above 250000, suggesting that we are 95% confident that the mean price of Neighborhood C is above the city average.

Example 8.4: Solution
• Again, for comparison, I also did the test with the formula in Slide 14.
• The code and results are given below, and the result is the same.
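A right-tailed version of the comparison can be sketched in the same way. As before, the data are simulated because the contents of lake_city.csv are not shown in these notes. The vector price_c is a hypothetical stand-in for the Neighborhood C prices, and its output is not the result of Example 8.4.

```r
# Synthetic stand-in for the Neighborhood C prices (assumption).
set.seed(2)
price_c <- rnorm(40, mean = 270000, sd = 30000)

mu0 <- 250000

# Right-tailed test at the 5% level: H1 is mu > 250000
fit_c <- t.test(price_c, mu = mu0, alternative = "greater",
                conf.level = 0.95)

# Same test from the formula
n        <- length(price_c)
t_manual <- (mean(price_c) - mu0) / (sd(price_c) / sqrt(n))
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)  # P(t_{n-1} >= t)

# With alternative = "greater", t.test() reports a one-sided interval
# of the form (lower bound, Inf); the lower bound is computed here by hand.
lower_manual <- mean(price_c) - qt(0.95, df = n - 1) * sd(price_c) / sqrt(n)

c(t_test = fit_c$p.value, manual = p_manual)
c(t_test = fit_c$conf.int[1], manual = lower_manual)
```

If the lower bound of this one-sided interval is above 250000, the interval lies completely above the city average, which matches rejecting $H_0$ at the 5% level.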

Hypothesis Test About Proportion
• The test statistic used is (where $p_0$ is the hypothesized value of the population proportion and $\bar{p}$ is the sample proportion)
$$z = \frac{\bar{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
If $p = p_0$, then the above is close to the standard normal distribution (by the central limit theorem).
– R command for $P(Z \le z)$: pnorm(z)
• Decision rule by $p$-value method:
– Reject $H_0$ if $p$-value $< \alpha$; do not reject $H_0$ otherwise.
– For a two-tailed test, $p$-value $= 2 \times P(Z \ge z)$ if the value of the test statistic is positive, or $p$-value $= 2 \times P(Z \le z)$ if the value of the test statistic is negative. For a right-tailed test, $p$-value $= P(Z \ge z)$, and for a left-tailed test, $p$-value $= P(Z \le z)$.
• Decision rule by rejection/acceptance region method:
– Two-tailed test ($H_0: p = p_0$ vs. $H_1: p \ne p_0$): Reject $H_0$ if $z > z_{\alpha/2}$ or if $z < -z_{\alpha/2}$; do not reject $H_0$ otherwise.
– Right-tailed test ($H_0: p \le p_0$ vs. $H_1: p > p_0$): Reject $H_0$ if $z > z_{\alpha}$; do not reject $H_0$ otherwise.
– Left-tailed test ($H_0: p \ge p_0$ vs. $H_1: p < p_0$): Reject $H_0$ if $z < -z_{\alpha}$; do not reject $H_0$ otherwise.
Note: $z_{\alpha}$ is the critical value such that $P(Z > z_{\alpha}) = \alpha$, and $z_{\alpha/2}$ is defined similarly. $z_{\alpha}$ can be found using the R command qnorm($\alpha$, lower.tail = FALSE) or qnorm(1 − $\alpha$). Note also that, due to symmetry, qnorm($\alpha$) = −qnorm(1 − $\alpha$).
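The proportion test above can be sketched as a small R helper. The function name prop_z_test() and the input counts are assumptions made for illustration. The helper applies the $z$ formula directly, with no continuity correction.

```r
# Sketch of a one-sample z-test for a proportion.
# prop_z_test() is an illustrative helper, not a base R function.
prop_z_test <- function(x, n, p0,
                        alternative = c("two.sided", "less", "greater"),
                        alpha = 0.05) {
  alternative <- match.arg(alternative)
  p_hat <- x / n                                   # sample proportion
  z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # test statistic

  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),
    less      = pnorm(z),
    greater   = pnorm(z, lower.tail = FALSE)
  )

  list(p_hat = p_hat, z = z, p_value = p_value, reject = p_value < alpha)
}

# Hypothetical counts: 60 successes in 100 trials, testing H1: p > 0.5
res_p <- prop_z_test(x = 60, n = 100, p0 = 0.5, alternative = "greater")
res_p
```

The standard error uses $p_0$ rather than the sample proportion because the test statistic is evaluated under the assumption that $H_0$ is true.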

Example 8.5: Distracted Driving
• A survey of European Union nations has found that 70% of drivers admit to having used their cellphones while driving.
• In light of this, several EU countries have introduced measures to combat the phenomenon now referred to as “distracted driving.”
• After an extensive public relations campaign in one country intended to dissuade drivers from engaging in such distracted driving, a survey has found that 1190 drivers in a sample of 1776 report using their cellphones to speak or text while driving.
• Conduct a lower-tail test at the $\alpha = 0.05$ level of significance to determine if the public relations effort has been successful.
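One way to carry out the test requested above in R is sketched here, using only the counts stated in the example (1190 out of 1776, hypothesized proportion 0.7). Setting correct = FALSE is an assumption made here: it switches off the continuity correction that prop.test() applies by default, so that its $p$-value agrees with the $z$ formula.

```r
x  <- 1190    # drivers reporting cellphone use
n  <- 1776    # sample size
p0 <- 0.7     # hypothesized proportion

# Test from the z formula (left-tailed: H1 is p < 0.7)
p_hat    <- x / n
z        <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_manual <- pnorm(z)

# Same test with prop.test(); correct = FALSE is an assumption (see above)
fit_p <- prop.test(x, n, p = p0, alternative = "less", correct = FALSE)

round(c(p_hat = p_hat, z = z, p_manual = p_manual,
        p_prop_test = fit_p$p.value), 4)
```

With the default correct = TRUE, prop.test() would report a slightly larger $p$-value than the formula, although the decision at the 5% level is the same for these counts.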

Example 8.5: Solution Using prop.test()
• To determine if the public relations effort has been successful in reducing “distracted driving”, it is appropriate to do the following left-tailed test:
– $H_0: p \ge 0.7$
– $H_1: p < 0.7$
• Using the prop.test() function, the R code and the results are shown below. Since the $p$-value is very small (0.0029), we can reject $H_0$ and conclude that the proportion of “distracted driving” is now significantly below 0.7. In other words, the campaign has been successful.
• The 95% confidence interval also confirms the conclusion above because the interval lies completely below 0.7, suggesting that we are 95% confident that the proportion of “distracted driving” is now below 0.7.

Example 8.5: Solution
• For comparison, I also did the test with the formula in Slide 22.
• The code and results are given below, and the result is the same.

Comparing Two Samples
• Sometimes, we may be interested in comparing the difference between two sample means or proportions.
• Examples:
– Whether one city has a higher average income than another city.
– Whether drug A is more effective than drug B, on average, in reducing cholesterol level.
– Whether the proportion of low-income families in a city is higher or lower than that in another city.

Difference Between $\mu_1$ and $\mu_2$: Independent Samples
• To estimate the difference between two population means, select
– one sample of size $n_1$ from a population with mean $\mu_1$ and standard deviation $\sigma_1$, and
– an independent sample of size $n_2$ from another population with mean $\mu_2$ and standard deviation $\sigma_2$.
– Use $s_1$ and $s_2$ as estimates of the standard deviations.

Hypothesis Test for $\mu_1 - \mu_2$ when $\sigma_1$ and $\sigma_2$ are Unknown
• The hypotheses:
– $H_0: \mu_1 - \mu_2 = d_0$ vs. $H_1: \mu_1 - \mu_2 \ne d_0$
– $H_0: \mu_1 - \mu_2 \le d_0$ vs. $H_1: \mu_1 - \mu_2 > d_0$
– $H_0: \mu_1 - \mu_2 \ge d_0$ vs. $H_1: \mu_1 - \mu_2 < d_0$
• When $H_0$ is true, the following test statistic follows a t-distribution:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}}$$
– The expression of $\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}$ is quite complicated.
– The degrees of freedom of the t-distribution depend on whether $\sigma_1$ and $\sigma_2$ are equal or unequal.
• The $p$-value of the test statistic can be calculated and is used to determine whether to reject $H_0$:
– reject if $p$-value $< \alpha$; do not reject otherwise.

Example 8.6: Lake City Housing Price
• Let’s consider the Lake City housing price data again.
• Suppose that we want to test at the 5% significance level whether brick houses are more expensive than non-brick houses.
• Let $\mu_{\text{No}}$ and $\mu_{\text{Yes}}$ be the mean prices of non-brick and brick houses, respectively. Then, the hypotheses to be tested are:
– $H_0: \mu_{\text{No}} \ge \mu_{\text{Yes}}$ vs. $H_1: \mu_{\text{No}} < \mu_{\text{Yes}}$
– Or equivalently, $H_0: \mu_{\text{No}} - \mu_{\text{Yes}} \ge 0$ vs. $H_1: \mu_{\text{No}} - \mu_{\text{Yes}} < 0$
• We can use the R function t.test() to test the difference between two population means. However, the syntax of t.test() differs slightly depending on whether the two variables are in stacked or non-stacked form.
– If the variables are non-stacked, then the syntax is t.test(x, y), where both x and y are numerical variables.
– If the variables are stacked, then the syntax is t.test(x ~ y), where x is the numerical variable and y is a factor.

Example 8.6: Solution
• The R code and results are shown below. Note that, before doing the test, we make boxplots and calculate the standard deviations of brick and non-brick houses to help decide whether we should assume equal or unequal variances. Because the standard deviation of brick houses is almost 20% larger than that of non-brick houses, I feel it is more appropriate to assume unequal variances (just my subjective judgment). However, I doubt the result will be significantly different if we assume equal variances (check it!).
• The $p$-value of the test is very small (nearly 0). We can therefore reject $H_0$ and conclude that non-brick houses are cheaper on average.

Example 8.7: Starting Salaries
• The starting salaries of some randomly selected BU and CU graduates are given in the file salary.csv. Test at the 5% significance level whether the mean starting salaries of BU and CU graduates are significantly different.

Example 8.7: Solution
• The R code and results are shown below. Note that, before doing the test, we make boxplots and calculate the standard deviations of BU and CU to help decide whether we should assume equal or unequal variances. Because the standard deviation of BU is only 3% larger than that of CU, I feel it is more appropriate to assume equal variances.
• The $p$-value of the test is 0.2068 and the 95% CI includes 0, so we cannot reject $H_0$. We thus conclude that there is no significant difference between the mean starting salaries of BU and CU graduates.

Comparing Paired Samples
• Paired samples (also called dependent samples) are samples in which natural or matched couplings occur. This generates a data set in which each data point in one sample is uniquely paired to a data point in the second sample.
– two measurements on the same observation (e.g. same city, same person, etc.)
– no statistical methods can test whether the data are paired, so we must understand how the data are