
Distribution-free methods
(Module 7)
Statistics (MAST20005) & Elements of Statistics (MAST90058)
School of Mathematics and Statistics University of Melbourne

Semester 2, 2022

Aims of this module
• Introduce inference methods that do not make strong distributional assumptions
• Explain the highly used Pearson’s chi-squared test

Introduction
Testing for a difference in location
  Sign test
  Wilcoxon signed-rank test (one-sample)
  Wilcoxon rank-sum test (two-sample)
Goodness-of-fit tests (χ²)
  Introduction
  Two classes
  More than two classes
  Estimating parameters
Tests of independence (contingency tables)

Distribution-free methods
• So far, we have only considered tests that assume a specified form for the population distribution.
• We don’t always want to make such assumptions.
• Instead, we can use distribution-free methods.
• Here, we will learn about various distribution-free hypothesis tests.

An aside: distribution-free versus non-parametric
• The term non-parametric is also often used to describe methods
that do not assume a specific distributional form.
• It is usually a misnomer: the methods typically do make use of parameters, but there are usually a large number of them and they adapt to the data.
• Thus, a better term might be super-parametric.
• (Note: we won’t be covering any advanced methods of this form in
this subject.)
• In any case, the convention has stuck, so you will see either of the labels ‘distribution-free’ or ‘non-parametric’ being used.

Distribution-free tests
• Even without making distributional assumptions, it is possible to obtain exact or asymptotic sampling distributions for various statistics.
• Can use these as a basis for hypothesis tests.
• Often the distribution-free test statistic is approximately normally
distributed
• . . . the Central Limit Theorem strikes again!

Extracting information with fewer assumptions
• How can we assess the information in a sample without assuming a distribution?
• Specifying a distribution is somewhat analogous to specifying a scale of measurement, so. . .
• How do we compare numbers without a scale?
• Two strategies:
1. (Sign) Only record whether a number is smaller or greater than a reference number, i.e. replace them by binary indicator variables.
2. (Rank) Only retain information about the order of the numbers, i.e. replace them by their rank order.
• Each of these throws away some information, but hopefully retains enough to be useful.
• We now look at a few methods that use these strategies.

Aim: test for the median
• Let X have median m
• We have an iid sample of size n from X
• Can we test H0 : m = m0 with very few assumptions?
• (Want to find distribution-free alternatives to tests about the mean, such as the t-test)
• (Typically consider medians rather than means when distribution-free)

Sign test
• We assume X is continuous
• (No further assumptions!)
• Compute Y, the number of positive values amongst X1 − m0, …, Xn − m0
• In other words, replace Xi with sgn(Xi − m0)
• Under H0, we have Y ∼ Bi(n, 0.5)
• Tests proceed as usual. . .

Example (sign test)
The time between calls to a switchboard is represented by X.
H0 : m = 6.2 versus H1 : m < 6.2

 i     xi  xi − 6.2  Sign      i     xi  xi − 6.2  Sign
 1   6.80     0.60    +1      11  18.90    12.70    +1
 2   5.70    −0.50    −1      12  16.90    10.70    +1
 3   6.90     0.70    +1      13  10.40     4.20    +1
 4   5.30    −0.90    −1      14  44.10    37.90    +1
 5   4.10    −2.10    −1      15   2.90    −3.30    −1
 6   9.80     3.60    +1      16   2.40    −3.80    −1
 7   1.70    −4.50    −1      17   4.80    −1.40    −1
 8   7.00     0.80    +1      18  18.90    12.70    +1
 9   2.10    −4.10    −1      19   4.80    −1.40    −1
10  19.00    12.80    +1      20   7.90     1.70    +1

• Y is the number of positive signs. Reject H0 if Y is too small. (If the median is less than 6.2 then we expect fewer than 1/2 of the observations to be greater than 6.2.)
• Since Pr(Y ≤ 6) = 0.0577 ≈ 0.05, an appropriate rejection rule is to reject H0 if Y ≤ 6. (In R: pbinom(6, 20, 0.5))
• We observed y = 11, so cannot reject H0.
• The p-value is Pr(Y ≤ 11) = 0.75 > 0.05, so cannot reject H0. (In R: pbinom(11, 20, 0.5))

> binom.test(11, 20, alternative = "less")
Exact binomial test
data: 11 and 20
number of successes = 11, number of trials = 20,
p-value = 0.7483
alternative hypothesis: true probability of
success is less than 0.5
95 percent confidence interval:
0.0000000 0.7413494
sample estimates:
probability of success
0.55
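The sign-test calculation in this example can be reproduced directly from the differences. This is a minimal sketch; the variable names (d, y, p_value) are illustrative and do not come from the slides.

```r
# Differences x_i - 6.2, copied from the table in the example.
d <- c(0.60, -0.50, 0.70, -0.90, -2.10, 3.60, -4.50, 0.80, -4.10, 12.80,
       12.70, 10.70, 4.20, 37.90, -3.30, -3.80, -1.40, 12.70, -1.40, 1.70)

n <- length(d)      # sample size
y <- sum(d > 0)     # sign-test statistic: number of positive signs

# Lower-tailed p-value, Pr(Y <= y), using Y ~ Bi(n, 0.5) under H0.
p_value <- pbinom(y, n, 0.5)
p_value
```

The result should agree with the p-value reported by binom.test.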

Sign test for paired samples
Can also use the sign test for paired samples: simply replace (xi,yi) with sgn(xi − yi).
For example:
i    xi    yi  Sign
1   8.9  10.3   −1
2  26.7  11.7   +1
3  12.4   5.2   +1
4  34.3  36.9   −1
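As a small sketch of the paired version, the signs in this table can be computed in R as follows. The choice of a two-sided alternative in the last step is an assumption made only for illustration; the slides do not state one for these data.

```r
# Paired observations from the table.
x <- c(8.9, 26.7, 12.4, 34.3)
y <- c(10.3, 11.7, 5.2, 36.9)

signs <- sign(x - y)     # replace each pair by the sign of its difference
n_pos <- sum(signs > 0)  # number of positive signs

# Sign test on the paired differences (two-sided alternative assumed here).
p_two_sided <- binom.test(n_pos, length(signs), p = 0.5)$p.value
signs
```

With only four pairs the test has almost no power, so this is purely a demonstration of the mechanics.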

Use of the sign test
• The sign test requires few assumptions
• But it doesn’t use information on the size of the differences,
so it can be insensitive to departures from H0
• In other words, it tends to have a large type II error probability, i.e. low power
• Tends to only be used when the data are not numerical but for which comparisons between values are meaningful (e.g. ordinal data)

Wilcoxon one-sample test
• Now, assume the underlying distribution is also symmetrical (as well as continuous)
• Same null hypothesis (H0 : m = m0) against a one-sided or two-sided alternative
• Determine the ranks of: |X1 − m0|, …, |Xn − m0|
• Replace the data by signed ranks:
Xi becomes sgn(Xi − m0) · rank(|Xi − m0|)
• The Wilcoxon signed-rank statistic, W, is the sum of these signed ranks
• Using this as a basis for a test gives the Wilcoxon signed-rank test,
also known as the Wilcoxon one-sample test.

Alternative definitions
• Textbooks and software packages vary in the statistic they use
• We just defined: W is the sum of the signed ranks
• A popular alternative: V is the sum of the positive ranks only
• V is a bit easier to calculate, esp. by hand
• V and W are deterministically related
(can you derive the formula?)
• V and W have different (but related) sampling distributions
• Using either statistic leads to equivalent test procedures

Example (Wilcoxon one-sample test)
• The lengths of 10 fish are:
5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3
• Interested in testing: H0 : m = 3.7 versus H1 : m > 3.7

 i   xi  xi − 3.7  |xi − 3.7|  Rank  Signed rank
 1  5.0     1.3       1.3        5        5
 2  3.9     0.2       0.2        1        1
 3  5.2     1.5       1.5        6        6
 4  5.5     1.8       1.8        7        7
 5  2.8    −0.9       0.9        3       −3
 6  6.1     2.4       2.4        9        9
 7  6.4     2.7       2.7       10       10
 8  2.6    −1.1       1.1        4       −4
 9  1.7    −2.0       2.0        8       −8
10  4.3     0.6       0.6        2        2

• The sum of signed ranks is:
W = 5 + 1 + 6 + 7 − 3 + 9 + 10 − 4 − 8 + 2 = 25
• Alternatively, the sum of positive ranks is:
V = 5 + 1 + 6 + 7 + 9 + 10 + 2 = 40
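The two statistics can be computed from the raw fish lengths in a few lines of R. This is a sketch only; the names d, r and signed_rank are illustrative.

```r
# Fish lengths from the example and the hypothesised median.
x  <- c(5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3)
m0 <- 3.7

d <- x - m0                   # differences from the hypothesised median
r <- rank(abs(d))             # ranks of the absolute differences
signed_rank <- sign(d) * r    # attach the sign of each difference to its rank

W <- sum(signed_rank)         # sum of the signed ranks
V <- sum(r[d > 0])            # sum of the positive ranks only
c(W = W, V = V)
```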

Decision rule
• What is an appropriate critical region?
• If H1 : m > 3.7 is true, we expect more positive signs. Then W should be large, so the critical region should be W ≥ c for a suitable c.
• (For other alternative hypotheses, e.g. two-sided, need to modify this accordingly.)
• If H0 is true then Pr(Xi < m0) = Pr(Xi > m0) = 1/2.
• The assignments of the n signs to the ranks are mutually independent
(due to the symmetry assumption)
• W is the sum of the integers 1, …, n, each with a positive or negative sign

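• Under H0, $W = \sum_{i=1}^{n} W_i$ where
$\Pr(W_i = i) = \Pr(W_i = -i) = \frac{1}{2}, \quad i = 1, \dots, n$
• The mean under H0 is $E(W_i) = -i \cdot \frac{1}{2} + i \cdot \frac{1}{2} = 0$, so $E(W) = 0$
• Similarly, $\mathrm{var}(W_i) = E(W_i^2) = i^2$ and
$\mathrm{var}(W) = \sum_{i=1}^{n} \mathrm{var}(W_i) = \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$
• A more advanced argument shows that for large n this statistic approximately follows a normal distribution when H0 is true. In other words,
$Z = \frac{W}{\sqrt{n(n+1)(2n+1)/6}} \approx N(0, 1)$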

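• $\Pr(W \ge c \mid H_0) \approx \Pr(Z \ge z \mid H_0)$, which allows us to determine c.
• In this case, for n = 10 and α = 0.05, we reject H0 if
$Z = \frac{W}{\sqrt{10 \cdot 11 \cdot 21 / 6}} \ge 1.645$ (because $\Phi^{-1}(0.95) = 1.645$), which is equivalent to
$W \ge 1.645 \times \sqrt{10 \cdot 11 \cdot 21 / 6} = 32.27$
• For the example data we have w = 25, so we do not reject H0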
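The approximate critical value can be checked numerically. A minimal sketch, assuming the normal approximation for W described here; the names sd_W and c_crit are illustrative.

```r
n     <- 10
alpha <- 0.05

# Standard deviation of W under H0: sqrt(n(n+1)(2n+1)/6).
sd_W <- sqrt(n * (n + 1) * (2 * n + 1) / 6)

# Reject H0 (upper-tailed test) when W >= c_crit.
c_crit <- qnorm(1 - alpha) * sd_W

w <- 25                 # observed statistic from the fish example
reject <- w >= c_crit   # FALSE means we do not reject H0
c(c_crit = c_crit, reject = reject)
```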

• R uses V rather than W
• For small sample sizes R will use the exact sampling distribution
(which we haven’t explored) rather than the normal approximation.
• To carry out the test, use: wilcox.test
• To work with the sampling distribution of V , use: psignrank
• Note: E(V) = n(n+1)/4 and var(V) = n(n+1)(2n+1)/24. You can derive these in a similar way to W.

> wilcox.test(x, mu = 3.7, alternative = "greater",
              exact = TRUE)
Wilcoxon signed rank test
V = 40, p-value = 0.1162
alternative hypothesis: true location is greater than 3.7
# Calculate exact p-value manually.
> 1 - psignrank(39, 10)
[1] 0.1162109
# Calculate approximate p-value, based on W.
> z <- 25 / sqrt(10 * 11 * 21 / 6)
> 1 - pnorm(z)
[1] 0.1013108
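For small n, the exact null distribution of W can also be obtained by brute force: under H0 each of the 2^n sign assignments to the ranks 1, …, n is equally likely. The sketch below enumerates them for n = 10. It is an illustration of the idea rather than how R computes psignrank internally, and the names signs and W_all are illustrative.

```r
n <- 10

# All 2^n possible assignments of signs (-1 or +1) to the ranks 1, ..., n.
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))

# Value of W for each assignment: the signed sum of the ranks.
W_all <- as.vector(signs %*% (1:n))

mean(W_all)                    # mean of W under H0
mean(W_all^2)                  # variance of W under H0 (since the mean is 0)
p_exact <- mean(W_all >= 25)   # exact upper-tailed p-value for w = 25
p_exact
```

The exact p-value obtained this way should match the one from psignrank, and the mean and variance should match the formulas for E(W) and var(W).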

Paired samples
• Like other tests, we can use the Wilcoxon signed-rank test for paired samples by first taking differences and treating these as a sample from a single distribution.
• The assumption of symmetry is quite reasonable in this setting, since under H0 we would typically assume X and Y have the same distribution and therefore X − Y ∼ Y − X.
• Indeed, this test is most often used in such a setting, due to the plausibility of this assumption.

Tied ranks
• We assumed a continuous population distribution
• Thus, all observations will differ (with probability 1)
• In practice, the data are reported to finite precision (e.g. due to rounding), so we could have exactly equal values
• This will lead to ties when ranking our data
• If this happens, the ‘rank’ assigned for the tied values should be
equal to the average of the ranks they span
• Example:
Value:  2.1  4.3  4.3  5.2  5.7  5.7  5.7  5.9
Rank:     1  2.5  2.5    4    6    6    6    8
• The presence of ties complicates the derivation of the sampling distribution, but R knows how to do the right thing
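The averaging rule for ties is what R's rank function does by default (ties.method = "average"). A short sketch using the values from the example:

```r
# Values from the tied-ranks example.
values <- c(2.1, 4.3, 4.3, 5.2, 5.7, 5.7, 5.7, 5.9)

# Default behaviour: tied values share the average of the ranks they span.
tied_ranks <- rank(values)
tied_ranks
```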

Wilcoxon two-sample test
• We can create a two-sample version of the Wilcoxon test.
• Independent random samples X1, . . . , XnX and Y1, . . . , YnY from
two different populations with medians mX and mY respectively.
• Want to test H0 : mX = mY against a one-sided or two-sided
alternative
• Order the combined sample and let W be the sum of the ranks of
Y1, . . . , YnY . This is the Wilcoxon rank-sum statistic.
• Note: this captures information on X as well as Y ! (Why?)
• The test based on this statistic is called the Wilcoxon rank-sum test, also known as the Wilcoxon two-sample test and the Mann-Whitney test.

Rejection region
• Suppose our alternative hypothesis is H1 : mX > mY
• If mX > mY then we expect W to be small, since the Y values
will tend to be smaller than X and thus have smaller ranks
• Therefore, the critical region should be of the form W ≤ c for a
suitable c.
• Properties of W (derivation not shown):
$E(W) = \frac{n_Y (n_X + n_Y + 1)}{2}$
$\mathrm{var}(W) = \frac{n_X n_Y (n_X + n_Y + 1)}{12}$
• W is approximately normally distributed when nX and nY are large

Alternative definitions
• Like for the one-sample version, the definition of the statistic varies
• We just defined: W is the sum of the ranks in the Y sample
• A popular alternative: U is the number of all pairs (Xi, Yj) such that Yj ≤ Xi (the number of ‘wins’ out of all possible pairwise ‘contests’)
• U and W are deterministically related (can you derive the formula?)
• U and W have different (but related) sampling distributions
• Using either statistic leads to equivalent test procedures
• Note: E(U) = nXnY /2 and var(U) = var(W)

Example (Wilcoxon two-sample test)
Two companies package cinnamon. Samples of size eight from each company yield the following weights:
X:  117.1  121.3  127.8  121.9  117.4  124.5  119.5  115.1
Y:  123.5  125.3  126.5  127.9  122.1  125.6  129.8  117.2
Want to test H0 : mX = mY versus H1 : mX ≠ mY
Use a significance level of 5%

• R uses U…but calls it W!
• For small sample sizes R will use the exact sampling distribution,
otherwise it will use a normal approximation
• To carry out the test, use: wilcox.test
• To work with the sampling distribution of U, use: pwilcox

> wilcox.test(x, y)
Wilcoxon rank sum test
data: x and y
W = 13, p-value = 0.04988
alternative hypothesis:
true location shift is not equal to 0
# Calculate exact p-value manually.
> 2 * pwilcox(13, 8, 8)
[1] 0.04988345
We reject H0 and conclude that we have sufficient evidence to show that the median weights differ between the two companies.
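Both versions of the statistic can be computed by hand for the cinnamon data, which also shows where R's reported value comes from. This is a sketch under the definitions given in these slides (W is the sum of the ranks of the Y sample, U counts pairs with Yj ≤ Xi); the variable names are illustrative.

```r
# Cinnamon weights from the example.
x <- c(117.1, 121.3, 127.8, 121.9, 117.4, 124.5, 119.5, 115.1)
y <- c(123.5, 125.3, 126.5, 127.9, 122.1, 125.6, 129.8, 117.2)
nX <- length(x)
nY <- length(y)

# Rank the combined sample; the last nY entries are the ranks of the Y values.
r <- rank(c(x, y))
W <- sum(r[(nX + 1):(nX + nY)])   # rank-sum statistic as defined in the slides

# U: number of pairs (Xi, Yj) with Yj <= Xi (no ties occur in these data).
U <- sum(outer(x, y, ">="))

# Normal approximation for W, using the stated mean and variance.
EW   <- nY * (nX + nY + 1) / 2
varW <- nX * nY * (nX + nY + 1) / 12
z    <- (W - EW) / sqrt(varW)
p_approx <- 2 * pnorm(-abs(z))    # two-sided approximate p-value
c(W = W, U = U, z = z, p_approx = p_approx)
```

The value of U computed here is the number that wilcox.test reports under the label W.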

Goodness-of-fit tests
• How well does a given model fit a set of data?
• E.g. if we assume a Poisson model for a set of data, is it
reasonable?
• We can assess this with a ‘goodness-of-fit’ test
• The most commonly used is Pearson’s chi-squared test
• Unlike most of the other tests we’ve seen, this operates on categorical (discrete) data
• Can also apply it on continuous data by first partitioning the data into separate classes

Binomial model
• Start with a binomial model Y1 ∼ Bi(n, p1)
• Our usual test statistic for this is
$Z = \frac{Y_1 - np_1}{\sqrt{np_1(1 - p_1)}} \approx N(0, 1)$
• Therefore,
$Q_1 = Z^2 \approx \chi^2_1$
• To test H0 : p = p1 versus H1 : p ≠ p1, we would reject H0 if |Z| (and, hence, Q1) is too large.

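• Next, notice that, since 1/(p1(1 − p1)) = 1/p1 + 1/(1 − p1),
$Q_1 = \frac{(Y_1 - np_1)^2}{np_1(1 - p_1)} = \frac{(Y_1 - np_1)^2}{np_1} + \frac{(Y_1 - np_1)^2}{n(1 - p_1)}$
and
$(Y_1 - np_1)^2 = (n - Y_1 - n(1 - p_1))^2 = (Y_2 - np_2)^2$
where $Y_2 = n - Y_1$ and $p_2 = 1 - p_1$.
• Therefore,
$Q_1 = \frac{(Y_1 - np_1)^2}{np_1(1 - p_1)} = \frac{(Y_1 - np_1)^2}{np_1} + \frac{(Y_2 - np_2)^2}{np_2}$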

• Y1 is the observed number of successes, np1 is the expected number of successes
• Y2 is the observed number of failures, np2 is the expected number of failures
• So we can write
$Q_1 = \sum_{i=1}^{2} \frac{(Y_i - np_i)^2}{np_i} = \sum_{i=1}^{2} \frac{(O_i - E_i)^2}{E_i} \approx \chi^2_1$
where Oi is the observed number and Ei is the expected number.
• Even though there are two classes, we have only one degree of
freedom. This is due to the constraint Y1 + Y2 = n.
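The identity between Z² and the two-class sum can be checked numerically. The numbers below (n = 100 trials, 58 successes, p1 = 0.5) are hypothetical and chosen only for illustration; they do not appear in the slides.

```r
# Hypothetical data, for illustration only.
n  <- 100
y1 <- 58
p1 <- 0.5

# Usual binomial test statistic.
z <- (y1 - n * p1) / sqrt(n * p1 * (1 - p1))

# Two-class chi-squared form: sum over classes of (O - E)^2 / E.
obs      <- c(y1, n - y1)
expected <- n * c(p1, 1 - p1)
q1 <- sum((obs - expected)^2 / expected)

# The same statistic as computed by chisq.test for given probabilities.
q1_r <- unname(chisq.test(obs, p = c(p1, 1 - p1))$statistic)

c(z_squared = z^2, q1 = q1, q1_r = q1_r)
```

All three values should coincide, which is the point of the derivation.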

Multinomial model
• Generalize to k possible outcomes (a multinomial model)
• pi = probability of the ith class ($\sum_{i=1}^{k} p_i = 1$)
• Suppose we have n trials, with Yi being the number of outcomes in class i
• E(Yi) = npi
• Now we get,
$Q_{k-1} = \sum_{i=1}^{k} \frac{(Y_i - np_i)^2}{np_i} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \approx \chi^2_{k-1}$
• k − 1 degrees of freedom because Y1 + ··· + Yk = n

Setting up the test
• Specify a categorical distribution: p1, p2, . . . , pk
• We use the Qk−1 statistic to test whether our data are consistent
with this distribution
• The null hypothesis is that they are (i.e. the pi define the distribution)
• The alternative is that they are not (i.e. a different set of probabilities define the distribution)
• Under the null, the test statistic will tend to be small (it measures ‘badness-of-fit’)
• Therefore, reject the null if Qk−1 > c where c is the 1 − α quantile of the $\chi^2_{k-1}$ distribution.

• We are approximating a binomial with a normal
• Good approximation if n is large and the pi are not too small
• Rule of thumb: need to have all Ei = npi ≥ 5
• The larger the k (i.e. more classes), the more powerful the test. However, we need the classes to be large enough
• If any of the Ei are too small, can combine some of the classes until they are large enough
• If Qk−1 is very small, this indicates that the fit is ‘too good’. This can be used as a test for rigging of experiments / fake data. Typically need very large n to do this.
• Often refer to the test statistic as χ²

Example (completely specified distribution)
• Proportions of commuters using various modes of transport, based on past records:
Bus   Train  Car   Other
0.25  0.15   0.50  0.10
• After a 3-month campaign, a random sample (n = 80) found:
Bus   Train  Car   Other
26    15     32    7
• Did the campaign alter commuters’ behaviour?
• The expected frequencies are:
Bus   Train  Car   Other
20    12     40    8
• The value of the test statistic is:
$\chi^2 = \frac{(26 - 20)^2}{20} + \frac{(15 - 12)^2}{12} + \frac{(32 - 40)^2}{40} + \frac{(7 - 8)^2}{8} = 4.275$

• H0 : proportions have not changed, H1 : proportions have changed
• We have 4 classes, so the test statistic here has a $\chi^2_3$ distribution.
• The 0.95 quantile is 7.81, which is greater than χ² = 4.275
• Therefore, there is insufficient evidence that the proportions have changed
• The p-value is
$p = \Pr(\chi^2_3 > 4.275) = 0.233 > 0.05$

[Figure: chi-squared pdf with df = 3; the shaded area has probability 0.233]

> x <- c(26, 15, 32, 7)
> p <- c(0.25, 0.15, 0.5, 0.1)
> t1 <- chisq.test(x, p = p)
> t1
Chi-squared test for given probabilities
X-squared = 4.275, df = 3, p-value = 0.2333
> rbind(t1$observed, t1$expected)
[,1] [,2] [,3] [,4]
[1,] 26 15 32 7
[2,] 20 12 40 8
> t1$residuals
[1] 1.3416408 0.8660254 -1.2649111 -0.3535534
> sum(t1$residuals^2)
[1] 4.275
> 1 - pchisq(4.275, 3)
[1] 0.2332594
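For comparison with chisq.test, the statistic and the critical value can also be computed step by step. A minimal sketch using the commuter data; the names obs, p0 and crit are illustrative.

```r
# Observed counts after the campaign and the proportions from past records.
obs <- c(Bus = 26, Train = 15, Car = 32, Other = 7)
p0  <- c(0.25, 0.15, 0.50, 0.10)

n        <- sum(obs)
expected <- n * p0                          # expected counts under H0

Q    <- sum((obs - expected)^2 / expected)  # Pearson's statistic
df   <- length(obs) - 1                     # k - 1 degrees of freedom
crit <- qchisq(0.95, df)                    # 0.95 quantile of chi-squared(df)

reject <- Q > crit                          # FALSE means insufficient evidence
c(Q = Q, crit = crit, reject = reject)
```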

Fitting distributions
• We don’t always have an exact model to compare against
• We might specify a family of distributions but still need to
estimate some of the parameters
• For example, Pn(λ) or N(μ, σ2)
• We would need to estimate the parameters using the sample, and use these to specify H0
• We need to adjust the test to take into account that we’ve used the data to define H0 (by design, it will be ‘closer’ to the data than if we didn’t need to do this)
• The ‘cost’ of this estimation is 1 degree of freedom for each parameter that is estimated
• The final degrees of freedom is k−p−1, where p is the number of
estimated parameters

Example (Poisson distribution)
• X is number of alpha particles emitted in 0.1 sec by a radioactive source
• Fifty observations:
7, 4, 3, 6, 4, 4, 5, 3, 5, 3, 5, 5, 3, 2, 5, 4, 3, 3, 7, 6, 6, 4, 3, 9, 11, 6, 7, 4, 5, 4, 7, 3, 2, 8, 6, 7, 4, 1, 9, 8, 4, 8, 9, 3, 9, 7, 7, 9, 3, 10
• Is a Poisson distribution an adequate model for the data?
• H0 : Poisson, H1 : something else
• We have only specified the family of the distribution, not the parameters
• Estimate the Poisson rate parameter λ by the MLE, λ̂ = x̄ = 5.4
• Now we ask: does the Pn(5.4) model give a good fit?

First, find an appropriate partition of the values (collapse the data):
> X1 <- cut(X, breaks = c(0, 3.5, 4.5, 5.5, 6.5, 7.5, Inf))
> T1 <- table(X1)
> T1
  (0,3.5] (3.5,4.5] (4.5,5.5] (5.5,6.5] (6.5,7.5] (7.5,Inf]
       13         9         6         5         7        10
Then, prepare the data for the test:
> x <- as.numeric(T1)
> x
[1] 13  9  6  5  7 10
> n <- sum(x)
> p1 <- sum(dpois(0:3, 5.4))
> p2 <- dpois(4, 5.4)
> p3 <- dpois(5, 5.4)
> p4 <- dpois(6, 5.4)
> p5 <- dpois(7, 5.4)
> p6 <- 1 - (p1 + p2 + p3 + p4 + p5)
> p <- c(p1, p2, p3, p4, p5, p6)
Then, run the test:
> chisq.test(x, p = p)
Chi-squared test for given probabilities
X-squared = 2.7334, df = 5, p-value = 0.741
But this is the wrong df! Need to adjust manually:
> 1 - pchisq(2.7334, 4)
[1] 0.6033828

[Figure: chi-squared pdf with df = 4; the shaded area has probability 0.6034]

• Needed to adjust the p-value as we have estimated the mean from the data
• The critical value is the 0.95 quantile from $\chi^2_4$, which is 9.488, so
we cannot reject H0
• Not enough evidence against the Poisson model
• Therefore, this is an adequate fit (at least, until further data proves otherwise)
           0–3    4    5    6    7    8+
Observed  13.0  9.0  6.0  5.0  7.0  10.0
Expected  10.7  8.0  8.6  7.8  6.0   8.9
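The whole Poisson example, including the expected counts in this table and the degrees-of-freedom adjustment, can be reproduced from the raw data in one pass. This is a sketch that follows the same class boundaries as the example; the names lambda_hat, obs and df are illustrative.

```r
# The fifty observations from the example.
X <- c(7, 4, 3, 6, 4, 4, 5, 3, 5, 3, 5, 5, 3, 2, 5, 4, 3, 3, 7, 6,
       6, 4, 3, 9, 11, 6, 7, 4, 5, 4, 7, 3, 2, 8, 6, 7, 4, 1, 9, 8,
       4, 8, 9, 3, 9, 7, 7, 9, 3, 10)
n <- length(X)

lambda_hat <- mean(X)   # MLE of the Poisson rate

# Observed counts in the classes 0-3, 4, 5, 6, 7, 8+
# (the data contain no zeros, so the open lower end at 0 loses nothing).
obs <- as.numeric(table(cut(X, breaks = c(0, 3.5, 4.5, 5.5, 6.5, 7.5, Inf))))

# Class probabilities under the fitted Poisson model.
p <- c(ppois(3, lambda_hat),
       dpois(4:7, lambda_hat),
       ppois(7, lambda_hat, lower.tail = FALSE))
expected <- n * p

Q  <- sum((obs - expected)^2 / expected)  # Pearson's statistic
df <- length(obs) - 1 - 1                 # k - p - 1, one estimated parameter
p_value <- pchisq(Q, df, lower.tail = FALSE)

round(rbind(Observed = obs, Expected = expected), 1)
c(Q = Q, df = df, p_value = p_value)
```

Computing the p-value with the adjusted df directly avoids having to correct the output of chisq.test afterwards.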

Contingency ta
