
Distribution-free methods (Module 7)
Statistics (MAST20005) & Elements of Statistics (MAST90058), Semester 2, 2022


Contents

1 Introduction
2 Testing for a difference in location
  2.1 Sign test
  2.2 Wilcoxon signed-rank test (one-sample)
  2.3 Wilcoxon rank-sum test (two-sample)
3 Goodness-of-fit tests (χ²)
  3.1 Introduction
  3.2 Two classes
  3.3 More than two classes
  3.4 Estimating parameters
4 Tests of independence (contingency tables)
Aims of this module
• Introduce inference methods that do not make strong distributional assumptions
• Explain the widely used Pearson's chi-squared test
1 Introduction
Distribution-free methods
• So far, have only considered tests that assume a specified form for the population distribution.
• We don't always want to make such assumptions.
• Instead, we can use distribution-free methods.
• Here, we will learn about various distribution-free hypothesis tests.
An aside: distribution-free versus non-parametric
• The term non-parametric is also often used to describe methods that do not assume a specific distributional form.
• It is usually a misnomer: the methods typically do make use of parameters, but there are usually a large number of them and they adapt to the data.
• Thus, a better term might be super-parametric.
• (Note: we won’t be covering any advanced methods of this form in this subject.)
• In any case, the convention has stuck, so you will see either of the labels 'distribution-free' or 'non-parametric' being used.
Distribution-free tests
Even without making distributional assumptions, it is possible to obtain exact or asymptotic sampling distributions for various statistics.
Can use these as a basis for hypothesis tests.
Often the distribution-free test statistic is approximately normally distributed . . . the Central Limit Theorem strikes again!
2 Testing for a difference in location
Extracting information with fewer assumptions
• How can we assess the information in a sample without assuming a distribution?
• Specifying a distribution is somewhat analogous to specifying a scale of measurement, so. . .
• How do we compare numbers without a scale?
• Two strategies:
1. (Sign) Only record whether a number is smaller or greater than a reference number, i.e. replace them by binary indicator variables.
2. (Rank) Only retain information about the order of the numbers, i.e. replace them by their rank order.
• Each of these throws away some information, but hopefully retains enough to be useful.
• We now look at a few methods that use these strategies.
Aim: test for the median
• Let X have median m
• We have an iid sample of size n from X
• Can we test H0 : m = m0 with very few assumptions?
• (Want to find distribution-free alternatives to tests about the mean, such as the t-test)
• (Typically consider medians rather than means when distribution-free)
2.1 Sign test
• We assume X is continuous
• (No further assumptions!)
• Compute Y , the number of positive numbers amongst X1 − m0, . . . , Xn − m0
• In other words, replace Xi with sgn(Xi − m0)
• Under H0, we have Y ∼ Bi(n, 0.5)
• Tests proceed as usual. . .
Example (sign test)
The time between calls to a switchboard is represented by X.
H0 : m = 6.2 versus H1 : m < 6.2

  i     xi    xi − 6.2   Sign      i     xi    xi − 6.2   Sign
  1    6.80     0.60      +1      11   18.90    12.70      +1
  2    5.70    −0.50      −1      12   16.90    10.70      +1
  3    6.90     0.70      +1      13   10.40     4.20      +1
  4    5.30    −0.90      −1      14   44.10    37.90      +1
  5    4.10    −2.10      −1      15    2.90    −3.30      −1
  6    9.80     3.60      +1      16    2.40    −3.80      −1
  7    1.70    −4.50      −1      17    4.80    −1.40      −1
  8    7.00     0.80      +1      18   18.90    12.70      +1
  9    2.10    −4.10      −1      19    4.80    −1.40      −1
 10   19.00    12.80      +1      20    7.90     1.70      +1

• Y is the number of positive signs. Reject H0 if Y is too small. (If the median is less than 6.2 then we expect fewer than 1/2 of the observations to be greater than 6.2.)
• Since Pr(Y ≤ 6) = 0.0577 ≈ 0.05, an appropriate rejection rule is to reject H0 if Y ≤ 6. (In R: pbinom(6, 20, 0.5))
• We observed y = 11, so cannot reject H0.
• The p-value is Pr(Y ≤ 11) = 0.75 > 0.05, so cannot reject H0. (In R: pbinom(11, 20, 0.5))
> binom.test(11, 20, alternative = "less")
Exact binomial test
data: 11 and 20
number of successes = 11, number of trials = 20,
p-value = 0.7483
alternative hypothesis: true probability of
success is less than 0.5
95 percent confidence interval:
0.0000000 0.7413494
sample estimates:
probability of success
                  0.55
Sign test for paired samples
Can also use the sign test for paired samples: simply replace (xi, yi) with sgn(xi − yi). For example:
i xi yi Sign
1 8.9 10.3 −1
2 26.7 11.7 +1
3 12.4 5.2 +1
4 34.3 36.9 −1
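
As a minimal sketch, the paired sign test can be run in R by applying binom.test to the number of positive signs (here using just the four pairs above, purely for illustration; in practice n would be larger):

> x <- c(8.9, 26.7, 12.4, 34.3)
> y <- c(10.3, 11.7, 5.2, 36.9)
> binom.test(sum(x - y > 0), length(x))  # tests Pr(+) = 0.5, two-sided by default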
Use of the sign test
• The sign test requires few assumptions
• But it doesn’t use information on the size of the differences, so it can be insensitive to departures from H0
• In other words, large type II error or small power
• Tends to be used mainly when the data are not numerical but comparisons between values are still meaningful (e.g. ordinal data)
2.2 Wilcoxon signed-rank test (one-sample)
Wilcoxon one-sample test
• Now, assume the underlying distribution is also symmetrical (as well as continuous)
• Same null hypothesis (H0 : m = m0) against a one-sided or two-sided alternative
• Determine the ranks of: |X1 − m0|, . . . , |Xn − m0|
• Replace the data by signed ranks, Xi becomes sgn(Xi − m0) · rank(|Xi − m0|)
• The Wilcoxon signed-rank statistic, W , is the sum of these signed ranks
• Using this as a basis for a test gives the Wilcoxon signed-rank test, also known as the Wilcoxon one-sample test.
Alternative definitions
• Textbooks and software packages vary in the statistic they use
• We just defined: W is the sum of the signed ranks
• A popular alternative: V is the sum of the positive ranks only
• V is a bit easier to calculate, esp. by hand
• V and W are deterministically related (can you derive the formula? A sketch follows below.)
• V and W have different (but related) sampling distributions
• Using either statistic leads to equivalent test procedures
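
For reference, one way to derive the relationship (a quick sketch): the positive and negative ranks together use each of 1, . . . , n exactly once, so all ranks sum to n(n + 1)/2. The negative ranks therefore sum to n(n + 1)/2 − V, giving

      W = V − (n(n + 1)/2 − V) = 2V − n(n + 1)/2

For the example below (n = 10, V = 40) this gives W = 2 × 40 − 55 = 25.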
Example (Wilcoxon one-sample test)
• The lengths of 10 fish are:
5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3
• Interested in testing: H0 : m = 3.7 versus H1 : m > 3.7
  i     xi   xi − 3.7   |xi − 3.7|   Rank   Signed rank
  1    5.0      1.3        1.3         5         5
  2    3.9      0.2        0.2         1         1
  3    5.2      1.5        1.5         6         6
  4    5.5      1.8        1.8         7         7
  5    2.8     −0.9        0.9         3        −3
  6    6.1      2.4        2.4         9         9
  7    6.4      2.7        2.7        10        10
  8    2.6     −1.1        1.1         4        −4
  9    1.7     −2.0        2.0         8        −8
 10    4.3      0.6        0.6         2         2

• The sum of the signed ranks is:

      W = 5 + 1 + 6 + 7 − 3 + 9 + 10 − 4 − 8 + 2 = 25

• Alternatively, the sum of the positive ranks is:

      V = 5 + 1 + 6 + 7 + 9 + 10 + 2 = 40
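
As a quick check in R (a sketch; x holds the fish lengths and is also the vector used by the wilcox.test call further below):

> x <- c(5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3)
> sr <- sign(x - 3.7) * rank(abs(x - 3.7))  # signed ranks
> sum(sr)           # W
[1] 25
> sum(sr[sr > 0])   # V
[1] 40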
Decision rule
• What is an appropriate critical region?
• If H1 : m > 3.7 is true, we expect more positive signs. Then W should be large, so the critical region should be
W ≥ c for a suitable c.
• (For other alternative hypotheses, e.g. two-sided, need to modify this accordingly.)
• If H0 is true then Pr(Xi < m0) = Pr(Xi > m0) = 1/2.
• The signs attached to the n ranks are mutually independent (due to the symmetry assumption)
• W is the sum of the integers 1, . . . , n, each with a positive or negative sign
• Under H0, W = W1 + · · · + Wn where

      Pr(Wi = i) = Pr(Wi = −i) = 1/2,   i = 1, . . . , n
• The mean under H0 is E(Wi) = −i · 1/2 + i · 1/2 = 0, so E(W) = 0
• Similarly, var(Wi) = E(Wi²) = i², and

      var(W) = var(W1) + · · · + var(Wn) = 1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6

• A more advanced argument shows that for large n this statistic approximately follows a normal distribution when H0 is true. In other words,

      Z = W / √(n(n + 1)(2n + 1)/6) ≈ N(0, 1)

• Pr(W ≥ c | H0) ≈ Pr(Z ≥ z | H0), which allows us to determine c.
• In this case, for n = 10 and α = 0.05, we reject H0 if

      Z = W / √(10 · 11 · 21/6) ≥ 1.645

  (because Φ⁻¹(0.95) = 1.645), which is equivalent to

      W ≥ 1.645 × √(10 · 11 · 21/6) = 32.27
• For the example data we have w = 25, so we do not reject H0
• R uses V rather than W
• For small sample sizes R will use the exact sampling distribution (which we haven’t explored) rather than the
normal approximation.
• To carry out the test, use: wilcox.test
• To work with the sampling distribution of V , use: psignrank
• Note: E(V) = n(n + 1)/4 and var(V) = n(n + 1)(2n + 1)/24. You can derive these in a similar way to W.
> wilcox.test(x, mu = 3.7, alternative = "greater",
exact = TRUE)
Wilcoxon signed rank test

data:  x
V = 40, p-value = 0.1162
alternative hypothesis: true location is greater than 3.7
# Calculate exact p-value manually.
> 1 - psignrank(39, 10)
[1] 0.1162109
# Calculate approximate p-value, based on W.
> z <- 25 / sqrt(10 * 11 * 21 / 6)
> 1 - pnorm(z)
[1] 0.1013108
⇒ Close agreement between exact and approximate p-values
Paired samples
• Like other tests, we can use the Wilcoxon signed-rank test for paired samples by first taking differences and treating these as a sample from a single distribution (see the sketch after this list).
• The assumption of symmetry is quite reasonable in this setting, since under H0 we would typically assume X and Y have the same distribution and therefore X − Y ∼ Y − X.
• Indeed, this test is most often used in such a setting, due to the plausibility of this assumption.
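
A minimal sketch in R (hypothetical paired data; with paired = TRUE, wilcox.test takes the differences internally and applies the one-sample signed-rank test to them):

> before <- c(12.1, 9.8, 14.2, 11.0, 10.5)
> after  <- c(10.4, 9.9, 12.8, 10.1,  9.7)
> wilcox.test(before, after, paired = TRUE)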
Tied ranks
• We assumed a continuous population distribution
• Thus, all observations will differ (with probability 1)
• In practice, the data are reported to finite precision (e.g. due to rounding), so we could have exactly equal values
• This will lead to ties when ranking our data
• If this happens, the ‘rank’ assigned for the tied values should be equal to the average of the ranks they span
• Example:
Value:  2.1  4.3  4.3  5.2  5.7  5.7  5.7  5.9
Rank:   1    2.5  2.5  4    6    6    6    8
• The presence of ties complicates the derivation of the sampling distribution, but R knows how to do the right thing
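
For example, R's rank function uses average ranks for ties by default, reproducing the ranks above:

> rank(c(2.1, 4.3, 4.3, 5.2, 5.7, 5.7, 5.7, 5.9))
[1] 1.0 2.5 2.5 4.0 6.0 6.0 6.0 8.0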
2.3 Wilcoxon rank-sum test (two-sample)
Wilcoxon two-sample test
• We can create a two-sample version of the Wilcoxon test.
• Independent random samples X1,…,XnX and Y1,…,YnY from two different populations with medians mX
and mY respectively.
• Want to test H0 : mX = mY against a one-sided or two-sided alternative
• Order the combined sample and let W be the sum of the ranks of Y1, . . . , YnY . This is the Wilcoxon rank-sum statistic.
• Note: this captures information on X as well as Y ! (Why?)
• The test based on this statistic is called the Wilcoxon rank-sum test, also known as the Wilcoxon two-sample test and the Mann-Whitney test.
Rejection region
• Suppose our alternative hypothesis is H1 : mX > mY
• If mX > mY then we expect W to be small, since the Y values will tend to be smaller than X and thus have
smaller ranks
• Therefore, the critical region should be of the form W ≤ c for a suitable c.
• Properties of W (derivation not shown):
      E(W) = nY (nX + nY + 1)/2

      var(W) = nX nY (nX + nY + 1)/12
• W is approximately normally distributed when nX and nY are large
Alternative definitions
• Like for the one-sample version, the definition of the statistic varies
• We just defined: W is the sum of the ranks in the Y sample
• A popular alternative: U is the number of all pairs (Xi, Yj) such that Yj ≤ Xi (the number of 'wins' out of all possible pairwise 'contests')
• U and W are deterministically related (can you derive the formula? A sketch follows below.)
• U and W have different (but related) sampling distributions
• Using either statistic leads to equivalent test procedures
• Note: E(U) = nXnY /2 and var(U) = var(W)
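
As with the one-sample statistics, one way to derive the relationship (a quick sketch): the rank of Yj in the combined sample is 1 + #{i : Xi < Yj} + #{k : Yk < Yj}. Summing over j = 1, . . . , nY gives

      W = #{(i, j) : Xi < Yj} + nY (nY + 1)/2 = nX nY − U + nY (nY + 1)/2

since each of the nX nY pairs has either Xi < Yj or Yj ≤ Xi. Taking expectations recovers E(W) = nY (nX + nY + 1)/2 from E(U) = nX nY /2.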
Example (Wilcoxon two-sample test)
Two companies package cinnamon. Samples of size eight from each company yield the following weights:

  X   117.1  121.3  127.8  121.9  117.4  124.5  119.5  115.1
  Y   123.5  125.3  126.5  127.9  122.1  125.6  129.8  117.2

Want to test H0 : mX = mY versus H1 : mX ≠ mY
Use a significance level of 5%
• R uses U…but calls it W!
• For small sample sizes R will use the exact sampling distribution, otherwise it will use a normal approximation
• To carry out the test, use: wilcox.test
• To work with the sampling distribution of U, use: pwilcox
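
For the output below, the two samples can be entered from the table above as:

> x <- c(117.1, 121.3, 127.8, 121.9, 117.4, 124.5, 119.5, 115.1)
> y <- c(123.5, 125.3, 126.5, 127.9, 122.1, 125.6, 129.8, 117.2)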
> wilcox.test(x, y)
Wilcoxon rank sum test
data: x and y
W = 13, p-value = 0.04988
alternative hypothesis:
true location shift is not equal to 0
# Calculate exact p-value manually.
> 2 * pwilcox(13, 8, 8)
[1] 0.04988345
We reject H0 and conclude that we have sufficient evidence to show that the median weights differ between the two companies.
3 Goodness-of-fit tests (χ²)

3.1 Introduction
Goodness-of-fit tests
• How well does a given model fit a set of data?
• E.g. if we assume a Poisson model for a set of data, is it reasonable?
• We can assess this with a 'goodness-of-fit' test
• The most commonly used is Pearson’s chi-squared test
• Unlike most of the other tests we’ve seen, this operates on categorical (discrete) data
• Can also apply it on continuous data by first partitioning the data into separate classes
3.2 Two classes
Binomial model
• Start with a binomial model Y1 ∼ Bi(n, p1)
• Our usual test statistic for this is

      Z = (Y1 − np1) / √(np1(1 − p1)) ≈ N(0, 1)

• Therefore,

      Q1 = Z² ≈ χ²₁

• To test H0 : p = p1 versus H1 : p ≠ p1, we would reject H0 if |Z| (and, hence, Q1) is too large.
• Next, notice that

      Q1 = (Y1 − np1)² / (np1(1 − p1)) = (Y1 − np1)² / (np1) + (Y1 − np1)² / (n(1 − p1))

  and that

      (Y1 − np1)² = (n − Y1 − n(1 − p1))² = (Y2 − np2)²

  where Y2 = n − Y1 and p2 = 1 − p1.
• Therefore,

      Q1 = (Y1 − np1)² / (np1) + (Y2 − np2)² / (np2) = Σ (Yi − npi)² / (npi) = Σ (Oi − Ei)² / Ei ≈ χ²₁

  where the sums are over the two classes i = 1, 2, Oi is the observed number and Ei is the expected number.
• Y1 is the observed number of successes, np1 is the expected number of successes
• Y2 is the observed number of failures, np2 is the expected number of failures
• Even though there are two classes, we have only one degree of freedom. This is due to the constraint Y1 + Y2 = n.
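
As a quick numerical check of Q1 = Z² in R (illustrative numbers, not from the notes): with n = 100 trials, y1 = 58 successes and p1 = 0.5,

> z <- (58 - 100 * 0.5) / sqrt(100 * 0.5 * 0.5)
> z^2
[1] 2.56
> chisq.test(c(58, 42), p = c(0.5, 0.5))$statistic
X-squared 
     2.56 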
3.3 More than two classes
Multinomial model
• Generalize to k possible outcomes (a multinomial model)
• pi = probability of the ith class (p1 + · · · + pk = 1)
• Suppose we have n trials, with Yi being the number of outcomes in class i
• E(Yi) = npi
• Now we get,

      Qk−1 = Σ (Yi − npi)² / (npi) = Σ (Oi − Ei)² / Ei ≈ χ²ₖ₋₁

  where the sums are over the k classes.
• k − 1 degrees of freedom because Y1 + · · · + Yk = n
Setting up the test
• Specify a categorical distribution: p1, p2, . . . , pk
• We use the Qk−1 statistic to test whether our data are consistent with this distribution
• The null hypothesis is that they are (i.e. the pi define the distribution)
• The alternative is that they are not (i.e. a different set of probabilities define the distribution)
• Under the null, the test statistic will tend to be small (it measures 'badness-of-fit')
• Therefore, reject the null if Qk−1 > c where c is the 1 − α quantile from χ²ₖ₋₁.
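
In R, the critical value c is just a chi-squared quantile; e.g. for k = 4 classes at α = 0.05 (as in the example below):

> qchisq(0.95, df = 3)
[1] 7.814728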
• We are approximating a binomial with a normal
• Good approximation if n is large and the pi are not too small
• Rule of thumb: need to have all Ei = npi ≥ 5
• The larger the k (i.e. more classes), the more powerful the test. However, we need the classes to be large enough
• If any of the Ei are too small, can combine some of the classes until they are large enough
• If Qk−1 is very small, this indicates that the fit is ‘too good’. This can be used as a test for rigging of experiments / fake data. Typically need very large n to do this.
• Often refer to the test statistic as χ2
Example (completely specified distribution)
• Proportions of commuters using various modes of transport, based on past records:

      Bus    Train   Car    Other
      0.25   0.15    0.50   0.10
• After a 3-month campaign, a random sample (n = 80) found:
      Bus   Train   Car   Other
      26    15      32    7
• Did the campaign alter commuters' behaviour?
• The expected frequencies are:

      Bus   Train   Car   Other
      20    12      40    8
• The value of the test statistic is:

      χ² = (26 − 20)²/20 + (15 − 12)²/12 + (32 − 40)²/40 + (7 − 8)²/8 = 4.275
• H0 : proportions have not changed, H1 : proportions have changed
• We have 4 classes, so the test statistic here has a χ²₃ distribution.
• The 0.95 quantile is 7.81, which is greater than χ² = 4.275
• Therefore, there is insufficient evidence that the proportions have changed
• The p-value is

      p = Pr(χ²₃ > 4.275) = 0.233 > 0.05
[Figure: χ² density with 3 df; the shaded upper tail beyond 4.275 has probability 0.233]
> x <- c(26, 15, 32, 7)
> p <- c(0.25, 0.15, 0.5, 0.1)
> t1 <- chisq.test(x, p = p)
> t1

        Chi-squared test for given probabilities

data:  x
X-squared = 4.275, df = 3, p-value = 0.2333

> rbind(t1$observed, t1$expected)
     [,1] [,2] [,3] [,4]
[1,]   26   15   32    7
[2,]   20   12   40    8
> t1$residuals
[1]  1.3416408  0.8660254 -1.2649111 -0.3535534
> sum(t1$residuals^2)
[1] 4.275
> 1 - pchisq(4.275, 3)
[1] 0.2332594
3.4 Estimating parameters
Fitting distributions
• We don’t always have an exact model to compare against
• We might specify a family of distributions but still need to estimate some of the parameters
• For example, Pn(λ) or N(μ, σ²)
• We would need to estimate the parameters using the sample, and use these to specify H0
• We need to adjust the test to take into account that we've used the data to define H0 (by design, it will be 'closer' to the data than if we didn't need to do this)
• The ‘cost’ of this estimation is 1 degree of freedom for each parameter that is estimated
• The final degrees of freedom is k − p − 1, where p is the number of estimated parameters
Example (Poisson distribution)
• X is number of alpha particles emitted in 0.1 sec by a radioactive source
• Fifty observations:
7, 4, 3, 6, 4, 4, 5, 3, 5, 3, 5, 5, 3, 2, 5, 4, 3, 3, 7, 6, 6, 4, 3, 9, 11, 6, 7, 4, 5, 4, 7, 3, 2, 8, 6, 7, 4, 1, 9, 8, 4, 8, 9, 3, 9, 7, 7, 9, 3, 10
• Is a Poisson distribution an adequate model for the data?
• H0 : Poisson, H1 : something else
• We have only specified the family of the distribution, not the parameters
• Estimate the Poisson rate parameter λ by the MLE, λ̂ = x̄ = 5.4
• Now we ask: does the Pn(5.4) model give a good fit?
First, find an appropriate partition of the values (collapse the data):

> X1 <- cut(X, breaks = c(0, 3.5, 4.5, 5.5, 6.5, 7.5, Inf))
> T1 <- table(X1)
> T1
X1
  (0,3.5] (3.5,4.5] (4.5,5.5] (5.5,6.5] (6.5,7.5] (7.5,Inf]
       13         9         6         5         7        10

Then, prepare the data for the test:

> x <- as.numeric(T1)
> x
[1] 13  9  6  5  7 10
> n <- sum(x)
> p1 <- sum(dpois(0:3, 5.4))
> p2 <- dpois(4, 5.4)
> p3 <- dpois(5, 5.4)
> p4 <- dpois(6, 5.4)
> p5 <- dpois(7, 5.4)
> p6 <- 1 - (p1 + p2 + p3 + p4 + p5)
> p <- c(p1, p2, p3, p4, p5, p6)

Then, run the test:

> chisq.test(x, p = p)
Chi-squared test for given probabilities

data:  x
X-squared = 2.7334, df = 5, p-value = 0.741
But this is the wrong df! Need to adjust manually:
> 1 - pchisq(2.7334, 4)
[1] 0.6033828
[Figure: χ² density with 4 df; the shaded upper tail beyond 2.7334 has probability 0.6034]
• Needed to adjust the p-value as we have estimated the mean
• The critical value is the 0.95 quantile from χ²₄, which is 9.488, so we cannot reject H0
• Not enough evidence against the Poisson model
• Therefore, this is an adequate fit (at least, until further data proves otherwise)
             0–3    4     5     6     7    8+
  Observed  13.0   9.0   6.0   5.0   7.0  10.0
  Expected  10.7   8.0   8.6   7.8   6.0   8.9
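
The Expected row here is just n = 50 times the fitted class probabilities, so it can be reproduced from the p vector computed above:

> round(50 * p, 1)
[1] 10.7  8.0  8.6  7.8  6.0  8.9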
4 Tests of independence (contingency tables)
Contingency tables
• Suppose we have multiple categorical variables (which could be continuous variables partitioned into classes)
• A contingency table records the number of observations for each possible cross-classification of these variables
• We are often interested in whether two categorical variables are related to each other
• For example, height and weight
• Define height classes A1,…,Ar, and weight classes B1,…,Bc
• Each person is assigned to a single combination (Ai,Bj)
• A sample of people can be summarised with an r × c table of counts (a contingency table)
Independence model
• A general model for these data is:
      pij = Pr(Ai ∩ Bj),   i = 1, . . . , r,   j = 1, . . . , c
• Are the two variables independent?
• We can set this up as a hypothesis test:
      H0 : pij = Pr(Ai) Pr(Bj) versus H1 : pij ≠ Pr(Ai) Pr(Bj)
• This has the same structure as a goodness-of-fit test, so we can use Pearson's chi-squared statistic (a generic sketch follows below)
• Show how this works through an example. . .
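
As a minimal generic sketch (hypothetical counts, not the example data): chisq.test applied to a matrix of counts estimates the expected counts under independence as Eij = (row i total) × (column j total) / n and uses (r − 1)(c − 1) degrees of freedom.

> tab <- matrix(c(20, 30, 25,
+                 15, 10, 20), nrow = 2, byrow = TRUE)
> chisq.test(tab)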
Example (contingency table)