Assignment #1 STA355H1S
due Friday, January 31, 2020
Instructions: Solutions to problems 1 and 2 are to be submitted on Quercus (PDF files only). You are strongly encouraged to do problems 3 through 7 but these are not to be submitted for grading.
1. Suppose that Y = (Y1, · · · , Yn)T where Y1, · · · , Yn are independent Normal random variables where Yi ∼ N(μi,σ2). If Γ is an n×n orthogonal matrix (that is, Γ−1 = ΓT) then Z = ΓY is a random vector whose elements Z1,···,Zn are independent Normal random variables each with variance σ2 whose means ν = (ν1, · · · , νn)T are defined by ν = Γμ. It is often convenient to assume that the mean vector ν is “sparse” in the sense that all but a small fraction of its components are exactly 0. (In practice, the matrix Γ is chosen so that the sparsity of ν = Γμ is a reasonable assumption.)
Half-normal plots (which are often called Daniel plots) are used in some statistical models to distinguish values of Z1,···,Zn coming from a N(0,σ2) distribution from those coming from Normal distributions with non-zero means. Suppose for example that νi1,···,νik are non-zero with the remaining components equal to 0; then we would expect the values of |Zi1|, · · · , |Zik| to be larger than other values of {|Zi|}. Defining Wi = |Zi|, we plot the ordered values W(1) ≤ ··· ≤ W(n) versus the corresponding quantiles of a standard “half-normal” distribution (the distribution of the absolute value of a N(0, 1) random variable); if Z1, · · · , Zn come from a N(0, σ2) distribution then the points should lie close to a straight line whose slope is σ; on the other hand, if νi1, · · · , νik are non-zero then we might expect the largest values W(n−k+1), · · · , W(n) to lie noticeably above the line whose slope is σ. However, since σ is unknown, we need to estimate it and we do not want this estimate influenced (that is, biased upwards) by larger values of Wi; in part (b) below, we define possible “robust” estimators of σ.
(a) If Z ∼ N(0,σ2), show that
(i) the cdf of |Z| is G(x) = 2Φ(x/σ)−1 where Φ(t) is the cdf of a N(0,1) random variable;
(ii) the τ quantile of the distribution of |Z| is G−1(τ) = σΦ−1((τ + 1)/2).
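The two formulas in part (a) can be checked numerically before attempting the derivation. The sketch below is only a sanity check, not a proof; the value σ = 2 and the names half_cdf and half_quantile are assumptions made for illustration.

```r
# Numerical sanity check (not a proof) of the formulas in part (a).
sigma <- 2                     # arbitrary value chosen for illustration
half_cdf <- function(x, sigma) 2 * pnorm(x / sigma) - 1
half_quantile <- function(tau, sigma) sigma * qnorm((tau + 1) / 2)

set.seed(1)
w <- abs(rnorm(1e5, mean = 0, sd = sigma))        # simulated values of |Z|

tau <- c(0.25, 0.5, 0.75)
emp_q <- quantile(w, probs = tau, names = FALSE)  # empirical quantiles of |Z|
thr_q <- half_quantile(tau, sigma)                # quantiles from formula (ii)
print(cbind(tau, emp_q, thr_q))

# G and its inverse should undo one another
inv_err <- max(abs(half_cdf(thr_q, sigma) - tau))
print(inv_err)
```

The empirical and theoretical quantiles should agree to about two decimal places at this sample size.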
(b) Suppose that Z1, · · · , Zn are independent N(0, σ2) random variables and define Wi = |Zi| for i = 1,···,n and the order statistics W(1) ≤ W(2) ≤ ··· ≤ W(n). The result of part (a) suggests that we could estimate σ using an order statistic W(k) as follows:
$$\hat{\sigma}_k = \frac{W_{(k)}}{\Phi^{-1}((\tau_k + 1)/2)}$$
where (for example) τk = k/(n + 1). If τk → τ ∈ (0, 1) as k, n → ∞ then
$$\sqrt{n}(\hat{\sigma}_k - \sigma) \xrightarrow{d} N(0, \gamma^2(\tau)).$$
Give an expression for γ2(τ). For what value of τ is γ2(τ) minimized? (You can determine the minimizing value of τ graphically.)
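The estimator in part (b) is straightforward to compute. The following is a minimal sketch on simulated data; the function name sigma_hat_k, the true σ = 3, and the contamination scheme are all assumptions chosen for illustration. It compares the order-statistic estimator with the ordinary sample standard deviation when a few means are non-zero.

```r
# Sketch of the order-statistic estimator of sigma from part (b).
# sigma_hat_k is a hypothetical name, not a function supplied with the course.
sigma_hat_k <- function(z, k) {
  n <- length(z)
  w <- sort(abs(z))              # order statistics W(1) <= ... <= W(n)
  tau_k <- k / (n + 1)           # the example choice of tau_k given in the text
  w[k] / qnorm((tau_k + 1) / 2)
}

set.seed(355)
n <- 1000
z <- rnorm(n, mean = 0, sd = 3)  # true sigma = 3, chosen for illustration
z[1:20] <- z[1:20] + 15          # shift the means of 2% of the values

est_robust <- sigma_hat_k(z, k = round(0.5 * n))  # based on the median of |Z|
est_sd <- sd(z)                                   # non-robust comparison
print(c(robust = est_robust, sd = est_sd))
```

In this simulation the sample standard deviation is pulled upwards by the shifted values, while the estimator based on the median of |Z| stays close to σ.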
(c) A random variable U is said to be stochastically greater than a random variable V if P(U ≤ x) ≤ P(V ≤ x) for all x with P(U ≤ x) < P(V ≤ x) for some x. Suppose that U ∼ N(μ1, σ2) and V ∼ N(μ2, σ2) where |μ1| > |μ2|. Show that |U| is stochastically greater than |V|. (Hint: First of all, show that the distribution of |U| depends on |μ1| so that we can assume that μ1 > μ2 ≥ 0. Then show that if X ∼ N(μ, σ2) for μ ≥ 0 then P(|X| ≤ x) decreases as μ increases. Calculus is your friend here!)
(d) The file halfnormal.txt on Quercus contains a function to do half-normal plots. This function, halfnormal, has three arguments: the data x; tau, the value of τ used to estimate σ (which defaults to τ = 0.5); and an optional parameter ylim, which allows you to define the minimum and maximum y-axis values. The file data.txt contains 1000 observations from Normal distributions whose means are almost all 0. Using half-normal plots, try to estimate how many of the 1000 means are non-zero. There is no right or wrong approach here so feel free to be creative.
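For readers without access to Quercus, a half-normal plot can be sketched in a few lines. The function below, halfnormal_sketch, is a hypothetical stand-in and is not the halfnormal function from halfnormal.txt; it assumes plotting positions k/(n + 1) and is run on simulated data rather than data.txt.

```r
# Minimal half-normal plot sketch; NOT the halfnormal function from Quercus.
halfnormal_sketch <- function(x, tau = 0.5) {
  n <- length(x)
  w <- sort(abs(x))                            # ordered |x| values
  q <- qnorm((seq_len(n) / (n + 1) + 1) / 2)   # standard half-normal quantiles
  k <- max(1, round(tau * n))
  sigma_hat <- w[k] / q[k]                     # robust estimate of the slope
  plot(q, w, xlab = "Half-normal quantiles", ylab = "Ordered |x|")
  abline(0, sigma_hat)                         # line through 0 with slope sigma_hat
  invisible(list(q = q, w = w, sigma_hat = sigma_hat))
}

set.seed(2020)
x_sim <- c(rnorm(980), rnorm(20, mean = 5))    # simulated data, not data.txt
res <- halfnormal_sketch(x_sim)
print(res$sigma_hat)
```

Points from the 20 shifted observations should appear above the reference line at the right-hand end of the plot.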
2. The hazard or failure rate function of a non-negative continuous random variable X is defined to be
$$h(x) = \frac{f(x)}{1 - F(x)} \quad \text{for } x \ge 0$$
where f(x) is the pdf of X and F(x) is its cdf. We can also define h(x) by
$$h(x) = \lim_{\delta \downarrow 0} \frac{1}{\delta} P(x \le X \le x + \delta \mid X \ge x).$$
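As a concrete illustration of the two definitions, the sketch below evaluates the hazard function for an Exponential and a Weibull distribution and compares the ratio form with a finite-δ version of the limit. The helper name hazard, the parameter values, and δ = 1e-6 are assumptions for illustration only.

```r
# Hazard function h(x) = f(x) / (1 - F(x)) for two illustrative distributions.
# lower.tail = FALSE returns 1 - F(x) directly, which is more accurate in the tail.
hazard <- function(x, dens, cdf, ...) dens(x, ...) / cdf(x, ..., lower.tail = FALSE)

x <- c(0.5, 1, 1.5, 2)
h_exp <- hazard(x, dexp, pexp, rate = 2)           # constant hazard equal to the rate
h_wei <- hazard(x, dweibull, pweibull, shape = 2)  # h(x) = 2x for shape 2, scale 1
print(rbind(x, h_exp, h_wei))

# Finite-delta version of the limit definition, using the Weibull example
delta <- 1e-6
h_lim <- (pweibull(x + delta, shape = 2) - pweibull(x, shape = 2)) /
  (delta * pweibull(x, shape = 2, lower.tail = FALSE))
print(h_lim)
```

The Exponential case has a constant hazard, while the Weibull with shape 2 has a hazard that increases linearly in x.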
(a) A useful formula for the expected value of any non-negative random variable is
$$E(X) = \int_0^\infty (1 - F(x))\,dx.$$
If X is also continuous with pdf f(x) then this formula can be derived as follows:
$$E(X) = \int_0^\infty x f(x)\,dx = \int_0^\infty \int_0^x f(x)\,dt\,dx = \int_0^\infty \int_t^\infty f(x)\,dx\,dt = \int_0^\infty (1 - F(t))\,dt.$$
If h(x) is the hazard function of X, show that
$$E(X) = \int_0^1 \frac{1}{h(F^{-1}(\tau))}\,d\tau.$$
(b) Suppose that X(k) is the k-th order statistic where k ≈ τ n (for some τ ∈ (0, 1)) and define Dk = X(k) − X(k−1). From lecture, we know that the distribution of n Dk is approximately Exponential with mean 1/f(F−1(τ)). Use this fact to show that the distribution of (n−k+ 1)Dk is approximately Exponential with mean 1/h(F−1(τ)). (Hint: Note that h(F−1(τ)) = f(F−1(τ))/(1 − τ).)
(c) The shape of h(x) provides useful information about the distribution not readily obvious from the pdf and cdf; for example, if X represents the lifetime of some (say) electronic component then a decreasing hazard function would indicate that the component improves with age.
The total time on test (TTT) plot provides one way to assess the rough shape of h(x) based on a sample x1, · · · , xn. To construct this plot, we define
d1 = n x(1)
dk = (n−k+1)(x(k) − x(k−1)) for k = 2,···,n
and plot (d1 + ··· + dk)/(x1 + ··· + xn) versus k/n for k = 1,···,n. Using the result from part (b), we might argue that (d1 +···+dk)/(x1 +···+xn) is an estimate of
$$\frac{1}{E(X)} \int_0^\tau \frac{1}{h(F^{-1}(s))}\,ds$$
for τ = k/n. If the underlying hazard function h(x) is decreasing then the shape of these points will be roughly convex (and lie below the 45° line) while if h(x) is increasing then the shape of the points will be roughly concave (and lie above the 45° line).
Given data in a vector x, the TTT plot can be constructed as follows:
> x <- sort(x) # order elements from smallest to largest
> n <- length(x) # find length of x
> d <- c(n:1)*c(x[1],diff(x))
> plot(c(1:n)/n, cumsum(d)/sum(x), xlab="t", ylab="TTT")
> abline(0,1) # add 45 degree line to plot
Data on the lifetimes (in hours) of Kevlar 373/epoxy strands (subjected to constant pressure at 90% stress level) are contained in the file kevlar.txt. Construct a TTT plot for these data. Does the hazard function appear to be increasing or decreasing with time?
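Before looking at the Kevlar data, it can help to see the TTT plot on samples whose hazard shape is known. The sketch below wraps the construction above in a function and applies it to simulated Weibull samples; the name ttt_points and the Weibull shape parameters are assumptions for illustration, and none of this uses kevlar.txt.

```r
# TTT plot coordinates as a reusable function (ttt_points is a hypothetical name).
ttt_points <- function(x) {
  x <- sort(x)
  n <- length(x)
  d <- (n:1) * c(x[1], diff(x))        # d_1, ..., d_n from the definition above
  list(u = (1:n) / n, ttt = cumsum(d) / sum(x))
}

set.seed(373)
x_inc <- rweibull(500, shape = 3)      # Weibull with increasing hazard
x_dec <- rweibull(500, shape = 0.5)    # Weibull with decreasing hazard

p_inc <- ttt_points(x_inc)
p_dec <- ttt_points(x_dec)
plot(p_inc$u, p_inc$ttt, type = "l", xlab = "t", ylab = "TTT")
lines(p_dec$u, p_dec$ttt, lty = 2)
abline(0, 1)                           # 45 degree reference line

# Average vertical distance from the 45 degree line for each sample
print(c(increasing = mean(p_inc$ttt - p_inc$u),
        decreasing = mean(p_dec$ttt - p_dec$u)))
```

The solid curve (increasing hazard) should lie above the 45 degree line and the dashed curve (decreasing hazard) below it.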
(Hint for part (a): Make the change of variables u = F−1(τ).)
Supplemental problems (not to be handed in):
3. (a) Suppose that X has a Gamma distribution with shape parameter α and scale parameter λ; the density of X is
$$f(x) = \frac{\lambda^\alpha x^{\alpha-1} \exp(-\lambda x)}{\Gamma(\alpha)} \quad \text{for } x > 0.$$
Find expressions for the skewness and kurtosis of X in terms of α and λ. (Do these depend on λ?) What happens to the skewness and kurtosis as α → ∞?
(b) Suppose that X1,···,Xn are independent and define Sn = X1 +···+Xn. Assuming that E(Xi³) is well-defined for all i, show that the skewness of Sn is given by
$$\mathrm{skew}(S_n) = \left(\sum_{i=1}^n \sigma_i^2\right)^{-3/2} \sum_{i=1}^n \sigma_i^3\, \mathrm{skew}(X_i)$$
where σi² = Var(Xi). (Hint: Follow the proof given for the kurtosis identity, assuming for simplicity that E(Xi) = 0; this is simpler since E(Sn³) involves a triple summation, most of whose terms are 0.)
4. Suppose that X1, · · · , Xn are independent random variables with distribution function F where μ = E(Xi) and σ2 = Var(Xi). For some families of distributions, the variance is a function of the mean so that σ2 = σ2(μ). A function g is said to be a variance stabilizing transformation for the family of distributions if
$$\sqrt{n}(g(\bar{X}_n) - g(\mu)) \xrightarrow{d} N(0, 1).$$
(a) Show that g defined above must satisfy the differential equation
$$g'(\mu) = \pm \frac{1}{\sigma(\mu)}.$$
(Note that g is not unique.)
(b) Find variance stabilizing transformations for
(i) Poisson distributions;
(ii) Exponential distributions;
(iii) Bernoulli distributions.
5. Suppose that X1, · · · , Xn are independent random variables with some continuous distribution function F. Given data x1,···,xn (outcomes of X1,···,Xn), we can make a boxplot to graphically represent the data; observations beyond the “whiskers” (which extend to at most 1.5 × interquartile range from the upper and lower quartiles) are flagged as possible outliers. When n is large enough, we can obtain a crude estimate for the expected number of outliers as follows:
(i) Compute the lower and upper quartiles of F, F−1(1/4) and F−1(3/4), and define IQR = F−1(3/4) − F−1(1/4).
(ii) Compute the probability of an outlier by
F(F−1(1/4) − 1.5 × IQR) + 1 − F(F−1(3/4) + 1.5 × IQR)
(iii) The expected number of outliers is simply n times the probability in part (ii).
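Steps (i) to (iii) translate directly into a short R function. The sketch below takes a quantile function and a distribution function as arguments; the name expected_outliers is hypothetical, and the standard logistic distribution is used purely as an illustration because it is not one of the cases asked about.

```r
# Steps (i)-(iii) as a function of a quantile function q and a cdf p.
expected_outliers <- function(n, q, p) {
  q1 <- q(0.25)
  q3 <- q(0.75)
  iqr <- q3 - q1                                      # step (i)
  prob <- p(q1 - 1.5 * iqr) + 1 - p(q3 + 1.5 * iqr)   # step (ii)
  n * prob                                            # step (iii)
}

# Illustration with the standard logistic distribution (not an assigned case)
e_logis <- expected_outliers(1000, qlogis, plogis)
print(e_logis)
```

For the standard logistic distribution the quartiles are ±log 3, so the outlier probability works out to 2/82, or roughly 24 expected outliers in a sample of 1000.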
Compute the expected number of outliers for the following distributions.
(a) Normal distribution – note that the probability in (ii) will not depend on the mean and variance so you can assume a standard normal distribution. (The R functions pnorm and qnorm can be used to compute the distribution function and quantiles, respectively, for the normal distribution.)
(b) Laplace distribution with density
$$f(x) = \frac{1}{2} \exp(-|x|).$$
(No R functions for the distribution functions and quantiles seem to exist for the Laplace distribution. However, both are easy to evaluate analytically.)
(c) Cauchy distribution with density
$$f(x) = \frac{1}{\pi(1 + x^2)}.$$
(The R functions pcauchy and qcauchy can be used to compute the distribution function and quantiles, respectively, for the Cauchy distribution.)
(d) Comment on the differences between the 3 distributions considered in parts (a)–(c). In particular, how does the proportion of outliers change as the “tails” (i.e. the rate at which f(x) goes to 0 as |x| → ∞) of the distributions change?
6. Suppose that X1, X2, · · · is a sequence of independent random variables with mean μ and variance σ2 < ∞; define X̄n = n−1(X1 + · · · + Xn). Describe the limiting behaviour (that is, either convergence in probability or convergence in distribution as well as the limit as n → ∞) of the following random variables.
(a) $S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$.
(b) $\sqrt{n}(\bar{X}_n - \mu)/S_n$.
(c) $\sqrt{n}(\exp(\bar{X}_n) - \exp(\mu))/S_n$.
(d) $\frac{1}{n} \sum_{i=1}^n |X_i - \bar{X}_n|$. (The limit here should be intuitively clear; however, proving it is not easy!)
7. Suppose that $a_n(X_n - \theta) \xrightarrow{d} Z$ (where $a_n \uparrow \infty$) and that g(x) is an infinitely differentiable function (that is, it has derivatives of all orders). The Delta Method says that
$$a_n(g(X_n) - g(\theta)) \xrightarrow{d} g'(\theta) Z;$$
if g′(θ) = 0 then the right hand side above is 0 and so $a_n(g(X_n) - g(\theta)) \xrightarrow{p} 0$.
(a) Suppose that g′(θ) = 0 and g′′(θ) ≠ 0. Use the Taylor series expansion
$$g(x) = g(\theta) + g'(\theta)(x - \theta) + \frac{1}{2} g''(\theta)(x - \theta)^2 + r_n$$
(where $r_n/(x - \theta)^2 \to 0$ as x → θ) to find the limiting distribution of $a_n^2(g(X_n) - g(\theta))$.
(b) Extend the result of part (a) to the case where g′(θ) = g′′(θ) = ··· = g(k−1)(θ) = 0 but g(k)(θ) ≠ 0 (g(k) denotes the k-th derivative of g).