University of California, Los Angeles Department of Statistics
Statistics 100B
Instructor: Nicolas Christou
Distributions related to the normal distribution
Three important distributions:
• Chi-square ($\chi^2$) distribution.
• t distribution.
• F distribution.
Before we discuss the $\chi^2$, $t$, and $F$ distributions, here are a few important facts about the gamma ($\Gamma$) distribution. The gamma distribution is useful for modeling skewed distributions of variables that cannot be negative.
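A random variable $X$ is said to have a gamma distribution with parameters $\alpha, \beta$ if its probability density function is given by
$$f(x) = \frac{x^{\alpha-1} e^{-\frac{x}{\beta}}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad \alpha, \beta > 0,\ x \ge 0.$$
$E(X) = \alpha\beta$ and $\sigma^2 = \alpha\beta^2$.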
A brief note on the gamma function:
The quantity $\Gamma(\alpha)$ is known as the gamma function and it is equal to:
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx.$$
Useful result:
$$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}.$$
If we set $\alpha = 1$ and $\beta = \frac{1}{\lambda}$ we get $f(x) = \lambda e^{-\lambda x}$. We see that the exponential distribution is a special case of the gamma distribution.
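The facts stated so far about the gamma distribution can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `gamma_pdf` and `integrate`, the parameter values $\alpha = 3$, $\beta = 2$, and the midpoint-rule settings are illustrative assumptions, not part of the notes.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma(alpha, beta) density with scale parameter beta."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

alpha, beta = 3.0, 2.0   # illustrative values
upper = 100.0            # the density is negligible beyond this point

total = integrate(lambda x: gamma_pdf(x, alpha, beta), 0.0, upper)
mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, upper)
second = integrate(lambda x: x * x * gamma_pdf(x, alpha, beta), 0.0, upper)
variance = second - mean ** 2

print("area under the density:", total)        # should be close to 1
print("mean vs alpha*beta:", mean, alpha * beta)
print("variance vs alpha*beta^2:", variance, alpha * beta ** 2)

# Exponential special case: alpha = 1, beta = 1/lambda.
lam, x0 = 1.5, 0.7
print(gamma_pdf(x0, 1.0, 1.0 / lam), lam * math.exp(-lam * x0))

# Gamma(1/2) = sqrt(pi).
print(math.gamma(0.5), math.sqrt(math.pi))
```

Each printed pair should agree up to the numerical error of the integration.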
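The gamma density for $\alpha = 1, 2, 3, 4$ and $\beta = 1$:
[Figure: Gamma distribution density. Curves for $\Gamma(\alpha = 1, \beta = 1)$, $\Gamma(\alpha = 2, \beta = 1)$, $\Gamma(\alpha = 3, \beta = 1)$, and $\Gamma(\alpha = 4, \beta = 1)$, plotted as $f(x)$ against $x$ for $0 \le x \le 8$.]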
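Moment generating function of the $X \sim \Gamma(\alpha, \beta)$ random variable:
$$M_X(t) = (1 - \beta t)^{-\alpha}$$
Proof:
$$M_X(t) = E e^{tX} = \int_0^{\infty} e^{tx}\, \frac{x^{\alpha-1} e^{-\frac{x}{\beta}}}{\beta^{\alpha}\,\Gamma(\alpha)}\,dx = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \int_0^{\infty} x^{\alpha-1} e^{-x\left(\frac{1-\beta t}{\beta}\right)}\,dx$$
Let $y = x\left(\frac{1-\beta t}{\beta}\right) \Rightarrow x = \frac{\beta}{1-\beta t}\,y$, and $dx = \frac{\beta}{1-\beta t}\,dy$. Substitute these in the expression above:
$$M_X(t) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \int_0^{\infty} \left(\frac{\beta}{1-\beta t}\right)^{\alpha-1} y^{\alpha-1} e^{-y}\, \frac{\beta}{1-\beta t}\,dy$$
$$M_X(t) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \left(\frac{\beta}{1-\beta t}\right)^{\alpha-1} \frac{\beta}{1-\beta t} \int_0^{\infty} y^{\alpha-1} e^{-y}\,dy \;\Rightarrow\; M_X(t) = (1-\beta t)^{-\alpha}.$$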
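As a numerical sanity check of the result $M_X(t) = (1-\beta t)^{-\alpha}$, one can integrate $e^{tx} f(x)$ directly and compare with the closed form. This is only a sketch: the function names, the values $\alpha = 3$, $\beta = 2$, and the integration settings are assumptions made for illustration, and the comparison is restricted to $t < 1/\beta$, where the integral converges.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma(alpha, beta) density with scale parameter beta."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def mgf_numeric(t, alpha, beta, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of E[exp(tX)] for X ~ Gamma(alpha, beta)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += math.exp(t * x) * gamma_pdf(x, alpha, beta)
    return h * total

def mgf_closed(t, alpha, beta):
    """Closed form (1 - beta*t)^(-alpha), valid for t < 1/beta."""
    return (1.0 - beta * t) ** (-alpha)

alpha, beta = 3.0, 2.0          # illustrative values, so 1/beta = 0.5
for t in (-1.0, 0.2):           # both values satisfy t < 1/beta
    print(t, mgf_numeric(t, alpha, beta), mgf_closed(t, alpha, beta))
```

The two columns should agree to several decimal places for each value of $t$.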
Theorem:
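Let $Z \sim N(0,1)$. Then, if $X = Z^2$, we say that $X$ follows the chi-square distribution with 1 degree of freedom. We write $X \sim \chi^2_1$.
Probability density function of $X \sim \chi^2_1$:
Find the probability density function of $X = Z^2$, where $f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}$. Begin with the cdf of $X$:
$$F_X(x) = P(X \le x) = P(Z^2 \le x) = P(-\sqrt{x} \le Z \le \sqrt{x}) \Rightarrow F_X(x) = F_Z(\sqrt{x}) - F_Z(-\sqrt{x}).$$
Therefore:
$$f_X(x) = \frac{1}{2} x^{-\frac{1}{2}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x} + \frac{1}{2} x^{-\frac{1}{2}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x} = \frac{1}{2^{\frac{1}{2}}\sqrt{\pi}}\, x^{-\frac{1}{2}} e^{-\frac{x}{2}},$$
or
$$f_X(x) = \frac{x^{-\frac{1}{2}} e^{-\frac{x}{2}}}{2^{\frac{1}{2}}\,\Gamma\left(\frac{1}{2}\right)}.$$
This is the pdf of $\Gamma\left(\frac{1}{2}, 2\right)$, and it is called the chi-square distribution with 1 degree of freedom.
The moment generating function of $X \sim \chi^2_1$ is $M_X(t) = (1 - 2t)^{-\frac{1}{2}}$.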
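The cdf argument can be mirrored numerically: differentiate $F_X(x) = F_Z(\sqrt{x}) - F_Z(-\sqrt{x})$ with a finite difference and compare with the $\Gamma(\frac{1}{2}, 2)$ density. The sketch below uses the standard library's `statistics.NormalDist`; the helper names and the evaluation points are illustrative assumptions.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

def chi2_1_cdf(x):
    """F_X(x) = F_Z(sqrt(x)) - F_Z(-sqrt(x)) for X = Z^2."""
    if x <= 0:
        return 0.0
    r = math.sqrt(x)
    return Z.cdf(r) - Z.cdf(-r)

def chi2_1_pdf(x):
    """Gamma(1/2, 2) density: x^(-1/2) e^(-x/2) / (2^(1/2) Gamma(1/2))."""
    return x ** -0.5 * math.exp(-x / 2.0) / (math.sqrt(2.0) * math.gamma(0.5))

def numeric_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (0.5, 1.0, 2.0, 4.0):
    print(x, numeric_derivative(chi2_1_cdf, x), chi2_1_pdf(x))
```

For each $x$ the finite-difference slope of the cdf should match the density formula closely.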
Theorem:
Let $Z_1, Z_2, \ldots, Z_n$ be independent random variables with $Z_i \sim N(0, 1)$. If $Y = \sum_{i=1}^{n} Z_i^2$ then $Y$ follows the chi-square distribution with $n$ degrees of freedom. We write $Y \sim \chi^2_n$.
Proof:
Find the moment generating function of $Y$. Since $Z_1, Z_2, \ldots, Z_n$ are independent,
$$M_Y(t) = M_{Z_1^2}(t) \times M_{Z_2^2}(t) \times \cdots \times M_{Z_n^2}(t)$$
Each $Z_i^2$ follows $\chi^2_1$ and therefore it has mgf equal to $(1 - 2t)^{-\frac{1}{2}}$. Conclusion:
$$M_Y(t) = (1 - 2t)^{-\frac{n}{2}}.$$
This is the mgf of $\Gamma\left(\frac{n}{2}, 2\right)$, and it is called the chi-square distribution with $n$ degrees of freedom.
Theorem:
Let $X_1, X_2, \ldots, X_n$ be independent random variables with $X_i \sim N(\mu, \sigma)$. It follows directly from the previous theorem that if
$$Y = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$$
then $Y \sim \chi^2_n$.
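We know that the mean of $\Gamma(\alpha, \beta)$ is $E(X) = \alpha\beta$ and its variance is $\mathrm{Var}(X) = \alpha\beta^2$. Therefore, if $X \sim \chi^2_n$ (that is, $\alpha = \frac{n}{2}$ and $\beta = 2$) it follows that:
$E(X) = n$, and $\mathrm{Var}(X) = 2n$.
Theorem:
Let $X \sim \chi^2_n$ and $Y \sim \chi^2_m$. If $X, Y$ are independent then $X + Y \sim \chi^2_{n+m}$.
Proof: Use moment generating functions. By independence, $M_{X+Y}(t) = (1-2t)^{-\frac{n}{2}} (1-2t)^{-\frac{m}{2}} = (1-2t)^{-\frac{n+m}{2}}$, which is the mgf of $\chi^2_{n+m}$.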
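A small simulation can illustrate both the mean and variance of a chi-square variable and the additivity theorem. This is a sketch under stated assumptions: the degrees of freedom $n = 4$, $m = 6$, the seed, and the number of replications are arbitrary choices, and each chi-square draw is built as a sum of squared standard normals.

```python
import random
import statistics

def chi2_draw(df, rng):
    """One draw from chi-square(df), built as a sum of df squared N(0, 1) values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(100)        # fixed seed so the run is reproducible
n, m, reps = 4, 6, 50_000       # illustrative choices

x = [chi2_draw(n, rng) for _ in range(reps)]
y = [chi2_draw(m, rng) for _ in range(reps)]
s = [a + b for a, b in zip(x, y)]   # X + Y, with X and Y independent

sum_mean = statistics.fmean(s)
sum_var = statistics.variance(s)

# If X + Y ~ chi-square(n + m), the mean should be near n + m
# and the variance near 2(n + m).
print("sample mean:", sum_mean, "theory:", n + m)
print("sample variance:", sum_var, "theory:", 2 * (n + m))
```

Agreement of the first two moments does not prove the theorem, but it is consistent with $X + Y \sim \chi^2_{n+m}$.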
Shape of the chi-square distribution:
In general it is skewed to the right, but as the degrees of freedom increase it becomes approximately $N(n, \sqrt{2n})$. Here is the graph:
[Figure: chi-square densities $f(x)$ against $x$ for $\chi^2_3$, $\chi^2_{10}$, and $\chi^2_{30}$, one panel each, with $0 \le x \le 96$ and $f(x)$ from 0.00 to 0.20.]
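The approach to normality can be quantified by comparing the chi-square cdf with the cdf of $N(n, \sqrt{2n})$. The sketch below computes the chi-square cdf from the power series of the regularized incomplete gamma function (a standard series, but not something taken from these notes) and reports the largest gap between the two cdfs on a grid. The function names, the grid, and the extra case $n = 300$ are illustrative assumptions.

```python
import math
from statistics import NormalDist

def reg_lower_gamma(a, x, max_terms=10_000):
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, df):
    """cdf of chi-square(df), using chi-square(df) = Gamma(df/2, 2)."""
    return reg_lower_gamma(df / 2.0, x / 2.0)

def max_gap(df, points=400):
    """Largest |chi-square cdf - normal cdf| on a grid up to mean + 6 sd."""
    approx = NormalDist(df, math.sqrt(2.0 * df))
    upper = df + 6.0 * math.sqrt(2.0 * df)
    gaps = []
    for i in range(1, points + 1):
        x = upper * i / points
        gaps.append(abs(chi2_cdf(x, df) - approx.cdf(x)))
    return max(gaps)

for df in (3, 10, 30, 300):
    print(df, round(max_gap(df), 4))
```

The printed gaps should shrink as the degrees of freedom grow, in line with the statement that the distribution becomes approximately normal.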
b. Var(X).
c. P (83.85 < X < 163.64).
The $\chi^2$ distribution - examples
Example 1
If $X \sim \chi^2_{16}$, find the following:
a. $P(X < 28.85)$.
b. $P(X > 34.27)$.
c. $P(23.54 < X < 28.85)$.
d. IfP(X 2).
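b. Find $P\left(\sum_{i=1}^{16} Z_i > 2\right)$.
c. Find $P\left(\sum_{i=1}^{16} Z_i^2 > 6.91\right)$.
d. Let $S^2$ be the sample variance of the first sample. Find $c$ such that $P(S^2 > c) = 0.05$.
e. What is the distribution of $Y$, where $Y = \sum_{i=1}^{16} Z_i^2 + \sum_{i=1}^{64} (X_i - \mu)^2$?
f. Find $E(Y)$.
g. Find $\mathrm{Var}(Y)$.
h. Approximate $P(Y > 105)$.
i. Find $c$ such that $c\,\dfrac{\sum_{i=1}^{16} Z_i^2}{Y} \sim F_{16,80}$.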
j. Let $Q \sim \chi^2_{60}$. Find $c$ such that
Z1
P √
12
i=1 i
i=1 i
Central limit theorem, $\chi^2$, $t$, $F$ distributions – examples
Example 1
Suppose $X_1, \cdots, X_n$ is a random sample from a normal population with mean $\mu_1$ and standard deviation $\sigma = 1$. Another random sample $Y_1, \cdots, Y_m$ is selected from a normal population with mean $\mu_2$ and standard deviation $\sigma = 1$. The two samples are independent.
a. What is the distribution of $W$, where $W$ is
$$W = \sum_{i=1}^{n} (X_i - \bar{X})^2 + \sum_{i=1}^{m} (Y_i - \bar{Y})^2$$
b. What is the mean of $W$?
c. What is the variance of $W$?
Example 2
Determine which columns in the F tables are squares of which columns in the t table. Clearly explain your answer.
Example 3
The sample $X_1, X_2, \cdots, X_{18}$ comes from a population which is normal $N(\mu_1, \sigma\sqrt{7})$. The sample $Y_1, Y_2, \cdots, Y_{23}$ comes from a population which is also normal $N(\mu_2, \sigma\sqrt{3})$. The two samples are independent. For these samples we compute the sample variances
$$S_X^2 = \frac{1}{17} \sum_{i=1}^{18} (X_i - \bar{X})^2 \quad \text{and} \quad S_Y^2 = \frac{1}{22} \sum_{i=1}^{23} (Y_i - \bar{Y})^2.$$
For what value of $c$ does the expression $c\,\dfrac{S_X^2}{S_Y^2}$ have the F distribution with (17, 22) degrees of freedom?
Example 4
Supply responses true or false with an explanation to each of the following:
a. The standard deviation of the sample mean $\bar{X}$ increases as the sample size increases.
b. The Central Limit Theorem allows us to claim, in certain cases, that the distribution of the sample mean $\bar{X}$ is normally distributed.
c. The standard deviation of the sample mean $\bar{X}$ is usually approximately equal to the unknown population $\sigma$.
d. The standard deviation of the total of a sample of $n$ observations exceeds the standard deviation of the sample mean.
e. If $X \sim N(8, \sigma)$ then $P(\bar{X} > 4)$ is less than $P(X > 4)$.
Example 5
A selective college would like to have an entering class of 1200 students. Because not all students who are offered admission accept, the college admits 1500 students. Past experience shows that 70% of the students admitted will accept. Assuming that students make their decisions independently, the number who accept, X, follows the binomial distribution with n = 1500 and p = 0.70.
a. Write an expression for the exact probability that at least 1000 students accept.
b. Approximate the above probability using the normal distribution.
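One way to compare the exact binomial probability with its normal approximation is sketched below. It assumes the setup stated in the example (n = 1500, p = 0.70, at least 1000 acceptances) and a continuity correction of 0.5, which is a common convention rather than something the example prescribes. The pmf is evaluated through `math.lgamma` on the log scale to avoid overflow in the binomial coefficients, so "exact" here means exact up to floating-point rounding.

```python
import math
from statistics import NormalDist

n, p = 1500, 0.70

def binom_pmf(k, n, p):
    """Binomial pmf, computed on the log scale to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

# Part (a): P(X >= 1000) is the sum of the pmf over k = 1000, ..., 1500.
exact = sum(binom_pmf(k, n, p) for k in range(1000, n + 1))

# Part (b): normal approximation with mean np and sd sqrt(np(1-p)),
# using a continuity correction (assumed here): P(X >= 1000) ~ P(N > 999.5).
mu = n * p
sigma = math.sqrt(n * p * (1.0 - p))
approx = 1.0 - NormalDist(mu, sigma).cdf(999.5)

print("exact:", exact)
print("normal approximation:", approx)
```

The two values should be close, since n is large and p is not near 0 or 1.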
Example 6
An insurance company wants to audit health insurance claims in its very large database of transactions. In a quick attempt to assess the level of overstatement of this database, the insurance company selects at random 400 items from the database (each item represents a dollar amount). Suppose that the population mean overstatement of the entire database is $8, with population standard deviation $20.
a. Find the probability that the sample mean of the 400 would be less than $6.50.
b. The population from which the sample of 400 was selected does not follow the normal distribution. Why?
c. Why can we use the normal distribution in obtaining an answer to part (a)?
d. For what value of ω can we say that P(μ − ω
Example 7
Next to the cash register of the Southland market is a small bowl containing pennies. Customers are invited to take pennies from this bowl to make their purchases easier. For example, a customer with a bill of $2.12 might take two pennies from the bowl. It frequently happens that customers put into the bowl pennies that they receive in change. Thus the number of pennies in the bowl rises and falls. Suppose that the bowl starts with $2.00 in pennies. Assume that the net daily change is a random variable with mean −$0.06 and standard deviation $0.15. Find the probability that, after 30 days, the value of the pennies in the bowl will be below $1.00.
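A possible way to set up this calculation is sketched below. It assumes the 30 daily changes are independent and identically distributed, so that by the central limit theorem their total is approximately normal with mean 30(−0.06) and standard deviation 0.15√30; the variable names are illustrative.

```python
import math
from statistics import NormalDist

start = 2.00                        # dollars in the bowl at the start
days = 30
mean_daily, sd_daily = -0.06, 0.15  # net daily change, in dollars

# Total change over 30 days, approximated by a normal distribution
# (assumes independent, identically distributed daily changes).
total_change = NormalDist(days * mean_daily, sd_daily * math.sqrt(days))

# The bowl is below $1.00 when the total change is below 1.00 - 2.00 = -1.00.
prob_below = total_change.cdf(1.00 - start)

z = (1.00 - start - total_change.mean) / total_change.stdev
print("z-score:", z)
print("P(value below $1.00):", prob_below)
```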
Example 8
A telephone company has determined that during nonholidays the number of phone calls that pass through the main branch office each hour follows the normal distribution with mean $\mu = 80000$ and standard deviation $\sigma = 35000$. Suppose that a random sample of 60 nonholiday hours is selected and the sample mean $\bar{X}$ of the incoming phone calls is computed.
a. Describe the distribution of $\bar{X}$.
b. Find the probability that the sample mean $\bar{X}$ of the incoming phone calls for these 60 hours is larger than 91970.
c. Is it more likely that the sample average $\bar{X}$ will be greater than 75000, or that one hour's incoming calls will be?
Example 9
Assume that the daily S&P return follows the normal distribution with mean μ = 0.00032 and standard deviation σ = 0.00859.
a. Find the 75th percentile of this distribution.
b. What is the probability that in 2 of the following 5 days, the daily S&P return will be larger than 0.01?
c. Consider the sample average S&P of a random sample of 20 days.
i. What is the distribution of the sample mean?
ii. What is the probability that the sample mean will be larger than 0.005?
iii. Is it more likely that the sample average S&P will be greater than 0.007, or that one day’s S&P return will be?
Example 10
Find the mean and variance of
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2,$$
where $X_1, X_2, \cdots, X_n$ is a random sample from $N(\mu, \sigma)$.
Example 11
Let $X_1, X_2, X_3, X_4, X_5$ be a random sample of size $n = 5$ from $N(0, \sigma)$.
a. Find the constant $c$ so that
$$\frac{c(X_1 - X_2)}{\sqrt{X_3^2 + X_4^2 + X_5^2}}$$
has a t distribution.
b. How many degrees of freedom are associated with this t distribution?
Example 12
Let $\bar{X}$, $\bar{Y}$, and $\bar{W}$ and $S_X^2$, $S_Y^2$, and $S_W^2$ denote the sample means and sample variances of three independent random samples, each of size 10, from a normal distribution with mean $\mu$ and variance $\sigma^2$. Find $c$ so that
X ̄ + Y ̄ − 2 W ̄ P9SX2+9SY2+9SW2
Example 15
If $Y$ has a $\chi^2$ distribution with $n$ degrees of freedom, then $Y$ could be represented by
$$Y = \sum_{i=1}^{n} X_i$$
where the $X_i$'s are independent, each having a $\chi^2$ distribution with 1 degree of freedom.
a. Show that $Z = \dfrac{Y - n}{\sqrt{2n}}$ has an asymptotic standard normal distribution.
b. A machine in a heavy-equipment factory produces steel rods of length $Y$, where $Y$ is a normal random variable with $\mu = 6$ inches and $\sigma^2 = 0.2$. The cost $C$ of repairing a rod that is not exactly 6 inches in length is proportional to the square of the error and is given, in dollars, by $C = 4(Y - \mu)^2$. If 50 rods with independent lengths are produced in a given day, approximate the probability that the total cost for repairs for that day exceeds \$48.
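Part (b) can be approached with the result of part (a). The sketch below is one possible setup, not a definitive solution: it uses the fact that $4(Y_i - \mu)^2 = 4\sigma^2 \left(\frac{Y_i - \mu}{\sigma}\right)^2$, so the total cost is $4\sigma^2$ times a $\chi^2_{50}$ variable, and then applies the normal approximation $\frac{\chi^2_n - n}{\sqrt{2n}} \approx N(0, 1)$. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n_rods = 50
sigma2 = 0.2         # variance of a rod length
cost_factor = 4.0    # C = 4 (Y - mu)^2
threshold = 48.0     # total daily repair cost, in dollars

# Total cost = cost_factor * sigma2 * (chi-square with n_rods degrees of freedom).
scale = cost_factor * sigma2
chi2_cutoff = threshold / scale     # event: chi-square(50) > chi2_cutoff

# Normal approximation from part (a): (Y - n) / sqrt(2n) is roughly N(0, 1).
z = (chi2_cutoff - n_rods) / math.sqrt(2.0 * n_rods)
prob_exceeds = 1.0 - NormalDist().cdf(z)

print("chi-square cutoff:", chi2_cutoff)
print("z-score:", z)
print("approximate P(total cost > 48):", prob_exceeds)
```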
Example 16
Suppose that five random variables $X_1, \cdots, X_5$ are i.i.d., and each has a standard normal distribution. Determine a constant $c$ such that the random variable
$$\frac{c(X_1 + X_2)}{\sqrt{X_3^2 + X_4^2 + X_5^2}}$$
will have a t distribution.
Example 17
Suppose that a random variable $X$ has an $F$ distribution with 3 and 8 degrees of freedom. Determine the value of $c$ such that $P(X < c) = 0.05$.
Example 18
Suppose that a random variable X has an F distribution with 1 and 8 degrees of freedom. Use the table of the t distribution to determine the value of c such that P (X > c) = 0.2.
Example 19
Suppose that a point (X, Y, Z) is to be chosen at random in 3-dimensional space, where X, Y , and Z are independent random variables and each has a standard normal distribution. What is the probability that the distance from the origin to the point will be less than 1 unit?
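A sketch of one way to check an answer to this example: the squared distance $X^2 + Y^2 + Z^2$ is a sum of three independent squared standard normals, so it follows $\chi^2_3$, and the event "distance less than 1" is the event $\chi^2_3 < 1$. The closed-form cdf used below, $F(x) = \mathrm{erf}\left(\sqrt{x/2}\right) - \sqrt{2x/\pi}\, e^{-x/2}$, comes from integrating the $\Gamma(\frac{3}{2}, 2)$ density by parts and is not taken from these notes; the simulation settings are arbitrary.

```python
import math
import random

def chi2_3_cdf(x):
    """cdf of chi-square(3): erf(sqrt(x/2)) - sqrt(2x/pi) * exp(-x/2)."""
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# Distance < 1 is the same event as squared distance < 1.
exact = chi2_3_cdf(1.0)

# Monte Carlo check with a fixed seed (illustrative settings).
rng = random.Random(7)
reps = 200_000
hits = 0
for _ in range(reps):
    d2 = rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
    if d2 < 1.0:
        hits += 1
estimate = hits / reps

print("closed form:", exact)
print("simulation:", estimate)
```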
Example 20
Suppose that $X_1, \cdots, X_6$ form a random sample from a standard normal distribution and let
$$Y = (X_1 + X_2 + X_3)^2 + (X_4 + X_5 + X_6)^2.$$
Determine a value of $c$ such that the random variable $cY$ will have a $\chi^2$ distribution.
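A candidate value of $c$ can be checked by simulation. The sketch below tries $c = \frac{1}{3}$, motivated by the observation that $X_1 + X_2 + X_3 \sim N(0, \sqrt{3})$, so each squared sum divided by 3 would be $\chi^2_1$ and $cY$ would be $\chi^2_2$. The check compares the simulated mean, variance, and one cdf value with those of $\chi^2_2$ (mean 2, variance 4, cdf $1 - e^{-x/2}$). The seed and sample size are arbitrary, and agreement is supporting evidence rather than a proof.

```python
import math
import random
import statistics

rng = random.Random(2024)   # fixed seed for reproducibility
reps = 100_000
c = 1.0 / 3.0               # candidate constant being checked

values = []
for _ in range(reps):
    x = [rng.gauss(0.0, 1.0) for _ in range(6)]
    y = (x[0] + x[1] + x[2]) ** 2 + (x[3] + x[4] + x[5]) ** 2
    values.append(c * y)

sim_mean = statistics.fmean(values)
sim_var = statistics.variance(values)
sim_cdf_at_2 = sum(v <= 2.0 for v in values) / reps

# For chi-square(2): mean 2, variance 4, and P(X <= 2) = 1 - exp(-1).
print("mean:", sim_mean, "theory:", 2.0)
print("variance:", sim_var, "theory:", 4.0)
print("P(cY <= 2):", sim_cdf_at_2, "theory:", 1.0 - math.exp(-1.0))
```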