CHAPTER 2. ESTIMATION

1 Introduction
1. Remark. Estimation is the name given to the statistical procedure by which numer-
ical values obtained from a sample are used as an approximation for the corresponding
unknown parameters for the parent population. Examples are: the mean and variance
of the population values (MT130); the regression parameters in a regression problem
(MT230); and, the estimates of the factor effects in analysis of variance (MT230).

2. Formulation. The random variables X1, X2, …, Xn are called a random sample of
size n from the population f(x) if X1, X2, …, Xn are independent random variables and
the pdf or pf of each Xi is the same function f(x). Alternatively, X1, X2, …, Xn are called
independent and identically distributed (iid) random variables with the common pdf or pf
f(x).

From Ch.1, the joint pdf or pf of X1, X2, …, Xn is given by

f(x) = f(x1, . . . , xn) = f(x1)f(x2) · · · f(xn) = ∏_{i=1}^{n} f(xi).

In particular, if the population pdf or pf is a member of a parametric family with pdf or
pf f(x, θ), then the joint pdf or pf is

f(x, θ) = f(x1, . . . , xn, θ) = ∏_{i=1}^{n} f(xi, θ).

The parameter θ may be a single real number or a vector whose coordinates are all the
parameters of interest in the problem under discussion. We write Θ for the set of all
possible (sensible) values of θ.

Along with the sample random variables we have their observed numerical values x1, x2, …, xn.

3. Examples. (i) Let X1, X2, …, Xn be a random sample from a Binomial distribution
B(1,θ). Then Θ = {θ : 0 ≤ θ ≤ 1}.
(ii) If X1, X2, …, Xn are a random sample from a normal distribution N(µ, σ²), then

Θ = {θ = (µ, σ) : −∞ < µ < +∞, σ > 0}.

4. Notation. We simplify the writing by putting:

x = (x1, x2, …, xn) and X = (X1, X2, …, Xn), so that we write

F (x1, x2, …, xn, θ) = F (x, θ) and f(x1, x2, …, xn, θ) = f(x, θ).

5. Aim. Our goal is to assign a numerical value for θ, or the coordinates of θ in the
multi-parameter case, and thereby obtain information about the nature of the distribution
of the values of the parent population.

As in any area of mathematical application, the first question to be asked about an
estimate is “what is the accuracy?”. Bearing in mind that, in our context, the estimate is
based on a random sample, the question should be re-expressed in terms of what reliance
can be placed on the numerical value obtained. A measured response to this question
is provided by probability calculations involving the distribution of the sample random
variables. This involves the assumption that the distribution has a particular functional
form which contains the unknown parameter θ.

6. Definition. Let g(θ) be a function of the parameter θ.

(i) A (point) estimator of g(θ) is a function ĝ(X) of the sample variables X = (X1, X2, …, Xn).

(ii) A (point) estimate of g(θ) is a function ĝ(x) of the observed values x = (x1, x2, …, xn).

[Thus, ĝ(x) is the observed value of the statistic ĝ(X).]

7. Examples. (i) Let X1, X2, …, Xn be a random sample from the binomial distribution
B(1, θ) – so that f(x, θ) = θ^t (1 − θ)^(n−t), where t = x1 + x2 + … + xn. Then:

(a) With g(θ) = θ (the population mean), take as an estimate ĝ(x) = θ̂(x) = x̄ = t/n,
with corresponding estimator θ̂(X) = X̄; and,

(b) with g(θ) = θ(1 − θ) (the population variance), take as an estimate
ĝ(x) = (1/(n − 1)) ∑_{i=1}^{n} (xi − x̄)² = s², with corresponding estimator
S² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)².

(ii) Let X1, X2, …, Xn be a random sample from the normal N(µ, σ²) distribution. Here,
we put θ = (µ, σ). Then:

(a) with g(θ) = µ we may take as an estimate ĝ(x) = µ̂ = x̄ with estimator X̄; and,

(b) with g(θ) = σ² we may take as an estimate ĝ(x) = σ̂² = (1/(n − 1)) ∑_{i=1}^{n} (xi − x̄)²
with estimator S² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)².
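The estimates in both examples are computed directly from the observed values. The
following is a minimal sketch in Python, assuming NumPy is available; the sample values
shown are hypothetical and serve only to illustrate the calculation of x̄ and s².

```python
import numpy as np

# Hypothetical observed sample x = (x1, ..., xn); the values are illustrative only.
x = np.array([4.2, 5.1, 3.8, 4.9, 5.3, 4.4])
n = len(x)

x_bar = x.mean()                                  # estimate of the population mean
s_squared = ((x - x_bar) ** 2).sum() / (n - 1)    # estimate s^2 of the population variance

print(x_bar, s_squared)                           # s_squared equals np.var(x, ddof=1)
```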

8. Remark. The definition above of an estimator neither specifies any method of finding
estimators nor offers any guidance on ‘desirable’ properties we should seek for them.
Also, the definition makes no mention of any correspondence between an estimator and
the parameter it is to estimate. In order to select a useful estimator from the range of
possibilities allowed by the definition, the first requirement is a set of criteria by which
different estimators may be compared.

9. Definition. An estimator ĝ(X) of g(θ) is unbiased if E(ĝ(X)) = g(θ) for all θ ∈ Θ.
An estimator that is not unbiased is biased.

10. Examples. (i) All the estimators in 2.1.7 are unbiased.

(ii) Let X1, X2, …, Xn be a random sample from the uniform distribution U(0, θ). Then
ĝ1(X) = 2X̄ and ĝ2(X) = ((n + 1)/n) Yn, where Yn = max{X1, X2, …, Xn}, are two unbiased
estimators of the parameter θ (here, g(θ) = θ) – see Exercises 1. Further, also from
Exercises 1,

var(ĝ1(X)) = 4 var(X̄) = θ²/(3n)   and   var(ĝ2(X)) = ((n + 1)²/n²) var(Yn) = θ²/(n(n + 2)).

Hence, since n ≥ 1, var(ĝ2(X)) ≤ var(ĝ1(X)).
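A small simulation can illustrate both claims, namely that each estimator has expectation
θ and that ĝ2 has the smaller variance. The following sketch assumes NumPy; the choices
θ = 2, n = 10 and the number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 100_000            # arbitrary illustrative choices

samples = rng.uniform(0.0, theta, size=(reps, n))
g1 = 2 * samples.mean(axis=1)                # 2 * X-bar
g2 = (n + 1) / n * samples.max(axis=1)       # ((n+1)/n) * Y_n

# Both averages should be close to theta (unbiasedness), and the empirical
# variances should be close to the formulas quoted above.
print(g1.mean(), g2.mean())
print(g1.var(), theta ** 2 / (3 * n))        # theta^2 / (3n)
print(g2.var(), theta ** 2 / (n * (n + 2)))  # theta^2 / (n(n+2))
```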

11. Principle. Generally, we will look for unbiased estimators and, given a choice
of estimators, we prefer the estimator with smaller variance. [Recall that the variance
measures the spread of the values of the random variable, so that a small variance suggests
that the values are concentrated around the mean.]

12. Remark. In direct contradiction to the principle, some statisticians would assert a
preference for a biased estimator, of known bias, over an unbiased estimator with larger
variance.

2 Methods of Estimation

Two general methods by which estimators may be determined are now described. As
the examples and exercises demonstrate, neither method will automatically produce an
unbiased estimator. The third method of estimation – least squares estimation – is covered
in MT230.

1. Method 1: The method of moments

For this method we assume that we have a simple random sample. Hence, the sample
random variables X1, X2, …, Xn have a common distribution and, for a positive integer r,
the (common) finite moment mr = E(X1^r) is assumed to exist. The method of moments
is to estimate mr by the corresponding sample moment ψ̂r = (1/n) ∑_{i=1}^{n} xi^r, with
estimator Ψ̂r = (1/n) ∑_{i=1}^{n} Xi^r. By definition, the moment estimators are unbiased
estimators for the population moments. The method of moments estimators are found
by equating the first r sample moments to the corresponding population moments and
solving the resulting system of equations with respect to the unknown parameters.

2. Examples. The following examples show how the method of moments may be used to
obtain estimators for other population parameters, by the device of equating the sample
moments to their expected values; a brief worked sketch is given after the list.

(i) Suppose that X1, X2, …, Xn are a random sample from U(0, θ).

(ii) Suppose that X1, X2, …, Xn are iid having unknown mean µ and variance σ².

(iii) Suppose that X1, X2, …, Xn are a random sample from the uniform distribution U(a, b),
where a and b are unknown.
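For reference, a sketch of the standard calculations (the details as worked in lectures may
differ): in (i), E(X1) = θ/2, so equating ψ̂1 = x̄ to θ/2 gives the moment estimate θ̃ = 2x̄;
in (ii), m1 = µ and m2 = σ² + µ², giving µ̃ = x̄ and σ̃² = ψ̂2 − x̄² = (1/n) ∑_{i=1}^{n} (xi − x̄)²,
which is not the unbiased estimator s²; in (iii), m1 = (a + b)/2 and m2 − m1² = (b − a)²/12,
so ã = x̄ − √(3(ψ̂2 − x̄²)) and b̃ = x̄ + √(3(ψ̂2 − x̄²)). The following Python sketch, assuming
NumPy and a hypothetical sample, carries out the calculation for (iii).

```python
import numpy as np

# Method-of-moments sketch for example (iii), U(a, b); the sample is hypothetical.
x = np.array([1.3, 2.7, 0.8, 2.1, 1.9, 2.4])

m1_hat = x.mean()            # first sample moment
m2_hat = (x ** 2).mean()     # second sample moment

# For U(a, b): m1 = (a + b)/2 and m2 - m1^2 = (b - a)^2 / 12.
half_width = np.sqrt(3 * (m2_hat - m1_hat ** 2))
a_tilde = m1_hat - half_width
b_tilde = m1_hat + half_width
print(a_tilde, b_tilde)
```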

3. Method 2: The Maximum Likelihood Principle

A frequently used method, available when the joint distribution F (x,θ) and joint p.d.f./p.f.
f(x,θ) take a specified form, is to proceed as follows.

Define the likelihood function L(θ,x) = f(x, θ) – regarded as a function of θ for fixed
values of x = (x1, x2, …, xn).

The principle of maximum likelihood is to choose as the maximum likelihood estimate
(MLE) the value θ̂(x1, x2, …, xn) = θ̂(x) such that

L(θ̂(x), x) = sup{L(θ, x1, x2, …, xn): θ ∈ Θ} = sup{L(θ,x): θ ∈ Θ}

with the corresponding statistic θ̂(X) being the ML estimator.
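Before the analytic examples below, a purely numerical illustration of the principle may
help: for a hypothetical B(1, θ) sample the likelihood is evaluated on a grid of θ values and
the grid maximiser is compared with x̄. This is only a sketch, assuming NumPy; the
analytic treatment of this case is the subject of Example 5.

```python
import numpy as np

# Numerical illustration of the maximum likelihood principle for B(1, theta);
# the observed sample below is hypothetical.
x = np.array([1, 0, 1, 1, 0, 1, 1, 0])
t, n = x.sum(), len(x)

theta_grid = np.linspace(0.001, 0.999, 999)   # crude search over the interior of Theta
L = theta_grid ** t * (1 - theta_grid) ** (n - t)   # L(theta, x) = theta^t (1 - theta)^(n - t)

theta_mle = theta_grid[np.argmax(L)]
print(theta_mle, x.mean())   # the grid maximiser should be close to x-bar = t/n
```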

4. Remarks. (i) An intuitive understanding of the principle is most readily seen in
the discrete case, with an approximation argument extending the idea to the continuous
variable case. For discrete sample variables the likelihood function is simply
Pr(X1 = x1, X2 = x2, …, Xn = xn) and, with x1, x2, …, xn fixed, the principle asserts that
θ̂(x) should be chosen to maximize the probability of obtaining the sample values
x1, x2, …, xn which actually were obtained.

(ii) In most cases, especially when differentiation is to be used, it is convenient to work
with the natural logarithm of L(θ,x), that is, with the function log L(θ,x), known as the
log likelihood. This is possible because the log function is strictly increasing on (0,∞),
which implies that L(θ,x) and log L(θ,x) attain their extrema at the same values of θ.

(iii) If the likelihood function is differentiable (in θi), possible candidates for the MLE are
the values of (θ1, . . . , θk) that solve

∂/∂θi log L(θ, x) = 0, i = 1, . . . , k.

Note that the solutions of the above equations are only possible candidates for the MLE,
since the first derivative being zero is only a necessary condition for a maximum, not a
sufficient condition. Furthermore, the zeroes of the first derivatives only locate extreme
points in the interior of the domain of the function. If the extrema occur on the boundary,
the first derivative may not be zero.

5. Example. Suppose that the sample random variables X1, X2, …, Xn are a random
sample from the binomial B(1,θ) distribution.
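A sketch of the usual calculation: here L(θ, x) = θ^t (1 − θ)^(n−t) with t = x1 + x2 + … + xn,
so log L(θ, x) = t log θ + (n − t) log(1 − θ). For 0 < θ < 1, d/dθ log L(θ, x) = t/θ − (n − t)/(1 − θ),
which vanishes at θ = t/n, and checking the sign of the derivative confirms this is a
maximum when 0 < t < n (when t = 0 or t = n the likelihood is monotone and the
maximum is again at θ = t/n). Hence θ̂(x) = t/n = x̄, with ML estimator θ̂(X) = X̄.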

6. Example. Suppose that the sample variables are a random sample from a population
distributed N(µ, σ2), where µ and σ are unknown.
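A sketch of the usual calculation: log L(µ, σ², x) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{n} (xi − µ)².
Setting the partial derivatives with respect to µ and σ² to zero gives µ̂ = x̄ and
σ̂² = (1/n) ∑_{i=1}^{n} (xi − x̄)². Note that this ML estimator of σ² uses the divisor n rather
than n − 1, so it is not the unbiased estimator S² met earlier – illustrating the remark that
neither method automatically produces unbiased estimators.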

7. Example. Suppose the sample variables X1, X2…, Xn are a random sample from a
population having the uniform distribution U(0, θ) with p.d.f. f(x, θ) = 1/θ for 0 < x ≤ θ, and is zero otherwise. 10