
Slide 2.1 / 2.104
2 General inference problem
2.1 Measurement precision
2.2 Statistical Models
2.3 Inference problem
2.4 Goals in Statistical Inference


2.5 Statistical decision theoretic approach to inference
Chapter 2 : General inference problem MATH3821: Page 79 / 756

2.1 Measurement precision
We studied different probability models because they can be used to describe the population of interest. Finding a good model is our final goal. On the way towards this goal, we use the data to identify the parameters that describe the models.
We stress that the statistical inference problem arises precisely because we do not know the exact value of the parameter in the model description and we use the data to work out a proxy for the parameter.
The statistician is confronted with the problem of drawing a conclusion about the population by using the limited information from the dataset.
The purpose of Statistical Inference is to draw conclusions from data.
The conclusions might be about predicting further outcomes, evaluating risks of events, testing hypotheses, among others.
In all cases, inference about the population is to be drawn from limited information contained in the sample.
The most common situation in Statistics:
an experiment has been performed, yielding i.i.d. outcomes;
the possible results are real numbers that form a vector of observations x = (x1, x2, . . . , xn);
the appropriate sample space is Rn;
there is typically a "hidden" mechanism that generates the data; we are looking for ways to identify it.
Models will describe this mechanism in some simplistic but hopefully useful way.
For the model to be more trustworthy, continuous variables, such as time and interval measurements, should be treated as such where feasible. However, in practice, only discrete events can actually be observed.
Thus we record with some unit of measurement, ∆, determined by the precision of the measuring instrument. This unit of measurement is always finite in any real situation.
If empirical observations were truly continuous, then, with probability one, no two observed responses would ever be identical. This fact will sometimes be used in our theoretical derivations.
On the other hand, real-life empirical observations are discrete. This fact will be utilized to keep some of the proofs simpler: in many cases we will deal with the discrete case only, thus avoiding more involved measure-theoretic arguments.
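The contrast between truly continuous and recorded observations can be illustrated numerically; a minimal sketch, with an arbitrary measurement unit ∆ = 0.1 and an assumed normal sample:

```python
import random

random.seed(0)

# Draw "truly continuous" observations from a normal model.
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
print(len(set(xs)) == len(xs))  # ties among raw floats are essentially impossible

# Recording with a finite measurement unit delta discretises the data,
# and repeated values immediately appear (pigeonhole: far fewer than
# 1000 grid points are ever reached by a standard normal sample).
delta = 0.1
recorded = [round(x / delta) * delta for x in xs]
print(len(set(recorded)) < len(recorded))  # True: rounding produces ties
```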
2.2 Statistical Models
Having obtained the observations, we would like to calculate the joint density (in the continuous case):

LX(x) = fX(x1, x2, . . . , xn) = fX1(x1) · fX2(x2) · · · fXn(xn)    (1)

In the discrete case this is just the product of the probabilities for each of the measurements to fall in a suitable interval of length ∆.

If the observations are independent and identically distributed (i.i.d.), then all densities in (1) are the same:

fX1(x) = fX2(x) = · · · = fXn(x) = f(x).

This is the most typical situation we will be discussing in our course.
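Equation (1) in the i.i.d. case can be sketched directly; the N(μ, σ²) model and the toy sample below are illustrative assumptions, and in practice the log of the joint density is preferred to the raw product to avoid numerical underflow:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def joint_density(xs, mu, sigma):
    """L_X(x) = f(x1) * f(x2) * ... * f(xn) for an i.i.d. sample, as in (1)."""
    L = 1.0
    for x in xs:
        L *= normal_pdf(x, mu, sigma)
    return L

def log_joint_density(xs, mu, sigma):
    """Sum of log-densities; numerically safer than the raw product."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in xs)

sample = [1.2, 0.8, 1.1, 0.9, 1.0]  # toy data, for illustration only
print(joint_density(sample, mu=1.0, sigma=0.5))
```

The value is larger for parameter values under which the observed sample is more probable, which is exactly the information inference exploits.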
The need for Statistical Inference arises because, typically, our knowledge about fX(x1, x2, . . . , xn) is incomplete.
Given an inference problem and having collected some data, we construct one or more sets of possible models which may help us to understand the data-generating mechanism.
Basically, statistical models are working assumptions about how the dataset was obtained.
Example 2.9
If our data were counts of accidents within n = 10 consecutive weeks on a busy crossroad, it may be reasonable to assume that a Poisson distribution with an unknown parameter λ has given rise to the data. That is, we may assume that we have 10 independent realisations of a Poisson(λ) random variable.
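The Poisson assumption in Example 2.9 can be sketched numerically; the weekly counts below are hypothetical, and the sample mean is the maximum-likelihood estimate of λ under the i.i.d. Poisson model:

```python
# Hypothetical weekly accident counts for n = 10 consecutive weeks.
counts = [3, 1, 4, 2, 2, 5, 3, 2, 1, 3]

# Under an assumed i.i.d. Poisson(lam) model, the MLE of lam
# is simply the sample mean of the counts.
lam_hat = sum(counts) / len(counts)
print(lam_hat)  # 2.6
```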
Example 2.10
If, on the other hand, we measured the lengths Xi of 20 baby boys at age 4 weeks, it would be reasonable to assume a normal distribution for these data. Symbolically we denote this as follows:
Xi ∼ N(μ, σ2), i = 1, 2, . . . , 20.

The models we use, as seen in the examples above, are usually about the shape of the density or of the cumulative distribution function of the population from which we have sampled.
These models should represent, as much as possible, the available prior theoretical knowledge about the data generating mechanism.
It should be noted that in most cases, we do not exactly know which population distribution to assume for our model.
Suggesting the set of models to be validated is a difficult matter, and there is always a risk involved in this choice.
The reason is that if the contemplated set of models is "too large", many of them will be similar and it will be difficult to single out the model that is best supported by the data.
On the other hand, if the contemplated set of models is "too small", there is a risk that none of them gives an adequate description of the data.
Choosing the most appropriate model usually involves a close collaboration between the statistician and the people who formulated the inference problem.
In general, we can view the statistical model as the triplet (X, P, Θ), where:
X is the sample space (i.e. the set of all possible realizations of X = (X1, X2, . . . , Xn));
P is a family of model functions Pθ(X) that depend on the unknown parameter θ;
Θ is the set of possible θ-values, i.e. the parameter space indexing the models.
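As an illustration only (the class and field names here are our own invention, not standard notation), the triplet (X, P, Θ) can be mirrored in code:

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class StatModel:
    """A statistical model as the triplet (X, P, Theta), in miniature."""
    sample_space: str                          # description of X
    density: Callable[[float, float], float]   # P_theta: (x, theta) -> density / pmf value
    param_space: str                           # description of Theta

# The Poisson model of Example 2.9, written in this form:
poisson_model = StatModel(
    sample_space="vectors of non-negative integer counts",
    density=lambda x, lam: math.exp(-lam) * lam ** x / math.factorial(int(x)),
    param_space="lambda > 0",
)

print(poisson_model.density(3, 2.0))  # P(X = 3) under Poisson(2)
```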
2.3 Inference problem
The statistical inference problem can be formulated:
Once the random vector X has been observed, what can be said about which members of P best describe how it was generated?
The reason we are speaking about a problem here is that we do not know the exact shape of the distribution that generated the data.
The reason that there exists a possibility of making inference rests in the fact that typically a given observation is much more probable under some distributions than under others (i.e. the observations give information about the distribution).
This information should be combined with the a priori information about the distribution in order to carry out the inference. There is always some a priori information; it may be more or less specific.
Parametric Inference
When the a priori information is so specific that the shape of the distribution is known up to a finite number of parameters, i.e. the parameter θ is finite-dimensional, we conduct parametric inference.
Most of the classical statistical inference techniques are based on fairly specific assumptions regarding the population distribution and most typically the description of the population is in a parametric form.
In introductory textbooks, the student just practices applying standard parametric techniques. However, to be successful in practical statistical analysis, one has to be able to deal with situations where standard parametric assumptions are not justified.
Non-parametric Inference
A whole set of methods and techniques is available that may be classified as nonparametric procedures. We will be dealing with them in the later parts of the course.
These procedures allow us to make inferences with few or no assumptions regarding the functional form of the underlying population distribution.
If Θ can only be specified as an infinite-dimensional function space, we speak of non-parametric inference.
Nonparametric inference procedures are applicable in more general situations, which is good. However, if they are applied in a situation where a particular parametric distributional shape does hold, they may be less efficient than a procedure specifically tailored to that parametric case, which is bad.
Robustness approach
The situation in practice might be even more blurred. We may know that the population is "close" to parametrically describable and yet "deviates a bit" from the parametric family.
Going over directly to a purely nonparametric approach in such cases would not properly address the situation, since the idea of a relatively small deviation from the baseline parametric family would be lost. Hence we can use the robustness approach, where we keep the idea of the "ideal" parametric model but allow for small deviations from it.
The aim in such "intermediate" situations is to be "close to efficient" if the parametric model holds, but at the same time "less sensitive" to small deviations from the ideal model. These important issues will be discussed later in the course.
Illustration of robustness
Consequences of applying robustness approach
Bayesian Inference
Another way to classify the Statistical Inference procedures is by the way we treat the unknown parameter θ.
If we treat it as unknown but deterministic, then we are in a non-Bayesian setting. If we consider the set of θ-values as quantities that, before collecting the data, have different probabilities of occurring according to some (a priori) distribution, then we speak about Bayesian inference.
The Bayesian approach allows us to introduce and utilise additional (prior) information when such information is available. This information enters through the prior distribution over the set Θ of parameter values and reflects our prior belief about how likely each parameter value is before obtaining the information from the data.
2.4 Goals in Statistical Inference
The following are the most common goals in inference:
Estimation
Confidence set construction
Hypothesis testing
2.4.1 Estimation
We want to calculate a number (or a k-dimensional vector, or a single function) as an approximation to the numerical characteristic in question.
But let us point out immediately that there is little value in calculating an approximation to an unknown quantity without having an idea of how "good" the approximation is and how it compares with other approximations. Hence questions about confidence interval (or, more generally, confidence set) construction immediately arise.
To quote the famous philosopher and mathematician A.N. Whitehead, in Statistics we always have to "seek simplicity and distrust it".
2.4.2 Confidence set construction
After the observations are collected, further information about the set Θ is added, and it becomes plausible that the true distribution belongs to a smaller family than was originally postulated, i.e., it becomes clear that the unknown θ-value belongs to a subset of Θ.
The problem of confidence set construction arises: i.e., determining a (possibly small) plausible set of θ-values and clarifying the sense in which the set is plausible.
2.4.3 Hypothesis testing
An experimenter or a statistician sometimes has a theory which, when suitably translated into mathematical language, becomes a statement that the true unknown distribution belongs to a smaller family than the originally postulated one.
One would like to formulate this theory in the form of a hypothesis. The data can then be used to infer whether the theory complies with the observations or is in such serious disarray as to indicate that the hypothesis is false.
Deeper insight into all of the above goals of inference, and a deeper understanding of the nature of the problems involved in them, is given by Statistical Decision Theory.
Here we define in general terms what a statistical decision rule is, and it turns out that any of the procedures discussed above can be viewed as a suitably defined decision rule.
Moreover, defining optimal decision rules as solutions to suitably formulated constrained mathematical optimization problems will help us to find “best” decision rules in many practically relevant situations.
2.5 Statistical decision theoretic approach to inference
Statistical Decision Theory studies all inference problems (estimation, confidence set construction, hypothesis testing) from a unified point of view.
All parts of the decision making process are formally defined, a desired optimality criterion is formulated and a decision is considered optimal if it optimizes the criterion.
2.5.1 Introduction
Statistical Decision Theory may be considered as the theory of a two-person game, with one player being the statistician and the other being nature. To specify the game, we define:
Θ – the set of states (of nature);
A – the set of actions (available to the statistician);
L(θ, a) – a real-valued (loss) function on Θ × A.
There are some important differences between the mathematical theory of games (which only involves the above triplet) and Statistical Decision Theory. The most important differences are:
In a two-person game, both players try to maximize their winnings (or minimize their losses), whereas in decision theory nature chooses a state without this aim in mind. Nature cannot be considered an "intelligent opponent" who behaves "rationally".
There is no complete information available (to the statistician) about nature's choice.
In Statistical Decision Theory, nature always has the first move, choosing the "true state" θ.
The statistician has the chance (and this is most important) to gather partial information on nature's choice by sampling or performing an experiment. This gives the statistician data X = (X1, X2, . . . , Xn) whose distribution L(X|θ) depends on θ. The statistician uses these data to work out their decision.
Definition 2.1
A (deterministic) decision function is a function d : X → A from the sample space to the set of actions.
There is a non-negative loss (a random variable) L(θ, d(X)) incurred by this action.
We define the risk
Eθ L(θ, d(X)) = R(θ, d).
For a fixed decision rule d, this is a function of θ (the risk function). R(θ, d) is the average loss of the statistician when nature's true state is θ and the statistician uses the rule d.
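As a sketch, the risk can be approximated by Monte Carlo simulation. Here we assume, purely for illustration, an i.i.d. N(θ, 1) sample, quadratic loss, and the sample mean as the decision rule:

```python
import random

random.seed(1)

def risk(theta, d, loss, n=10, reps=20000):
    """Monte Carlo approximation of R(theta, d) = E_theta L(theta, d(X))."""
    total = 0.0
    for _ in range(reps):
        x = [random.gauss(theta, 1.0) for _ in range(n)]  # X ~ N(theta, 1), i.i.d.
        total += loss(theta, d(x))
    return total / reps

def mean_rule(x):
    """Decision rule d(X): report the sample mean."""
    return sum(x) / len(x)

def sq_loss(theta, a):
    """Quadratic loss L(theta, a)."""
    return (theta - a) ** 2

# For the sample mean under N(theta, 1), the exact risk is 1/n = 0.1,
# so the Monte Carlo value should be close to 0.1.
print(risk(2.0, mean_rule, sq_loss))
```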
2.5.2 Examples
Example 2.11 (Hypothesis testing)
Assume that a data vector X ∼ f(x; θ). Consider testing H0 : θ ≤ θ0 versus H1 : θ > θ0, where θ ∈ R1 is a parameter of interest.
Let A = {a1, a2} and Θ = R1. Here a1 denotes the action "accept H0", whereas a2 denotes the action "reject H0". Let
D = {set of all functions from X into A}.
The 0-1 loss is
L(θ, a1) = 0 if θ ≤ θ0, and 1 if θ > θ0;
L(θ, a2) = 1 if θ ≤ θ0, and 0 if θ > θ0.
Then we have
R(θ, d) = E L(θ, d(X)) = L(θ, a1) Pθ(d(X) = a1) + L(θ, a2) Pθ(d(X) = a2),
so that
R(θ, d) = Pθ(d(X) = a2) if θ ≤ θ0, and R(θ, d) = Pθ(d(X) = a1) if θ > θ0.
In other words:
if θ ≤ θ0 : R(θ, d) = Pθ(reject H0) = probability of a Type I error;
if θ > θ0 : R(θ, d) = Pθ(accept H0) = probability of a Type II error.
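A sketch of this risk computation for a concrete rule, assuming (for illustration only) that X is a single N(θ, 1) observation, θ0 = 0, and d rejects H0 when X exceeds a cutoff c:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def risk(theta, theta0=0.0, c=1.645):
    """Risk of the rule 'reject H0 iff X > c' under X ~ N(theta, 1)
    with 0-1 loss, for H0: theta <= theta0 vs H1: theta > theta0."""
    if theta <= theta0:
        return 1.0 - phi(c - theta)  # P_theta(reject H0): Type I error probability
    else:
        return phi(c - theta)        # P_theta(accept H0): Type II error probability

print(risk(0.0))  # Type I error at the boundary, about 0.05 for c = 1.645
print(risk(3.0))  # Type II error at theta = 3
```

Plotting risk(theta) over a grid of θ-values would reproduce the familiar picture of the two error probabilities as one risk function.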
Example 2.12 (Estimation)
Let now A = Θ, with the interpretation that each action corresponds to selecting a point θ ∈ Θ. Every d(X) maps X into Θ, and if we choose
L(θ, d(X)) = (θ − d(X))2 (quadratic loss)
then the decision rule d (which we can call an estimator) has risk
R(θ, d) = Eθ(d(X) − θ)2 = MSEθ(d(X)).
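A small worked example of comparing risks under quadratic loss, for an assumed Bernoulli(θ) model (the shrinkage estimator below is our illustrative choice): neither estimator has uniformly smaller MSE, which already hints at why "best for every θ" is too much to ask of a decision rule.

```python
# Risk (MSE) of two estimators of theta for X1..Xn i.i.d. Bernoulli(theta),
# computed analytically rather than by simulation.
# d1 = sample mean: unbiased, MSE = theta(1-theta)/n.
# d2 = (S + 1)/(n + 2), S = X1 + ... + Xn: biased shrinkage estimator.

def mse_mean(theta, n):
    """MSE of the sample mean (variance only; no bias)."""
    return theta * (1 - theta) / n

def mse_shrink(theta, n):
    """MSE of d2 = (S + 1)/(n + 2): variance plus squared bias."""
    bias = (n * theta + 1) / (n + 2) - theta
    var = n * theta * (1 - theta) / (n + 2) ** 2
    return var + bias ** 2

n = 10
for theta in (0.1, 0.5, 0.9):
    print(theta, mse_mean(theta, n), mse_shrink(theta, n))
```

Near θ = 0.5 the shrinkage estimator has smaller risk; near the endpoints the sample mean wins.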
2.5.3 Randomized decision rule
We will see later, when studying optimality in the hypothesis testing context, that the set of deterministic decision rules D is not convex, and it is difficult to develop a decent mathematical optimization theory over it.
This set is also very small, and examples show that a simple randomization of given deterministic rules often yields better rules in the sense of risk minimization. This explains the introduction of randomized decision rules.
Definition 2.2
A rule δ which chooses di with probability wi, where Σ wi = 1, is a randomized decision rule.
For the randomized decision rule δ we have
L(θ, δ(X)) = Σ wi L(θ, di(X)) and R(θ, δ) = Σ wi R(θ, di).
The set of all randomized decision rules generated by the set D in the above way will be denoted by D.
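A minimal sketch of the convex-combination property of randomized-rule risks (the numerical risks below are made up):

```python
def randomized_risk(weights, risks):
    """R(theta, delta) = sum_i w_i * R(theta, d_i), for a fixed theta."""
    assert abs(sum(weights) - 1.0) < 1e-12  # the w_i must sum to 1
    return sum(w * r for w, r in zip(weights, risks))

# Hypothetical risks of two deterministic rules d1, d2 at some fixed theta:
r1, r2 = 0.30, 0.10

# delta chooses d1 with probability 0.25 and d2 with probability 0.75:
print(randomized_risk([0.25, 0.75], [r1, r2]))  # 0.25*0.30 + 0.75*0.10 = 0.15
```

The risk of δ always lies between the component risks, which is exactly the convexity that the set of deterministic rules lacks.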
2.5.4 Optimal decision rules
Given a game (Θ, A, L) and a random vector X whose distribution depends on θ ∈ Θ, what (randomized) decision rule δ should the statistician choose in order to perform "optimally"?
This is a question that is easy to pose but usually difficult to answer.
The reason is that usually a uniformly best decision rule does not exist.
