Chapter 11
Frequentist Techniques for Parameter Estimation
We use the data from Example 3.2 to motivate some of the issues which we address in this chapter.
Example 11.1. Consider the height-weight data from the 1975 World Almanac and Book of Facts [125], which we compile in Table 11.1 and plot in Figure 11.1. Based on the behavior, we consider the quadratic observation model
\[
Y_i = \theta_0 + \theta_1 (x_i/12) + \theta_2 (x_i/12)^2 + \varepsilon_i, \quad i = 1, \dots, 15, \tag{11.1}
\]
where $x_i$ is the height in inches and $Y_i$ is the corresponding weight. We denote the vector of parameters by $\theta = [\theta_0, \theta_1, \theta_2]^T$ and assume that the errors $\varepsilon_i$ are unbiased and identically distributed with variance $\sigma^2$.
Height (in)   | 58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
Weight (lbs)  | 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164

Table 11.1. Height-weight data from [125].
Figure 11.1. Behavior of height-weight data from [125]: weight (lbs) versus height (in).
We note that the model exhibits a nonlinear dependence on the independent variable $x_i$ but a linear dependence on the parameters $\theta$.
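To make the chapter's calculations easy to reproduce, it is convenient to have the Table 11.1 data in machine-readable form. The following is a minimal sketch in Python with NumPy; the array names heights and weights and the helper quadratic_model are ours, not part of the original text.

import numpy as np

# Height-weight data from Table 11.1 (heights in inches, weights in lbs)
heights = np.arange(58, 73)                 # 58, 59, ..., 72, so n = 15
weights = np.array([115., 117., 120., 123., 126., 129., 132., 135.,
                    139., 142., 146., 150., 154., 159., 164.])

def quadratic_model(theta, x):
    """Deterministic part of the observation model (11.1)."""
    s = x / 12.0                            # x_i/12: height in feet
    return theta[0] + theta[1] * s + theta[2] * s**2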
In this chapter, we address the following issues.
• Construct unbiased frequentist estimators $\hat\theta$ and $\hat\sigma^2$ for the parameters $\theta$ and error variance $\sigma^2$. Construct a covariance matrix $V$ for the parameters: Sections 11.1.1–11.1.3.
• Determine properties of the distribution for $\hat\theta$ and construct confidence intervals: Sections 11.1.4 and 11.1.5.
• Construct prediction intervals for heights $x^*$ not used for inference: Section 11.2.
• Provide a framework for nonlinear regression with scalar observations: Section 11.3.
To provide a framework for statistical inference, we employ the additive observation model
\[
Y_i = f(\xi_i, \theta) + \varepsilon_i, \quad i = 1, \dots, n, \tag{11.2}
\]
detailed in Section 6.2.1, which relates measurements $Y_i$ to model outputs $f(\xi_i, \theta)$. Here $\xi_i$ denotes values of independent variables, such as time $t_i$, polarization $P_i$, space $x_i$, or height $x_i$ in the previous example. As in previous chapters, $\theta = [\theta_1, \dots, \theta_p]$ denotes calibration parameters, which can include initial or boundary conditions. In this chapter, we delineate between random variables and realizations. We respectively denote the random and realized measurement errors by $\varepsilon_i$ and $\epsilon_i$ and the resulting observations by $Y_i$ and $y_i$.
The model (11.2) can be expressed in vector form as
\[
Y = f(\theta) + \varepsilon,
\]
where $Y = [Y_1, \dots, Y_n]$, $f(\theta) = [f(\xi_1, \theta), \dots, f(\xi_n, \theta)]$, and $\varepsilon = [\varepsilon_1, \dots, \varepsilon_n]$. Linearly parameterized models have the form
\[
Y = X\theta + \varepsilon,
\]
where $X$ denotes the $n \times p$ design matrix. We illustrate in subsequent examples how $X$ can encapsulate various dependencies on $\xi_i$.
The mathematical inverse problem associated with parameter estimation can then be formulated as follows: given values of $\xi_i$ and measurements $Y_i$, determine $\theta$ in a robust manner. The associated statistical inverse problem (sometimes referred to as inverse uncertainty quantification) is to additionally quantify uncertainties associated with $\theta$ due to the measurement errors. The assumptions required to approximate $\theta$ and quantify its uncertainty define frequentist and Bayesian techniques for parameter estimation.
As detailed in Section 4.8.1, a basic tenet of frequentist inference is the assumption that parameters are fixed but possibly unknown. We let $\theta_0$ denote the true but unknown parameter values that generated the observations $Y = [Y_1, \dots, Y_n]^T$. The deterministic nature of $\theta$ dictates that $f(\xi_i, \theta)$ is a deterministic quantity and necessitates that we construct an estimator $\hat\theta$ that estimates $\theta_0$ in a statistically reasonable manner.
We detail linear regression in Section 11.1 and nonlinear regression for the observation model (11.2) in Section 11.3. We refer readers to [37] and Section 7.3.2 of [357] for discussion of nonlinear regression with multiple responses.
11.1 Linear Regression
We focus here on models that depend linearly on the parameters. This includes the linear Helmholtz energy detailed in Example 3.1, convolution models for acoustics, and models employed in image processing and X-ray tomography. We refer readers to [157] for details regarding linear regression.
We employ the statistical observation model
\[
Y = X\theta_0 + \varepsilon, \tag{11.3}
\]
where $Y = [Y_1, \dots, Y_n]^T$ and $\varepsilon = [\varepsilon_1, \dots, \varepsilon_n]^T$ are random vectors and the $n \times p$ design matrix $X$ is deterministic and known. We let $\theta_0$ denote the vector of true but unknown parameters and let $y = [y_1, \dots, y_n]^T$ denote realizations or observations from an experiment with observation errors $\epsilon = [\epsilon_1, \dots, \epsilon_n]$. Throughout this discussion, we assume that there are more measurements than parameters so that $n > p$.
Example 11.2. Consider the quadratic model (11.1), which we employed in Example 11.1 to model height-weight data from [125]. Here $n = 15$, $p = 3$, and the $n \times p$ design matrix is
\[
X = \begin{bmatrix} 1 & x_1/12 & (x_1/12)^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n/12 & (x_n/12)^2 \end{bmatrix}. \tag{11.4}
\]
We note that the weight exhibits a quadratic dependence on the height but depends linearly on the parameters.

Assumption 11.3. We assume that observation errors are unbiased and iid with fixed but unknown variance $\sigma_0^2$; hence for $i, j = 1, \dots, n$,
(i) $E(\varepsilon_i) = 0$,
(ii) $\mathrm{var}(\varepsilon_i) = \sigma_0^2$, $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
For initial analysis, we make no further assumption regarding a distribution for $\varepsilon_i$.
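As a sketch of how the design matrix (11.4) is assembled in practice, again assuming NumPy and the heights array from the earlier sketch:

import numpy as np

heights = np.arange(58, 73)                      # Table 11.1 heights, n = 15
s = heights / 12.0                               # x_i/12: heights in feet

# Design matrix (11.4): columns 1, x_i/12, (x_i/12)^2
X = np.column_stack([np.ones_like(s), s, s**2])
print(X.shape)                                   # (15, 3): n = 15 > p = 3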
The first objective is to construct unbiased estimators $\hat\theta$ and $\hat\sigma^2$ for the unknown parameters $\theta_0$ and $\sigma_0^2$.
11.1.1 Parameter Estimator and Estimate
To construct an estimator $\hat\theta$ for $\theta_0$, we seek the $\theta$ that minimizes the OLS functional
\[
J(\theta) = (Y - X\theta)^T (Y - X\theta). \tag{11.5}
\]
For scalar parameters, we would minimize (11.5) by setting the derivative with respect to $\theta$ equal to 0 and solving for $\theta$. For vector-valued problems, this is achieved using the gradient operations
\[
\nabla_\theta J = 2[\nabla_\theta (Y - X\theta)^T][Y - X\theta] = 0, \qquad \nabla_\theta (Y - X\theta)^T = -\nabla_\theta \theta^T X^T = -X^T.
\]
This yields the least squares estimator
\[
\hat\theta_{\mathrm{OLS}} = (X^T X)^{-1} X^T Y. \tag{11.6}
\]
The realization
\[
\theta_{\mathrm{OLS}} = (X^T X)^{-1} X^T y \tag{11.7}
\]
is the least squares estimate for the unknown true parameter $\theta_0$.

Remark 11.4. Throughout this chapter, we will discuss only OLS estimators and estimates. To simplify notation, we thus drop the subscript OLS and let $\hat\theta = \hat\theta_{\mathrm{OLS}}$ and $\theta = \theta_{\mathrm{OLS}}$ denote the least squares estimator and estimate.

Whereas the normal equations (11.7) provide an analytic minimum for (11.5), they are typically ill-conditioned for moderate to large numbers of parameters. In practice, it is common to instead solve the minimization problem (11.5) numerically to avoid this ill-conditioning. We discuss optimization techniques in Section 11.3.1.
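The following sketch computes the estimate (11.7) for the Example 11.2 data both via the normal equations and via NumPy's SVD-based least squares solver, one common numerically stabler route of the kind alluded to above; the variable names are ours, and the sketch assumes the data arrays introduced earlier.

import numpy as np

heights = np.arange(58, 73)
weights = np.array([115., 117., 120., 123., 126., 129., 132., 135.,
                    139., 142., 146., 150., 154., 159., 164.])
s = heights / 12.0
X = np.column_stack([np.ones_like(s), s, s**2])     # design matrix (11.4)

# Least squares estimate (11.7) via the normal equations; note that
# cond(X^T X) = cond(X)^2, which is the source of the ill-conditioning
theta_normal = np.linalg.solve(X.T @ X, X.T @ weights)

# Numerically stabler alternative: an SVD-based least squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, weights, rcond=None)

print(theta_normal)                                 # estimate of theta_0
print(np.allclose(theta_normal, theta_lstsq))       # agree for this small problem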
11.1.2 Parameter Estimator Properties

Result 11.5. The parameter estimator $\hat\theta$ has the mean and covariance matrix
(i) $E(\hat\theta) = \theta_0$,
(ii) $V(\hat\theta) = \sigma_0^2 (X^T X)^{-1}$.
Relation (i) follows directly from (11.6) since
\[
E(\hat\theta) = E[(X^T X)^{-1} X^T Y] = (X^T X)^{-1} X^T E(Y) = (X^T X)^{-1} X^T X \theta_0 = \theta_0.
\]
Hence $\hat\theta$ provides an unbiased estimator for the true parameter. To establish the covariance relation, we let $A = (X^T X)^{-1} X^T$ and note that
\[
V(\hat\theta) = E[(\hat\theta - \theta_0)(\hat\theta - \theta_0)^T] = E[(\theta_0 + A\varepsilon - \theta_0)(\theta_0 + A\varepsilon - \theta_0)^T] = A\, E(\varepsilon \varepsilon^T)\, A^T = \sigma_0^2 A A^T = \sigma_0^2 (X^T X)^{-1},
\]
where we have used $\hat\theta = A Y = A(X\theta_0 + \varepsilon) = \theta_0 + A\varepsilon$ and, from Assumption 11.3, $E(\varepsilon \varepsilon^T) = \sigma_0^2 I$. Note that $A A^T = (X^T X)^{-1} X^T X (X^T X)^{-1} = (X^T X)^{-1}$.
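Result 11.5 is also easy to check empirically. The sketch below simulates many synthetic data sets from (11.3) using the Example 11.2 design matrix, a hypothetical $\theta_0$ and $\sigma_0$ of our choosing, and Gaussian errors purely for simulation convenience (Assumption 11.3 itself imposes no distribution), and compares the sample covariance of the resulting estimates with $\sigma_0^2 (X^T X)^{-1}$.

import numpy as np

rng = np.random.default_rng(0)

heights = np.arange(58, 73)
s = heights / 12.0
X = np.column_stack([np.ones_like(s), s, s**2])     # design matrix (11.4)

theta0 = np.array([260.0, -88.0, 12.0])             # hypothetical true parameters
sigma0 = 2.0                                        # hypothetical error std. dev.
M = 20000                                           # number of synthetic data sets

A = np.linalg.solve(X.T @ X, X.T)                   # A = (X^T X)^{-1} X^T

# Each row of Y is one realization Y = X theta0 + eps; each row of
# theta_hat is the corresponding least squares estimate A Y
eps = sigma0 * rng.standard_normal((M, heights.size))
Y = X @ theta0 + eps
theta_hat = Y @ A.T

V_empirical = np.cov(theta_hat, rowvar=False)       # sample covariance of estimates
V_theory = sigma0**2 * np.linalg.inv(X.T @ X)       # Result 11.5 (ii)

# Relative discrepancy shrinks as M grows (law of large numbers)
rel_err = np.abs(V_empirical - V_theory).max() / np.abs(V_theory).max()
print(rel_err)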