Econometrica, Vol. 50, No. 4 (July, 1982)
LARGE SAMPLE PROPERTIES OF GENERALIZED METHOD OF MOMENTS ESTIMATORS1
BY LARS PETER HANSEN
This paper studies estimators that make sample analogues of population orthogonality conditions close to zero. Strong consistency and asymptotic normality of such estimators are established under the assumption that the observable variables are stationary and ergodic. Since many linear and nonlinear econometric estimators reside within the class of estimators studied in this paper, a convenient summary of the large sample properties of these estimators, including some whose large sample properties have not heretofore been discussed, is provided.
1. INTRODUCTION
IN THIS PAPER we study the large sample properties of a class of generalized method of moments (GMM) estimators which subsumes many standard econometric estimators. To motivate this class, consider an econometric model whose parameter vector we wish to estimate. The model implies a family of orthogonality conditions that embed any economic theoretical restrictions that we wish to impose or test. For example, assumptions that certain equations define projections or that particular variables are predetermined give rise to orthogonality conditions in which expected cross products of unobservable disturbances and functions of observable variables are equated to zero. Heuristically, identification requires at least as many orthogonality conditions as there are coordinates in the parameter vector to be estimated. The unobservable disturbances in the orthogonality conditions can be replaced by an equivalent expression involving the true parameter vector and the observed variables. Using the method of moments, sample estimates of the expected cross products can be computed for any element in an admissible parameter space. A GMM estimator of the true parameter vector is obtained by finding the element of the parameter space that sets linear combinations of the sample cross products as close to zero as possible.
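For concreteness, a minimal numerical sketch of this idea is given below. The data, moment function, and identity weighting matrix are illustrative assumptions introduced here, not objects taken from the paper.

import numpy as np
from scipy.optimize import minimize

# Illustrative sketch of the generic GMM idea: simulated data, a hypothetical
# moment function, and an identity weighting matrix (none of this is from the
# paper itself).
rng = np.random.default_rng(0)
N = 1000
z = rng.normal(size=N)                       # instrument
x = z + 0.5 * rng.normal(size=N)             # regressor correlated with z
y = 2.0 * x + rng.normal(size=N)             # true parameter beta_0 = 2

def sample_moments(beta):
    """Sample analogue of E[(y - x*beta) * (1, z)'], the orthogonality conditions."""
    u = y - x * beta
    return np.array([u.mean(), (u * z).mean()])

def criterion(b):
    g = sample_moments(b[0])
    return g @ g                             # squared distance of the sample moments from zero

b_N = minimize(criterion, x0=[0.0]).x[0]     # GMM estimate: moments made as close to zero as possible
print(round(b_N, 3))                         # close to 2.0 in large samples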
In studying strong consistency of GMM estimators, we show how to construct a class of criterion functions with minimizers that converge almost surely to the true parameter vector. The resulting estimators have the interpretation of making the sample versions of the population orthogonality conditions as close as possible to zero according to some metric or measure of distance. We use the metric to index the alternative estimators. This class of estimators includes the nonlinear instrumental variables estimators considered by, among others, Amemiya [1, 2], Jorgenson and Laffont [24], and Gallant [11].2 There the
¹The author acknowledges helpful comments from several colleagues and an anonymous referee. Special thanks are given to a colleague who played a prominent role in the formulation of this paper.
2We include versions of two- and three-stage least squares under the heading of instrumental variables procedures.
population orthogonality conditions equate expected cross products of instruments and serially independent disturbances to zero. In our treatment we work directly with expressions for the population orthogonality conditions and implicitly permit the disturbance terms used in construction of the orthogonality conditions to be both serially correlated and conditionally heteroskedastic.³ We allow ourselves flexibility in choosing the distance measure because it permits choosing measures that are computationally convenient and because the choice of distance measure influences the asymptotic distribution of the resulting estimator.
In studying asymptotic normality, we view estimation in a different but closely related fashion. We follow Sargan [29, 30] and consider estimators that have the interpretation of setting linear combinations of the sample orthogonality conditions to zero, at least asymptotically, where the number of linear combinations that are set to zero is equal to the number of coordinates in the parameter vector to be estimated. We index alternative estimators by an associated weighting matrix that selects the particular linear combinations of orthogonality conditions that are used in estimation. Since alternative weighting matrices give rise to estimators with alternative asymptotic covariance matrices, we describe how to obtain an asymptotically optimal weighting matrix. The estimators considered in our treatment of consistency are shown to reside in the class of estimators considered in our treatment of asymptotic normality by examining the first-order conditions of the minimization problems used to construct the class of consistent estimators. It turns out, however, that our discussion of asymptotic normality is sufficiently general to include other consistent estimators that are obtained from minimizing or maximizing other criterion functions which have first-order conditions that satisfy the specification of our generic GMM estimator, e.g., least squares or quasi-maximum likelihood estimators. Again our discussion of large sample properties permits the disturbances implicitly used in the orthogonality conditions to be both conditionally heteroskedastic and serially correlated.⁴
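To indicate how the minimization view connects to the view of setting linear combinations of sample orthogonality conditions to zero, consider the following first-order condition calculation. This is our own sketch, written in the notation $a_N$, $g_N$ introduced in Section 2 below, and it assumes differentiability and an interior minimizer:

$\frac{\partial}{\partial \beta} \bigl| a_N\, g_N(\beta) \bigr|^2 \Big|_{\beta = b_N} = 2 \left[ \frac{\partial g_N}{\partial \beta}(b_N) \right]' a_N' a_N\, g_N(b_N) = 0$,

so that $q$ particular linear combinations of the $r$ sample orthogonality conditions $g_N(b_N)$ are set to zero, with the matrix $a_N' a_N$ determining which combinations are used.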
There are a variety of applications in which it is important to possess an asymptotic theory which accommodates these features. In testing market efficiency and the rationality of observed forecasts using least squares procedures, one oftentimes encounters situations in which the implied forecast interval
3Sargan [30] treats the case in which disturbances can follow a low-order autoregression and can be filtered to remove serial correlation prior to the construction of the orthogonality conditions. White [34] discusses linear instrumental variables estimation in which observation vectors are independent but not necessarily identically distributed. White allows heteroskedasticity to exist both conditionally and unconditionally, but places restrictions on higher moments of observable and unobservable variables that are not needed in this paper. Here we think of heteroskedasticity emerging because of some implicit conditioning, do not impose independence, but maintain a stationarity assumption.
4Engle [9] allows for conditional heteroskedasticity in regression models with serially uncorrelated disturbances. He proposes a maximum likelihood procedure for estimating such models when the form of the heteroskedasticity is specified a priori. White [32, 33, 34] has studied the asymptotic distribution of a variety of estimators for cross-sectional models which allow for both conditional and unconditional forms of heteroskedasticity. See Footnote 3.
exceeds the sampling interval, giving rise to a serially correlated forecast error [4, 14, 17]. Least squares procedures can be used since the hypothetical forecast error should be orthogonal to the observed forecast and to any other variables in the information set of economic agents when the forecast is made. On the other hand, generalized least squares procedures can result in inconsistent parameter estimators (see Sims [31] and Hansen and Hodrick [17]). Brown and Maital [4], Hansen and Hodrick [17], and Hakkio [14] rely on the asymptotic distribution theory in this paper to carry out least squares estimation and inference for such applications.
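As a simple illustration of this situation (our own example, not taken from the papers cited), suppose agents forecast $y_{t+k}$ at date $t$ while the data are sampled every period. Under rationality the forecast error satisfies

$u_{t+k} = y_{t+k} - E[y_{t+k} \mid I_t]$, with $E[u_{t+k} z_t] = 0$ for any $z_t \in I_t$,

so least squares based on these orthogonality conditions remains consistent; yet $u_{t+k}$ and $u_{t+k-j}$ share news arriving between dates $t$ and $t+k-j$, and therefore are generally correlated at lags $j = 1, \ldots, k-1$.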
Hansen and Sargent [18, 19] have considered linear rational expectations
models in which economic agents are assumed to forecast infinite geometrically-declining sums of forcing variables and the econometrician employs only a subset of the variables in the information set of economic agents. The disturbance terms in these models are serially correlated but orthogonal to current and past values of a subset of variables which are not strictly exogenous. Hansen and Sargent [18, 19] discuss how to apply the techniques developed in this paper to those rational expectations models. McCallum [28] has shown how other types of linear rational expectations models with disturbance terms that have low-order autoregressive representations lead to equations that can be estimated consistently using standard instrumental variables procedures. He notes, however, that the associated asymptotic distribution of the estimators has to be modified in the manner suggested in this paper to allow the disturbances to be serially correlated. In considering models like those studied by McCallum [28], Cumby, Huizinga, and Obstfeld [5] propose a two-step, two-stage least squares estimator that resides within the class of estimators examined in this paper.⁵
Hansen and Singleton [20] have studied how to test restrictions and estimate parameters in a class of nonlinear rational expectations models. They construct generalized instrumental variables estimators from nonlinear stochastic Euler equations and note that the implied disturbance terms in these models are conditionally heteroskedastic and in many circumstances serially correlated. Their estimators are special cases of the generic GMM estimator of this paper. Finally, Avery, Hansen, and Hotz [3] describe how to use methods in this paper to obtain computationally convenient procedures for estimating multiperiod probit models. The vector disturbance term implicit in their orthogonality conditions also is conditionally heteroskedastic.
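As an illustration of the kind of orthogonality condition involved (a standard constant-relative-risk-aversion example written in our own notation, not a quotation from Hansen and Singleton [20]), a stochastic Euler equation of the form

$E\bigl[ \delta (c_{t+1}/c_t)^{-\gamma} R_{t+1} - 1 \mid I_t \bigr] = 0$

implies, for any vector of instruments $z_t$ in the agents' date-$t$ information set,

$E\bigl[ \bigl( \delta (c_{t+1}/c_t)^{-\gamma} R_{t+1} - 1 \bigr) \otimes z_t \bigr] = 0$,

which is a special case of the orthogonality conditions (1)-(3) below, with a disturbance that is conditionally heteroskedastic and, when multiperiod returns are used, serially correlated.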
In the examples described above, application of the techniques in this paper will not result in asymptotically efficient estimators. However, in these and other examples, a researcher may be willing to sacrifice asymptotic efficiency in exchange for not having to specify completely the nature of the serial correlation and/or heteroskedasticity or in exchange for computationally simpler estimation strategies. As noted above, we do provide a more limited optimality discussion
5Cumby, Huizinga, and Obstfeld [5] proposed their estimator independently of this paper. However, their discussion of its asymptotic distribution exploited results in a precursor to this paper written by the author.
that is patterned after an approach taken by Sargan [29, 30] and can be easily exploited in practice.
The organization of the paper is as follows. The second section provides some consistency results for the GMM estimator under various assumptions about the form of the econometric model. The third section discusses the asymptotic distribution of the GMM estimator and considers the construction of an asymp- totically optimal estimator among the class of estimators that exploit the same orthogonality conditions. The fourth section examines procedures for testing overidentifying restrictions using GMM estimation. Finally, the fifth section contains some concluding remarks.
2. CONSISTENCY OF THE GMM ESTIMATOR
In this section we specify our first form of the GMM estimator and provide some sufficient conditions that insure its almost sure convergence to the parameter vector that is being estimated. Let $\Omega$ denote the set of sample points in the underlying probability space used in our estimation problem, and let $E$ denote the associated expectations operator. We will be working with a $p$-component stochastic process $\{x_n : n \geq 1\}$ defined on this probability space. A finite segment of one realization of this process, i.e., $\{x_n(\omega_0) : 1 \leq n \leq N\}$ for sample size $N$ and for some $\omega_0 \in \Omega$, can be thought of as the observable data series that the econometrician employs.
ASSUMPTION 2.1: $\{x_n : n \geq 1\}$ is stationary and ergodic.
We introduce a parameter space $S$ that is a subset of $R^q$ (or its compactification) and let $\beta_0$ be the element of $S$ that we wish to estimate.
ASSUMPTION 2.2: $(S, \rho)$ is a separable metric space.
One possibility is to use the standard absolute value norm on $R^q$ to define $\rho$. It is well known that since $S$ is a subset of $R^q$ the resulting metric space is separable. We do not restrict ourselves to this metric in order to allow for $S$ to be a subset of a compactification of $R^q$.
We consider a function $f : R^p \times S \to R^r$ where $R$ is the real line and $r$ is greater than or equal to $q$.
ASSUMPTION 2.3: $f(\cdot, \beta)$ is Borel measurable for each $\beta$ in $S$ and $f(x, \cdot)$ is continuous on $S$ for each $x$ in $R^p$.
The function f provides an expression for the r orthogonality conditions that emerge from the econometric model in the sense indicated by Assumption 2.4.
ASSUMPTION 2.4: $Ef(x_1, \beta)$ exists and is finite for all $\beta \in S$ and $Ef(x_1, \beta_0) = 0$.
A common way to obtain orthogonality conditions is to exploit the assumption that disturbances in an econometric model are orthogonal to functions of a set of variables that the econometrician observes. For example, suppose that the econometric model is given by
(1) $u_n = F(x_n, \beta_0)$, $z_n = G(x_n, \beta_0)$,

(2) $E[u_n \otimes z_n] = 0$.

The vector functions $F$ and $G$ are specified a priori, $u_n$ is an unobservable vector of disturbance terms, $z_n$ is a vector of instrumental variables, and $\otimes$ denotes the Kronecker product. The dependence of $G$ on its second argument is oftentimes trivial. When (2) is satisfied, we can let the function $f$ be given by

(3) $f(x_n, \beta_0) = F(x_n, \beta_0) \otimes G(x_n, \beta_0)$,

and it follows that

$E[f(x_n, \beta_0)] = 0$.
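A minimal computational sketch of (1)-(3) follows; the particular $F$ and $G$ used here are illustrative placeholders, not functions appearing in the paper.

import numpy as np

# Build f(x_n, beta) = F(x_n, beta) (Kronecker product) G(x_n, beta) for a
# single observation x_n = (y, x1, z1, z2); F and G are illustrative.

def F(x, beta):
    """Disturbance vector u_n; here a single regression residual."""
    y, x1, z1, z2 = x
    return np.array([y - beta * x1])

def G(x, beta):
    """Instrument vector z_n; here G does not actually depend on beta."""
    _, _, z1, z2 = x
    return np.array([1.0, z1, z2])

def f(x, beta):
    """The r = 1 * 3 orthogonality conditions f(x_n, beta) = F kron G."""
    return np.kron(F(x, beta), G(x, beta))

# The model asserts E[f(x_n, beta_0)] = 0; averaging f over the observed x_n
# gives the method of moments estimate of this expectation.
print(f(np.array([1.2, 0.5, 0.3, -0.7]), beta=2.0))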
We proceed to describe how to use orthogonality conditions to construct an
estimator of the unknown parameter vector $\beta_0$.
For our discussion of consistency, we introduce a sequence of random weighting matrices $\{a_N : N \geq 1\}$ that are dimensioned $s$ by $r$ where $q \leq s \leq r$. The matrices are random in order to allow for their possible dependence on sample information.
ASSUMPTION 2.5: The sequence of random matrices $\{a_N : N \geq 1\}$ converges almost surely to a constant matrix $a_0$.⁶
These weighting matrices are used in conjunction with a method of moments estimator of $E[f(x_n, \beta)]$ to obtain a sample objective function whose minimizer is our estimator of $\beta_0$. Let
$f_n(\omega, \beta) = f[x_n(\omega), \beta]$,

$g_N(\omega, \beta) = \frac{1}{N} \sum_{n=1}^{N} f_n(\omega, \beta)$,

$h_N(\omega, \beta) = a_N(\omega)\, g_N(\omega, \beta)$,

$B_N(\omega) = \{\beta \in S : |h_N(\omega, \beta)|^2 = \inf_{\beta^* \in S} |h_N(\omega, \beta^*)|^2\}$.
6This matrix convergence is defined as element by element convergence using the absolute value norm on R.
The random function $g_N(\beta)$ is just the method of moments estimator of $E[f(x_n, \beta)]$, $|h_N|^2$ is the sample criterion function to be used in estimation, and $B_N$ is the (random) set of elements in the parameter space $S$ that minimize $|h_N|^2$. The weighting matrices $\{a_N : N \geq 1\}$ can be thought of as defining the metric by which the sample orthogonality conditions $g_N(b_N)$ are made as close as possible to zero.
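The role of the weighting matrices can be made concrete with the following sketch, in which the data and moment function are illustrative and $a_N$ has $s = 1$ row acting on $r = 2$ sample moments for a scalar parameter ($q = 1$), so the criterion penalizes a single linear combination of the two sample moments:

import numpy as np

# Sketch of g_N, h_N = a_N g_N, and the criterion |h_N|^2 for a scalar
# parameter; the data and a_N are illustrative, not from the paper.
rng = np.random.default_rng(1)
N = 400
z1, z2 = rng.normal(size=N), rng.normal(size=N)
x = z1 + 0.5 * z2 + 0.2 * rng.normal(size=N)
y = 1.0 * x + rng.normal(size=N)              # true parameter beta_0 = 1

def g_N(beta):
    """Sample averages of the r = 2 orthogonality conditions."""
    u = y - x * beta
    return np.array([(u * z1).mean(), (u * z2).mean()])

a_N = np.array([[1.0, 0.5]])                   # s = 1 linear combination of the moments

def criterion(beta):
    h = a_N @ g_N(beta)                        # h_N(beta) = a_N g_N(beta)
    return float(h @ h)                        # |h_N(beta)|^2

# B_N is the set of minimizers of the criterion; here one element is located
# by a crude grid search.
grid = np.linspace(0.0, 2.0, 2001)
b_N = grid[np.argmin([criterion(b) for b in grid])]
print(round(b_N, 3), criterion(b_N))           # with s = q the minimized value is near zero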
To estimate $\beta_0$ we choose an element out of $B_N$. More precisely, we employ the following definition.
DEFINITION 2.1: The GMM estimator $\{b_N : N \geq 1\}$ is a sequence of random vectors such that $b_N(\omega) \in B_N(\omega)$ for $N \geq N^*(\omega)$ where $N^*(\omega)$ is less than infinity for almost all $\omega$ in $\Omega$.⁷
The nonlinear instrumental variables estimators discussed by Amemiya [1], Jorgenson and Laffont [24], and Gallant [11] are defined in this manner for appropriate choices of $a_N$. Their instrumental variables estimators assume that the function $f$ satisfies (1)-(3) and in addition that the disturbances are serially independent. They use consistent estimators of $E[u_n u_n']$ and $E[z_n z_n']$ to construct an estimator of $a_0$ where

$a_0' a_0 = \{E[u_n u_n'] \otimes E[z_n z_n']\}^{-1}$.⁸
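The following sketch indicates one way such an estimator of $a_0$ could be formed in practice from first-stage residuals and instruments; the code and its simulated inputs are illustrative assumptions, since the paper itself only specifies the limiting matrix $a_0$.

import numpy as np

# Form a_N with a_N' a_N = { Ehat[u u'] kron Ehat[z z'] }^{-1}, replacing the
# population moments by sample averages computed from preliminary residuals U
# (N-by-m) and instruments Z (N-by-k). Inputs here are simulated placeholders.

def weighting_matrix(U, Z):
    Euu = U.T @ U / U.shape[0]                 # estimate of E[u_n u_n']
    Ezz = Z.T @ Z / Z.shape[0]                 # estimate of E[z_n z_n']
    V = np.kron(Euu, Ezz)                      # estimate of {a_0' a_0}^{-1}
    # Any a_N with a_N' a_N = V^{-1} serves; a Cholesky factor is one choice.
    return np.linalg.cholesky(np.linalg.inv(V)).T

rng = np.random.default_rng(2)
U = rng.normal(size=(500, 2))                  # placeholder residuals (m = 2)
Z = rng.normal(size=(500, 3))                  # placeholder instruments (k = 3)
a_N = weighting_matrix(U, Z)
print(a_N.shape)                               # (6, 6): r = m * k orthogonality conditions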
In preparation for our first consistency theorem, we introduce the notation

$h_0(\beta) = a_0 E[f_1(\cdot, \beta)]$,

$\epsilon_1(\omega, \beta, \delta) = \sup\{|f_1(\omega, \alpha) - f_1(\omega, \beta)| : \alpha \in S, \ \rho(\alpha, \beta) < \delta\}$.

The following definition is needed for our consistency results.
DEFINITION 2.2: The random function $f_1$ is $k$th moment continuous at $\beta$ if $\lim_{\delta \downarrow 0} E[\epsilon_1(\cdot, \beta, \delta)^k] = 0$.⁹
Since $\{x_n : n \geq 1\}$ is stationary, it follows that if $f_1$ is $k$th moment continuous, then $f_n$ is $k$th moment continuous for all $n$. Notice that $k$th moment continuity is a joint property of the function $f$ and the stochastic process $\{x_n : n \geq 1\}$. An
⁷In this definition we have imposed the requirement that the sequence of functions $\{b_N : N \geq 1\}$ be measurable. Alternatively, we could follow a suggestion of Huber [23] and not necessarily require that the functions be measurable and establish almost sure convergence in terms of outer probability.
8Amemiya [1], Jorgenson and Laffont [24], and Gallant [11] do not require that the instrumental variables be stationary and ergodic but instead require that the appropriate moment matrices converge. Stationarity and ergodicity coupled with finite expectations are sufficient conditions for these moment matrices to converge almost surely. Amemiya [2] adopts a more general representation
of the orthogonality conditions than (3) to allow different disturbances to be paired with different sets of instruments.
⁹The function $\epsilon_1(\cdot, \beta, \delta)$ is Borel measurable under Assumptions 2.2 and 2.3. In the case in which $k = 1$, DeGroot [6] refers to first moment continuity as supercontinuity.
alternative characterization of kth moment continuity is established in Lemma 2.1.
LEMMA 2.1: Under Assumption 2.3, if there exists a $\delta > 0$ such that $E[\epsilon_1(\cdot, \beta, \delta)^k] < +\infty$, then $f_1$ is $k$th moment continuous at $\beta$.
Using this lemma, it is apparent that $k$th moment continuity is implied if the random function $f_1$ is dominated locally by a random variable with a finite $k$th moment. DeGroot [6, p. 206] proved Lemma 2.1 for $k$ and $q$ equal to one, and the extension to larger specifications of $k$ and $q$ is immediate.
One other lemma is of use in verifying first moment continuity in the case where the function f satisfies relation (3).
LEMMA 2.2: Suppose (i) $F_1$ and $G_1$ are second moment continuous at $\beta$; (ii) $F_1(\cdot, \beta)$ and $G_1(\cdot, \beta)$ have finite second moments. Then $f_1 = F_1 \otimes G_1$ is first moment continuous at $\beta$.¹⁰
Lemma 2.2 may be useful in establishing that $f_1$ is first moment continuous at $\beta$ when the orthogonality conditions are of the form (3).
We now consider our first consistency theorem for the GMM estimator.
THEOREM 2.1: Suppose Assumptions 2.1-2.5 are satisfied. If (i) $f_1$ is first moment continuous for all $\beta \in S$; (ii) $S$ is compact; (iii) $h_0(\beta)$ has a unique zero at $\beta_0$; then a GMM estimator $\{b_N : N \geq 1\}$ exists and converges almost surely to $\beta_0$.
Condition (iii) of this theorem is the parameter identification requirement that the population orthogonality conditions, as weighted by $a_0$, be satisfied uniquely at $\beta_0$.