ECON6300/7320/8300 Advanced Microeconometrics Finite Mixtures
Christiern Rose, University of Queensland
1/22
Review: Series Estimation
Series estimation is widely used for nonparametric analysis.
Let φ1(·), φ2(·), . . . be a sequence of (basis) functions such that

    \sum_{j=1}^{\infty} \theta_j \phi_j(\cdot)

approximates any function on the same domain for some θ1, θ2, . . . .

If φ1(·), φ2(·), . . . are densities, then by restricting the θj to be positive and to sum to one, we have a fully flexible density specification with infinite-dimensional θ.

If φ1(·), φ2(·), . . . are not densities, we may still approximate any density by normalising

    \exp\left( \sum_{j=1}^{\infty} \theta_j \phi_j(\cdot) \right)
Series Estimation
In practice, we cannot have infinitely many components, so we often use a finite number K of components.
K determines the smoothness of the estimate, much as the bandwidth h does for the KDE.
If K is too small, the density estimate will be oversmoothed and uninformative.
If K is too large, the density estimate will be too noisy (bumpy).
The optimal number of components K should increase as the sample size increases.
Series Estimation
How to choose K?
Frequentist:
often use BIC or AIC, some data-driven method such as cross-validation, or even prior information used informally.
There is no universally accepted rule for choosing K.
In any case, parameter estimates are computed for each given fixed K, but inference is nonparametric (slower rates of convergence).
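As a concrete illustration of the frequentist criteria above, a minimal Python sketch comparing candidate values of K by BIC; the maximised log-likelihoods below are purely illustrative numbers, not from any real fit:

```python
import math

def aic(loglik, n_params):
    # AIC = -2 ln L + 2p; smaller is better
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    # BIC = -2 ln L + p ln N; penalises extra components more heavily than AIC
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical maximised log-likelihoods for K = 2, 3, 4 components.
# A K-component univariate normal mixture has 3K - 1 free parameters:
# K means, K variances, and K - 1 mixing weights.
n_obs = 500
fits = {2: -812.4, 3: -801.9, 4: -800.8}  # illustrative values only
scores = {K: bic(ll, 3 * K - 1, n_obs) for K, ll in fits.items()}
best_K = min(scores, key=scores.get)
print(best_K)
```

Here the gain in fit from K = 3 to K = 4 is too small to offset the BIC penalty, so K = 3 is selected.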
Bayesian:
regard K as a latent variable (parameter), put a prior on K with full support on N, and obtain the posterior of K along with the other parameters using an MCMC method.
There is no need to choose an arbitrary K, since its posterior distribution is determined.
Series Estimation
There are many functions that can be used as basis functions {φj(·)}.
Examples include
Polynomials: Legendre polynomials, Bernstein polynomials,
etc.
Splines: piecewise linear splines, B-splines, etc.
Densities: Beta densities (Bernstein polynomials), normal
densities, Gamma densities, etc.
Series Estimation: BPD
For example, the Bernstein polynomial density (BPD) of order K is given by

    f(y \mid \theta_1, \ldots, \theta_K) := \sum_{j=1}^{K} \theta_j \, \mathrm{Beta}(y; j, K - j + 1)

where the θj are all positive and sum to 1, and Beta(y; a, b) denotes the Beta density with parameters a and b, i.e., its mean is a/(a + b).
As K → ∞, the BPD approximates any absolutely continuous density on [0, 1]; see Petrone (1999) for a Bayesian nonparametric method using the BPD.
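A minimal standard-library Python sketch of evaluating the BPD above. One useful check: with uniform weights θj = 1/K, the Beta components sum to the Bernstein basis and the BPD reduces exactly to the uniform density on [0, 1], consistent with the histogram-smoothing interpretation:

```python
import math

def beta_pdf(y, a, b):
    # Beta(a, b) density on [0, 1]
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * y ** (a - 1) * (1.0 - y) ** (b - 1)

def bernstein_density(y, theta):
    # f(y | theta) = sum_j theta_j * Beta(y; j, K - j + 1), theta on the simplex
    K = len(theta)
    return sum(t * beta_pdf(y, j, K - j + 1)
               for j, t in enumerate(theta, start=1))

# Example: K = 4 with uniform weights; crude midpoint check that it integrates to 1
theta = [0.25, 0.25, 0.25, 0.25]
grid = [(i + 0.5) / 1000 for i in range(1000)]
approx_mass = sum(bernstein_density(y, theta) for y in grid) / 1000
print(round(approx_mass, 3))  # close to 1
```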
Series Estimation: BPD
The basis functions are plotted below for (a) k = 3, (b) k = 4 and (c) k = 6.
[Figure: Bernstein basis (Beta) densities on [0, 1] for k = 3, 4, 6.]
The BPD is a histogram smoothing; see Petrone (1999).
Series Estimation: normal mixture
Another example: normal densities can be used as basis functions:

    f(y \mid \{\pi_j, \mu_j, \sigma_j\}) = \sum_{j=1}^{\infty} \pi_j \frac{1}{\sigma_j} \phi\left( \frac{y - \mu_j}{\sigma_j} \right)

where φ(·) is the PDF of N(0, 1).
The normal mixture approximates any absolutely continuous density.
Series Estimation: normal mixture
The normal mixture has been widely used with a Dirichlet process prior in the nonparametric Bayesian literature; Ferguson (1973), Escobar and West (1995), Walker (2006), etc.
Note that the Dirichlet process can be viewed as a probability distribution over the space of density functions.
Walker (2006) developed an efficient Gibbs sampler to sample densities from the posterior.
Series Estimation → FMM
For the rest of the lecture, we consider the normal mixture
with a fixed K .
The mixture model is fairly flexible, but it is a parametric model since K is fixed at a finite number, i.e., K does not increase as N grows.
When we use a fixed K, the parametric model is called the finite mixture model (FMM).
Note that frequentist nonparametric methods also use a fixed K for estimation purposes, in which case the nonparametric estimate is numerically identical to the estimate of the FMM with the same K.
However, nonparametric inference differs from parametric inference, e.g., in hypothesis testing, confidence intervals, etc.
Series Estimation → FMM
It is natural to consider FMM as a convenient approximation of a nonparametric model: nonparametric analysis is more complicated.
However, FMM itself can be a reasonable specification for certain empirical problems.
Suppose the population can be partitioned into two sub-classes. Within a class, individuals are relatively homogeneous; between classes, individuals are more heterogeneous.
Series Estimation → FMM
Suppose individual i belongs to class 1 with probability π1 ∈ (0, 1) and otherwise belongs to class 0 with probability π0 = 1 − π1.
Then the following finite mixture may be reasonable:

    f(y \mid \{\pi_j, \mu_j, \sigma_j\}_{j=0}^{1}) = \pi_0 \frac{1}{\sigma_0} \phi\left( \frac{y - \mu_0}{\sigma_0} \right) + \pi_1 \frac{1}{\sigma_1} \phi\left( \frac{y - \mu_1}{\sigma_1} \right)

It is straightforward to extend this to the many-class case.
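The two-class structure can be sketched by simulation: draw the latent class first, then draw y from that class. A minimal Python sketch; all parameter values below are hypothetical:

```python
import math, random

def normal_pdf(z):
    # standard normal density phi(z)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def mixture_pdf(y, pi1, mu0, sig0, mu1, sig1):
    # f(y) = pi0 (1/sig0) phi((y - mu0)/sig0) + pi1 (1/sig1) phi((y - mu1)/sig1)
    pi0 = 1.0 - pi1
    return (pi0 / sig0) * normal_pdf((y - mu0) / sig0) \
         + (pi1 / sig1) * normal_pdf((y - mu1) / sig1)

def draw(pi1, mu0, sig0, mu1, sig1):
    # draw the latent class d_i first, then y_i from that class
    if random.random() < pi1:
        return random.gauss(mu1, sig1)
    return random.gauss(mu0, sig0)

random.seed(0)
pi1, mu0, sig0, mu1, sig1 = 0.4, 0.0, 1.0, 5.0, 1.5
sample = [draw(pi1, mu0, sig0, mu1, sig1) for _ in range(20000)]
mean = sum(sample) / len(sample)
print(round(mean, 2))  # population mean is pi0*mu0 + pi1*mu1 = 2.0
```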
FMM, an example
Estimating parameters of the distribution of lengths of halibut.
FMM, an example
Some are small, but some are very large!
FMM, an example
It is known that female halibut are longer, on average, than males, and that the distribution of lengths within each sex is normal.
Gender cannot be determined at the time of measurement.
The distribution of lengths is then a 2-component finite mixture of normals, where the latent “types” correspond to gender.
A finite mixture model allows one to estimate:
the mean/variance of lengths of male and female halibut
the mixing probability (proportions)
Other examples: stock returns in “typical” and “crisis” regimes, GDP growth, insurance with “risk loving” and “risk averse” customers.
Normal mixture with two components
[Figure slides: example densities of two-component normal mixtures.]
FMM, covariates
Suppose we observe {(y_i, x_i)}_{i=1}^{N}, there are two latent classes {0, 1}, and define the membership indicator

    d_i := 1(i belongs to class 1).

Moreover, we assume

    y_i \mid x_i, d_i \sim N(x_i' \beta_{d_i}, \sigma_{d_i}^2)

If we observed d_i, the likelihood would be

    \prod_{i=1}^{N} \pi_{d_i} \frac{1}{\sigma_{d_i}} \phi\left( \frac{y_i - x_i' \beta_{d_i}}{\sigma_{d_i}} \right)

where π_j = Pr(d_i = j) for j ∈ {0, 1}.
We will briefly discuss how to obtain the estimates.
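A minimal Python sketch of the complete-data log-likelihood above, i.e., the log-likelihood we could compute if d_i were observed; all names and toy values are illustrative:

```python
import math

def complete_data_loglik(y, X, d, beta, sigma, pi1):
    # ln of prod_i pi_{d_i} (1/sigma_{d_i}) phi((y_i - x_i' beta_{d_i}) / sigma_{d_i})
    # beta = (beta0, beta1), sigma = (sigma0, sigma1), d_i in {0, 1}
    ll = 0.0
    for yi, xi, di in zip(y, X, d):
        resid = yi - sum(b * x for b, x in zip(beta[di], xi))
        pi_di = pi1 if di == 1 else 1.0 - pi1
        ll += (math.log(pi_di) - math.log(sigma[di])
               - 0.5 * math.log(2.0 * math.pi)
               - 0.5 * (resid / sigma[di]) ** 2)
    return ll

# tiny check: one observation in class 0 with a perfect fit,
# so ll = ln(0.5) - 0.5 * ln(2*pi)
ll = complete_data_loglik(y=[0.0], X=[[1.0]], d=[0],
                          beta=([0.0], [1.0]), sigma=(1.0, 1.0), pi1=0.5)
print(round(ll, 4))
```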
FMM, generalisation
More generally, for each class we could assume

    y_i \mid x_i, d_i \sim f(y_i \mid x_i, \theta_{d_i})

where f(y_i | x_i, θ_j) is a parametric density function with parameter θ_j, j ∈ {0, 1}.
Then, if we observed d_i, the likelihood would be

    \prod_{i=1}^{N} \pi_{d_i} f(y_i \mid x_i, \theta_{d_i})

The problem is that d := (d_1, . . . , d_N) is not observed.
FMM, estimation
Frequentist:
The EM algorithm can be used, which iterates between:
1. Computing the expectation of the complete-data log-likelihood, with the expectation taken at the parameter values from the previous iteration:

    E_{d \mid \text{data}, \theta^{(s-1)}} [\ln L(\theta)]

2. Maximising this with respect to θ to obtain θ^{(s)}.
For an FMM, the likelihood is likely to have multiple local maxima, so many initial values have to be tried to be reasonably sure of reaching the global maximum.
We will see how to estimate an FMM using Stata.
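The two EM steps above can be sketched for a two-component normal mixture without covariates. This is a minimal illustration in Python, not the Stata implementation; the simulated data and starting values are arbitrary:

```python
import math, random

def normal_pdf(y, mu, sig):
    # density of N(mu, sig^2) evaluated at y
    return math.exp(-0.5 * ((y - mu) / sig) ** 2) / (sig * math.sqrt(2.0 * math.pi))

def em_2normal(y, pi1, mu0, mu1, sig0, sig1, iters=200):
    for _ in range(iters):
        # E-step: posterior probability that each observation belongs to class 1
        r = [pi1 * normal_pdf(yi, mu1, sig1)
             / (pi1 * normal_pdf(yi, mu1, sig1)
                + (1 - pi1) * normal_pdf(yi, mu0, sig0))
             for yi in y]
        # M-step: weighted updates of the mixing probability, means and variances
        n1 = sum(r)
        n0 = len(y) - n1
        pi1 = n1 / len(y)
        mu1 = sum(ri * yi for ri, yi in zip(r, y)) / n1
        mu0 = sum((1 - ri) * yi for ri, yi in zip(r, y)) / n0
        sig1 = math.sqrt(sum(ri * (yi - mu1) ** 2 for ri, yi in zip(r, y)) / n1)
        sig0 = math.sqrt(sum((1 - ri) * (yi - mu0) ** 2 for ri, yi in zip(r, y)) / n0)
    return pi1, mu0, mu1, sig0, sig1

# simulated data: 300 draws from N(0, 1) mixed with 200 draws from N(4, 1)
random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(300)] + \
    [random.gauss(4.0, 1.0) for _ in range(200)]
pi1_hat, mu0_hat, mu1_hat, sig0_hat, sig1_hat = em_2normal(y, 0.5, -1.0, 1.0, 1.0, 1.0)
print(round(pi1_hat, 2), round(mu0_hat, 1), round(mu1_hat, 1))
```

In practice one would also monitor the log-likelihood (which the EM iterations never decrease) and restart from several initial values, as noted above.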
Bayesian:
We obtain the posterior of (d, π1, θ0, θ1) using an MCMC method.
In the Bayesian framework, there is no distinction between the missing data d and the parameters (π1, θ0, θ1).
In particular, the method for handling missing data within an MCMC method is called data augmentation.
FMM, some properties
After integrating d_i out, the model can be written as

    y_i \mid x_i, \pi_1, \theta_0, \theta_1 \sim \pi_1 f(y_i \mid x_i, \theta_1) + (1 - \pi_1) f(y_i \mid x_i, \theta_0)

Hence, the regression function is given by

    E[y_i \mid x_i, \pi_1, \theta_0, \theta_1] = \pi_1 E[y_i \mid x_i, \theta_1] + (1 - \pi_1) E[y_i \mid x_i, \theta_0]

Moreover, the marginal effect is

    \frac{\partial}{\partial x_i} E[y_i \mid x_i, \pi_1, \theta_0, \theta_1] = \pi_1 \frac{\partial}{\partial x_i} E[y_i \mid x_i, \theta_1] + (1 - \pi_1) \frac{\partial}{\partial x_i} E[y_i \mid x_i, \theta_0]
Extension to many classes is straightforward.
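For linear class-specific means E[y_i | x_i, θ_j] = x_i'β_j, the marginal effect above reduces to a probability-weighted average of the coefficient vectors. A quick numeric check; the coefficient values are hypothetical:

```python
# d/dx E[y|x] = pi1 * beta1 + (1 - pi1) * beta0 when E[y|x, theta_j] = x' beta_j
pi1 = 0.3
beta0 = [1.0, 2.0]   # class-0 coefficients (hypothetical)
beta1 = [0.5, -1.0]  # class-1 coefficients (hypothetical)
marginal = [pi1 * b1 + (1 - pi1) * b0 for b0, b1 in zip(beta0, beta1)]
print(marginal)  # elementwise weighted average of beta0 and beta1
```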