ECON6300/7320/8300 Advanced Microeconometrics Finite Mixtures
Christiern Rose, University of Queensland

Review: Series Estimation
• Series estimation is widely used for nonparametric analysis.
• Let $\phi_1(\cdot), \phi_2(\cdot), \ldots$ be a sequence of (basis) functions such that
$$\sum_{j=1}^{\infty} \theta_j \phi_j(\cdot)$$
approximates any function on the same domain for some choice of the coefficients $\theta_1, \theta_2, \ldots$
• Then, if $\phi_1(\cdot), \phi_2(\cdot), \ldots$ are densities, restricting the $\theta_j$ to be positive and to sum to one gives a fully flexible density specification with an infinite-dimensional $\theta$.
• If $\phi_1(\cdot), \phi_2(\cdot), \ldots$ are not densities, by normalising
$$\exp\Bigl(\sum_{j=1}^{\infty} \theta_j \phi_j(\cdot)\Bigr)$$
we may approximate any density.

Series Estimation
• In practice, we cannot use infinitely many components, so we work with a finite number K of components.
• K controls the smoothness of the estimate, much as the bandwidth h does in kernel density estimation (KDE).
• If K is too small, the density estimate is oversmoothed and uninformative. If K is too large, the density estimate is too noisy (bumpy).
• The optimal number of components K should increase with the sample size.

Series Estimation
• How to choose K?
• Frequentist:
  • Often use BIC or AIC, a data-driven method such as cross-validation, or even prior information used informally.
  • There is no universally accepted rule for choosing K.
  • In any case, the parameter estimates are computed for a given fixed K, but inference is nonparametric (slow convergence rates).
• Bayesian:
  • Regard K as a latent variable (parameter), put a prior on K with full support on the natural numbers ℕ, and obtain the posterior of K and of the other parameters using an MCMC method.
  • There is no need to choose an arbitrary K, since its distribution is estimated.
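
As a rough illustration of the frequentist route, the standard definitions AIC = −2 ln L + 2p and BIC = −2 ln L + p ln N can be compared across candidate values of K. The sketch below assumes a univariate normal mixture, so p = 3K − 1, and the log-likelihood values are made-up numbers used only to show the mechanics, not results from any data set.

```python
import math

def n_params_normal_mixture(K):
    """Free parameters of a K-component univariate normal mixture:
    K means + K standard deviations + (K - 1) mixing weights."""
    return 3 * K - 1

def aic(loglik, n_params):
    """Akaike information criterion (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical maximised log-likelihoods for K = 1, 2, 3 with N = 500.
# These values are invented purely for illustration.
n_obs = 500
logliks = {1: -1250.0, 2: -1190.0, 3: -1187.5}

bic_by_K = {K: bic(ll, n_params_normal_mixture(K), n_obs) for K, ll in logliks.items()}
best_K = min(bic_by_K, key=bic_by_K.get)  # K with the smallest BIC
```

With these illustrative inputs, moving from K = 2 to K = 3 raises the log-likelihood only slightly, so the penalty term dominates and the smaller model is preferred.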

Series Estimation
• Many families of functions can be used as basis functions $\{\phi_j(\cdot)\}$.
• Examples include
  • Polynomials: Legendre polynomials, Bernstein polynomials, etc.
  • Splines: piecewise linear splines, B-splines, etc.
  • Densities: Beta densities (Bernstein polynomials), normal densities, Gamma densities, etc.

Series Estimation: BPD
• For example, the Bernstein polynomial density (BPD) of order K is given as
$$f(y \mid \theta_1, \ldots, \theta_K) := \sum_{j=1}^{K} \theta_j \,\mathrm{Beta}(j, K - j + 1),$$
where the $\theta_j$ are all positive and sum to 1, and $\mathrm{Beta}(a, b)$ denotes the Beta density with parameters a and b, whose mean is $a/(a + b)$.
• As K → ∞, the BPD approximates any absolutely continuous density on [0, 1]; see Petrone (1999) for a Bayesian nonparametric method using the BPD.
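
A minimal sketch of how the BPD could be evaluated numerically, using only the Python standard library. The function names `beta_pdf` and `bpd` and the weights in `theta` are illustrative choices, not part of the lecture.

```python
import math

def beta_pdf(y, a, b):
    """Beta(a, b) density at y in (0, 1), computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))

def bpd(y, theta):
    """Bernstein polynomial density of order K = len(theta) at y in (0, 1)."""
    K = len(theta)
    return sum(t * beta_pdf(y, j, K - j + 1) for j, t in enumerate(theta, start=1))

theta = [0.1, 0.2, 0.3, 0.4]                     # illustrative weights: positive, sum to one
grid = [(i + 0.5) / 1000 for i in range(1000)]   # midpoints of 1000 cells on (0, 1)
area = sum(bpd(y, theta) for y in grid) / 1000   # midpoint-rule integral, close to 1

uniform = [0.25] * 4                             # equal weights
flat_value = bpd(0.3, uniform)                   # equal weights give the uniform density
```

With equal weights the Beta basis densities add up to a constant, so the BPD reduces to the uniform density on [0, 1]; unequal weights tilt mass towards the corresponding part of the interval.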

Series Estimation: BPD
• The basis functions are plotted below.
[Figure: Beta basis densities Beta(j, k − j + 1), j = 1, …, k, on [0, 1]; panels (a) k = 3, (b) k = 4, (c) k = 6.]
• The BPD can be viewed as a smoothed histogram; see Petrone (1999).

Series Estimation: normal mixture
• Another example: normal densities can be used as basis functions.
$$f(y \mid \{\pi_j, \mu_j, \sigma_j^2\}) = \sum_{j=1}^{\infty} \pi_j \left[\frac{1}{\sigma_j}\,\phi\!\left(\frac{y - \mu_j}{\sigma_j}\right)\right],$$
where $\phi(\cdot)$ is the PDF of N(0, 1).
• The normal mixture approximates any absolutely continuous density.

Series Estimation: normal mixture
• The normal mixture has been widely used with a Dirichlet process prior in the nonparametric Bayesian literature; see Ferguson (1973), Escobar and West (1995), Walker (2006), etc.
• Note that a Dirichlet process is a probability distribution over distributions, so a Dirichlet process mixture of normals defines a prior over the space of density functions.
• Walker (2006) developed an efficient Gibbs sampler to sample densities from the posterior.

Series Estimation → FMM
• For the rest of the lecture, we consider the normal mixture with a fixed K.
• The mixture model is fairly flexible, but it is a parametric model because K is fixed at a finite number, i.e., K does not increase as N grows.
• When we use a fixed K, the parametric model is called the finite mixture model (FMM).
• Note that frequentist nonparametric methods use a fixed K for estimation purposes, in which case the nonparametric estimate is numerically the same as the FMM estimate with the same K.
• However, nonparametric inference differs from parametric inference, e.g., in hypothesis testing, confidence intervals, etc.

Series Estimation → FMM
• It is natural to consider the FMM as a convenient approximation of a nonparametric model, since nonparametric analysis is more complicated.
• However, the FMM itself can be a reasonable specification for certain empirical problems.
• Suppose the population can be partitioned into two sub-classes. Within each class, individuals are relatively homogeneous, but across classes they are more heterogeneous.

Series Estimation → FMM
• Suppose individual i belongs to class 1 with probability $\pi_1 \in (0, 1)$ and otherwise belongs to class 0, with probability $\pi_0 = 1 - \pi_1$.
• Then, the following finite mixture may be reasonable:
$$f(y \mid \{\pi_j, \mu_j, \sigma_j^2\}_{j=0}^{1}) = \pi_0 \left[\frac{1}{\sigma_0}\,\phi\!\left(\frac{y - \mu_0}{\sigma_0}\right)\right] + \pi_1 \left[\frac{1}{\sigma_1}\,\phi\!\left(\frac{y - \mu_1}{\sigma_1}\right)\right]$$
• It is straightforward to extend to the many-class case.
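
The two-class structure can be made concrete by simulating it: draw the latent class first, then draw y from that class's normal distribution. The sketch below uses only the Python standard library; the parameter values in `params` are arbitrary illustrative choices, not estimates from any data.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(y, pi1, mu0, sigma0, mu1, sigma1):
    """Two-component normal mixture density with pi0 = 1 - pi1."""
    return (1 - pi1) * normal_pdf(y, mu0, sigma0) + pi1 * normal_pdf(y, mu1, sigma1)

def draw(rng, pi1, mu0, sigma0, mu1, sigma1):
    """Draw (d, y): the latent class d first, then y from that class."""
    d = 1 if rng.random() < pi1 else 0
    y = rng.gauss(mu1, sigma1) if d == 1 else rng.gauss(mu0, sigma0)
    return d, y

rng = random.Random(0)  # fixed seed so the run is reproducible
params = dict(pi1=0.3, mu0=0.0, sigma0=1.0, mu1=4.0, sigma1=0.5)  # illustrative values
sample = [draw(rng, **params) for _ in range(20000)]

share_class1 = sum(d for d, _ in sample) / len(sample)  # should be near pi1
sample_mean = sum(y for _, y in sample) / len(sample)   # near pi0*mu0 + pi1*mu1
```

In real applications only the y values are available; the simulated d is kept here just to check that the class share matches $\pi_1$.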

FMM, an example
• Estimating parameters of the distribution of lengths of halibut.

FMM, an example
• Some are small, but some are very large!

FMM, an example
• It is known that female halibut are longer, on average, than males, and that lengths are normally distributed within each gender.
• Gender cannot be determined at measurement.
• The distribution of lengths is then a 2-component finite mixture of normals, with the latent "types" corresponding to gender.
• A finite mixture model allows one to estimate:
  • the mean and variance of the lengths of male and female halibut
  • the mixing probabilities (proportions)
• Other examples: stock returns in "typical" and "crisis" regimes, GDP growth, and insurance with "risk loving" and "risk averse" individuals.

Normal mixture with two components

FMM, covariates
• Suppose we observe $\{(y_i, x_i)\}_{i=1}^{N}$ and there are two latent classes, {0, 1}. Define the membership indicator
$$d_i := 1(i \text{ belongs to class } 1).$$
• Moreover, we assume
$$y_i \mid x_i, d_i \sim N(x_i'\beta_{d_i}, \sigma_{d_i}^2).$$
• If we observed $d_i$, the likelihood would be
$$\prod_{i=1}^{N} \pi_{d_i} \left[\frac{1}{\sigma_{d_i}}\,\phi\!\left(\frac{y_i - x_i'\beta_{d_i}}{\sigma_{d_i}}\right)\right],$$
where $\pi_j = \Pr(d_i = j)$ for $j \in \{0, 1\}$.
• We will briefly discuss how to obtain the estimates.

FMM, generalisation
• For each class, more generally, we could assume
$$y_i \mid x_i, d_i \sim f(y_i \mid x_i, \theta_{d_i}),$$
where $f(y_i \mid x_i, \theta_j)$ is a parametric density function with parameter $\theta_j$, $j \in \{0, 1\}$.
• Then, if we observed $d_i$, the likelihood would be
$$\prod_{i=1}^{N} \pi_{d_i} f(y_i \mid x_i, \theta_{d_i}).$$
• A problem is that $d := (d_1, \ldots, d_N)$ is not observed.
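
To see what the unobserved d changes, the sketch below contrasts the complete-data log-likelihood (labels known) with the observed-data log-likelihood, in which each observation's contribution sums over both classes. It assumes the normal regression case with a single scalar covariate and no intercept; the data and parameter values are small invented numbers used only to exercise the functions.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def complete_loglik(y, x, d, pi1, beta, sigma):
    """Log-likelihood when the class labels d_i are observed.
    beta[j] and sigma[j] are the slope and standard deviation of class j."""
    pi = (1 - pi1, pi1)
    return sum(
        math.log(pi[di] * normal_pdf(yi, xi * beta[di], sigma[di]))
        for yi, xi, di in zip(y, x, d)
    )

def observed_loglik(y, x, pi1, beta, sigma):
    """Log-likelihood when d_i is unobserved: each term sums over both classes."""
    pi = (1 - pi1, pi1)
    return sum(
        math.log(sum(pi[j] * normal_pdf(yi, xi * beta[j], sigma[j]) for j in (0, 1)))
        for yi, xi in zip(y, x)
    )

# Invented toy data and parameter values, for illustration only.
y = [1.2, 3.9, 0.4]
x = [1.0, 2.0, 0.5]
d = [0, 1, 0]
pi1, beta, sigma = 0.4, (1.0, 2.0), (0.5, 0.8)

ll_complete = complete_loglik(y, x, d, pi1, beta, sigma)
ll_observed = observed_loglik(y, x, pi1, beta, sigma)
```

The observed-data version needs no labels, which is why it is the object a frequentist maximises; the complete-data version is simpler in form and is what the EM algorithm and data augmentation work with.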

FMM, estimation
• Frequentist:
  • The EM algorithm can be used, which iterates between
    1. Computing the expectation of the complete-data log-likelihood, taken over d given the data and the parameter values from the previous iteration:
       $$E_{d \mid \text{data}, \theta^{(s-1)}}[\ln L(\theta)]$$
    2. Maximizing this expectation with respect to $\theta$ to obtain $\theta^{(s)}$.
  • For an FMM, the likelihood is likely to have multiple local maxima, so many initial values have to be tried to be confident of finding the global maximum.
  • We will see how to estimate an FMM using Stata.
• Bayesian:
  • We obtain the posterior of $(d, \pi_1, \theta_0, \theta_1)$ using an MCMC method.
  • In the Bayesian framework, there is no distinction between the missing data d and the parameters $(\pi_1, \theta_0, \theta_1)$.
  • In particular, the technique for handling missing data within an MCMC method is called data augmentation.
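
The EM iteration described above can be sketched for the simplest case, a two-component normal mixture without covariates. This is a textbook version written with the Python standard library only, not the Stata routine used in the course; the simulated data, the starting values and the fixed iteration count are all illustrative assumptions.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def em_two_normals(y, pi1, mu0, sigma0, mu1, sigma1, n_iter=200):
    """Run n_iter EM iterations from the given starting values."""
    n = len(y)
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to class 1,
        # evaluated at the parameter values from the previous iteration.
        w = []
        for yi in y:
            p1 = pi1 * normal_pdf(yi, mu1, sigma1)
            p0 = (1 - pi1) * normal_pdf(yi, mu0, sigma0)
            w.append(p1 / (p0 + p1))
        # M-step: the maximiser is given by weighted sample moments.
        n1 = sum(w)
        n0 = n - n1
        pi1 = n1 / n
        mu1 = sum(wi * yi for wi, yi in zip(w, y)) / n1
        mu0 = sum((1 - wi) * yi for wi, yi in zip(w, y)) / n0
        sigma1 = math.sqrt(sum(wi * (yi - mu1) ** 2 for wi, yi in zip(w, y)) / n1)
        sigma0 = math.sqrt(sum((1 - wi) * (yi - mu0) ** 2 for wi, yi in zip(w, y)) / n0)
    return pi1, mu0, sigma0, mu1, sigma1

# Simulated data: class 1 ~ N(5, 1) with probability 0.4, class 0 ~ N(0, 1) otherwise.
rng = random.Random(1)
y = [rng.gauss(5.0, 1.0) if rng.random() < 0.4 else rng.gauss(0.0, 1.0) for _ in range(2000)]

pi1_hat, mu0_hat, sigma0_hat, mu1_hat, sigma1_hat = em_two_normals(
    y, pi1=0.5, mu0=-1.0, sigma0=2.0, mu1=6.0, sigma1=2.0
)
```

Because the two simulated classes are well separated, a single starting point is enough here. With overlapping components one would rerun the function from several starting values and keep the solution with the highest log-likelihood, in line with the warning about local maxima.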

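FMM, some properties
• After integrating $d_i$ out, the model can be written as
$$y_i \mid x_i, \pi_1, \theta_0, \theta_1 \sim \pi_1 f(y_i \mid x_i, \theta_1) + (1 - \pi_1) f(y_i \mid x_i, \theta_0).$$
• Hence, the regression function is given as
$$E[y_i \mid x_i, \pi_1, \theta_0, \theta_1] = \pi_1 E[y_i \mid x_i, \theta_1] + (1 - \pi_1) E[y_i \mid x_i, \theta_0].$$
• Moreover, the marginal effect is
$$\frac{\partial}{\partial x_i} E[y_i \mid x_i, \pi_1, \theta_0, \theta_1] = \pi_1 \frac{\partial}{\partial x_i} E[y_i \mid x_i, \theta_1] + (1 - \pi_1) \frac{\partial}{\partial x_i} E[y_i \mid x_i, \theta_0].$$
• Extension to many classes is straightforward.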
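
As a worked special case, assume the class-specific conditional mean is linear, as in the normal regression mixture from the covariates slide, so that $E[y_i \mid x_i, \theta_j] = x_i'\beta_j$. The regression function and marginal effect then reduce to a weighted average of the class coefficients:

```latex
\begin{aligned}
E[y_i \mid x_i, \pi_1, \theta_0, \theta_1]
  &= \pi_1 x_i'\beta_1 + (1 - \pi_1)\, x_i'\beta_0
   = x_i'\bigl(\pi_1 \beta_1 + (1 - \pi_1)\beta_0\bigr), \\
\frac{\partial}{\partial x_i} E[y_i \mid x_i, \pi_1, \theta_0, \theta_1]
  &= \pi_1 \beta_1 + (1 - \pi_1)\beta_0 .
\end{aligned}
```

This relies on $\pi_1$ not depending on $x_i$, as in the model above; if the mixing probability varied with the covariates, the derivative would pick up an additional term.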