
ECON6300/7320/8300 Advanced Microeconometrics Finite Mixtures
Christiern Rose, University of Queensland

Review: Series Estimation
- Series estimation is widely used for nonparametric analysis.
- Let $\phi_1(\cdot), \phi_2(\cdot), \ldots$ be a sequence of (basis) functions such that
  $$\sum_{j=1}^{\infty} \theta_j \phi_j(\cdot)$$
  approximates any function on the same domain for some $\theta_1, \theta_2, \ldots$.
- Then, if $\phi_1(\cdot), \phi_2(\cdot), \ldots$ are densities, restricting the $\theta_j$ to be positive and to sum to one gives a fully flexible density specification with infinite-dimensional $\theta$.
- If $\phi_1(\cdot), \phi_2(\cdot), \ldots$ are not densities, we may still approximate any density by normalising
  $$\exp\left(\sum_{j=1}^{\infty} \theta_j \phi_j(\cdot)\right).$$

Series Estimation
- In practice we cannot use infinitely many components, so we often use a finite number $K$ of components.
- $K$ controls the smoothness of the estimate, much as the bandwidth $h$ does in kernel density estimation (KDE).
- If $K$ is too small, the density estimate is too smooth to be informative; if $K$ is too large, the estimate is too noisy (bumpy).
- The optimal number of components $K$ should increase as the sample size increases.

Series Estimation
- How to choose $K$?
- Frequentist:
  - Often use BIC or AIC, some data-driven method such as cross-validation, or even prior information used informally (a sketch of BIC-based selection follows this list).
  - There is no universally accepted rule for choosing $K$.
  - In any case, parameter estimates are computed for a given fixed $K$, but inference is nonparametric (with a slow convergence rate).
- Bayesian:
  - Regard $K$ as a latent variable (parameter), put a prior on $K$ with full support on $\mathbb{N}$, and obtain the posterior of $K$, along with the other parameters, using an MCMC method.
  - There is no need to choose an arbitrary $K$, since its posterior distribution is determined.
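A minimal sketch of BIC-based selection of $K$, assuming Python with scikit-learn (the course itself uses Stata); the simulated data, candidate grid, and tuning choices below are all hypothetical.

# Select the number of mixture components K by minimising the BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated data from a two-component normal mixture (true K = 2).
y = np.concatenate([rng.normal(0.0, 1.0, 700),
                    rng.normal(4.0, 0.5, 300)]).reshape(-1, 1)

bic = {}
for k in range(1, 7):  # candidate values of K
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(y)
    bic[k] = gm.bic(y)

k_hat = min(bic, key=bic.get)  # BIC-minimising K
print(bic, "chosen K:", k_hat)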

Series Estimation
- There are many functions that can be used as basis functions $\{\phi_j(\cdot)\}$.
- Examples include:
  - Polynomials: Legendre polynomials, Bernstein polynomials, etc.
  - Splines: piecewise linear splines, B-splines, etc.
  - Densities: Beta densities (Bernstein polynomials), normal densities, Gamma densities, etc.

Series Estimation: BPD
- For example, the Bernstein polynomial density (BPD) of order $K$ is given by
  $$f(y \mid \theta_1, \ldots, \theta_K) := \sum_{j=1}^{K} \theta_j \,\mathrm{Beta}(y;\, j, K - j + 1),$$
  where the $\theta_j$ are all positive and sum to 1, and $\mathrm{Beta}(\cdot\,;a, b)$ denotes the Beta density with parameters $a$ and $b$, i.e., with mean $a/(a + b)$.
- As $K \to \infty$, the BPD approximates any absolutely continuous density on $[0, 1]$; see Petrone (1999) for a Bayesian nonparametric method using the BPD.
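A minimal sketch of evaluating the BPD, assuming Python with scipy; the weights below are hypothetical, chosen purely for illustration.

# Evaluate f(y) = sum_j theta_j * Beta(y; j, K - j + 1) on [0, 1].
import numpy as np
from scipy.stats import beta

def bpd(y, theta):
    """Bernstein polynomial density of order K = len(theta);
    theta must be nonnegative and sum to one."""
    K = len(theta)
    y = np.asarray(y, dtype=float)
    return sum(theta[j - 1] * beta.pdf(y, j, K - j + 1)
               for j in range(1, K + 1))

theta = np.array([0.1, 0.4, 0.3, 0.2])          # hypothetical weights, K = 4
print(bpd(np.linspace(0.05, 0.95, 5), theta))   # density values on a grid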

Series Estimation: BPD
- The basis functions are plotted below for (a) $K = 3$, (b) $K = 4$, (c) $K = 6$.
[Figure: Beta basis functions $\mathrm{Beta}(j, K - j + 1)$ on $[0, 1]$ for $K = 3, 4, 6$.]
- The BPD is a histogram smoothing; see Petrone (1999).

Series Estimation: normal mixture
- Another example: normal densities can be used as basis functions:
  $$f(y \mid \{\pi_j, \mu_j, \sigma_j^2\}) = \sum_{j=1}^{\infty} \pi_j \frac{1}{\sigma_j} \phi\!\left(\frac{y - \mu_j}{\sigma_j}\right),$$
  where $\phi(\cdot)$ is the PDF of $N(0, 1)$.
- The normal mixture approximates any absolutely continuous density.
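A minimal sketch of evaluating a (finitely truncated) normal mixture density, assuming Python with scipy; the component weights, means, and standard deviations are hypothetical.

# f(y) = sum_j pi_j * (1/sigma_j) * phi((y - mu_j) / sigma_j)
import numpy as np
from scipy.stats import norm

def normal_mixture_pdf(y, pi, mu, sigma):
    y = np.asarray(y, dtype=float)[..., None]   # broadcast over components
    return np.sum(pi * norm.pdf(y, loc=mu, scale=sigma), axis=-1)

pi = np.array([0.5, 0.3, 0.2])                  # hypothetical weights
mu = np.array([-1.0, 0.5, 3.0])
sigma = np.array([0.8, 0.5, 1.2])
print(normal_mixture_pdf([-1.0, 0.0, 3.0], pi, mu, sigma))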

Series Estimation: normal mixture
- The normal mixture has been widely used with a Dirichlet process prior in the nonparametric Bayesian literature; see Ferguson (1973), Escobar and West (1995), Walker (2006), etc.
- Note that the Dirichlet process can be viewed as a probability distribution over the space of density functions.
- Walker (2006) developed an efficient Gibbs sampler to sample densities from the posterior.

Series Estimation → FMM
- For the rest of the lecture, we consider the normal mixture with a fixed $K$.
- The mixture model is fairly flexible, but it is a parametric model since $K$ is fixed at a finite number, i.e., $K$ does not increase as $N$ grows.
- When we use a fixed $K$, the parametric model is called the finite mixture model (FMM).
- Note that frequentist nonparametric methods use a fixed $K$ for estimation purposes, in which case the nonparametric estimate is numerically the same as the FMM estimate with the same $K$.
- However, nonparametric inference is different from parametric inference, e.g., for hypothesis testing, confidence intervals, etc.

Series Estimation → FMM
- It is natural to regard the FMM as a convenient approximation to a nonparametric model: nonparametric analysis is more complicated.
- However, the FMM itself can be a reasonable specification for certain empirical problems.
- Suppose the population can be partitioned into two sub-classes. Within each class, individuals are relatively homogeneous; across classes, they are more heterogeneous.

Series Estimation → FMM
- Suppose individual $i$ belongs to class 1 with probability $\pi_1 \in (0, 1)$ and otherwise belongs to class 0 with probability $\pi_0 = 1 - \pi_1$.
- Then the following finite mixture may be reasonable:
  $$f(y \mid \{\pi_j, \mu_j, \sigma_j^2\}_{j=0}^{1}) = \pi_0 \frac{1}{\sigma_0} \phi\!\left(\frac{y - \mu_0}{\sigma_0}\right) + \pi_1 \frac{1}{\sigma_1} \phi\!\left(\frac{y - \mu_1}{\sigma_1}\right)$$
- It is straightforward to extend to the many-class case (a simulation sketch follows).
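A minimal sketch of simulating from this two-class mixture, assuming Python with numpy; all parameter values are hypothetical. Drawing the latent class first mirrors the data-augmentation view used later in the lecture.

# Draw the latent class d_i, then y_i | d_i ~ N(mu_{d_i}, sigma_{d_i}^2).
import numpy as np

rng = np.random.default_rng(1)
N, pi1 = 1000, 0.3
mu = np.array([0.0, 4.0])      # (class 0, class 1) means
sigma = np.array([1.0, 0.5])   # (class 0, class 1) std devs

d = rng.binomial(1, pi1, size=N)   # latent class membership
y = rng.normal(mu[d], sigma[d])    # observed outcomes; d is not observed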

FMM, an example
- Estimating parameters of the distribution of lengths of halibut.

FMM, an example
- Some are small, but some are very large!

FMM, an example
- It is known that female halibut are longer, on average, than males, and that the distribution of lengths within each gender is normal.
- Gender cannot be determined at measurement.
- The distribution of lengths is then a 2-component finite mixture of normals, with latent "types" corresponding to gender.
- A finite mixture model allows one to estimate:
  - the mean/variance of lengths of male and female halibut;
  - the mixing probability (proportions).
- Other examples: stock returns in "typical" and "crisis" regimes, GDP growth, insurance with "risk loving" and "risk averse" types.

Normal mixture with two components
[Figures: examples of two-component normal mixture densities.]

FMM, covariates
- Suppose we observe $\{(y_i, x_i)\}_{i=1}^{N}$, that there are two latent classes $\{0, 1\}$, and define the membership indicator $d_i := 1(i \text{ belongs to class } 1)$.
- Moreover, we assume
  $$y_i \mid x_i, d_i \sim N(x_i' \beta_{d_i}, \sigma_{d_i}^2)$$
- If we observed $d_i$, the likelihood would be
  $$\prod_{i=1}^{N} \pi_{d_i} \frac{1}{\sigma_{d_i}} \phi\!\left(\frac{y_i - x_i' \beta_{d_i}}{\sigma_{d_i}}\right),$$
  where $\pi_j = \Pr(d_i = j)$ for $j \in \{0, 1\}$.
- We will briefly discuss how to obtain the estimates (a sketch of the complete-data log-likelihood follows).
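A minimal sketch of the complete-data log-likelihood above, assuming Python with numpy/scipy; the function and argument names are hypothetical.

# Complete-data log-likelihood for the two-class linear-Gaussian FMM,
# i.e., as if the memberships d were observed.
import numpy as np
from scipy.stats import norm

def complete_loglik(y, X, d, pi1, beta, sigma):
    """y: (N,), X: (N, p), d: (N,) in {0, 1},
    beta: (2, p) class coefficients, sigma: (2,) class std devs."""
    pi = np.where(d == 1, pi1, 1.0 - pi1)       # pi_{d_i}
    mean = np.sum(X * beta[d], axis=1)          # x_i' beta_{d_i}
    return np.sum(np.log(pi) + norm.logpdf(y, loc=mean, scale=sigma[d]))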

FMM, generalisation
- More generally, for each class we could assume
  $$y_i \mid x_i, d_i \sim f(y_i \mid x_i, \theta_{d_i}),$$
  where $f(y_i \mid x_i, \theta_j)$ is a parametric density function with parameter $\theta_j$, $j \in \{0, 1\}$.
- Then, if we observed $d_i$, the likelihood would be
  $$\prod_{i=1}^{N} \pi_{d_i} f(y_i \mid x_i, \theta_{d_i}).$$
- The problem is that $d := (d_1, \ldots, d_N)$ is not observed.

FMM, estimation
- Frequentist:
  - The EM algorithm can be used, which iterates between:
    1. Computing the expectation of the complete-data log-likelihood given the parameter values from the previous iteration:
       $$E_{d \mid \text{data}, \theta^{(s-1)}}[\ln L(\theta)]$$
    2. Maximising with respect to $\theta$ to obtain $\theta^{(s)}$.
  - For the FMM, it is likely that the likelihood has multiple local maxima, so many initial values have to be tried to be reasonably sure of a global maximum (a minimal EM sketch appears after this list).
  - We will see how to estimate an FMM using Stata.
- Bayesian:
  - We obtain the posterior of $(d, \pi_1, \theta_0, \theta_1)$ using an MCMC method.
  - In the Bayesian framework, there is no distinction between the missing data $d$ and the parameters $(\pi_1, \theta_0, \theta_1)$.
  - In particular, the technique for handling the missing data within an MCMC method is called data augmentation.
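A minimal EM sketch for a two-component normal mixture without covariates (a simplification for brevity), assuming Python with numpy/scipy; this is not Stata's fmm command, and the starting values and iteration count are hypothetical.

# EM for y_i ~ pi1 * N(mu1, sigma1^2) + (1 - pi1) * N(mu0, sigma0^2).
import numpy as np
from scipy.stats import norm

def em_two_normals(y, n_iter=200):
    # Crude starting values; in practice, restart from many initial
    # values to guard against convergence to a local maximum.
    pi1 = 0.5
    mu = np.array([y.min(), y.max()], dtype=float)
    sigma = np.array([y.std(), y.std()], dtype=float)
    for _ in range(n_iter):
        # E-step: posterior probability of class 1 for each observation.
        f0 = (1.0 - pi1) * norm.pdf(y, mu[0], sigma[0])
        f1 = pi1 * norm.pdf(y, mu[1], sigma[1])
        w = f1 / (f0 + f1)
        # M-step: weighted MLEs given the class probabilities.
        pi1 = w.mean()
        for j, wj in enumerate([1.0 - w, w]):
            mu[j] = np.average(y, weights=wj)
            sigma[j] = np.sqrt(np.average((y - mu[j]) ** 2, weights=wj))
    return pi1, mu, sigma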

FMM, some properties
- After integrating $d_i$ out, the model can be written as
  $$y_i \mid x_i, \pi_1, \theta_0, \theta_1 \sim \pi_1 f(y_i \mid x_i, \theta_1) + (1 - \pi_1) f(y_i \mid x_i, \theta_0)$$
- Hence, the regression function is given by
  $$E[y_i \mid x_i, \pi_1, \theta_0, \theta_1] = \pi_1 E[y_i \mid x_i, \theta_1] + (1 - \pi_1) E[y_i \mid x_i, \theta_0]$$
- Moreover, the marginal effect is
  $$\frac{\partial}{\partial x_i} E[y_i \mid x_i, \pi_1, \theta_0, \theta_1] = \pi_1 \frac{\partial}{\partial x_i} E[y_i \mid x_i, \theta_1] + (1 - \pi_1) \frac{\partial}{\partial x_i} E[y_i \mid x_i, \theta_0]$$
- Extension to many classes is straightforward.
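For instance, with the linear-Gaussian classes used earlier, where $E[y_i \mid x_i, \theta_j] = x_i' \beta_j$, the marginal effect reduces to a probability-weighted average of the class-specific coefficients:
$$\frac{\partial}{\partial x_i} E[y_i \mid x_i, \pi_1, \theta_0, \theta_1] = \pi_1 \beta_1 + (1 - \pi_1) \beta_0.$$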