Maximum Likelihood Binary response models
ECON 61001: Lecture 9
Alastair R. Hall
The University of Manchester
Alastair R. Hall ECON 61001: Lecture 9 1 / 34
Outline of today’s lecture
Maximum Likelihood Estimation
Binary response models
Linear probability model
Probit and Logit models
Empirical application
Notation
For the purposes of introducing ML, we follow the convention that capital letters are random variables/vectors and small letters denote the observed outcomes.
However, having introduced ML, we revert to the convention – more common in econometrics – that small letters denote either random variables (vectors) or their outcomes with the interpretation defined by context.
Intuition behind ML
Suppose {Vi; i = 1,2,…N} is a sequence of discrete random variables with joint probability distribution function:
P(V1 = v1,V2 = v2,…,VN = vN; θ0) = p(v1,v2,…,vN; θ0), say, where θ0 is a vector of parameters.
If we know θ0 then the probability of observing a particular sample {Vi = vi;i = 1,2,…,N} is given by
p(v1,v2,…,vN; θ0)
Intuition behind ML
The situation we face in estimation is the other way round: given the observed sample we wish to work out θ0.
Basic idea of ML: estimate θ0 by the parameter value that maximizes the probability of observing the particular sample we have.
To express this mathematically, we now write:
LF(θ; v1, v2, …, vN) := p(v1, v2, …, vN; θ)

and the ML estimator (MLE) of θ is:

θ̂ = argmax_{θ∈Θ} LF(θ; v1, v2, …, vN)

LF(θ; v1, v2, …, vN) is known as the likelihood function.
Intuition behind ML
How can we extend this to continuous random variables?
If {Vi ; i = 1, 2, . . .N} is a sequence of continuous random variables with joint probability density function (pdf)
f (v1, v2, . . . , vN ; θ) then the likelihood function is defined as:
LF(θ; v1, v2, …, vN) := f(v1, v2, …, vN; θ)
We will now focus our attention on the discrete case as this is the
scenario relevant to binary response models.
Intuition behind ML
Clearly to implement ML, we require the joint probability distribution function.
Sometimes we specify the model explicitly in terms of the joint distribution and so p(v1, v2, . . . , vN ; θ) is readily available.
In other cases, we specify a model for Vi from which the joint probability distribution function can be deduced. For example:
{Vi } is independently and identically distributed, with P(Vi =vi;θ0)=p(vi;θ0).
so that

p(v1, v2, …, vN; θ0) = ∏_{i=1}^N p(vi; θ0)
Likelihood function for iid data
This means that:
LF(θ; v1, v2, …, vN) = ∏_{i=1}^N p(vi; θ).
In this case, it is easier to equivalently define the MLE via:

θ̂_N = argmax_{θ∈Θ} LLF_N(θ)

where

LLF_N(θ) = ln[LF(θ; v1, v2, …, vN)],

known as the log likelihood function.
MLE and score equations
Under relatively mild conditions on the distribution, the MLE can be obtained by solving the first order conditions:
∂LLF_N(θ)/∂θ |_{θ=θ̂_N} = 0
These equations are known as the score equations.
Sometimes, we can obtain a closed-form solution for θ̂_N (in terms of the data alone) from the score equations. In other cases, the MLE is found using computer-based numerical optimization routines.
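As an illustrative sketch (my own example, not from the lecture): for an i.i.d. exponential sample the score equation has the closed-form solution λ̂ = N/Σvi, which we can check against a numerical optimizer applied to the log likelihood.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# MLE of the rate of an exponential distribution (illustrative example).
# The score equation has the closed-form solution lambda_hat = N / sum(v);
# we compare it with a numerical maximizer of the log likelihood.
rng = np.random.default_rng(0)
v = rng.exponential(scale=2.0, size=500)  # true rate = 1/scale = 0.5

def neg_llf(lam):
    # LLF(lambda) = N ln(lambda) - lambda * sum(v); minimize the negative
    return -(len(v) * np.log(lam) - lam * v.sum())

closed_form = len(v) / v.sum()
numeric = minimize_scalar(neg_llf, bounds=(1e-6, 10.0), method="bounded").x
print(closed_form, numeric)  # the two solutions agree closely
```

The same numerical approach is what software uses when, as for the probit model later, no closed form exists.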
Example: Bernoulli distribution
Let {Vi}_{i=1}^N be a sequence of i.i.d. Bernoulli random variables with P(Vi = 1; θ0) = θ0.

The probability distribution function of Vi is:

P(Vi = vi; θ0) = θ0^vi (1 − θ0)^(1−vi).

The likelihood function is:

LF_N(θ) = ∏_{i=1}^N θ^vi (1 − θ)^(1−vi).

The log likelihood function is:

LLF_N(θ) = Σ_{i=1}^N { vi ln[θ] + (1 − vi) ln[1 − θ] }.
Example: Bernoulli distribution
And so:

∂LLF_N(θ)/∂θ = Σ_{i=1}^N vi/θ − Σ_{i=1}^N (1 − vi)/(1 − θ).

Defining Σ_{i=1}^N vi = N1, the score equation is:

N1/θ̂_N − (N − N1)/(1 − θ̂_N) = 0.

The MLE is: θ̂_N = N1/N.
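A quick numerical check of this result on simulated data (a sketch; the seed and sample size are arbitrary):

```python
import numpy as np

# Simulate i.i.d. Bernoulli(theta0) data and compute the MLE N1/N.
rng = np.random.default_rng(1)
theta0 = 0.3
v = rng.binomial(1, theta0, size=10_000)

N1 = v.sum()
theta_hat = N1 / len(v)  # MLE from the score equation

def llf(th):
    # Bernoulli log likelihood from the previous slide
    return np.sum(v * np.log(th) + (1 - v) * np.log(1 - th))

print(theta_hat)  # close to theta0 = 0.3
# Sanity check: the log likelihood is no larger at nearby parameter values.
assert llf(theta_hat) >= max(llf(theta_hat - 0.05), llf(theta_hat + 0.05))
```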
Overview of statistical properties of MLE
MLE has appealing intuition but what can be said about its properties?
Consistency: θˆN →p θ0.
Asymptotic normality: N1/2(θˆN − θ0) →d N(0, Vθ) where
V_θ = { lim_{N→∞} N^(−1) I_{θ,N} }^(−1)

and

I_{θ,N} = −E[ ∂²LLF_N(θ)/∂θ∂θ′ ] |_{θ=θ0}.

I_{θ,N} is known as the information matrix.
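For the Bernoulli example, I_{θ,N} = N/(θ0(1 − θ0)), so V_θ = θ0(1 − θ0). A small Monte Carlo sketch (my own check, not from the slides) confirms that N^(1/2)(θ̂_N − θ0) has approximately this variance:

```python
import numpy as np

# Monte Carlo check of asymptotic normality for the Bernoulli MLE.
# Here I_{theta,N} = N / (theta0 (1 - theta0)), so V_theta = theta0(1 - theta0).
rng = np.random.default_rng(6)
theta0, N, reps = 0.3, 400, 5_000

theta_hats = rng.binomial(1, theta0, size=(reps, N)).mean(axis=1)
z = np.sqrt(N) * (theta_hats - theta0)  # sqrt(N)(theta_hat - theta0)
print(z.var(), theta0 * (1 - theta0))   # simulated variance vs V_theta
```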
Overview of properties of MLE – continued
Invariance: The MLE of h(θ0) is h(θˆN).
These properties imply that the MLE is optimal in the sense that it is asymptotically efficient (i.e. minimum variance) in the class of consistent uniformly asymptotically normal (CUAN) estimators of θ0.
But note:
justification is via large sample properties
optimality depends crucially on our correctly specifying the joint distribution.
Hypothesis testing

Suppose we wish to test:

H0: g(θ0) = 0 vs. H1: g(θ0) ≠ 0

where

g(·) is an ng × 1 vector of continuously differentiable functions;

G(θ̄) = ∂g(θ)/∂θ′ |_{θ=θ̄}, with rank{G(θ0)} = ng.
Three fundamental test principles associated with ML estimation: Wald, Likelihood Ratio and Lagrange Multiplier (or Score).
Hypothesis testing
To present the forms of the three test statistics we need the following notation/terminology:
The unrestricted MLE is:

θ̂_N = argmax_{θ∈Θ} LLF_N(θ)

The restricted MLE is:

θ̃_N = argmax_{θ∈Θ_R} LLF_N(θ)

where Θ_R = {θ : g(θ) = 0, θ ∈ Θ}.
Hypothesis testing
Wald test:

W_N = N g(θ̂_N)′ [G(θ̂_N) V̂_θ G(θ̂_N)′]^(−1) g(θ̂_N)

where V̂_θ →p V_θ.

Likelihood ratio (LR) test:

LR_N = 2{LLF_N(θ̂_N) − LLF_N(θ̃_N)}
Hypothesis testing
Lagrange Multiplier (LM) test:

LM_N = N s̄_N(θ̃_N)′ Ṽ_θ s̄_N(θ̃_N)

where

s̄_N(θ̄) = N^(−1) ∂LLF_N(θ)/∂θ |_{θ=θ̄};

Ṽ_θ →p V_θ (under H0).

Under H0: W_N, LR_N, LM_N →d χ²_ng, and all three are asymptotically equivalent (W_N − LR_N →p 0, W_N − LM_N →p 0).
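As a concrete sketch (my own example), consider an LR test of H0: θ0 = 0.5 in the Bernoulli model, where g(θ) = θ − 0.5 so ng = 1:

```python
import numpy as np
from scipy.stats import chi2

# LR test of H0: theta0 = 0.5 vs H1: theta0 != 0.5 in the Bernoulli model.
# The restricted MLE is simply theta = 0.5; the unrestricted MLE is N1/N.
rng = np.random.default_rng(2)
v = rng.binomial(1, 0.55, size=1_000)  # data generated under H1

def llf(th):
    return np.sum(v * np.log(th) + (1 - v) * np.log(1 - th))

theta_hat = v.mean()                  # unrestricted MLE
LR = 2 * (llf(theta_hat) - llf(0.5))  # LR_N statistic
reject = LR > chi2.ppf(0.95, df=1)    # compare with chi-squared(1) critical value
print(LR, reject)
```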
Conditional models
We are often interested in conditional models, that is, in explaining yi in terms of xi.
But our original definition of LF is in terms of joint distribution of vi =(yi,xi′)′.
However can also work with conditional distribution under certain circumstances.
For example, if vi is i.i.d. and p(vi; θ) = p(yi | xi; φ) p(xi; ψ), so that the parameters of the distribution of yi | xi do not appear in the marginal distribution of xi, then the MLE of φ can be obtained by maximizing
CLLF_N(φ) = Σ_{i=1}^N ln[p(yi | xi; φ)]
What are binary response models?
In binary response models, we are interested in modeling the probability that an event occurs.
Examples of events that may be of interest: whether an individual is employed; whether an individual is divorced; whether an individual receives a loan; whether a firm is taken over.
As economists, we are interested in modeling the conditional probability of the event given characteristics of the individual or firm concerned.
Basic structure of binary response models
Use a dummy variable to model the outcome, that is:

yi = 1, if the event occurs
yi = 0, if the event does not occur
Let xi be a vector of explanatory variables. We are then interested in:

the conditional probability that the event occurs, P(yi = 1 | xi);
how this probability changes with changes in the elements of xi.
Basic structure of binary response models
Example: event of interest is employment
yi = 1 if individual is employed, yi = 0 if individual is unemployed;
xi might contain information about education, experience, demographic information (using dummy variables), and economic conditions if the sample covers disparate areas.
Basic structure of binary response models
A common approach to binary response modeling is to use so-called index models for which,
P(yi =1|xi) = p(xi′β0)
where β0 is a vector of unknown parameters (as in the multiple linear regression model), p(.) is some function and xi′β0 is known as the “index”.
Examples of index models:
the linear probability model (LPM);
the logit model;
the probit model.
These three differ in their choice of p(.) above (and hence in implicit definition of β0).
Basic structure of LPM
For the LPM, we assume that p(.) is linear so that p(xi′β0) = xi′β0
How can we estimate β0? It turns out that we can estimate this model using regression techniques. To see why, we need to think about E[yi | xi] for this model.
Since yi is a dummy variable:

E[yi | xi] = 0 × P(yi = 0 | xi) + 1 × P(yi = 1 | xi)
           = P(yi = 1 | xi)
           = xi′β0
Basic structure of LPM

Therefore, we can write:

yi = xi′β0 + ui
and ui satisfies E [ui |xi ] = 0 which is a key condition for OLS to be
a consistent estimator of β0.
This suggests that we can estimate the LPM by OLS.
However, we must use heteroscedasticity robust standard errors; see Tutorial 9 Question 1.
Problem: the predicted probabilities can be outside [0, 1].
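A minimal sketch of this recipe on simulated LPM data: OLS point estimates plus White heteroscedasticity-robust standard errors (the design and coefficient values below are illustrative, not from the lecture):

```python
import numpy as np

# Estimate an LPM by OLS with White (heteroscedasticity-robust)
# standard errors, on simulated data with P(y=1|x) = x'beta0.
rng = np.random.default_rng(3)
n = 2_000
x = np.column_stack([np.ones(n), rng.uniform(size=n)])
beta0 = np.array([0.2, 0.5])   # keeps P(y=1|x) inside [0.2, 0.7]
y = (rng.uniform(size=n) < x @ beta0).astype(float)

XtX_inv = np.linalg.inv(x.T @ x)
beta_hat = XtX_inv @ x.T @ y   # OLS estimates
u = y - x @ beta_hat           # residuals

# Robust covariance: (X'X)^-1 [sum_i u_i^2 x_i x_i'] (X'X)^-1
meat = x.T @ (x * u[:, None] ** 2)
robust_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
print(beta_hat, robust_se)
```

Robust standard errors are needed because var(ui | xi) = xi′β0 (1 − xi′β0), which varies with xi by construction.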
Probit model
Define Φ(z) as the cumulative distribution function of a standard normal random variable, that is, Φ(z) = ∫_{−∞}^z φ(v) dv, where φ(v) is the pdf of a standard normal distribution.

For the probit model, set:

P(yi = 1 | xi) = Φ(xi′β0)

Φ(xi′β0) is a nonlinear function of xi′β0.

If P(yi = 1 | xi) = Φ(xi′β0) then P(yi = 0 | xi) = 1 − Φ(xi′β0).
Latent variable model interpretation
These models look very different from our linear regression model, but we can uncover a connection by taking a latent variable approach to specifying these models.
What is a latent variable? It is a variable that is observed by the individual/firm but not by the econometrician. In this case, it is some variable that is the sole determinant of whether the event occurs.
For example: if event is whether individual is given loan by bank then latent variable is credit score given by bank.
Now consider the math.
Latent variable model interpretation
Suppose that the latent variable yi* satisfies the linear regression model

yi* = xi′β0 + ui

and

yi* > 0 ⇒ yi = 1
yi* ≤ 0 ⇒ yi = 0

Then, if ui has a standard normal distribution, yi follows the probit model.
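A quick simulation check of this claim: with ui ~ N(0,1), P(yi = 1 | xi) = P(ui > −xi′β0) = Φ(xi′β0) by the symmetry of the normal. (The index value below is arbitrary.)

```python
import numpy as np
from scipy.stats import norm

# Latent-variable construction: y* = x'beta0 + u, y = 1 iff y* > 0.
rng = np.random.default_rng(4)
n = 200_000
index = 0.3                       # a fixed value of x'beta0
u = rng.normal(size=n)            # u ~ N(0,1)
y = (index + u > 0).astype(float)

print(y.mean(), norm.cdf(index))  # empirical frequency vs Phi(x'beta0)
```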
How does a change in xi,l affect the probability?
This is more complicated here than in the LPM because probit
model implies the probability is a nonlinear function of the x’s. If xi,l is continuous then
∂P(yi = 1 | xi)/∂xi,l = φ(xi′β0) β0,l.
If xi,l is a dummy variable then, holding the other elements of xi constant,

ΔP(yi = 1 | xi) = Φ(xi′β0; xi,l = 1) − Φ(xi′β0; xi,l = 0)
Note that in both cases response ∆P (yi = 1|xi ) depends on xi .
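Both cases can be computed directly; the coefficients and evaluation point below are illustrative, not the lecture's estimates:

```python
import numpy as np
from scipy.stats import norm

# Probit marginal effects at a chosen x (illustrative numbers).
beta0 = np.array([0.2, -0.5, 0.8])  # intercept, continuous var, dummy
x = np.array([1.0, 1.5, 0.0])

# Continuous regressor: dP(y=1|x)/dx_l = phi(x'beta0) * beta0_l
me_continuous = norm.pdf(x @ beta0) * beta0[1]

# Dummy regressor: difference in Phi with the dummy at 1 versus 0
x1, x0 = x.copy(), x.copy()
x1[2], x0[2] = 1.0, 0.0
me_dummy = norm.cdf(x1 @ beta0) - norm.cdf(x0 @ beta0)

print(me_continuous, me_dummy)  # both responses depend on the chosen x
```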
ML Estimation of probit model
Assume:

{(yi, xi′)}_{i=1}^N are i.i.d.;
P(yi = 1 | xi) = Φ(xi′β0);
p(xi) does not depend on β0.

Then:

CLLF_N(β) = Σ_{i=1}^N { yi ln[Φ(xi′β)] + (1 − yi) ln[1 − Φ(xi′β)] }
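There is no closed-form solution to the probit score equations, so CLLF_N(β) is maximized numerically. A sketch on simulated data (design and seed are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# ML estimation of the probit model by numerical maximization of CLLF_N.
rng = np.random.default_rng(5)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([0.5, -1.0])
y = (X @ beta0 + rng.normal(size=n) > 0).astype(float)  # latent-variable form

def neg_cllf(beta):
    p = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)  # guard the logs
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_hat = minimize(neg_cllf, x0=np.zeros(2), method="BFGS").x
print(beta_hat)  # close to beta0 = (0.5, -1.0)
```

Packaged routines (e.g. `glm` in R with a probit link, which produced the output below) do the same maximization via Fisher scoring.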
Empirical example: Wooldridge Example 7.12 on p.256
y is a dummy variable that takes the value one if a man is arrested during 1986 and zero otherwise.
x contains:
pcnv = the proportion of prior arrests that led to conviction;
avgsen = the average sentence served from prior convictions;
tottime = the total time spent in prison since age 18 prior to 1986;
ptime86 = months spent in prison in 1986;
qemp86 = the number of quarters that the man was legally employed in 1986.
Linear probability model:
------------------------------------------------------
             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)  0.440615   0.018546    23.758   < 2.2e-16 ***
pcnv        -0.162445   0.019220    -8.452   < 2.9e-16 ***
avgsen       0.006113   0.006210     0.984   0.326
tottime     -0.002262   0.004573    -0.494   0.621
ptime86     -0.021966   0.002919    -7.526   7.06e-14 ***
qemp86      -0.042829   0.005466    -7.835   6.66e-15 ***
------------------------------------------------------
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.4373 on 2719 degrees of freedom
Multiple R-squared: 0.04735, Adjusted R-squared: 0.0456
F-statistic: 27.03 on 5 and 2719 DF, p-value: < 2.2e-16
Probit model:
------------------------------------------------------
             Estimate   Std. Error  z value  Pr(>|z|)
(Intercept) -0.101999   0.051462    -1.982   0.0475 *
pcnv        -0.540475   0.069303    -7.799   6.25e-15 ***
avgsen       0.018923   0.020459     0.925   0.3550
tottime     -0.006569   0.016175    -0.406   0.6847
ptime86     -0.078238   0.017055    -4.587   4.49e-06 ***
qemp86      -0.131658   0.016600    -7.931   2.17e-15 ***
------------------------------------------------------
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3216.4 on 2724 degrees of freedom
Residual deviance: 3079.2 on 2719 degrees of freedom
AIC: 3091.2
Number of Fisher Scoring iterations: 5
Probit model:
------------------------------------------------------
Marginal Effects:
             dF/dx        Std. Err.   z        P>|z|
pcnv        -0.1775607    0.0226318  -7.8456   4.308e-15 ***
avgsen       0.0062166    0.0067212   0.9249   0.3550
tottime     -0.0021580    0.0053141  -0.4061   0.6847
ptime86     -0.0257034    0.0055866  -4.6009   4.207e-06 ***
qemp86      -0.0432534    0.0054381  -7.9537   1.810e-15 ***
------------------------------------------------------
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Further reading – to come
Notes: Chapter 6
Greene:
ML: Ch 14.1-14.6 (but goes into more detail on properties of the MLE than we do)
Binary response model: Ch 17.1-17.3 (but again more detail than this course)