Lecture 4: Simultaneous Equations Models and other uses of IVs
Rigissa Megalokonomou University of Queensland
Reading for lecture 4
In the textbook (Wooldridge 2013): Chapter 16
Introduction to SEM
The IV method can solve two kinds of endogeneity: omitted variables and measurement error.
Another important source of endogeneity is simultaneity: that is, one or more of the explanatory variables are jointly determined with the dependent variable.
Any user of simultaneous equations models (SEM) should know that just because two (or more) variables are jointly determined, it does not mean that it is appropriate to specify and estimate an SEM.
Applying OLS to an equation in a simultaneous system gives biased and inconsistent estimates.
The IV method is a leading method for estimating SEMs.
The IV solution to simultaneity is essentially the same as the IV solutions to the omitted variables and measurement error problems.
The classical case is a system of demand and supply equations:
D(p) = how high would demand be if the price were set to p?
S(p) = how high would supply be if the price were set to p?
Both mechanisms have a ceteris paribus, causal interpretation.
Observed quantity and price will be jointly determined in equilibrium.
The nature of SEM
Classic example: a supply and demand equation for some commodity or input to production.
In the context of labor supply:
hs = α1·w + β1·z1 + u1 (1)
Let hs denote the annual labor hours supplied in agriculture, measured at the county level; w is the average hourly wage, z1 is an additional observed factor affecting labor supply, and u1 contains other unobserved factors affecting labor supply.
For example, z1 could be the manufacturing wage. We expect β1 < 0: if the manufacturing wage increases, fewer people will go into agriculture.
We also expect α1 > 0.
When we graph labor supply, we sketch hours as a function of wage, with z1 and u1 held constant.
z1 is the observable supply shifter; u1 is the unobserved supply shifter.
How does equation (1) differ from the equations in the previous lectures?
Although equation (1) is supposed to hold for all possible values of the wage, we cannot view the wage as varying exogenously across a cross section of counties.
If we could run an experiment in which we vary the level of agricultural and manufacturing wages across a sample of counties and survey the labor supply of workers in each county, then we could estimate (1) by OLS. Such an experiment is generally not possible.
Instead, we must collect data on average wages in these two sectors, along with how many person-hours were spent in agricultural production.
To describe how equilibrium wages and labor hours are determined, we need labor demand.
To complete the model, we must add a specification of labor demand:
hd = α2·w + β2·z2 + u2 (2)
where hd is annual labor hours demanded, z2 is the observable demand shifter, and u2 is the unobserved demand shifter.
z2 could be the agricultural land area. We expect β2 > 0: the more land is available, the higher the demand for labor.
We also expect α2 < 0: the lower the wage, the higher the demand for labor.
(1) represents the behavior of workers, while (2) represents the behavior of employers/farmers.
Each equation should have a ceteris paribus interpretation and an economic meaning in isolation from the other equation. Thus, each coefficient in (1) and (2) should represent a causal relationship, holding all other variables fixed.
(1) and (2) are linked because the observed wage and labor hours are determined by the intersection of supply and demand.
hs = f(w, z1)
hd = f(w, z2)
This causes endogeneity of w in both equations.
(1) and (2) constitute a simultaneous equations model (SEM).
In equilibrium hs = hd, so we can rewrite (1) and (2) as:
hi = α1·wi + β1·z1i + u1i (3)
hi = α2·wi + β2·z2i + u2i (4)
Given z1i, z2i, u1i, and u2i, these two equations determine hi and wi.
For this reason, hi and wi are the endogenous variables in this SEM.
z1i and z2i are determined outside the model, so we view them as exogenous variables.
Neither of these equations can be consistently estimated by OLS, since the wage variable in each equation is correlated with the respective error term. (Keep in mind Lectures 1 and 2.)
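The inconsistency of OLS can be illustrated with a small simulation. This is only a sketch on simulated data: the parameter values, sample size, and the standard-normal, mutually independent shifters and errors are assumptions made for illustration and do not come from the lecture. Wage and hours are generated by equating (3) and (4), and the supply equation is then estimated by OLS as if w were exogenous.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical structural parameters (assumed): supply slope > 0, demand slope < 0.
alpha1, beta1 = 0.5, -1.0   # supply: h = alpha1*w + beta1*z1 + u1
alpha2, beta2 = -0.5, 1.0   # demand: h = alpha2*w + beta2*z2 + u2

# Exogenous shifters and unobserved errors, drawn independently (assumption).
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u1, u2 = rng.normal(size=n), rng.normal(size=n)

# Equilibrium wage from setting supply equal to demand, then hours from supply.
w = (beta1 * z1 - beta2 * z2 + u1 - u2) / (alpha2 - alpha1)
h = alpha1 * w + beta1 * z1 + u1

# OLS of h on (w, z1); no intercept is needed because all variables have mean zero.
X = np.column_stack([w, z1])
ols_alpha1, ols_beta1 = np.linalg.lstsq(X, h, rcond=None)[0]

print("true alpha1:", alpha1, "OLS estimate:", round(float(ols_alpha1), 3))
```

Even with a very large sample, the OLS slope stays well below the true value of 0.5 (near 1/6 for these particular parameter values), because w is negatively correlated with u1.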
In equilibrium we have (3) = (4).
(Hint: set (3) equal to (4) and solve for w.)
α1·w + β1·z1 + u1 = α2·w + β2·z2 + u2
=> α2·w − α1·w = β1·z1 + u1 − (β2·z2 + u2)
=> w = [β1·z1 + u1 − (β2·z2 + u2)] / (α2 − α1)
=> w = (β1·z1 − β2·z2 + u1 − u2) / (α2 − α1)
Solving for w:
w = [β1/(α2 − α1)]·z1 − [β2/(α2 − α1)]·z2 + [1/(α2 − α1)]·u1 − [1/(α2 − α1)]·u2 (5)
w is a function of the exogenous variables, z1 and z2, and the errors, u1 and u2.
Also, w and the error terms are correlated!
In general, any shock to either labor supply (through z1 or u1) or labor demand (through z2 or u2) will affect both the equilibrium quantity (h, through (3) and (4)) and the wage (through (5)).
Note that if z1 = z2, then the equations look identical, and we cannot estimate either one. Thus, z1 and z2 should be factors that are unique to each equation. For instance, factors that shift the supply curve should not shift the demand curve. Think about the standard demand and supply model.
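The correlation between w and the errors can be written out from the reduced form for w. The sketch below assumes, in addition to what the slides state, that u1 and u2 are uncorrelated with each other and with z1 and z2. It also gives the reduced form for h, obtained by substituting the solution for w back into the supply equation.

```latex
\begin{align*}
\operatorname{Cov}(w,u_1) &= \frac{\operatorname{Var}(u_1)}{\alpha_2-\alpha_1}, &
\operatorname{Cov}(w,u_2) &= -\frac{\operatorname{Var}(u_2)}{\alpha_2-\alpha_1},\\[4pt]
h &= \frac{\alpha_2\beta_1\,z_1-\alpha_1\beta_2\,z_2+\alpha_2u_1-\alpha_1u_2}{\alpha_2-\alpha_1}.
\end{align*}
```

With α1 > 0 and α2 < 0 the denominator is negative, so Cov(w, u1) < 0 and Cov(w, u2) > 0: a positive supply shock lowers the equilibrium wage, and a positive demand shock raises it.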
Identification of SEM
In a simultaneous equations system, variables that appear only on the right-hand side of the equations are called exogenous variables.
Variables that appear on the right-hand side and also have their own equations are referred to as endogenous variables.
In this case, h and w are endogenous, and z1 and z2 are exogenous.
Supply: hi = α1·wi + β1·z1i + u1i (6)
Demand: hi = α2·wi + β2·z2i + u2i (7)
In particular, we assume that certain exogenous variables (i.e., z2i) do not appear in (6) but appear in (7), and that others (i.e., z1i) do not appear in (7) but appear in (6).
To identify the supply equation (6), we look for variables (i.e., instruments) that are excluded from the supply equation but included in the demand equation.
To identify the demand equation (7), we look for variables (i.e., instruments) that are excluded from the demand equation but included in the supply equation.
The necessary and sufficient condition for identification is:
The first (second) equation in a two-equation simultaneous equations model is identified if and only if the second (first) equation contains at least one exogenous variable that is excluded from the first (second) equation and has a nonzero coefficient.
This excluded variable is a good candidate for a valid IV for the endogenous variable.
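To see the excluded variable acting as an instrument, the sketch below estimates the supply equation on simulated data, using the demand shifter z2 as the IV for w. The parameter values and distributions are assumptions made for illustration. Because there is exactly one instrument for one endogenous regressor, the just-identified IV formula (Z'X)^(-1) Z'y is used.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical structural parameters (assumed for illustration only).
alpha1, beta1 = 0.5, -1.0   # supply: h = alpha1*w + beta1*z1 + u1
alpha2, beta2 = -0.5, 1.0   # demand: h = alpha2*w + beta2*z2 + u2

z1, z2 = rng.normal(size=n), rng.normal(size=n)
u1, u2 = rng.normal(size=n), rng.normal(size=n)

# Equilibrium outcomes implied by the two structural equations.
w = (beta1 * z1 - beta2 * z2 + u1 - u2) / (alpha2 - alpha1)
h = alpha1 * w + beta1 * z1 + u1

# Supply equation regressors: endogenous w and exogenous z1.
X = np.column_stack([w, z1])
# Instruments: z2 (excluded from supply) replaces w; z1 instruments itself.
Z = np.column_stack([z2, z1])

# Just-identified IV estimator: solve (Z'X) b = Z'h.
iv_alpha1, iv_beta1 = np.linalg.solve(Z.T @ X, Z.T @ h)

print("IV estimates (alpha1, beta1):", round(float(iv_alpha1), 3), round(float(iv_beta1), 3))
```

The estimates land close to the assumed values of 0.5 and -1.0. The instrument works here only because β2 is nonzero in the simulated demand equation; with β2 = 0 the matrix Z'X would be close to singular.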
Example: Crime and size of the police force
Cities want to know how much additional police would decrease crime. A simple cross-section model to address this question is:
crimpc = α1·polpc + β1·incpc + u1 (8)
where crimpc is the number of crimes per capita, polpc is the number of police officers per capita, and incpc is income per capita. Here incpc is assumed to be exogenous.
Ideally, we hope to answer whether an increase in the police force lowers the crime rate.
If we could exogenously choose police force sizes for a random sample of cities, we could estimate (8) by OLS.
Certainly, we cannot run this experiment.
Can we think of the police force as being exogenously determined?
A city’s spending on law enforcement, and thus the size of its police force, is at least partly determined by its expected crime rate. Thus, they are jointly determined. To reflect this, we postulate a second relationship:
polpc = α2·crimpc + otherfactors (9)
We expect α2 > 0: other factors being equal, cities with higher (expected) crime rates will have more police officers per capita.
(8) describes the behavior of criminals, while (9) describes the behavior of city officials. This gives each of (8) and (9) a clear ceteris paribus interpretation and makes them an SEM.
Which are the endogenous and exogenous variables in this system?
Exogenous variables: incpc, otherfactors
Endogenous variables: crimpc, polpc
Thus, for identification of (8):
we need factors that are excluded from (8), affect polpc, and are included in (9).
If such variables exist, these factors are good candidates for valid IVs.
Likewise, for identification of (9):
we need factors that are excluded from (9), affect crimpc, and are included in (8).
Do we have such variables?
incpc in equation (8) allows us to estimate (9), and otherfactors in equation (9) allow us to estimate (8).
Example: Identification in a two-equation system
Consider the market for milk:
Supply: q = α1·p + β1·z1 + u1 (10)
Demand: q = α2·p + u2 (11)
q is the per capita milk quantity at the county level, p is the average price of a gallon of milk in that county, and z1 is the price of cattle feed (an input in the production of milk).
Equation (10) is the supply equation, with α1 > 0 and β1 < 0: that is, a higher cost of production will generally reduce the quantity supplied at the same price per gallon.
Equation (11) is the demand equation, where we presume that α2 < 0.
Endogenous variables: q and p; exogenous variable: z1.
To identify (10), we need to find variables that are excluded from (10), affect qd, and are included in (11).
To identify (11), we need to find variables that are excluded from (11), affect qs, and are included in (10).
Equation (11) is identified by z1, provided that β1 is nonzero.
It turns out that the demand equation is identified, but the supply equation is not: no exogenous variable appears in (11) that is excluded from (10).
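A short derivation makes this result concrete. Equating (10) and (11) and solving gives the reduced forms for p and q in terms of z1; the reduced-form coefficients π_p and π_q and the composite errors v_p and v_q are notation introduced here for illustration, not symbols from the lecture.

```latex
\begin{align*}
p &= \frac{\beta_1}{\alpha_2-\alpha_1}\,z_1 + \frac{u_1-u_2}{\alpha_2-\alpha_1}
   \equiv \pi_p\,z_1 + v_p,\\
q &= \frac{\alpha_2\beta_1}{\alpha_2-\alpha_1}\,z_1 + \frac{\alpha_2u_1-\alpha_1u_2}{\alpha_2-\alpha_1}
   \equiv \pi_q\,z_1 + v_q,\\
\frac{\pi_q}{\pi_p} &= \alpha_2 \qquad (\text{requires } \beta_1 \neq 0).
\end{align*}
```

The demand slope α2 is recovered from the ratio of the two reduced-form coefficients. Once α2 is known, however, π_p is a single equation in the two remaining unknowns α1 and β1, so the supply parameters cannot be recovered separately.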
More formally: Order Condition
There has to be a variable excluded from the equation that we want to identify that is included in the other equation of the system.
In the system of (3) and (4):
z1 allowed us to estimate the demand equation.
z2 allowed us to estimate the supply equation.
In the example of crime and the size of the police force, (8) and (9):
incpc in equation (8) allowed us to estimate (9).
otherfactors in (9) allowed us to estimate (8).
This is formally called the order condition: at least one exogenous variable is excluded from the equation that we are interested in identifying.
In all these models, we assume that different exogenous variables are excluded from (1) and (2). We also call these excluded variables exclusion restrictions.
z1 is the excluded variable, or exclusion restriction, in the labor demand equation, and z2 is the excluded variable, or exclusion restriction, in the labor supply equation.
More formally: Rank Condition
The order condition is a necessary, but not a sufficient, condition for identification. We also need the rank condition to hold. The rank condition requires more:
The order condition holds, AND
at least one of the exogenous variables excluded from the first equation has a nonzero population coefficient in the other equation.
We can test this using a t or an F test.
Given that the identification conditions (order and rank) hold, the parameters of a simultaneous equations system can be consistently estimated by 2SLS (an IV method).
Example: Labor supply of married, working women
Consider the labor supply (LS) equation:
hours = α1·ln(wage) + β2·educ + β3·age + β4·kid6 + β5·nwinc + u1
This equation is a labor supply relation obtained from women’s utility maximization.
It expresses hours worked by a married woman as a function of her wage, education, age, the number of preschool (age < 6) children, and non-wage income.
Consider the labor demand (LD) equation:
ln(wage) = α2·hours + γ2·educ + γ3·age + γ4·exper + γ5·exper² + u2
The second equation is a labor demand relation obtained from firms’ profit maximization.
It expresses the wage paid as a function of hours worked, the employee’s education, age, and a polynomial in her work experience.
All variables except hours and ln(wage) are assumed to be exogenous.
For the first equation (labor supply):
hours = α1·ln(wage) + β2·educ + β3·age + β4·kid6 + β5·nwinc + u1
It satisfies the order condition, because:
two exogenous variables (exper and exper²) are omitted from the LS equation but included in the LD equation; these are the exclusion restrictions.
The rank condition for identifying the first equation is that at least one of exper and exper² has a nonzero coefficient in the second equation.
This is something we can test using a standard F statistic.
If both coefficients equal 0, there are no exogenous variables appearing in the second equation that do not also appear in the first (educ and age appear in both).
We expect exper and exper² to have nonzero coefficients (γ4 ≠ 0 and γ5 ≠ 0).
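The F test mentioned above can be sketched as a comparison of restricted and unrestricted regressions. The example uses simulated data, not the data from the lecture: the variable distributions and coefficients are invented for illustration, and the test is run on a regression of log wage on all exogenous variables, a common practical way to check whether exper and exper² matter. The helper `ssr` is a hypothetical name.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared residuals from an OLS regression of y on X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(2)
n = 5_000

# Simulated exogenous variables (assumed distributions, illustration only).
const = np.ones(n)
educ = rng.normal(12, 2, n)
age = rng.normal(40, 8, n)
kid6 = rng.integers(0, 3, n).astype(float)
nwinc = rng.normal(20, 10, n)
exper = rng.uniform(0, 30, n)

# Hypothetical log-wage equation in which exper and exper^2 have nonzero coefficients.
lwage = (0.5 + 0.08 * educ + 0.001 * age
         + 0.04 * exper - 0.0008 * exper**2
         + rng.normal(0, 0.5, n))

# Restricted model drops the two excluded instruments; unrestricted keeps them.
X_r = np.column_stack([const, educ, age, kid6, nwinc])
X_ur = np.column_stack([X_r, exper, exper**2])

q = 2                                  # number of restrictions tested
df = n - X_ur.shape[1]                 # residual degrees of freedom
ssr_r, ssr_ur = ssr(lwage, X_r), ssr(lwage, X_ur)
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)

# The 5% critical value of F(2, large df) is approximately 3.00.
print("F statistic for H0: both coefficients are zero =", round(F, 1))
```

A large F statistic rejects the null that both coefficients are zero, which is the evidence the rank condition asks for.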
For the second equation (labor demand):
ln(wage) = α2·hours + γ2·educ + γ3·age + γ4·exper + γ5·exper² + u2
It satisfies the order condition, because:
two exogenous variables (kid6 and nwinc) are omitted from the LD equation but included in the LS equation; these are the exclusion restrictions.
The rank condition for identifying the second equation is that at least one of kid6 and nwinc has a nonzero coefficient in the first equation.
This is something we can test using a standard F statistic.
If both coefficients equal 0, there are no exogenous variables appearing in the first equation that do not also appear in the second (educ and age appear in both).
We expect kid6 and nwinc to have nonzero coefficients (β4 ≠ 0 and β5 ≠ 0).
Estimation of simultaneous equation systems by 2SLS
Once we have determined that an equation is identified, we estimate it by 2SLS.
If not all equations are identified, one can estimate only the identified ones.
If certain additional conditions hold, one can also use more efficient system estimation methods (three-stage least squares, 3SLS).
Trade-off between 2SLS and 3SLS:
If all equations are identified and correctly specified with valid exclusion restrictions, 3SLS (joint estimation of all equations) estimates are more precise than 2SLS estimates.
However, if even one equation is misspecified (e.g., an invalid IV/exclusion restriction) while all the others are correctly specified, the 3SLS estimates in all equations will be contaminated.
Estimation of the labor supply equation:
Run a regression of ln(wage) on educ, age, kid6, nwinc, exper, exper².
Obtain the predicted value of ln(wage) from this first-stage regression (which uses only exogenous variables).
Use the predicted value as the IV for ln(wage) in the labor supply equation.
In Stata: ivregress 2sls hours educ age kid6 nwinc (lwage = exper exper2), where lwage is a variable holding ln(wage) and exper2 holds exper².
Estimation of the labor demand equation:
Run a regression of hours on educ, age, kid6, nwinc, exper, exper².
Use the predicted value as the IV for hours in the labor demand equation.
In Stata: ivregress 2sls lwage educ age exper exper2 (hours = kid6 nwinc)
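The two-step logic above can be sketched by hand on simulated data. Everything numeric here is an assumption: the variables are standardized stand-ins rather than real survey data, and the structural coefficients are invented so that both equations are identified. The helper `ols` is a hypothetical name.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 100_000

# Simulated, standardized stand-ins for the exogenous variables (not real data).
const = np.ones(n)
educ, age, kid6, nwinc, exper = (rng.normal(size=n) for _ in range(5))
exper2 = exper ** 2
u1, u2 = rng.normal(size=n), rng.normal(size=n)

# Hypothetical structural parameters chosen only for illustration.
alpha1, alpha2 = 1.0, -0.3
Xs = np.column_stack([const, educ, age, kid6, nwinc])     # exogenous vars in LS
Xd = np.column_stack([const, educ, age, exper, exper2])   # exogenous vars in LD
beta = np.array([0.0, 0.5, -0.2, -0.8, -0.3])
gamma = np.array([0.0, 0.6, 0.1, 0.5, -0.1])

# Solve the two structural equations jointly for the endogenous variables.
hours = (alpha1 * (Xd @ gamma + u2) + Xs @ beta + u1) / (1 - alpha1 * alpha2)
lwage = alpha2 * hours + Xd @ gamma + u2

# All exogenous variables in the system, used in both first stages.
exog = np.column_stack([const, educ, age, kid6, nwinc, exper, exper2])

# Labor supply: fit lwage on all exogenous variables, then regress hours
# on the fitted values plus the exogenous variables that appear in LS.
lwage_hat = exog @ ols(lwage, exog)
alpha1_2sls = ols(hours, np.column_stack([lwage_hat, Xs]))[0]

# Labor demand: fit hours on all exogenous variables, then regress lwage
# on the fitted values plus the exogenous variables that appear in LD.
hours_hat = exog @ ols(hours, exog)
alpha2_2sls = ols(lwage, np.column_stack([hours_hat, Xd]))[0]

print("2SLS alpha1:", round(float(alpha1_2sls), 3), "2SLS alpha2:", round(float(alpha2_2sls), 3))
```

The point estimates from this manual procedure match what 2SLS computes, but the standard errors reported by a plain second-stage OLS regression are not valid; dedicated routines such as ivregress apply the appropriate correction.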
Example: Inflation and openness
Romer (1993) proposed some theoretical models of inflation that imply that more open countries should have lower inflation rates. He applied the IV method to the following equations:
inf = β10 + α1·open + β11·ln(incpc) + u1 (12)
open = β20 + α2·inf + β21·ln(incpc) + β22·ln(land) + u2 (13)
where inf is the average annual inflation rate since 1973, open is the average import share of GDP since 1973, incpc is 1980 income per capita, and land is the land area of the country.
ln(incpc) and ln(land) are exogenous variables.
ln(incpc) appears in both equations, but ln(land) appears only in the second equation.
Equation (12) satisfies the order condition, because:
there is one excluded exogenous variable, or exclusion restriction (ln(land)), that is omitted from equation (12) but included in equation (13).
The rank condition for identifying equation (12) says that ln(land) should have a nonzero coefficient in equation (13): β22 ≠ 0.
In other words, equation (12) is identified provided that β22 ≠ 0.
To check this, we test H0: β22 = 0 with a t-test on ln(land) in the reduced form for open (the regression of open on ln(incpc) and ln(land)); rejecting H0 supports identification of (12).
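The reduced form for open, obtained by substituting (12) into (13), shows what a t-test on ln(land) is checking. The reduced-form coefficients π and the error v below are notation introduced here for illustration, and the derivation assumes α1·α2 ≠ 1.

```latex
\begin{align*}
\mathit{open} &= \pi_{20} + \pi_{21}\ln(\mathit{incpc}) + \pi_{22}\ln(\mathit{land}) + v,\\
\pi_{20} &= \frac{\beta_{20}+\alpha_2\beta_{10}}{1-\alpha_1\alpha_2},\quad
\pi_{21} = \frac{\beta_{21}+\alpha_2\beta_{11}}{1-\alpha_1\alpha_2},\quad
\pi_{22} = \frac{\beta_{22}}{1-\alpha_1\alpha_2},\quad
v = \frac{\alpha_2u_1+u_2}{1-\alpha_1\alpha_2}.
\end{align*}
```

Since π22 is zero exactly when β22 is zero, a significant coefficient on ln(land) in this OLS regression is evidence that the rank condition for equation (12) holds.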
Equation (13) does NOT satisfy the order condition, because:
there is NO exogenous variable that is omitted from equation (13) but included in equation (12).
Since it does not satisfy the order condition, it does not satisfy the rank condition either.
Equation (13) is not identified.
In this case, we use 2SLS to estimate only equation (12).