ECONOMETRICS I ECON GR5411
Lecture 2: Probability Review (cont’d) by
Seyhan Erden Columbia University MA in Economics
Example cps09mar.dat:
• age: years, capped at 85
• race: 1 White only, 2 Black only, 3 American Indian, Alaskan (AI) only, 4 Asian only, 5 Hawaiian/Pacific Islander (HP) only, 6 White-Black, 7 White-AI, 8 White-Asian, 9 White-HP, 10 Black-AI, 11 Black-Asian, 12 Black-HP, 13 AI-Asian, 14 Asian-HP, 15 White-Black-AI, 16 White-Black-Asian, 17 White-AI-Asian, 18 White-Asian-HP, 19 White-Black-AI-Asian, 20 2 or 3 races, 21 4 or 5 races
• female: 1 for females, 0 otherwise
• hispanic: 1 if Spanish, Hispanic, or Latino, 0 otherwise
• education: 0 Less than 1st grade, 4 1st, 2nd, 3rd, or 4th grade, 6 5th or 6th grade, 8 7th or 8th grade, 9 9th grade, 10 10th grade, 11 11th grade or 12th grade with no high school diploma, 12 High school graduate, high school diploma or equivalent, 13 Some college but no degree, 14 Associate degree in college, including occupation/vocation programs, 16 Bachelor’s degree or equivalent (BA, AB, BS), 18 Master’s degree (MA, MS, MENG, MED, MSW, MBA), 20 Professional degree or Doctorate degree (MD, DDS, DVM, LLB, JD, PHD, EDD)
• marital: 1 Married – civilian spouse present, 2 Married – Armed Forces spouse present, 3 Married – spouse absent (except separated), 4 Widowed, 5 Divorced, 6 Separated, 7 Never married
• l_earnings: Log of total annual wage and salary earnings
9/14/20 Lecture 1 GR5411 by Seyhan Erden 2
Example:
Using cps09mar.dta
. sum l_earnings age female race marital hours education

    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
  l_earnings |     50,742    10.66288    .7013724          0   13.23763
         age |     50,742    42.13173    11.48762         15         85
      female |     50,742    .4257223    .4944569          0          1
        race |     50,742    1.433507     1.31743          1         21
     marital |     50,742    2.763174    2.503158          1          7
       hours |     50,742    43.82724    7.704467         36         99
   education |     50,742    13.92462    2.744447          0         20
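Example: joint, marginal, and conditional probabilities

Counts:

           |        female
  Hispanic |         0          1 |     Total
-----------+----------------------+----------
         0 |    24,593     18,599 |    43,192
         1 |     4,547      3,003 |     7,550
-----------+----------------------+----------
     Total |    29,140     21,602 |    50,742

    Fisher’s exact = 0.000

Probabilities (each count divided by 50,742):

             | Hispanic = 0   Hispanic = 1 |   Total
-------------+-----------------------------+--------
  female = 0 |        0.485          0.090 |   0.575
  female = 1 |        0.366          0.059 |   0.425
-------------+-----------------------------+--------
       Total |        0.851          0.149 |   1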
Example: joint, marginal, and conditional probabilities (cont’d)
Joint probabilities, ex: $\Pr(fem = 1 \text{ and } his = 1) = 0.059$
Marginal probabilities, ex: $\Pr(his = 1) = 0.149$
Conditional probabilities, ex:
$$\Pr(fem = 1 \mid his = 1) = \frac{\Pr(fem = 1 \text{ and } his = 1)}{\Pr(his = 1)} = \frac{0.059}{0.149} = 0.396$$
Conditional probability distribution, ex:

                    |  Female = 0             Female = 1            |  Total
--------------------+-----------------------------------------------+-------
  Pr(Fem | His = 0) |  0.485/0.851 = 0.570    0.366/0.851 = 0.430   |  1.00
  Pr(Fem | His = 1) |  0.090/0.149 = 0.604    0.059/0.149 = 0.396   |  1.00
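The joint, marginal, and conditional probabilities above can be recomputed directly from the cell counts. The following is a minimal Python sketch, not part of the original slides; the variable names are illustrative assumptions, and only the four counts come from the cross-tabulation.

```python
# Cell counts from the Hispanic-by-female cross-tabulation, keyed as (female, hispanic).
counts = {(0, 0): 24593, (0, 1): 4547, (1, 0): 18599, (1, 1): 3003}
n = sum(counts.values())  # total number of observations

# Joint probabilities: each cell count divided by the sample size.
joint = {cell: count / n for cell, count in counts.items()}

# Marginal probability Pr(his = 1): sum the joint probabilities over female.
pr_his1 = joint[(0, 1)] + joint[(1, 1)]

# Conditional probability Pr(fem = 1 | his = 1) = joint / marginal.
pr_fem1_given_his1 = joint[(1, 1)] / pr_his1

print(round(joint[(1, 1)], 3), round(pr_his1, 3), round(pr_fem1_given_his1, 3))
```

Working from the unrounded counts gives a conditional probability of about 0.398 (3,003/7,550); the 0.396 on the slide comes from dividing the rounded values 0.059 and 0.149.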
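Covariance and Correlation:
The covariance of $x$ and $y$:
$$\mathrm{Cov}(x, y) = E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y = \sigma_{xy}$$
The correlation coefficient:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}, \qquad -1 \le \rho_{xy} \le 1$$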
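Some general results:
$$\mathrm{Var}(ax + by + c) = a^2\,\mathrm{Var}(x) + b^2\,\mathrm{Var}(y) + 2ab\,\mathrm{Cov}(x, y)$$
$$\mathrm{Cov}(ax + by,\; cx + dy) = ac\,\mathrm{Var}(x) + bd\,\mathrm{Var}(y) + (ad + bc)\,\mathrm{Cov}(x, y)$$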
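Both identities can be checked exactly on a small discrete distribution. The sketch below is illustrative only: the joint pmf and the constants a, b, c, d are hypothetical values chosen for the check, not taken from the lecture data.

```python
from fractions import Fraction as F

# Hypothetical joint pmf Pr(x, y), chosen only for illustration (sums to 1).
pmf = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(3, 8)}

def expect(g):
    """Expectation of g(x, y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def cov(g, h):
    """Cov(g, h) = E[g h] - E[g] E[h]; cov(g, g) is the variance of g."""
    return expect(lambda x, y: g(x, y) * h(x, y)) - expect(g) * expect(h)

X = lambda x, y: x
Y = lambda x, y: y
a, b, c, d = 2, -3, 5, 7  # arbitrary constants

# Var(ax + by + c) versus a^2 Var(x) + b^2 Var(y) + 2ab Cov(x, y)
lhs_var = cov(lambda x, y: a * x + b * y + c, lambda x, y: a * x + b * y + c)
rhs_var = a**2 * cov(X, X) + b**2 * cov(Y, Y) + 2 * a * b * cov(X, Y)

# Cov(ax + by, cx + dy) versus ac Var(x) + bd Var(y) + (ad + bc) Cov(x, y)
lhs_cov = cov(lambda x, y: a * x + b * y, lambda x, y: c * x + d * y)
rhs_cov = a * c * cov(X, X) + b * d * cov(Y, Y) + (a * d + b * c) * cov(X, Y)

print(lhs_var == rhs_var, lhs_cov == rhs_cov)
```

Using `Fraction` keeps the arithmetic exact, so the two sides can be compared with `==` rather than with a floating-point tolerance.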
Earnings distribution:
[Figure: density histogram of l_earnings.]
Earnings distribution for female = 1 and female = 0:
[Figure: two density histograms, panels "Earnings for Female=1" and "Earnings for Female=0".]
Earnings distribution for female = 1 and female = 0:
. sum earnings if female==1

    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
    earnings |     21,602    44224.13    36547.19          4     466789

. sum earnings if female==0

    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
    earnings |     29,140    63147.73    60052.25          1     561087

[Figure: two density histograms of earnings in levels, panels "Earnings for Female=1" and "Earnings for Female=0".]
The Conditional Distribution:
Conditioning and the use of conditional distributions are paramount in econometric modelling.
We will consider a bivariate distribution without loss of generality. All of these results can be extended directly to the multivariate case.
There is a conditional distribution of $y$ for each value of $x$. The conditional densities are
$$f(y \mid x) = \frac{f(x, y)}{f_x(x)}$$
where $f_x(x)$ is the marginal probability density such that
$$f_x(x) = \begin{cases} \sum_y f(x, y) & \text{in the discrete case} \\ \int_y f(x, y)\,dy & \text{in the continuous case} \end{cases}$$
Regression: the conditional mean
A conditional mean is the mean of the conditional distribution and is defined as
$$E[y \mid x] = \begin{cases} \sum_y y\, f(y \mid x) & \text{if } y \text{ is discrete} \\ \int_y y\, f(y \mid x)\,dy & \text{if } y \text{ is continuous} \end{cases}$$
The conditional mean function $E[y \mid x]$ is called the regression of $y$ on $x$.
Example about the conditional mean
Ex: Conditional mean (or conditional expectation) of earnings
$$E[\text{earnings} \mid \text{female} = 0] = \$63{,}147.73$$
$$E[\text{earnings} \mid \text{female} = 1] = \$44{,}224.13$$
A more meaningful way of looking at the difference is in percentages, using logs:
$$E[\ln(\text{earnings}) \mid \text{female} = 0] = 10.78524$$
$$E[\ln(\text{earnings}) \mid \text{female} = 1] = 10.49782$$
$$E[\ln(\text{earnings}) \mid \text{female} = 0] - E[\ln(\text{earnings}) \mid \text{female} = 1] = 10.78524 - 10.49782 = 0.28742$$
A difference in expected log earnings of about 0.29 implies an average difference of roughly 29% between the earnings of men and women, which is quite substantial.
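The percentage reading of a log difference is an approximation that works best for small gaps. As a rough illustration (not part of the original slides), the sketch below recomputes the log gap from the two conditional means reported above and converts it to a ratio. Note that exponentiating a difference in mean logs gives the ratio of geometric means, which is not the same as the ratio of the arithmetic means $63,147.73 and $44,224.13. The variable names are illustrative assumptions.

```python
import math

# Conditional means of log earnings reported in the example above.
log_mean_male = 10.78524    # E[ln(earnings) | female = 0]
log_mean_female = 10.49782  # E[ln(earnings) | female = 1]

# Difference in expected log earnings.
log_gap = log_mean_male - log_mean_female

# exp(log_gap) is the ratio of geometric mean earnings (men relative to women).
geometric_ratio = math.exp(log_gap)

print(round(log_gap, 5), round(geometric_ratio, 3))
```

The ratio is about 1.33, so geometric mean earnings of men are roughly 33% above those of women, somewhat more than the log-point approximation suggests.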
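Conditional variance
A conditional variance is the variance of the conditional distribution and is defined as
$$\mathrm{Var}[y \mid x] = \begin{cases} \sum_y \big(y - E[y \mid x]\big)^2 f(y \mid x) & \text{if } y \text{ is discrete} \\ \int_y \big(y - E[y \mid x]\big)^2 f(y \mid x)\,dy & \text{if } y \text{ is continuous} \end{cases}$$
The computation can be simplified using
$$\mathrm{Var}[y \mid x] = E[y^2 \mid x] - \big(E[y \mid x]\big)^2$$
The case where the conditional variance does not vary with $x$ is called homoskedasticity (same variance), but usually
$$\mathrm{Var}[y \mid x] \ne \mathrm{Var}[x]$$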
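To see the definition and the shortcut formula side by side, here is a small sketch for a discrete $y$. The conditional pmfs are hypothetical numbers chosen for illustration, not estimates from cps09mar; they are picked so that the conditional variance differs across values of $x$ (heteroskedasticity).

```python
from fractions import Fraction as F

y_values = [0, 1, 2]

# Hypothetical conditional pmfs f(y | x) for x = 0 and x = 1 (illustration only).
f_y_given_x = {
    0: [F(1, 2), F(1, 4), F(1, 4)],
    1: [F(1, 8), F(1, 8), F(3, 4)],
}

def cond_mean(x):
    """E[y | x] = sum over y of y * f(y | x)."""
    return sum(y * p for y, p in zip(y_values, f_y_given_x[x]))

def cond_var_definition(x):
    """Var[y | x] = sum over y of (y - E[y | x])^2 * f(y | x)."""
    mu = cond_mean(x)
    return sum((y - mu) ** 2 * p for y, p in zip(y_values, f_y_given_x[x]))

def cond_var_shortcut(x):
    """Var[y | x] = E[y^2 | x] - (E[y | x])^2."""
    second_moment = sum(y**2 * p for y, p in zip(y_values, f_y_given_x[x]))
    return second_moment - cond_mean(x) ** 2

for x in (0, 1):
    print(x, cond_var_definition(x), cond_var_shortcut(x))
```

The two functions agree for each value of $x$, and the variance changes with $x$ in this made-up example, so it is not homoskedastic.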
Law of Iterated Expectations (LIE):
$$E[y] = E_x\big[E[y \mid x]\big]$$
$E_x[\cdot]$ denotes the expectation over the values of $x$.
The law states that the expectation of the conditional expectation is the unconditional expectation.
In other words, the average of the conditional averages is the unconditional average.
When $x$ is discrete, like the female variable:
$$E[\ln(\text{earnings})] = E\big[E[\ln(\text{earnings}) \mid \text{female}]\big]$$
$$= E[\ln(\text{earnings}) \mid \text{female} = 0]\,\Pr(\text{female} = 0) + E[\ln(\text{earnings}) \mid \text{female} = 1]\,\Pr(\text{female} = 1)$$
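An example of the LIE
. sum l_earnings if female==0

    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
  l_earnings |     29,140    10.78524    .7285253          0   13.23763

. sum l_earnings if female==1

    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
  l_earnings |     21,602    10.49782    .6262008   1.386294   13.05363

. sum l_earnings female

    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
  l_earnings |     50,742    10.66288    .7013724          0   13.23763
      female |     50,742    .4257223    .4944569          0          1

When $x$ is discrete, like the female variable:
$$E[\ln(\text{earnings})] = E[\ln(\text{earnings}) \mid \text{female} = 0]\,\Pr(\text{female} = 0) + E[\ln(\text{earnings}) \mid \text{female} = 1]\,\Pr(\text{female} = 1)$$
or numerically
$$10.66288 = 10.78524\,(0.5742777) + 10.49782\,(0.4257223)$$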
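The numerical identity can be verified from the three Stata summaries. This is a minimal sketch using only the means and observation counts reported above; the variable names are illustrative assumptions.

```python
# Conditional means and group sizes from the Stata output above.
mean_male, n_male = 10.78524, 29140      # female == 0
mean_female, n_female = 10.49782, 21602  # female == 1

n = n_male + n_female
p_male = n_male / n      # Pr(female = 0)
p_female = n_female / n  # Pr(female = 1), the mean of the female dummy

# LIE: probability-weighted average of the conditional means.
lie_mean = mean_male * p_male + mean_female * p_female

print(round(p_female, 7), round(lie_mean, 5))
```

The weighted average reproduces the overall mean of l_earnings, 10.66288, up to the rounding of the reported conditional means.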
Another example of the LIE
Law of Iterated Expectations (LIE)
$$E[M] = E\big[E[M \mid A]\big]$$
Let’s show this with the above example:
$$E[M] = \sum_{i=0}^{4} M_i \cdot \Pr(M_i)$$
$$= 0(.8) + 1(.1) + 2(.06) + 3(.03) + 4(.01) = 0.35$$
$$E\big[E[M \mid A]\big] = E\Big[\sum_{i=0}^{4} M_i \times \Pr(M_i \mid A)\Big] = \sum_{k=0}^{1} \Big[\sum_{i=0}^{4} M_i \times \Pr(M_i \mid A_k)\Big] \times \Pr(A_k)$$
$$= \big[0(.7) + 1(.13) + 2(.1) + 3(.05) + 4(.02)\big] \times (.5) + \big[0(.9) + 1(.07) + 2(.02) + 3(.01) + 4(0)\big] \times (.5)$$
$$= .56(.5) + .14(.5) = .28 + .07 = .35$$
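The same calculation can be written as a short sketch. The probabilities are the ones used in the sums above; the variable names, and the zero-based ordering of the two values of $A$, are illustrative assumptions.

```python
m_values = [0, 1, 2, 3, 4]

# Conditional distributions Pr(M_i | A_k) for the two values of A, as in the sums above.
pr_m_given_a = [
    [0.70, 0.13, 0.10, 0.05, 0.02],
    [0.90, 0.07, 0.02, 0.01, 0.00],
]
pr_a = [0.5, 0.5]  # Pr(A_k)

# Inner step: conditional means E[M | A_k].
cond_means = [sum(m * p for m, p in zip(m_values, row)) for row in pr_m_given_a]

# Outer step: average the conditional means over the distribution of A.
iterated_mean = sum(cm * pa for cm, pa in zip(cond_means, pr_a))

# Direct route: build the marginal distribution of M, then take its mean.
marginal_m = [
    sum(pa * row[i] for row, pa in zip(pr_m_given_a, pr_a))
    for i in range(len(m_values))
]
direct_mean = sum(m * p for m, p in zip(m_values, marginal_m))

print(cond_means, iterated_mean, direct_mean)
```

Both routes give 0.35 up to floating-point rounding, and the implied marginal distribution matches the probabilities .8, .1, .06, .03, .01 used in the direct calculation of $E[M]$.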
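Proof of Law of Iterated Expectations
$$E\big[E[Y \mid X]\big] = E\Big[\sum_y y \cdot \Pr(Y = y \mid X)\Big]$$
$$= \sum_x \Big[\sum_y y \cdot \Pr(Y = y \mid X = x)\Big] \cdot \Pr(X = x)$$
$$= \sum_x \sum_y y \cdot \Pr(Y = y, X = x)$$
We can switch the order of the summation signs if the series is finite; then
$$= \sum_y \sum_x y \cdot \Pr(Y = y, X = x)$$
$$= \sum_y y \sum_x \Pr(Y = y, X = x)$$
$$= \sum_y y \cdot \Pr(Y = y) = E(Y)$$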
Proof of Law of Iterated Expectations
By the definition of expected value,
$$E\big[E[Y \mid X]\big] = \int_{-\infty}^{\infty} E[Y \mid X = x]\, f_X(x)\,dx$$
By the definition of conditional expectations,
$$= \int_x \Big[\int_y y\, f_{Y \mid X = x}(y)\,dy\Big] f_X(x)\,dx$$
$$= \int_x \int_y y\, f_{Y \mid X = x}(y)\, f_X(x)\,dy\,dx$$
$$= \int_x \int_y y\, f_{YX}(y, x)\,dy\,dx$$
where $f_{YX}$ is the joint probability density function
$$= \int_y y \Big[\int_x f_{YX}(y, x)\,dx\Big] dy$$
by marginalizing the joint pdf
$$= \int_y y\, f_Y(y)\,dy$$
by the definition of expected value
$$= E[Y]$$