Chris Hansman
Empirical Finance: Methods and Applications
Imperial College Business School
Topic 1: OLS and the Conditional Expectation Function
Consider a random variable yi and (a vector of) variables Xi. Which of the following is false?
(a) Xi′β^OLS provides the best predictor of yi out of any function of Xi
(b) Xi′β^OLS is the best linear approximation of E[yi|Xi]
(c) yi − E[yi|Xi] is uncorrelated with Xi
Part 1: The Conditional Expectation Function (CEF)
We are often interested in the relationship between some outcome yi and a variable (or set of variables) Xi
A useful summary is the conditional expectation function: E [yi |Xi ]
Gives the mean of yi when Xi takes any particular value
Formally, if fy(·|Xi) is the conditional p.d.f. of yi|Xi:
E[yi|Xi] = ∫ z fy(z|Xi) dz
E [yi |Xi ] is a random variable itself: a function of the random Xi
Can think of it as E[yi|Xi] = h(Xi)
Alternatively, evaluate it at particular values: for example Xi = 0.5
E [yi |Xi = 0.5] is just a number!
The Conditional Expectation Function: E[Y|X]
[Figure: height (inches, 30–80) plotted against age (0–40), with the CEF E[H|Age] marked at ages 5, 10, 15, 20, 25, 30, 35, and 40.]
Three Useful Properties of E[yi|Xi]
(i) The law of iterated expectations (LIE): E[E[yi|Xi]] = E[yi]
(ii) The CEF Decomposition Property:
Any random variable yi can be broken down into two pieces
yi = E[yi|Xi]+εi
Where the residual εi has the following properties:
(a) E[εi|Xi] = 0 (“mean independence”)
(b) εi is uncorrelated with any function of Xi
(iii) Out of any function of Xi, E[yi|Xi] is the best predictor of yi:
E[yi|Xi] = arg min over m(Xi) of E[(yi − m(Xi))²]
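These properties are easy to verify numerically. Below is a minimal Python sketch (the quadratic data-generating process and all numbers are made up for illustration) checking the LIE, mean independence, and the zero correlation of the CEF residual with functions of Xi:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical DGP: E[y|x] = x^2 by construction
x = rng.uniform(-2, 2, n)
y = x**2 + rng.normal(0, 1, n)

cef = x**2        # the CEF, known analytically here
eps = y - cef     # the CEF residual

# (i) Law of iterated expectations: E[E[y|x]] = E[y]
print(cef.mean(), y.mean())

# (ii a) Mean independence: E[eps|x] = 0 within bins of x
bins = np.digitize(x, np.linspace(-2, 2, 11))
print([round(eps[bins == b].mean(), 3) for b in range(1, 11)])

# (ii b) eps is uncorrelated with functions of x, e.g. x and x^3
print(np.corrcoef(eps, x)[0, 1], np.corrcoef(eps, x**3)[0, 1])
```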
Summary: Why We Care About Conditional Expectation Functions
Useful tool for describing relationship between yi and Xi
Several nice properties
Most statistical tests come down to comparing E[yi|Xi] at certain values of Xi
Classic example: experiments
Part 2: Ordinary Least Squares
Linear regression is arguably the most popular modeling approach across every field in the social sciences
Transparent, robust, relatively easy to understand
Provides a basis for more advanced empirical methods
Extremely useful when summarizing data
Plenty of focus on the technical aspects of OLS last term
Here, focus on an applied perspective
OLS Estimator Fits a Line Through the Data
[Figure: scatter plot of the data with the fitted OLS regression line running through the cloud of points.]
Choosing the (Population) Regression Line
yi =β0+β1xi+vi
An OLS regression simply chooses the β0^OLS, β1^OLS that make vi as “small” as possible on average
How do we define “small”?
Want to treat positive and negative errors the same: consider vi²
Choose β0^OLS, β1^OLS to minimize:
E[vi²] = E[(yi − β0 − β1xi)²]
Regression and The CEF
Given yi and Xi, the population regression coefficient is:
β^OLS = E[XiXi′]−1 E[Xiyi]
A useful time to note: you should remember the OLS estimator:
β̂^OLS = (X′X)−1X′Y
With just one xi:
β̂1^OLS = Cov(xi, yi) / Var(xi)
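As a quick check, the following sketch (hypothetical simulated data) confirms that the matrix formula and the covariance/variance formula give the same slope in the one-regressor case:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(0, 1, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)   # hypothetical DGP

# Matrix form: beta_hat = (X'X)^{-1} X'Y, with a constant in X
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                            # approx [2.0, 0.5]

# Covariance form for the slope: Cov(x, y) / Var(x)
print(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))  # matches beta_hat[1]
```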
Regression and the Conditional Expectation Function
Why is linear regression so popular?
Simplest way to estimate (or approximate) conditional expectations!
Three simple results
OLS perfectly captures CEF if CEF is Linear
OLS generates best linear approximation to the CEF if not
OLS perfectly captures CEF with binary (dummy) regressors
Regression captures CEF if CEF is Linear
Take the special case of a linear conditional expectation function:
E[yi|Xi] = Xi′β
Then OLS captures E [yi |Xi ]
[Figure: simulated data with a linear CEF; the OLS regression line coincides with E[yi|Xi].]
OLS Provides an Approximation to the CEF
[Figure: simulated data with a nonlinear CEF; the OLS regression line is the best linear approximation to E[yi|Xi].]
Implementing Regressions with Categorical Variables
What if we are interested in comparing all 11 GICS sectors?
Create dummy variables for each sector, omitting one
Let's call them D1i, ··· , D10i
pricei = β0 + δ1D1i + ··· + δ10D10i + vi
Regress pricei on a constant and those 10 dummy variables
In other words, Xi = [1 D1i ··· D10i]′, or
X = [ 1 0 ··· 1 0
      1 1 ··· 0 0
      1 0 ··· 0 0
      1 0 ··· 1 0
      1 0 ··· 0 1
      ⋮
      1 1 ··· 0 0 ]
Average Share Price by Sector for Some S&P Stocks
[Figure: bar chart of average share price (USD, 0–100) across the 11 sectors (Cons. Discret., Cons. Staples, Energy, Financials, Health Care, Industrials, IT, Materials, Real Estate, Telecom, Utilities), with δ2^OLS and δ3^OLS marked as gaps relative to the omitted category.]
Implementing Regressions with Dummy Variables
β̂0^OLS (the coefficient on the constant) is the mean for the omitted category:
In this case “Consumer Discretionary”
The coefficient on each dummy variable (e.g. δ̂1^OLS) is the difference between the conditional mean for that category and β̂0^OLS
Key point: if you are only interested in categorical variables...
You can perfectly capture the full CEF in a single regression
For example:
E[pricei | sectori = consumer staples] = β0^OLS + δ1^OLS
E[pricei | sectori = energy] = β0^OLS + δ2^OLS
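A small sketch of this point, using a hypothetical three-category example (made-up sector labels and prices; statsmodels' C() builds the dummies and omits one category):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
sector = rng.choice(["cons_disc", "staples", "energy"], size=300)
price = 50 + 10 * (sector == "staples") + 25 * (sector == "energy") \
        + rng.normal(0, 5, 300)
df = pd.DataFrame({"price": price, "sector": sector})

# C(sector) creates the dummies, omitting the first category alphabetically
fit = smf.ols("price ~ C(sector)", data=df).fit()
print(fit.params)                              # intercept = mean of omitted group
print(df.groupby("sector")["price"].mean())    # intercept + dummy coefs match these
```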
Topic 2: Causality and Regression
Suppose wages (yi ) are determined by:
yi =β0+β1xi+γai+ei
and we see years of schooling (xi) but not ability (ai)
Corr(xi, ai) > 0 and Corr(yi, ai) > 0
We estimate:
And recover
yi =β0+β1xi+vi
β1^OLS = β1 + γδ1^OLS
where δ1^OLS is the coefficient from a regression of ai on xi
Is our estimated β1^OLS larger or smaller than β1?
Part 1: The Potential Outcomes Framework
Ideally, how would we find the impact of candy on evaluations (yi)?
Imagine we had access to two parallel universes and could observe:
The exact same student (i)
At the exact same time
In one universe they receive candy—in the other they do not
And suppose we could see the student’s evaluations in both worlds
Define the variables we would like to see for each individual i:
yi1 = evaluation with candy
yi0 = evaluation without candy
The Potential Outcomes Framework
If we could see both yi1 and yi0, the impact would be easy to find:
The causal effect or treatment effect for individual i is defined as yi1 − yi0
Would answer our question—but we never see both yi1 and yi0!
Some people call this the “fundamental problem of causal inference”
Intuition: there are two “potential” worlds out there
The treatment variable Di decides which one we see:
yi = yi1 if Di = 1
yi = yi0 if Di = 0
So What Do Differences in Conditional Means Tell You?
E[yi | Di = 1] − E[yi | Di = 0] = E[yi1 | Di = 1] − E[yi0 | Di = 0]
= E[yi1 | Di = 1] − E[yi0 | Di = 1]    (Average Treatment Effect for the Treated Group)
+ E[yi0 | Di = 1] − E[yi0 | Di = 0]    (Selection Effect)
≠ E[yi1] − E[yi0]    (Average Treatment Effect)
So our estimate could be different from the average effect of treatment E[yi1]−E[yi0] for two reasons:
(1) The morning section might have given better reviews anyway:
E[yi0 | Di = 1] − E[yi0 | Di = 0] > 0    (Selection Effect)
(2) Candy matters more in the morning:
E[yi1 | Di = 1] − E[yi0 | Di = 1] ≠ E[yi1] − E[yi0]
(Average Treatment Effect for the Treated Group ≠ Average Treatment Effect)
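A simulation can make the decomposition concrete. In this hypothetical DGP, treatment is more likely when yi0 is high, so the naive difference in conditional means equals the ATT plus a positive selection effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

y0 = rng.normal(70, 10, n)              # evaluation without candy
y1 = y0 + 2                             # candy adds 2 points for everyone
d = y0 + rng.normal(0, 5, n) > 72       # treatment more likely when y0 is high
y = np.where(d, y1, y0)                 # the outcome we actually observe

naive = y[d].mean() - y[~d].mean()      # difference in conditional means
att = (y1 - y0)[d].mean()               # average treatment effect on the treated
selection = y0[d].mean() - y0[~d].mean()

print(naive, att + selection)           # identical, by the decomposition
print(att, (y1 - y0).mean())            # ATT = ATE = 2 here, yet naive >> 2
```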
Part 2: Causality and Regression
yi =β0+β1xi+vi
The regression coefficient captures the causal effect (β1^OLS = β1) if:
E[vi | xi] = E[vi]
Fails anytime Corr(xi, vi) ≠ 0
An aside: we have used similar notation for 3 different things:
1. β1: the causal effect on yi of a 1 unit change in xi
2. β1^OLS = Cov(xi, yi)/Var(xi): the population regression coefficient
3. β̂1^OLS: its sample analog, the sample regression coefficient (sample covariance over sample variance)
Omitted Variables Bias
So if we have:
yi = β0 + β1xi + vi
What will the regression of yi on xi give us?
Recall that the regression coefficient is β1^OLS = Cov(yi, xi)/Var(xi):
β1^OLS = Cov(yi, xi)/Var(xi) = β1 + Cov(vi, xi)/Var(xi)
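A sketch of this formula on simulated schooling/ability data (all coefficients hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
ability = rng.normal(0, 1, n)
school = 12 + 2 * ability + rng.normal(0, 1, n)               # Corr(x, a) > 0
wage = 1.0 + 0.10 * school + 0.30 * ability + rng.normal(0, 1, n)

# Regressing wage on schooling alone recovers beta1 + Cov(v, x)/Var(x)
beta1_hat = np.cov(school, wage, ddof=1)[0, 1] / np.var(school, ddof=1)
v = wage - 1.0 - 0.10 * school          # the omitted part: 0.3*ability + noise
bias = np.cov(v, school, ddof=1)[0, 1] / np.var(school, ddof=1)
print(beta1_hat, 0.10 + bias)           # equal; bias is upward here
```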
Part 3: Instrumental Variables
Suppose we have the following specification:
yi = β0 + β1xi + vi
And zi is a potential instrumental variable
Which of these is not a necessary assumption for zi to be a valid IV?
(a) Cov[zi, xi] ≠ 0
(b) Cov[zi, yi] = 0
(c) Cov[zi, vi] = 0
Instrumental Variables: Suppose Corr(vi, xi) ≠ 0
yi =β0+β1xi+vi
Our informal assumption: zi should change xi , but have absolutely
no other impact on yi . Formally:
1. Cov[zi, xi] ≠ 0 (Instrument Relevance)
Intuition: zi must change xi
2. Cov[zi, vi] = 0 (Exclusion Restriction)
Intuition: zi has absolutely no other impact on yi
Recall that vi is everything else outside of xi that influences yi
A few different ways of getting β^IV in practice
(1) If we run two regressions:
1. “First stage” impact of zi on Xi
Xi = α1 + φ zi + ηi
2. “Reduced form” impact of zi on Yi:
Yi = α2 + ρzi + ui
Then we can write:
β^IV = [Cov(Yi, zi)/Var(zi)] / [Cov(Xi, zi)/Var(zi)] = ρ^OLS / φ^OLS
The numerator is the impact of zi on Yi (reduced form); the denominator is the impact of zi on Xi (first stage)
We can estimate these in a sample with two regressions, or simply by calculating two sample covariances
A few different ways of getting βIV in practice (2)
A more common way of estimating βIV (two stage least squares):
1. Estimate φOLS in first stage:
Xi = α1 + φ zi + ηi
2. Predict the part of Xi explained by zi:
X̂i = α1^OLS + φ^OLS zi
3. Regress Yi on predicted X̂i in a second stage:
Yi = α2 + βX̂i + ui
Note that:
β^2ndStage = Cov(Yi, X̂i)/Var(X̂i) = β^IV
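Both routes can be sketched in a few lines. Here zi is a hypothetical instrument that is exogenous by construction, and the true β1 is 2:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
z = rng.normal(0, 1, n)                        # instrument
v = rng.normal(0, 1, n)                        # unobserved confounder
x = 0.8 * z + 0.5 * v + rng.normal(0, 1, n)    # endogenous regressor
y = 1.0 + 2.0 * x + v + rng.normal(0, 1, n)    # true beta1 = 2

# Route 1: ratio of reduced form to first stage
rho = np.cov(y, z, ddof=1)[0, 1] / np.var(z, ddof=1)   # reduced form
phi = np.cov(x, z, ddof=1)[0, 1] / np.var(z, ddof=1)   # first stage
print(rho / phi)                                        # approx 2.0

# Route 2: two-stage least squares
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)           # fitted first stage
print(np.cov(y, x_hat, ddof=1)[0, 1] / np.var(x_hat, ddof=1))  # approx 2.0

# For comparison, OLS is biased upward here because Cov(x, v) > 0
print(np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1))
```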
Topic 3: Panel Data and Diff-in-Diff
The average coursework grade in the Morning class is 68
The average coursework grade in the Afternoon class is 75
Suppose we run the following regression:
Courseworki = β0 + β1Afternooni + vi
What is the value of β0?
(a) 68
(b) 75
(c) 7
Part 1: Panel data
Panel data consists of observations of the same n units in T different periods
If the data contains variables x and y, we write them (xit, yit)
for i = 1, ··· , N
i denotes the unit, e.g. Microsoft or Apple
and t = 1, ··· , T
t denotes the time period, e.g. September or October
Panel Data and Omitted Variables
Let's reconsider our omitted variables problem:
yit = β0 + β1xit + γai + eit
Suppose we see xit and yit but not ai
Suppose Corr(xit, eit) = 0 but Corr(ai, xit) ≠ 0
Note that we are assuming ai doesn’t depend on t
First Difference Regression
yit = β0 + β1xit + vit, where vit = γai + eit
Suppose we see two time periods t ∈ {1, 2} for each i
We can write our two time periods as:
yi,1 = β0 +β1xi,1 +γai +ei,1
yi,2 = β0 +β1xi,2 +γai +ei,2
Taking changes (differences) gets rid of fixed omitted variables
∆yi,2−1 = β1∆xi,2−1 +∆ei,2−1
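A two-period sketch (hypothetical DGP with a fixed unobservable ai) showing that differencing removes the omitted-variable bias while pooled OLS does not:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
a = rng.normal(0, 1, n)                              # fixed unobservable a_i
x1 = a + rng.normal(0, 1, n)                         # Corr(x, a) > 0
x2 = a + rng.normal(0, 1, n)
y1 = 1.0 + 0.5 * x1 + 2.0 * a + rng.normal(0, 1, n)
y2 = 1.0 + 0.5 * x2 + 2.0 * a + rng.normal(0, 1, n)

# Differencing removes a_i: delta_y = beta1 * delta_x + delta_e
dy, dx = y2 - y1, x2 - x1
print(np.cov(dy, dx, ddof=1)[0, 1] / np.var(dx, ddof=1))   # approx 0.5

# Pooled OLS in levels is badly biased by the omitted a_i
x, y = np.concatenate([x1, x2]), np.concatenate([y1, y2])
print(np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1))      # well above 0.5
```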
Fixed Effects Regression
yit =β0+β1xit+γai+eit
An alternative approach:
Let's define δi = γai and rewrite:
yit = β0 + β1xit + δi + eit
So yit is determined by:
(i) The baseline intercept β0
(ii) The effect of xit
(iii) An individual-specific change in the intercept: δi
Intuition behind fixed effects: let's just estimate δi
Fixed Effects Regression: Implementation
yit = β0 + β1xit + ∑_{j=1}^{N−1} δjDji + eit
Note that we've left out DNi
β0^OLS is interpreted as the intercept for individual N:
β0^OLS = E[yit | xit = 0, i = N]
and for all other i (e.g. i = 1):
δ1 = E[yit | xit = 0, i = 1] − β0
This should look familiar
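It should: the dummy-variable (LSDV) regression is exactly the categorical-variable regression from Topic 1, with one dummy per unit. A sketch with made-up panel data, also confirming the equivalent "within" (demeaning) estimator:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
N, T = 200, 5
a = np.repeat(rng.normal(0, 1, N), T)          # unit fixed effect, constant over t
unit = np.repeat(np.arange(N), T)
x = a + rng.normal(0, 1, N * T)                # x correlated with a
y = 1.0 + 0.5 * x + 2.0 * a + rng.normal(0, 1, N * T)
df = pd.DataFrame({"y": y, "x": x, "unit": unit})

# LSDV: C(unit) adds N-1 unit dummies alongside the constant
lsdv = smf.ols("y ~ x + C(unit)", data=df).fit()
print(lsdv.params["x"])                        # approx 0.5

# Equivalent "within" estimator: demean y and x by unit
yd = df.y - df.groupby("unit").y.transform("mean")
xd = df.x - df.groupby("unit").x.transform("mean")
print((yd * xd).sum() / (xd ** 2).sum())       # identical slope
```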
Part 2: General Form of Diff-in-Diff
We are interested in the impact of some treatment on outcome Yi
Suppose we have a treated group and a control group
Let Di be a dummy equal to 1 if i belongs to the treatment group
And suppose we see both groups before and after the treatment occurs
Let Aftert be a dummy equal to 1 if time t is after the treatment date
Yit = β0 + β1Di × Aftert + β2Di + β3Aftert + vit
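A minimal sketch of this regression on simulated data (the group gap, the common time trend, and the treatment effect of 3 are all made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 4000
D = rng.integers(0, 2, n)          # treatment group indicator
After = rng.integers(0, 2, n)      # post-treatment period indicator

# Parallel trends hold by construction: common time trend (+2),
# a level gap for the treated (+5), and a treatment effect of +3
y = 10 + 5 * D + 2 * After + 3 * D * After + rng.normal(0, 1, n)
df = pd.DataFrame({"y": y, "D": D, "After": After})

fit = smf.ols("y ~ D * After", data=df).fit()  # expands to D + After + D:After
print(fit.params["D:After"])                   # approx 3: the diff-in-diff estimate
```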
Diff-in-Diff Graphically
[Figure: average leverage over time for the treatment group (Delaware) and the control group (Non-Delaware), before and after the reform date.]
When Does Diff-in-Diff Identify a Causal Effect?
As usual, we need
E[vit|Di,Aftert] = E[vit]
What does this mean intuitively?
Parallel trends assumption: In the absence of any reform the
average change in leverage would have been the same in the treatment and control groups
In other words: trends in both groups are similar
Parallel Trends
Parallel trends does not require that there is no trend in leverage
Just that the trend is the same between groups
Does not require that the levels be the same in the two groups
What does it look like when the parallel trends assumption fails?
When Parallel Trends Fails
[Figure: treatment (Delaware) and control (Non-Delaware) series with different trends even before the treatment date.]
Topic 4: Regularization
1. Basics of Ridge, LASSO and Elastic Net
2. How to choose hyperparameter λ: cross-validation
We Are Given 100 Observations of yi
[Figure: scatter of the outcome yi against observation number (1–100).]
How Well Can We Predict Out-of-Sample Outcomes yi^oos?
[Figure: out-of-sample outcomes and predictions against observation number (1–100); vertical axis: outcome and prediction.]
A Good Model Has Small Distance (yi^oos − ŷi^oos)²
[Figure: out-of-sample outcomes plotted alongside the model's predictions for the 100 observations.]
Solution to the OLS Problem: Regularization
With 100 observations, OLS didn't do very well
Solution: regularization
LASSO / Ridge / Elastic Net
Simplest version of elastic net (nests LASSO and Ridge):
β̂^elastic = argmin_β (1/N) ∑_{i=1}^N (yi − β0 − β1x1i − ··· − βKxKi)² + λ[α ∑_{k=1}^K |βk| + (1 − α) ∑_{k=1}^K βk²]
For α = 1 is this LASSO or Ridge?
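With α = 1 only the absolute-value penalty survives, i.e. LASSO. A sketch using scikit-learn (note its parameterization differs slightly: `alpha` plays the role of λ up to a scaling convention, and `l1_ratio` plays the role of α):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(9)
n, K = 100, 50
X = rng.normal(0, 1, (n, K))
beta = np.zeros(K)
beta[:5] = [3, -2, 1.5, 1, -1]          # only 5 of the 50 true coefs are nonzero
y = X @ beta + rng.normal(0, 1, n)

# l1_ratio = 1 -> LASSO, l1_ratio = 0 -> ridge
enet = ElasticNet(alpha=0.2, l1_ratio=0.5).fit(X, y)
lasso = Lasso(alpha=0.2).fit(X, y)      # pure L1 penalty

print((lasso.coef_ != 0).sum())         # LASSO sets many coefficients exactly to 0
print((enet.coef_ != 0).sum())
```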
LASSO Coefficients With 100 Observations
[Figures: estimated LASSO coefficients for the 50 X variables at λ = 0.2, λ = 1, and λ = 3; more coefficients are set exactly to zero as λ increases.]
LASSO Coefficients For All λ
[Figure: coefficient paths against log λ; the number of nonzero coefficients falls from 49 toward 3 as λ grows.]
How to Choose λ: k-fold Cross-Validation
Partition the sample into k equal folds
The default for R is k = 10
For our sample, this means 10 folds with 10 observations each
Cross-validation proceeds in several steps:
1. Choose k−1 folds (9 folds in our example, with 10 observations each)
2. Run LASSO on these 90 observations
3. Find β̂^lasso(λ) for all 100 values of λ
4. Compute MSE(λ) for all λ using the remaining fold (10 observations)
5. Repeat for all 10 possible combinations of k−1 folds
This provides 10 estimates of MSE(λ) for each λ
Can construct means and standard deviations of MSE(λ) for each λ
Choose the λ that gives a small mean MSE(λ) (see the sketch below)
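The slides reference R's defaults; an analogous sketch in Python's scikit-learn (simulated data, 100 observations, 50 regressors, 100 candidate λ values):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(10)
n, K = 100, 50
X = rng.normal(0, 1, (n, K))
beta = np.zeros(K)
beta[:5] = [3, -2, 1.5, 1, -1]
y = X @ beta + rng.normal(0, 1, n)

# 10-fold CV over a path of 100 lambda values (sklearn calls lambda "alpha")
cv = LassoCV(cv=10, n_alphas=100).fit(X, y)
print(cv.alpha_)               # the lambda with the smallest mean CV MSE
print(cv.mse_path_.shape)      # (100 lambdas, 10 folds) of MSE(lambda)
```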
Cross-Validated Mean Squared Errors for All λ
[Figure: mean squared error (30–90) against log λ (−5 to 1); the number of nonzero coefficients along the top falls from 49 to 1 as λ increases.]
Topic 5: Observed Factor Models
Suppose xt is a vector of asset returns, and B is a matrix of factor loadings
Which has higher dimension:
(a) B
(b) Σx = Cov(xt)
Observed Factor Models
1. General Framing of Linear Factor Models
2. Single Index Model and the CAPM
3. Multi-Factor Models
   Fama-French
   Macroeconomic Factors
4. Barra approach
Linear Factor Models
Assume that returns xi,t are driven by K common factors:
xi,t = αi + β1,i f1,t + β2,i f2,t + ··· + βK,i fK,t + εi,t
ft = (f1,t,f2,t,··· ,fK,t)′ is the set of common factors
These are the same for all assets (constant over i)
But change over time (different for t , t + 1)
Each ft has dimension (K × 1)
βi = (β1,i,β2,i,··· ,βK,i)′ is the set of factor loadings
K different parameters for each asset
But constant over time (same for all t)
Fixed, specific relationship between asset i and factor k
Linear Factor Model
xt = α + Bft + εt
Summary of Parameters
α : (m × 1) intercepts for m assets
B: (m × K) loadings (βik) on K factors for m assets
μf : (K × 1) vector of means for K factors
Ωf : (K × K ) variance covariance matrix of factors
Ψ: (m × m) diagonal matrix of asset specific variances
Given our assumptions, xt is m-variate covariance stationary with:
E[xt | ft] = α + Bft
Cov[xt | ft] = Ψ
E[xt] = μx = α + Bμf
Cov[xt] = Σx = BΩfB′ + Ψ
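These moment formulas can be verified by simulation. A sketch with made-up dimensions (m = 10 assets, K = 3 factors) comparing sample moments to the implied ones:

```python
import numpy as np

rng = np.random.default_rng(11)
m, K, T = 10, 3, 5000

alpha = rng.normal(0, 0.01, m)                 # (m x 1) intercepts
B = rng.normal(0, 1, (m, K))                   # (m x K) factor loadings
mu_f = rng.normal(0, 0.01, K)                  # (K x 1) factor means
Omega_f = np.diag(rng.uniform(0.01, 0.05, K))  # (K x K) factor covariance
Psi = np.diag(rng.uniform(0.01, 0.03, m))      # (m x m) idiosyncratic variances

f = rng.multivariate_normal(mu_f, Omega_f, T)          # T x K factor draws
eps = rng.multivariate_normal(np.zeros(m), Psi, T)     # T x m idiosyncratic shocks
x = alpha + f @ B.T + eps                              # T x m asset returns

# Sample moments should match the implied ones up to sampling error
print(np.abs(x.mean(axis=0) - (alpha + B @ mu_f)).max())
print(np.abs(np.cov(x.T) - (B @ Omega_f @ B.T + Psi)).max())
```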
The Index Model: First Pass
xi =αi1T +Rmβi +εi
Estimate OLS regression on time-series version of our factor specification
One regression for each asset i
Recover two parameters α̂i and β̂i for each asset i
Ω̂f is just the sample variance of the market return
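A sketch of the first pass with a hypothetical market factor and four assets (true betas made up):

```python
import numpy as np

rng = np.random.default_rng(12)
T, m = 1000, 4
r_m = rng.normal(0.005, 0.04, T)                      # market factor returns
beta_true = np.array([0.5, 1.0, 1.3, 0.8])
x = 0.001 + np.outer(r_m, beta_true) + rng.normal(0, 0.02, (T, m))

# One time-series OLS per asset: x_i = alpha_i * 1_T + R_m * beta_i + eps_i
Rm = np.column_stack([np.ones(T), r_m])
coefs = np.linalg.solve(Rm.T @ Rm, Rm.T @ x)          # 2 x m: rows (alpha_i, beta_i)
alpha_hat, beta_hat = coefs[0], coefs[1]
print(beta_hat)                                       # approx beta_true

omega_f_hat = np.var(r_m, ddof=1)                     # sample variance of the factor
```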