Chris Hansman
Empirical Finance: Methods and Applications
Imperial College Business School
Topic 1: OLS and the Conditional Expectation Function
Consider a random variable yi and (a vector of) variables Xi. Which of the following is false?
(a) Xi′β^OLS provides the best predictor of yi out of any function of Xi
(b) Xi′β^OLS is the best linear approximation of E[yi|Xi]
(c) yi − E[yi|Xi] is uncorrelated with Xi
Part 1: The Conditional Expectation Function (CEF)
We are often interested in the relationship between some outcome yi and a variable (or set of variables) Xi
A useful summary is the conditional expectation function: E [yi |Xi ]
Gives the mean of yi when Xi takes any particular value
Formally, if fy(·|Xi) is the conditional p.d.f. of yi|Xi:
E[yi|Xi] = ∫ z fy(z|Xi) dz
E [yi |Xi ] is a random variable itself: a function of the random Xi
Can think of it as E[yi|Xi] = h(Xi)
Alternatively, evaluate it at particular values: for example Xi = 0.5
E [yi |Xi = 0.5] is just a number!
The Conditional Expectation Function: E[Y|X]
[Figure: height (inches, 30–80) plotted against age (0–40), with the CEF E[H|Age] marked at ages 5, 10, 15, 20, 25, 30, 35, and 40.]
Three Useful Properties of E[yi|Xi]
(i) The law of iterated expectations (LIE): E[E[yi|Xi]] = E[yi]
(ii) The CEF Decomposition Property:
Any random variable yi can be broken down into two pieces
yi = E[yi|Xi]+εi
Where the residual εi has the following properties:
(a) E[εi|Xi] = 0 (“mean independence”)
(b) εi is uncorrelated with any function of Xi
(iii) Out of any function of Xi, E[yi|Xi] is the best predictor of yi:
E[yi|Xi] = arg min over m(Xi) of E[(yi − m(Xi))²]
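These properties are easy to verify numerically. Below is a minimal Python sketch (the quadratic data-generating process and all numbers are made up for illustration) checking the LIE, mean independence, and the zero correlation of the CEF residual with functions of Xi:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical DGP: E[y|x] = x^2 by construction
x = rng.uniform(-2, 2, n)
y = x**2 + rng.normal(0, 1, n)

cef = x**2        # the CEF, known analytically here
eps = y - cef     # the CEF residual

# (i) Law of iterated expectations: E[E[y|x]] = E[y]
print(cef.mean(), y.mean())

# (ii a) Mean independence: E[eps|x] = 0 within bins of x
bins = np.digitize(x, np.linspace(-2, 2, 11))
print([round(eps[bins == b].mean(), 3) for b in range(1, 11)])

# (ii b) eps is uncorrelated with functions of x, e.g. x and x^3
print(np.corrcoef(eps, x)[0, 1], np.corrcoef(eps, x**3)[0, 1])
```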
Summary: Why We Care About Conditional Expectation Functions
Useful tool for describing relationship between yi and Xi
Several nice properties
Most statistical tests come down to comparing E[yi|Xi] at certain values of Xi
Classic example: experiments
Part 2: Ordinary Least Squares
Linear regression is arguably the most popular modeling approach across every field in the social sciences
Transparent, robust, relatively easy to understand
Provides a basis for more advanced empirical methods
Extremely useful when summarizing data
Plenty of focus on the technical aspects of OLS last term
Here, focus on an applied perspective
OLS Estimator Fits a Line Through the Data
[Figure: scatter plot of the data with the fitted OLS regression line running through the cloud of points.]
Choosing the (Population) Regression Line
yi =β0+β1xi+vi
An OLS regression simply chooses the β0^OLS, β1^OLS that make vi as “small” as possible on average
How do we define “small”?
Want to treat positive and negative errors the same: consider vi²
Choose β0^OLS, β1^OLS to minimize:
E[vi²] = E[(yi − β0 − β1xi)²]
Regression and The CEF
Given yi and Xi, the population regression coefficient is:
β^OLS = E[XiXi′]−1 E[Xiyi]
A useful time to note: you should remember the OLS estimator:
β̂^OLS = (X′X)−1X′Y
With just one xi:
β̂1^OLS = Cov(xi, yi) / Var(xi)
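As a quick check, the following sketch (hypothetical simulated data) confirms that the matrix formula and the covariance/variance formula give the same slope in the one-regressor case:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(0, 1, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)   # hypothetical DGP

# Matrix form: beta_hat = (X'X)^{-1} X'Y, with a constant in X
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                            # approx [2.0, 0.5]

# Covariance form for the slope: Cov(x, y) / Var(x)
print(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))  # matches beta_hat[1]
```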
Regression and the Conditional Expectation Function
Why is linear regression so popular?
Simplest way to estimate (or approximate) conditional expectations!
Three simple results
OLS perfectly captures CEF if CEF is Linear
OLS generates best linear approximation to the CEF if not
OLS perfectly captures CEF with binary (dummy) regressors
Regression captures CEF if CEF is Linear
Take the special case of a linear conditional expectation function:
E[yi|Xi] = Xi′β
Then OLS captures E [yi |Xi ]
[Figure: simulated data with a linear CEF; the OLS regression line coincides with E[yi|Xi].]
OLS Provides an Approximation to the CEF
[Figure: simulated data with a nonlinear CEF; the OLS regression line is the best linear approximation to E[yi|Xi].]
Implementing Regressions with Categorical Variables
What if we are interested in comparing all 11 GICS sectors?
Create dummy variables for each sector, omitting one
Let's call them D1i, ··· , D10i
pricei = β0 + δ1D1i + ··· + δ10D10i + vi
Regress pricei on a constant and those 10 dummy variables
In other words, Xi = [1 D1i ··· D10i]′, or
X = [ 1 0 ··· 1 0
      1 1 ··· 0 0
      1 0 ··· 0 0
      1 0 ··· 1 0
      1 0 ··· 0 1
      ⋮
      1 1 ··· 0 0 ]
Average Share Price by Sector for Some S&P Stocks
[Figure: bar chart of average share price (USD, 0–100) across the 11 sectors (Cons. Discret., Cons. Staples, Energy, Financials, Health Care, Industrials, IT, Materials, Real Estate, Telecom, Utilities), with δ2^OLS and δ3^OLS marked as gaps relative to the omitted category.]
Implementing Regressions with Dummy Variables
β̂0^OLS (the coefficient on the constant) is the mean for the omitted category:
In this case “Consumer Discretionary”
The coefficient on each dummy variable (e.g. δ̂1^OLS) is the difference between the conditional mean for that category and β̂0^OLS
Key point: if you are only interested in categorical variables...
You can perfectly capture the full CEF in a single regression
For example:
E[pricei | sectori = consumer staples] = β0^OLS + δ1^OLS
E[pricei | sectori = energy] = β0^OLS + δ2^OLS
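A small sketch of this point, using a hypothetical three-category example (made-up sector labels and prices; statsmodels' C() builds the dummies and omits one category):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
sector = rng.choice(["cons_disc", "staples", "energy"], size=300)
price = 50 + 10 * (sector == "staples") + 25 * (sector == "energy") \
        + rng.normal(0, 5, 300)
df = pd.DataFrame({"price": price, "sector": sector})

# C(sector) creates the dummies, omitting the first category alphabetically
fit = smf.ols("price ~ C(sector)", data=df).fit()
print(fit.params)                              # intercept = mean of omitted group
print(df.groupby("sector")["price"].mean())    # intercept + dummy coefs match these
```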
Topic 2: Causality and Regression
Suppose wages (yi ) are determined by:
yi =β0+β1xi+γai+ei
and we see years of schooling (xi) but not ability (ai)
Corr(xi, ai) > 0 and Corr(yi, ai) > 0
We estimate:
And recover
yi =β0+β1xi+vi
β1^OLS = β1 + γδ1^OLS
where δ1^OLS is the coefficient from a regression of ai on xi
Is our estimated β1^OLS larger or smaller than β1?
Part 1: The Potential Outcomes Framework
Ideally, how would we find the impact of candy on evaluations (yi)?
Imagine we had access to two parallel universes and could observe:
The exact same student (i)
At the exact same time
In one universe they receive candy—in the other they do not
And suppose we could see the student’s evaluations in both worlds
Define the variables we would like to see for each individual i:
yi1 = evaluation with candy
yi0 = evaluation without candy
The Potential Outcomes Framework
If we could see both yi1 and yi0, the impact would be easy to find:
The causal effect or treatment effect for individual i is defined as yi1 − yi0
Would answer our question—but we never see both yi1 and yi0!
Some people call this the “fundamental problem of causal inference”
Intuition: there are two “potential” worlds out there
The treatment variable Di decides which one we see:
yi = yi1 if Di = 1
yi = yi0 if Di = 0
So What Do Differences in Conditional Means Tell You?
E[yi | Di = 1] − E[yi | Di = 0] = E[yi1 | Di = 1] − E[yi0 | Di = 0]
= E[yi1 | Di = 1] − E[yi0 | Di = 1]    (Average Treatment Effect for the Treated Group)
+ E[yi0 | Di = 1] − E[yi0 | Di = 0]    (Selection Effect)
≠ E[yi1] − E[yi0]    (Average Treatment Effect)
So our estimate could be different from the average effect of treatment E[yi1]−E[yi0] for two reasons:
(1) The morning section might have given better reviews anyway:
E[yi0 | Di = 1] − E[yi0 | Di = 0] > 0    (Selection Effect)
(2) Candy matters more in the morning:
E[yi1 | Di = 1] − E[yi0 | Di = 1] ≠ E[yi1] − E[yi0]
(Average Treatment Effect for the Treated Group ≠ Average Treatment Effect)
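A simulation can make the decomposition concrete. In this hypothetical DGP, treatment is more likely when yi0 is high, so the naive difference in conditional means equals the ATT plus a positive selection effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

y0 = rng.normal(70, 10, n)              # evaluation without candy
y1 = y0 + 2                             # candy adds 2 points for everyone
d = y0 + rng.normal(0, 5, n) > 72       # treatment more likely when y0 is high
y = np.where(d, y1, y0)                 # the outcome we actually observe

naive = y[d].mean() - y[~d].mean()      # difference in conditional means
att = (y1 - y0)[d].mean()               # average treatment effect on the treated
selection = y0[d].mean() - y0[~d].mean()

print(naive, att + selection)           # identical, by the decomposition
print(att, (y1 - y0).mean())            # ATT = ATE = 2 here, yet naive >> 2
```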
Part 2: Causality and Regression
yi =β0+β1xi+vi
The regression coefficient captures the causal effect (β1^OLS = β1) if:
E[vi | xi] = E[vi]
Fails anytime Corr(xi, vi) ≠ 0
An aside: we have used similar notation for 3 different things:
1. β1: the causal effect on yi of a 1 unit change in xi
2. β1^OLS = Cov(xi, yi)/Var(xi): the population regression coefficient
3. β̂1^OLS: its sample analog, the sample regression coefficient (sample covariance over sample variance)
Omitted Variables Bias
So if we have:
yi = β0 + β1xi + vi
What will the regression of yi on xi give us?
Recall that the regression coefficient is β1^OLS = Cov(yi, xi)/Var(xi):
β1^OLS = Cov(yi, xi)/Var(xi) = β1 + Cov(vi, xi)/Var(xi)
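A sketch of this formula on simulated schooling/ability data (all coefficients hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
ability = rng.normal(0, 1, n)
school = 12 + 2 * ability + rng.normal(0, 1, n)               # Corr(x, a) > 0
wage = 1.0 + 0.10 * school + 0.30 * ability + rng.normal(0, 1, n)

# Regressing wage on schooling alone recovers beta1 + Cov(v, x)/Var(x)
beta1_hat = np.cov(school, wage, ddof=1)[0, 1] / np.var(school, ddof=1)
v = wage - 1.0 - 0.10 * school          # the omitted part: 0.3*ability + noise
bias = np.cov(v, school, ddof=1)[0, 1] / np.var(school, ddof=1)
print(beta1_hat, 0.10 + bias)           # equal; bias is upward here
```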
Part 3: Instrumental Variables
Suppose we have the following specification:
yi = β0 + β1xi + vi
And zi is a potential instrumental variable
Which of these is not a necessary assumption for zi to be a valid IV?
(a) Cov[zi, xi] ≠ 0
(b) Cov[zi, yi] = 0
(c) Cov[zi, vi] = 0
Instrumental Variables: Suppose Corr(vi, xi) ≠ 0
yi =β0+β1xi+vi
Our informal assumption: zi should change xi , but have absolutely
no other impact on yi . Formally:
1. Cov[zi, xi] ≠ 0 (Instrument Relevance)
Intuition: zi must change xi
2. Cov[zi, vi] = 0 (Exclusion Restriction)
Intuition: zi has absolutely no other impact on yi
Recall that vi is everything else outside of xi that influences yi
A few different ways of getting β^IV in practice
(1) If we run two regressions:
1. “First stage” impact of zi on Xi
Xi = α1 + φ zi + ηi
2. “Reduced form” impact of zi on Yi:
Yi = α2 + ρzi + ui
Then we can write:
β^IV = [Cov(Yi, zi)/Var(zi)] / [Cov(Xi, zi)/Var(zi)] = ρ^OLS / φ^OLS
The numerator is the impact of zi on Yi (reduced form); the denominator is the impact of zi on Xi (first stage)
We can estimate these in a sample with two regressions, or simply by calculating two sample covariances
A few different ways of getting βIV in practice (2)
A more common way of estimating βIV (two stage least squares):
1. Estimate φOLS in first stage:
Xi = α1 + φ zi + ηi
2. Predict the part of Xi explained by zi:
X̂i = α1^OLS + φ^OLS zi
3. Regress Yi on predicted X̂i in a second stage:
Yi = α2 + βX̂i + ui
Note that:
β^2ndStage = Cov(Yi, X̂i)/Var(X̂i) = β^IV
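Both routes can be sketched in a few lines. Here zi is a hypothetical instrument that is exogenous by construction, and the true β1 is 2:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
z = rng.normal(0, 1, n)                        # instrument
v = rng.normal(0, 1, n)                        # unobserved confounder
x = 0.8 * z + 0.5 * v + rng.normal(0, 1, n)    # endogenous regressor
y = 1.0 + 2.0 * x + v + rng.normal(0, 1, n)    # true beta1 = 2

# Route 1: ratio of reduced form to first stage
rho = np.cov(y, z, ddof=1)[0, 1] / np.var(z, ddof=1)   # reduced form
phi = np.cov(x, z, ddof=1)[0, 1] / np.var(z, ddof=1)   # first stage
print(rho / phi)                                        # approx 2.0

# Route 2: two-stage least squares
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)           # fitted first stage
print(np.cov(y, x_hat, ddof=1)[0, 1] / np.var(x_hat, ddof=1))  # approx 2.0

# For comparison, OLS is biased upward here because Cov(x, v) > 0
print(np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1))
```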
Topic 3: Panel Data and Diff-in-Diff
The average coursework grade in the Morning class is 68
The average coursework grade in the Afternoon class is 75
Suppose we run the following regression:
Courseworki = β0 + β1Afternooni + vi
What is the value of β0?
(a) 68
(b) 75
(c) 7
Part 1: Panel data
Panel data consists of observations of the same n units in T different periods
If the data contains variables x and y, we write them (xit, yit)
for i = 1, ··· , N
i denotes the unit, e.g. Microsoft or Apple
and t = 1, ··· , T
t denotes the time period, e.g. September or October
Panel Data and Omitted Variables
Let's reconsider our omitted variables problem:
yit = β0 + β1xit + γai + eit
Suppose we see xit and yit but not ai
Suppose Corr(xit, eit) = 0 but Corr(ai, xit) ≠ 0
Note that we are assuming ai doesn’t depend on t
First Difference Regression
yit = β0 + β1xit + vit, where vit = γai + eit
Suppose we see two time periods t ∈ {1, 2} for each i
We can write our two time periods as:
yi,1 = β0 +β1xi,1 +γai +ei,1
yi,2 = β0 +β1xi,2 +γai +ei,2
Taking changes (differences) gets rid of fixed omitted variables
∆yi,2−1 = β1∆xi,2−1 +∆ei,2−1
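A two-period sketch (hypothetical DGP with a fixed unobservable ai) showing that differencing removes the omitted-variable bias while pooled OLS does not:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
a = rng.normal(0, 1, n)                              # fixed unobservable a_i
x1 = a + rng.normal(0, 1, n)                         # Corr(x, a) > 0
x2 = a + rng.normal(0, 1, n)
y1 = 1.0 + 0.5 * x1 + 2.0 * a + rng.normal(0, 1, n)
y2 = 1.0 + 0.5 * x2 + 2.0 * a + rng.normal(0, 1, n)

# Differencing removes a_i: delta_y = beta1 * delta_x + delta_e
dy, dx = y2 - y1, x2 - x1
print(np.cov(dy, dx, ddof=1)[0, 1] / np.var(dx, ddof=1))   # approx 0.5

# Pooled OLS in levels is badly biased by the omitted a_i
x, y = np.concatenate([x1, x2]), np.concatenate([y1, y2])
print(np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1))      # well above 0.5
```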
Fixed Effects Regression
yit =β0+β1xit+γai+eit
An alternative approach:
Let's define δi = γai and rewrite:
yit = β0 + β1xit + δi + eit
So yit is determined by:
(i) The baseline intercept β0
(ii) The effect of xit
(iii) An individual-specific change in the intercept: δi
Intuition behind fixed effects: let's just estimate δi
Fixed Effects Regression: Implementation
yit = β0 + β1xit + ∑_{j=1}^{N−1} δjDji + eit
Note that we've left out DNi
β0^OLS is interpreted as the intercept for individual N:
β0^OLS = E[yit | xit = 0, i = N]
and for all other i (e.g. i = 1):
δ1 = E[yit | xit = 0, i = 1] − β0
This should look familiar
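It should: the dummy-variable (LSDV) regression is exactly the categorical-variable regression from Topic 1, with one dummy per unit. A sketch with made-up panel data, also confirming the equivalent "within" (demeaning) estimator:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
N, T = 200, 5
a = np.repeat(rng.normal(0, 1, N), T)          # unit fixed effect, constant over t
unit = np.repeat(np.arange(N), T)
x = a + rng.normal(0, 1, N * T)                # x correlated with a
y = 1.0 + 0.5 * x + 2.0 * a + rng.normal(0, 1, N * T)
df = pd.DataFrame({"y": y, "x": x, "unit": unit})

# LSDV: C(unit) adds N-1 unit dummies alongside the constant
lsdv = smf.ols("y ~ x + C(unit)", data=df).fit()
print(lsdv.params["x"])                        # approx 0.5

# Equivalent "within" estimator: demean y and x by unit
yd = df.y - df.groupby("unit").y.transform("mean")
xd = df.x - df.groupby("unit").x.transform("mean")
print((yd * xd).sum() / (xd ** 2).sum())       # identical slope
```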
Part 2: General Form of Diff-in-Diff
We are interested in the impact of some treatment on outcome Yi
Suppose we have a treated group and a control group
Let Di be a dummy equal to 1 if i belongs to the treatment group
And suppose we see both groups before and after the treatment occurs
Let Aftert be a dummy equal to 1 if time t is after the treatment date
Yit = β0 + β1Di × Aftert + β2Di + β3Aftert + vit
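A minimal sketch of this regression on simulated data (the group gap, the common time trend, and the treatment effect of 3 are all made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 4000
D = rng.integers(0, 2, n)          # treatment group indicator
After = rng.integers(0, 2, n)      # post-treatment period indicator

# Parallel trends hold by construction: common time trend (+2),
# a level gap for the treated (+5), and a treatment effect of +3
y = 10 + 5 * D + 2 * After + 3 * D * After + rng.normal(0, 1, n)
df = pd.DataFrame({"y": y, "D": D, "After": After})

fit = smf.ols("y ~ D * After", data=df).fit()  # expands to D + After + D:After
print(fit.params["D:After"])                   # approx 3: the diff-in-diff estimate
```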
Diff-in-Diff Graphically
[Figure: average leverage over time for the treatment group (Delaware) and the control group (Non-Delaware), before and after the reform date.]
When Does Diff-in-Diff Identify a Causal Effect?
As usual, we need
E[vit|Di,Aftert] = E[vit]
What does this mean intuitively?
Parallel trends assumption: In the absence of any reform the
average change in leverage would have been the same in the treatment and control groups
In other words: trends in both groups are similar
Parallel Trends
Parallel trends does not require that there is no trend in leverage
Just that the trend is the same between groups
Does not require that the levels be the same in the two groups
What does it look like when the parallel trends assumption fails?
When Parallel Trends Fails
[Figure: treatment (Delaware) and control (Non-Delaware) series with different trends even before the treatment date.]
Topic 4: Regularization
1. Basics of Ridge, LASSO and Elastic Net
2. How to choose hyperparameter λ: cross-validation
We Are Given 100 Observations of yi
[Figure: scatter of the outcome yi against observation number (1–100).]
How Well Can We Predict Out-of-Sample Outcomes yi^oos?
[Figure: out-of-sample outcomes and predictions against observation number (1–100); vertical axis: outcome and prediction.]
A Good Model Has Small Distance (yi^oos − ŷi^oos)²
[Figure: out-of-sample outcomes plotted alongside the model's predictions for the 100 observations.]
Solution to the OLS Problem: Regularization
With 100 observations, OLS didn't do very well
Solution: regularization
LASSO / Ridge / Elastic Net
Simplest version of elastic net (nests LASSO and Ridge):
β̂^elastic = argmin_β (1/N) ∑_{i=1}^N (yi − β0 − β1x1i − ··· − βKxKi)² + λ[α ∑_{k=1}^K |βk| + (1 − α) ∑_{k=1}^K βk²]
For α = 1 is this LASSO or Ridge?
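With α = 1 only the absolute-value penalty survives, i.e. LASSO. A sketch using scikit-learn (note its parameterization differs slightly: `alpha` plays the role of λ up to a scaling convention, and `l1_ratio` plays the role of α):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(9)
n, K = 100, 50
X = rng.normal(0, 1, (n, K))
beta = np.zeros(K)
beta[:5] = [3, -2, 1.5, 1, -1]          # only 5 of the 50 true coefs are nonzero
y = X @ beta + rng.normal(0, 1, n)

# l1_ratio = 1 -> LASSO, l1_ratio = 0 -> ridge
enet = ElasticNet(alpha=0.2, l1_ratio=0.5).fit(X, y)
lasso = Lasso(alpha=0.2).fit(X, y)      # pure L1 penalty

print((lasso.coef_ != 0).sum())         # LASSO sets many coefficients exactly to 0
print((enet.coef_ != 0).sum())
```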
LASSO Coefficients With 100 Observations
[Figures: estimated LASSO coefficients for the 50 X variables at λ = 0.2, λ = 1, and λ = 3; more coefficients are set exactly to zero as λ increases.]
LASSO Coefficients For All λ
[Figure: coefficient paths against log λ; the number of nonzero coefficients falls from 49 toward 3 as λ grows.]
How to Choose λ: k-fold Cross-Validation
Partition the sample into k equal folds
The default for R is k = 10
For our sample, this means 10 folds with 10 observations each
Cross-validation proceeds in several steps:
1. Choose k−1 folds (9 folds in our example, with 10 observations each)
2. Run LASSO on these 90 observations
3. Find β̂^lasso(λ) for all 100 values of λ
4. Compute MSE(λ) for all λ using the remaining fold (10 observations)
5. Repeat for all 10 possible combinations of k−1 folds
This provides 10 estimates of MSE(λ) for each λ
Can construct means and standard deviations of MSE(λ) for each λ
Choose the λ that gives a small mean MSE(λ) (see the sketch below)
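The slides reference R's defaults; an analogous sketch in Python's scikit-learn (simulated data, 100 observations, 50 regressors, 100 candidate λ values):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(10)
n, K = 100, 50
X = rng.normal(0, 1, (n, K))
beta = np.zeros(K)
beta[:5] = [3, -2, 1.5, 1, -1]
y = X @ beta + rng.normal(0, 1, n)

# 10-fold CV over a path of 100 lambda values (sklearn calls lambda "alpha")
cv = LassoCV(cv=10, n_alphas=100).fit(X, y)
print(cv.alpha_)               # the lambda with the smallest mean CV MSE
print(cv.mse_path_.shape)      # (100 lambdas, 10 folds) of MSE(lambda)
```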
Cross-Validated Mean Squared Errors for All λ
[Figure: mean squared error (30–90) against log λ (−5 to 1); the number of nonzero coefficients along the top falls from 49 to 1 as λ increases.]
Topic 5: Observed Factor Models
Suppose xt is a vector of asset returns, and B is a matrix of factor loadings
Which has higher dimension:
(a) B
(b) Σx = Cov(xt)
Observed Factor Models
1. General Framing of Linear Factor Models
2. Single Index Model and the CAPM
3. Multi-Factor Models
   Fama-French
   Macroeconomic Factors
4. Barra approach
Linear Factor Models
Assume that returns xi,t are driven by K common factors:
xi,t = αi + β1,i f1,t + β2,i f2,t + ··· + βK,i fK,t + εi,t
ft = (f1,t,f2,t,··· ,fK,t)′ is the set of common factors
These are the same for all assets (constant over i)
But change over time (different for t , t + 1)
Each ft has dimension (K × 1)
βi = (β1,i,β2,i,··· ,βK,i)′ is the set of factor loadings
K different parameters for each asset
But constant over time (same for all t)
Fixed, specific relationship between asset i and factor k
Linear Factor Model
xt = α + Bft + εt
Summary of Parameters
α : (m × 1) intercepts for m assets
B: (m × K) loadings (βik) on K factors for m assets
μf : (K × 1) vector of means for K factors
Ωf : (K × K ) variance covariance matrix of factors
Ψ: (m × m) diagonal matrix of asset specific variances
Given our assumptions, xt is m-variate covariance stationary with:
E[xt | ft] = α + Bft
Cov[xt | ft] = Ψ
E[xt] = μx = α + Bμf
Cov[xt] = Σx = BΩfB′ + Ψ
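These moment formulas can be verified by simulation. A sketch with made-up dimensions (m = 10 assets, K = 3 factors) comparing sample moments to the implied ones:

```python
import numpy as np

rng = np.random.default_rng(11)
m, K, T = 10, 3, 5000

alpha = rng.normal(0, 0.01, m)                 # (m x 1) intercepts
B = rng.normal(0, 1, (m, K))                   # (m x K) factor loadings
mu_f = rng.normal(0, 0.01, K)                  # (K x 1) factor means
Omega_f = np.diag(rng.uniform(0.01, 0.05, K))  # (K x K) factor covariance
Psi = np.diag(rng.uniform(0.01, 0.03, m))      # (m x m) idiosyncratic variances

f = rng.multivariate_normal(mu_f, Omega_f, T)          # T x K factor draws
eps = rng.multivariate_normal(np.zeros(m), Psi, T)     # T x m idiosyncratic shocks
x = alpha + f @ B.T + eps                              # T x m asset returns

# Sample moments should match the implied ones up to sampling error
print(np.abs(x.mean(axis=0) - (alpha + B @ mu_f)).max())
print(np.abs(np.cov(x.T) - (B @ Omega_f @ B.T + Psi)).max())
```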
The Index Model: First Pass
xi =αi1T +Rmβi +εi
Estimate OLS regression on time-series version of our factor specification
One regression for each asset i
Recover two parameters α̂i and β̂i for each asset i
Ω̂f is just the sample variance of the market return
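A sketch of the first pass with a hypothetical market factor and four assets (true betas made up):

```python
import numpy as np

rng = np.random.default_rng(12)
T, m = 1000, 4
r_m = rng.normal(0.005, 0.04, T)                      # market factor returns
beta_true = np.array([0.5, 1.0, 1.3, 0.8])
x = 0.001 + np.outer(r_m, beta_true) + rng.normal(0, 0.02, (T, m))

# One time-series OLS per asset: x_i = alpha_i * 1_T + R_m * beta_i + eps_i
Rm = np.column_stack([np.ones(T), r_m])
coefs = np.linalg.solve(Rm.T @ Rm, Rm.T @ x)          # 2 x m: rows (alpha_i, beta_i)
alpha_hat, beta_hat = coefs[0], coefs[1]
print(beta_hat)                                       # approx beta_true

omega_f_hat = np.var(r_m, ddof=1)                     # sample variance of the factor
```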