OLS and the Conditional Expectation Function
Empirical Finance: Methods and Applications Imperial College Business School
January 10th and 11th, 2022
Course Details
Basic housekeeping
Course tools: Menti, R, and R-Studio
Introduction to tidy data
OLS and the Conditional Expectation Function
Review and properties of the CEF
Review, implementation, and value of OLS
Course Details: Contact
Lecturer:
Office: 53 Prince’s Gate, 5.01b
Phone: +44 (0)20 7594 1044
Course Details: Assessment
Two assignments (proposed schedule)
Assignment 1 (25%)
Assigned Tuesday of Week 3
Due by 4pm on Tuesday of Week 5
Assignment 2 (25%)
Assigned Tuesday of Week 6
Due by 5:30pm on Tuesday of Week 8
Final Exam (50%)
Course Details: Tentative Office Hours and Tutorials
Tentative office hours
Tuesdays from 14:00-15:00
Or by appointment
Formal tutorials will begin in Week 2
Joe will be available online this week to help with R/RStudio
Course Details: Mentimeter
On your phone (or computer) go to Menti.com
Course Details: R and R-Studio
Make sure you have the most up-to-date version of R: https://cloud.r-project.org/
And an up-to-date version of RStudio:
https://www.rstudio.com/products/rstudio/download/
Course Details: In Class Exercises
Throughout the module we’ll regularly do hands-on exercises. Let’s start with a quick example:
On the Insendi course page find the data: ols_basics.csv
5 variables: Y, X, Y_sin, Y_2, Y_nl
Load the data into R, and run an OLS regression of Y on X. What is the coefficient on X?
Course Details: Projects in R-Studio
For those with R-Studio set up:
Open R-Studio and select File ⇒ New Project… ⇒ New Directory ⇒ New Project
Name the directory “EF lecture 1” and locate it somewhere convenient
Each coursework should be completed in a unique project folder
Course Details: R set up
Download all data files from the hub and place them in EF lecture 1:
s_p_price.csv
ols_basics.csv
ames_testing.csv
ames_training.csv
Course Details: The Tidyverse
The majority of the coding we do will utilize the tidyverse
The tidyverse is an opinionated collection of R packages designed for data science.
All packages share an underlying design philosophy, grammar, and data structures.
For an excellent introduction and overview:
Hadley Wickham’s R for Data Science: https://r4ds.had.co.nz/
install.packages("tidyverse")
library(tidyverse)
Course Details: Tidy Data
The tidyverse is structured around tidy datasets
There are three interrelated rules which make a dataset tidy:
1. Each variable must have its own column
2. Each observation must have its own row
3. Each value must have its own cell
For the theory underlying tidy data:
http://www.jstatsoft.org/v59/i10/paper
An Example of Tidy Data
An Example of Non-Tidy Data
Fixing An Observation Scattered Across Rows
tidy2 <- table2 %>%
pivot_wider(names_from=type, values_from=count)
Another Example of Non-Tidy Data
Fixing Columns as Values: pivot_longer()
tidy4a <- table4a %>%
pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases")
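Both reshapes can be run as-is: tidyr ships table2 and table4a as built-in example datasets. A minimal consolidated sketch:
# A consolidated sketch of the two reshapes above, using tidyr's built-in
# example tables (table2 and table4a ship with the tidyverse)
library(tidyverse)

# Observations scattered across rows -> one column per variable
tidy2 <- table2 %>%
  pivot_wider(names_from = type, values_from = count)

# Column names ("1999", "2000") that are really values -> long format
tidy4a <- table4a %>%
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases")

tidy2
tidy4a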
Introducing the Pipe: %>%
You’ll notice that both of these operations utilize a “pipe”: %>%
A tool for clearly expressing a sequence of multiple operations Can help make code easy to read and understand
Consider evaluating the following: x = sqrt(log(e^9)). Could write it as:
x <-sqrt(log(exp(9)))
Or with pipes:
x <- 9 %>%
  exp() %>%
  log() %>%
  sqrt()
This Week: Two Parts
(1) Introduction to the conditional expectation function (CEF)
Why is the CEF a useful (and widely used) summary of the relationship between variables Y and X?
(2) Ordinary Least Squares and the CEF
Review, implementation, and the utility of OLS
Part 1: The Conditional Expectation Function
Overview
Key takeaway: the CEF is a useful tool for describing the relationship between variables Y and X
Why: (at least) three nice properties:
1. Law of iterated expectations
2. CEF decomposition property
3. CEF prediction property
Review: Expectation of a Random Variable Y
Suppose Y is a random variable with a finite number of outcomes y1,y2,···yk occurring with probability p1,p2,···pk:
The expectation of Y is:
$E[Y] = \sum_{i=1}^{k} y_i p_i$
For example: if Y is the value of a (fair) dice roll:
$E[Y] = 1\times\tfrac{1}{6} + 2\times\tfrac{1}{6} + 3\times\tfrac{1}{6} + 4\times\tfrac{1}{6} + 5\times\tfrac{1}{6} + 6\times\tfrac{1}{6} = 3.5$
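A quick R check of this arithmetic (a throwaway sketch, not part of the course files):
# Fair-die example: expectation as a probability-weighted sum of outcomes
outcomes <- 1:6
probs <- rep(1/6, 6)
sum(outcomes * probs)   # returns 3.5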
Suppose Y is a (continuous) random variable whose CDF F(y) admits density f(y)
The expectation of Y is:
$E[Y] = \int y\, f(y)\, dy$
This is just a number!
The Conditional Expectation Function (CEF)
We are often interested in the relationship between some outcome Y and a variable (or set of variables) X
A useful summary is the conditional expectation function: E[Y|X]
Gives the expectation of Y when X takes any particular value
Formally, if $f_y(\cdot|X)$ is the conditional density of Y|X:
$E[Y|X] = \int z\, f_y(z|X)\, dz$
E[Y|X] is a random variable itself: a function of the random X
Can think of it as E[Y|X]=h(X)
Alternatively, evaluate it at particular values: for example X = 0.5
E[Y|X =0.5] is just a number!
Unconditional Expectation of Height for Adults: E[H]
[Figure: distribution of adult height in inches (54–78), with the unconditional expectation E[H] = 67.5 in. marked]
Conditional Expectation of Height by Age: E[H|Age]
[Figure: height in inches plotted against age (0–40), with E[H|Age] marked at ages 5, 10, 15, 20, 25, 30, 35, and 40]
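With individual-level data, the sample analogue of E[H|Age] is just the mean height within each age group. A minimal sketch, assuming a hypothetical data frame heights with columns age and height (not one of the course data files):
# Hypothetical data frame `heights` with columns age and height
library(tidyverse)

cef_by_age <- heights %>%
  group_by(age) %>%
  summarise(mean_height = mean(height))   # sample analogue of E[H | Age]

cef_by_age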
Why the Conditional Expectation Function?
E[Y|X] is not the only function that relates Y to X
For example, consider the 95th percentile of Y given X: P95(Y|X)
[Figure: distribution of adult height in inches (54–78), with P95[H|G=Male] and P95[H|G=Female] marked]
But E[Y|X] has a bunch of nice properties
Property 1: The Law of Iterated Expectations
$E_X[E[Y|X]] = E[Y]$
Example: let Y be yearly wages for MSc graduates
E[Y]=£1,000,900
Two values for X : {RMFE, Other}
Say 10% of MSc students are RMFE, 90% in other programs
E[Y|X=RMFE] = £10,000,000
E[Y|X=Other] = £1,000
The expectation works like always (just over E[Y|X] instead of X):
E[E[Y|X]] = E[Y|X=RMFE] × P[X=RMFE] + E[Y|X=Other] × P[X=Other]
= £10,000,000 × 0.1 + £1,000 × 0.9 = £1,000,900
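A one-line numeric check of the calculation above:
# Iterated expectations: probability-weighted average of the conditional means
10000000 * 0.1 + 1000 * 0.9   # returns 1000900, matching E[Y]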
Property 1: The Law of Iterated Expectations
E[E[Y|X]]=E[Y]
Not true, for example, for the 95th percentile: E[P95[Y|X]] ≠ P95[Y]
Property 2: The CEF Decomposition Property
Any random variable Y can be broken down into two pieces:
Y = E[Y|X] + ε
Where the residual ε has the following properties:
(i) E[ε|X] = 0 (“mean independence”)
(ii) ε is uncorrelated with any function of X
Intuitively, this property says we can break down Y into two parts:
(i) The part of Y “explained by” X: E[Y|X]
This is the (potentially) useful part when predicting Y with X
(ii) The part of Y unrelated to X: ε
Property 2: Proof
Y = E[Y|X] + ε
(i) E[ε|X] = 0 (“mean independence”):
ε = Y − E[Y|X]
⇒ E[ε|X] = E[Y − E[Y|X] | X] = E[Y|X] − E[Y|X] = 0
(ii) ε is uncorrelated with any function of X:
Cov(ε, h(X)) = E[h(X)ε] − E[h(X)]E[ε]
The second term is zero because E[ε] = E[E[ε|X]] = 0 by iterated expectations. For the first term:
E[h(X)ε] = E[E[h(X)ε | X]]   (iterated expectations)
= E[h(X)E[ε|X]] = E[h(X) · 0] = 0
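These two properties are easy to see in a simulation. A minimal sketch with a made-up CEF (Y = X² + noise, so E[Y|X] = X²):
# Simulated check: the residual from the CEF is mean-independent of X,
# hence uncorrelated with X and with any function of X
set.seed(1)
n   <- 100000
x   <- rnorm(n)
y   <- x^2 + rnorm(n)        # E[Y|X] = X^2 by construction
eps <- y - x^2               # epsilon = Y - E[Y|X]

cor(eps, x)                  # approximately 0
cor(eps, exp(x))             # approximately 0 for h(X) = exp(X) as well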
Property 3: The CEF Prediction Property
Out of any function of X, E[Y|X] is the best predictor of Y
In other words, E[Y|X] is the “closest” function to Y on average
What do we mean by closest?
Consider any function of X, say m(X)
m(X) is close to Y if the difference (or “error”) is small: Y − m(X)
Close is about magnitude: treat positive/negative the same…
m(X) is also close to Y if the squared error is small: (Y − m(X))²
E[Y|X] is the closest, in this sense, in expectation:
$E[Y|X] = \arg\min_{m(X)} E[(Y - m(X))^2]$
“Minimum mean squared error”
Property 3: Proof (Just for Fun)
Out of any function of X, E[Y|X] is the best predictor of Y:
$E[Y|X] = \arg\min_{m(X)} E[(Y - m(X))^2]$
To see this, note:
$(Y - m(X))^2 = ([Y - E[Y|X]] + [E[Y|X] - m(X)])^2$
$= [Y - E[Y|X]]^2 + [E[Y|X] - m(X)]^2 + 2[E[Y|X] - m(X)] \cdot [Y - E[Y|X]]$
$\Rightarrow E[(Y - m(X))^2] = E[(Y - E[Y|X])^2] + E[(E[Y|X] - m(X))^2] + 2E[h(X) \cdot \varepsilon]$
where $h(X) = E[Y|X] - m(X)$ and $\varepsilon = Y - E[Y|X]$. The first term is unrelated to m(X), the second is minimized when m(X) = E[Y|X], and the last term is 0 by the decomposition property.
Summary: Why We Care About Conditional Expectation Functions
Useful tool for describing relationship between Y and X
Several nice properties
Most statistical tests come down to comparing E[Y|X] at certain values of X
Classic example: experiments
Part 2: Ordinary Least Squares
Linear regression is arguably the most popular modeling approach across every field in the social sciences
Transparent, robust, relatively easy to understand
Provides a basis for more advanced empirical methods
Extremely useful when summarizing data
Plenty of focus on the technical aspects of OLS last term
Focus today on an applied perspective
Review of OLS in Three Parts
1. Overview
Intuition and Review of Population and Sample Regression Algebra
Connection With Conditional Expectation Function
Estimating a Linear Regression in R
2. An Example: Predicting Home Prices
3. Rounding Out Some Details
Scaling and Implementation
OLS Part 1: Overview
OLS Estimator Fits a Line Through the Data
$\beta_0^{OLS} + \beta_1^{OLS} X$
A Line Through the Data: Example in R
[Scatter plot of Y against X]
How Do We Choose Which Line?
One Data Point
[Figure: a single data point and a candidate line $\beta_0 + \beta_1 X$; $v_i$ is observation i’s deviation from $\beta_0 + \beta_1 x_i$]
$y_i = \beta_0 + \beta_1 x_i + v_i$
Choosing the Regression Line
For any line $\beta_0 + \beta_1 X$, the data point $(y_i, x_i)$ may be written as:
$y_i = \beta_0 + \beta_1 x_i + v_i$
$v_i$ will be big if $\beta_0 + \beta_1 x_i$ is “far” from $y_i$
$v_i$ will be small if $\beta_0 + \beta_1 x_i$ is “close” to $y_i$
We refer to $v_i$ as the residual
Choosing the (Population) Regression Line
$y_i = \beta_0 + \beta_1 x_i + v_i$
An OLS regression simply chooses the $\beta_0^{OLS}, \beta_1^{OLS}$ that make $v_i$ as “small” as possible on average
How do we define “small”?
Want to treat positive/negative the same: consider $v_i^2$
Choose $\beta_0^{OLS}, \beta_1^{OLS}$ to minimize:
$E[v_i^2] = E[(y_i - \beta_0 - \beta_1 x_i)^2]$
(Population) Regression Anatomy
$\{\beta_0^{OLS}, \beta_1^{OLS}\} = \arg\min_{\{\beta_0, \beta_1\}} E[(y_i - \beta_0 - \beta_1 x_i)^2]$
In this simple case with only one $x_i$, $\beta_1^{OLS}$ has an intuitive definition:
$\beta_1^{OLS} = \frac{Cov(y_i, x_i)}{Var(x_i)}$
$\beta_0^{OLS} = \bar{y} - \beta_1^{OLS} \bar{x}$
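A minimal sketch of the sample analogues, assuming the ols_basics data from earlier has been loaded with columns Y and X (column names assumed):
# Slope and intercept from the covariance/variance formulas
b1 <- cov(ols_basics$Y, ols_basics$X) / var(ols_basics$X)
b0 <- mean(ols_basics$Y) - b1 * mean(ols_basics$X)

c(b0, b1)
coef(lm(Y ~ X, data = ols_basics))   # should match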
Regression Anatomy (Matrix Notation)
$y_i = \beta_0 + \beta_1 x_i + v_i$
You will often see more concise matrix notation:
$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad X_i = \begin{pmatrix} 1 \\ x_i \end{pmatrix}$  (both 2×1)
$y_i = X_i'\beta + v_i$
This lets us write the OLS coefficients as:
$\beta^{OLS} = \arg\min_{\beta} E[(y_i - X_i'\beta)^2]$
$\Rightarrow \beta^{OLS} = E[X_i X_i']^{-1} E[X_i y_i]$
(Sample) Regression Anatomy
$\beta^{OLS} = \arg\min_{\beta} E[(y_i - X_i'\beta)^2]$
$\beta^{OLS} = E[X_i X_i']^{-1} E[X_i y_i]$
Usually we do not explicitly know these expectations, so we compute sample analogues:
$\hat{\beta}^{OLS} = \arg\min_{\beta} \sum_{i=1}^{N} (y_i - X_i'\beta)^2$
$\Rightarrow \hat{\beta}^{OLS} = (X'X)^{-1}(X'Y)$
where
$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}$
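The matrix formula can be computed directly; a sketch under the same assumption that ols_basics is loaded:
# OLS by the matrix formula (X'X)^(-1) X'Y
X_mat <- cbind(1, ols_basics$X)        # N x 2 design matrix (column of 1s, then x)
Y_vec <- ols_basics$Y

beta_hat <- solve(t(X_mat) %*% X_mat) %*% t(X_mat) %*% Y_vec
beta_hat                               # intercept and slope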
This Should (Hopefully) Look Familiar
$RSS(b) = \sum_{i=1}^{N} (y_i - X_i'b)^2$
Estimating a Linear Regression in R
Simple command to estimate OLS
ols_v1 <- lm(Y ~ X, data = ols_basics)
And to display results: summary(ols_v1)
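Putting the pieces together, a minimal end-to-end sketch (file and column names as assumed above):
library(tidyverse)

ols_basics <- read_csv("ols_basics.csv")   # assumes the file sits in the project folder

ols_v1 <- lm(Y ~ X, data = ols_basics)
summary(ols_v1)                            # coefficient estimates, standard errors, R-squared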
A Line Through the Data: Example in R
[Scatter plot of Y against X with the fitted OLS regression line]
Intercept looks something like 1; slope, approximately 2?
Recall the estimated coefficients above for comparison
Regression and the Conditional Expectation Function
Why is linear regression so popular?
Simplest way to estimate (or approximate) conditional expectations!
Three simple results
1. OLS perfectly captures the CEF if the CEF is linear
2. OLS generates the best linear approximation to the CEF if not
3. OLS perfectly captures the CEF with binary (dummy) regressors (see the sketch below)
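For the dummy-regressor case, a small simulated sketch (made-up data, not from the course files): with a single binary regressor, the OLS intercept and slope reproduce the two conditional means exactly.
# OLS with a binary regressor recovers E[Y|d=0] and E[Y|d=1] - E[Y|d=0]
set.seed(1)
d <- rbinom(1000, 1, 0.5)
y <- 2 + 3 * d + rnorm(1000)

coef(lm(y ~ d))
c(mean(y[d == 0]), mean(y[d == 1]) - mean(y[d == 0]))   # identical to the OLS coefficients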
Regression captures CEF if CEF is Linear
Take the special case of a linear conditional expectation function:
$E[y_i|X_i] = X_i'\beta$
Then OLS captures E[yi|Xi]
[Scatter plot of data with a linear CEF and the fitted OLS line]
Regression captures CEF if CEF is Linear
Why? Recall from the CEF decomposition property:
$E[X_i \varepsilon_i] = E[X_i(Y_i - E[Y_i|X_i])] = 0$
Replacing $E[Y_i|X_i] = X_i'\beta$:
$\Rightarrow E[X_i(Y_i - X_i'\beta)] = 0$
$\Rightarrow \beta = E[X_i X_i']^{-1} E[X_i Y_i] = \beta^{OLS}$
So $X_i'\beta = X_i'\beta^{OLS} = E[Y_i|X_i]$
Conditional Expectation Function Often Non-Linear
[Scatter plot of data with a non-linear conditional expectation function]
OLS Provides Approximation to CEF
[Scatter plot with the non-linear CEF and the OLS line, which provides the best linear approximation]
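One way to see the approximation is to compare the OLS line with a crude binned estimate of the CEF. A sketch assuming ols_basics is loaded and contains the non-linear outcome Y_nl listed earlier (column name assumed):
# Compare the OLS line to binned conditional means of Y_nl given X
library(tidyverse)

fit_nl <- lm(Y_nl ~ X, data = ols_basics)

cef_hat <- ols_basics %>%
  mutate(x_bin = cut(X, breaks = 20)) %>%
  group_by(x_bin) %>%
  summarise(x_mid = mean(X), y_mean = mean(Y_nl))      # rough estimate of E[Y_nl | X]

ggplot(ols_basics, aes(X, Y_nl)) +
  geom_point(alpha = 0.2) +
  geom_point(data = cef_hat, aes(x_mid, y_mean), colour = "red", size = 2) +
  geom_abline(intercept = coef(fit_nl)[1], slope = coef(fit_nl)[2], colour = "blue")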