
OLS and the Conditional Expectation Function
Chris Hansman
Empirical Finance: Methods and Applications Imperial College Business School
Week One
January 11th and 12th, 2021
1/84

This Week
• Course Details
  • Basic housekeeping
  • Course tools: Menti, R, and R-Studio
  • Introduction to tidy data
• OLS and the Conditional Expectation Function
  • Review and properties of the CEF
  • Review, implementation, and value of OLS
2/84

Course Details: Contact
• Lecturer: Chris Hansman
  • Email: chansman@imperial.ac.uk
  • Office: 53 Prince’s Gate, 5.01b
  • Phone: +44 (0)20 7594 1044
• TA: Davide Benedetti
  • Email: d.benedetti@imperial.ac.uk
3/84

Course Details: Assessment
• Two assignments
  • Assignment 1 (25%)
    • Assigned Tuesday of Week 3
    • Due by 4pm on Tuesday of Week 5
  • Assignment 2 (25%)
    • Assigned Tuesday of Week 6
    • Due by 5:30pm Tuesday of Week 8
• Final Exam (50%)
4/84

Course Details: Tentative Office Hours and Tutorials
• Tentative office hours
  • Tuesdays from 17:30-18:30
  • Or by appointment
• Formal tutorials will begin in Week 2
  • Davide will be available this week to help with R/RStudio
5/84

Course Details: Mentimeter
• On your phone (or computer) go to Menti.com
6/84

Course Details: R and R-Studio
• Make sure you have the most up-to-date version of R:
  • https://cloud.r-project.org/
• And an up-to-date version of RStudio:
  • https://www.rstudio.com/products/rstudio/download/
7/84

Course Details: In Class Exercises
• Throughout the module we’ll regularly do hands-on exercises
• Let’s start with a quick example (a sketch follows below):
  • On the insendi course page find the data: ols_basics.csv
  • 5 variables: Y, X, Y_sin, Y_2, Y_nl
  • Load the data into R, and run an OLS regression of Y on X
  • What is the coefficient on X?
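• A minimal sketch of this exercise, assuming the csv sits in the project folder and the variables are named as listed above (adjust to whatever names(ols_basics) shows):

library(tidyverse)

# Read the exercise data from the project folder
ols_basics <- read_csv("ols_basics.csv")

# Regress Y on X and read off the coefficient on X
fit <- lm(Y ~ X, data = ols_basics)
summary(fit)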
8/84

Course Details: Projects in R-Studio
• For those with R-Studio set up:
  • Open R-Studio and select File ⇒ New Project ⇒ New Directory ⇒ New Project
  • Name the directory “EF lecture 1” and locate it somewhere convenient
• Each coursework should be completed in a unique project folder
9/84

Course Details: R set up
• Download all data files from the hub and place them in EF lecture 1:
  • s_p_price.csv
  • ols_basics.csv
  • ames_testing.csv
  • ames_training.csv
10/84

Course Details: The Tidyverse
• The majority of the coding we do will utilize the tidyverse:
  “The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.”
• For an excellent introduction and overview:
  • Hadley Wickham’s R for Data Science: https://r4ds.had.co.nz/

install.packages("tidyverse")
library(tidyverse)
11/84

Course Details: Tidy Data
• The tidyverse is structured around tidy datasets
• There are three interrelated rules which make a dataset tidy:
  1. Each variable must have its own column
  2. Each observation must have its own row
  3. Each value must have its own cell
• For the theory underlying tidy data:
  • http://www.jstatsoft.org/v59/i10/paper
12/84

An Example of Tidy Data
13/84

An Example of Non-Tidy Data
14/84

Fixing An Observation Scattered Across Rows: spread()
tidy2 <- table2 %>%
  spread(key = "type", value = "count")
15/84

Another Example of Non-Tidy Data
16/84

Fixing Columns as Values: gather()
tidy4a <- table4a %>%
  gather(`1999`, `2000`, key = "year", value = "cases")
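• Both snippets can be run as-is, since table2 and table4a ship with the tidyverse; a consolidated sketch (note that newer tidyverse code often uses pivot_wider()/pivot_longer() for the same two steps):

library(tidyverse)

# table2: each observation is scattered across two rows (cases and population)
table2
tidy2 <- table2 %>%
  spread(key = "type", value = "count")

# table4a: the years 1999 and 2000 are stored as column names
table4a
tidy4a <- table4a %>%
  gather(`1999`, `2000`, key = "year", value = "cases")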
17/84

Introducing the Pipe: %>%
• You’ll notice that both of these operations utilize a “pipe”: %>%
• A tool for clearly expressing a sequence of multiple operations
• Can help make code easy to read and understand
• Consider evaluating the following: x = sqrt(log(e^9))
• Could write it as:
  x <- sqrt(log(exp(9)))
• Or with pipes:
  x <- 9 %>%
    exp() %>%
    log() %>%
    sqrt()
18/84

This Week: Two Parts
(1) Introduction to the conditional expectation function (CEF)
  • Why is the CEF a useful (and widely used) summary of the relationship between variables Y and X
(2) Ordinary Least Squares and the CEF
  • Review, implementation, and the utility of OLS
19/84

Part 1: The Conditional Expectation Function
• Overview
  • Key takeaway: a useful tool for describing the relationship between variables Y and X
  • Why: (at least) three nice properties:
    1. Law of iterated expectations
    2. CEF decomposition property
    3. CEF prediction property
20/84

Review: Expectation of a Random Variable Y
• Suppose Y is a random variable with a finite number of outcomes y1, y2, ..., yk occurring with probability p1, p2, ..., pk:
  • The expectation of Y is:
    E[Y] = ∑_{i=1}^{k} yi pi
• For example: if Y is the value of a (fair) dice roll:
    E[Y] = 1×(1/6) + 2×(1/6) + 3×(1/6) + 4×(1/6) + 5×(1/6) + 6×(1/6) = 3.5
• Suppose Y is a (continuous) random variable whose CDF F(y) admits density f(y)
  • The expectation of Y is:
    E[Y] = ∫ y f(y) dy
  • This is just a number!
21/84

The Conditional Expectation Function (CEF)
• We are often interested in the relationship between some outcome Y and a variable (or set of variables) X
• A useful summary is the conditional expectation function: E[Y|X]
  • Gives the expectation of Y when X takes any particular value
• Formally, if f_y(·|X) is the conditional density of Y|X:
    E[Y|X] = ∫ z f_y(z|X) dz
• E[Y|X] is a random variable itself: a function of the random X
  • Can think of it as E[Y|X] = h(X)
• Alternatively, evaluate it at particular values: for example X = 0.5
  • E[Y|X = 0.5] is just a number!
22/84

Unconditional Expectation of Height for Adults: E[H]
[Figure: distribution of adult height (inches)]
23/84

Unconditional Expectation of Height for Adults: E[H]
[Figure: distribution of adult height (inches)]
24/84

Unconditional Expectation of Height for Adults: E[H]
[Figure: distribution of adult height (inches), with the mean E[H] = 67.5 in. marked]
25/84

Conditional Expectation of Height by Age: E[H|Age]
[Figure: height (inches) against age (0 to 40 years), with the conditional means E[H|Age = 5], E[H|Age = 10], ..., E[H|Age = 40] marked]
26/84

Why the Conditional Expectation Function?
• E[Y|X] is not the only function that relates Y to X
• For example, consider the 95th percentile of Y given X: P95(Y|X)
[Figure: adult height (inches) by gender, with P95[H|G = Male] and P95[H|G = Female] marked]
• But E[Y|X] has a bunch of nice properties
27/84

Property 1: The Law of Iterated Expectations
E_X[E[Y|X]] = E[Y]
• Example: let Y be yearly wages for MSc graduates
  • E[Y] = £1,000,900
• Two values for X: {RMFE, Other}
  • Say 10% of MSc students are RMFE, 90% in other programs
  • E[Y|X = RMFE] = £10,000,000
  • E[Y|X = Other] = £1,000
• The expectation works like always (just over E[Y|X] instead of Y):
  E[E[Y|X]] = E[Y|X = RMFE] × P[X = RMFE] + E[Y|X = Other] × P[X = Other]
            = £10,000,000 × 0.1 + £1,000 × 0.9
            = £1,000,900
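• A quick numerical check of this calculation in R (the figures are the ones from the example above):

# Conditional means and programme shares from the example
cond_mean <- c(RMFE = 10000000, Other = 1000)
prob      <- c(RMFE = 0.1,      Other = 0.9)

# E[E[Y|X]]: the weighted average of the conditional means
sum(cond_mean * prob)   # 1000900, matching E[Y]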
28/84

Property 1: The Law of Iterated Expectations
E[E[Y|X]] = E[Y]
• Not true, for example, for the 95th percentile: E[P95[Y|X]] ≠ P95[Y]
29/84

Property 2: The CEF Decomposition Property
• Any random variable Y can be broken down into two pieces:
  Y = E[Y|X] + ε
• Where the residual ε has the following properties:
  (i) E[ε|X] = 0 (“mean independence”)
  (ii) ε is uncorrelated with any function of X
• Intuitively this property says we can break down Y into two parts (a simulated illustration follows below):
  (i) The part of Y “explained by” X: E[Y|X]
    • This is the (potentially) useful part when predicting Y with X
  (ii) The part of Y unrelated to X: ε
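• A small simulated sketch of this decomposition; the data-generating process below is made up purely for illustration, and because X is discrete here, E[Y|X] can be computed as group means:

library(tidyverse)
set.seed(1)

# Simulated data: Y's mean depends on a discrete X
sim <- tibble(x = sample(1:4, 10000, replace = TRUE),
              y = 2 * x + rnorm(10000))

# Decompose y into the CEF (group means) and the residual epsilon
sim <- sim %>%
  group_by(x) %>%
  mutate(cef = mean(y), eps = y - cef) %>%
  ungroup()

# (i) epsilon has mean zero within every value of X
sim %>% group_by(x) %>% summarise(mean_eps = mean(eps))

# (ii) epsilon is uncorrelated with functions of X
cor(sim$eps, sim$x)
cor(sim$eps, sim$x^2)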
30/84

Property 2: Proof
Y = E[Y|X] + ε

(i) E[ε|X] = 0 (“mean independence”):
  ε = Y − E[Y|X]
  ⇒ E[ε|X] = E[Y − E[Y|X] | X] = E[Y|X] − E[Y|X] = 0

(ii) ε uncorrelated with any function of X:
  Cov(ε, h(X)) = E[h(X)ε] − E[h(X)]E[ε]   (the second term is 0 — how come?)
               = E[E[h(X)ε | X]]   (iterated expectations)
               = E[h(X)E[ε|X]] = E[h(X)·0] = 0
31/84

Property 3: The CEF Prediction Property
• Out of any function of X, E[Y|X] is the best predictor of Y
• In other words, E[Y|X] is the “closest” function to Y on average
• What do we mean by closest?
  • Consider any function of X, say m(X)
  • m(X) is close to Y if the difference (or “error”) is small: Y − m(X)
  • Close is about magnitude, treat positive/negative the same...
  • m(X) is also close to Y if the squared error is small: (Y − m(X))²
• E[Y|X] is the closest, in this sense, in expectation:
  E[Y|X] = argmin_{m(X)} E[(Y − m(X))²]
• “Minimum mean squared error” (illustrated in the sketch below)
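• A sketch of the prediction property with the same style of simulated data as before: among the candidate predictors below (each a function of X), the conditional mean delivers the smallest mean squared error.

library(tidyverse)
set.seed(2)

sim <- tibble(x = sample(1:4, 10000, replace = TRUE),
              y = x^2 + rnorm(10000))

# Candidate predictors of y: the conditional mean, the conditional median,
# and a linear function of x
sim <- sim %>%
  group_by(x) %>%
  mutate(m_cef = mean(y), m_med = median(y)) %>%
  ungroup()
sim$m_lin <- fitted(lm(y ~ x, data = sim))

# Mean squared error of each predictor: m_cef comes out smallest
sim %>% summarise(mse_cef = mean((y - m_cef)^2),
                  mse_med = mean((y - m_med)^2),
                  mse_lin = mean((y - m_lin)^2))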
32/84

Property 3: Proof (Just for Fun)
• Out of any function of X, E[Y|X] is the best predictor of Y:
  E[Y|X] = argmin_{m(X)} E[(Y − m(X))²]
• To see this, note:
  (Y − m(X))² = ([Y − E[Y|X]] + [E[Y|X] − m(X)])²
              = [Y − E[Y|X]]² + [E[Y|X] − m(X)]² + 2[E[Y|X] − m(X)]·[Y − E[Y|X]]
  where the last term is h(X)·ε with h(X) = 2[E[Y|X] − m(X)] and ε = Y − E[Y|X]
  ⇒ E[(Y − m(X))²] = E[(Y − E[Y|X])²] + E[(E[Y|X] − m(X))²] + E[h(X)·ε]
  The first term is unrelated to m(X), the second is minimized when m(X) = E[Y|X], and the third is 0 by the decomposition property.
33/84

Summary: Why We Care About Conditional Expectation Functions
• Useful tool for describing the relationship between Y and X
• Several nice properties
• Most statistical tests come down to comparing E[Y|X] at certain X
  • Classic example: experiments
34/84

Part 2: Ordinary Least Squares
• Linear regression is arguably the most popular modeling approach across every field in the social sciences
  • Transparent, robust, relatively easy to understand
  • Provides a basis for more advanced empirical methods
  • Extremely useful when summarizing data
• Plenty of focus on the technical aspects of OLS last term
  • Focus today on an applied perspective
35/84

Review of OLS in Three Parts
1. Overview
  • Intuition and Review of Population and Sample Regression Algebra
  • Connection With Conditional Expectation Function
  • Estimating a Linear Regression in R
2. An Example: Predicting Home Prices
3. Rounding Out Some Details
  • Scaling and Implementation
36/84

OLS Part 1: Overview
[Figure: scatter of Y against X]
37/84

OLS Estimator Fits a Line Through the Data
[Figure: the scatter of Y against X with the fitted line β0^OLS + β1^OLS X]
37/84

A Line Through the Data: Example in R
[Figure: scatter plot of Y against X]
38/84

A Line Through the Data: Example in R
[Figure: scatter plot of Y against X]
39/84

How Do We Choose Which Line?
[Figure: the scatter of Y against X with a candidate line β0 + β1X]
40/84

One Data Point
[Figure: the same line, highlighting one data point at X = xi]
40/84

vi: Observation i’s Deviation from β0 +β1xi
[Figure: the vertical deviation vi between the data point and the line value β0 + β1xi]
40/84

One Data Point
[Figure: the data point written as yi = β0 + β1xi + vi]
40/84

Choosing the Regression Line
• For any line β0 + β1X, the data point (yi, xi) may be written as:
  yi = β0 + β1xi + vi
• vi will be big if β0 + β1xi is “far” from yi
• vi will be small if β0 + β1xi is “close” to yi
• We refer to vi as the residual
41/84

Choosing the (Population) Regression Line
yi = β0 + β1xi + vi
• An OLS regression is simply choosing the β0^OLS, β1^OLS that make vi as “small” as possible on average
• How do we define “small”?
  • Want to treat positive/negative the same: consider vi²
• Choose β0^OLS, β1^OLS to minimize:
  E[vi²] = E[(yi − β0 − β1xi)²]
42/84

(Population) Regression Anatomy
{β0^OLS, β1^OLS} = argmin_{β0,β1} E[(yi − β0 − β1xi)²]
• In this simple case with only one xi, β1^OLS has an intuitive definition (checked against lm() in the sketch below):
  β1^OLS = Cov(yi, xi) / Var(xi)
  β0^OLS = ȳ − β1^OLS x̄
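• These formulas are easy to verify against lm(); a sketch using the exercise data from earlier, assuming ols_basics.csv is available and its variables are named Y and X:

library(tidyverse)
ols_basics <- read_csv("ols_basics.csv")

# Slope and intercept from the covariance/variance formulas
b1 <- cov(ols_basics$Y, ols_basics$X) / var(ols_basics$X)
b0 <- mean(ols_basics$Y) - b1 * mean(ols_basics$X)
c(intercept = b0, slope = b1)

# The same numbers come out of lm()
coef(lm(Y ~ X, data = ols_basics))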
43/84

Regression Anatomy (Matrix Notation)
yi = β0 + β1xi + vi
• You will often see more concise matrix notation:
  β = [β0  β1]′ (2×1),   Xi = [1  xi]′ (2×1),   yi = Xi′β + vi
• This lets us write the OLS coefficients as:
  β^OLS = argmin_β E[(yi − Xi′β)²]
  ⇒ β^OLS = E[XiXi′]⁻¹ E[Xiyi]
44/84

(Sample) Regression Anatomy
β^OLS = argmin_β E[(yi − Xi′β)²]
β^OLS = E[XiXi′]⁻¹ E[Xiyi]
• Usually we do not explicitly know these expectations, so we compute sample analogues:
  β̂^OLS = argmin_β (1/N) ∑_{i=1}^{N} (yi − Xi′β)²
  ⇒ β̂^OLS = (X′X)⁻¹(X′Y)
• Where X is the N×2 matrix whose rows are [1  xi] and Y is the N×1 vector (y1, ..., yN)′ (a matrix-algebra check follows below)
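• The sample formula can be computed directly with matrix algebra; a sketch under the same assumptions about ols_basics as before:

library(tidyverse)
ols_basics <- read_csv("ols_basics.csv")

# N x 2 design matrix [1, x] and N x 1 outcome vector
X_mat <- cbind(1, ols_basics$X)
Y_vec <- ols_basics$Y

# beta_hat = (X'X)^(-1) (X'Y)
beta_hat <- solve(t(X_mat) %*% X_mat) %*% (t(X_mat) %*% Y_vec)
beta_hat

# Matches lm()
coef(lm(Y ~ X, data = ols_basics))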
45/84

This Should (Hopefully) Look Familiar
RSS(b) = ∑_{i=1}^{N} (yi − Xi′b)²
46/84

Estimating a Linear Regression in R
• Simple command to estimate OLS:
  ols_v1 <- lm(y ~ x, data = ols_basics)
• And to display results:
  summary(ols_v1)
47/84

A Line Through the Data: Example in R
[Figure: the scatter of Y against X with the fitted OLS line]
• Intercept looks something like 1
• Slope, approximately 2?
48/84

OLS in R
[Figure: summary() output for the fitted regression]
49/84

Recall, for comparison
[Figure]
50/84

Regression and the Conditional Expectation Function
• Why is linear regression so popular?
  • Simplest way to estimate (or approximate) conditional expectations!
• Three simple results:
  • OLS perfectly captures the CEF if the CEF is linear
  • OLS generates the best linear approximation to the CEF if not
  • OLS perfectly captures the CEF with binary (dummy) regressors
51/84

Regression Captures the CEF if the CEF is Linear
• Take the special case of a linear conditional expectation function:
  E[yi|Xi] = Xi′β
• Then OLS captures E[yi|Xi]
[Figure: scatter of Y against X where the linear CEF and the OLS fit coincide]
52/84
Conditional Expectation Function Often Non-Linear
[Figure: scatter of Y_nl against X showing a clearly non-linear relationship]
53/84

OLS Provides Best Linear Approximation to CEF
[Figure: the same scatter of Y_nl against X with the OLS line overlaid]
54/84

OLS Provides Best Linear Approximation to CEF
• In most contexts, OLS will not precisely tell us E[yi|Xi]
  • But captures key features of E[yi|Xi]
• In many contexts this approximation is preferred
  • Even if more complex techniques provide a better fit of the data...
  • Transparent/simple to estimate/easy to digest
55/84

Simple Conditional Expectation with a Dummy Variable
• There is one more important context in which OLS perfectly captures E[yi|Xi]
  • Dummy variables
• A dummy variable is a binary variable that takes the value of 1 if some condition is met and 0 otherwise
• For example, suppose we have stock data, and want to create a variable that indicates whether a given stock is classified in the IT sector:

  Security             Price    Sector             IT
  3M Company           $247.95  Industrials        0
  Activision Blizzard  $70.74   Information Tech.  1
  Aetna Inc            $187.00  Health Care        0
  Apple Inc.           $178.36  Information Tech.  1
  Bank of America      $31.66   Financials         0
  ...                  ...      ...                ...
  Xerox Corp.          $31.75   Information Tech.  1
56/84

Simple Conditional Expectation with a Dummy Variable
• Only two possible values of the conditional expectation function:
  E[Pricei|ITi = 1] or E[Pricei|ITi = 0]
• If we run the regression:
  Pricei = β0^OLS + β1^OLS ITi + vi^OLS
• β0^OLS and β1^OLS allow us to recover both!
  β0^OLS = E[Pricei|ITi = 0]   (expected price for non-IT stocks)
  β1^OLS = E[Pricei|ITi = 1] − E[Pricei|ITi = 0]   (expected price difference for IT stocks)
• Aside: True both in the population and in-sample
57/84

OLS and Conditional Expectations: In Practice
[Figure: average share price (USD) for Non-IT and IT stocks, annotated with β0^OLS and β0^OLS + β1^OLS]
• Average for Non IT Stocks: $92.090
• Average for IT Stocks: $113.475
58/84
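• A sketch of this equivalence in R. The column names price and sector are taken from the lm() call a few slides below; the exact label used for the IT sector in s_p_price.csv is an assumption, so inspect count(s_p_price, sector) and adjust it.

library(tidyverse)
s_p_price <- read_csv("s_p_price.csv")

# Build the IT dummy (the sector label here is assumed -- check it first)
s_p_price <- s_p_price %>%
  mutate(IT = as.numeric(sector == "Information Technology"))

# Conditional means by group...
s_p_price %>% group_by(IT) %>% summarise(mean_price = mean(price))

# ...and the regression: intercept = non-IT mean, slope = IT mean minus non-IT mean
coef(lm(price ~ IT, data = s_p_price))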
Regressions with Dummy Variables
• Suppose we regress price on a constant and an IT dummy
  • And we recover β̂0^OLS and β̂1^OLS
• What will the value of β̂1^OLS be?
  • Vote: Go to menti.com
• Recall:
  • Average for Non IT stocks: $92.090
  • Average for IT stocks: $113.475
59/84

Implementing Regression with Dummy Variables
• It is useful to note that this is simply a mechanical feature of our estimator
• Letting yi = pricei, xi = ITi, and Nx be the number of IT observations
• Our OLS estimates are:
  [β̂0^OLS, β̂1^OLS]′ = (X′X)⁻¹(X′Y)
  where
  β̂0^OLS = (1/(N − Nx)) ∑_{i s.t. xi=0} yi   (average price for non-IT stocks)
  β̂1^OLS = (1/Nx) ∑_{i s.t. xi=1} yi − (1/(N − Nx)) ∑_{i s.t. xi=0} yi   (average price difference for IT stocks)
60/84

Implementing Regressions with Categorical Variables
• What if we are interested in comparing all 11 GICS sectors?
• Create dummy variables for each sector, omitting 1
  • Let’s call them D1i, ..., D10i:
  pricei = β0 + δ1D1i + ... + δ10D10i + vi
• In other words Xi = [1  D1i ... D10i]′, so X is the N×11 matrix whose rows stack a constant and the ten sector dummies
• Regress pricei on a constant and those 10 dummy variables
61/84

Average Share Price by Sector for Some S&P Stocks
[Figure: average share price (USD) by sector (Cons. Discret., Cons. Staples, Energy, Financials, Health Care, Industrials, IT, Materials, Real Estate, Telecom, Utilities), annotated with β0^OLS and δ1^OLS, ..., δ10^OLS]
62/84

Implementing Regressions with Dummy Variables
• β̂0^OLS (the coefficient on the constant) is the mean for the omitted category:
  • In this case “Consumer Discretionary”
• The coefficient on each dummy variable (e.g. δ̂k^OLS) is the difference between β̂0^OLS and the conditional mean for that category
• Key point: If you are only interested in categorical variables...
  • You can perfectly capture the full CEF in a single regression
• For example:
  E[pricei|sectori = consumer staples] = β0^OLS + δ1^OLS
  E[pricei|sectori = energy] = β0^OLS + δ2^OLS
  ...
63/84

Very Simple to Implement in R
• R has a trick to estimate regressions with categorical variables:
  ols_sector <- lm(price ~ as.factor(sector), data = s_p_price)
64/84

Why Do We Leave Out One Category?
Flashback:
[Figure]
65/84

Why Do We Leave Out One Category?
Flashback:
[Figure]
66/84

Why Do We Leave Out One Category?
• X has full column rank when all of its columns are linearly independent
• Suppose we had a dataset of 6 stocks from two sectors:
  • e.g. Consumer Discretionary and IT
• And suppose we include dummies for both sectors:
      1 0 1
      1 1 0
  X = 1 0 1
      1 0 1
      1 1 0
      1 1 0
• Are the columns of X linearly independent?
67/84

Why Do We Leave Out One Category?
• Perhaps a more intuitive explanation: suppose we include all sectors:
  pricei = β0 + δ1D1i + ... + δ10D10i + δ11D11i + vi
• Then the interpretation of β0^OLS = E[pricei|D1i = 0, ..., D10i = 0, D11i = 0]
  • e.g. the expected price for stocks that belong to no sector, which is nonsensical
• Not specific to this example, true for any categorical variable
• Forgetting to omit a category is sometimes called the “dummy variable trap”
• An alternative: if you omit the constant from a regression, you can include all categories (see the sketch below)
68/84
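• A sketch of the categorical regression and of the no-constant alternative mentioned above, under the same assumed s_p_price columns as before:

library(tidyverse)
s_p_price <- read_csv("s_p_price.csv")

# With a constant: intercept = mean of the omitted (first) sector,
# each delta = that sector's mean minus the intercept
ols_sector <- lm(price ~ as.factor(sector), data = s_p_price)
coef(ols_sector)

# Without a constant (the "- 1"): one coefficient per sector,
# each equal to that sector's conditional mean
ols_sector_nc <- lm(price ~ as.factor(sector) - 1, data = s_p_price)
coef(ols_sector_nc)

# Compare with the conditional means computed directly
s_p_price %>% group_by(sector) %>% summarise(mean_price = mean(price))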
OLS Part 2: A Predictive Model
• Suppose we see 1500 observations of some outcome yi
  • Example: residential real estate prices
• We have a few characteristics of the homes
  • E.g. square feet/year built
• Want to build a model that helps us predict yi out of sample
  • I.e. the price of some other home
69/84

We Are Given 100 Observations of yi
[Figure: the outcome plotted against observation number (1 to 100)]
70/84

How Well Can We Predict Out-of-Sample Outcomes (yi^oos)
[Figure: the out-of-sample outcomes to be predicted]
71/84

Our Best Prediction (ŷi^oos)
[Figure: the model's predictions ŷi^oos]
71/84

Prediction vs reality (ŷi^oos vs. yi^oos)
[Figure: predictions and realized out-of-sample outcomes plotted together]
71/84

A Good Model Has Small Distance (yi^oos − ŷi^oos)²
[Figure: the gaps between out-of-sample outcomes and predictions]
72/84

Measure of Fit: Out of Sample Mean Squared Error
  MSE^oos = ∑ (yi^oos − ŷi^oos)² / N^oos
73/84

Example Using Ames Housing Data
• Predict log prices with the year the home is built
• Predict log prices with the year the home is built and square footage (a sketch of both models follows below)
• Menti: What is MSE^oos in the more complex model?
74/84
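• A sketch of the out-of-sample comparison using the two Ames files listed at the start of the lecture. The column names used below (SalePrice, YearBuilt, GrLivArea) are assumptions about those csv files, so replace them with whatever names(ames_training) reports.

library(tidyverse)
ames_training <- read_csv("ames_training.csv")
ames_testing  <- read_csv("ames_testing.csv")

# Two predictive models fit on the training sample (column names assumed)
m1 <- lm(log(SalePrice) ~ YearBuilt, data = ames_training)
m2 <- lm(log(SalePrice) ~ YearBuilt + GrLivArea, data = ames_training)

# Out-of-sample mean squared error on the held-out sample
mse_oos <- function(model, newdata) {
  mean((log(newdata$SalePrice) - predict(model, newdata = newdata))^2)
}
c(model_1 = mse_oos(m1, ames_testing), model_2 = mse_oos(m2, ames_testing))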
A Few Practical Details When Using OLS
• Dummies and Continuous Variables
• Scaling Coefficients
• Data Transformations
75/84

Dummy and Continuous Variables
• Suppose we want to combine dummy and continuous variables
• Consider the impact of education on wages
  • Let Yi be wages, Xi be years of education
  • Let DMale,i be a dummy variable equal to 1 for males, 0 otherwise
  Yi = β0 + β1Xi + δMale DMale,i + vi
76/84

Education and Wages with a Male Dummy
[Figure: wages (Yi) against years of education (Xi), with two parallel lines Yi = β0^OLS + β1^OLS Xi and Yi = β0^OLS + β1^OLS Xi + δMale^OLS]
77/84

Dummy and Continuous Variables
• Similar interpretation of dummies as before, with one caveat
  • β0^OLS is the mean of the omitted category (non-males) when Xi = 0
  • δMale^OLS is the difference in wages for males when Xi = 0
• OLS coefficients can be interpreted as (differences in) means with continuous variables set to 0
• Sometimes referred to as group or category specific intercepts
  • Works with many dummies:
  • e.g. different “intercepts” for each sector
78/84

Scaling Variables: Independent Variables
  yi = β0 + β1xi + vi
• yi wages (in $), and xi years of education
• Suppose we want to change the units of xi?
  • E.g. convert to months of education: xi^months = xi × 12
• β1 will simply scale accordingly:
  yi = β0 + (β1/12) xi^months + vi
• Intercept, R², statistical significance unchanged
79/84

Scaling Variables: Dependent Variable
  yi = β0 + β1xi + vi
• yi wages (in $), and xi years of education
• Suppose we want to change the units of yi?
  • E.g. convert $ to 1000s of $: yi^1000 = yi / 1000
• β0, β1, vi will scale accordingly:
  yi^1000 = (β0/1000) + (β1/1000) xi + (vi/1000)
• Again R², statistical significance unchanged
80/84

Percent vs. Percentage Point Change
• Percent change is a proportionate (or relative) change:
  (x1 − x0)/x0 × 100
• Percentage point change is a raw change in percentages
• For example: consider the unemployment rate (in %)
  • If unemployment goes from 10% to 11%:
    • 1 percentage point change
    • (11 − 10)/10 × 100 = 10% change
• Take care to distinguish between them
81/84

Quadratics and Higher Order Polynomials
• Can often do a better job of approximating the CEF using higher order polynomials, for example:
  yi = β0 + β1xi + β2xi² + vi
• Downside: the relationship between Xi and Yi is harder to summarize:
  ∂yi/∂xi = β1 + 2β2xi
  • Changes for different values of Xi
• Tradeoff between quality of approximation and simplicity
82/84

This Week
(1) Introduction to the conditional expectation function (CEF)
  • Why is the CEF a useful (and widely used) summary of the relationship between variables Y and X
(2) Ordinary Least Squares and the CEF
  • Review, implementation, and the utility of OLS
83/84

Next Week
• The Basics of Causal Inference
84/84