Multivariate Linear Regression: Analysis and Inference
Zhenhao Gong University of Connecticut
Welcome
This course is designed to be:
1. Introductory
2. Led by interesting questions and applications
3. Less math, useful, and fun!
Most important:
Feel free to ask any questions!
Enjoy!
Multivariate regression analysis
Unbiasedness and Consistency
Two properties are commonly used to judge whether an estimator is good or not:
Unbiasedness: An estimator is unbiased if the mean of the sampling distribution of the estimator is equal to the true parameter value.
Consistency: An estimator is consistent if, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value.
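A minimal simulation sketch (not from the slides; the data-generating process Y = 2 + 3X + u and all numbers are made up) illustrating both properties for the OLS slope: its sampling distribution is centered at the true value, and it concentrates there as n grows.

```python
# Simulation sketch: unbiasedness and consistency of the OLS slope in Y = 2 + 3X + u.
import numpy as np

rng = np.random.default_rng(0)

def slope_estimates(n, reps=2000):
    """Draw `reps` samples of size n and return the OLS slope from each."""
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        u = rng.normal(size=n)
        y = 2 + 3 * x + u
        slopes[r] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # OLS slope = cov(x, y) / var(x)
    return slopes

for n in (25, 2500):
    s = slope_estimates(n)
    # The mean stays near the true value 3 (unbiasedness); the spread shrinks
    # as n grows (consistency).
    print(f"n={n}: mean={s.mean():.3f}, std={s.std():.3f}")
```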
Omitted Variable Bias
The OLS estimator will have omitted variable bias when two conditions are true:
When the omitted variable is a determinant of the dependent variable
When the omitted variable is correlated with the included regressor
Remark: Omitted variable bias means that the first least squares assumption, E(ui|Xi) = 0, is incorrect.
Bias Formula
Let the correlation between Xi and ui be corr(Xi, ui) = ρXu. Then the OLS estimator has the limit
$$\hat\beta_1 \;\xrightarrow{\;p\;}\; \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}.$$
That is, as the sample size increases, βˆ1 is close to β1 + ρXu(σu/σX) with increasingly high probability (βˆ1 is biased and inconsistent).
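A minimal simulation sketch (a hypothetical data-generating process, not from the slides) illustrating the bias formula: the omitted variable w is correlated with X and absorbed into the error term, so the OLS slope settles near β1 + ρXu σu/σX rather than β1.

```python
# Simulation sketch: omitted variable bias matches the bias formula.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta1 = 1.0

x = rng.normal(size=n)                 # included regressor
w = 0.8 * x + rng.normal(size=n)       # omitted variable, correlated with x
u = 2.0 * w + rng.normal(size=n)       # error term absorbs the omitted variable
y = beta1 * x + u

slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
predicted = beta1 + np.corrcoef(x, u)[0, 1] * u.std(ddof=1) / x.std(ddof=1)
print(f"OLS slope: {slope:.3f}, beta1 + rho*sigma_u/sigma_x: {predicted:.3f}")
```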
Summary
Omitted variable bias is a problem whether the sample size is large or small.
Whether this bias is large or small in practice depends on the correlation ρxu between the regressor and the error term. The larger |ρxu| is, the larger the bias.
The direction of the bias in βˆ1 depends on whether X and u are positively or negatively correlated.
Question: What can we do about omitted variable bias?
The Multiple Regression Model
Consider the case of two regressors:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad i = 1, 2, \cdots, n$$
X1, X2 are the two independent variables (regressors)
β0 = unknown population intercept
β1 = effect on Y of a change in X1, holding X2 constant
β2 = effect on Y of a change in X2, holding X1 constant
ui = the regression error (omitted factors)
The OLS Estimators
The OLS estimators βˆ0, βˆ1, and βˆ2 solve
$$\min_{\beta_0,\beta_1,\beta_2} \sum_{i=1}^{n} u_i^2 = \min_{\beta_0,\beta_1,\beta_2} \sum_{i=1}^{n} \bigl[\,Y_i - (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})\,\bigr]^2.$$
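A minimal sketch (simulated data with made-up coefficients) of estimating the two-regressor model; statsmodels' OLS routine solves the minimization problem above.

```python
# Estimating Y = beta0 + beta1*X1 + beta2*X2 + u by OLS on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))  # columns: [1, X1, X2]
results = sm.OLS(y, X).fit()
print(results.params)  # estimates of beta0, beta1, beta2
```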
Measures of Fit
Three commonly used summary statistics in multiple regression are the standard error of the regression (SER), the regression R², and the adjusted R² (also known as R̄²).
R̄² is R² with a degrees-of-freedom correction that adjusts for estimation uncertainty; R̄² < R².
Remark: The R2 always increases when you add another regressor. It’s a bit of a problem for a measure of “fit”.
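A minimal sketch (simulated data, made-up coefficients) computing the three fit measures by hand; statsmodels reports the same quantities in results.summary(), and the adjusted R² below is checked against results.rsquared_adj.

```python
# Computing SER, R^2, and adjusted R^2 for an OLS fit on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, k = 200, 2                              # k regressors plus a constant
X = sm.add_constant(rng.normal(size=(n, k)))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)
results = sm.OLS(y, X).fit()

ssr = np.sum(results.resid**2)             # sum of squared residuals
tss = np.sum((y - y.mean())**2)            # total sum of squares

ser = np.sqrt(ssr / (n - k - 1))           # standard error of the regression
r2 = 1 - ssr / tss                         # always rises when a regressor is added
adj_r2 = 1 - (ssr / (n - k - 1)) / (tss / (n - 1))   # degrees-of-freedom correction
print(ser, r2, adj_r2, results.rsquared_adj)         # adj_r2 matches statsmodels
```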
The Least Squares Assumptions
For the multiple linear regression model
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, 2, \cdots, n$$
Assumption 1: The conditional distribution of ui given X1i,··· ,Xki has a mean of zero: E(ui|X1i,··· ,Xki) = 0; Failure of this condition leads to omitted variable bias.
Assumption 2: (Xi, Yi), i = 1, · · · , n, are independent and identically distributed (i.i.d.);
Assumption 3: Large outliers are unlikely;
Assumption 4: No perfect multicollinearity (no regressor is an exact linear function of the other regressors), as illustrated in the sketch below.
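A minimal sketch (made-up numbers) of what perfect multicollinearity looks like in the data matrix: one regressor is an exact multiple of another, so X has deficient column rank and the OLS normal equations have no unique solution.

```python
# Perfect multicollinearity: X2 is an exact linear function of X1.
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = 3 * x1                          # perfectly collinear with x1
X = np.column_stack([np.ones(100), x1, x2])

print(np.linalg.matrix_rank(X))      # 2, not 3: X'X is singular, OLS has no unique solution
```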
Statistical Inference in Multiple Regression
Inference for a single coefficient
Hypothesis tests and confidence intervals for a single coefficient in multiple regression follow the same logic and recipe as for the slope coefficient in a single-regressor model.
When n is large, βˆ1 is approximately normally distributed (by the CLT). That is,
$$\frac{\hat\beta_1 - E(\hat\beta_1)}{\sqrt{\operatorname{Var}(\hat\beta_1)}} \sim N(0, 1).$$
Thus hypotheses on β1 can be tested using the usual t-statistic, and 95% confidence intervals are constructed as βˆ1 ± 1.96 × SE(βˆ1).
So too for β2,··· ,βk!
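A minimal sketch (simulated data, made-up coefficients) of inference on a single coefficient: the usual t-statistic for H0: β1 = 0 and the 95% confidence interval βˆ1 ± 1.96 × SE(βˆ1).

```python
# t-statistic and 95% confidence interval for one coefficient in a multiple regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)
results = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

beta1, se1 = results.params[1], results.bse[1]
t1 = (beta1 - 0) / se1                          # t-statistic for H0: beta1 = 0
ci = (beta1 - 1.96 * se1, beta1 + 1.96 * se1)   # 95% confidence interval
print(t1, ci)
```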
Tests of Joint Hypotheses
Consider the multiple regression model:
$$\mathit{Wage}_i = \beta_0 + \beta_1 \mathit{Educ}_i + \beta_2 \mathit{Exper}_i + u_i$$
The null hypothesis that “Education and work experience don’t matter,” and the alternative that they do, corresponds to:
H0: β1 = 0 and β2 = 0
vs. H1: either β1 ≠ 0 or β2 ≠ 0, or both
F-statistic
The F-statistic tests all parts of a joint hypothesis at once. Formula for the special case of the joint hypothesis β1 = 0 and
β2 = 0 in a regression with two regressors:
$$F = \frac{1}{2}\,\frac{t_1^2 + t_2^2 - 2\hat\rho_{t_1,t_2}\,t_1 t_2}{1 - \hat\rho_{t_1,t_2}^2},$$
where ρˆt1,t2 estimates the correlation between t1 and t2, and
$$t_1 = \frac{\hat\beta_1 - 0}{SE(\hat\beta_1)}, \qquad t_2 = \frac{\hat\beta_2 - 0}{SE(\hat\beta_2)}.$$
F-statistic
In the special case where t1 and t2 are uncorrelated, ρˆt1,t2 →p 0, and in large samples the formula becomes
$$F = \frac{1}{2}\,\frac{t_1^2 + t_2^2 - 2\hat\rho_{t_1,t_2}\,t_1 t_2}{1 - \hat\rho_{t_1,t_2}^2} \;\approx\; \frac{1}{2}\bigl(t_1^2 + t_2^2\bigr),$$
the average of two squared standard normal random variables. Under the null, t1² + t2² is distributed as chi-squared with 2 degrees of freedom, so F is distributed as χ²₂/2.
F-statistic
In large samples (roughly n ≥ 100), F is distributed as $\chi^2_q/q$.
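A minimal sketch of the joint test using simulated wage data (all numbers are made up); statsmodels' f_test computes the F-statistic and p-value for H0: β1 = 0 and β2 = 0 directly.

```python
# Joint F-test of "Education and work experience don't matter" on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 500
df = pd.DataFrame({
    "Educ": rng.normal(13, 2, n),
    "Exper": rng.normal(10, 5, n),
})
df["Wage"] = 5 + 1.5 * df["Educ"] + 0.3 * df["Exper"] + rng.normal(0, 5, n)

results = smf.ols("Wage ~ Educ + Exper", data=df).fit()
print(results.f_test("Educ = 0, Exper = 0"))   # F-statistic and p-value for the joint null
```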
Extension: nonlinear regression models
If a relation between Y and X is nonlinear:
The effect on Y of a change in X depends on the value of X – that is, the marginal effect of X is not constant.
A linear regression is mis-specified – the functional form is wrong.
The estimator of the effect on Y of X is biased.
The solution to this is to estimate a regression function that is nonlinear in X.
Difference in slopes
In Figure (a), the population regression function has a constant slope;
In Figure (b), the slope of the population regression function depends on the value of X1.
Real data
The TestScore – Average district income relation looks like it is nonlinear, so the linear OLS regression line does not adequately describe the relationship between these variables:
Quadratic Regression Model
A quadratic population regression model relating test scores and income can be written as
$$\mathit{TestScore}_i = \beta_0 + \beta_1 \mathit{Income}_i + \beta_2 \mathit{Income}_i^2 + u_i,$$
where β0, β1, and β2 are coefficients, Income²ᵢ is the square of income in the i-th district, and uᵢ is the error term, which represents all other factors. The population regression function is
$$f(\mathit{Income}_i) = \beta_0 + \beta_1 \mathit{Income}_i + \beta_2 \mathit{Income}_i^2.$$
The quadratic OLS regression function fits the data better than the linear OLS regression function.
Quadratic Regression Model
The quadratic population regression model above is in fact a version of the multiple regression model with two regressors: the first regressor is Income, and the second regressor is Income².
So, just as in the multiple regression model, we can estimate and test β0, β1, and β2 in the quadratic population regression model using the OLS methods we have learned before.
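A minimal sketch (simulated district data with made-up coefficients) of estimating the quadratic specification by OLS; the squared term simply enters as a second regressor.

```python
# Quadratic regression: TestScore on Income and Income^2, fitted by OLS.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 400
df = pd.DataFrame({"Income": rng.uniform(5, 55, n)})
df["TestScore"] = 600 + 4 * df["Income"] - 0.04 * df["Income"]**2 + rng.normal(0, 10, n)

# I(Income**2) tells the formula API to include the square of Income as a regressor.
results = smf.ols("TestScore ~ Income + I(Income**2)", data=df).fit()
print(results.params)   # estimates of beta0, beta1, beta2
```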
Log function
Log regression models
The coefficients can be estimated by the OLS method.
Hypothesis tests and confidence intervals are computed as usual.
Choice of specification should be guided by judgment (which interpretation makes the most sense in your application?), tests, and plotting predicted values.
Linear-log model
Linear-log regression model:
$$Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i, \quad i = 1, \ldots, n$$
Interpretation: a 1% change in X (∆X/X = 0.01) is associated with a change in Y of 0.01β1.
$$\Delta Y = [\beta_0 + \beta_1 \ln(X + \Delta X)] - [\beta_0 + \beta_1 \ln(X)] \cong \beta_1 \frac{\Delta X}{X} = 0.01\,\beta_1.$$
Remark: ln(x + ∆x) − ln(x) ≅ ∆x/x when ∆x/x is small.
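For example (a purely hypothetical value, not an estimate from real data), if βˆ1 = 36, then a 1% increase in X is associated with an increase in Y of about
$$\Delta Y \approx 0.01 \times \hat\beta_1 = 0.01 \times 36 = 0.36.$$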
Log-linear model
Log-linear regression model:
$$\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n$$
Interpretation: a one-unit change in X(∆X = 1) is associated with a (100 × β1)% change in Y.
$$\ln(Y + \Delta Y) - \ln(Y) \cong \frac{\Delta Y}{Y} = \beta_1 \Delta X.$$
Remark: The log-linear regression model is still a linear regression model (it is linear in the coefficients), so it can be estimated by OLS!
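For example (a purely hypothetical value), if βˆ1 = 0.024, then a one-unit increase in X is associated with roughly a
$$100 \times \hat\beta_1 \times \Delta X = 100 \times 0.024 \times 1 = 2.4\%$$
increase in Y.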
Log-log model
Log-log regression model:
$$\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i, \quad i = 1, \ldots, n$$
Interpretation: a 1% change in X (∆X/X = 0.01) is associated with a β1% change in Y.
$$\ln(Y + \Delta Y) - \ln(Y) \cong \frac{\Delta Y}{Y} = \beta_1 \frac{\Delta X}{X},$$
Thus in the log-log specification β1 is the ratio of the percentage change in Y associated with the percentage change in X (Price Elasticity!).
The log-log specification fits better than the log-linear specification.
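A minimal sketch (simulated data, made-up coefficients) estimating all three logarithmic specifications with the statsmodels formula API; np.log can be used directly inside the formula strings, and in the log-log fit the slope estimate is the elasticity.

```python
# Linear-log, log-linear, and log-log specifications fitted by OLS on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 400
df = pd.DataFrame({"Income": rng.uniform(5, 55, n)})
df["TestScore"] = 560 + 35 * np.log(df["Income"]) + rng.normal(0, 10, n)

linear_log = smf.ols("TestScore ~ np.log(Income)", data=df).fit()
log_linear = smf.ols("np.log(TestScore) ~ Income", data=df).fit()
log_log = smf.ols("np.log(TestScore) ~ np.log(Income)", data=df).fit()

# In the log-log fit, the slope is the elasticity of TestScore with respect to Income.
print(linear_log.params, log_linear.params, log_log.params, sep="\n")
```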