Review of Regression Analysis
Zhenhao Gong, University of Connecticut
Welcome 2
This course is designed to be:
1. Introductory
2. Focused on the core techniques with the widest applicability
3. Less math; useful and fun!
Most important: feel free to ask any questions!
Enjoy!
Review 3
Statistical inference
Sampling distribution of the sample mean Ȳ
Sample mean and variance
Law of large numbers and the central limit theorem (CLT)
Jarque-Bera (JB) test
Hypothesis testing
P-values and significance levels
Confidence interval
Simple Linear Regression Analysis and Inference
Simple Linear Regression Model 5
Yi = β0 + β1Xi + ui,  i = 1, 2, ..., n
Y is the dependent variable and X is the independent variable, or the regressor.
β0 + β1Xi is the population regression line.
The intercept β0 and the slope β1 are the coefficients (parameters) of the population regression line.
ui is the error term. It incorporates all of the factors that affect Yi except Xi.
Scatterplot of hypothetical data 6
Estimating the coefficients 7
In practice, the intercept β0 and the slope β1 are unknown, so we must use the sample of data at hand to estimate them!
How?
Scatterplot of the wages data 8
The sample correlation is 0.398, indicating a weak positive relationship between the two variables.
Draw a regression line 9
If one could somehow draw a straight line through these data, the slope of that line would be an estimate of β1,
and the intercept of the line would be an estimate of β0.
Furthermore, we want to find the linear function of x (Educ) that gives the best forecast of y (Log Wage). But how? What does the "best forecast" mean?
Least squares estimation 10
We want to find the line that best fits the data points, in the sense that the sum of squared vertical distances of the data points from the fitted line is minimized.
When we "run a regression," or "fit a regression line," that's what we do.
The estimation strategy is called least squares.
Fit a regression line to the data: the sum of squared vertical distances of the data points from the fitted line is minimized.
The OLS estimators 12
The ordinary least squares (OLS) estimators estimate the intercept (β0) and the slope (β1) by minimizing the sum of squared mistakes. That is, the OLS estimators solve
\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2.
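As an illustration (not part of the slides), this minimization can be done numerically in R and checked against the built-in lm() fit; the data below are simulated and purely hypothetical.

# Illustrative sketch: minimize the sum of squared residuals numerically
# and compare with lm(). Data are simulated, not the course's wage data.
set.seed(1)
n <- 200
x <- rnorm(n, mean = 12, sd = 2)
y <- 1.3 + 0.08 * x + rnorm(n, sd = 0.5)

ssr <- function(b) sum((y - b[1] - b[2] * x)^2)  # sum of squared mistakes

ols_numeric <- optim(c(0, 0), ssr)$par  # numerical minimizer (beta0, beta1)
ols_lm      <- coef(lm(y ~ x))          # OLS via lm()

print(ols_numeric)
print(ols_lm)  # the two should agree up to numerical tolerance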
The OLS estimators 13
The OLS estimators are denoted β̂0 and β̂1. They are
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.
The OLS regression line, also called the sample regression line, can be constructed as β̂0 + β̂1X.
The predicted value of Yi given Xi is Ŷi = β̂0 + β̂1Xi, and the residual for the i-th observation is ûi = Yi − Ŷi.
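A minimal sketch, again with hypothetical simulated data, of the closed-form formulas and of the predicted values and residuals:

# Illustrative sketch: closed-form OLS formulas (hypothetical simulated data).
set.seed(2)
n <- 200
x <- rnorm(n, mean = 12, sd = 2)
y <- 1.3 + 0.08 * x + rnorm(n, sd = 0.5)

beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

y_hat <- beta0_hat + beta1_hat * x   # predicted values
u_hat <- y - y_hat                   # residuals

c(beta0_hat, beta1_hat)
coef(lm(y ~ x))  # matches the formula-based estimates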
The Least Squares Assumptions 14
For the linear regression model Yi = β0 + β1Xi + ui,  i = 1, 2, ..., n
Assumption 1: The conditional distribution of ui given Xi has a mean of zero: E(ui|Xi) = 0;
Assumption 2: (Xi, Yi), i = 1, · · · , n, are independent and identically distributed (i.i.d.) draws from their joint distribution;
Assumption 3: Large outliers are unlikely.
The Conditional Distributions 15
Remark: The Conditional Distributions of ui given Xi.
Outlier 16
Remark: OLS can be sensitive to an outlier.
Inference for OLS estimators 17
The OLS estimator β̂1 is computed from a sample of data. A different sample yields a different value of β̂1. This is the source of the "sampling uncertainty" of β̂1.
We want to:
Quantify the sampling uncertainty associated with β̂1
Use β̂1 to test hypotheses such as β1 = 0
Construct a confidence interval for β1
Remark: β1 is an unknown population parameter.
The Sampling Distribution of β̂1 18
Like Ȳ, β̂1 has a sampling distribution. If the three least squares assumptions hold, then
E(β̂1) = β1 (that is, β̂1 is unbiased), and
\mathrm{Var}(\hat{\beta}_1) = \frac{1}{n} \cdot \frac{\mathrm{Var}[(X_i - \mu_X) u_i]}{(\sigma_X^2)^2},
which is inversely proportional to n and to the variance of X.
Other than its mean and variance, the exact distribution of βˆ1 is complicated when n is small.
When n is large, β̂1 is approximately normally distributed (CLT). That is,
\frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{\mathrm{Var}(\hat{\beta}_1)}} \sim N(0, 1).
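As an illustration (not from the slides), a small simulation with a hypothetical data-generating process shows the standardized β̂1 looking approximately standard normal across repeated samples:

# Illustrative sketch: simulate the sampling distribution of beta1-hat.
# The data-generating process is hypothetical; the true beta1 is set to 0.08.
set.seed(3)
reps <- 2000
n    <- 500
beta1_hats <- replicate(reps, {
  x <- rnorm(n, mean = 12, sd = 2)
  y <- 1.3 + 0.08 * x + rnorm(n, sd = 0.5)
  coef(lm(y ~ x))[2]
})

# Standardize: centered at the true beta1, scaled by the simulated SD.
z <- (beta1_hats - 0.08) / sd(beta1_hats)
hist(z, breaks = 40, freq = FALSE, main = "Standardized beta1-hat")
curve(dnorm(x), add = TRUE)  # close to N(0, 1) when n is large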
Summary 19
1. Object of interest: β1 in
Yi = β0 + β1Xi + ui,  i = 1, ..., n
β1 = ∆Y/∆X, the change in Y for a unit change in X.
2. Estimator: the OLS estimator β̂1.
3. The Sampling Distribution of the βˆ1: Under the three Least Squares Assumptions, for large n, βˆ1 is normally distributed (CLT).
4. Formula for SE(βˆ1):
Recall that \mathrm{Var}(\hat{\beta}_1) = \frac{1}{n} \cdot \frac{\mathrm{Var}[(X_i - \mu_X) u_i]}{(\sigma_X^2)^2}; the formula for SE(β̂1) is obtained by taking the square root of the estimator of Var(β̂1).
Hypothesis testing 20
The objective is to test a hypothesis, like β1 = 0, using data to tell us whether the (null) hypothesis is correct or incorrect.
Steps for Hypothesis testing
Step 1: Null hypothesis and two-sided alternative:
H0: β1 = β1,0   vs.   H1: β1 ≠ β1,0
where β1,0 is the hypothesized value under the null. For example, β1,0 = 0.
Steps for hypothesis testing 21
Step 2: Compute the test statistic
t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} \sim N(0, 1) when n is large.
Step 3: Choose a significance level of the test (5%) or
compute the p-value.
Step 4: Apply the decision rule. Reject the hypothesis at the 5% significance level if |t| > 1.96 or, equivalently, if the p-value is less than 0.05.
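A minimal sketch of these steps in R for H0: β1 = 0, using hypothetical simulated data rather than the course's wage data:

# Illustrative sketch: t-statistic and p-value for H0: beta1 = 0.
set.seed(4)
x <- rnorm(300, mean = 12, sd = 2)
y <- 1.3 + 0.08 * x + rnorm(300, sd = 0.5)

fit <- lm(y ~ x)
est <- coef(summary(fit))["x", "Estimate"]    # beta1-hat
se  <- coef(summary(fit))["x", "Std. Error"]  # SE(beta1-hat)

t_stat  <- (est - 0) / se            # test statistic under H0: beta1 = 0
p_value <- 2 * pnorm(-abs(t_stat))   # large-sample normal approximation

c(t = t_stat, p = p_value)  # reject at the 5% level if |t| > 1.96 (p < 0.05)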
Confidence Interval for β1 22
When the sample size is large, the 95% confidence interval for β1 is
[β̂1 − 1.96 SE(β̂1),  β̂1 + 1.96 SE(β̂1)].
Interpretation: an interval constructed so that it contains the true value of β1 in 95% of all possible random samples.
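A minimal sketch (again with hypothetical simulated data) of the large-sample 95% confidence interval:

# Illustrative sketch: large-sample 95% confidence interval for beta1.
set.seed(5)
x <- rnorm(300, mean = 12, sd = 2)
y <- 1.3 + 0.08 * x + rnorm(300, sd = 0.5)

fit <- lm(y ~ x)
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]

ci <- c(lower = est - 1.96 * se, upper = est + 1.96 * se)
ci                    # normal-approximation interval
confint(fit)["x", ]   # R's built-in interval (t distribution; similar for large n)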
Measures of Fit 23
Two regression statistics provide complementary measures of how well the regression line “fits” or explains the data:
The R² measures the fraction of the variance of Yi that is explained by Xi. It ranges between 0 (zero fit) and 1 (perfect fit).
The standard error of the regression (SER) measures how far Yi typically is from its predicted value. A small SER means the observations are tightly clustered around the regression line.
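As a sketch with hypothetical simulated data, both measures can be computed directly from the residuals (the SER below uses the common degrees-of-freedom adjustment n − 2; conventions vary slightly across textbooks):

# Illustrative sketch: R^2 and the standard error of the regression (SER).
set.seed(6)
x <- rnorm(300, mean = 12, sd = 2)
y <- 1.3 + 0.08 * x + rnorm(300, sd = 0.5)

fit   <- lm(y ~ x)
u_hat <- resid(fit)

r2  <- 1 - sum(u_hat^2) / sum((y - mean(y))^2)   # fraction of Var(Y) explained
ser <- sqrt(sum(u_hat^2) / (length(y) - 2))      # typical size of a residual

c(R2 = r2, SER = ser)
summary(fit)$r.squared  # matches r2
summary(fit)$sigma      # matches ser (R calls it the "residual standard error")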
Recall 24
Regression Output in R 25
Accordingly, the OLS regression line is
LNWAGE = 1.27 + 0.08 × EDUC.
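Output like this typically comes from a call along the following lines; the data frame and variable names (wages, lnwage, educ) are placeholders built from simulated values, not the course's actual dataset:

# Illustrative sketch: fitting the log-wage regression in R.
# 'wages' is a hypothetical data frame standing in for the course's wage data,
# with columns lnwage (log wage) and educ (years of education).
set.seed(7)
wages <- data.frame(educ = sample(8:20, 500, replace = TRUE))
wages$lnwage <- 1.27 + 0.08 * wages$educ + rnorm(500, sd = 0.5)

fit <- lm(lnwage ~ educ, data = wages)
summary(fit)  # coefficient table, R^2, and residual standard error (SER)
coef(fit)     # intercept and slope, roughly 1.27 and 0.08 here by construction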