Financial Econometrics and Data Science
Linear Regression Models, Multiple Linear Regression Models, and Assumptions
3. Linear Regression Models (LRMs)
3.1 Assumptions Underlying LRMs
3.2 Precision and Standard Errors
3.3 Statistical Inference and Hypothesis Testing
3.4 t-Test and t-ratio
4. Multiple Linear Regression Model (MLRM)
4.1 Generalising the LRM
4.2 MLRM OLS
4.3 The F-Test
4.4 Goodness of Fit
5. Violating the Assumptions of LRMs
3. Linear Regression Models (LRMs)
3.1 Assumptions Underlying LRMs
The Assumptions Underlying the Classical Linear Regression Model (CLRM)
The model which we have used is known as the classical linear regression model
yt = α + βxt + ut
We observe data for xt, but since yt also depends on ut, we
must be specific about how the ut are generated.
We usually make the following set of assumptions about the ut's (the unobservable error terms):

Technical notation        Interpretation
(1) E(ut) = 0             The errors have zero mean
(2) var(ut) = σ²          The variance of the errors is constant and finite over all values of xt
(3) cov(ui, uj) = 0       The errors are linearly independent of one another
(4) cov(ut, xt) = 0       There is no relationship between the error and the corresponding x variate
3.1 Assumptions Underlying LRMs
An alternative assumption to (4), which is slightly stronger, is that the xt’s are non-stochastic or fixed in repeated samples.
A fifth assumption is required if we want to make inferences about the population parameters (the actual α and β) from the sample parameters (αˆ and βˆ)
Additional assumption
(5) ut is normally distributed
3.1 Assumptions Underlying LRMs
Properties of the OLS Estimator
If assumptions (1) through (4) hold, then the estimators α̂ and β̂ determined by OLS are known as Best Linear Unbiased Estimators (BLUE).
What does the acronym stand for?
‘Estimator’ – α̂ and β̂ are estimators of the true values of α and β
‘Linear’ – αˆ and βˆ are linear estimators
‘Unbiased’ – on average, the actual values of αˆ and βˆ will be equal to their true values
‘Best’ – means that the OLS estimator βˆ has minimum variance among the class of linear unbiased estimators; the Gauss–Markov theorem proves that the OLS estimator is best.
3.1 Assumptions Underlying LRMs
Consistency/Unbiasedness/Efficiency
Consistent
The least squares estimators α̂ and β̂ are consistent. That is, the estimates will converge to their true values as the sample size increases to infinity. We need the assumptions E(xt ut) = 0 and Var(ut) = σ² < ∞ to prove this. Consistency implies that

lim(T→∞) Pr[ |β̂ − β| > δ ] = 0   for all δ > 0
3.1 Assumptions Underlying LRMs
Unbiased
The least squares estimators α̂ and β̂ are unbiased. That is, E(α̂) = α and E(β̂) = β. Thus on average the estimated values will be equal to the true values. To prove this also requires the assumption that E(ut) = 0. Unbiasedness is a stronger condition than consistency.
Efficiency
An estimator βˆ of parameter β is said to be efficient if it is unbiased and no other unbiased estimator has a smaller variance. If the estimator is efficient, we are minimising the probability that it is a long way off from the true value of β.
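As an illustration of what unbiasedness and consistency mean in practice, the following is a minimal R sketch using purely simulated data (the data and the true parameter values are assumptions for illustration, not from the lecture).

# Minimal sketch: Monte Carlo illustration of unbiasedness and consistency
# of the OLS slope estimator (illustrative simulation, not lecture data).
set.seed(42)
beta_true <- 0.5

ols_slope <- function(n) {
  x <- rnorm(n)
  u <- rnorm(n)                     # errors satisfying assumptions (1)-(4)
  y <- 1 + beta_true * x + u
  cov(x, y) / var(x)                # OLS estimate of the slope
}

# Unbiasedness: the estimates average out to (roughly) the true value 0.5
mean(replicate(2000, ols_slope(30)))

# Consistency: the estimates cluster ever more tightly around 0.5 as T grows
sd(replicate(2000, ols_slope(30)))
sd(replicate(2000, ols_slope(3000)))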
3.2 Precision and Standard Errors
Precision and Standard Errors
Any set of regression estimates of α and β are specific to the sample used in their estimation
Recall that the estimators α̂ and β̂ of the population parameters α and β are given by

β̂ = (Σ xt yt − T x̄ ȳ) / (Σ xt² − T x̄²)   and   α̂ = ȳ − β̂ x̄
3.2 Precision and Standard Errors
Precision and Standard Errors
What we need is some measure of the reliability or precision of the estimators (αˆ and βˆ). The precision of the estimate is given by its standard error. Given assumptions (1)–(4) above, then the standard errors are given by
SE(α̂) = s √[ Σ xt² / (T Σ(xt − x̄)²) ] = s √[ Σ xt² / (T (Σ xt² − T x̄²)) ]

SE(β̂) = s √[ 1 / Σ(xt − x̄)² ] = s √[ 1 / (Σ xt² − T x̄²) ]
where s is the estimated standard deviation of the residuals.
3.2 Precision and Standard Errors
Estimating the Variance of the Disturbance Term
The variance of the random variable ut is given by

Var(ut) = E[(ut − E(ut))²]

which reduces to

Var(ut) = E(ut²)

We could estimate this using the average of ut²:

s² = (1/T) Σ ut²

Unfortunately this is not workable since ut is not observable. We can use the sample counterpart to ut, which is ût:

s² = (1/T) Σ ût²

But this estimator is a biased estimator of σ².
3.2 Precision and Standard Errors
Estimating the Variance of the Disturbance Term (cont’d)
An unbiased estimator of σ is given by

s = √[ Σ ût² / (T − 2) ]

where Σ ût² is the residual sum of squares and T is the sample size.
Some Comments on the Standard Error Estimators
1. Both SE(αˆ) and SE(βˆ) depend on s2 (or s). The greater the variance s2, the more dispersed the errors are about their mean value and therefore the more dispersed y will be about its mean value.
2. The sum of the squares of x about their mean appears in both formulae. The larger the sum of squares, the smaller the coefficient variances.
3.2 Precision and Standard Errors
Some Comments on the Standard Error Estimators
Consider what happens if Σ(xt − x̄)² is small or large:

3. The larger the sample size, T, the smaller will be the coefficient variances. T appears explicitly in SE(α̂) and implicitly in SE(β̂).
3.2 Precision and Standard Errors (Cont’d)
T appears implicitly since the sum Σ(xt − x̄)² runs from t = 1 to T.

4. The term Σ xt² appears in SE(α̂). The reason is that Σ xt² measures how far the points are away from the y-axis.
3.2 Precision and Standard Errors
Example: How to Calculate the Parameters and Standard Errors
Assume we have the following data calculated from a regression of y on a single variable x and a constant over 22 observations.
Σ xt yt = 830102,  T = 22,  x̄ = 416.5,  ȳ = 86.65,
Σ xt² = 3919654,  RSS = 130.6

Calculations

β̂ = (830102 − 22 × 416.5 × 86.65) / (3919654 − 22 × 416.5²) = 0.35

α̂ = 86.65 − 0.35 × 416.5 = −59.12
3.2 Precision and Standard Errors (Cont’d)
We write

ŷt = α̂ + β̂ xt

ŷt = −59.12 + 0.35 xt

SE(regression), s = √[ Σ ût² / (T − 2) ] = √(130.6 / 20) = 2.55

SE(α̂) = 2.55 × √[ 3919654 / (22 × (3919654 − 22 × 416.5²)) ] = 3.35

SE(β̂) = 2.55 × √[ 1 / (3919654 − 22 × 416.5²) ] = 0.0079

We now write the results as

ŷt = −59.12 + 0.35 xt
      (3.35)   (0.0079)
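As a check, the calculation can be reproduced in R directly from the quoted summary statistics (a minimal sketch; the variable names below are chosen here for illustration).

# Minimal sketch: reproducing the worked example from its summary statistics.
T_obs <- 22
sxy   <- 830102        # sum of x_t * y_t
sxx   <- 3919654       # sum of x_t^2
xbar  <- 416.5
ybar  <- 86.65
rss   <- 130.6         # residual sum of squares

beta_hat  <- (sxy - T_obs * xbar * ybar) / (sxx - T_obs * xbar^2)   # ~ 0.35
alpha_hat <- ybar - beta_hat * xbar   # ~ -59.1 (the slide's -59.12 uses beta rounded to 0.35)

s        <- sqrt(rss / (T_obs - 2))                                 # ~ 2.55
se_alpha <- s * sqrt(sxx / (T_obs * (sxx - T_obs * xbar^2)))        # ~ 3.35
se_beta  <- s * sqrt(1 / (sxx - T_obs * xbar^2))                    # ~ 0.0079

round(c(alpha_hat, beta_hat, se_alpha, se_beta), 4)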
3.2 Precision and Standard Errors
To fit a simple linear regression model in R, you need the lm() command from the stats package.
In the following example, the Ford returns is the dependent and the SP500 returns is the independent variable. You also have to indicate the dataset you take the variables from (here, the dataset called capm). You can use the summary() command to print out your regression results, which also prints t-ratios and p-values.
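A minimal sketch of the call described above: the dataset is called capm in the notes, but the column names (Ford and SP500) and the placeholder data below are assumptions for illustration only and may differ in the actual data file.

# Minimal sketch of lm() and summary(); column names Ford and SP500 are assumed.
capm <- data.frame(SP500 = rnorm(120, 0, 1),     # placeholder returns data
                   Ford  = rnorm(120, 0, 2))

fit <- lm(Ford ~ SP500, data = capm)   # Ford returns on S&P500 returns plus a constant
summary(fit)                           # coefficients, SEs, t-ratios and p-values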
3.2 Precision and Standard Errors
R example output of the lm() and summary() command:
3.3 Statistical Inference and Hypothesis Testing
An Introduction to Statistical Inference
We want to make inferences about the likely population values from the regression parameters.
Example: Suppose we have the following regression results:

ŷt = 20.3 + 0.5091 xt
     (14.38)  (0.2561)
βˆ = 0.5091 is a single (point) estimate of the unknown
population parameter, β. How “reliable” is this estimate?
The reliability of the point estimate is measured by the coefficient’s standard error.
3.3 Statistical Inference and Hypothesis Testing
Hypothesis Testing: Some Concepts
We can use the information in the sample to make inferences about the population.
We will always have two hypotheses that go together, the null hypothesis (denoted H0) and the alternative hypothesis (denoted H1).
The null hypothesis is the statement or the statistical hypothesis that is actually being tested. The alternative hypothesis represents the remaining outcomes of interest.
For example, suppose given the regression results above, we are interested in the hypothesis that the true value of β is in fact 0.5. We would use the notation
H0: β = 0.5
H1: β ≠ 0.5
This would be known as a two sided test.
3.3 Statistical Inference and Hypothesis Testing
One-Sided Hypothesis Tests
Sometimes we may have some prior information that, for example, we would expect β > 0.5 rather than β < 0.5. In this case, we would do a one-sided test:
H0: β = 0.5 H1: β < 0.5
or we could have had H0: β = 0.5
H1: β > 0.5
There are two ways to conduct a hypothesis test: via the test of significance approach or via the confidence interval approach.
3.3 Statistical Inference and Hypothesis Testing
The Probability Distribution of the Least Squares Estimators
We assume that ut ∼ N(0,σ2)
Since the least squares estimators are linear combinations of the random variables
i.e. β̂ = Σ wt yt

The weighted sum of normal random variables is also normally distributed, so

α̂ ∼ N(α, var(α))   and   β̂ ∼ N(β, var(β))
What if the errors are not normally distributed? Will the parameter estimates still be normally distributed?
3.3 Statistical Inference and Hypothesis Testing (Cont’d)
Yes, if the other assumptions of the CLRM hold, and the sample size is sufficiently large.
Standard normal variates can be constructed from α̂ and β̂:

(α̂ − α) / √var(α) ∼ N(0, 1)   and   (β̂ − β) / √var(β) ∼ N(0, 1)

But var(α) and var(β) are unknown, so

(α̂ − α) / SE(α̂) ∼ t(T−2)   and   (β̂ − β) / SE(β̂) ∼ t(T−2)
3.3 Statistical Inference and Hypothesis Testing
Testing Hypotheses: The Test of Significance Approach
Assume the regression equation is given by,
yt = α + βxt + ut for t = 1, 2, …, T
The steps involved in doing a test of significance are:
1. Estimate αˆ, βˆ and SE(αˆ), SE(βˆ) in the usual way
2. Calculate the test statistic. This is given by the formula
test statistic = (β̂ − β*) / SE(β̂)
where β∗ is the value of β under the null hypothesis.
3.3 Statistical Inference and Hypothesis Testing
3. We need some tabulated distribution with which to compare the estimated test statistics. Test statistics derived in this way can be shown to follow a t-distribution with T-2 degrees of freedom.
As the number of degrees of freedom increases, we need to be less cautious in our approach since we can be more sure that our results are robust.
4. We need to choose a “significance level”, often denoted α. This is also sometimes called the size of the test and it determines the region where we will reject or not reject the null hypothesis that we are testing. It is conventional to use a significance level of 5%.
Intuitive explanation is that we would only expect a result as extreme as this or more extreme 5% of the time as a consequence of chance alone.
Conventional to use a 5% size of test, but 10% and 1% are also commonly used.
3.3 Statistical Inference and Hypothesis Testing
Determining the Rejection Region for a Test of Significance
5. Given a significance level, we can determine a rejection region and non-rejection region. For a 2-sided test:
[Figure: 2.5% rejection region in each tail, with the 95% non-rejection region in between]
3.3 Statistical Inference and Hypothesis Testing
The Rejection Region for a 1-Sided Test (Upper Tail)
[Figure: 95% non-rejection region, with a 5% rejection region in the upper tail]
3.3 Statistical Inference and Hypothesis Testing
The Rejection Region for a 1-Sided Test (Lower Tail)
[Figure: 5% rejection region in the lower tail, with the 95% non-rejection region above it]
3.3 Statistical Inference and Hypothesis Testing
The Test of Significance Approach: Drawing Conclusions
6. Use the t-tables to obtain a critical value or values with which to compare the test statistic.
7. Finally perform the test. If the test statistic lies in the rejection region then reject the null hypothesis (H0), else do not reject H0.
3.3 Statistical Inference and Hypothesis Testing
A Note on the t and the Normal Distribution
You should all be familiar with the normal distribution and its
characteristic “bell” shape.
We can scale a normal variate to have zero mean and unit variance by subtracting its mean and dividing by its standard deviation.
There is, however, a specific relationship between the t- and the standard normal distribution. Both are symmetrical and centred on zero. The t-distribution has another parameter, its degrees of freedom. We will always know this (for the time being, it is the number of observations minus 2).
3.3 Statistical Inference and Hypothesis Testing
What Does the t-Distribution Look Like?

[Figure: density f(x) of the t-distribution compared with the normal distribution; both are bell-shaped and centred on zero, but the t-distribution has fatter tails]
3.3 Statistical Inference and Hypothesis Testing
Comparing the t and the Normal Distribution
In the limit, a t-distribution with an infinite number of degrees of freedom is a standard normal, i.e. t(∞) = N (0, 1)
Examples from statistical tables:
Significance level    N(0,1)    t(40)    t(4)
50%                    0         0        0
5%                     1.64      1.68     2.13
2.5%                   1.96      2.02     2.78
0.5%                   2.57      2.70     4.60
The reason for using the t-distribution rather than the standard normal is that we had to estimate σ2, the variance of the disturbances.
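As a sketch, the critical values in the table can be reproduced in R with qnorm() and qt(), where the significance level is read as the probability in a single (upper) tail.

# Minimal sketch: critical values of the normal and t-distributions.
tail_prob <- c(0.50, 0.05, 0.025, 0.005)   # probability in one (upper) tail

data.frame(
  significance = paste0(tail_prob * 100, "%"),
  normal = round(qnorm(1 - tail_prob), 2),       # N(0,1)
  t_40   = round(qt(1 - tail_prob, df = 40), 2), # t with 40 degrees of freedom
  t_4    = round(qt(1 - tail_prob, df = 4), 2)   # t with 4 degrees of freedom
)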
3.3 Statistical Inference and Hypothesis Testing
The Confidence Interval Approach to Hypothesis Testing
An example of its usage: we estimate a parameter, say β, to be 0.93, and a “95% confidence interval” for it to be (0.77, 1.09). This means that we are 95% confident that this interval contains the true (but unknown) value of β.
Confidence intervals are almost invariably two-sided, although in theory a one-sided interval can be constructed.
3.3 Statistical Inference and Hypothesis Testing
How to Carry out a Hypothesis Test Using Confidence Intervals
1. Calculate αˆ, βˆ and SE(αˆ), SE(βˆ) as before.
2. Choose a significance level, α, (again the convention is 5%). This is equivalent to choosing a (1-α)×100% confidence interval, i.e. 5% significance level = 95% confidence interval
3. Use the t-tables to find the appropriate critical value, which will again have T-2 degrees of freedom.
4. The confidence interval is given by
(βˆ − tcrit × SE(βˆ), βˆ + tcrit × SE(βˆ))
5. Perform the test: If the hypothesised value of β (β∗) lies outside the confidence interval, then reject the null hypothesis that β = β∗, otherwise do not reject the null.
3.3 Statistical Inference and Hypothesis Testing
Confidence Intervals Versus Tests of Significance
Note that the Test of Significance and Confidence Interval approaches always give the same answer.
Under the test of significance approach, we would not reject H0 that β = β∗ if the test statistic lies within the non-rejection region, i.e. if
−tcrit ≤ (β̂ − β*) / SE(β̂) ≤ +tcrit

Rearranging, we would not reject if

−tcrit × SE(β̂) ≤ β̂ − β* ≤ +tcrit × SE(β̂)

β̂ − tcrit × SE(β̂) ≤ β* ≤ β̂ + tcrit × SE(β̂)
But this is just the rule under the confidence interval approach.
3.3 Statistical Inference and Hypothesis Testing
Constructing Tests of Significance and Confidence Intervals: An Example
Using the regression results above,
ŷt = 20.3 + 0.5091 xt ,   T = 22
     (14.38)  (0.2561)
Using both the test of significance and confidence interval approaches, test the hypothesis that β = 1 against a two-sided alternative.
The first step is to obtain the critical value. We want tcrit = t20;5%
3.3 Statistical Inference and Hypothesis Testing
Determining the Rejection Region
[Figure: 2.5% rejection regions in each tail beyond the critical values −2.086 and +2.086, with the 95% non-rejection region between them]
3.3 Statistical Inference and Hypothesis Testing
Performing the Test
The hypotheses are: H0 : β = 1
H1: β ≠ 1
Test of significance approach
test stat = (β̂ − β*) / SE(β̂) = (0.5091 − 1) / 0.2561 = −1.917
Do not reject H0 since test statistic lies within non-rejection region
Confidence interval approach
Find tcrit = t20;5% = ±2.086
βˆ ± tcrit · SE(βˆ)
= 0.5091 ± 2.086 · 0.2561 = (−0.0251, 1.0433)
Do not reject H0 since 1 lies within the confidence interval
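A minimal sketch of the same test in R, using the numbers quoted in the example.

# Minimal sketch: testing H0: beta = 1 both ways, using the example's numbers.
beta_hat  <- 0.5091
se_beta   <- 0.2561
beta_null <- 1
dof       <- 20                                 # T - 2 = 22 - 2

t_stat <- (beta_hat - beta_null) / se_beta      # -1.917
t_crit <- qt(0.975, df = dof)                   # 2.086 for a 5% two-sided test

abs(t_stat) > t_crit                            # FALSE: do not reject H0

beta_hat + c(-1, 1) * t_crit * se_beta          # 95% CI: (-0.0251, 1.0433); contains 1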
3.3 Statistical Inference and Hypothesis Testing
Testing other Hypotheses
What if we wanted to test H0: β = 0 or H0: β = 2?
Note that we can test these with the confidence interval approach.
For interest (!), test
H0: β = 0  vs.  H1: β ≠ 0
H0: β = 2  vs.  H1: β ≠ 2
3.3 Statistical Inference and Hypothesis Testing
Changing the Size of the Test
But note that we looked at only a 5% size of test. In marginal cases (e.g. H0: β = 1), we may get a completely different answer if we use a different size of test. This is where the test of significance approach is better than a confidence interval.
For example, say we wanted to use a 10% size of test. Using the test of significance approach,
test stat = (β̂ − β*) / SE(β̂) = (0.5091 − 1) / 0.2561 = −1.917
as above. The only thing that changes is the critical t-value.
3.3 Statistical Inference and Hypothesis Testing
Changing the Size of the Test: The New Rejection Regions
3.3 Statistical Inference and Hypothesis Testing
Changing the Size of the Test: The Conclusion
t20;10% = 1.725. So now, as the test statistic lies in the rejection region, we would reject H0.
Caution should therefore be used when placing emphasis on or making decisions in marginal cases (i.e. in cases where we only just reject or not reject).
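The new critical value can be checked in R with a short sketch:

# Minimal sketch: the 10% two-sided critical value (5% in each tail, 20 df).
qt(0.95, df = 20)                      # 1.725
abs(-1.917) > qt(0.95, df = 20)        # TRUE: reject H0 at the 10% level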
3.3 Statistical Inference and Hypothesis Testing
Error sources
We usually reject H0 if the test statistic is statistically significant at a chosen significance level.
There are two possible errors we could make:
1. Rejecting H0 when it was really true. This is called a type I
2. Not rejecting H0 when it was in fact false. This is called a type II error.
Result of test                       H0 is true           H0 is false
Significant (reject H0)              Type I error = α     ✓
Insignificant (do not reject H0)     ✓                    Type II error = β
3.3 Statistical Inference and Hypothesis Testing
The Trade-off Between Type I and Type II Errors
The probability of a type I error is α, the significance level or size of test we chose.
What happens if we reduce the size of the test (e.g. from a 5% test to a 1% test)? We reduce the chances of making a type I error. But we also reduce the probability that we will reject the null hypothesis at all, so we increase the probability of a type II error:
Reduce size of test (e.g. 5% to 1%) → more strict criterion for rejection → reject null hypothesis less often, so we are:
  less likely to falsely reject → lower chance of type I error
  more likely to incorrectly not reject → higher chance of type II error
3.3 Statistical Inference and Hypothesis Testing
The Trade-off Between Type I and Type II Errors
So there is always a trade off between type I and type II errors when choosing a significance level. The only way we can reduce the chances of both is to increase the sample size.
3.4 t-Test and t-ratio
A Special Type of Hypothesis Test: The t-ratio
Recall that the formula for a test of significance approach to hypothesis testing using a t-test was
test statistic = (β̂i − βi*) / SE(β̂i)

If the test is

H0: βi = 0
H1: βi ≠ 0

i.e. a test that the population coefficient is zero against a two-sided alternative, this is known as a t-ratio test.

Since βi* = 0, the test statistic becomes

test stat = β̂i / SE(β̂i)

The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.
3.4 t-Test and t-ratio
The t-ratio: An Example
Suppose that we have the following parameter estimates, standard errors and t-ratios for an intercept and slope respectively.
Do we reject H0: β1 = 0? (No)    H0: β2 = 0? (Yes)

              Intercept   Slope
Coefficient     1.10      −4.40
SE              1.35       0.96
t-ratio         0.81      −4.63

Compare these with tcrit with 15 − 2 = 13 degrees of freedom (2.5% in each tail for a 5% two-sided test):
tcrit = 2.160 at 5%,   tcrit = 3.012 at 1%
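A minimal sketch of the t-ratio calculation in R using the rounded figures above (the quoted slope t-ratio of −4.63 presumably comes from unrounded values).

# Minimal sketch: t-ratios for the example's intercept and slope.
coef_hat <- c(intercept = 1.10, slope = -4.40)
se       <- c(intercept = 1.35, slope = 0.96)
dof      <- 13                         # 15 observations - 2 estimated parameters

t_ratio <- coef_hat / se               # 0.81 and about -4.58 from the rounded inputs
t_crit  <- qt(0.975, df = dof)         # 2.160 at the 5% level

abs(t_ratio) > t_crit                  # intercept: FALSE (do not reject), slope: TRUE (reject)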
3.4 t-Test and t-ratio
What Does the t-ratio tell us?
If we reject H0, we say that the result is significant. If the coefficient is not “significant” (e.g. th