
MFIN6201
Empirical Techniques and Applications in Finance
Week 4
Dr. Jaehoon Lee
School of Banking and Finance University of New South Wales
e-mail: jaehoon.lee@unsw.edu.au
Semester 2, 2017
Last update: 15 August 2017

Summary of the last week
• Population regression model
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \dots, n$$
• Least Squares Assumptions
– Assumption #1. E (u | X = x) = 0
– Assumption #2. (Xi,Yi), i = 1,··· ,n are i.i.d.
– Assumption #3. Large outliers in X and/or Y are rare.
MFIN6201 – Empirical Techniques and Applications in Finance

Summary of the last week
• Population parameter
$$\beta_1 = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}$$
• Sample OLS estimator
$$\hat\beta_1 = \frac{s_{XY}}{s_X^2}$$
• Goodness of fit: $R^2$, SER, RMSE

Summary of the last week
• Mean
$$E(\hat\beta_1) = \beta_1$$
• Variance
$$\mathrm{var}(\hat\beta_1) = \frac{1}{n} \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{[\mathrm{var}(X_i)]^2}$$
• Asymptotic distribution
$$\hat\beta_1 \sim N\left(\beta_1, \; \frac{1}{n} \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{[\mathrm{var}(X_i)]^2}\right)$$

Regression with a Single Regressor
Hypothesis Tests and Confidence Intervals
• The standard error of $\hat\beta_1$
• Hypothesis tests concerning $\beta_1$
• Confidence intervals for $\beta_1$
• Regression when X is binary
• Heteroskedasticity and homoskedasticity
• Efficiency of OLS and the Student t distribution

A big picture review of where we are going
We want to learn about the slope of the population regression line. We have data from a sample, so there is sampling uncertainty. There are five steps towards this goal.
1. State the population object of interest
2. Provide an estimator of this population object
3. Derive the sampling distribution of the estimator (this requires certain assumptions). In large samples this sampling distribution will be normal by the CLT.
4. The square root of the estimated variance of the sampling distribution is the standard error (SE) of the estimator
5. Use the SE to construct t-statistics (for hypothesis tests) and confidence intervals.
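The five steps can be sketched end to end on simulated data. This is only an illustration under stated assumptions: the population values (intercept 1.0, slope 2.0, standard normal X and u), the sample size, and the function name `ols_slope_with_se` are hypothetical choices, not part of the lecture. The standard error uses the sample analogue of the variance formula from last week's summary, with an n - 2 degrees-of-freedom adjustment.

```python
import math
import random


def ols_slope_with_se(x, y):
    """Fit Y = b0 + b1*X by OLS; return (b1, SE(b1)) from the sample variance formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # step 2: s_XY / s_X^2
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # step 4: sample analogue of (1/n) * var[(X - mu_X) u] / [var(X)]^2
    var_v = sum((xi - x_bar) ** 2 * ui ** 2 for xi, ui in zip(x, resid)) / (n - 2)
    var_x = sxx / n
    se_b1 = math.sqrt(var_v / var_x ** 2 / n)
    return b1, se_b1


random.seed(0)                           # fixed seed so the run is repeatable
n = 500
true_b1 = 2.0                            # step 1: hypothetical population slope
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + true_b1 * xi + random.gauss(0, 1) for xi in x]

b1_hat, se_b1 = ols_slope_with_se(x, y)
t_stat = (b1_hat - 0.0) / se_b1          # step 5: t-statistic for H0: beta_1 = 0
ci_95 = (b1_hat - 1.96 * se_b1, b1_hat + 1.96 * se_b1)
print(f"slope={b1_hat:.3f}  SE={se_b1:.3f}  t={t_stat:.1f}  "
      f"95% CI=({ci_95[0]:.3f}, {ci_95[1]:.3f})")
```

Step 3 enters only through the 1.96 multiplier, which is the large-sample normal critical value justified by the CLT.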

Steps 1–3 were covered in the previous week
1. State the population object of interest
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, 2, \dots, n$$
We are interested in $\beta_1 = \Delta Y / \Delta X$, the response of Y to an autonomous change in X (causal effect).
2. Provide an estimator of this population object
$$\hat\beta_1 = \frac{s_{XY}}{s_X^2}$$

Steps 1–3 were covered in the previous week
3. Derive the sampling distribution of the estimator
To derive the large-sample distribution of $\hat\beta_1$, we make the following assumptions.
• $E(u \mid X = x) = 0$
• $(X_i, Y_i)$, $i = 1, 2, \dots, n$ are i.i.d.
• Large outliers are rare
Under the assumptions, for n large, $\hat\beta_1$ is approximately distributed as
$$\hat\beta_1 \sim N\left(\beta_1, \; \frac{1}{n} \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{[\mathrm{var}(X_i)]^2}\right)$$

We have also seen steps 4–5 already
4. The square root of the estimated variance of the sampling distribution is the standard error (SE) of the estimator
5. Use the SE to construct t-statistics (for hypothesis tests) and confidence intervals.
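Once an estimate and its standard error are in hand, steps 4 and 5 are simple arithmetic. The sketch below uses the slope estimate (-2.28) and standard error (0.52) reported in this lecture's California test score example; the helper name `t_test_and_ci` is a hypothetical choice, and the p-value relies on the large-sample normal approximation.

```python
from statistics import NormalDist


def t_test_and_ci(estimate, se, null_value=0.0, z=1.96):
    """Return the t-statistic, two-sided normal p-value, and a 95% confidence interval."""
    t_stat = (estimate - null_value) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))   # area in both normal tails
    ci = (estimate - z * se, estimate + z * se)
    return t_stat, p_value, ci


t_stat, p_value, ci = t_test_and_ci(-2.28, 0.52)
print(round(t_stat, 2))                  # -4.38
print(round(ci[0], 2), round(ci[1], 2))  # -3.3 -1.26
print(p_value < 0.01)                    # True: reject at the 1% level
```

The interval excludes zero exactly when |t| exceeds 1.96, which is why the test and the confidence interval always agree at the 5% level.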

Hypothesis Testing and the Standard Error of $\hat\beta_1$
The objective is to test a hypothesis, like $\beta_1 = 0$, using data, to reach a tentative conclusion whether the (null) hypothesis is correct or incorrect.
General setup
Null hypothesis and two-sided alternative:
$$H_0: \beta_1 = \beta_{1,0} \quad \text{vs.} \quad H_1: \beta_1 \neq \beta_{1,0}$$
where $\beta_{1,0}$ is the hypothesized value under the null.
Null hypothesis and one-sided alternative:
$$H_0: \beta_1 = \beta_{1,0} \quad \text{vs.} \quad H_1: \beta_1 < \beta_{1,0}$$

Hypothesis Testing and the Standard Error of $\hat\beta_1$
General approach: construct the t-statistic, and compute the p-value (or compare to the N(0,1) critical value)
In general:
$$t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}}$$
where the SE of the estimator is the square root of an estimator of the variance of the estimator.

Hypothesis Testing and the Standard Error of $\hat\beta_1$
For example,
• For testing the mean of Y,
$$t = \frac{\bar Y - \mu_{Y,0}}{s_Y / \sqrt{n}}$$
• For testing $\beta_1$,
$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)}$$
where $SE(\hat\beta_1)$ = the square root of an estimator of the variance of the sampling distribution of $\hat\beta_1$

Formula for $SE(\hat\beta_1)$
Recall the expression for the variance of $\hat\beta_1$:
$$\mathrm{var}(\hat\beta_1) = \frac{1}{n} \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{[\mathrm{var}(X_i)]^2} = \frac{1}{n} \frac{\sigma_v^2}{(\sigma_X^2)^2}$$
where $v_i \equiv (X_i - \mu_X) u_i$.

Formula for $SE(\hat\beta_1)$
$\sigma_v^2$ and $\sigma_X^2$ are unknown population variances. Thus, they need to be replaced with sample variances.
$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n} \times \frac{\text{estimator of } \sigma_v^2}{(\text{estimator of } \sigma_X^2)^2} = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^n (X_i - \bar X)^2 \hat u_i^2}{\left[\frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2\right]^2}$$
and
$$SE(\hat\beta_1) \equiv \sqrt{\hat\sigma^2_{\hat\beta_1}}$$

Formula for $SE(\hat\beta_1)$
This is a bit nasty, but
• It is less complicated than it seems.
• The numerator estimates $\mathrm{var}(v)$, and the denominator estimates $[\mathrm{var}(X)]^2$.
• Why the degrees-of-freedom adjustment $n - 2$? Because two coefficients have been estimated ($\beta_0$ and $\beta_1$).
• Your statistics software has memorized this formula so you don’t need to compute on your own.

Summary of hypothesis testing
• Null and alternative hypotheses
$$H_0: \beta_1 = \beta_{1,0} \quad \text{vs.} \quad H_1: \beta_1 \neq \beta_{1,0}$$
• Construct the t-statistic
$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)}$$
• Reject at 5% significance level if $|t| > 1.96$
• The p-value is $p = \Pr[|t| > |t^{act}|]$ = probability in tails of normal outside $|t^{act}|$; you reject at the 5% significance level if the p-value is < 5%.
• This procedure relies on the large-n approximation that $\hat\beta_1$ is normally distributed; typically n = 50 is large enough for the approximation to be excellent.

Example: Test Scores and STR in California
Estimated regression line
$$\widehat{TestScore} = 698.9 - 2.28 \times STR$$
Regression software reports the standard errors
$$SE(\hat\beta_0) = 10.4, \quad SE(\hat\beta_1) = 0.52$$
t-statistic testing that $\beta_{1,0} = 0$
$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)} = \frac{-2.28 - 0}{0.52} = -4.38$$
The critical value for a two-sided test at the 1% significance level is 2.58, and $|-4.38| > 2.58$, so we reject the null at the 1% significance level.

Example: Test Scores and STR in California
Alternatively, we can compute the p-value...
The p-value based on the large-n standard normal approximation to the t-statistic is 0.00001 ($10^{-5}$)

Confidence Intervals for $\beta_1$
Recall that a 95% confidence interval is, equivalently:
• The set of points that cannot be rejected at the 5% significance level
• A set-valued function of the data (an interval that is a function of the data) that contains the true parameter value 95% of the time in repeated samples
Because the t-statistic for $\beta_1$ is N(0,1) in large samples, construction of a 95% confidence interval for $\beta_1$ is just like the case of the sample mean:
$$\text{95\% confidence interval for } \beta_1 = \{\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1)\}$$

Confidence interval example: Test Scores and STR
Estimated regression line:
$$\widehat{TestScore} = 698.9 - 2.28 \times STR$$
$$SE(\hat\beta_0) = 10.4, \quad SE(\hat\beta_1) = 0.52$$
95% confidence interval for $\beta_1$:
$$\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1) = -2.28 \pm 1.96 \times 0.52 = (-3.30, -1.26)$$
The following two statements are equivalent (why?)
• The 95% confidence interval does not include zero;
• The hypothesis $\beta_1 = 0$ is rejected at the 5% level

A concise (and conventional) way to report regressions:
Put standard errors in parentheses below the estimated coefficients to which they apply.
$$\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28} \times STR, \quad R^2 = 0.05, \quad SER = 18.6$$
This expression gives a lot of information:
• The estimated regression line is $\widehat{TestScore} = 698.9 - 2.28 \times STR$
• The standard error of $\hat\beta_0$ is 10.4
• The standard error of $\hat\beta_1$ is 0.52
• The $R^2$ is 0.05; the standard error of the regression is 18.6

OLS regression: reading STATA output
From the STATA output:
$$\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28} \times STR, \quad R^2 = 0.05, \quad SER = 18.6$$
$t(\beta_1 = 0) = -4.38$, p-value = 0.000 (2-sided)
95% 2-sided conf. interval for $\beta_1$ is (-3.30, -1.26)

Summary of statistical inference about $\beta_0$ and $\beta_1$
Estimation:
• OLS estimators $\hat\beta_0$ and $\hat\beta_1$
• $\hat\beta_0$ and $\hat\beta_1$ have approximately normal sampling distributions in large samples
Testing:
• $H_0: \beta_1 = \beta_{1,0}$ vs. $H_1: \beta_1 \neq \beta_{1,0}$
• $t = (\hat\beta_1 - \beta_{1,0}) / SE(\hat\beta_1)$
• p-value = area under standard normal outside $t^{act}$ (large n)

Summary of statistical inference about $\beta_0$ and $\beta_1$
Confidence Intervals:
• 95% confidence interval for $\beta_1$ is $\{\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1)\}$
• This is the set of $\beta_1$ that is not rejected at the 5% level
• The 95% CI contains the true $\beta_1$ in 95% of all samples.

Regression when X is Binary
Sometimes a regressor is binary:
• $X = \begin{cases} 1 & \text{if small class size} \\ 0 & \text{if not} \end{cases}$
• $X = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases}$
• $X = \begin{cases} 1 & \text{if treated (experimental drug)} \\ 0 & \text{if not (placebo)} \end{cases}$

Regression when X is Binary
Binary regressors are sometimes called “dummy” variables.
So far, $\beta_1$ has been called a “slope”, but that doesn’t make sense if X is binary.
How do we interpret regression with a binary regressor?

Interpreting regressions with a binary regressor
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
where X is binary ($X_i$ = 0 or 1):
When $X_i = 0$,
• $E(Y_i \mid X_i = 0) = \beta_0$
When $X_i = 1$,
• $E(Y_i \mid X_i = 1) = \beta_0 + \beta_1$
Thus,
$$\beta_1 = E(Y_i \mid X_i = 1) - E(Y_i \mid X_i = 0) = \text{population difference in group means}$$

Example
Let
$$D_i = \begin{cases} 1 & \text{if } STR \le 20 \\ 0 & \text{if } STR > 20 \end{cases}$$
OLS regression
$$\widehat{TestScore} = \underset{(1.3)}{650.0} + \underset{(1.8)}{7.4} \times D$$
Difference in means
$$\bar Y_{small} - \bar Y_{large} = 657.4 - 650.0 = 7.4$$
Standard error
$$SE = \sqrt{\frac{s_s^2}{n_s} + \frac{s_l^2}{n_l}} = \sqrt{\frac{19.4^2}{238} + \frac{17.9^2}{182}} = 1.8$$
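The claim that the OLS slope on a 0/1 regressor equals the difference in group means can be checked numerically. In the sketch below, the five-observation data set and the function names are hypothetical and purely illustrative; the standard-error inputs (19.4, 238, 17.9, 182) and the estimate 7.4 are the figures quoted in the example above.

```python
import math


def ols_slope(x, y):
    """OLS slope of y on x: sum of cross-deviations over sum of squared x deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def diff_in_means_se(s_small, n_small, s_large, n_large):
    """Standard error of a difference in means, allowing unequal group variances."""
    return math.sqrt(s_small ** 2 / n_small + s_large ** 2 / n_large)


# Hypothetical toy data: d is the dummy, y the outcome.
d = [1, 1, 1, 0, 0]
y = [10.0, 12.0, 14.0, 7.0, 9.0]
mean_1 = sum(yi for di, yi in zip(d, y) if di == 1) / d.count(1)
mean_0 = sum(yi for di, yi in zip(d, y) if di == 0) / d.count(0)
slope = ols_slope(d, y)
print(slope, mean_1 - mean_0)            # both are the group-mean difference

# Figures from the class-size example.
se = diff_in_means_se(19.4, 238, 17.9, 182)
t_stat = 7.4 / se
print(round(se, 1), round(t_stat, 2))    # 1.8 4.05
```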

Summary: regression when $X_i$ is binary (0/1)
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
• $\beta_0$ = mean of Y when X = 0
• $\beta_0 + \beta_1$ = mean of Y when X = 1
• $\beta_1$ = difference in group means, X = 1 minus X = 0
• $SE(\hat\beta_1)$ has the usual interpretation
• t-statistics, confidence intervals constructed as usual
• This is another way (an easy way) to do difference-in-means analysis
• The regression formulation is especially useful when we have additional regressors (as we will very soon)

Heteroskedasticity and Homoskedasticity, standard errors
• What…?
• Consequences of homoskedasticity
• Implication for computing standard errors
What do these two terms mean? If var(u|X = x) is constant – that is, if the variance of the conditional distribution of u given X does not depend on X – then u is said to be homoskedastic. Otherwise, u is heteroskedastic

Example: the comparison of means
Hetero/homoskedasticity in the case of a binary regressor
• Standard error when group variances are unequal:
$$SE = \sqrt{\frac{s_s^2}{n_s} + \frac{s_l^2}{n_l}}$$
• Standard error when group variances are equal:
$$SE = s_p \sqrt{\frac{1}{n_s} + \frac{1}{n_l}}$$
where
$$s_p^2 = \frac{(n_s - 1) s_s^2 + (n_l - 1) s_l^2}{n_s + n_l - 2}$$
(SW, Sect 3.6)
$s_p^2$ = “pooled estimator of $\sigma^2$” when $\sigma_l^2 = \sigma_s^2$
• Equal group variances = homoskedasticity
• Unequal group variances = heteroskedasticity
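To see how much the two formulas differ in practice, the sketch below evaluates both with the group standard deviations and sample sizes quoted earlier in this lecture for the class-size example (19.4 and 238 for small classes, 17.9 and 182 for large classes). The function names are hypothetical.

```python
import math


def se_unequal_variances(s_s, n_s, s_l, n_l):
    """Standard error of the difference in means when group variances may differ."""
    return math.sqrt(s_s ** 2 / n_s + s_l ** 2 / n_l)


def se_pooled(s_s, n_s, s_l, n_l):
    """Standard error using the pooled variance, valid only if group variances are equal."""
    sp2 = ((n_s - 1) * s_s ** 2 + (n_l - 1) * s_l ** 2) / (n_s + n_l - 2)
    return math.sqrt(sp2) * math.sqrt(1 / n_s + 1 / n_l)


se_het = se_unequal_variances(19.4, 238, 17.9, 182)
se_hom = se_pooled(19.4, 238, 17.9, 182)
print(round(se_het, 3), round(se_hom, 3))
```

With these inputs the two standard errors are close because the two sample standard deviations are similar; the gap widens as the group variances, or the group sizes, become more unequal.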

Heteroskedasticity in a picture
• $E(u \mid X = x) = 0$ (u satisfies Least Squares Assumption #1)
• The variance of u does depend on X

Heteroskedasticity in a picture
A real-data example from labor economics: average hourly earnings vs. years of education (data source: Current Population Survey):
Heteroskedastic or homoskedastic?

Heteroskedastic or homoskedastic?
The class size data

Homoskedasticity / Heteroskedasticity
So far we have (without saying so) assumed that u might be heteroskedastic
Recall the three least squares assumptions:
• E(u|X = x) = 0
• $(X_i, Y_i)$, $i = 1, \dots, n$ are i.i.d.
• Large outliers are rare
Heteroskedasticity and homoskedasticity concern var(u|X = x). Because we have not explicitly assumed homoskedastic errors, we have implicitly allowed for heteroskedasticity.

What if the errors are in fact homoskedastic?
• You can prove that OLS has the lowest variance among estimators that are linear in Y. This result is called the Gauss-Markov theorem, which we will return to shortly.
• The formula for the variance of $\hat\beta_1$ and the OLS standard error simplifies: If $\mathrm{var}(u_i \mid X_i = x) = \sigma_u^2$, then
$$\mathrm{var}(\hat\beta_1) = \frac{1}{n} \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{(\sigma_X^2)^2} \quad \text{(general formula)}$$
$$= \frac{\sigma_u^2}{n \sigma_X^2} \quad \text{(simplification if u is homoskedastic)}$$
• Along with this homoskedasticity-only formula for the variance of $\hat\beta_1$, we have homoskedasticity-only standard errors.

Derivation of Homoskedasticity standard error
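Assuming homoskedasticity,
$$\mathrm{var}[(X_i - \mu_X) u_i] = E\left[\{(X_i - \mu_X) u_i\}^2\right] = E\left[(X_i - \mu_X)^2 \, E[u_i^2 \mid X_i]\right] = \sigma_X^2 \sigma_u^2$$
The first equality holds because $E[(X_i - \mu_X) u_i] = 0$ under Assumption #1; the second uses the law of iterated expectations.
Therefore,
$$\mathrm{var}(\hat\beta_1) = \frac{1}{n} \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{[\mathrm{var}(X_i)]^2} = \frac{1}{n} \frac{\sigma_u^2}{\sigma_X^2}$$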

What if the errors are in fact homoskedastic?
Homoskedasticity-only standard error formula
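$$SE(\hat\beta_1) = \sqrt{\frac{1}{n} \frac{s_{\hat u}^2}{s_X^2}} = \sqrt{\frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^n \hat u_i^2}{\frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2}}$$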
Some people (e.g. Excel programmers) find the homoskedasticity-only formula simpler – but it is wrong unless the errors really are homoskedastic.

Two formulas for standard errors for $\hat\beta_1$
• Homoskedasticity-only standard errors: these are valid only if the errors are homoskedastic.
• The usual standard errors: to differentiate the two, it is conventional to call these heteroskedasticity-robust standard errors, because they are valid whether or not the errors are heteroskedastic.
• Heteroskedasticity-robust standard errors are also called Eicker-Huber-White standard errors.
• The main advantage of the homoskedasticity-only standard errors is that the formula is simpler. But the disadvantage is that the formula is only correct if the errors are homoskedastic.

Practical implications
• The homoskedasticity-only formula for the standard error of $\hat\beta_1$ and the “heteroskedasticity-robust” formula differ, so in general you get different standard errors using the different formulas.
• Homoskedasticity-only standard errors are the default setting in regression software – sometimes the only setting (e.g. Excel). To get the general “heteroskedasticity-robust” standard errors you must override the default.
• If you don’t override the default and there is in fact heteroskedasticity, your standard errors (and t-statistics and confidence intervals) will be wrong – typically, homoskedasticity-only SEs are too small.

Heteroskedasticity-robust standard errors in STATA
• If you use the “, robust” option, STATA computes heteroskedasticity-robust standard errors
• Otherwise, STATA computes homoskedasticity-only standard errors

The bottom line
• If the errors are either homoskedastic or heteroskedastic and you use heteroskedasticity-robust standard errors, you are OK
• If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, your standard errors will be wrong (the homoskedasticity-only estimator of the variance of $\hat\beta_1$ is inconsistent if there is heteroskedasticity).
• The two formulas coincide (when n is large) in the special case of homoskedasticity
• So, it will be safer to use heteroskedasticity-robust standard errors.
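The points above can be illustrated with a small simulation that computes both standard errors on the same data. Everything about the data-generating process here is an assumption made for illustration: X and the base noise are standard normal, the heteroskedastic errors are built as |X| times the noise, and n = 5000. The two formulas are the homoskedasticity-only and heteroskedasticity-robust expressions from this lecture.

```python
import math
import random


def slope_standard_errors(x, y):
    """Return (homoskedasticity-only SE, heteroskedasticity-robust SE) of the OLS slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    var_x = sxx / n
    s2_u = sum(u * u for u in resid) / (n - 2)
    se_homo = math.sqrt(s2_u / var_x / n)
    var_v = sum((xi - x_bar) ** 2 * u * u for xi, u in zip(x, resid)) / (n - 2)
    se_robust = math.sqrt(var_v / var_x ** 2 / n)
    return se_homo, se_robust


random.seed(1)                                   # fixed seed for repeatability
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
# Case 1: error variance does not depend on X (homoskedastic).
y_hom = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]
# Case 2: error standard deviation is |X| (heteroskedastic, by construction).
y_het = [1.0 + 2.0 * xi + abs(xi) * random.gauss(0, 1) for xi in x]

homo_a, robust_a = slope_standard_errors(x, y_hom)
homo_b, robust_b = slope_standard_errors(x, y_het)
print(f"homoskedastic data:   homo-only={homo_a:.4f}  robust={robust_a:.4f}")
print(f"heteroskedastic data: homo-only={homo_b:.4f}  robust={robust_b:.4f}")
```

Under this design the two standard errors nearly coincide in the first case, while in the second the homoskedasticity-only value is noticeably too small.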

Some Additional Theoretical Foundations of OLS
We have already learned a very great deal about OLS: OLS is unbiased and consistent; we have a formula for heteroskedasticity-robust standard errors; and we can construct confidence intervals and test statistics. Also, a very good reason to use OLS is that everyone else does, so by using it, others will understand what you are doing. In effect, OLS is the language of regression analysis, and if you use a different estimator, you will be speaking a different language.

Still, you may wonder…
• Is this really a good reason to use OLS? Aren’t there other estimators that might be better – in particular, ones that might have a smaller variance?
• Also, what happened to our old friend, the Student t distribution?
• So we will now answer these questions – but to do so we will need to make some stronger assumptions than the three least squares assumptions already presented.

The Extended Least Squares Assumptions
• These consist of the three LS assumptions, plus two more:
1. $E(u \mid X = x) = 0$.
2. $(X_i, Y_i)$, $i = 1, \dots, n$ are i.i.d.
3. Large outliers are rare ($E(Y^4) < \infty$, $E(X^4) < \infty$)
4. u is homoskedastic
5. u is distributed $N(0, \sigma^2)$
• Assumptions 4 and 5 are more restrictive, so they apply to fewer cases in practice. However, certain mathematical calculations simplify and you can prove stronger results that hold if these additional assumptions are true.
• We start with a discussion of the efficiency of OLS

Efficiency of OLS: part 1
The Gauss-Markov Theorem
• Under extended LS assumptions 1-4 (the basic three, plus homoskedasticity), $\hat\beta_1$ has the smallest variance among all linear estimators (estimators that are linear functions of $Y_1, \dots, Y_n$). This is the Gauss-Markov theorem.
• Comments: The GM theorem is proven in SW Appendix 5.2

The Gauss-Markov Theorem
• $\hat\beta_1$ is a linear estimator, that is, it can be written as a linear function of $Y_1, \dots, Y_n$:
$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X) u_i}{\sum_{i=1}^n (X_i - \bar X)^2} = \frac{1}{n} \sum_{i=1}^n w_i u_i$$
where
$$w_i = \frac{(X_i - \bar X)}{\frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2}$$
• The G-M theorem says that among all possible choices of $w_i$, the OLS weights yield the smallest $\mathrm{var}(\hat\beta_1)$

Efficiency of OLS, part II
• Under all five extended LS assumptions, including normally distributed errors, $\hat\beta_1$ has the smallest variance of all consistent estimators (linear or nonlinear functions of $Y_1, \dots, Y_n$), as $n \to \infty$.
• This is a pretty amazing result: it says that, if (in addition to LSA 1-3) the errors are homoskedastic and normally distributed, then OLS is a better choice than any other consistent estimator. And because an estimator that isn’t consistent is a poor choice, this says that OLS really is the best you can do if all five extended LS assumptions hold. (The proof of this result is beyond the scope of this course and isn’t in SW; it is typically done in graduate courses.)

Some not-so-good things about OLS
The foregoing results are impressive, but these results and the OLS estimator have important limitations.
• The GM theorem really isn’t that compelling:
– The condition of homoskedasticity often doesn’t hold (homoskedasticity is special)
– The result is only for linear estimators, which are only a small subset of estimators (more on this in a moment)
• The strongest optimality result (“part II” above) requires homoskedastic normal errors, which is not plausible in applications

Some not-so-good things about OLS
• OLS is more sensitive to outliers than some other estimators. In the case of estimating the population mean, if there are big outliers, then the median is preferred to the mean because the median is less sensitive to outliers: it has a smaller variance than OLS when there are outliers. Similarly, in regression, OLS can be sensitive to outliers, and if there are big outliers other estimators can be more efficient (have a smaller variance). One such estimator is the least absolute deviations (LAD) estimator:
$$\min_{b_0, b_1} \sum_{i=1}^n \left| Y_i - (b_0 + b_1 X_i) \right|$$
• In virtually all applied regression analysis, OLS is used, and that is what we will do in this course too.

Inference if u is homoskedastic and normally distributed
Recall the five extended LS assumptions:
1. $E(u \mid X = x) = 0$.
2. $(X_i, Y_i)$, $i = 1, \dots, n$ are i.i.d.
3. Large outliers are rare ($E(Y^4) < \infty$, $E(X^4) < \infty$)
4. u is homoskedastic
5. u is distributed $N(0, \sigma^2)$

Inference if u is homoskedastic and normally distributed
If all five assumptions hold, then:
• $\hat\beta_0$ and $\hat\beta_1$ are normally distributed for all n
• the t-statistic has a Student t distribution with $n - 2$ degrees of freedom; this holds exactly for all n

Normality of the sampling distribution of $\hat\beta_1$ under 1-5:
$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X) u_i}{\sum_{i=1}^n (X_i - \bar X)^2} = \frac{1}{n} \sum_{i=1}^n w_i u_i, \quad \text{where } w_i = \frac{(X_i - \bar X)}{\frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2}$$
What is the distribution of a weighted average of normals? Under assumptions 1-5:
$$\hat\beta_1 - \beta_1 \sim N\left(0, \; \frac{1}{n^2} \left(\sum_{i=1}^n w_i^2\right) \sigma_u^2\right) \quad (*)$$
Substituting $w_i$ into (*) yields the homoskedasticity-only variance formula.

Normality of the sampling distribution of $\hat\beta_1$ under 1-5:
In addition, under assumptions 1-5, under the null hypothesis the t-statistic has a Student t distribution with $n - 2$ degrees of freedom
• Why $n - 2$? Because we estimated 2 parameters, $\beta_0$ and $\beta_1$
• For n < 30, the t critical values can be a fair bit larger than the N(0,1) critical values
• For n > 50 or so, the difference in $t_{n-2}$ and N(0,1) distributions is negligible. Recall the Student t table:

Practical implication
• If n < 50 and you really believe that, for your application, u is homoskedastic and normally distributed, then use the $t_{n-2}$ instead of the N(0,1) critical values for hypothesis tests and confidence intervals.
• In most econometric applications, there is no reason to believe that u is homoskedastic and normal; usually, there are good reasons to believe that neither assumption holds.
• Fortunately, in modern applications, n > 50, so we can rely on the large-n results presented earlier, based on the CLT, to perform hypothesis tests and construct confidence intervals using the large-n normal approximation.
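To get a feel for how quickly the $t_{n-2}$ critical values approach 1.96, the sketch below computes two-sided 5% critical values numerically using only the Python standard library. The approach (Simpson's rule for the CDF plus bisection for the quantile) and the function names are choices made for this illustration; a statistics package would normally supply these values directly.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """CDF at x >= 0 via Simpson's rule on [0, x]; symmetry gives CDF(0) = 0.5."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_critical(df, level=0.05):
    """Two-sided critical value: the (1 - level/2) quantile, found by bisection."""
    lo, hi = 0.0, 50.0
    target = 1 - level / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)
for n in (10, 30, 50, 100):
    print(f"n={n:4d}  t critical (df={n - 2}) = {t_critical(n - 2):.3f}  "
          f"normal = {z_crit:.3f}")
```

The t critical values always exceed the normal one, and the gap shrinks as n grows, which matches the guidance above.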

Summary and Assessment
• The initial policy question: Suppose new teachers are hired so the student-teacher ratio falls by one student per class. What is the effect of this policy intervention (“treatment”) on test scores?
• Does our regression analysis using the California data set answer this convincingly? Not really: districts with low STR tend to be ones with lots of other resources and higher income families, which provide kids with more learning opportunities outside school. Because those advantages raise test scores where STR is low, this suggests that $\mathrm{corr}(u_i, STR_i) < 0$, so $E(u_i \mid X_i) \neq 0$.
• It seems that we have omitted some factors, or variables, from our analysis, and this has biased our results…
• In the next lecture, you will be introduced to multivariate regression to address the issue of omitted variable bias.

Practice questions
• Try questions 5.1, 5.4, 5.7, and 5.13.
• Answers will be provided next week.
• This is not an assessment.
• They are just for practice.