
MFIN6201
Empirical Techniques and Applications in
Finance
ONBOARD
Week 3
February 17, 2019

Summary of Lecture 2
• The sample average, $\bar Y$, is an estimator of the population mean, $\mu_Y$. When $Y_1, \ldots, Y_n$ are i.i.d.:
– The sampling distribution of $\bar Y$ has mean $\mu_Y$ and variance $\sigma^2_{\bar Y} = \sigma^2_Y / n$
– $\bar Y$ is unbiased
– by the law of large numbers, $\bar Y$ is consistent
– by the central limit theorem, $\bar Y$ has an approximately normal
sampling distribution when the sample size is large
• The t-statistic is used to test the null hypothesis that the population mean takes on a particular value. If n is large, the t-statistic has a standard normal sampling distribution when the null hypothesis is true
• The t-statistic can be used to calculate the p-value associated with the null hypothesis. A small p-value is evidence against the null hypothesis

• A 95% confidence interval for μY is an interval constructed so that it contains the true value of μY in 95% of all samples
• Hypothesis tests and confidence intervals for the difference in the means of two populations are conceptually similar to tests and intervals for the mean of a single population
• The sample correlation coefficient is an estimator of the population correlation coefficient and measures the linear relationship between two variables, that is, how well their scatterplot is approximated by a straight line.
MFIN6201 – Empirical Techniques and Applications in Finance

Linear Regression with One Regressor
Outline
• The population linear regression model
• The ordinary least squares (OLS) estimator and the sample
regression line
• Measures of fit of the sample regression
• The least squares assumptions
• The sampling distribution of the OLS estimator

Linear Regression with One Regressor
Linear regression lets us estimate the slope of the population regression line
• The slope of the population regression line is the expected effect on Y of a unit change in X.
• Ultimately our aim is to estimate the causal effect on Y of a unit change in X – but for now, just think of the problem of fitting a straight line to data on two variables, Y and X.

Linear Regression with One Regressor
The problem of statistical inference for linear regression is, at a general level, the same as for estimation of the mean or of the difference between two means. Statistical, or econometric, inference about the slope entails:
• Estimation:
– How should we draw a line through the data to estimate the population slope?
– Answer: ordinary least squares (OLS).
– What are the advantages and disadvantages of OLS?
• Hypothesis testing:
– How do we test whether the slope is zero?
• Confidence intervals:
– How do we construct a confidence interval for the slope?

The Linear Regression Model (SW Section 4.1)
The population regression line: $\text{Test Score} = \beta_0 + \beta_1 \text{STR}$
$\beta_1$ = slope of population regression line = $\Delta\text{Test Score} / \Delta\text{STR}$ = change in test score for a unit change in STR
• Why are $\beta_0$ and $\beta_1$ “population” parameters?
• We would like to know the population value of $\beta_1$.
• We don’t know $\beta_1$, so we must estimate it using data.

The Population Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n$
• We have n observations, $(X_i, Y_i)$, i = 1, …, n.
• X is the independent variable or regressor
• Y is the dependent variable
• $\beta_0$ = intercept
• $\beta_1$ = slope
• $u_i$ = the regression error
• The regression error consists of omitted factors, that is, factors other than X that influence Y. The regression error also includes error in the measurement of Y.

The Population Linear Regression Model
The population regression model in a picture: Observations on Y and X (n = 7); the population regression line; and the regression error (the “error term”):

The Ordinary Least Squares Estimator
How can we estimate $\beta_0$ and $\beta_1$ from data? Recall that $\bar Y$ is the least squares estimator of $\mu_Y$: it solves
$$\min_m \sum_{i=1}^n (Y_i - m)^2$$
By analogy, we will focus on the least squares (“ordinary least squares” or “OLS”) estimator of the unknown parameters $\beta_0$ and $\beta_1$. The OLS estimator solves
$$\min_{b_0, b_1} \sum_{i=1}^n [Y_i - (b_0 + b_1 X_i)]^2$$
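Both objectives above are sums of squared deviations. A minimal Python sketch evaluates them directly. The function names `sum_sq_about` and `ssr` are illustrative and do not come from the course materials. On a made-up three-point sample, moving m away from $\bar Y$ in either direction increases $\sum_{i}(Y_i - m)^2$:

```python
def sum_sq_about(m, ys):
    """Sum of squared deviations of the observations ys from a candidate value m."""
    return sum((y - m) ** 2 for y in ys)


def ssr(b0, b1, xs, ys):
    """OLS objective: sum of squared residuals of the candidate line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


ys = [2.0, 4.0, 9.0]        # made-up sample
ybar = sum(ys) / len(ys)    # sample mean = 5.0

for m in (ybar - 0.5, ybar, ybar + 0.5):
    print(m, sum_sq_about(m, ys))
# 4.5 26.75
# 5.0 26.0
# 5.5 26.75
```

OLS applies the same idea with two unknowns: `ssr` is minimized over the pair (b0, b1) instead of over a single m.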

Mechanics of OLS
The population regression line: $\text{Test Score} = \beta_0 + \beta_1 \text{STR}$, with $\beta_1 = \Delta\text{Test Score} / \Delta\text{STR}$

The OLS estimator solves:
$$\min_{b_0, b_1} \sum_{i=1}^n [Y_i - (b_0 + b_1 X_i)]^2$$
• The OLS estimator minimizes the average squared difference between the actual values of $Y_i$ and the prediction (“predicted value”) based on the estimated line.
• This minimization problem can be solved using calculus (Appendix 4.2).
• The solutions are the OLS estimators of $\beta_0$ and $\beta_1$, denoted $\hat\beta_0$ and $\hat\beta_1$.
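Here is a sketch of the calculus step; Appendix 4.2 gives the full derivation. Setting the two partial derivatives of the OLS objective to zero gives two first-order equations. Their solution is the closed form for the OLS estimators, assuming $\sum_{i}(X_i - \bar X)^2 > 0$:

```latex
\frac{\partial}{\partial b_0}\sum_{i=1}^n [Y_i - (b_0 + b_1 X_i)]^2
  = -2\sum_{i=1}^n [Y_i - (b_0 + b_1 X_i)] = 0

\frac{\partial}{\partial b_1}\sum_{i=1}^n [Y_i - (b_0 + b_1 X_i)]^2
  = -2\sum_{i=1}^n X_i\,[Y_i - (b_0 + b_1 X_i)] = 0

\Rightarrow\quad
\hat\beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^n (X_i - \bar X)^2},
\qquad
\hat\beta_0 = \bar Y - \hat\beta_1 \bar X
```

The first equation implies that the fitted line passes through $(\bar X, \bar Y)$. Substituting $b_0 = \bar Y - b_1 \bar X$ into the second equation and rearranging gives $\hat\beta_1$.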

OLS estimates – Key Concepts
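The OLS estimators have the standard closed forms $\hat\beta_1 = \sum_{i}(X_i - \bar X)(Y_i - \bar Y) / \sum_{i}(X_i - \bar X)^2$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$. Below is a minimal Python sketch of these formulas. The function name `ols` is illustrative; in practice you would run Stata's `regress` or use a statistics library:

```python
def ols(xs, ys):
    """Return (b0_hat, b1_hat) for a regression of ys on a single regressor xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 observations")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Sum of squared deviations of X, and sum of cross-deviations of X and Y
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("X does not vary, so the slope is not identified")
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar  # the fitted line passes through (xbar, ybar)
    return b0, b1


# Points that lie exactly on Y = 1 + 2X are recovered exactly.
print(ols([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```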

Application to the California Test Score example
• Estimated slope = $\hat\beta_1 = -2.28$
• Estimated intercept = $\hat\beta_0 = 698.9$
• Estimated regression line: $\widehat{\text{Test Score}} = 698.9 - 2.28 \times \text{STR}$

Interpretation of the estimated slope and intercept
$\widehat{\text{Test Score}} = 698.9 - 2.28 \times \text{STR}$
• Districts with one more student per teacher on average have test scores that are 2.28 points lower.
• That is, $\Delta\text{Test Score} / \Delta\text{STR} = -2.28$
• The intercept (taken literally) means that, according to this estimated line, districts with zero students per teacher would have a (predicted) test score of 698.9.
• But this interpretation of the intercept makes no sense: it extrapolates the line outside the range of the data.
• Here, the intercept is not economically meaningful.
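The estimated line is linear, so the predicted change in test scores for any change in STR is $\hat\beta_1 \times \Delta\text{STR}$. Below is a small Python sketch using the slide's slope estimate. The reduction of 2 students per teacher is a hypothetical illustration:

```python
B1_HAT = -2.28  # estimated slope from the California regression


def predicted_change(delta_str):
    """Predicted change in test score for a change of delta_str in STR."""
    return B1_HAT * delta_str


# Hypothetical policy: cut the student-teacher ratio by 2
print(round(predicted_change(-2), 2))  # 4.56
```

This number is a prediction from the estimated line. Whether it equals the causal effect depends on the least squares assumptions discussed later.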

Predicted values & residuals:
One of the districts in the data set is Antelope, CA, for which STR = 19.33 and Test Score = 657.8
predicted value: $\hat Y_{\text{Antelope}} = 698.9 - 2.28 \times 19.33 = 654.8$
residual: $\hat u_{\text{Antelope}} = 657.8 - 654.8 = 3.0$
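A minimal sketch reproducing the Antelope arithmetic with the slide's estimates; the helper name `predict` is illustrative. The slide rounds the prediction to 654.8 before subtracting. Carrying full precision gives a residual of about 2.97, which rounds to the same 3.0:

```python
B0_HAT, B1_HAT = 698.9, -2.28  # estimated intercept and slope


def predict(str_value):
    """Predicted test score from the estimated regression line."""
    return B0_HAT + B1_HAT * str_value


str_antelope, score_antelope = 19.33, 657.8
y_hat = predict(str_antelope)
u_hat = score_antelope - y_hat  # residual = actual - predicted
print(round(y_hat, 1), round(u_hat, 1))  # 654.8 3.0
```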

OLS regression: STATA output

Measures of Fit (Section 4.3)
Two regression statistics provide complementary measures of how well the regression line “fits” or explains the data:
• The regression $R^2$ measures the fraction of the variance of Y that is explained by X; it is unitless and ranges between zero (no fit) and one (perfect fit)
• The standard error of the regression (SER) measures the magnitude of a typical regression residual in the units of Y.

The regression $R^2$
The regression $R^2$ is the fraction of the sample variance of $Y_i$ “explained” by the regression.
$Y_i = \hat Y_i + \hat u_i$ = OLS prediction + OLS residual
⇒ sample var($Y_i$) = sample var($\hat Y_i$) + sample var($\hat u_i$) (why?)
⇒ total sum of squares = “explained” SS + “residual” SS
Definition of $R^2$ (the sample mean of the $\hat Y_i$ equals $\bar Y$):
$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^n (\hat Y_i - \bar Y)^2}{\sum_{i=1}^n (Y_i - \bar Y)^2}$$
• $R^2 = 0$ means ESS = 0
• $R^2 = 1$ means ESS = TSS; in general, $0 \le R^2 \le 1$
• For regression with a single X, $R^2$ = the square of the correlation coefficient between X and Y
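A minimal Python sketch on made-up data computes $R^2$ as ESS/TSS. It then checks the result against the squared sample correlation between X and Y, as the last bullet states for a single regressor:

```python
import math


def fit_and_r2(xs, ys):
    """OLS fit of ys on xs; returns (R^2 computed as ESS/TSS, squared sample correlation)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)  # total sum of squares (TSS)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    y_hat = [b0 + b1 * x for x in xs]
    # Explained sum of squares; the fitted values average to ybar when there is an intercept
    ess = sum((yh - ybar) ** 2 for yh in y_hat)
    r2 = ess / syy
    r_xy = sxy / math.sqrt(sxx * syy)  # sample correlation coefficient
    return r2, r_xy ** 2


r2, r_sq = fit_and_r2([1, 2, 3, 4], [2, 3, 5, 4])
print(round(r2, 4), round(r_sq, 4))  # 0.64 0.64
```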

The Standard Error of the Regression (SER)
The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:
$$SER = \sqrt{\frac{1}{n-2}\sum_{i=1}^n (\hat u_i - \bar{\hat u})^2} = \sqrt{\frac{1}{n-2}\sum_{i=1}^n \hat u_i^2}$$
The second equality holds because $\bar{\hat u} = \frac{1}{n}\sum_{i=1}^n \hat u_i = 0$.

Root Mean Squared Error (RMSE)
The SER:
• has the units of u, which are the units of Y
• measures the average “size” of the OLS residual (the average “mistake” made by the OLS regression line)
The root mean squared error (RMSE) is closely related to the SER:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^n \hat u_i^2}$$
This measures the same thing as the SER; the only difference is that the sum of squared residuals is divided by n instead of n − 2.
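A minimal Python sketch on made-up data computes both measures from the same OLS residuals. The two values differ only in the divisor:

```python
import math


def ser_and_rmse(xs, ys):
    """Return (SER, RMSE) for an OLS regression of ys on a single regressor xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
    ser = math.sqrt(ssr / (n - 2))  # degrees-of-freedom correction for two estimated coefficients
    rmse = math.sqrt(ssr / n)       # no correction
    return ser, rmse


ser, rmse = ser_and_rmse([1, 2, 3, 4], [2, 3, 5, 4])
print(round(ser, 4), round(rmse, 4))  # 0.9487 0.6708
```

With only four observations the gap is large. It shrinks as n grows.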

Technical note:
Why divide by n − 2 instead of n − 1?
$$SER = \sqrt{\frac{1}{n-2}\sum_{i=1}^n \hat u_i^2}$$
• Division by n − 2 is a “degrees of freedom” correction, just like division by n − 1 in $s_Y^2$, except that for the SER, two parameters have been estimated ($\beta_0$ and $\beta_1$, by $\hat\beta_0$ and $\hat\beta_1$), whereas in $s_Y^2$ only one has been estimated ($\mu_Y$, by $\bar Y$).
• When n is large, it does not matter whether n, n − 1, or n − 2 is used, although the conventional formula uses n − 2 when there is a single regressor.
• For details, see SW Section 17.4
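For the same residuals, SER / RMSE $= \sqrt{n/(n-2)}$, because only the divisor differs. A short sketch shows how quickly this ratio approaches 1 as n grows:

```python
import math


def ser_to_rmse_ratio(n):
    """SER / RMSE for the same residuals: sqrt(n / (n - 2)), since only the divisor differs."""
    return math.sqrt(n / (n - 2))


for n in (10, 100, 1000, 10000):
    print(n, round(ser_to_rmse_ratio(n), 4))
# 10 1.118
# 100 1.0102
# 1000 1.001
# 10000 1.0001
```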

Application to the California Test Score example
• Estimated regression line: $\widehat{\text{Test Score}} = 698.9 - 2.28 \times \text{STR}$, $R^2 = 0.05$, SER = 18.6
• STR explains only a small fraction of the variation in test scores
• Does this make sense?
• Does this mean that STR is unimportant in a policy sense?
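With a single regressor, $R^2$ is the squared correlation between X and Y. The reported fit therefore corresponds to a modest correlation, which is negative because the estimated slope is negative. Here is a quick worked calculation from the slide's rounded numbers:

```latex
r_{X,Y} = -\sqrt{R^2} = -\sqrt{0.05} \approx -0.22,
\qquad
1 - R^2 = 0.95
```

About 95% of the sample variance of test scores is left in the residuals. A typical residual is about 18.6 test-score points, which is the SER.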

$R^2$
• In social science, low $R^2$ values in regression equations are not uncommon, especially in cross-sectional analysis
• A seemingly low $R^2$ does not necessarily mean that an OLS regression equation is useless
• It is still possible that the econometric model gives a good estimate of the relationship between the outcome (dependent) and treatment (independent) variables
• Whether this is true or not does not depend on the size of the $R^2$. One should look at the economic significance of the estimates for policy implications
• Do not put too much weight on the size of the $R^2$ in evaluating regression equations

The Least Squares Assumptions (SW Section 4.4)
• What, in a precise sense, are the properties of the sampling distribution of the OLS estimator?
• When will the estimator be unbiased?
• What is its variance?
• To answer these questions, we need to make some assumptions about how Y and X are related to each other, and about how they are collected (the sampling scheme)
• These assumptions – there are three – are known as the Least Squares Assumptions.

The Least Squares Assumptions
$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n$
• The conditional distribution of u given X has mean zero, that is, E(u|X = x) = 0.
– This implies that $\hat\beta_1$ is unbiased
• $(X_i, Y_i)$, i = 1, …, n, are i.i.d.
– This is true if (X, Y) are collected by simple random sampling
– This delivers the sampling distribution of $\hat\beta_0$ and $\hat\beta_1$
• Large outliers in X and/or Y are rare.
– Technically, X and Y have finite fourth moments
– Outliers can result in meaningless values of $\hat\beta_1$
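To see how one bad observation can distort $\hat\beta_1$, here is a minimal Python sketch on made-up numbers. The six points lie exactly on Y = 2X. In the second version, the last Y value is mistyped as 120 instead of 12:

```python
def ols_slope(xs, ys):
    """OLS slope estimate for a single regressor."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


xs = [1, 2, 3, 4, 5, 6]
clean = [2, 4, 6, 8, 10, 12]   # lies exactly on Y = 2X
typo = [2, 4, 6, 8, 10, 120]   # last value mistyped: 120 instead of 12

print(ols_slope(xs, clean))            # 2.0
print(round(ols_slope(xs, typo), 2))   # 17.43
```

A single coding error raises the estimated slope by a factor of almost nine. This is why it pays to look at the data before trusting the estimate.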

Least squares assumption #1: E(u|X = x) = 0
For any given value of X, the mean of u is zero:
Example: $\text{Test Score}_i = \beta_0 + \beta_1 \text{STR}_i + u_i$, $u_i$ = other factors
• What are some of these “other factors”?
• Is E(u|X = x) = 0 plausible for these other factors?

Least squares assumption #1, ctd.
• A benchmark for thinking about this assumption is to consider an ideal randomized controlled experiment:
• X is randomly assigned to people (students randomly assigned to different size classes; patients randomly assigned to medical treatments). Randomization is done by computer – using no information about the individual.
• Because X is assigned randomly, all other individual characteristics – the things that make up u – are distributed independently of X, so u and X are independent
• Thus, in an ideal randomized controlled experiment, E(u|X = x) = 0 (that is, LSA #1 holds)
• In actual experiments, or with observational data, we will need to think hard about whether E(u|X = x) = 0 holds.
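A small Monte Carlo sketch illustrates the difference. The data-generating process is hypothetical, with true $\beta_0 = 1$ and $\beta_1 = 2$. In the first case X is assigned at random, independently of u. In the second case the "other factors" in u contain 0.5 × X, so E(u|X = x) ≠ 0. The average OLS slope across simulated samples stays near 2 in the first case and drifts to about 2.5 in the second:

```python
import random


def ols_slope(xs, ys):
    """OLS slope estimate for a single regressor."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


def mean_slope(confounded, reps=500, n=200, beta0=1.0, beta1=2.0, seed=6201):
    """Average OLS slope across simulated samples from a hypothetical model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]  # X assigned at random
        # u = other factors; when confounded, u contains 0.5 * X, so E(u|X) != 0
        us = [(0.5 * x if confounded else 0.0) + rng.gauss(0, 1) for x in xs]
        ys = [beta0 + beta1 * x + u for x, u in zip(xs, us)]
        total += ols_slope(xs, ys)
    return total / reps


print(round(mean_slope(confounded=False), 2))  # close to the true beta1 = 2
print(round(mean_slope(confounded=True), 2))   # close to 2.5, i.e. biased
```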

Least squares assumption #1, ctd.
• The assumption implies that the unconditional mean of the population values of the random error term u equals zero:
$$E(u) = \sum_{x} E(u \mid X = x)\,\Pr(X = x) = 0$$
(written here for a discrete X). This implication follows from the so-called law of iterated expectations, which states that $E[E(u \mid X)] = E(u)$.
• The assumption also implies that the regressor X and the random error term u have zero covariance, i.e., the population values of X and u are uncorrelated:
$$\text{cov}(X, u) = E[(X - E(X))(u - E(u))] = E(Xu) - E(X)E(u) = E(Xu)$$
By the law of iterated expectations,
$$E(Xu) = E[E(Xu \mid X)] = E[X\,E(u \mid X)] = 0$$
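E(u|X) = 0 delivers both implications, E(u) = 0 and cov(X, u) = 0, without requiring u to be independent of X. Below is a simulation sketch with a hypothetical error whose spread grows with |X| but whose conditional mean is zero. The sample mean of u and the sample covariance of X and u both come out near zero, yet u is clearly not independent of X:

```python
import random

rng = random.Random(0)
n = 100_000
xs = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical error: spread depends on X, but E(u | X) = 0
us = [(1 + abs(x)) * rng.gauss(0, 1) for x in xs]

xbar = sum(xs) / n
ubar = sum(us) / n
cov_xu = sum((x - xbar) * (u - ubar) for x, u in zip(xs, us)) / (n - 1)
print(round(ubar, 3), round(cov_xu, 3))  # both near 0

# Average of u**2 within each group estimates var(u | group), since E(u | X) = 0
small = [u * u for x, u in zip(xs, us) if abs(x) <= 1]
large = [u * u for x, u in zip(xs, us) if abs(x) > 1]
var_small = sum(small) / len(small)
var_large = sum(large) / len(large)
print(round(var_small, 2), round(var_large, 2))  # u is more spread out where |X| > 1
```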

Least squares assumption #2
(Xi,Yi), i = 1,…,n are i.i.d.
This arises automatically if the entity (individual, district) is
sampled by simple random sampling:
• The entities are selected from the same population, so (Xi,Yi) are identically distributed for all i = 1,…,n.
• The entities are selected at random, so the values of (X, Y) for different entities are independently distributed.
The main place we will encounter non-i.i.d. sampling is when data are recorded over time for the same entity (panel data and time series data) – we will deal with that complication when we cover panel data.

LSA #3: Large outliers are rare
Technical statement: $E(X^4) < \infty$ and $E(Y^4) < \infty$
• A large outlier is an extreme value of X or Y
• On a technical level, if X and Y are bounded, then they have finite fourth moments. (Standardized test scores automatically satisfy this; STR, family income, etc. satisfy this too.)
• The assumption is used to justify large-sample approximations: finite fourth moments ensure, via the law of large numbers, that sample variances are consistent
• The substance of this assumption is that a large outlier can strongly influence the results, so we need to rule out large outliers.
• Look at your data! If you have a large outlier, is it a typo? Does it belong in your data set? Why is it an outlier?

OLS can be sensitive to an outlier:
• Is the lone point an outlier in X or Y?
• In practice, outliers are often data glitches (coding or recording problems). Sometimes they are observations that really shouldn’t be in your data set.
• Plot your data!

OLS Estimator Sampling Distribution
The OLS estimator is computed from a sample of data. A different sample yields a different value of $\hat\beta_1$. This is the source of the “sampling uncertainty” of $\hat\beta_1$. We want to:
• quantify the sampling uncertainty associated with $\hat\beta_1$
• use $\hat\beta_1$ to test hypotheses such as $\beta_1 = 0$
• construct a confidence interval for $\beta_1$
• All these require figuring out the sampling distribution of the OLS estimator. Two steps to get there:
– Probability framework for linear regression
– Distribution of the OLS estimator

Probability Framework for Linear Regression
The probability framework for linear regression is summarized by the three least squares assumptions.
Population: the group of interest (ex: all possible school districts)
Random variables: Y, X (ex: Test Score, STR)
Joint distribution of (Y, X).
We assume:
• The population regression function is linear
• E(u|X) = 0 (1st Least Squares Assumption)
• X, Y have nonzero finite fourth moments (3rd L.S.A.)
Data collection by simple random sampling implies:
• $\{(X_i, Y_i)\}$, i = 1, …, n, are i.i.d. (2nd L.S.A.)

The Sampling Distribution of $\hat\beta_1$
• Like $\bar Y$, $\hat\beta_1$ has a sampling distribution.
• What is $E(\hat\beta_1)$?
– If $E(\hat\beta_1) = \beta_1$, then OLS is unbiased, which is a good thing!
• What is $\text{var}(\hat\beta_1)$? (measure of sampling uncertainty)
– We need to derive a formula so we can compute the standard error of $\hat\beta_1$
• What is the distribution of $\hat\beta_1$ in small samples?
– It is very complicated in general
• What is the distribution of $\sqrt{n}(\hat\beta_1 - \beta_1)$ in large samples?
– In large samples, $\sqrt{n}(\hat\beta_1 - \beta_1)$ is normally distributed.

Mean and Variance of the sampling distribution of $\hat\beta_1$
Some preliminary algebra:
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
$$\bar Y = \beta_0 + \beta_1 \bar X + \bar u$$
$$Y_i - \bar Y = \beta_1 (X_i - \bar X) + (u_i - \bar u)$$
Thus,
$$\hat\beta_1 = \frac{\sum_{i=1}^n (Y_i - \bar Y)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2} = \frac{\sum_{i=1}^n [\beta_1 (X_i - \bar X) + (u_i - \bar u)](X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}$$
$$\hat\beta_1 = \beta_1 \frac{\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2} + \frac{\sum_{i=1}^n (u_i - \bar u)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}$$
so
$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X)(u_i - \bar u)}{\sum_{i=1}^n (X_i - \bar X)^2}$$
Now,
$$\sum_{i=1}^n (X_i - \bar X)(u_i - \bar u) = \sum_{i=1}^n (X_i - \bar X) u_i - \sum_{i=1}^n (X_i - \bar X)\bar u = \sum_{i=1}^n (X_i - \bar X) u_i - \left[\left(\sum_{i=1}^n X_i\right) - n\bar X\right]\bar u = \sum_{i=1}^n (X_i - \bar X) u_i$$
Substitute $\sum_{i=1}^n (X_i - \bar X)(u_i - \bar u) = \sum_{i=1}^n (X_i - \bar X) u_i$ into the expression for $\hat\beta_1 - \beta_1$:
$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X) u_i}{\sum_{i=1}^n (X_i - \bar X)^2}$$
$E(\hat\beta_1)$ and $\text{var}(\hat\beta_1)$:
$$E(\hat\beta_1 - \beta_1) = E\left[\frac{\sum_{i=1}^n (X_i - \bar X) u_i}{\sum_{i=1}^n (X_i - \bar X)^2}\right] = \cdots$$