MFIN6201
Empirical Techniques and Applications in Finance
Week 2
University of New South Wales
e-mail: jaehoon.lee@unsw.edu.au
Dr. Jaehoon Lee
School of Banking and Finance
Semester 2, 2017
Last update: 2 August 2017
Review of Statistical Theory
• The probability framework for statistical inference
• Estimation
• Hypothesis testing
• Confidence intervals
MFIN6201 – Empirical Techniques and Applications in Finance
Distribution of a sample of data
The distribution of a sample of data drawn randomly from a population: $Y_1, \ldots, Y_n$. We will assume simple random sampling:
• Choose an individual (district, entity) at random from the population
Randomness and data
• Prior to sample selection, the value of Y is random because the individual selected is random
• Once the individual is selected and the value of Y is observed, then Y is just a number – not random
• The data set is (Y1, Y2, …, Yn), where Yi = value of Y for the i-th individual (district, entity) sampled
Distribution of $Y_1, \ldots, Y_n$ under simple random sampling
Because individuals #1 and #2 are selected at random, the value of $Y_1$ carries no information about $Y_2$. Thus:
• $Y_1$ and $Y_2$ are independently distributed
• $Y_1$ and $Y_2$ come from the same distribution; that is, $Y_1$ and $Y_2$ are identically distributed
• That is, under simple random sampling, $Y_1$ and $Y_2$ are independently and identically distributed (i.i.d.)
• More generally, under simple random sampling, $Y_i$ for $i = 1, \ldots, n$ are i.i.d.
Estimation
• Suppose $Y_1, \ldots, Y_n$ are independent random draws from an identical distribution.
  ⇒ this setup is called i.i.d.
• Sample average
$$\bar{Y} \equiv \frac{1}{n}(Y_1 + \cdots + Y_n) = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
• $\bar{Y}$ is the natural estimator of the mean $E[Y]$, but the two are not the same.
• Remember, $\bar{Y}$ is the best guess of $E[Y]$, but not $E[Y]$ itself!
• $\bar{Y}$ is itself a random variable.
Sampling distribution of $\bar{Y}$
$\bar{Y}$ is a random variable, and its properties are determined by the sampling distribution of $\bar{Y}$:
• The individuals in the sample are drawn at random.
• Thus the values of $(Y_1, \ldots, Y_n)$ are random.
• Thus functions of $(Y_1, \ldots, Y_n)$, such as $\bar{Y}$, are random: had a different sample been drawn, they would have taken on a different value.
• The distribution of $\bar{Y}$ over different possible samples of size $n$ is called the sampling distribution of $\bar{Y}$.
• The mean and variance of $\bar{Y}$ are the mean and variance of its sampling distribution, $E(\bar{Y})$ and $\mathrm{var}(\bar{Y})$.
• The concept of the sampling distribution underpins all of econometrics.
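One way to see a sampling distribution is to simulate it: draw many samples of the same size, compute $\bar{Y}$ in each, and tabulate the results. Below is a minimal Python sketch, assuming a Bernoulli population with $\Pr(Y = 1) = 0.78$ and samples of size $n = 2$. The function name `sample_means`, the seed, and the 10,000 repetitions are illustrative choices, not part of the course material.

```python
import random
from collections import Counter


def sample_means(p, n, reps, seed=0):
    """Draw `reps` independent samples of size n from a Bernoulli(p)
    population and return the sample average of each sample."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        means.append(sum(sample) / n)
    return means


# Each entry of `means` is one realisation of Ybar; their relative
# frequencies approximate the sampling distribution of Ybar.
means = sample_means(p=0.78, n=2, reps=10_000)
freq = Counter(means)
for value in sorted(freq):
    print(f"Ybar = {value}: relative frequency {freq[value] / len(means):.4f}")
```

Because the frequencies come from a finite number of random draws, they only approximate the sampling distribution; a different seed gives slightly different numbers.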
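Sampling distribution of $\bar{Y}$: example
Example: suppose $Y$ is a Bernoulli random variable,
$$Y = \begin{cases} 1 & \text{with prob. } 0.78 \\ 0 & \text{with prob. } 0.22 \end{cases}$$
Then,
$$E[Y] = p = 0.78$$
$$\mathrm{var}(Y) = p(1 - p) = 0.78 \times 0.22 = 0.1716$$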
Sampling distribution of $\bar{Y}$: example
The sampling distribution of $\bar{Y}$ depends on $n$. Consider $n = 2$. The sampling distribution of $\bar{Y} = \frac{1}{2}(Y_1 + Y_2)$ is
$$\Pr(\bar{Y} = 0) = 0.22^2 = 0.0484$$
$$\Pr(\bar{Y} = \tfrac{1}{2}) = 2 \times 0.22 \times 0.78 = 0.3432$$
$$\Pr(\bar{Y} = 1) = 0.78^2 = 0.6084$$
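These three probabilities can be checked with a short calculation: for a Bernoulli($p$) population the number of ones $k$ in a sample of size $n$ is Binomial($n$, $p$), and $\bar{Y} = k/n$. A minimal Python sketch under that assumption (the function name `exact_ybar_distribution` is hypothetical):

```python
from math import comb


def exact_ybar_distribution(p, n):
    """Exact sampling distribution of Ybar when Y ~ Bernoulli(p).
    Ybar = k / n, where k ~ Binomial(n, p) counts the ones in the sample."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


dist = exact_ybar_distribution(p=0.78, n=2)
for value, prob in dist.items():
    print(f"Pr(Ybar = {value}) = {prob:.4f}")
# Pr(Ybar = 0.0) = 0.0484
# Pr(Ybar = 0.5) = 0.3432
# Pr(Ybar = 1.0) = 0.6084
```

Raising `n` shows the sampling distribution of $\bar{Y}$ concentrating around $p = 0.78$.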
[Figure: The sampling distribution of $\bar{Y}$]
Moments of $\bar{Y}$: mean
• Mean
$$E[\bar{Y}] = E\left[\frac{1}{n}(Y_1 + \cdots + Y_n)\right] = \frac{1}{n}\sum_{i=1}^{n} E[Y_i] = \mu$$
– $E[\bar{Y}] = \mu$, thus $\bar{Y}$ is an unbiased estimator of $\mu$
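Moments of $\bar{Y}$: variance
• Variance
$$\mathrm{var}(\bar{Y}) = \mathrm{var}\left(\frac{1}{n}(Y_1 + \cdots + Y_n)\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{var}(Y_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}$$
where $\sigma^2 = \mathrm{var}(Y_i)$, and the second equality uses $\mathrm{cov}(Y_i, Y_j) = 0$ for $i \neq j$, because $Y_i$ and $Y_j$ are i.i.d.
– Thus, $\mathrm{var}(\bar{Y})$ decreases with $n$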
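For the Bernoulli example, both moment formulas can be checked exactly. The sketch below, assuming a Bernoulli population with $p = 0.78$ (so $\sigma^2 = 0.1716$), computes $E(\bar{Y})$ and $\mathrm{var}(\bar{Y})$ directly from the binomial distribution of the number of ones and compares the variance with $\sigma^2/n$; `ybar_moments` is a hypothetical helper name.

```python
from math import comb


def ybar_moments(p, n):
    """Exact mean and variance of Ybar for an i.i.d. Bernoulli(p) sample of
    size n. The number of ones k is Binomial(n, p) and Ybar = k / n."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


p = 0.78
sigma2 = p * (1 - p)  # population variance of Y (0.1716)
for n in (2, 10, 100):
    mean, var = ybar_moments(p, n)
    # Columns: n, E(Ybar), exact var(Ybar), sigma^2 / n, sd(Ybar).
    # The last column shrinks like 1 / sqrt(n).
    print(n, round(mean, 6), round(var, 6), round(sigma2 / n, 6), round(var ** 0.5, 4))
```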
Moments of $\bar{Y}$
$$E(\bar{Y}) = \mu_Y, \qquad \mathrm{var}(\bar{Y}) = \frac{\sigma_Y^2}{n}$$
Implications:
• $\bar{Y}$ is an unbiased estimator of $\mu_Y$ (that is, $E(\bar{Y}) = \mu_Y$)
• $\mathrm{var}(\bar{Y})$ is inversely proportional to $n$
  – The spread of the sampling distribution is proportional to $1/\sqrt{n}$
  – Thus the sampling uncertainty associated with $\bar{Y}$ is proportional to $1/\sqrt{n}$ (larger samples, less uncertainty, but with a square-root law: quadrupling $n$ only halves the standard deviation of $\bar{Y}$)
Sampling distribution of $\bar{Y}$ when $n$ is large
For small sample sizes, the distribution of $\bar{Y}$ is complicated, but if $n$ is large, the sampling distribution is simple!
• As $n$ increases, the distribution of $\bar{Y}$ becomes more tightly centered around $\mu_Y$ (Law of Large Numbers)
• Moreover, the distribution of $\sqrt{n}\,(\bar{Y} - \mu_Y)$ becomes normal (Central Limit Theorem)
The Law of Large Numbers
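An estimator is consistent if the probability that it falls within any given interval around the true population value tends to one as the sample size increases.
If $(Y_1, \ldots, Y_n)$ are i.i.d. and $\sigma_Y^2 < \infty$, then $\bar{Y}$ is a consistent estimator of $\mu_Y$; that is, for any $\varepsilon > 0$,
$$\Pr\left[|\bar{Y} - \mu_Y| < \varepsilon\right] \to 1 \quad \text{as } n \to \infty,$$
or, equivalently,
$$\Pr\left[|\bar{Y} - \mu_Y| > \varepsilon\right] \to 0 \quad \text{as } n \to \infty,$$
which can be written $\bar{Y} \xrightarrow{p} \mu_Y$ ("$\bar{Y} \xrightarrow{p} \mu_Y$" means "$\bar{Y}$ converges in probability to $\mu_Y$").
The math requires Chebyshev's inequality, which states that $\Pr[|X - \mu| \geq a] \leq \mathrm{var}(X)/a^2$ for any $a > 0$, where $\mu = E(X)$. Applying it to $X = \bar{Y}$ with $a = \varepsilon$ gives
$$\Pr\left[|\bar{Y} - \mu_Y| \geq \varepsilon\right] \leq \frac{\mathrm{var}(\bar{Y})}{\varepsilon^2} = \frac{\sigma_Y^2}{n\varepsilon^2}.$$
Thus, as $n \to \infty$, $\mathrm{var}(\bar{Y}) = \sigma_Y^2/n \to 0$, which implies that $\Pr[|\bar{Y} - \mu_Y| < \varepsilon] \to 1$.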
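The Chebyshev argument can be illustrated numerically. The sketch below assumes a Bernoulli population with $p = 0.78$ and $\varepsilon = 0.05$ (both illustrative choices); it computes the exact probability $\Pr[|\bar{Y} - \mu_Y| \geq \varepsilon]$ from the binomial distribution and prints it next to the Chebyshev bound $\sigma_Y^2/(n\varepsilon^2)$. Function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k successes, computed in log space
    so that large n does not overflow floating point."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def prob_far_from_mean(p, n, eps):
    """Exact Pr(|Ybar - p| >= eps) for Ybar, the mean of n i.i.d. Bernoulli(p)."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k / n - p) >= eps)


p, eps = 0.78, 0.05
sigma2 = p * (1 - p)
for n in (10, 100, 1000):
    exact = prob_far_from_mean(p, n, eps)
    chebyshev = sigma2 / (n * eps**2)  # upper bound; can exceed 1 for small n
    print(f"n={n:5d}  Pr(|Ybar-mu|>=eps)={exact:.4f}  Chebyshev bound={chebyshev:.4f}")
```

The bound is loose (for $n = 10$ it exceeds 1), but it still forces the probability to zero as $n$ grows, which is all the LLN argument needs.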
The Central Limit Theorem (CLT)
If $(Y_1, \ldots, Y_n)$ are i.i.d. and $0 < \sigma_Y^2 < \infty$, then when $n$ is large, the distribution of $\bar{Y}$ is well approximated by a normal distribution.
• $\bar{Y}$ is approximately distributed $N(\mu_Y, \sigma_Y^2/n)$ ("normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2/n$")
• Standardized $\bar{Y}$,
$$\frac{\bar{Y} - E(\bar{Y})}{\sqrt{\mathrm{var}(\bar{Y})}} = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}},$$
is approximately distributed as $N(0, 1)$ (standard normal)
• The larger is $n$, the better is the approximation
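One way to gauge the normal approximation is the largest vertical gap between the exact CDF of standardized $\bar{Y}$ and the standard normal CDF $\Phi$. The sketch below assumes a Bernoulli population with $p = 0.78$, for which the exact distribution of $\bar{Y}$ is available from the binomial distribution; helper names are hypothetical, and $\Phi$ is computed from `math.erf`.

```python
from math import erf, exp, lgamma, log, sqrt


def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binom_pmf(k, n, p):
    """Binomial(n, p) pmf evaluated in log space to avoid overflow."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))


def max_cdf_gap(p, n):
    """Largest gap between the exact CDF of the standardized mean
    (Ybar - p) / sqrt(p(1-p)/n) of n Bernoulli(p) draws and Phi.
    The exact CDF is a step function, so the gap is checked just below
    and at every jump, where the supremum must occur."""
    se = sqrt(p * (1 - p) / n)
    cdf_left = 0.0  # exact CDF just below the current jump
    gap = 0.0
    for k in range(n + 1):
        z = (k / n - p) / se
        cdf_right = cdf_left + binom_pmf(k, n, p)  # exact CDF at the jump
        phi = normal_cdf(z)
        gap = max(gap, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return gap


for n in (10, 100, 1000):
    print(n, round(max_cdf_gap(0.78, n), 4))
```

The gap shrinks as $n$ grows, in line with "the larger is $n$, the better is the approximation"; with a discrete population like this one it stays above zero for any finite $n$.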
[Figure: Sampling distribution of $\bar{Y}$]
[Figure: Sampling distribution of $\bar{Y}$ and of the standardized $(\bar{Y} - E(\bar{Y}))/\sqrt{\mathrm{var}(\bar{Y})}$]
Summary: Sampling Distribution of $\bar{Y}$
For $Y_1, \ldots, Y_n$ i.i.d. with $0 < \sigma_Y^2 < \infty$,
• The exact (finite-sample) sampling distribution of $\bar{Y}$ has mean $\mu_Y$ ("$\bar{Y}$ is an unbiased estimator of $\mu_Y$") and variance $\sigma_Y^2/n$
• Other than its mean and variance, the exact distribution of $\bar{Y}$ is complicated and depends on the distribution of $Y$ (the population distribution)
Summary: Sampling Distribution of $\bar{Y}$
When $n$ is large, the sampling distribution simplifies:
• Law of Large Numbers
$$\bar{Y} \xrightarrow{p} \mu_Y$$
• Central Limit Theorem
$$\sqrt{n}\,\frac{\bar{Y} - \mu_Y}{\sigma_Y} \xrightarrow{d} N(0, 1)$$
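$\bar{Y}$ is the BLUE of $\mu_Y$
• $\bar{Y}$ is an unbiased estimator of $\mu_Y$
$$E(\bar{Y}) = \mu_Y$$
• $\bar{Y}$ is a consistent estimator of $\mu_Y$
$$\bar{Y} \xrightarrow{p} \mu_Y$$
• $\bar{Y}$ is the most efficient linear unbiased estimator of $\mu_Y$
$$\mathrm{var}(\bar{Y}) < \mathrm{var}(\tilde{\mu}_Y)$$
for any other linear unbiased estimator $\tilde{\mu}_Y$ (a weighted average of $Y_1, \ldots, Y_n$ whose weights sum to one)
• Thus, $\bar{Y}$ is the Best Linear Unbiased Estimator (BLUE) of $\mu_Y$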
How about $Y_1$ instead of $\bar{Y}$ as an estimator of $\mu_Y$? I know it sounds stupid, but why is it stupid?
• Unbiasedness: YES
  – $E[Y_1] = \mu_Y$
• Consistency: NO
  – $Y_1$ is only $Y_1$. It does not converge to $\mu_Y$ even as $n$ increases to infinity.
• Efficiency: NO
  – $\mathrm{var}(Y_1) = \sigma_Y^2 > \sigma_Y^2/n = \mathrm{var}(\bar{Y})$ for $n > 1$, thus $Y_1$ is not efficient.
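A small simulation makes the comparison concrete. The sketch assumes a normal population with $\mu_Y = 5$ and $\sigma_Y = 2$, samples of size $n = 25$, and 5,000 repetitions (all illustrative choices); for each simulated sample it records both $Y_1$ and $\bar{Y}$. The function name `compare_estimators` is hypothetical.

```python
import random
import statistics


def compare_estimators(mu, sigma, n, reps, seed=1):
    """Draw `reps` samples of size n from N(mu, sigma^2) and record two
    estimators of mu for each sample: the first observation Y1 and the
    sample average Ybar. Returns (mean, variance) of each estimator
    across the simulated samples."""
    rng = random.Random(seed)
    y1_draws, ybar_draws = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        y1_draws.append(sample[0])
        ybar_draws.append(statistics.fmean(sample))
    return {
        "Y1": (statistics.fmean(y1_draws), statistics.variance(y1_draws)),
        "Ybar": (statistics.fmean(ybar_draws), statistics.variance(ybar_draws)),
    }


results = compare_estimators(mu=5.0, sigma=2.0, n=25, reps=5_000)
for name, (mean, var) in results.items():
    print(f"{name:>4}: mean={mean:.3f}  variance={var:.3f}")
```

Both estimators should average close to 5 (unbiasedness), but the variance of $\bar{Y}$ should be near $\sigma_Y^2/n = 0.16$, far below $\mathrm{var}(Y_1) = \sigma_Y^2 = 4$.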
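$\bar{Y}$ is the least squares estimator of $\mu_Y$
• $E[Y_i] = \mu_Y$ means $Y_i = \mu_Y + \epsilon_i$, where $\epsilon_i$ is some random noise with $E[\epsilon_i] = 0$.
• SSE (sum of squared errors)
$$\sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (Y_i - \mu_Y)^2$$
• The best estimator would make these errors as small as possible. Thus, it would be a solution to
$$\min_m \mathrm{SSE} = \min_m \sum_{i=1}^{n} (Y_i - m)^2$$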
$\bar{Y}$ is the least squares estimator of $\mu_Y$
• Optimization (maximization / minimization)
  $f(x)$ is maximized / minimized at $x^*$ where $f'(x^*) = 0$ (the first-order condition)
• To minimize SSE, one needs to differentiate it:
$$\frac{d}{dm}\mathrm{SSE} = -2\sum_{i=1}^{n}(Y_i - m) = 0 \;\Rightarrow\; m = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}$$
  Since $\frac{d^2}{dm^2}\mathrm{SSE} = 2n > 0$, this stationary point is a minimum.
• Therefore, $\bar{Y}$ minimizes SSE. That's why $\bar{Y}$ is also called the least squares estimator of $\mu_Y$.
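The first-order condition can be checked by brute force: evaluate SSE on a fine grid of candidate values $m$ and keep the smallest. The sketch below uses a made-up sample of five observations and hypothetical helper names (`sse`, `argmin_sse_grid`); it is a numerical check, not a recommended way to compute $\bar{Y}$.

```python
def sse(ys, m):
    """Sum of squared errors from using m as the estimate of mu_Y."""
    return sum((y - m) ** 2 for y in ys)


def argmin_sse_grid(ys, lo, hi, steps=10_000):
    """Brute-force minimiser of SSE over an evenly spaced grid of
    candidate values m in [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda m: sse(ys, m))


ys = [3.0, 7.0, 4.0, 9.0, 2.0]  # hypothetical sample
ybar = sum(ys) / len(ys)
m_star = argmin_sse_grid(ys, lo=0.0, hi=10.0)
print(ybar, m_star)  # prints 5.0 5.0
```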
Sample Selection Bias
What will happen to the estimator $\bar{Y}$ when we have non-random sampling?
Textbook example:
Suppose that, to estimate the monthly national unemployment rate, a statistical agency adopts a sampling scheme in which interviewers survey working-age adults sitting in city parks at 10 a.m. on the second Wednesday of the month. Because most employed people are at work at that hour (not sitting in the park!), the unemployed are overly represented in the sample, and an estimate of the unemployment rate based on this sampling plan would be biased. This bias arises because this sampling scheme overrepresents, or oversamples, the unemployed members of the population.
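The size of the bias depends on how much more likely the unemployed are to be sampled. Writing $u$ for the true unemployment rate, Bayes' rule gives the expected share of unemployed people among those found in the park. The sketch below uses hypothetical probabilities of being in the park at 10 a.m.; they are not from the textbook example, and the function name is illustrative.

```python
def sampled_unemployment_rate(u, p_park_unemployed, p_park_employed):
    """Expected unemployment rate among people interviewed in the park.

    u                 : true unemployment rate in the population
    p_park_unemployed : Pr(in the park at 10 a.m. | unemployed)  (hypothetical)
    p_park_employed   : Pr(in the park at 10 a.m. | employed)    (hypothetical)

    Bayes' rule: Pr(unemployed | in park)
        = u * p_u / (u * p_u + (1 - u) * p_e)
    """
    in_park_and_unemployed = u * p_park_unemployed
    in_park = in_park_and_unemployed + (1 - u) * p_park_employed
    return in_park_and_unemployed / in_park


true_rate = 0.05
# Unemployed assumed 10x as likely to be in the park: estimate ~0.34, not 0.05.
print(sampled_unemployment_rate(true_rate, 0.20, 0.02))
# Equal chances of being sampled: no bias, estimate equals the true rate.
print(sampled_unemployment_rate(true_rate, 0.10, 0.10))
```

The estimate matches the true rate only when both groups are equally likely to be sampled; oversampling the unemployed biases it upward.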
Review of Statistical Theory
• The probability framework for statistical inference
• Estimation
• Hypothesis testing
• Confidence intervals
Hypothesis Testing
The hypothesis testing problem (for the mean):
• make a provisional decision, based on the evidence at hand, whether a null hypothesis is true,
• or instead that some alternative hypothesis is true.
• That is, test
  – $H_0: E(Y) = \mu_{Y,0}$ vs. $H_1: E(Y) \neq \mu_{Y,0}$ (2-sided)
  – $H_0: E(Y) = \mu_{Y,0}$ vs. $H_1: E(Y) > \mu_{Y,0}$ (1-sided, >)
  – $H_0: E(Y) = \mu_{Y,0}$ vs. $H_1: E(Y) < \mu_{Y,0}$ (1-sided, <)
• $\mu_{Y,0}$ is the hypothesized value based on your conjecture or null hypothesis
Hypothesis Testing
Suppose your null hypothesis is
$$H_0: \mu_Y = 0$$
and your actual estimate is $\bar{Y}^{act} = 1$. Can you reject the null hypothesis?
• Case 1: $H_0$ is wrong, and the true $\mu_Y \neq 0$
• Case 2: $H_0$ is right, but $\bar{Y}^{act} \neq \mu_Y$ because of sampling error
How to decide? You need to ask how likely a gap this large would be under Case 2, i.e., if it were due to sampling error alone!
Hypothesis testing and p-value
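If $H_0$ is right, the gap between $\bar{Y}^{act}$ and $\mu_{Y,0}$ is due to sampling error alone. The p-value is the probability of a gap at least this large in that case:
$$\text{p-value} \equiv \Pr_{H_0}\left(|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\right)$$
It is computed assuming $H_0$ is true; it is not the probability that $H_0$ is true.
You can reject the null hypothesis if the p-value is very small.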
Some terminology for testing statistical hypotheses
• p-value = probability of drawing a statistic (e.g., $\bar{Y}$) at least as adverse to the null as the value actually computed with your data, assuming that the null hypothesis is true.
• The significance level of a test is a pre-specified probability of incorrectly rejecting the null, when the null is true.
• For example, if your p-value is 3%, you can reject the null hypothesis at 5% significance level but not at 1% significance level.
Type I and II errors
• Type I error: incorrect rejection of a true null hypothesis
  – If the significance level is 5%, the test incorrectly rejects a true null hypothesis with 5% probability
• Type II error: failure to reject a false null hypothesis
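The 5% statement can be checked by simulation: generate many samples for which $H_0$ is true, test each one, and count rejections. The sketch assumes $Y \sim N(\mu_{Y,0}, 1)$ with $\sigma_Y = 1$ treated as known, $n = 100$, and the two-sided rule $|(\bar{Y} - \mu_{Y,0})/(\sigma_Y/\sqrt{n})| > 1.96$, where 1.96 is the standard normal critical value for a 5% test; the function name and parameter values are hypothetical.

```python
import random
import statistics
from math import sqrt


def type_one_error_rate(mu0, sigma, n, reps, crit=1.96, seed=2):
    """Monte Carlo estimate of how often a 2-sided test of H0: E(Y) = mu0
    rejects when H0 is TRUE. Samples come from N(mu0, sigma^2), so every
    rejection is a Type I error. sigma is treated as known, and the test
    rejects when |Ybar - mu0| / (sigma / sqrt(n)) > crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (statistics.fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > crit:
            rejections += 1
    return rejections / reps


rate = type_one_error_rate(mu0=0.0, sigma=1.0, n=100, reps=5_000)
print(rate)  # close to the 5% significance level, up to simulation noise
```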
Calculating the p-value
$$\text{p-value} = \Pr_{H_0}\left[|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\right]$$
where $\bar{Y}^{act}$ is the value of $\bar{Y}$ actually observed (nonrandom).
The p-value is the probability of drawing $\bar{Y}$ at least as far in the tails of its distribution under the null hypothesis as the sample average you actually computed.
To compute the p-value, you need to know the sampling distribution of $\bar{Y}$, which is complicated if $n$ is small. If $n$ is large, however, you can use the normal approximation (CLT).
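With the normal approximation, the two-sided p-value is $2\Phi(-|z|)$ with $z = (\bar{Y}^{act} - \mu_{Y,0})/\sigma_{\bar{Y}}$ and $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$. A minimal Python sketch, assuming $\sigma_{\bar{Y}}$ is known; in the example, $\mu_{Y,0} = 0$ and $\bar{Y}^{act} = 1$ follow the earlier hypothesis-testing slide, while $\sigma_Y = 4$ and $n = 64$ are made-up values. The function name is hypothetical.

```python
from math import erfc, sqrt


def two_sided_p_value(ybar_act, mu0, sigma_ybar):
    """Large-sample (CLT) p-value for H0: E(Y) = mu0 against H1: E(Y) != mu0.

    Computes Pr_H0(|Ybar - mu0| > |ybar_act - mu0|) = 2 * Phi(-|z|), where
    z = (ybar_act - mu0) / sigma_ybar, using the identity
    2 * Phi(-|z|) = erfc(|z| / sqrt(2)).
    """
    z = (ybar_act - mu0) / sigma_ybar
    return erfc(abs(z) / sqrt(2))


# Hypothetical inputs: sigma_Y = 4 and n = 64 are assumed, so sigma_ybar = 0.5.
p = two_sided_p_value(ybar_act=1.0, mu0=0.0, sigma_ybar=4 / sqrt(64))
print(round(p, 4))  # 0.0455
for alpha in (0.05, 0.01):
    print(f"reject H0 at the {alpha:.0%} level?", p < alpha)
```

With these assumed numbers $z = 2$ and the p-value is about 4.6%, so $H_0$ is rejected at the 5% level but not at the 1% level.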
Calculating the p-value
$$\text{p-value} = \Pr_{H_0}\left[|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\right]$$