ECON3206/5206: Review of Linear regression model

Course materials subject to Copyright
UNSW Sydney owns copyright in these materials (unless stated otherwise). The material is subject to copyright under Australian law and overseas under
international treaties.
The materials are provided for use by enrolled UNSW students. The materials, or any part, may not be copied, shared or distributed, in print or digitally,
outside the course without permission.
Students may only copy a reasonable portion of the material for personal research or study or for criticism or review. Under no circumstances may these
materials be copied or reproduced for sale or commercial purposes without prior written permission of UNSW Sydney.

Statement on class recording
To ensure the free and open discussion of ideas, students may not record, by any means, classroom lectures, discussion and/or activities without the
advance written permission of the instructor, and any such recording properly approved in advance can be used solely for the student's own private use.

WARNING: Your failure to comply with these conditions may lead to disciplinary action, and may give rise to a civil action or a criminal offence under the
THE ABOVE INFORMATION MUST NOT BE REMOVED FROM THIS MATERIAL.

Dr. (ECON3206/5206) Pre-requisite c©UNSW 1 / 28


School of Economics

©Copyright University of New South Wales 2020. All rights reserved. This copyright notice
must not be removed from this material.


Probability distribution

• Suppose we are interested in wage rates in the United States. Wages vary across workers and
can be described using a probability distribution.

• Formally, we view the wage of an individual worker as a random variable wage with
probability distribution

F (u) = Pr(wage ≤ u)

• A person's wage is random: we do not know the wage before it is measured. Observed wages are
realizations from the distribution F.

• We usually do not know F, but we can learn about the distribution from many realizations of the
wage variable.
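A minimal sketch of learning about F from realizations: the empirical distribution function F̂(u) = (1/n)·#{i : wage_i ≤ u} is the sample counterpart of F(u) = Pr(wage ≤ u). The wage values below are hypothetical, not survey data, and the function name empirical_cdf is illustrative.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_hat, where F_hat(u) is the share of observations <= u."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_hat(u):
        # bisect_right counts how many sorted values are <= u
        return bisect_right(ordered, u) / n

    return F_hat


# Hypothetical hourly wages (dollars), for illustration only
wages = [12.0, 15.5, 18.0, 19.2, 22.0, 25.0, 31.0, 45.0]
F_hat = empirical_cdf(wages)
print(F_hat(19.2))  # 0.5: four of the eight wages are at or below $19.20
```

With more realizations, F̂ gets closer to the unknown F at every u.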


Probability distribution

CHAPTER 3. CONDITIONAL EXPECTATION AND PROJECTION 29

[Figure 3.1: Wage Distribution and Density. All full-time U.S. workers. Left panel: distribution function; right panel: density. Horizontal axes: dollars per hour.]

In Figure 3.1 we display estimates¹ of the probability distribution function (on the left) and
density function (on the right) of U.S. wage rates in 2009. We see that the density is peaked around
$15, and most of the probability mass appears to lie between $10 and $40. These are ranges for
typical wage rates in the U.S. population.

Important measures of central tendency are the median and the mean. The median m of a
continuous² distribution F is the unique solution to

$$F(m) = \tfrac{1}{2}.$$

The median U.S. wage ($19.23) is indicated in the left panel of Figure 3.1 by the arrow. The median
is a robust³ measure of central tendency, but it is tricky to use for many calculations as it is not
a linear operator. For this reason the median is not the dominant measure of central tendency in
econometrics.

As discussed in Sections 2.2 and 2.14, the expectation or mean of a random variable y with
density f is

$$\mu = E(y) = \int_{-\infty}^{\infty} u f(u)\,du.$$

The mean U.S. wage ($23.90) is indicated in the right panel of Figure 3.1 by the arrow. Here
we have used the common and convenient convention of using the single character y to denote a
random variable, rather than the more cumbersome label wage.

The mean is a convenient measure of central tendency because it is a linear operator and arises
naturally in many economic models. A disadvantage of the mean is that it is not robust⁴, especially
in the presence of substantial skewness or thick tails, which are both features of the wage distribution

¹ The distribution and density are estimated nonparametrically from the sample of 50,742 full-time non-military
wage-earners reported in the March 2009 Current Population Survey. The wage rate is constructed as individual
wage and salary earnings divided by hours worked.

² If F is not continuous the definition is $m = \inf\{u : F(u) \ge \tfrac{1}{2}\}$.

³ The median is not sensitive to perturbations in the tails of the distribution.
⁴ The mean is sensitive to perturbations in the tails of the distribution.


Probability distribution: measure of central tendency

• Important measures of central tendency are the median and the mean. The median m of a
continuous distribution F is the unique solution to

$$F(m) = \tfrac{1}{2}.$$

The median U.S. wage in 2009 is $19.23.

• A convenient (but not robust) measure of central tendency is the mean, or expectation.

• The expectation of a random variable y with density f is

$$\mu = E(y) = \int_{-\infty}^{\infty} u f(u)\,du$$

We use the single character y to denote the random variable, rather than the more
cumbersome label wage.

The mean wage in this example is $23.90. The mean is not robust in the presence of
substantial skewness or thick tails, which are both features of the wage distribution.
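The robustness contrast can be illustrated with a small, made-up sample of right-skewed wages (an assumption for illustration, not the CPS data): inflating only the largest observation moves the mean a great deal but leaves the median unchanged.

```python
import statistics

# Hypothetical right-skewed hourly wages (dollars), for illustration only
wages = [10.0, 12.0, 14.0, 15.0, 17.0, 19.0, 22.0, 26.0, 35.0]
print(statistics.mean(wages), statistics.median(wages))

# Perturb the right tail: the top earner's wage rises tenfold
perturbed = wages[:-1] + [350.0]
print(statistics.mean(perturbed), statistics.median(perturbed))  # median is still 17.0
```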


Logarithm transformation

• In this context it is useful to transform the data by taking the natural logarithm.

The mean of the random variable log(wage), also denoted log(y), is 2.95 (measured in log dollars per hour, not in dollars).

• The density of log wages is much less skewed and fat-tailed than the density of the level of
wages, so its mean E(log(y)) = 2.95 is a much better measure of central tendency of the
distribution.

• In fact, the geometric mean exp(E(log(y))) = $19.11 is a robust measure of central
tendency of y.
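A short sketch of the geometric mean as the exponential of the mean of logs, cross-checked against the standard-library statistics.geometric_mean. The three-value sample is hypothetical; the last line simply exponentiates the rounded mean log wage of 2.95 quoted in the text.

```python
import math
import statistics


def geometric_mean_via_logs(values):
    """exp(mean(log(y))): the sample analogue of exp(E(log(y)))."""
    return math.exp(statistics.fmean(math.log(v) for v in values))


sample = [10.0, 20.0, 40.0]                  # hypothetical wages
print(geometric_mean_via_logs(sample))       # (10 * 20 * 40) ** (1/3) = 20, up to rounding
print(statistics.geometric_mean(sample))     # standard-library cross-check
print(round(math.exp(2.95), 2))              # 19.11
```

Because the log compresses the right tail, a few very large wages shift exp(mean(log y)) much less than they shift the arithmetic mean.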


as can be seen easily in the right panel of Figure 3.1. Another way of viewing this is that 64% of
workers earn less than the mean wage of $23.90, suggesting that it is incorrect to describe $23.90
as a “typical” wage rate.

[Figure 3.2: Log Wage Density. Horizontal axis: log dollars per hour.]

In this context it is useful to transform the data by taking the natural logarithm⁵. Figure 3.2
shows the density of log hourly wages log(wage) for the same population, with its mean 2.95 drawn
in with the arrow. The density of log wages is much less skewed and fat-tailed than the density of
the level of wages, so its mean

E (log(wage)) = 2.95

is a much better (more robust) measure⁶ of central tendency of the distribution. For this reason,
wage regressions typically use log wages as a dependent variable rather than the level of wages.

3.3 Conditional Expectation

We saw in Figure 3.2 the density of log wages. Is this wage distribution the same for all workers,
or does the wage distribution vary across subpopulations? To answer this question, we can compare
wage distributions for different groups, for example, men and women. The plot on the left in Figure
3.3 displays the densities of log wages for U.S. men and women with their means (3.05 and 2.81)
indicated by the arrows. We can see that the two wage densities take similar shapes but the density
for men is somewhat shifted to the right with a higher mean.

The values 3.05 and 2.81 are the mean log wages in the subpopulations of men and women
workers. They are called the conditional means (or conditional expectations) of log wages
given gender. We can write their specific values as

E (log(wage) | gender = man) = 3.05 (3.1)

E (log(wage) | gender = woman) = 2.81. (3.2)
We call these means conditional as they are conditioning on a fixed value of the variable gender.

While you might not think of gender as a random variable, it is random from the viewpoint of
⁵ Throughout the text, we will use log(y) to denote the natural logarithm of y.
⁶ More precisely, the geometric mean exp(E(log(wage))) = $19.11 is a robust measure of central tendency.


Conditional expectation


[Figure 3.3: Left: Log Wage Density for Women and Men. Right: Log Wage Density by Gender (legend entries include white women and black women). Horizontal axes: log dollars per hour.]

econometric analysis. If you randomly select an individual, the gender of the individual is unknown
and thus random. (In the population of U.S. workers, the probability that a worker is a woman
happens to be 43%.) In observational data, it is most appropriate to view all measurements as
random variables, and the means of subpopulations are then conditional means.

As the two densities in Figure 3.3 appear similar, a hasty inference might be that there is not
a meaningful difference between the wage distributions of men and women. Before jumping to this
conclusion let us examine the differences in the distributions of Figure 3.3 more carefully. As we
mentioned above, the primary difference between the two densities appears to be their means. This
difference equals

E (log(wage) | gender = man)− E (log(wage) | gender = woman) = 3.05− 2.81
= 0.24 (3.3)

A difference in expected log wages of 0.24 implies an average 24% difference between the wages
of men and women, which is quite substantial. (For an explanation of logarithmic and percentage
differences see the box on Log Differences below.)
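A quick arithmetic check of the log-difference reading of (3.3): 0.24 log points is approximately a 24% difference, while the exact proportional gap between the two geometric means is exp(0.24) − 1, about 27%. The approximation log(1 + g) ≈ g is accurate only for small g, so the gap between the two numbers grows with the size of the difference.

```python
import math

mean_log_men, mean_log_women = 3.05, 2.81    # conditional means (3.1) and (3.2)
log_diff = mean_log_men - mean_log_women      # 0.24, as in (3.3)
approx_pct = 100 * log_diff                   # log-point approximation
exact_pct = 100 * (math.exp(log_diff) - 1)    # exact ratio of geometric means, minus 1
print(round(approx_pct, 1), round(exact_pct, 1))  # 24.0 27.1
```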

Consider further splitting the men and women subpopulations by race, dividing the population
into whites, blacks, and other races. We display the log wage density functions of four of these
groups on the right in Figure 3.3. Again we see that the primary difference between the four density
functions is their central tendency.

Focusing on the means of these distributions, Table 3.1 reports the mean log wage for each of
the six sub-populations.

         men    women
white    3.07   2.82
black    2.86   2.73
other    3.03   2.86

Table 3.1: Mean Log Wages by Sex and Race

The entries in Table 3.1 are the conditional means of log(wage) given gender and race. For example,

E (log(wage) | gender = man, race = white) = 3.07


Conditional expectation

• Is the wage distribution the same for all workers, or does the wage distribution vary across
subpopulations?

• The plots above display the densities of log wages for U.S. men and women, with their
means (3.05 and 2.81).

• The means displayed are called the conditional means (or conditional expectations) of log
wages given gender:

E(log(wage)|gender = man) = 3.05

E(log(wage)|gender = woman) = 2.81

• Here the conditioning variable gender is a random variable from the viewpoint of
econometric analysis.

• We can use more than one variable in the conditioning of the expectation:

E(log(wage)|gender = man, race = white) = 3.07
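Sample analogues of these conditional means are within-group averages of log(wage). The sketch below groups a few hypothetical ((gender, race), wage) records, not the CPS data behind Table 3.1, and averages the log wage within each cell; the function name conditional_means is illustrative.

```python
import math
from collections import defaultdict


def conditional_means(records):
    """Average log(wage) within each group: a sample analogue of E(log(wage) | group)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, wage in records:
        totals[group] += math.log(wage)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical ((gender, race), hourly wage) records, for illustration only
records = [
    (("man", "white"), 24.0),
    (("man", "white"), 20.0),
    (("woman", "white"), 16.0),
    (("woman", "black"), 15.0),
]
for group, mean_log in conditional_means(records).items():
    print(group, round(mean_log, 2))
```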


Conditional expectation

• In many cases it is convenient to simplify notation by writing variables using single
characters, typically y, x, and/or z.

• In econometrics it is conventional to denote the dependent variable by the letter y, a single
conditioning variable by the letter x, and multiple conditioning variables by the subscripted
letters x1, x2, · · · , xk.

• Conditional expectation can be written with the generic notation

E(y|x1, x2, · · · , xk) = m(x1, x2, · · · , xk)

• This is called the conditional expectation function. For example, the conditional expectation
of y = log(wage) given (x1, x2) = (gender, race) is given by

         men    women
white    3.07   2.82
black    2.86   2.73
other    3.03   2.86


Conditional expectation

• An econometrician has observational data

{(x1, y1), (x2, y2), · · · , (xn, yn)}

• If the data are cross-sectional, it is reasonable to assume they are mutually independent.

• If the data are randomly gathered, it is reasonable to model each observation as a random
draw from the same probability distribution. In this case the data are independent and
identically distributed, or iid.

• To study how the distribution of yi varies with xi, we can focus on the conditional density of
yi given xi and its conditional mean m(xi).

• The conditional mean function is the regression function.

yi = E[yi|xi] + (yi − E[yi|xi]) = E[yi|xi] + µi

• E[µi|xi] = 0.
• µ is called the conditional expectation function error.
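With a discrete regressor, m(x) can be estimated by within-group means, and the implied errors µi = yi − m(xi) then average to exactly zero within every group: the sample counterpart of E[µi|xi] = 0. The data and the function name cef_decomposition below are hypothetical.

```python
from collections import defaultdict


def cef_decomposition(xs, ys):
    """Estimate m(x) by within-group means (discrete x); return m and errors y_i - m(x_i)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    m = {x: sum(v) / len(v) for x, v in groups.items()}
    errors = [y - m[x] for x, y in zip(xs, ys)]
    return m, errors


# Hypothetical data with a binary conditioning variable
xs = [0, 0, 1, 1, 1]
ys = [2.0, 4.0, 1.0, 2.0, 6.0]
m, errors = cef_decomposition(xs, ys)
for value in sorted(m):
    group_errors = [e for x, e in zip(xs, errors) if x == value]
    print(value, sum(group_errors) / len(group_errors))  # within-group mean error: 0.0
```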


Linear regression model

• While the conditional mean m(x) is the best predictor of y among all functions of x (in the
mean-squared-error sense), its functional form is typically unknown.

• For empirical implementation and estimation, it is typical to replace m(x) with an
approximation.

• Most commonly, this approximation is linear in x.

• It is convenient to augment the regressor vector x by listing the number 1 as an element. We
call this the constant or intercept term.

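$$m(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k = \mathbf{x}'\boldsymbol{\beta}$$

$$\mathbf{x} = (1, x_1, x_2, \ldots, x_k)'$$

$$\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_k)'$$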

• Boldface letters indicate column vectors. In the case of one regressor x and a constant
term: $\boldsymbol{\beta} = (\beta_0, \beta_1)'$ and $\mathbf{x} = (1, x)'$, so $\mathbf{x}'\boldsymbol{\beta} = \beta_0 + \beta_1 x$.

(Wisdom: Models should have a constant term unless the theory says they should not.)
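A minimal sketch of evaluating x′β once the constant 1 is prepended to the regressors; the function name linear_cef is hypothetical, and β is assumed to be ordered with the intercept first.

```python
def linear_cef(x, beta):
    """Evaluate x'beta after prepending the constant 1 to the regressors (x1, ..., xk)."""
    x_aug = [1.0, *x]  # x = (1, x1, ..., xk)'
    if len(x_aug) != len(beta):
        raise ValueError("beta must have k + 1 elements, intercept first")
    return sum(xj * bj for xj, bj in zip(x_aug, beta))


# One regressor: x'beta = beta0 + beta1 * x
print(linear_cef([2.0], [0.5, 1.5]))  # 0.5 + 1.5 * 2.0 = 3.5
```

Listing the 1 inside x lets the intercept be treated as just another coefficient, which is what makes the compact notation x′β possible.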


Assumption 1 (MLR.1): Linearity

Assumption

MLR.1 Linearity : The population model is linear in the parameters:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \mu, \qquad (1)$$

where βj, j = 0, · · ·, k, are the unknown constant parameters of interest, the xj's are the regressors
(which can be treated as either fixed or random), and µ is the random error.

If the linearity assumption is violated, the regression model is misspecified. This is known as
functional form misspecification; note that a model can be linear in the β's and still misspecify
how y depends on the x's (for example, by omitting a squared term).


Functional Form Misspecification

• The model does not account for some important nonlinearities;

• Omitting important variables is also model misspecification;

• Generally functional form misspecification causes biases in the remaining parameter
estimators.


Functional Form Misspecification

Suppose that the correct specification of the wage equation is:

$$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 exper^2 + \mu, \qquad (2)$$

then the return to an extra year of experience is

$$\frac{\partial wage}{\partial exper} = wage \times [\beta_2 + 2\beta_3 exper].$$

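If the estimated model is instead:

$$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \mu, \qquad (3)$$

then the OLS estimator of β2 from (3) is biased for the β2 of (2), because the omitted exper² term is
correlated with exper (the direction of the bias depends on the sign of β3). Using it to measure the
return to experience can be misleading.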
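If instead the model is specified for the level of wages:

$$wage = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 exper^2 + \mu, \qquad (4)$$

then the effect of an extra year of experience is

$$\frac{\partial wage}{\partial exper} = \beta_2 + 2\beta_3 exper. \qquad (5)$$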
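To check the marginal-effect formula under (2) numerically, a small sketch compares wage × (β2 + 2β3·exper) with a central finite difference. The coefficient values (b0 to b3) are hypothetical, chosen only so that β3 < 0 (diminishing returns), and the error µ is set to zero.

```python
import math

# Hypothetical coefficients for model (2), chosen only so that b3 < 0; not estimates
b0, b1, b2, b3 = 0.5, 0.08, 0.04, -0.0006


def wage_log_quadratic(educ, exper):
    """Wage implied by model (2) with the error set to zero."""
    return math.exp(b0 + b1 * educ + b2 * exper + b3 * exper ** 2)


def analytic_effect(educ, exper):
    """d wage / d exper = wage * (b2 + 2 * b3 * exper) under model (2)."""
    return wage_log_quadratic(educ, exper) * (b2 + 2 * b3 * exper)


def numeric_effect(educ, exper, h=1e-5):
    """Central finite difference, used as an independent check of the formula."""
    up = wage_log_quadratic(educ, exper + h)
    down = wage_log_quadratic(educ, exper - h)
    return (up - down) / (2 * h)


def turning_point():
    """Experience level at which the effect changes sign: exper* = -b2 / (2 * b3)."""
    return -b2 / (2 * b3)


print(analytic_effect(12, 10), numeric_effect(12, 10))  # should agree closely
print(turning_point())  # about 33.3 years with these made-up coefficients
```

Under the levels model (4) the same check would drop the wage factor, as in (5); in both models the effect changes sign at exper = −β2/(2β3).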

Assumption MLR2: Random Sampling

Assumption

MLR2. Random Sampling:
We have a random sample of n observations, {(xi1, xi2, · · ·, xik, yi) : i = 1, 2, · · ·, n}, following
the population model in Assumption 1.
Nonrandom sampling can cause the OLS estimator to be biased and inconsistent.

Scenarios where Assumption 2 does not hold include:

• Missing Data

• Nonrandom Samples

• Outliers

Assumption MLR3: No Perfect Collinearity

Assumption

MLR3. No Perfect Collinearity:
In the sample and in the population, none of the independent variables is constant, and there are
no exact linear relationships among the independent variables.

Scenarios where Assumption 3 is violated include:

• One independent variable is a linear combination of one or more other regressors. (Including
nonlinear functions of the same variables, such as exper and exper², is not a problem.)
• For example, including consumption, investment and income on the right-hand side of the
regression equation, when national income is defined in the national accounts as the sum of
consumption and investment.
• Including the constant term together with a full set of seasonal dummies (one for every season)
in the regression.
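The seasonal-dummy case can be checked numerically: with a constant and all four quarterly dummies, the constant column equals the sum of the dummy columns, so X′X is singular and the OLS normal equations have no unique solution. Dropping one dummy (making it the base category) restores full rank. The two-year quarterly design is hypothetical, and exact Fraction arithmetic is an implementation choice to avoid rounding in the determinant.

```python
from fractions import Fraction


def gram(X):
    """X'X for a design matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(Fraction(row[a]) * row[b] for row in X) for b in range(k)] for a in range(k)]


def det(M):
    """Determinant by exact Gaussian elimination with row swaps."""
    M = [row[:] for row in M]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # no usable pivot: the matrix is singular
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n):
                M[r][j] -= factor * M[c][j]
    return result


# Two years of quarterly observations (hypothetical): constant + all four seasonal dummies
quarters = [1, 2, 3, 4, 1, 2, 3, 4]
X_trap = [[1] + [int(q == s) for s in (1, 2, 3, 4)] for q in quarters]
X_ok = [row[:-1] for row in X_trap]  # drop the Q4 dummy (base category)
print(det(gram(X_trap)), det(gram(X_ok)))  # 0 16
```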

Assumption MLR4: Zero Conditional Mean

Assumption

MLR4. Zero Conditional Mean:
The error term µ has a conditional expected value of zero given any values of the independent
variables,

E(µ|x1, · · · , xK) = 0

This assumption can fail for many reasons, including:

• Misspecification of the functional form

• Omitting important factors correlated with any of the regressors: omitted variables bias.

• Measurement error in the explanatory variables (more later, W. Ch. 15).

• Endogeneity and simultaneity: some explanatory variables are determined jointly with the
dependent variable.
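Omitted variables bias can be illustrated with a small simulation. The data-generating process below (coefficients, normal errors, seed) is a made-up assumption: x2 enters the true model and is correlated with x1, but is left out of the estimated regression, so the error of the short regression is correlated with x1 and MLR4 fails. The short-regression slope then tends to β1 + β2·δ, where δ is the slope from regressing x2 on x1 (here 2 + 3 × 0.5 = 3.5), rather than β1 = 2.

```python
import random


def ols_slope(x, y):
    """Simple-regression OLS slope: sample covariance of (x, y) over sample variance of x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 20_000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]  # omitted factor, correlated with x1
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]  # true beta1 = 2, beta2 = 3

short = ols_slope(x1, y)  # regression that omits x2
print(round(short, 2))    # expected to be near 3.5, not 2
```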

Finite Sample Properties of OLS: Unbiasedness

Unbiasedness
Under Assumptions MLR1-MLR4, the ordinary least squares (OLS) estimator β̂j, j = 0, · · · , K,
is unbiased. That is, its expected value equals the population parameter:

$$E(\hat\beta_j) = \beta_j, \qquad j = 0, \ldots, K$$

• The OLS estimator minimizes the sum of squared residuals. For the simple case of one regressor
x1, $\hat{\boldsymbol{\beta}} = (\hat\beta_0, \hat\beta_1)$ minimizes, over $(b_0, b_1)$,

$$\sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1})^2$$
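A Monte Carlo sketch of unbiasedness in the single-regressor case: repeatedly draw samples that satisfy MLR1-MLR4, compute the OLS estimates with the closed form that minimizes the sum of squared residuals, and average them across replications. The design (uniform x, normal errors with standard deviation 2, true β0 = 1 and β1 = 0.5) and the function names are assumptions for illustration.

```python
import random


def ols_simple(x, y):
    """Closed-form OLS for one regressor: the (b0, b1) minimizing sum_i (y_i - b0 - b1*x_i)^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def ssr(x, y, b0, b1):
    """Sum of squared residuals at candidate coefficients (b0, b1)."""
    return sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))


def monte_carlo(beta0=1.0, beta1=0.5, n=50, reps=2000, seed=1):
    """Average the OLS estimates over `reps` simulated samples that satisfy MLR1-MLR4."""
    rng = random.Random(seed)
    sum_b0 = sum_b1 = 0.0
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]            # random sampling of x
        y = [beta0 + beta1 * a + rng.gauss(0, 2) for a in x]  # E(mu | x) = 0
        b0_hat, b1_hat = ols_simple(x, y)
        sum_b0 += b0_hat
        sum_b1 += b1_hat
    return sum_b0 / reps, sum_b1 / reps


print(monte_carlo())  # each average should be near (1.0, 0.5)
```

Any single sample's estimates can be far from the truth; unbiasedness is a statement about the average over repeated samples, which is what the loop approximates.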

Anatomy of the single regression

Consider the case of a single regressor:

$$y_i = \beta_0 + \beta_1 x_i + \mu_i \qquad (6)$$

The population regression coefficients β0 and β1 are defined by solving:

$$(\beta_0, \beta_1) = \arg\min_{b_0, b_1} E\left[(y_i - b_0 - b_1 x_i)^2\right]$$

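The first order conditions are

$$\frac{\partial}{\partial \beta_0} E\left[(y_i - \beta_0 - \beta_1 x_i)^2\right] = E\left[-2(y_i - \beta_0 - \beta_1 x_i)\right] = 0 \qquad (7)$$

$$\frac{\partial}{\partial \beta_1} E\left[(y_i - \beta_0 - \beta_1 x_i)^2\right] = E\left[-2x_i(y_i - \beta_0 - \beta_1 x_i)\right] = 0 \qquad (8)$$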

Solving for β0 and β1:

$$\beta_1 = \frac{Cov(y_i, x_i)}{Var(x_i)} \qquad (9)$$

$$\beta_0 = E[y_i] - \beta_1 E[x_i] \qquad (10)$$
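The sample analogues of (7)-(10) can be checked directly: computing b1 as the sample covariance over the sample variance, as in (9), and b0 from (10), the residuals then satisfy mean(u) = 0 and mean(x·u) = 0 up to rounding, which are (7) and (8) without the factor −2. The data and function names below are hypothetical; using the divisor n or n − 1 in both moments gives the same b1.

```python
def moment_coefficients(x, y):
    """Sample analogues of (9) and (10): b1 = Cov(y, x) / Var(x), b0 = mean(y) - b1 * mean(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov_yx = sum((b - my) * (a - mx) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    b1 = cov_yx / var_x
    return my - b1 * mx, b1


def first_order_conditions(x, y, b0, b1):
    """Sample analogues of (7) and (8), dropping the factor -2: mean(u) and mean(x * u)."""
    n = len(x)
    u = [b - b0 - b1 * a for a, b in zip(x, y)]
    return sum(u) / n, sum(a * e for a, e in zip(x, u)) / n


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
b0, b1 = moment_coefficients(x, y)
print(b0, b1)                                # approximately 0.05 and 1.99
print(first_order_conditions(x, y, b0, b1))  # both approximately 0
```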

Anatomy of the Multiple regression

Consider the case of multiple regressors:
