
Columbia University MA in Economics
GR 5411 Econometrics I Seyhan Erden
SOLUTIONS TO Problem Set 2, due on Oct. 12th at 10am through Gradescope
__________________________________________________________________________________________
1. (Practice question; this question will not be graded, so you do not need to submit solutions. The answer to this question will be covered in recitations this week.) Use CPS2015_without constant.dta to answer the following question. Let $y$ contain ahe (average hourly earnings) and let $X$ contain the rest of the variables except year. Also let the first column of $X$ be a column of ones.
(a) Create the dependent variable vector y and the matrix X containing ones on the first column and regressors on each of the rest of the columns.
(b) Compute the least squares regression coefficients in the regression of y on X. Report the coefficients. You must use the OLS estimator’s matrix formula here, not the regress command of Stata.
(c) Compute the adjusted R2.
(d) Verify your results in part (b) by running the regress command.
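For reference, parts (b) and (c) use the standard matrix formulas (stated here for convenience):
$$\hat\beta = (X'X)^{-1}X'y, \qquad \hat e = y - X\hat\beta, \qquad s^2 = \frac{\hat e'\hat e}{n-k-1}.$$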
. ** Problem Set 2 Question 1 **
. clear all
. **** Fetch the dataset ****
. use "/Users/seyhanerden/Documents/COLUMBIA ECONOMETRICS I (GR5411) MA/Problem Sets Fall 2020 ONLINE/Problem Set 2 _ GR5411 _ MA Metrics/Problem Set 2 _ Fall 2020 _ OLS/CPS2015_without constant.dta"
. *part(a)
. *** generate column of ones ***
. gen constant=1
.
. su

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        year |        800        2015            0      2015       2015
         ahe |        800    23.18579     12.74801  2.040816         80
    bachelor |        800       .6125     .4874842         0          1
      female |        800      .44875     .4976776         0          1
         age |        800      29.495     2.920297        25         34
    constant |        800           1            0         1          1
. drop year
. *** Regression ***
. * Store Y
. mkmat ahe, matrix(Y)
. * Store X
. mkmat female age bachelor constant, matrix(X)
. *part (b)
. * X'X
. matrix XX=X'*X
. mat list XX
symmetric XX[4,4]
              female       age  bachelor         c
   female        359
      age      10610    702778
 bachelor        246     14467       490
        c        359     23596       490       800

. * (X'X)^{-1}
. matrix XX_inv=syminv(XX)
. mat list XX_inv

symmetric XX_inv[4,4]
                female         age    bachelor           c
   female    .00514793
      age   -.00001459   .00014682
 bachelor   -.00070686  -9.167e-06   .00536453
        c   -.00144687  -.00431836  -.00269819   .13092187
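As an optional cross-check (not part of the original do-file), Stata's built-in matrix accum command produces the same cross-product matrix directly, appending the constant as the last row and column:

matrix accum XX2 = female age bachelor
mat list XX2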
. * X'Y
. matrix XY=X'*Y
. mat list XY

XY[4,1]
                 ahe
  female   7844.6458
     age   551138.48
bachelor   13169.217
       c   18548.629
. * Beta Hat *
. matrix b_hat=XX_inv*XY
. mat list b_hat
b_hat[4,1]
                 ahe
  female  -3.8032856
     age   .58455309
bachelor   10.001571
       c   1.5251549
. * part (c)
. ** Method 1 **
. * Residuals
. matrix e=Y-X*b_hat
. * Residual variance estimate, s^2 = e'e/(n-k-1)

. mat ss=(e'*e)/(800-3-1)
. mat list ss
symmetric ss[1,1]
           ahe
ahe  135.14426
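This matches the Residual MS in the regress output of part (d):
$$s^2 = \frac{e'e}{n-k-1} = \frac{107574.831}{800-3-1} = 135.14426.$$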
. mat var=ss*XX_inv
. * the row vector containing the diagonal of matrix var = ss*(X'X)^{-1}
. mat kk=vecdiag(var)
. * obtain square root of each element
. net install dm79, from(http://www.stata.com/stb/stb56)
checking dm79 consistency and verifying not already installed...
all files already exist and are up to date.
. matewmf kk SE , function(sqrt)
. mat li SE
SE[1,4]
        female         age    bachelor    constant
r1   .83409406   .14086224   .85146082   4.2063451
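Each entry is the usual OLS standard error, the square root of the corresponding diagonal element of $s^2(X'X)^{-1}$; for example, for female,
$$SE(\hat\beta_{female}) = \sqrt{135.14426 \times .00514793} \approx .8341.$$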
. mat ssr=(e'*e)/800
. egen y_ave=mean(ahe)
. mkmat y_ave, matrix(y_bar)
. mat tss=(Y-y_bar)'*(Y-y_bar)/800
. mat r_sq=1-(ssr*syminv(tss))
. mat li r_sq
symmetric r_sq[1,1]
            c1
r1   .17152593
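Equivalently, in terms of the sums of squares reported by regress in part (d),
$$R^2 = 1 - \frac{SSR}{TSS} = 1 - \frac{107574.831}{129846.949} = 0.17153.$$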
. * now calculate adjusted R_squared
. mat r_adj=1-(((800-1)/(800-4))*(1-r_sq))
. mat li r_adj
symmetric r_adj[1,1]
            c1
r1   .16840354
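That is, with $n = 800$ and $k = 3$ slope coefficients,
$$\bar R^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2) = 1 - \frac{799}{796}(1 - 0.17153) = 0.16840.$$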
. * Alternative method to calculate R_squared using Po and Mo matrices
. mkmat constant, matrix(i)
. matrix ii=i'*i
. matrix ii_inv=syminv(ii)

. matrix Po=i*ii_inv*i'
. matrix Mo=I(800)-Po
. matrix sst=Y'*Mo*Y
. matrix ess=b_hat'*X'*Mo*X*b_hat
. mat R_sq=ess*syminv(sst)
. mat li R_sq
symmetric R_sq[1,1]
            ahe
ahe   .17152593
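The identity behind this alternative method: with $M_0 = I - \iota(\iota'\iota)^{-1}\iota'$ the demeaning matrix ($\iota$ a column of ones), a regression that includes a constant satisfies
$$R^2 = \frac{ESS}{TSS} = \frac{\hat\beta'X'M_0X\hat\beta}{Y'M_0Y}.$$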
.
. * part (d)
. ** Verify by running usual regression **
. reg ahe female age bachelor
      Source |       SS           df       MS      Number of obs   =       800
-------------+----------------------------------   F(3, 796)       =     54.93
       Model |   22272.118         3  7424.03934   Prob > F        =    0.0000
    Residual |  107574.831       796   135.14426   R-squared       =    0.1715
-------------+----------------------------------   Adj R-squared   =    0.1684
       Total |  129846.949       799  162.511826   Root MSE        =    11.625

------------------------------------------------------------------------------
         ahe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -3.803286   .8340941    -4.56   0.000    -5.440569   -2.166002
         age |   .5845531   .1408622     4.15   0.000     .3080477    .8610584
    bachelor |   10.00157   .8514608    11.75   0.000     8.330197    11.67294
       _cons |   1.525155   4.206345     0.36   0.717    -6.731685    9.781995
------------------------------------------------------------------------------
.
end of do-file
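As a consistency check, the Residual MS (135.14426) equals the $s^2$ computed by hand in part (c), and the Root MSE is its square root: $\sqrt{135.14426} \approx 11.625$.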
2. (34p) Use the data set CPS2015.dta to answer the following questions:
Let X1 equal a constant and age. Let X2 contain bachelor and female. Let y be the average hourly earnings.
(a) (2p) Compute the least squares regression coefficients in the regression of y on X1. Report the coefficients.
(b) (2p) Compute the least squares regression coefficients in the regression of y on X1 and X2. Report the coefficients.
(c) (4p) Regress each variable in X2 on all the variables in X1 and store the residuals. These new variables are X2*. What are the sample means of these variables? Explain the finding.
(d) (4p) Compute the R2 for the regression of y on X1 and X2. Repeat the computation for the case in which the constant term is omitted from X1. What happens to R2?

(e) (4p) Compute the adjusted R2 for the full regression including the constant term. Interpret your result.
(f) (10p) Referring to the result in part (c), regress y on X1 and X2*. How do your results compare to the results of the regression of y on X1 and X2? The comparison you are making is between the least squares coefficients when y is regressed on X1 and M1X2 and when y is regressed on X1 and X2. Derive the result theoretically. (Your numerical results should match the theory, of course.)
(g) (8p) Copy/paste your do file (Word font Courier New Bold Size 9 shows the Stata output as seen on the screen).
Solution: See attached code
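For parts (a) and (b), the do-file below implements the partitioned-regression (Frisch-Waugh-Lovell) formulas
$$\hat\beta_1 = (X_1'M_2X_1)^{-1}X_1'M_2\,y, \qquad \hat\beta_2 = (X_2'M_1X_2)^{-1}X_2'M_1\,y, \qquad M_j = I - X_j(X_j'X_j)^{-1}X_j'.$$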
/* Problem Set 2 question 2 */
use "/Users/seyhanerden/Documents/COLUMBIA ECONOMETRICS I (GR5411) MA/Problem Sets Fall 2020 ONLINE/Problem Set 2 _ GR5411 _ MA Metrics/Problem Set 2 _ Fall 2020 _ OLS/CPS2015.dta"
log using ps2_q2.log, text replace
mkmat constant age, matrix(X1)    /* create matrix X1 */
mkmat female bachelor, matrix(X2) /* create matrix X2 */
mkmat ahe, matrix(y)
matrix list X1
matrix list X2
matrix list y
/* part (a)*/
matrix b1=invsym(X1'*X1)*(X1'*y) /* compute LS coef in the regression y on X1 */
matrix list b1                   /* reporting results for part (a) */
/* part (b) */
matrix M1=I(800)-(X1*invsym(X1'*X1)*X1') /* create M1 matrix */
matrix M2=I(800)-(X2*invsym(X2'*X2)*X2') /* create M2 matrix */
/* compute LS coef in the regression y on X1 and X2 */
matrix bb1=invsym(X1'*M2*X1)*(X1'*M2*y)
matrix bb2=invsym(X2'*M1*X2)*(X2'*M1*y)
/* reporting results for part (b) */
matrix list bb1
matrix list bb2
/* part (c) */
/* creating y matrices for all regressions */
mkmat bachelor, matrix(bachelor)
mkmat female, matrix(female)
/* regressing each of the 2 variables in X2 on all variables in X1 */
matrix b_bachelor=invsym(X1'*X1)*(X1'*bachelor)
matrix b_female=invsym(X1'*X1)*(X1'*female)
matrix list b_bachelor
matrix list b_female
/* residual matrices for both, using the residual maker M1 */
matrix res_bachelor=M1*bachelor
matrix res_female=M1*female
/* list residual matrices made above */
matrix list res_bachelor

matrix list res_female
/* create unit vector */
matrix i = J(rowsof(X1),1,1)
/* find the means of the residuals */
matrix mean_res_bachelor=(i'*res_bachelor)/800
matrix mean_res_female=(i'*res_female)/800
/* define X2 STAR matrix & stack the means of the residuals */
matrix define X2_star=res_bachelor,res_female
matrix define X2_star_means=mean_res_bachelor\mean_res_female
/* The means are (essentially) zero.
The sums must be zero, as these new variables are orthogonal to the columns of X1. The first column in X1 is a column of ones,
so this means that these residuals must sum to zero*/
matlist X2_star_means
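The zero means follow in one line from the algebra of the residual maker: since $M_1$ is symmetric and $M_1X_1 = 0$,
$$X_1'X_2^* = X_1'M_1X_2 = (M_1X_1)'X_2 = 0,$$
and the first row of $X_1'X_2^*$ (the row for the constant) is exactly the vector of column sums of $X_2^*$.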
/* part (d) */
matrix define X=X1,X2
/*
if you need to define the i vector:
matrix define i = J(n,1,1)
*/
matrix define M0 = I(800) - ((1/800)*(i*i')) /* define M0 matrix */
matrix b_all=invsym(X'*X)*(X'*y) /* find beta hat using all regressors */
matrix TSS=y'*M0*y               /* calculate Total Sum of Squares */
matrix ESS=b_all'*X'*M0*X*b_all  /* compute Explained Sum of Squares */
matrix define R2=ESS*invsym(TSS) /* compute R_squared */
matrix list R2
/* repeat computation of R_squared without the constant term in X */
mkmat age bachelor female, matrix(x) /* small x matrix is the one with no constant term */
matrix b_noconstant=invsym(x'*x)*x'*y /* find the beta hat vector with no constant x matrix */
matrix list b_noconstant
matrix uhat = y – x*b_noconstant /*residuals for the no constant model */
matrix define R2_noconstant = 1 - uhat'*uhat * invsym(TSS)
matrix list R2_noconstant /* compute R_squared with no constant term*/
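On what happens to $R^2$: dropping the constant is a restricted regression, so $\hat u'\hat u$ cannot fall; with the centered total sum of squares kept in the denominator,
$$R^2_{no\,const} = 1 - \frac{\hat u'\hat u}{Y'M_0Y}$$
is weakly smaller than before and can even be negative.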
/* part (e) */
matrix define adjR2=1-((799/796)*(1-R2)) /* compute adjusted R2 */
matrix list adjR2
/* part (f) */
matrix define X12_star=X1,X2_star
matrix b_star=invsym(X12_star'*X12_star)*(X12_star'*y)
matlist (b_star, b_all)
/* The coefficients on the X1 variables are different, but the coefficients on the X2 variables are the same.
The X1 coefficients in the b_star regression are the same as those in the regression of y on X1 only,
so the interpretation is that any correlation between the X1 and X2 variables gets attributed to the X1 part.
Note also that predicted values from the b_star regression must plug in X2_star instead of X2,
or else the numbers will be wrong. */
log close
(f) Here the age, bachelor, and female variables are denoted by $a_i$, $b_i$ and $f_i$ for $i = 1, \dots, 800$:

$$X_1 = \begin{pmatrix} 1 & a_1 \\ \vdots & \vdots \\ 1 & a_{800} \end{pmatrix}, \qquad X_2 = \begin{pmatrix} b_1 & f_1 \\ \vdots & \vdots \\ b_{800} & f_{800} \end{pmatrix} \quad \Rightarrow \quad X_2^* = M_1X_2,$$
where
$$M_1 = I - X_1(X_1'X_1)^{-1}X_1' \qquad \text{and} \qquad M_2 = I - X_2(X_2'X_2)^{-1}X_2'.$$
Therefore, the coefficients we get from regressing $y$ on $X_1$ and $X_2$ are, respectively,
$$\hat\beta_1 = (X_1'M_2X_1)^{-1}X_1'M_2\,y \qquad \text{and} \qquad \hat\beta_2 = (X_2'M_1X_2)^{-1}X_2'M_1\,y.$$
In the regression of $y$ on $X_1$ and $X_2^*$,
$$M_1^* = I - X_1(X_1'X_1)^{-1}X_1' = M_1 \qquad \text{and} \qquad M_2^* = I - X_2^*\big(X_2^{*\prime}X_2^*\big)^{-1}X_2^{*\prime}.$$
By plugging $X_2^* = M_1X_2$ into the expression for $M_2^*$, we obtain
$$M_2^* = I - M_1X_2\big((M_1X_2)'(M_1X_2)\big)^{-1}(M_1X_2)' = I - M_1X_2(X_2'M_1'M_1X_2)^{-1}X_2'M_1'.$$
Because $M_1$ is symmetric and idempotent,
$$M_2^* = I - M_1X_2(X_2'M_1X_2)^{-1}X_2'M_1,$$
which may not be equal to $M_2$. Thus the coefficient on $X_1$ in the regression of $y$ on $X_1$ and $X_2^*$, which is
$$\hat\beta_1^* = (X_1'M_2^*X_1)^{-1}X_1'M_2^*\,y,$$
may not be equal to $\hat\beta_1$.

However, the coefficient on $X_2^*$ in the regression of $y$ on $X_1$ and $X_2^*$, which is $\hat\beta_2^*$, is equal to $\hat\beta_2$. We can derive this result as follows:
$$\hat\beta_2^* = (X_2^{*\prime}M_1X_2^*)^{-1}X_2^{*\prime}M_1\,y = \big((M_1X_2)'M_1M_1X_2\big)^{-1}(M_1X_2)'M_1\,y = (X_2'M_1X_2)^{-1}X_2'M_1\,y = \hat\beta_2,$$
where the second equality holds because $X_2^* = M_1X_2$ and the third holds because $M_1$ is symmetric and idempotent.

3. (8p) Consider the population regression of test scores against income and the square of income, in non-matrix form:
$$TestScore_i = \beta_0 + \beta_1\,Income_i + \beta_2\,Income_i^2 + \varepsilon_i.$$
Write the regression in matrix form using the general notation we used in class. Define $y$, $X$, $\varepsilon$ and $\beta$.

Solutions: The regression in matrix form is
$$y = X\beta + \varepsilon$$
with
$$y = \begin{pmatrix} TestScore_1 \\ TestScore_2 \\ \vdots \\ TestScore_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & Income_1 & Income_1^2 \\ \vdots & \vdots & \vdots \\ 1 & Income_n & Income_n^2 \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}.$$

4. (12p) Let $P_X = X(X'X)^{-1}X'$ and let $M_X = I_n - P_X$.

(a) (4p) Prove that $P_XM_X = 0_{n\times n}$ and that $P_X$ and $M_X$ are idempotent.
(b) (5p) Show that $\hat Y = P_XY$ and show that $\hat U = M_XY = M_XU$.
(c) (3p) Find $\mathrm{rank}(P_X)$ and $\mathrm{rank}(M_X)$.

Solutions:
(a) $P_X$ is idempotent because
$$P_XP_X = X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = P_X.$$
$M_X$ is idempotent because
$$M_XM_X = (I_n - P_X)(I_n - P_X) = I_n - P_X - P_X + P_XP_X = I_n - 2P_X + P_X = I_n - P_X = M_X.$$
$P_XM_X = 0_{n\times n}$ because
$$P_XM_X = P_X(I_n - P_X) = P_X - P_XP_X = P_X - P_X = 0_{n\times n}.$$

(b) Because $\hat\beta = (X'X)^{-1}X'Y$, we have
$$\hat Y = X\hat\beta = X(X'X)^{-1}X'Y = P_XY.$$
The residual vector is
$$\hat U = Y - \hat Y = Y - P_XY = (I_n - P_X)Y = M_XY.$$
We know that $M_X$ annihilates the columns of $X$:
$$M_XX = (I_n - P_X)X = X - P_XX = X - X(X'X)^{-1}X'X = X - X = 0,$$
so the residual vector can be further written as
$$\hat U = M_XY = M_X(X\beta + U) = M_XX\beta + M_XU = M_XU.$$

(c) Because $P_X$ and $M_X$ are idempotent, their ranks equal their traces:
$$\mathrm{rank}(P_X) = \mathrm{trace}(P_X) = \mathrm{trace}\big(X(X'X)^{-1}X'\big) = \mathrm{trace}\big((X'X)^{-1}X'X\big) = \mathrm{trace}(I_k) = k,$$
$$\mathrm{rank}(M_X) = \mathrm{trace}(M_X) = \mathrm{trace}\big(I_n - X(X'X)^{-1}X'\big) = \mathrm{trace}(I_n) - \mathrm{trace}(I_k) = n - k.$$

5. (8p) Let $W$ be an $m \times 1$ vector with covariance matrix $\Sigma_W$, where $\Sigma_W$ is finite and positive definite. Let $c$ be a nonrandom $m \times 1$ vector, and let $Q = c'W$.
(a) (5p) Show that $\mathrm{var}(Q) = c'\Sigma_Wc$.
(b) (3p) Suppose that $c \neq 0_m$. Show that $0 < \mathrm{var}(Q) < \infty$.

Solutions:
(a)
$$\mathrm{var}(Q) = E\big[(Q - \mu_Q)^2\big] = E\big[(Q - \mu_Q)(Q - \mu_Q)'\big] = E\big[(c'W - c'\mu_W)(c'W - c'\mu_W)'\big] = c'E\big[(W - \mu_W)(W - \mu_W)'\big]c = c'\,\mathrm{var}(W)\,c = c'\Sigma_Wc,$$
where the second equality uses the fact that $Q$ is a scalar and the third equality uses the fact that $\mu_Q = c'\mu_W$.

(b) Because the covariance matrix $\Sigma_W$ is positive definite, we have $c'\Sigma_Wc > 0$ for every nonzero vector $c$, by definition. Thus $\mathrm{var}(Q) > 0$. Both the vector $c$ and the matrix $\Sigma_W$ are finite, so $\mathrm{var}(Q) = c'\Sigma_Wc$ is also finite. Thus $0 < \mathrm{var}(Q) < \infty$.

6. (30p) Suppose that a sample of $n = 20$ households has the sample means and sample covariances below for a dependent variable and two regressors:

              Sample     Sample Covariances
              Means        Y      X1      X2
      Y        6.39      0.26    0.22    0.32
      X1       7.24              0.80    0.28
      X2       4.00                      2.40

Calculate the OLS estimates of $\beta_0$, $\beta_1$ and $\beta_2$. Calculate the sample variance of the residuals, $s^2_{\hat u}$. Calculate the $R^2$ of the regression.

Solutions: The sample size is $n = 20$. We write the regression in matrix form, $Y = X\beta + U$, with
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{1,1} & X_{2,1} \\ 1 & X_{1,2} & X_{2,2} \\ \vdots & \vdots & \vdots \\ 1 & X_{1,n} & X_{2,n} \end{pmatrix}, \quad U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}.$$
The OLS estimator of the coefficient vector is $\hat\beta = (X'X)^{-1}X'Y$, with
$$X'X = \begin{pmatrix} n & \sum_{i=1}^n X_{1i} & \sum_{i=1}^n X_{2i} \\ \sum_{i=1}^n X_{1i} & \sum_{i=1}^n X_{1i}^2 & \sum_{i=1}^n X_{1i}X_{2i} \\ \sum_{i=1}^n X_{2i} & \sum_{i=1}^n X_{1i}X_{2i} & \sum_{i=1}^n X_{2i}^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum_{i=1}^n Y_i \\ \sum_{i=1}^n X_{1i}Y_i \\ \sum_{i=1}^n X_{2i}Y_i \end{pmatrix}.$$
Note that
$$\sum_{i=1}^n X_{1i} = n\bar X_1 = 20 \times 7.24 = 144.8, \qquad \sum_{i=1}^n X_{2i} = n\bar X_2 = 20 \times 4.00 = 80.0, \qquad \sum_{i=1}^n Y_i = n\bar Y = 20 \times 6.39 = 127.8.$$
By the definition of the sample variance,
$$s_Y^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)^2 = \frac{1}{n-1}\sum_{i=1}^n Y_i^2 - \frac{n}{n-1}\bar Y^2, \quad \text{so} \quad \sum_{i=1}^n Y_i^2 = (n-1)s_Y^2 + n\bar Y^2.$$
Thus, using the sample means and sample variances, we get
$$\sum_{i=1}^n X_{1i}^2 = (n-1)s_{X_1}^2 + n\bar X_1^2 = (20-1) \times 0.80 + 20 \times 7.24^2 = 1063.6,$$
$$\sum_{i=1}^n X_{2i}^2 = (n-1)s_{X_2}^2 + n\bar X_2^2 = (20-1) \times 2.40 + 20 \times 4.00^2 = 365.6.$$
By the definition of the sample covariance,
$$s_{XY} = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y) = \frac{1}{n-1}\sum_{i=1}^n X_iY_i - \frac{n}{n-1}\bar X\bar Y, \quad \text{so} \quad \sum_{i=1}^n X_iY_i = (n-1)s_{XY} + n\bar X\bar Y.$$
Thus, using the sample means and sample covariances, we get
$$\sum_{i=1}^n X_{1i}Y_i = (n-1)s_{X_1Y} + n\bar X_1\bar Y = (20-1) \times 0.22 + 20 \times 7.24 \times 6.39 = 929.45,$$
$$\sum_{i=1}^n X_{2i}Y_i = (n-1)s_{X_2Y} + n\bar X_2\bar Y = (20-1) \times 0.32 + 20 \times 4.00 \times 6.39 = 517.28,$$
$$\sum_{i=1}^n X_{1i}X_{2i} = (n-1)s_{X_1X_2} + n\bar X_1\bar X_2 = (20-1) \times 0.28 + 20 \times 7.24 \times 4.00 = 584.52.$$
Therefore we have
$$X'X = \begin{pmatrix} 20.0 & 144.8 & 80.0 \\ 144.8 & 1063.6 & 584.52 \\ 80.0 & 584.52 & 365.6 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} 127.8 \\ 929.45 \\ 517.28 \end{pmatrix}.$$
The inverse of $X'X$ is
$$(X'X)^{-1} = \begin{pmatrix} 3.5373 & -0.4631 & -0.0337 \\ -0.4631 & 0.0684 & -0.0080 \\ -0.0337 & -0.0080 & 0.0229 \end{pmatrix}.$$
The OLS estimator of the coefficient vector is
$$\hat\beta = (X'X)^{-1}X'Y = \begin{pmatrix} 3.5373 & -0.4631 & -0.0337 \\ -0.4631 & 0.0684 & -0.0080 \\ -0.0337 & -0.0080 & 0.0229 \end{pmatrix}\begin{pmatrix} 127.8 \\ 929.45 \\ 517.28 \end{pmatrix} = \begin{pmatrix} 4.2063 \\ 0.2520 \\ 0.1033 \end{pmatrix}.$$
That is, $\hat\beta_0 = 4.2063$, $\hat\beta_1 = 0.2520$, and $\hat\beta_2 = 0.1033$.

With the number of slope coefficients $k = 2$, the squared standard error of the regression is
$$s_{\hat u}^2 = \frac{1}{n-k-1}\sum_{i=1}^n \hat u_i^2 = \frac{1}{n-k-1}\hat U'\hat U.$$
The OLS residuals are $\hat U = Y - \hat Y = Y - X\hat\beta$, so
$$\hat U'\hat U = (Y - X\hat\beta)'(Y - X\hat\beta) = Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta.$$
We have
$$Y'Y = \sum_{i=1}^n Y_i^2 = (n-1)s_Y^2 + n\bar Y^2 = (20-1) \times 0.26 + 20 \times 6.39^2 = 821.58,$$
$$\hat\beta'X'Y = \begin{pmatrix} 4.2063 \\ 0.2520 \\ 0.1033 \end{pmatrix}'\begin{pmatrix} 127.8 \\ 929.45 \\ 517.28 \end{pmatrix} = 825.22,$$
and
$$\hat\beta'X'X\hat\beta = \begin{pmatrix} 4.2063 \\ 0.2520 \\ 0.1033 \end{pmatrix}'\begin{pmatrix} 20 & 144.8 & 80.0 \\ 144.8 & 1063.6 & 584.52 \\ 80.0 & 584.52 & 365.6 \end{pmatrix}\begin{pmatrix} 4.2063 \\ 0.2520 \\ 0.1033 \end{pmatrix} = 832.23.$$
Therefore the sum of squared residuals is
$$SSR = \sum_{i=1}^n \hat u_i^2 = \hat U'\hat U = Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta = 821.58 - 2 \times 825.22 + 832.23 = 3.37.$$
The squared standard error of the regression is
$$s_{\hat u}^2 = \frac{1}{n-k-1}\hat U'\hat U = \frac{1}{20-2-1} \times 3.37 = 0.1982.$$
With the total sum of squares
$$TSS = \sum_{i=1}^n (Y_i - \bar Y)^2 = (n-1)s_Y^2 = (20-1) \times 0.26 = 4.94,$$
the $R^2$ of the regression is
$$R^2 = 1 - \frac{SSR}{TSS} = 1 - \frac{3.37}{4.94} = 0.3178.$$
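These hand calculations can be verified by feeding the moment matrices to Stata; a minimal sketch (the matrix names XX, XY, and YY are chosen here for illustration):

matrix XX = (20, 144.8, 80 \ 144.8, 1063.6, 584.52 \ 80, 584.52, 365.6)
matrix XY = (127.8 \ 929.45 \ 517.28)
matrix YY = (821.58)
matrix b = invsym(XX)*XY            /* should list as (4.2063, 0.2520, 0.1033)' */
matrix SSR = YY - 2*b'*XY + b'*XX*b /* should list as roughly 3.37 */
matrix list b
matrix list SSR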
7. (8p) Show that if $X = [X_1 \;\; X_2]$ then $PX_1 = X_1$ and $MX_1 = 0$, where
$$P = X(X'X)^{-1}X' \qquad \text{and} \qquad M = I - X(X'X)^{-1}X'.$$

Solution: We can write $X_1 = X\Gamma$ where $\Gamma = \begin{pmatrix} I \\ 0 \end{pmatrix}$; then
$$PX_1 = PX\Gamma = X(X'X)^{-1}X'X\Gamma = X\Gamma = X_1.$$
Similarly,
$$MX_1 = MX\Gamma = \big(I - X(X'X)^{-1}X'\big)X\Gamma = X\Gamma - X(X'X)^{-1}X'X\Gamma = X\Gamma - X\Gamma = 0.$$

8. (Practice question; this question will not be graded, so you do not need to submit solutions. The answer to this question will be covered in recitations this week.) Consider the regression model
$$Y_i = \beta_0 + \beta_1X_i + u_i,$$
and assume that the least squares assumptions hold.

(a) Write the model in matrix form.
(b) Use the general formula for the OLS estimator, $\hat\beta$, to derive the expressions for $\hat\beta_0$ and $\hat\beta_1$.
(c) Show that the (1,1) element of $\Sigma_{\hat\beta}$ given in
$$\Sigma_{\hat\beta} = \frac{1}{n}Q_X^{-1}\Sigma_VQ_X^{-1}, \qquad \Sigma_V = E(V_iV_i'), \quad V_i = X_iu_i,$$
is equal to the expression for $\sigma^2_{\hat\beta_0}$ given in
$$\sigma^2_{\hat\beta_0} = \frac{1}{n}\,\frac{\mathrm{var}(H_iu_i)}{\big[E(H_i^2)\big]^2}, \qquad H_i = 1 - \left[\frac{\mu_X}{E(X_i^2)}\right]X_i.$$

Solution:
(a) The regression in matrix form is $Y = X\beta + U$ with
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}.$$

(b) Matrix multiplication of $X'X$ and $X'Y$ yields
$$X'X = \begin{pmatrix} n & \sum_{i=1}^n X_i \\ \sum_{i=1}^n X_i & \sum_{i=1}^n X_i^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum_{i=1}^n Y_i \\ \sum_{i=1}^n X_iY_i \end{pmatrix} = \begin{pmatrix} n\bar Y \\ \sum_{i=1}^n X_iY_i \end{pmatrix}.$$
The inverse of $X'X$ is
$$(X'X)^{-1} = \frac{1}{n\sum_{i=1}^n X_i^2 - \big(\sum_{i=1}^n X_i\big)^2}\begin{pmatrix} \sum_{i=1}^n X_i^2 & -\sum_{i=1}^n X_i \\ -\sum_{i=1}^n X_i & n \end{pmatrix} = \frac{1}{\sum_{i=1}^n (X_i - \bar X)^2}\begin{pmatrix} \sum_{i=1}^n X_i^2/n & -\bar X \\ -\bar X & 1 \end{pmatrix}.$$
The estimator of the coefficient vector is
$$\hat\beta = (X'X)^{-1}X'Y = \frac{1}{\sum_{i=1}^n (X_i - \bar X)^2}\begin{pmatrix} \sum_{i=1}^n X_i^2/n & -\bar X \\ -\bar X & 1 \end{pmatrix}\begin{pmatrix} n\bar Y \\ \sum_{i=1}^n X_iY_i \end{pmatrix} = \frac{1}{\sum_{i=1}^n (X_i - \bar X)^2}\begin{pmatrix} \bar Y\sum_{i=1}^n X_i^2 - \bar X\sum_{i=1}^n X_iY_i \\ \sum_{i=1}^n X_iY_i - n\bar X\bar Y \end{pmatrix}.$$
Therefore we have
$$\hat\beta_1 = \frac{\sum_{i=1}^n X_iY_i - n\bar X\bar Y}{\sum_{i=1}^n (X_i - \bar X)^2} = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^n (X_i - \bar X)^2}$$
and, using $\sum_{i=1}^n X_i^2 = \sum_{i=1}^n (X_i - \bar X)^2 + n\bar X^2$,
$$\hat\beta_0 = \frac{\bar Y\sum_{i=1}^n X_i^2 - \bar X\sum_{i=1}^n X_iY_i}{\sum_{i=1}^n (X_i - \bar X)^2} = \frac{\bar Y\sum_{i=1}^n (X_i - \bar X)^2 + n\bar X^2\bar Y - \bar X\sum_{i=1}^n X_iY_i}{\sum_{i=1}^n (X_i - \bar X)^2} = \bar Y - \left[\frac{\sum_{i=1}^n X_iY_i - n\bar X\bar Y}{\sum_{i=1}^n (X_i - \bar X)^2}\right]\bar X = \bar Y - \hat\beta_1\bar X.$$

(c) The large-sample covariance matrix of $\hat\beta$, conditional on $X$, converges to
$$\Sigma_{\hat\beta} = \frac{1}{n}Q_X^{-1}\Sigma_VQ_X^{-1}$$
with $Q_X = E(X_iX_i')$ and $\Sigma_V = E(V_iV_i') = E(X_iu_iu_i'X_i')$. The column vector $X_i$ for the $i$th observation is $X_i = (1, X_i)'$, so we have
$$X_iX_i' = \begin{pmatrix} 1 \\ X_i \end{pmatrix}\begin{pmatrix} 1 & X_i \end{pmatrix} = \begin{pmatrix} 1 & X_i \\ X_i & X_i^2 \end{pmatrix}, \qquad V_i = X_iu_i = \begin{pmatrix} u_i \\ X_iu_i \end{pmatrix},$$
and
$$V_iV_i' = \begin{pmatrix} u_i \\ X_iu_i \end{pmatrix}\begin{pmatrix} u_i & X_iu_i \end{pmatrix} = \begin{pmatrix} u_i^2 & X_iu_i^2 \\ X_iu_i^2 & X_i^2u_i^2 \end{pmatrix}.$$
Taking expectations, we get
$$Q_X = E(X_iX_i') = \begin{pmatrix} 1 & \mu_X \\ \mu_X & E(X_i^2) \end{pmatrix}$$
and
$$\Sigma_V = E(V_iV_i') = \begin{pmatrix} E(u_i^2) & E(X_iu_i^2) \\ E(X_iu_i^2) & E(X_i^2u_i^2) \end{pmatrix} = \begin{pmatrix} \mathrm{var}(u_i) & \mathrm{cov}(X_iu_i, u_i) \\ \mathrm{cov}(X_iu_i, u_i) & \mathrm{var}(X_iu_i) \end{pmatrix}.$$
In the above equation, the last equality uses the fact that $E(u_i|X_i) = 0$, so
$$E(u_i) = E[E(u_i|X_i)] = 0, \qquad E(X_iu_i) = E[X_iE(u_i|X_i)] = 0,$$
$$E(u_i^2) = \mathrm{var}(u_i) + [E(u_i)]^2 = \mathrm{var}(u_i), \qquad E(X_i^2u_i^2) = \mathrm{var}(X_iu_i) + [E(X_iu_i)]^2 = \mathrm{var}(X_iu_i),$$
$$E(X_iu_i^2) = \mathrm{cov}(X_iu_i, u_i) + E(X_iu_i)E(u_i) = \mathrm{cov}(X_iu_i, u_i).$$
The inverse of $Q_X$ is
$$Q_X^{-1} = \begin{pmatrix} 1 & \mu_X \\ \mu_X & E(X_i^2) \end{pmatrix}^{-1} = \frac{1}{E(X_i^2) - \mu_X^2}\begin{pmatrix} E(X_i^2) & -\mu_X \\ -\mu_X & 1 \end{pmatrix}.$$
We can now calculate the large-sample covariance matrix of $\hat\beta$, conditional on $X$, from
$$\Sigma_{\hat\beta} = \frac{1}{n}Q_X^{-1}\Sigma_VQ_X^{-1} = \frac{1}{n\big[E(X_i^2) - \mu_X^2\big]^2}\begin{pmatrix} E(X_i^2) & -\mu_X \\ -\mu_X & 1 \end{pmatrix}\begin{pmatrix} \mathrm{var}(u_i) & \mathrm{cov}(X_iu_i, u_i) \\ \mathrm{cov}(X_iu_i, u_i) & \mathrm{var}(X_iu_i) \end{pmatrix}\begin{pmatrix} E(X_i^2) & -\mu_X \\ -\mu_X & 1 \end{pmatrix}.$$
The (1,1) element of $\Sigma_{\hat\beta}$ is
$$\frac{1}{n\big[E(X_i^2) - \mu_X^2\big]^2}\Big\{\big[E(X_i^2)\big]^2\mathrm{var}(u_i) - 2E(X_i^2)\mu_X\,\mathrm{cov}(X_iu_i, u_i) + \mu_X^2\,\mathrm{var}(X_iu_i)\Big\} = \frac{1}{n\big[E(X_i^2) - \mu_X^2\big]^2}\,\mathrm{var}\big[E(X_i^2)u_i - \mu_XX_iu_i\big]$$
$$= \frac{\big[E(X_i^2)\big]^2}{n\big[E(X_i^2) - \mu_X^2\big]^2}\,\mathrm{var}\left[u_i - \frac{\mu_X}{E(X_i^2)}X_iu_i\right] = \frac{1}{n\left[1 - \frac{\mu_X^2}{E(X_i^2)}\right]^2}\,\mathrm{var}\left[\left(1 - \frac{\mu_X}{E(X_i^2)}X_i\right)u_i\right] = \frac{\mathrm{var}(H_iu_i)}{n\big[E(H_i^2)\big]^2},$$
by defining
$$H_i = 1 - \frac{\mu_X}{E(X_i^2)}X_i.$$
The denominator in the last equality has used the facts that
$$H_i^2 = \left(1 - \frac{\mu_X}{E(X_i^2)}X_i\right)^2 = 1 + \frac{\mu_X^2}{\big[E(X_i^2)\big]^2}X_i^2 - \frac{2\mu_X}{E(X_i^2)}X_i,$$
so
$$E(H_i^2) = 1 + \frac{\mu_X^2}{\big[E(X_i^2)\big]^2}E(X_i^2) - \frac{2\mu_X}{E(X_i^2)}\mu_X = 1 - \frac{\mu_X^2}{E(X_i^2)}.$$