
Statistical Inference STAT 431
Lecture 12: Simple Regression (II): Probabilistic Model and Basic Inferences

Review of Last Lecture
• Simple regression summarizes the relationship between a predictor and a response
• The LS regression line gives a linear equation for the relationship:
  $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, where $\hat{\beta}_1 = r \cdot \frac{s_y}{s_x}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
• Transformation to new coordinates allows LS regression to capture nonlinear trends as well (Tukey’s bulging rule)


A Probabilistic Model for Simple Regression
$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \ldots, n$   (signal: $\beta_0 + \beta_1 x_i$; noise: $\epsilon_i$)
The $\epsilon_i$ values are noise (error) terms satisfying the following assumptions:
1. Independence: the $\epsilon_i$'s are mutually independent random variables
2. Homoscedasticity: the $\epsilon_i$'s have common mean 0 and common variance $\sigma^2$
3. Normality: the $\epsilon_i$'s are normally distributed
The model assumes that there exists a large or infinite (possibly hypothetical) population
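To make these assumptions concrete, here is a minimal R sketch that simulates data from this model; the parameter values and sample size are illustrative choices, not values from the lecture.

# Simulate n observations from Y_i = beta0 + beta1 * x_i + eps_i
set.seed(431)
n     <- 46                       # sample size (illustrative)
beta0 <- -26                      # true intercept (illustrative)
beta1 <- 0.36                     # true slope (illustrative)
sigma <- 180                      # true error SD (illustrative)
x   <- runif(n, 1000, 4000)       # fixed predictor values
eps <- rnorm(n, 0, sigma)         # iid N(0, sigma^2) errors
y   <- beta0 + beta1 * x + eps    # responses: signal + noise
plot(x, y); abline(beta0, beta1)  # data scatter around the true line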


Normality Assumption
$\text{Price}_i = \beta_0 + \beta_1 \times \text{Sqft}_i + \epsilon_i$, where $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$
[Figure: normal subpopulations of Price at each value of Sqft]

Parameter Estimation
• Least squares estimators for $\beta_0$ and $\beta_1$ (what are their units?)
  $\hat{\beta}_1 = r \cdot \frac{s_y}{s_x} = \frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$, $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$
• Terminology
– Fitted values: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, $i = 1, \ldots, n$
– Residuals: $E_i = Y_i - \hat{Y}_i$, $i = 1, \ldots, n$
– Error sum of squares (SSE): $SSE = \sum_{i=1}^n E_i^2$
• Estimator for $\sigma^2$:
  $S^2 = \frac{\sum_{i=1}^n E_i^2}{n-2} = \frac{SSE}{n-2}$
– $S$ is called the root mean square error (RMSE), or residual standard error
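Continuing the simulated example above, a short sketch computing these quantities directly from the formulas and checking them against lm() (all variable names are ours):

# LS estimates from the closed-form formulas
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

yhat <- b0 + b1 * x             # fitted values
e    <- y - yhat                # residuals
sse  <- sum(e^2)                # error sum of squares
s2   <- sse / (n - 2)           # estimate of sigma^2
sqrt(s2)                        # RMSE / residual standard error

fit <- lm(y ~ x)
coef(fit)                       # should agree with c(b0, b1)
summary(fit)$sigma              # should agree with sqrt(s2)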

Sampling Distributions
• Sampling distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ [derivation in class]
  – Define $S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2$
  – Mean and SD of $\hat{\beta}_0$ and $\hat{\beta}_1$:
    $E(\hat{\beta}_1) = \beta_1$, $SD(\hat{\beta}_1) = \frac{\sigma}{\sqrt{S_{xx}}}$
    $E(\hat{\beta}_0) = \beta_0$, $SD(\hat{\beta}_0) = \sigma \sqrt{\frac{\sum_{i=1}^n x_i^2}{n S_{xx}}}$
  – Normality:
    $\frac{\hat{\beta}_0 - \beta_0}{SD(\hat{\beta}_0)} \sim N(0, 1)$, $\frac{\hat{\beta}_1 - \beta_1}{SD(\hat{\beta}_1)} \sim N(0, 1)$
• Sampling distribution of $S^2$: $\frac{(n-2)S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-2}$
  – Important fact: $S^2$ is independent of both $\hat{\beta}_0$ and $\hat{\beta}_1$
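These claims are easy to check by simulation. A small Monte Carlo sketch for the slope estimator, reusing x, n, and the true parameters from the first sketch:

# Monte Carlo check: E(b1) ~ beta1, SD(b1) ~ sigma / sqrt(Sxx)
Sxx <- sum((x - mean(x))^2)
b1_sim <- replicate(5000, {
  ysim <- beta0 + beta1 * x + rnorm(n, 0, sigma)
  sum((x - mean(x)) * (ysim - mean(ysim))) / Sxx
})
c(mean(b1_sim), beta1)            # empirical vs. true mean
c(sd(b1_sim), sigma / sqrt(Sxx))  # empirical vs. theoretical SD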

Inferences for Regression Coefficients
• Typically, we do not know $\sigma$, so $SD(\hat{\beta}_0)$ and $SD(\hat{\beta}_1)$ are estimated by
  $SE(\hat{\beta}_0) = S \sqrt{\frac{\sum_{i=1}^n x_i^2}{n S_{xx}}}$, $SE(\hat{\beta}_1) = \frac{S}{\sqrt{S_{xx}}}$
• Since $S^2$ is independent of both $\hat{\beta}_0$ and $\hat{\beta}_1$, we obtain the pivotal r.v.'s
  $\frac{\hat{\beta}_0 - \beta_0}{SE(\hat{\beta}_0)} \sim t_{n-2}$, $\frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim t_{n-2}$
• Based on these pivotal r.v.'s, we can use the t distribution to construct testing procedures and CIs for $\beta_0$ and $\beta_1$
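In the running simulated example, the standard errors can be computed from these formulas and compared with the ones R reports:

# Standard errors from the formulas above
se_b1 <- sqrt(s2 / Sxx)
se_b0 <- sqrt(s2) * sqrt(sum(x^2) / (n * Sxx))
c(se_b0, se_b1)
summary(fit)$coefficients[, "Std. Error"]  # should match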

• $100(1-\alpha)\%$ CIs for $\beta_0$ and $\beta_1$:
  $\hat{\beta}_0 \pm t_{n-2,\alpha/2} \, SE(\hat{\beta}_0)$, $\hat{\beta}_1 \pm t_{n-2,\alpha/2} \, SE(\hat{\beta}_1)$
• Testing $H_0: \beta_1 = \beta_{10}$ vs. $H_1: \beta_1 \neq \beta_{10}$ at level $\alpha$:
  Reject $H_0$ if $|t| = \frac{|\hat{\beta}_1 - \beta_{10}|}{SE(\hat{\beta}_1)} > t_{n-2,\alpha/2}$
• Case of particular interest: $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
  – Test statistic: $t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}$
  – Reject $H_0$ if $|t| > t_{n-2,\alpha/2}$
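A sketch of these intervals and tests, by hand and via R's built-in helpers, continuing the running example:

# 95% CI for the slope, from the pivotal t quantity
alpha <- 0.05
tcrit <- qt(1 - alpha / 2, df = n - 2)
b1 + c(-1, 1) * tcrit * se_b1
confint(fit, level = 0.95)       # should match (row "x")

# Test of H0: beta1 = 0
tstat <- b1 / se_b1
2 * pt(-abs(tstat), df = n - 2)  # two-sided p-value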

Linear Regression in R
> regmodel <- lm(Price/1000 ~ Sqft., data = newton)
> summary(regmodel)
Call:
lm(formula = Price/1000 ~ Sqft., data = newton)
Residuals:
Min 1Q Median 3Q Max
-445.09 -125.97 36.45 107.27 281.39
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -25.96758 54.77713 -0.474 0.638
Sqft.        0.35607    0.02152  16.549   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 180.7 on 44 degrees of freedom
Multiple R-squared:  0.8616, Adjusted R-squared:  0.8584
F-statistic: 273.9 on 1 and 44 DF,  p-value: < 2.2e-16

Analysis of Variance for Simple Regression
• Three sums of squares
  – Total sum of squares (SST): $SST = \sum_{i=1}^n (y_i - \bar{y})^2$, with $n-1$ degrees of freedom
    • Measures variation of the $y_i$'s around $\bar{y}$
  – Regression sum of squares (SSR): $SSR = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$, with 1 degree of freedom
    • Represents variation in the responses that can be explained by the predictor
  – Error sum of squares (SSE): $SSE = \sum_{i=1}^n (y_i - \hat{y}_i)^2$, with $n-2$ degrees of freedom (why?)
• Coefficient of determination: $r^2$ ($R^2$) $= SSR/SST$ (= squared correlation coefficient)
• Pythagorean theorem: $SST = SSR + SSE$ (geometric representation?)
• Fact: when $\beta_1 = 0$, $F = \frac{SSR/1}{SSE/(n-2)} \sim F_{1,n-2}$
  – In simple regression $F$ equals $t^2$ for the slope; in the output above, $16.549^2 \approx 273.9$ (see the sketch at the end of these notes)

• Key points of this class
  – Modeling assumptions in simple regression
  – Terminology for regression analysis
  – Estimators of the three parameters
    • Sampling distributions
    • Independence
  – Pivotal random variables for inference on the intercept and the slope
  – Three sums of squares
• Reading: parts of Sections 10.1–10.3 of the textbook
• Next class: Simple Regression (III) (parts of Ch. 10.3)
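As referenced above, a closing sketch verifying the ANOVA decomposition and the F = t² relationship on the simulated example (the newton data themselves are not reproduced here):

# ANOVA decomposition for the simulated fit
sst <- sum((y - mean(y))^2)
ssr <- sum((fitted(fit) - mean(y))^2)
sse <- sum(residuals(fit)^2)
all.equal(sst, ssr + sse)          # Pythagorean identity

ssr / sst                          # R^2; matches summary(fit)$r.squared
Fstat <- (ssr / 1) / (sse / (n - 2))
c(Fstat, (b1 / se_b1)^2)           # F equals t^2 for the slope
anova(fit)                         # R's ANOVA table with the same numbers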