Regression
(Module 5)
Statistics (MAST20005) & Elements of Statistics (MAST90058)
School of Mathematics and Statistics, University of Melbourne
Semester 2, 2022
Aims of this module
• Introduce the concept of regression
• Show a simple model for studying the relationship between two variables
• Discuss correlation and how it relates to regression
Introduction
  Regression
Simple linear regression
  Point estimation of the mean
  Interlude: Analysis of variance
  Point estimation of the variance
  Standard errors of the estimates
  Confidence intervals
  Prediction intervals
  R examples
  Model checking
Further regression models
Correlation
  Definitions
  Point estimation
  Relationship to regression
  Confidence interval
Relationships between two variables
We have studied how to do estimation for some simple scenarios:
• iid samples from a single distribution (Xi)
• comparing iid samples from two different distributions (Xi & Yj)
• differences between paired measurements (Xi − Yi)

We now consider how to analyse bivariate data more generally: two variables, X and Y, measured at the same time as a pair.
The data consist of pairs of data points, (xi,yi). These can be visualised using a scatter plot.
Example data
   x      y
 1.80   9.18
 1.40   7.66
 2.10   6.33
 0.30   4.51
 3.60  14.04
 0.70   4.94
 1.10   4.24
 2.10   8.19
 0.90   4.55
 3.80  11.57

n = 10
Regression
Often interested in how Y depends on X. For example, we might want to use X to predict Y .
In such a setting, we will assume that the X values are known and fixed (henceforth, x instead of X), and look at how Y varies given x.
Example: Y is a student’s final mark for Statistics, and x is their mark for the prerequisite subject Probability. Does x help to predict Y ?
The regression of Y on x is the conditional mean, E(Y | x) = μ(x). The regression can take any form. We consider simple linear regression, which has the form of a straight line:

E(Y | x) = α + βx   and   var(Y | x) = σ².
Example: simple linear regression model
E(Y | x) = α + βx
var(Y | x) = σ²
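To make the model concrete, here is a small R sketch (not from the lecture; the parameter values are assumed purely for illustration) that simulates data from a simple linear regression model and overlays the true mean line E(Y | x) = α + βx:

# Minimal simulation sketch: Y_i = alpha + beta * x_i + e_i, e_i ~ N(0, sigma^2)
set.seed(1)
n     <- 50
alpha <- 3      # intercept (assumed for illustration)
beta  <- 2      # slope (assumed for illustration)
sigma <- 1.5    # error standard deviation (assumed)

x <- runif(n, 0, 4)                          # known/fixed predictor values
y <- alpha + beta * x + rnorm(n, 0, sigma)   # responses with mean alpha + beta * x

plot(x, y, col = "blue")
abline(a = alpha, b = beta, lwd = 2)         # the true regression line E(Y | x)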
Terminology
• Y is called a response variable. Can also be called an outcome or target variable. Please do not call it the ‘dependent’ variable.
• x is called a predictor variable. Can also be called an explanatory variable. Please do not call it an ‘independent’ variable.
• μ(x) is called the (linear) predictor function or sometimes the regression curve or the model equation.
• The parameters in the predictor function are called regression coefficients.
Why ‘regression’?
It is strange terminology, but it has stuck.
Refers to the idea of ‘regression to the mean’:
if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement, and vice versa.
First described by Sir Francis Galton when studying the inheritance of height between fathers and sons. In doing so, he invented the technique of simple linear regression.
A regression model is called linear if it is linear in the coefficients. It doesn’t have to define a straight line!
Complex and non-linear functions of x are allowed, as long as the resulting predictor function is a linear combination (i.e. an additive function) of them, with the coefficients ‘out the front’.
For example, the following are linear models:

μ(x) = α + βx + γx²
μ(x) = α sin x + β log x
The following are NOT linear models:

μ(x) = α sin(βx)
μ(x) = α / (1 + βx)
μ(x) = αx^β

. . . but the last one can be re-expressed as a linear model on a log scale (by taking logs of both sides):

μ*(x) = α* + β log x,   where μ*(x) = log μ(x) and α* = log α.
Estimation goals
Back to our simple linear regression model:
E(Y | x) = α + βx   and   var(Y | x) = σ².

• We wish to estimate the slope (β), the intercept (α) and the error variance (σ²), obtain their standard errors, and construct confidence intervals for these quantities.
• Often want to use the fitted model to make predictions about future observations (i.e. predict Y for a new x).
• Note: the Yi are not iid. They are independent but have different means, since they depend on xi.
• We have not (yet) assumed any specific distribution for Y , only a conditional mean and variance.
Reparameterisation
Changing our model slightly. . . Let α0 = α + βx̄, which gives:

E(Y | x) = α + βx
         = α0 + β(x − x̄)

Now our model is in terms of α0 and β.
This will make calculations and proofs simpler.
Least squares estimation
Choose α0 and β to minimise the sum of squared deviations:

H(α0, β) = Σ_{i=1}^n (yi − α0 − β(xi − x̄))²

Solve this by finding the partial derivatives and setting them to zero:

∂H(α0, β)/∂α0 = 2 Σ_{i=1}^n [yi − α0 − β(xi − x̄)] (−1) = 0
∂H(α0, β)/∂β  = 2 Σ_{i=1}^n [yi − α0 − β(xi − x̄)] (−(xi − x̄)) = 0

These are called the normal equations.
Least squares estimators
Some algebra yields the least squares estimators,

α̂0 = Ȳ,    β̂ = Σ_{i=1}^n (xi − x̄) Yi / Σ_{i=1}^n (xi − x̄)².

Another expression for β̂ is:

β̂ = Σ_{i=1}^n (xi − x̄)(Yi − Ȳ) / Σ_{i=1}^n (xi − x̄)².

These are equivalent, due to the following result:

Σ_{i=1}^n (xi − x̄)(Yi − Ȳ) = Σ_{i=1}^n (xi − x̄) Yi.

Can also then get an estimator for α:

α̂ = α̂0 − β̂x̄
   = Ȳ − β̂x̄.

And also an estimator for the predictor function,

μ̂(x) = α̂ + β̂x
      = α̂0 + β̂(x − x̄)
      = Ȳ + β̂(x − x̄).
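As a sanity check on these formulas, the estimates can be computed directly in R; a sketch using the example data from the earlier slide (the same values are obtained from lm() below):

y <- c(9.18, 7.66, 6.33, 4.51, 14.04, 4.94, 4.24, 8.19, 4.55, 11.57)
x <- c(1.80, 1.40, 2.10, 0.30, 3.60, 0.70, 1.10, 2.10, 0.90, 3.80)

xbar <- mean(x)
K    <- sum((x - xbar)^2)               # K = sum of (x_i - xbar)^2

alpha0.hat <- mean(y)                   # alpha0-hat = Ybar
beta.hat   <- sum((x - xbar) * y) / K   # beta-hat
alpha.hat  <- alpha0.hat - beta.hat * xbar

c(alpha = alpha.hat, beta = beta.hat)   # approximately 2.91 and 2.59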
Ordinary least squares
This method is sometimes called ordinary least squares or OLS. Other variants of least squares estimation exist, with different names.
For example, ‘weighted least squares’.
Example: least squares estimates
For our data:

x̄ = 1.78
ȳ = 7.52 = α̂0
α̂ = 2.91
β̂ = 2.59

The fitted model equation is then:

μ̂(x) = 2.91 + 2.59x
> rbind(y, x)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
y 9.18 7.66 6.33 4.51 14.04 4.94 4.24 8.19 4.55 11.57
x 1.80 1.40 2.10 0.30 3.60 0.70 1.10 2.10 0.90 3.80
> model1 <- lm(y ~ x)
> model1

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
      2.911        2.590
Properties of these estimators
What do we know about these estimators? They are all linear combinations of the Yi:

α̂0 = Σ_{i=1}^n (1/n) Yi
β̂  = Σ_{i=1}^n ((xi − x̄)/K) Yi

where K = Σ_{i=1}^n (xi − x̄)².

This allows us to easily calculate means and variances.

E(α̂0) = E(Ȳ) = (1/n) Σ_{i=1}^n E(Yi) = (1/n) Σ_{i=1}^n [α0 + β(xi − x̄)] = α0

E(β̂) = (1/K) Σ_{i=1}^n (xi − x̄) E(Yi)
      = (1/K) Σ_{i=1}^n (xi − x̄)(α0 + β(xi − x̄))
      = (α0/K) Σ_{i=1}^n (xi − x̄) + (β/K) Σ_{i=1}^n (xi − x̄)²
      = β

This also implies E(α̂) = α and E(μ̂(x)) = μ(x), and so we have that all of the estimators are unbiased.
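A quick simulation can illustrate this unbiasedness; a sketch in R (the true parameter values below are assumed purely for illustration, not taken from the lecture's data):

set.seed(4)
alpha <- 3; beta <- 2; sigma <- 1.5    # assumed true values
x <- runif(20, 0, 4)                   # fixed predictor values

est <- replicate(5000, {
  y <- alpha + beta * x + rnorm(20, 0, sigma)
  coef(lm(y ~ x))
})
rowMeans(est)    # close to (3, 2): the estimators are (approximately) unbiased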
Variances?
var(α̂0) = var(Ȳ) = (1/n²) Σ_{i=1}^n var(Yi) = σ²/n

var(β̂) = var( Σ_{i=1}^n ((xi − x̄)/K) Yi )
        = Σ_{i=1}^n ((xi − x̄)/K)² var(Yi)
        = (1/K²) Σ_{i=1}^n (xi − x̄)² σ²
        = σ²/K

Similarly,

var(α̂) = (1/n + x̄²/K) σ²
cov(α̂0, β̂) = 0
var(μ̂(x)) = (1/n + (x − x̄)²/K) σ²

Can we get their standard errors? We need an estimate of σ².
Analysis of variance: iid model
For Xi ∼ N(μ, σ²) iid,

Σ_{i=1}^n (Xi − μ)² = Σ_{i=1}^n (Xi − X̄)² + n(X̄ − μ)²
Analysis of variance: regression model
Σ_{i=1}^n (Yi − α0 − β(xi − x̄))²
  = Σ_{i=1}^n (Yi − α̂0 − β̂(xi − x̄) + α̂0 + β̂(xi − x̄) − α0 − β(xi − x̄))²
  = Σ_{i=1}^n (Yi − α̂0 − β̂(xi − x̄) + (α̂0 − α0) + (β̂ − β)(xi − x̄))²
  = Σ_{i=1}^n (Yi − α̂0 − β̂(xi − x̄))² + n(α̂0 − α0)² + K(β̂ − β)²

Note that the cross-terms disappear. Let's see...
The cross-terms. . .
t1 = 2(α̂0 − α0) Σ_{i=1}^n (Yi − α̂0 − β̂(xi − x̄))
t2 = 2(β̂ − β) Σ_{i=1}^n (Yi − α̂0 − β̂(xi − x̄))(xi − x̄)
t3 = 2(β̂ − β)(α̂0 − α0) Σ_{i=1}^n (xi − x̄)

Since Σ_{i=1}^n (xi − x̄) = 0 and Σ_{i=1}^n (Yi − α̂0) = Σ_{i=1}^n (Yi − Ȳ) = 0, the first and third cross-terms are easily shown to be zero.

For the second term,

t2 / (2(β̂ − β)) = Σ_{i=1}^n (Yi − Ȳ)(xi − x̄) − β̂ Σ_{i=1}^n (xi − x̄)²
                 = Σ_{i=1}^n (Yi − Ȳ)(xi − x̄) − β̂K
                 = Σ_{i=1}^n Yi(xi − x̄) − Σ_{i=1}^n Yi(xi − x̄)
                 = 0

Therefore, all the cross-terms are zero.
Back to the analysis of variance formula. . .
Σ_{i=1}^n (Yi − α0 − β(xi − x̄))²
  = Σ_{i=1}^n (Yi − α̂0 − β̂(xi − x̄))² + n(α̂0 − α0)² + K(β̂ − β)²

Taking expectations gives,

nσ² = E(D²) + σ² + σ²   ⇒   E(D²) = (n − 2)σ²

where

D² = Σ_{i=1}^n (Yi − α̂0 − β̂(xi − x̄))².
Variance estimator
Based on these results, we have an unbiased estimator of the variance,

σ̂² = D² / (n − 2).

The inferred mean for each observation is called its fitted value,

Ŷi = α̂0 + β̂(xi − x̄).

The deviation from each fitted value is called a residual, Ri = Yi − Ŷi.

The variance estimator is based on the sum of squared residuals, D² = Σ_{i=1}^n Ri².
Example: variance estimate
For our data:

d² = 16.12
σ̂² = 2.015
σ̂ = 1.42
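The same quantities can be computed in R; a sketch that assumes the x and y vectors and the fitted object model1 <- lm(y ~ x) from the earlier slides:

r  <- residuals(model1)      # R_i = Y_i - Yhat_i
d2 <- sum(r^2)               # D^2, the sum of squared residuals
n  <- length(y)

sigma2.hat <- d2 / (n - 2)   # unbiased variance estimate
c(d2 = d2, sigma2.hat = sigma2.hat, sigma.hat = sqrt(sigma2.hat))
# approximately 16.12, 2.015 and 1.42, as above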
Standard errors
We can substitute σ̂² into the formulae for the standard deviations of the estimators in order to calculate standard errors.

For example,

var(β̂) = σ²/K   ⇒   se(β̂) = σ̂ / √K
Example: standard errors
For our data:

se(α̂0) = σ̂ / √n = 0.449
se(β̂) = σ̂ / √K = 0.404
se(μ̂(x)) = σ̂ √(1/n + (x − x̄)²/K) = 1.42 × √(1/10 + (x − 1.78)²/12.34)
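These follow directly from the plug-in formulas; a sketch in R (again assuming x, y and sigma2.hat from the previous sketches):

xbar <- mean(x)
K    <- sum((x - xbar)^2)
s    <- sqrt(sigma2.hat)             # sigma-hat, approximately 1.42
n    <- length(y)

se.alpha0 <- s / sqrt(n)             # approximately 0.449
se.beta   <- s / sqrt(K)             # approximately 0.404
se.mu <- function(x0) s * sqrt(1/n + (x0 - xbar)^2 / K)
se.mu(3)                             # approximately 0.667 (used in the CI example later)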
Maximum likelihood estimation
Want to also construct confidence intervals. This requires further assumptions about the population distribution.
Let’s assume a normal distribution:
Yi ∼ N(α + βxi, σ²).

Alternative notation (commonly used for regression/linear models):

Yi = α + βxi + εi,   where εi ∼ N(0, σ²).

Let's maximise the likelihood. . .
Since the Yi's are independent, the likelihood is:

L(α, β, σ²) = Π_{i=1}^n (2πσ²)^(−1/2) exp( −(yi − α − βxi)² / (2σ²) )
            = (2πσ²)^(−n/2) exp( −Σ_{i=1}^n (yi − α0 − β(xi − x̄))² / (2σ²) )

so the negative log-likelihood is:

−ln L(α, β, σ²) = (n/2) ln(2πσ²) + (1/(2σ²)) Σ_{i=1}^n (yi − α0 − β(xi − x̄))²
                = (n/2) ln(2πσ²) + (1/(2σ²)) H(α0, β)
The α0 and β that maximise the likelihood (minimise the log-likelihood) are the same as those that minimise the sum of squares, H.
The OLS estimates are the same as the MLEs!
What about σ²?

Differentiate by σ, set to zero, solve. . .

σ̂²_MLE = D²/n

This is biased. Prefer to use the previous, unbiased estimator,

σ̂² = D²/(n − 2)
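As a check, the likelihood can also be maximised numerically in R; a sketch (assuming the x and y vectors from earlier) using optim(), which should reproduce the least squares estimates and the biased MLE of σ²:

negloglik <- function(par) {
  alpha <- par[1]; beta <- par[2]; sigma <- exp(par[3])   # log-sigma keeps sigma > 0
  -sum(dnorm(y, mean = alpha + beta * x, sd = sigma, log = TRUE))
}
fit <- optim(c(0, 0, 0), negloglik)
fit$par[1:2]         # approximately 2.91 and 2.59, matching the OLS estimates
exp(2 * fit$par[3])  # MLE of sigma^2 = D^2 / n, smaller than the unbiased D^2 / (n - 2)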
Sampling distributions
The Y1, · · · , Yn are independent normally distributed random variables.
Except for σˆ2, our estimators are linear combinations of the Yi so will also have normal distributions, with mean and variance as previously derived.
For example,

β̂ ∼ N(β, σ²/K).
Moreover, we know αˆ0 and βˆ are independent, because they are bivariate normal rvs with zero covariance.
Using the analysis of variance decomposition (from earlier), we can
show that,
(n − 2)σ̂² / σ² ∼ χ²_{n−2}.
Therefore, we can define pivots for the various mean parameters. For example,
(β̂ − β) / (σ̂/√K) ∼ t_{n−2}

(μ̂(x) − μ(x)) / (σ̂ √(1/n + (x − x̄)²/K)) ∼ t_{n−2}
This allows us to construct confidence intervals.
Example: confidence intervals

For our data, a 95% CI for β is:

β̂ ± c × σ̂/√K = 2.59 ± 2.31 × 0.404 = (1.66, 3.52)

where c is the 0.975 quantile of t_{n−2}.

A 95% CI for μ(3) is:

μ̂(3) ± c × se(μ̂(3)) = 10.68 ± 2.31 × 0.667 = (9.14, 12.22)
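These intervals can be reproduced in R from the pivot; a sketch (assuming model1, x and y from earlier), compared against confint():

n  <- length(y)
cc <- qt(0.975, df = n - 2)              # approximately 2.31
est <- coef(summary(model1))             # estimates and standard errors
est["x", "Estimate"] + c(-1, 1) * cc * est["x", "Std. Error"]
# approximately (1.66, 3.52)
confint(model1)["x", ]                   # the same interval from confint()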
Deriving prediction intervals
Use the same trick as we used for the simple model:

Y* ∼ N(μ(x*), σ²)

μ̂(x*) ∼ N( μ(x*), (1/n + (x* − x̄)²/K) σ² )

Y* − μ̂(x*) ∼ N( 0, (1 + 1/n + (x* − x̄)²/K) σ² )

A 95% PI for Y* is given by:

μ̂(x*) ± c σ̂ √(1 + 1/n + (x* − x̄)²/K)
Example: prediction interval
A 95% PI for Y* corresponding to x* = 3 is:

10.68 ± 2.31 × 1.42 × √(1 + 1/10 + (3 − 1.78)²/12.34) = (7.06, 14.30)
Much wider than the corresponding CI, as we’ve seen previously.
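The interval can be computed from the formula directly; a sketch in R (assuming x, y and model1 from earlier) that should match predict(..., interval = "prediction"):

xbar <- mean(x); K <- sum((x - xbar)^2); n <- length(y)
s    <- summary(model1)$sigma               # residual standard error, approximately 1.42
cc   <- qt(0.975, df = n - 2)

xstar  <- 3
mu.hat <- sum(coef(model1) * c(1, xstar))   # fitted mean at x*, approximately 10.68
mu.hat + c(-1, 1) * cc * s * sqrt(1 + 1/n + (xstar - xbar)^2 / K)
# approximately (7.06, 14.30)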
> model1 <- lm(y ~ x)
> summary(model1)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.01970 -1.05963 0.02808 1.04774 1.80580
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9114 0.8479 3.434 0.008908 **
x 2.5897 0.4041 6.408 0.000207 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.419 on 8 degrees of freedom
Multiple R-squared: 0.8369, Adjusted R-squared: 0.8166
F-statistic: 41.06 on 1 and 8 DF, p-value: 0.0002074
> # Confidence intervals for mean parameters
> confint(model1)
2.5 % 97.5 %
(Intercept) 0.9560629 4.866703
x 1.6577220 3.521623
> # Data to use for prediction.
> data2 <- data.frame(x = 3)
> # Confidence interval for mu(3).
> predict(model1, newdata = data2, interval = "confidence")
fit lwr upr
1 10.6804 9.142823 12.21798
> # Prediction interval for y when x = 3.
> predict(model1, newdata = data2, interval = "prediction")
fit lwr upr
1 10.6804 7.064 14.2968
R example explained
• The lm (linear model) command fits the model.
• model1 is an object that contains all the results of the regression
needed for later calculations.
• summary(model1) acts on model1 and summarizes the regression.
• predict can calculate CIs and PIs.
• R provides more detail than we need at the moment. Much of the output relates to hypothesis testing that we will get to later.
Plot data and fitted model
> plot(x, y, col = "blue")
> abline(model1, col = "blue")
The command abline(model1) adds the fitted line to a plot.
[Figure: scatter plot of the data with the fitted regression line.]
Fitted values and CIs for their means
> predict(model1, interval = “confidence”)
fit lwr upr
1 7.572793 6.537531 8.608056
2 6.536924 5.442924 7.630925
3 8.349695 7.272496 9.426895
4 3.688285 1.963799 5.412771
5 12.234204 10.247160 14.221248
6 4.724154 3.280382 6.167925
7 5.760023 4.546338 6.973707
8 8.349695 7.272496 9.426895
9 5.242088 3.921478 6.562699
10 12.752138 10.603796 14.900481
Confidence band for the mean
> data3 <- data.frame(x = seq(-1, 5, 0.05))
> y.conf <- predict(model1, data3, interval = "confidence")
> head(cbind(data3, y.conf))
x fit lwr upr
1 -1.00 0.3217104 -2.468232 3.111653
2 -0.95 0.4511941 -2.295531 3.197919
3 -0.90 0.5806777 -2.122943 3.284298
4 -0.85 0.7101613 -1.950472 3.370794
5 -0.80 0.8396449 -1.778124 3.457414
6 -0.75 0.9691286 -1.605906 3.544164
> matplot(data3$x, y.conf, type = "l", lty = c(1, 2, 2),
+   lwd = 2, xlab = "x", ylab = "y")
> points(x, y, col = "blue")
[Figure: fitted line with the 95% confidence band for the mean, with the data points overlaid.]
Prediction bands for new observations
> y.pred <- predict(model1, data3, interval = "prediction")
> head(cbind(data3, y.pred))
x fit lwr upr
1 -1.00 0.3217104 -3.979218 4.622639
2 -0.95 0.4511941 -3.821827 4.724215
3 -0.90 0.5806777 -3.664763 4.826119
4 -0.85 0.7101613 -3.508034 4.928357
5 -0.80 0.8396449 -3.351646 5.030936
6 -0.75 0.9691286 -3.195606 5.133863
> matplot(data3$x, y.pred, type = "l", lty = c(1, 3, 3),
+   lwd = 2, xlab = "x", ylab = "y")
> points(x, y, col = "blue")
[Figure: fitted line with the 95% prediction bands for new observations, with the data points overlaid.]
Both bands plotted together
> matplot(data3$x, cbind(y.conf, y.pred[, -1]),
+   type = "l", lwd = 2, lty = c(1, 2, 2, 3, 3),
+   xlab = "x", ylab = "y")
> points(x, y)
> points(x, y)
[Figure: fitted line with both the confidence band and the prediction band, with the data points overlaid.]
Checking our assumptions
What modelling assumptions have we made?
• Linear model for the mean
• Equal variances for all observations (homoscedasticity)
• Normally distributed residuals

Ways to check these:

• Plot the data and fitted model together (done!)
• Plot residuals vs fitted values
• QQ plot of the residuals
In R, the last two of these are very easy to do:
> plot(model1, 1:2)
[Figure: the two diagnostic plots produced by plot(model1, 1:2) — 'Residuals vs Fitted' (residuals against fitted values) and 'Normal Q-Q' (standardized residuals against theoretical quantiles).]
Multiple regression
• What if we have more than one predictor?
• Observe xi1,…,xik as well as yi (for each i)
• Can fit a multiple regression model:
E(Y | x1, …, xk) = β0 + β1 x1 + β2 x2 + ··· + βk xk
• This is linear in the coefficients, so is still a linear model
• Fit by the method of least squares, minimising (see the R sketch after this list):

H = Σ_{i=1}^n (yi − β0 − β1 xi1 − β2 xi2 − ··· − βk xik)²
• Take partial derivatives, etc., and solve for β0, . . . , βk.
• The subject Linear Statistical Models (MAST30025) looks into
these types of models in much more detail.
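A minimal sketch of fitting such a model in R with lm(); the two predictors and coefficient values below are simulated purely for illustration:

set.seed(2)
n  <- 30
x1 <- runif(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)   # assumed true coefficients

fit <- lm(y ~ x1 + x2)   # least squares fit of E(Y | x1, x2) = b0 + b1*x1 + b2*x2
coef(fit)                # estimated regression coefficients (close to 1, 2, -0.5)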
Two-sample problem
• The two-sample problem can be expressed as a linear model!
• Sample Y1, …, Yn ∼ N(μ1, σ²) and Yn+1, …, Yn+m ∼ N(μ2, σ²).
• Define indicator variables (xi1, xi2) where (xi1, xi2) = (1, 0) for i = 1, . . . , n and (xi1, xi2) = (0, 1) for i = n + 1, . . . , n + m.
• Observed data: (yi, xi1, xi2)
• Then Y1, …, Yn each have mean 1×β1 + 0×β2 = μ1 and Yn+1, …, Yn+m each have mean 0×β1 + 1×β2 = μ2.
• This is in the form of a multiple regression model (see the R sketch after this list).
• The general linear model unifies many different types of models together into a common framework. The subject MAST30025 covers this in more detail.
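A sketch of this in R (with simulated data and assumed group means, purely for illustration): fitting the indicator-variable model with lm() recovers the two group means.

set.seed(3)
n <- 12; m <- 15
y  <- c(rnorm(n, mean = 5), rnorm(m, mean = 7))   # two samples with a common variance
x1 <- rep(c(1, 0), c(n, m))                       # indicator of sample 1
x2 <- rep(c(0, 1), c(n, m))                       # indicator of sample 2

fit <- lm(y ~ 0 + x1 + x2)   # no extra intercept: the coefficients are mu1 and mu2
coef(fit)                    # equal to the two sample means
tapply(y, x1, mean)          # compare with the group means computed directly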
Correlation coefficient
(Revision) For two rvs X and Y, the correlation coefficient, or simply the correlation, is defined as:

ρ = ρ_XY = cov(X, Y) / √(var(X) var(Y)) = σ_XY / (σ_X σ_Y)
This is a quantitative measure of the strength of relationship, or association, between X and Y .
We will now consider inference on ρ, based on an iid sample of pairs (Xi,Yi).
Note: unlike in regression, X is now considered as a random variable.
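A quick R sketch (assuming x and y hold an iid sample of pairs, e.g. the example data used throughout this module): cor() gives the sample correlation, and cor.test() also reports a 95% confidence interval for ρ, anticipating the material below.

cor(x, y)        # sample correlation coefficient r
cor.test(x, y)   # includes a 95% CI for rho (via Fisher's z-transformation)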