BIA 652
Simple + Multiple Linear Regression
Simple Regression
Introduction to Regression Analysis
• Regression analysis is used to:
  – Predict the value of a dependent variable based on the value of at least one independent variable
  – Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain (also called the endogenous variable)
Independent variable: the variable used to explain the dependent variable (also called the exogenous variable)
Aims
• Describe the relationship between an independent variable X and a continuous dependent variable Y as a straight line in R²
  – Two cases:
    • Fixed X: values of X are preselected by the investigator
    • Variable X: a random sample of (X, Y) pairs
• Draw inferences regarding the relationship
• Predict the value of Y for a given X
Simple Linear Regression Model
The population regression model:

    yi = β0 + β1xi + εi

where yi is the dependent variable, β0 the population Y intercept, β1 the population slope coefficient, xi the independent variable, and εi the random error term. β0 + β1xi is the linear component; εi is the random error component.
Linear Regression Assumptions
• The true relationship form is linear (Y is a linear function of X, plus random error)
• The error terms, εi, are independent of the x values
• The error terms are random variables with mean 0 and constant variance, σ² (the uniform variance property is called homoscedasticity)
• The random error terms, εi, are not correlated with one another, so that

    E[εi] = 0 and E[εi²] = σ² for i = 1, …, n
    E[εiεj] = 0 for all i ≠ j
Graphically (p 85)

[Figure: the simple linear regression model Yi = β0 + β1Xi + εi, showing the intercept β0, the slope β1, and, for a given xi, the observed value of Y, the predicted value of Y, and the random error εi for that Xi value.]
α and β (p 86)
Simple Linear Regression Equation

The simple linear regression equation provides an estimate of the population regression line:

    ŷi = b0 + b1xi

where ŷi is the estimated (or predicted) y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and xi is the value of x for observation i.

The individual random error terms ei have a mean of zero:

    ei = (yi − ŷi) = yi − (b0 + b1xi)
11.3 Least Squares Coefficient Estimators

• b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of squared residuals (errors), SSE:

    min SSE = min Σ ei² = min Σ (yi − ŷi)² = min Σ [yi − (b0 + b1xi)]²

Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE.
11.6 Prediction

• The regression equation can be used to predict a value for y, given a particular x
• For a specified value, xn+1, the predicted value is

    ŷn+1 = b0 + b1xn+1
Least Squares Coefficient Estimators (continued)

• The slope coefficient estimator is

    b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = Cov(x, y) / s²x = r · (sy / sx)

• And the constant or y-intercept is

    b0 = ȳ − b1x̄

• The regression line always goes through the mean (x̄, ȳ)
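As a concrete illustration of these formulas, here is a minimal sketch in Python/NumPy (not the course's SAS workflow; the data values are made up):

```python
import numpy as np

# Hypothetical (x, y) sample, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

x_bar, y_bar = x.mean(), y.mean()

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# b0 = y_bar - b1 * x_bar, so the line passes through (x_bar, y_bar)
b0 = y_bar - b1 * x_bar

# Equivalent form of the slope: b1 = r * (sy / sx)
r = np.corrcoef(x, y)[0, 1]
assert np.isclose(b1, r * y.std(ddof=1) / x.std(ddof=1))

print(f"y-hat = {b0:.3f} + {b1:.3f} x")
```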
Analysis of Variance (continued)

[Figure: graphical decomposition of the variation at a point xi — SST = Σ(yi − ȳ)², SSE = Σ(yi − ŷi)², SSR = Σ(ŷi − ȳ)² — shown as vertical distances between the observed yi, the fitted line, and the mean ȳ: explained plus unexplained variation.]
11.4 Explanatory Power of a Linear Regression Equation

• Total variation is made up of two parts:

    SST = SSR + SSE

    Total Sum of Squares:            SST = Σ(yi − ȳ)²
    Regression Sum of Squares:       SSR = Σ(ŷi − ȳ)²
    Error (residual) Sum of Squares: SSE = Σ(yi − ŷi)²

where ȳ = average value of the dependent variable, yi = observed values of the dependent variable, and ŷi = predicted value of y for the given xi value.
Proof
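The slide's derivation is not reproduced in these notes; a standard sketch of the identity SST = SSR + SSE:

```latex
\begin{aligned}
\mathrm{SST} &= \sum_i (y_i - \bar y)^2
             = \sum_i \bigl[(y_i - \hat y_i) + (\hat y_i - \bar y)\bigr]^2 \\
             &= \underbrace{\sum_i (y_i - \hat y_i)^2}_{\mathrm{SSE}}
              + \underbrace{\sum_i (\hat y_i - \bar y)^2}_{\mathrm{SSR}}
              + 2\sum_i (y_i - \hat y_i)(\hat y_i - \bar y)
\end{aligned}
```

The cross term vanishes because the least squares normal equations force the residuals to satisfy Σei = 0 and Σeixi = 0, so Σei·ŷi = Σei(b0 + b1xi) = 0.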
Hypothesis Test for Population Slope Using the F Distribution

• F test statistic:

    F = MSR / MSE

where

    MSR = SSR / k
    MSE = SSE / (n − k − 1)

F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom (k = the number of independent variables in the regression model).
Results: Computer Analysis

• Estimates of slope (β1) and intercept (β0), using least squares
• Residual mean square = estimate of variance (S²)
• Test whether β equals a hypothesized value β⁰
• Usually, test β = 0, i.e., X has no effect on Y
Hypothesis Test for Population Slope Using the F Distribution (continued)

• An alternate test for the hypothesis that the slope is zero:

    H0: β1 = 0
    H1: β1 ≠ 0

• Use the F statistic

    F = MSR / MSE = SSR / s²e

• The decision rule is: reject H0 if F ≥ F(1, n−2, α)
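A minimal sketch of the F computation (Python/SciPy; the sums of squares and sample size below are made-up placeholders):

```python
from scipy import stats

n, k = 150, 1                   # assumed sample size; k = 1 for simple regression
SSR, SSE = 24.0, 70.0           # assumed sums of squares

MSR = SSR / k                   # regression mean square
MSE = SSE / (n - k - 1)         # error mean square (s_e^2)
F = MSR / MSE

# p-value from F with k and (n - k - 1) degrees of freedom
p_value = stats.f.sf(F, k, n - k - 1)
print(f"F = {F:.1f}, p = {p_value:.4g}")
```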
Steps in Simple Regression
1. State the research hypothesis.
2. State the null hypothesis
3. Gather the data
4. Assess each variable separately first (obtain measures of central tendency and dispersion; frequency distributions; graphs); is the variable normally distributed?
5. Calculate the regression equation from the data
6. Calculate and examine appropriate measures of association and tests of statistical significance for each coefficient and for the equation as a whole
7. Accept or reject the null hypothesis
8. Reject or accept the research hypothesis
9. Explain the practical implications of the findings
Effect of Outliers (p 102)
Leverage
Influence Measures

• Cook's distance: "distance" between the estimated coefficients B computed with and without the ith observation
• DFFITS: "distance" between Ŷ computed with and without the ith observation
Cook’s Distance
Influential observations

An observation is influential if:
– It is an outlier in X and Y
– Cook's distance > F0.5(P+1, N−P−1)
– DFFITS > 2·√((P+1)/(N−P−1))

Try analysis with and without influential observations and compare results.
Confidence & Prediction Intervals

• Confidence interval (CI) for the mean of Y
• Prediction interval (PI) for an individual Y
  – PI is wider than CI
Confidence Interval for the Average Y, Given X

Confidence interval estimate for the expected value of y given a particular xn+1:

    ŷn+1 ± t(n−2, α/2) · se · √( 1/n + (xn+1 − x̄)² / Σ(xi − x̄)² )

Notice that the formula involves the term (xn+1 − x̄)², so the size of the interval varies according to the distance xn+1 is from the mean, x̄.
Prediction Interval for an Individual Y, Given X

Prediction interval estimate for an actual observed value of y given a particular xn+1:

    ŷn+1 ± t(n−2, α/2) · se · √( 1 + 1/n + (xn+1 − x̄)² / Σ(xi − x̄)² )

The extra term (the 1 under the square root) adds to the interval width to reflect the added uncertainty of predicting an individual case.
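A sketch implementing both interval formulas (Python/SciPy; `ci_pi` is a name of my own, not from the text):

```python
import numpy as np
from scipy import stats

def ci_pi(x, y, x_new, alpha=0.05):
    """CI for E[Y | x_new] and PI for an individual Y at x_new,
    from a simple linear regression of y on x."""
    n = len(x)
    x_bar = x.mean()
    b1 = np.sum((x - x_bar) * (y - y.mean())) / np.sum((x - x_bar) ** 2)
    b0 = y.mean() - b1 * x_bar
    y_hat = b0 + b1 * x_new
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # std error of estimate
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    d = (x_new - x_bar) ** 2 / np.sum((x - x_bar) ** 2)        # distance from mean
    half_ci = t * s_e * np.sqrt(1 / n + d)        # interval for the mean of Y
    half_pi = t * s_e * np.sqrt(1 + 1 / n + d)    # wider interval for individual Y
    return (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)
```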
Estimating Mean Values and Predicting Individual Values

Goal: form intervals around ŷ to express uncertainty about the value of y for a given xi.

[Figure: the fitted line ŷ = b0 + b1xi, with the (narrower) confidence interval for the expected value of y given xi and the (wider) prediction interval for a single observed y given xi.]
Relevant Data Range

• When using a regression model for prediction, only predict within the relevant range of the data
• Risky to try to extrapolate far beyond the range of observed x values

[Figure: house price ($1000s) vs. square feet, with the relevant data range highlighted.]
Multiple Regression

Adjusted R-Sqr

VIF
Correlation Coefficient – ρ

• The correlation coefficient measures the strength of linear association between X and Y in the population (ρ)
• It is estimated by the sample correlation coefficient (r)
11.7 Correlation Analysis

• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
  – Correlation is only concerned with the strength of the relationship
  – No causal effect is implied with correlation
  – Correlation was first presented in Chapter 4
Correlation Analysis

• The population correlation coefficient is denoted ρ (the Greek letter rho)
• The sample correlation coefficient is

    r = sxy / (sx · sy)

where

    sxy = Σ(xi − x̄)(yi − ȳ) / (n − 1)
Calculating the value of ρ

• 100·(1 − ρ²)^½ = % of standard deviation NOT "explained" by X
• σ² = σy²(1 − ρ²)  ⇒  σ = σy·√(1 − ρ²)
• ⇒ ρ² = (σy² − σ²) / σy²
Graphically (p 92)
Interpretation of ρ

• ρ² = (reduction in variance of Y associated with knowledge of X) / (original variance of Y)
• 100ρ² = % of variance of Y "explained by X"
• Caveat: correlation vs. causation
Estimating the value of ρ (Pearson's Correlation Coefficient)

    ρ = σXY / (σX · σY)

    r = SXY / (SX · SY)

where SXY = Σ(X − m(X))(Y − m(Y)) / (N − 1)
Interpretation of ρ

ρ       % of variance   % of variance       % of SD       % of SD
        "explained"     not "explained"     "explained"   not "explained"
±0.3    9%              91%                 5%            95%
±0.5    25%             75%                 13%           87%
±0.71   50%             50%                 29%           71%
±0.95   90%             10%                 69%           31%
Test for Zero Population Correlation

• To test the null hypothesis of no linear association,

    H0: ρ = 0

the test statistic follows the Student's t distribution with (n − 2) degrees of freedom:

    t = r·√(n − 2) / √(1 − r²)
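A sketch of this test (Python/SciPy; the paired data are invented):

```python
import numpy as np
from scipy import stats

x = np.array([61.0, 64.0, 66.0, 68.0, 70.0, 72.0, 74.0])  # made-up heights
y = np.array([3.0, 3.4, 3.9, 4.1, 4.4, 4.8, 5.2])         # made-up FEV1-like values

n = len(x)
r = np.corrcoef(x, y)[0, 1]

# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p = 2 * stats.t.sf(abs(t), n - 2)          # two-sided p-value

r_check, p_check = stats.pearsonr(x, y)    # scipy's built-in version agrees
print(f"r = {r:.3f}, t = {t:.2f}, p = {p:.4g} (scipy: {p_check:.4g})")
```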
Example from Text: Lung Function

• Data from an epidemiological study of households living in four areas with different amounts and types of air pollution (Appendix A)
• Data only on non-smoking fathers
  – X = height in inches
  – Y = forced expiratory volume in 1 second (FEV1)
Scatter Plot (p 83)
Example Results

• Least squares equation: Y = −4.087 + 0.118X
• Correlation r = 0.504
• Test of ρ = 0:
  – t = 7.1 (p 94), p-value < 0.0001
  – t test can be one- or two-sided
Analysis of Variance

• SST = total sum of squares
  – Measures the variation of the yi values around their mean, ȳ
• SSR = regression sum of squares
  – Explained variation attributable to the linear relationship between x and y
• SSE = error sum of squares
  – Variation attributable to factors other than the linear relationship between x and y
Coefficient of Determination, R²

• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called R-squared and is denoted R²

    R² = SSR / SST = regression sum of squares / total sum of squares

Note: 0 ≤ R² ≤ 1
Correlation and R²

• The coefficient of determination, R², for a simple regression is equal to the simple correlation squared:

    R² = r²
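A numerical check of the decomposition and of R² = r² (Python/NumPy; data invented):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
b1, b0 = np.polyfit(x, y, 1)                   # least squares slope, intercept
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

assert np.isclose(SST, SSR + SSE)              # SST = SSR + SSE
R2 = SSR / SST
r = np.corrcoef(x, y)[0, 1]
assert np.isclose(R2, r ** 2)                  # R^2 = r^2 in simple regression
print(f"R^2 = {R2:.3f}")
```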
Examples of Approximate r² Values

[Figure: r² = 1 — perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.]

[Figure: 0 < r² < 1 — weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X.]

[Figure: r² = 0 — no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).]
Estimation of Model Error Variance

• An estimator for the variance of the population model error is

    σ̂² = s²e = Σ ei² / (n − 2) = SSE / (n − 2)

• Division by n − 2 instead of n − 1 is because the simple regression model uses two estimated parameters, b0 and b1, instead of one
• se = √s²e is called the standard error of the estimate
Comparing Standard Errors

se is a measure of the variation of observed y values from the regression line.

[Figure: two scatter plots around the same fitted line — one with small se, one with large se.]

The magnitude of se should always be judged relative to the size of the y values in the sample data.
11.5 Statistical Inference: Hypothesis Tests and Confidence Intervals

• The variance of the regression slope coefficient (b1) is estimated by

    s²b1 = s²e / Σ(xi − x̄)² = s²e / ((n − 1)·s²x)

where

    sb1 = estimate of the standard error of the least squares slope
    se = √(SSE / (n − 2)) = standard error of the estimate
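A sketch of the slope's standard error and the resulting t test (Python/SciPy; data invented):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # std error of estimate

# s_b1 = s_e / sqrt(sum((x - x_bar)^2)) = s_e / sqrt((n - 1) * s_x^2)
s_b1 = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))

t = b1 / s_b1                                   # test H0: beta1 = 0
p = 2 * stats.t.sf(abs(t), n - 2)
print(f"b1 = {b1:.3f}, SE(b1) = {s_b1:.3f}, t = {t:.2f}, p = {p:.4g}")
```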
ANOVA Overview
Test β = 0

• From ANOVA table: F = 50.5
  – Gives a 2-sided test, p-value < 0.0001
• One-sided test: t = F^½ = 7.1
  – Same as the test for ρ = 0
Outliers

• Outlier in Y: studentized (or deleted studentized) residual > 2
• Leverage:

    h = 1/N + (Xi − X̄)² / Σ(Xj − X̄)²

  – X's far from the mean of X have large leverage (h)
  – Observations with large leverage have a large effect on the slope of the line
• Outlier in X if h > 4/N
Residual Analysis

• Residual: e = Y − Ŷ
• Studentized residual = e / (S·(1 − h)^½), where h is the leverage
• Deleted studentized residual = studentized residual computed with the ith observation deleted when fitting the regression and estimating S
Influential observations

An observation is influential if:
– It is an outlier in X and Y
– Cook's distance > F0.5(2, N−2)
– DFFITS > 2·√(2/(N−2))

Try analysis with and without influential observations and compare results.
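A sketch computing these diagnostics for simple regression (Python/NumPy; `diagnostics` is my own name, and the deleted-residual step uses the standard algebraic shortcut rather than literally refitting N times):

```python
import numpy as np

def diagnostics(x, y):
    """Leverage, studentized residuals, Cook's distance, and DFFITS
    for a simple linear regression (one predictor, p = 1)."""
    n, p = len(x), 1
    b1, b0 = np.polyfit(x, y, 1)
    e = y - (b0 + b1 * x)                                   # raw residuals
    h = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)  # leverage
    s2 = np.sum(e ** 2) / (n - p - 1)                       # residual mean square
    r_int = e / np.sqrt(s2 * (1 - h))                       # studentized residual
    cooks_d = r_int ** 2 * h / ((p + 1) * (1 - h))          # Cook's distance
    s2_del = s2 * (n - p - 1 - r_int ** 2) / (n - p - 2)    # S^2 with obs. deleted
    t_del = e / np.sqrt(s2_del * (1 - h))                   # deleted studentized
    dffits = t_del * np.sqrt(h / (1 - h))                   # DFFITS
    return h, r_int, t_del, cooks_d, dffits
```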
Observations

• Point 1 is an outlier in Y with low leverage
  – impacts the estimate of the intercept but not the slope
  – tends to increase the estimates of S & the SE of B
• Point 2 has high leverage; not an outlier in Y
  – doesn't impact the estimate of B or A
• Point 3 has high leverage and is an outlier in Y
  – impacts the values of B, A, and S
Assumptions

• Homogeneity of variance (same σ²)
  – Not extremely serious
  – Can be achieved through transformations if necessary
• Normal residuals
  – Slight departures OK
  – Can use transformations to achieve it
• Randomness
  – Serious
  – Can use hierarchical models for clustered samples
Checking Assumptions
• Plot residuals vs X or vs the predicted Y to check linearity and homogeneity of variance
• Create normal probability plots of residuals to check for normality
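A sketch of both checks (Python with Matplotlib/SciPy; the simulated data stand in for a real dataset):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(60, 75, 100)                    # simulated predictor
y = -4.1 + 0.12 * x + rng.normal(0, 0.4, 100)   # simulated response

b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(b0 + b1 * x, resid)                 # look for curvature or a funnel
ax1.axhline(0, color="gray")
ax1.set(xlabel="Predicted Y", ylabel="Residual", title="Residuals vs predicted")
stats.probplot(resid, dist="norm", plot=ax2)    # normal probability plot
plt.tight_layout()
plt.show()
```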
Residual Plots (p 98)

Transformations (p 105)
Weighted Regression

• If the σ² are not equal, use a weight for each residual in the sum of squares used in the least squares process
• Weight = 1/σ²
• Gives unbiased estimates with smaller variance
Weighted Regression – Caveat

• Solution: standardize the weights (w) to add up to the sample size (N)
  – e.g., N = 5, w = 4, 1, 8, 2, 4, sum of w = 19
  – define standardized weight sw = w·5/19
  – sum of sw = 1.05 + 0.26 + 2.11 + 0.53 + 1.05 = 5
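A sketch of the weighted fit with standardized weights (Python/NumPy; `weighted_fit` is my own name):

```python
import numpy as np

def weighted_fit(x, y, sigma):
    """Weighted least squares line, weight_i = 1 / sigma_i^2,
    with weights standardized to sum to the sample size N."""
    w = 1.0 / sigma ** 2
    sw = w * len(x) / w.sum()            # standardized weights, sum(sw) = N
    x_w = np.sum(sw * x) / np.sum(sw)    # weighted means
    y_w = np.sum(sw * y) / np.sum(sw)
    b1 = np.sum(sw * (x - x_w) * (y - y_w)) / np.sum(sw * (x - x_w) ** 2)
    b0 = y_w - b1 * x_w
    return b0, b1
```

Rescaling the weights does not change b0 or b1; the standardization matters for variance estimates that use the sample size, which is the point of the caveat above.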
What to watch for

• Need a representative sample
• Range of prediction should match the observed range of X in the sample
• Use of nominal or ordinal, rather than interval or ratio, data
• Errors in variables
• Correlation does not imply causation
• Violation of assumptions
• Influential points
• Appropriate model
Multiple Linear Regression
Keywords for OUTPUT Statement

Keyword                Description
COOKD=names            Cook's influence statistic
COVRATIO=names         standard influence of observation on covariance of betas
DFFITS=names           standard influence of observation on predicted value
H=names                leverage
LCL=names              lower bound of a % confidence interval for an individual prediction; includes the variance of the error as well as the variance of the parameter estimates
LCLM=names             lower bound of a % confidence interval for the expected value (mean) of the dependent variable
PREDICTED | P=names    predicted values
PRESS=names            ith residual divided by (1 − h), where h is the leverage and the model has been refit without the ith observation
RESIDUAL | R=names     residuals, calculated as ACTUAL minus PREDICTED
RSTUDENT=names         a studentized residual with the current observation deleted
STDI=names             standard error of the individual predicted value
STDP=names             standard error of the mean predicted value
STDR=names             standard error of the residual
STUDENT=names          studentized residuals, which are the residuals divided by their standard errors
UCL=names              upper bound of a % confidence interval for an individual prediction
UCLM=names             upper bound of a % confidence interval for the expected value (mean) of the dependent variable
Aims

• Extend simple linear regression to multiple independent variables
• Describe a linear relationship between:
  – A single continuous Y variable, and
  – Several X variables
• Draw inferences regarding the relationship
• Predict the value of Y from X1, X2, …, Xp
• Research questions: To what extent does some combination of the IVs predict the DV?
  – E.g., to what extent do age, gender, and type/amount of food consumption predict low-density lipoprotein (LDL) level?
Assumptions

• Level of measurement:
  – IVs: two or more, continuous or dichotomous
  – DV: continuous
• Sample size: enough cases per IV
• Linearity: are the bivariate relationships linear?
• Constant variance (about the line of best fit): homoscedasticity
• Multicollinearity: between the IVs
• Multivariate outliers
• Normality of residuals about the predicted value
Approaches
• Direct: All IVs entered simultaneously
• Forward: IVs entered one by one until there are no significant IVs to be entered.
• Backward: IVs removed one by one until there are no significant IVs to be removed.
• Stepwise: Combination of Forward and Backward
• Hierarchical: IVs entered in steps.
Write-ups

• Assumptions: how tested, extent met
• Correlations: what are they, what conclusions
• Regression coefficients: report and interpret
• Conclusions and caveats
Steps in Multiple Regression
1. State the research hypothesis.
2. State the null hypothesis
3. Gather the data
4. Assess each variable separately first (obtain measures of central tendency and dispersion; frequency distributions; graphs); is the variable normally distributed?
5. Assess the relationship of each independent variable, one at a time, with the dependent variable (calculate the correlation coefficient; obtain a scatter plot); are the two variables linearly related?
6. Assess the relationships between all of the independent variables with each other (obtain a correlation coefficient matrix for all the independent variables); are the independent variables too highly correlated with one another?
7. Calculate the regression equation from the data
8. Calculate and examine appropriate measures of association and tests of statistical significance for each coefficient and for the equation as a whole
9. Accept or reject the null hypothesis
10. Reject or accept the research hypothesis
11. Explain the practical implications of the findings
Example (p 121)

Example (p 122)
Mathematical Model

• The mean of Y values at a given set of X's is:

    α + β1X1 + β2X2 + … + βpXp

• The variance of Y values at any set of X's is σ² (for all X)
• Y values are normally distributed at each X (needed for inference)
Types of X (independent) variables
• Fixed: selected in advance
• Variable: as in most studies
• X’s can be continuous or discrete (categorical)
• X’s can be transformations of other X’s, e.g., polynomial regression.
Computer Analysis

• Estimates of α, β1, β2, …, βp using least squares
• Residual mean square (S²) is the estimate of the variance σ²
• Confidence intervals for the mean of Y
• Prediction intervals for individual Y
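A sketch of a multiple regression fit by least squares (Python/NumPy; the two predictors and coefficients are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(60, 75, 50)                   # e.g., height (invented)
x2 = rng.uniform(120, 220, 50)                 # e.g., weight (invented)
y = -4.0 + 0.11 * x1 + 0.002 * x2 + rng.normal(0, 0.4, 50)

# Design matrix: a column of ones for alpha, then the X's
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # alpha, b1, b2

p = X.shape[1] - 1                             # number of X variables
S2 = np.sum((y - X @ coef) ** 2) / (len(y) - p - 1)   # residual mean square
print("alpha, b1, b2 =", np.round(coef, 3), "  S^2 =", round(float(S2), 4))
```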
Example of Bonferroni

• Test 3 hypotheses
• P-values are: 0.014, 0.036, 0.075
• At nominal significance level 0.05 per test, the first 2 are significant
  – But the probability of rejecting at least 1 out of m = 3 true hypotheses is then as high as 3 × 0.05 = 0.15
• Bonferroni adjusted p-values: multiply by 3, giving 0.042, 0.108, 0.225
  – Only the first is significant at 0.05
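The adjustment is just a multiplication (Python/NumPy sketch):

```python
import numpy as np

p_values = np.array([0.014, 0.036, 0.075])
m = len(p_values)

adjusted = np.minimum(p_values * m, 1.0)   # Bonferroni: multiply by m, cap at 1
print(adjusted)                            # [0.042 0.108 0.225]
print(adjusted < 0.05)                     # only the first remains significant
```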
Analysis of variance (p 132)

• Does the regression plane help in predicting values of Y?
• Test the hypothesis that all βi's = 0
Example: Reg of FEV1 on height and weight (p 132)
• F = 36.81; df = 2, 147; p-value <0.0001
• Use percentile link from web site:
http://faculty.vassar.edu/lowry/tabs.html#f
Venn Diagrams

• Multiple R²
• Bivariate correlation between IV1 and DV
• Bivariate correlation between IV2 and DV
• Correlation between IV1 and IV2
• Target: IVs that highly correlate with the DV, but don't highly correlate with each other

[Figure: Venn diagram of the overlaps among DV, IV1, and IV2.]
Correlation Coefficient

• The multiple correlation coefficient (R) measures the strength of association between Y and the set of X's in the population
• It is estimated as the simple correlation coefficient between the Y's and their predicted values (Ŷ's)
Coefficient of Determination

• R² = coefficient of determination = SS due to regression / SS total
• R² = (reduction in variance of Y due to the X's) / (original variance of Y)
• Therefore 100R² = % of variance of Y "explained by the X's"
• And 100·(1 − R²)^½ = % of standard deviation NOT "explained" by the X's
Regression

Standard Deviation of β1

Confidence Interval for the Mean Value

Confidence Interval for Prediction

Adjusted R-square
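The content of the Adjusted R-square slide is not reproduced above; the usual definition, for n observations and k independent variables, is:

```latex
\bar R^2 \;=\; 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

Unlike R², this penalizes added predictors and can decrease when a new variable contributes little.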
Sequential SS vs. Partial SS

VIF
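The VIF slide's content is not reproduced above; a sketch of the standard computation, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing X_j on the other predictors (Python/NumPy; `vif` is my own name):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x p matrix of
    predictors): VIF_j = 1 / (1 - R_j^2); tolerance_j = 1 / VIF_j."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)
```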
Interpretation of R

R       % of variance   % of variance       % of SD       % of SD
        "explained"     not "explained"     "explained"   not "explained"
±0.3    9%              91%                 5%            95%
±0.5    25%             75%                 13%           87%
±0.71   50%             50%                 29%           71%
±0.95   90%             10%                 69%           31%
Partial Correlation

• The correlation coefficient measuring the degree of dependence between two variables, after adjusting for the linear effect of one or more of the other X variables
• Example: T1 and T2 are test scores
  – Find the partial R between T1 and T2 after adjusting for IQ
Visually (p 130)

• Partial R = simple R between the two residuals
Interpretation of regression coefficients

• In the model α + β1X1 + β2X2 + ... + βpXp, if ρ is the partial correlation between Y and X1, given X2, ..., Xp, then testing that β1 = 0 is equivalent to testing that ρ = 0
• Hence, b1 is called the partial regression coefficient of Y on X1, given X2, ..., Xp
Values of regression coefficients

• Problem: the values of the bi's are not directly comparable
• Hence: standardized coefficients
  – Standardized bi = bi · (SD(Xi) / SD(Y))
• Standardized bi are directly comparable
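A one-line sketch of the standardization (Python/NumPy; `b` holds the raw slopes, matching the columns of X):

```python
import numpy as np

def standardized_coefs(b, X, y):
    """Standardized b_i = b_i * SD(X_i) / SD(Y); directly comparable
    across the X's because each is in SD units."""
    return np.asarray(b) * X.std(axis=0, ddof=1) / y.std(ddof=1)
```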
Multicollinearity

• The case where some of the X variables are highly correlated
• This will impact the estimates and their SE's (p 143)
• Consider Tolerance, and its inverse, the Variance Inflation Factor (VIF)
• Trouble signs: Tolerance < 0.01, or VIF > 100
• Remedy: use variable selection to delete some X variables, or a dimension reduction technique such as Principal Components
Misleading Correlations
• Example (Lung Function data, Appendix A): FEV1 vs height and age
• Depends on gender
Total vs Stratified Correlation

            Correlation between FEV1 and:
Gender      Height      Age
Total       0.739       -0.073
Male        0.504       -0.310
Female      0.465       -0.267
FEV1 vs height

FEV1 vs height – Regression lines

FEV1 vs age

FEV1 vs age – Regression lines
Outliers

• Outlier in Y if studentized (or deleted studentized) residual > 2 (same as the simple case)
• Outlier in X if h > 2(P+1)/N
Some Caveats

• See the list for simple regression
• Need a representative sample
• Violations of assumptions, outliers
• Multicollinearity: the coefficient of any one variable can vary widely, depending on what others are included in the model
• Missing values
• The number of observations in the sample should be large enough relative to the number of variables in the model
Outline
• Matrix Review: (A – λ I) X = 0; Eigenvalues
• Simple linear regression
• Visit http://www.ats.ucla.edu/stat/sas/output/reg.htm
• Assign HW 6.1,2,5 for next week
• If we get to Chapter 7, assign HW 7.2, 7.4, 7.5, 7.6
(Hand in 7.2,4,5) 7.7 Will be assigned next week.
• Start Multiple Regression Lecture
• Go over Multiple Regression Example – 7.1
Quick Matrix Review: (A − λI)X = 0

    A = [ 3  1 ]        A − λI = [ 3−λ   1  ]
        [ 2  2 ]                 [  2   2−λ ]

det(A − λI) = (3−λ)(2−λ) − 2 = λ² − 5λ + 4 = 0  ⇒  λ = 1, 4

λ = 1 ⇒ y = −2x
λ = 4 ⇒ y = x
Quick Matrix Review: (A − λI)X = 0

    A = [ 3  1  3 ]        A − λI = [ 3−λ   1    3  ]
        [ 2  2  5 ]                 [  2   2−λ   5  ]
        [ 1  3  2 ]                 [  1    3   2−λ ]

det(A − λI) = (3−λ)(2−λ)(2−λ) + 1·5·1 + 3·2·3 − [1·(2−λ)·3 + 2·1·(2−λ) + (3−λ)·5·3]
            = (12 − 16λ + 7λ² − λ³ + 5 + 18) − [(6 − 3λ) + (4 − 2λ) + (45 − 15λ)]
            = −20 + 4λ + 7λ² − λ³ = 0

⇒ λ ≈ 7.17, −1.76, 1.59
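The cubic's roots can be checked numerically (Python/NumPy):

```python
import numpy as np

A = np.array([[3.0, 1.0, 3.0],
              [2.0, 2.0, 5.0],
              [1.0, 3.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)   # solves det(A - lambda*I) = 0
print(np.round(eigvals, 2))           # approx. [7.17, -1.76, 1.59] (order may vary)
```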
Analysis of Variance

[Figure: observed values of Y for three groups.]