Simple linear regression: I. Introduction
Miaoyan Wang
Department of Statistics, UW-Madison
Simple linear regression
References:
Chapter 2 in JF (Faraway)
Sections 2.1-2.9 and 2.11 in RC
Both textbooks are available in Canvas/files/textbook/
Example: Wetland Species Richness
A study was performed on insect species richness in 58 wetlands in Ontario, Canada.
The goal of the study was to determine the relationship between forest density around the wetland and insect species richness.
The investigators sampled insects in each wetland and recorded the number of species present in each sample.
The percent forest cover within a 1500-meter buffer around the wetland was also recorded, among other wetland characteristics.
[Table: percent forest cover (x, recorded as a proportion) and number of species (y) for each of the 58 wetlands.]
[Figure: scatter plot of number of species against percent forest cover.]
Specific Goals
To describe the relationship between the percent forest cover (x) and the number of species (y).
To estimate or predict the number of species for a given percent forest cover.
Q: How do we account for the uncertainty in the fitted line and for the variation of the data around it?
[Figure: scatter plot of number of species against percent forest cover.]
Modeling Idea
Model y as an observed value of a random variable Y.
Regard x as fixed, or condition on x (x could also be modeled as a value of a random variable X).
Consider the model of Y conditional on X = x:
$E(Y \mid X = x) = \beta_0 + \beta_1 x$.
$\beta_0$ and $\beta_1$ are fixed, unknown parameters (the intercept and slope) characterizing the relationship between X and Y.
Simple Linear Regression Model
The formal simple linear regression (SLR) model for the data $(x_i, y_i)$ is
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, for $i = 1, 2, \ldots, n$, where
$Y_i$ is the $i$th response variable;
$x_i$ is the $i$th explanatory variable (also called a predictor or covariate);
$\varepsilon_i$ is the $i$th random error term.
The random errors are independent of each other and follow a normal distribution with mean zero and variance $\sigma^2$.
That is, $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$,
where iid = independent and identically distributed.
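One way to make the model concrete is to simulate from it. The sketch below draws responses from $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with iid $N(0, \sigma^2)$ errors using NumPy. The function name `simulate_slr`, the evenly spaced x values, and the parameter values are hypothetical choices for illustration, not estimates from the wetland data.

```python
import numpy as np

def simulate_slr(x, beta0, beta1, sigma, rng):
    """Draw Y_i = beta0 + beta1 * x_i + eps_i with eps_i iid N(0, sigma^2).

    Hypothetical helper for illustration; note that `sigma` is the error
    standard deviation, because NumPy's normal() takes a scale, not a variance.
    """
    x = np.asarray(x, dtype=float)  # x is treated as fixed (non-random)
    eps = rng.normal(loc=0.0, scale=sigma, size=x.shape)  # iid N(0, sigma^2) errors
    return beta0 + beta1 * x + eps

# Hypothetical parameter values, chosen only for illustration.
rng = np.random.default_rng(seed=1)
x = np.linspace(0.1, 0.9, 58)  # fixed covariate values (not the wetland data)
y = simulate_slr(x, beta0=5.0, beta1=6.0, sigma=2.0, rng=rng)
print(y[:5])
```

Setting `sigma=0.0` removes the random error, and every simulated point then falls exactly on the line $\beta_0 + \beta_1 x$.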
Features of Simple Linear Regression Model
Under the SLR model for the data $(x_i, y_i)$:
Simple: one explanatory variable only.
Linear: the parameters enter the model linearly.
Regression: the term goes back to Galton, who observed that sons of tall fathers tend to be tall, but on average less tall than their fathers (regression toward the mean).
Randomness: Q: What kind of distribution does $Y_i$ have?
Independence: the random errors are independent, and thus the response variables are (conditionally) independent.
Q: What kind of independence?
Q: What kind of dependence?
The model parameters are $\beta_0$, $\beta_1$, and $\sigma^2$.
Model Assumptions
A straight-line relationship between the response variable Y and the explanatory variable X:
$E(Y_i \mid X_i) = \beta_0 + \beta_1 x_i$.
Equal variance: $\mathrm{Var}(Y_i \mid X_i) = \sigma^2$.
Independence (conditional on $X_i$, $X_{i'}$): $\mathrm{Cov}(Y_i, Y_{i'}) = 0$ for $i \neq i'$.
Normal distribution: $Y_i \mid X_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.
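Taken together, these assumptions give each response a normal density centered on the regression line, and independence lets the joint density of the sample factor into a product (a sum on the log scale). The sketch below, in plain Python with hypothetical helper names `cond_logpdf` and `log_likelihood` and made-up toy values, evaluates that log-density; it is the quantity the maximum likelihood method works with.

```python
import math

def cond_logpdf(y, x, beta0, beta1, sigma2):
    """Log-density of one response under Y | X = x ~ N(beta0 + beta1 * x, sigma2)."""
    mu = beta0 + beta1 * x  # conditional mean from the straight-line assumption
    return -0.5 * math.log(2 * math.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2)

def log_likelihood(ys, xs, beta0, beta1, sigma2):
    """Sum of per-observation log-densities; this sum is valid because the
    responses are conditionally independent, so the joint density factorizes."""
    return sum(cond_logpdf(y, x, beta0, beta1, sigma2) for y, x in zip(ys, xs))

# Hypothetical toy values, chosen only for illustration.
xs = [0.2, 0.5, 0.8]
ys = [6.0, 8.5, 9.0]
print(log_likelihood(ys, xs, beta0=5.0, beta1=6.0, sigma2=4.0))
```

Equivalently, the sum equals $-\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_i (y_i - \beta_0 - \beta_1 x_i)^2$.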
Equivalent Model Assumptions
Equivalently, the assumptions are
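A straight-line relationship between the response variable Y and the explanatory variable X:
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $E(\varepsilon_i) = 0$.
Equal variance: $\mathrm{Var}(\varepsilon_i) = \sigma^2$.
Independence: $\mathrm{Cov}(\varepsilon_i, \varepsilon_{i'}) = 0$ for $i \neq i'$.
Normal distribution: $\varepsilon_i \sim N(0, \sigma^2)$.
Q: The $\varepsilon_i$ are iid. What about the $Y_i$: iid, not iid, or does it depend?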
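A small simulation can help explore this question. In the sketch below (NumPy, hypothetical parameter values), the errors are drawn from the same $N(0, \sigma^2)$ distribution at every x, but each response is shifted by its own mean $\beta_0 + \beta_1 x$:

```python
import numpy as np

# Hypothetical parameters for illustration only.
beta0, beta1, sigma = 5.0, 6.0, 2.0
rng = np.random.default_rng(seed=3)
n_rep = 100_000  # Monte Carlo replicates per x value

for x in (0.2, 0.8):
    eps = rng.normal(0.0, sigma, size=n_rep)  # same error distribution at every x
    y = beta0 + beta1 * x + eps               # the mean of Y depends on x
    print(f"x = {x}: mean(Y) ~ {y.mean():.2f} (theory {beta0 + beta1 * x:.2f}), "
          f"var(Y) ~ {y.var():.2f} (theory {sigma**2:.2f})")
```

The variances agree while the means differ. The $Y_i$ inherit independence from the errors, but they are identically distributed only when $\beta_1 = 0$ or all $x_i$ are equal.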
Model Parameters
The model parameters are $\beta_0$, $\beta_1$, and $\sigma^2$ (population parameters). $\beta_0$ and $\beta_1$ are the regression coefficients.
$\beta_0$: intercept.
When the model scope includes x = 0, $\beta_0$ can be interpreted as the mean of Y at x = 0.
$\beta_1$: slope.
Interpreted as the change in the mean of Y per unit increase in x.
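To see why $\beta_1$ has this interpretation, compare the mean response at $x + 1$ and at $x$ under the straight-line model:

$$
E(Y \mid X = x + 1) - E(Y \mid X = x) = [\beta_0 + \beta_1 (x + 1)] - [\beta_0 + \beta_1 x] = \beta_1.
$$

In the wetland example, x appears to be recorded as a proportion (values between 0 and 1), so a one-unit increase spans the whole range of forest cover; an increase of 0.1 (ten percentage points) changes the mean number of species by $0.1\,\beta_1$.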
$\sigma^2$: error variance, sometimes written as $\sigma^2_{\varepsilon}$ or $\sigma^2_{Y \mid x}$.
Q: How can we estimate the model parameters from data?
Estimation of Model Parameters
Our goal is to estimate these model parameters from the data, using estimators $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\sigma}^2$.
Two methods:
Least squares (LS).
Maximum likelihood (ML).
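The two methods are named here without formulas. As a minimal sketch, assuming the standard closed-form least-squares solution $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, with $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$, the function below (hypothetical name `slr_least_squares`) computes the estimates with NumPy. It also returns two versions of $\hat{\sigma}^2$: SSE/(n - 2), the usual unbiased estimate, and SSE/n, the maximum likelihood estimate under normal errors. Under the normal-error SLR model, the LS and ML estimates of $\beta_0$ and $\beta_1$ coincide.

```python
import numpy as np

def slr_least_squares(x, y):
    """Closed-form least-squares fit of y = b0 + b1 * x.

    Returns (b0_hat, b1_hat, sigma2_unbiased, sigma2_ml).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)            # S_xx
    sxy = np.sum((x - x_bar) * (y - y_bar))   # S_xy
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    resid = y - (b0_hat + b1_hat * x)         # residuals from the fitted line
    sse = np.sum(resid ** 2)                  # residual sum of squares
    sigma2_unbiased = sse / (n - 2)           # usual unbiased estimate
    sigma2_ml = sse / n                       # ML estimate under normal errors
    return b0_hat, b1_hat, sigma2_unbiased, sigma2_ml

# Usage on simulated data with hypothetical parameters (not the wetland data).
rng = np.random.default_rng(seed=4)
x = rng.uniform(0.0, 1.0, size=58)
y = 5.0 + 6.0 * x + rng.normal(0.0, 2.0, size=58)
print(slr_least_squares(x, y))
```

As a cross-check, `np.polyfit(x, y, 1)` returns the same slope and intercept, in that order.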