Introduction to information system
Linear Regression
Deema Abdal Hafeth
Bowei Chen
CMP3036M/CMP9063M Data Science
2016 – 2017
Today’s Objectives
• Simple Linear Regression
– Formulation
– Parameter Estimation: Least Squares Estimation (LSE)
• Multiple Linear Regression
• Appendix: Derivation of LSE
References
• James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An introduction
to statistical learning. Springer. (Chapters 3 and 4)
• Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of
statistical learning. Springer. (Chapter 3)
Dataset
      Price   House size
1     420     5850
2     385     4000
3     495     3060
4     605     6650
5     610     6360
6     660     4160
7     660     3880
8     690     4160
9     838     4800
10    885     5500
…     …       …
For a new house of size 4050 sq ft, can we predict its rent price?
Simple Linear Regression
Response variable ($y$): Price
Predictor variable ($x$): House size, also called the independent variable, feature, or explanatory variable
New observation: $x = 4050$
$$y := f(x) = \beta_0 + \beta_1 x + \varepsilon$$
where $\beta_0$ is the intercept and $\beta_1$ is the slope.
Simple Linear Regression Line
[Figure: two panels, "Effect of $\beta_0$" and "Effect of $\beta_1$", for the line $y := f(x) = \beta_0 + \beta_1 x + \varepsilon$]
Which Line Fits the Data “Best”?
[Figures (1), (2), and (3): candidate lines fitted to the same data]
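Sum of Squared Errors (SSE)
The error for the $i$-th data point is
$$\varepsilon_i = y_i - \hat{y}_i$$
where $y_i$ is the observed value and $\hat{y}_i$ is the value predicted by the line.
Sum of squared errors for the three points shown:
$$\text{SSE} = (2.3 - 2.8)^2 + (4 - 2.9)^2 + (2.8 - 3.4)^2 = 1.82$$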
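The three-point SSE calculation can be checked with a few lines of Python. This is a minimal sketch: the list names `observed` and `fitted` and the helper `sum_squared_errors` are illustrative choices, and the values are the ones read from the worked example (observed 2.3, 4, 2.8 against fitted 2.8, 2.9, 3.4).

```python
# Observed values y_i and fitted values y_hat_i from the three-point example.
observed = [2.3, 4.0, 2.8]
fitted = [2.8, 2.9, 3.4]


def sum_squared_errors(y, y_hat):
    """Sum of squared errors: the sum over i of (y_i - y_hat_i)^2."""
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))


sse = sum_squared_errors(observed, fitted)
print(round(sse, 2))  # 1.82
```

A line that passes closer to the points would give a smaller value from the same function.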
How do we find the line with the smallest SSE?
That line is the “best” line, and the method that finds it is called least squares estimation (LSE).
Index ($i$)   Price ($y$)   House size ($x$)
1             420           5850
2             385           4000
3             495           3060
4             605           6650
5             610           6360
6             660           4160
7             660           3880
8             690           4160
9             838           4800
10            885           5500
…             …             …
Each row is a data point $(x_i, y_i)$: $(5850, 420)$, $(4000, 385)$, …, $(5500, 885)$, …
The fitted value for the $i$-th data point is
$$\hat{y}_i = \beta_0 + \beta_1 x_i$$
Expression of SSE
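$$\text{SSE} = \varepsilon_1^2 + \varepsilon_2^2 + \dots + \varepsilon_n^2 = \sum_{i=1}^{n} \varepsilon_i^2,$$
where $\varepsilon_i = y_i - \hat{y}_i = y_i - \beta_0 - \beta_1 x_i$. Then
$$\text{SSE} = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$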
With the data fixed, SSE is a function of $\beta_0$ and $\beta_1$.
More precisely, it is a quadratic function of $\beta_0$ and $\beta_1$.
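To make "SSE as a function of the two parameters" concrete, here is a small sketch that evaluates it on the ten rows visible in the table (the rest of the dataset is elided there, so these numbers describe only that subset). The names `house_size`, `price`, and `sse` are illustrative. The second part checks the quadratic behaviour numerically: for a quadratic in $\beta_0$ with $\beta_1$ held fixed, the second difference is constant and equals $2n$.

```python
# The ten rows visible in the table; the full dataset is not shown there.
house_size = [5850, 4000, 3060, 6650, 6360, 4160, 3880, 4160, 4800, 5500]
price = [420, 385, 495, 605, 610, 660, 660, 690, 838, 885]


def sse(beta0, beta1, x=house_size, y=price):
    """Sum of squared errors of the line beta0 + beta1 * x on the data."""
    return sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))


# Two arbitrary candidate lines give two different SSE values.
print(sse(300, 0.05), sse(600, 0.0))

# Quadratic in beta0: with beta1 fixed at 0, the second difference
# sse(b + 1) - 2 * sse(b) + sse(b - 1) equals 2 * n for every b.
n = len(price)
second_diffs = [sse(b + 1, 0) - 2 * sse(b, 0) + sse(b - 1, 0) for b in (0, 100, 500)]
print(second_diffs)  # [20, 20, 20]
```

Because the function is a bowl-shaped quadratic in each parameter, it has a single minimum, which is what setting the derivatives to zero locates.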
Estimation of 𝛽0 and 𝛽1
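Taking the partial derivatives of SSE with respect to $\beta_0$ and $\beta_1$ and setting them to zero gives
$$\frac{\partial \text{SSE}}{\partial \beta_0} = 0, \qquad \frac{\partial \text{SSE}}{\partial \beta_1} = 0.$$
Solving this system of linear equations, we have
$$\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x},$$
where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_i$ and $y_i$.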
Please see Appendix for detailed derivation.
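The closed-form estimates translate directly into code. The sketch below assumes plain Python lists as input; the function name `fit_simple_lse` and the toy data are hypothetical and chosen so that the points lie exactly on $y = 2 + 3x$, which makes the result easy to verify by hand.

```python
def fit_simple_lse(x, y):
    """Return (beta0, beta1) minimising SSE for the line beta0 + beta1 * x."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    denominator = sum_xx - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    beta1 = (sum_xy - sum_y * sum_x / n) / denominator
    beta0 = sum_y / n - beta1 * sum_x / n  # y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical toy data lying exactly on y = 2 + 3x.
toy_x = [0, 1, 2, 3]
toy_y = [2, 5, 8, 11]
beta0, beta1 = fit_simple_lse(toy_x, toy_y)
print(beta0, beta1)  # 2.0 3.0
```

Applied to the full house dataset, the same function would return the intercept and slope used for the price prediction; only part of that dataset is listed here, so it is not reproduced in the example.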
For a new house of size 4050 sq ft, can we predict its rent price?
Price prediction:
608.61 ≈ 341.36 + 0.066 × 4050
Simple Linear Regression Solution
There Are Other Features of Houses
      Price   House size   Bedrooms   Bathrms   Stories   Driveway   Recroom   Fullbase
1     420     5850         3          1         2         1          0         1
2     385     4000         2          1         1         1          0         0
3     495     3060         3          1         1         1          0         0
4     605     6650         3          1         2         1          1         0
5     610     6360         2          1         1         1          0         0
6     660     4160         3          1         1         1          1         1
7     660     3880         3          2         2         1          0         1
8     690     4160         3          1         3         1          0         0
9     838     4800         3          1         1         1          1         1
10    885     5500         3          2         4         1          1         0
…     …       …            …          …         …         …          …         …
For a new house of size 4050 sq ft with 4 bedrooms and 2 bathrooms, can we predict its rent price?
Multiple Linear Regression
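Simple expression:
$$y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \dots + \beta_p x_{i,p} + \varepsilon_i,$$
Matrix expression:
$$\boldsymbol{y} = \boldsymbol{x}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where
$$\boldsymbol{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\boldsymbol{x} = \begin{bmatrix} 1 & x_{1,1} & \cdots & x_{1,p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,p} \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$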
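The slides do not state how the multiple regression coefficients are computed. One standard approach, assumed here rather than taken from the slides, is to minimise SSE by solving the normal equations $(\boldsymbol{x}^{\top}\boldsymbol{x})\boldsymbol{\beta} = \boldsymbol{x}^{\top}\boldsymbol{y}$. The sketch below does this in plain Python with Gaussian elimination; the function name `fit_multiple_lse` and the toy data (generated exactly from $y = 1 + 2x_1 + 3x_2$) are hypothetical.

```python
def fit_multiple_lse(rows, y):
    """Least squares coefficients [beta_0, ..., beta_p] via the normal equations."""
    # Design matrix: a leading column of ones, then the predictors.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])

    # Augmented system [X^T X | X^T y].
    A = [
        [sum(r[a] * r[b] for r in X) for b in range(k)]
        + [sum(r[a] * yi for r, yi in zip(X, y))]
        for a in range(k)
    ]

    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]

    # Back substitution.
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - s) / A[i][i]
    return beta


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2.
toy_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
toy_y = [9, 8, 19, 18, 26]
beta = fit_multiple_lse(toy_rows, toy_y)
print([round(b, 6) for b in beta])
```

On this toy data the recovered coefficients are 1, 2, and 3 up to floating-point error. For real work a numerical library routine is usually preferable, since forming $\boldsymbol{x}^{\top}\boldsymbol{x}$ explicitly can lose precision when predictors are on very different scales.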
Multiple Linear Regression Solution
Price prediction:
823.047 ≈ −24.18293 + 0.05411 × 4050 + 58.26 × 4 + 197.5 × 2
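The prediction is the fitted linear combination evaluated at the new house's features. A short sketch using the coefficients exactly as displayed, where the names `coefficients`, `new_house`, and `predict` are illustrative:

```python
# Coefficients as displayed: intercept, house size, bedrooms, bathrooms.
coefficients = [-24.18293, 0.05411, 58.26, 197.5]

# New house: 4050 sq ft, 4 bedrooms, 2 bathrooms.
new_house = [4050, 4, 2]


def predict(beta, features):
    """Evaluate beta_0 + beta_1 * x_1 + ... + beta_p * x_p."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], features))


prediction = predict(coefficients, new_house)
print(round(prediction, 2))
```

With these displayed coefficients the result is about 823.0, slightly below the reported 823.047. The small gap presumably comes from rounding in the displayed coefficients.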
Conclusion
• Simple Linear Regression
– Formulation
– Parameter Estimation: Least Squares Estimation (LSE)
• Multiple Linear Regression
• Appendix: Derivation of LSE
Thank You!
Appendix: Derivation of LSE (1/2)
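The SSE can be written as follows:
$$\text{SSE} = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
Taking the partial derivatives of SSE with respect to $\beta_0$ and $\beta_1$ and setting them to zero then gives
$$\frac{\partial \text{SSE}}{\partial \beta_0} = \frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0,$$
$$\frac{\partial \text{SSE}}{\partial \beta_1} = \frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0.$$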
Appendix: Derivation of LSE (2/2)
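Solving the system of linear equations then gives
$$\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}.$$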
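The intermediate algebra between the two zero-derivative conditions of part (1/2) and this solution can be sketched as follows, writing $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ for the sample means. The steps are a standard rearrangement and are not taken from the slides.

```latex
% Condition for beta_0: divide by -2 and separate the sums.
\sum_{i=1}^{n} y_i - n\beta_0 - \beta_1 \sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad
\beta_0 = \bar{y} - \beta_1 \bar{x}.

% Condition for beta_1: divide by -2, then substitute beta_0.
\sum_{i=1}^{n} x_i y_i - \beta_0 \sum_{i=1}^{n} x_i - \beta_1 \sum_{i=1}^{n} x_i^2 = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} x_i y_i - \left(\bar{y} - \beta_1 \bar{x}\right) \sum_{i=1}^{n} x_i
  - \beta_1 \sum_{i=1}^{n} x_i^2 = 0.

% Collect the beta_1 terms and use the definitions of the sample means.
\beta_1 \left( \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \right)
= \sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{n}.
```

Dividing both sides of the last line by the bracketed factor gives the expression for $\beta_1$, provided the $x_i$ are not all equal (otherwise that factor is zero).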