COMP9321:
Data services engineering
Week 8: Linear Regression
Term 1, 2021
By Mortada Al-Banna, CSE UNSW
Refresher
Regression Analysis
Supervised ML
Regression
Simple Linear Regression, Multiple Linear Regression, Polynomial Regression
https://www.slideshare.net/Simplilearn/linear-regression-analysis-linear-regression-in-python-machine-learning-algorithms-simplilearn?qid=c0cbd932-3ad1-45ec-b123-36805981982d&v=&b=&from_search=4
Sir Francis Galton, 1822-1911
https://en.wikipedia.org/wiki/Francis_Galton
Regression Analysis
• A linear model is a weighted sum of input variables that predicts a target output value for a given input data instance.
Example: Predicting housing prices
House features: taxes paid per year (X_tax), age in years (X_age)
Predicted price = 80,000 + 100 · X_tax - 4,000 · X_age
• So if the house tax per year is 20,000 and the age of the house is 60 years, then the predicted selling price is:
Predicted price = 80,000 + 100 × 20,000 - 4,000 × 60 = 1,840,000
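As a quick sanity check, the worked example above can be reproduced in a few lines of Python (the coefficients and feature values are taken directly from the slide; the function name is illustrative only):

```python
# Linear model from the slide: price = 80000 + 100 * tax - 4000 * age
def predict_price(tax: float, age: float) -> float:
    """Predicted selling price for a house, given yearly tax and age."""
    return 80000 + 100 * tax - 4000 * age

print(predict_price(tax=20000, age=60))  # 1840000, matching the slide
```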
Linear Regression
We want to find the “best” line (linear function y=f(X)) to explain the data.
y
X
Linear Regression
The predicted value of y is given by:
$$\hat{y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j$$
The vector of coefficients $\hat{\beta}$ is the regression model.
Linear Regression
The regression formula:
$$\hat{y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j + \varepsilon$$
e.g., with a single predictor $X_1$:
$$\hat{y} = \hat{\beta}_0 + X_1 \hat{\beta}_1 + \varepsilon$$
$\hat{\beta}_0$: intercept (where the line crosses the y-axis); $\hat{\beta}_1$: slope of the line; $X_1$: predictor; $\varepsilon$: random error.
The slope and intercept of the line are called the regression coefficients (model parameters).
Our goal is to estimate the model parameters.
Assumptions When using Linear Regression
• Outcome Variable must be continuous.
• Linear Relationship between the features and target.
• Little or no Multicollinearity between the features.
• Normal distribution of error terms.
• Few or no outliers
How to check these assumptions?
• Plotting (e.g., scatter plot, Histogram)
• Calculating coefficients
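A minimal sketch of these checks in Python, using synthetic (made-up) data and matplotlib: a scatter plot to eyeball linearity, and a histogram of residuals to eyeball the normality of errors. All variable names below are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data just to illustrate the plots (not real data)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 5.0 + rng.normal(0, 1.0, 200)

# Linearity check: scatter plot of feature vs. target
plt.subplot(1, 2, 1)
plt.scatter(x, y, s=10)
plt.xlabel("x"); plt.ylabel("y"); plt.title("Linearity check")

# Normality-of-errors check: histogram of residuals from a fitted line
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)
plt.subplot(1, 2, 2)
plt.hist(residuals, bins=20)
plt.title("Residuals")

plt.tight_layout()
plt.show()
```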
Correlation Coefficient
The correlation coefficient r ranges from -1 to +1:
• A value close to +1 indicates a strong positive linear relationship.
• A value close to -1 indicates a strong negative linear relationship.
• A value of zero indicates no linear relationship at all.
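For example, r can be computed with NumPy's corrcoef on synthetic data (the variable names below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y_pos = 2 * x + rng.normal(0, 1, 100)   # strongly positively related to x
y_none = rng.normal(0, 1, 100)          # unrelated to x

# Pearson correlation coefficient r is the off-diagonal entry
print(np.corrcoef(x, y_pos)[0, 1])      # close to +1
print(np.corrcoef(x, y_none)[0, 1])     # close to 0
```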
Challenge
• Find the values of β0 and β1 such that the corresponding line is the best-fitting line, i.e., gives the minimum error (minimum cost)
• Possible solution is to use the Least Square Error solution
• But where do we start, and how do we update the proposed line? Gradient descent
Least Square Error Solution
To estimate (β0, β1), we find the values that minimize the squared error:
$$\min_{\beta_0,\beta_1} \sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2$$
Solution:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
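A minimal sketch of the closed-form least-squares solution for simple linear regression, using synthetic data with known true coefficients (all names and values below are illustrative):

```python
import numpy as np

# Synthetic data with known true coefficients: beta0 = 4.0, beta1 = 2.5
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 4.0 + 2.5 * x + rng.normal(0, 1.0, 100)

# Closed-form least-squares estimates for simple linear regression
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)   # should be close to 4.0 and 2.5
```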
What is Gradient Descent?
Gradient Descent
• Gradient descent is a method of updating β0 and β1 to reduce the cost function (least squared error).
• The idea is that we start with some values for β0 and β1 and then change these values iteratively to reduce the cost. Gradient descent tells us how to change the values.
• To update β0 and β1, we take gradients from the cost function. To find these gradients, we take partial derivatives with respect to β0 and β1.
• A smaller learning rate gets you closer to the minimum but takes more time to reach it; a larger learning rate converges sooner, but there is a chance you could overshoot the minimum.
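A minimal gradient-descent sketch for simple linear regression on synthetic data; the learning rate, iteration count, and variable names are illustrative choices, not prescribed values:

```python
import numpy as np

# Synthetic data with known true coefficients: beta0 = 4.0, beta1 = 2.5
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 4.0 + 2.5 * x + rng.normal(0, 1.0, 200)
n = len(x)

beta0, beta1 = 0.0, 0.0   # start with some initial values
lr = 0.01                 # learning rate (illustrative choice)

for _ in range(5000):
    error = (beta0 + beta1 * x) - y
    # Partial derivatives of the mean squared error w.r.t. beta0 and beta1
    grad_beta0 = (2 / n) * error.sum()
    grad_beta1 = (2 / n) * (error * x).sum()
    # Move against the gradient, scaled by the learning rate
    beta0 -= lr * grad_beta0
    beta1 -= lr * grad_beta1

print(beta0, beta1)   # should approach 4.0 and 2.5
```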
Linear Regression
Simple linear regression Y = β0 + β1X1 + ε
Multiple linear regression
Y = β0 + β1X1 + β2X2 + ε
Multiple Linear Regression Vs Simple Linear Regression
• Simple linear regression models the response of a dependent variable to a change in a single explanatory variable.
• However, it is rare that a dependent variable is explained by only one variable. In that case we use multiple regression, which attempts to explain a dependent variable using more than one independent variable.
• Multiple regression is based on the assumption that there is a linear relationship between the dependent and independent variables. It also assumes no major correlation between the independent variables.
Multiple Linear Regression (Example)
• Predicting Exxon Mobil (XOM) stock price
• Option 1: Simple linear regression relies on the value of the S&P 500 index as the independent variable (predictor) and the price of XOM as the dependent variable. Problem: unrealistic and provides lower accuracy.
• Option 2: Multiple linear regression depends on more than just the performance of the overall market. Other predictors such as the price of oil, interest rates, and the price movement of oil futures can affect the price of XOM and the stock prices of other oil companies.
https://www.investopedia.com/terms/m/mlr.asp
Multiple Linear Regression (Example) Cont’d
• Examines how multiple independent variables are related to one dependent variable.
• Once the independent factors that predict the dependent variable have been determined, the information on these variables can be used to create an accurate prediction of the level of effect they have on the outcome variable.
https://www.investopedia.com/terms/m/mlr.asp
Multiple Linear Regression (Example) Cont’d
yi = dependent variable: price of XOM
xi1 = interest rates
xi2 = oil price
xi3 = value of S&P 500 index
xi4= price of oil futures
β0 = y-intercept at time zero
β1 = regression coefficient that measures a unit change in the dependent variable when xi1 changes – the change in XOM price when interest rates change
β2 = coefficient value that measures a unit change in the dependent variable when xi2 changes – the change in XOM price when oil prices change
β3 = coefficient value that measures a unit change in the dependent variable when xi3 changes – the change in XOM price when the value of S&P 500 index change
β4 = coefficient value that measures a unit change in the dependent variable when xi4 changes – the change in XOM price when price of oil futures change
yi = β0 + β1xi1 + β2xi2 + β3xi3 + β4xi4 + εi
https://www.investopedia.com/terms/m/mlr.asp
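A sketch of fitting such a model with scikit-learn; the data below is randomly generated stand-in data (not real XOM or market prices), with four predictors mirroring the slide:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: columns play the role of interest rates,
# oil price, S&P 500 value, and oil futures price (all made up).
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 4))
true_betas = np.array([-1.5, 3.0, 0.8, 2.0])
y = 50.0 + X @ true_betas + rng.normal(0, 0.5, n)   # true beta0 = 50.0

model = LinearRegression().fit(X, y)
print(model.intercept_)   # estimate of beta0
print(model.coef_)        # estimates of beta1..beta4
```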
Multiple Linear Regression (Selecting the Features)
• Multiple linear regression explains the relationship between one continuous dependent variable (y) and two or more independent variables (X1, X2, X3, … etc.)
• Challenge: How to determine which features to keep and which to toss?
➢ Chuck Everything In and Hope for the Best
➢ Backward Elimination
➢ Forward Selection
➢ Bidirectional Elimination
Something You Need to Know (P-Value)
• The p-value for each term tests the null hypothesis that the coefficient is equal to zero (no effect). A low p-value (< 0.05) indicates that you can reject the null hypothesis. In other words, a predictor that has a low p-value is likely to be a meaningful addition to your model because changes in the predictor's value are related to changes in the response variable.
• Conversely, a larger (insignificant) p-value suggests that changes in the predictor are not associated with changes in the response.
• Null Hypothesis: It’s like if you were doing a trial of a drug that doesn’t work. In that trial, there just wouldn’t be a difference between the group that took the drug and the rest of the population. The difference would be null. You always assume that the null hypothesis is true until you have evidence that it isn’t.
https://towardsdatascience.com/multiple-linear-regression-in-four-lines-of-code-b8ba26192e84
Backward Elimination
1. First, you’ll need to set a significance level for which data will stay in the model. For example, you might want to set a significance level of 5% (SL = 0.05). This is important and can have real ramifications, so give it some thought.
2. Next, you’ll fit the full model with all possible predictors.
3. You’ll consider the predictor with the highest P-value. If that P-value is greater than your significance level, you’ll move to step four; otherwise, you’re done!
https://towardsdatascience.com/multiple-linear-regression-in-four-lines-of-code-b8ba26192e84
Backward Elimination
4. Remove the predictor with the highest P-value.
5. Fit the model without that predictor variable. If you just remove the variable, you need to refit and rebuild the model. The coefficients and constants will be different; when you remove one predictor, it affects the others.
6. Go back to step 3 and repeat until even the highest P-value is < SL. Now your model is ready: all of the variables that are left have P-values below the significance level.
https://towardsdatascience.com/multiple-linear-regression-in-four-lines-of-code-b8ba26192e84
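A sketch of this backward-elimination loop using statsmodels OLS p-values; the data is synthetic (only the first two of five candidate predictors actually matter), and SL = 0.05 follows step 1 above:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: only predictors 0 and 1 influence y
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 1.0, n)

SL = 0.05                        # step 1: significance level
cols = list(range(X.shape[1]))   # predictors currently in the model

while True:
    # steps 2/5: (re)fit the model on the remaining predictors
    model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    pvals = model.pvalues[1:]    # skip the constant term
    worst = pvals.argmax()       # step 3: predictor with the highest p-value
    if pvals[worst] <= SL:
        break                    # all remaining predictors are significant
    cols.pop(worst)              # step 4: remove it, then refit

print("kept predictors:", cols)  # expected: [0, 1]
print(model.summary())
```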
Correlation and Collinearity
• Checking for collinearity helps you get rid of variables that skew your data by having a significant relationship with another variable.
• Correlation describes the relationship between two variables. If two variables are extremely correlated, they are collinear.
• High collinearity (correlation close to ±1.00) between predictors will affect your coefficients and accuracy, as well as the model's ability to reduce the LSE (least squared error).
• The simplest method to detect collinearity would be to plot it out in graphs or to view a correlation matrix to check out pairwise correlation.
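For instance, a pairwise correlation matrix with pandas, on synthetic data where x3 is deliberately constructed to be nearly collinear with x1 (column names are illustrative only):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
# x3 is almost a copy of x1, i.e. highly collinear with it
df["x3"] = df["x1"] * 0.98 + rng.normal(0, 0.05, 200)

# Values near +/-1 off the diagonal flag collinear predictor pairs
print(df.corr())
```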
Useful Resources
• https://towardsdatascience.com/introduction-to-machine-learning-algorithms-linear-regression-14c4e325882a
• https://medium.com/@powersteh/an-introduction-to-applied-machine-learning-with-multiple-linear-regression-and-python-925c1d97a02b
• https://machinelearningmastery.com/linear-regression-for-machine-learning/
• https://www.investopedia.com/terms/m/mlr.asp
• https://towardsdatascience.com/multiple-linear-regression-in-four-lines-of-code-b8ba26192e84
Questions?