LINEAR REGRESSION
COMP2420/COMP6420 INTRODUCTION TO DATA MANAGEMENT, ANALYSIS AND SECURITY
WEEK 3 – LECTURE 2 Wednesday 08 March 2022
School of Computing
College of Engineering and Computer Science
Credit: (previous course convenor)
HOUSEKEEPING
Lab enrolment
You need to be enrolled in a lab urgently if you are not already.
Assignment 1
Assignment 1 is now available
(due Friday 01 April [week 6])
Lab test (week 4)
Available on Wattle over a 3-hour period on the Wednesday of week 4, 1-4pm (Wednesday 16 March)
Public Holiday
Canberra Day public holiday: Monday 14 March 2022
No live lecture (recording to be provided)
Labs – your tutor will either re-schedule or you can go to another lab in that week
Learning Outcomes
Describe what linear regression (LR) is
Explain which situations are relevant for applying LR
Describe the different considerations involved in applying LR
Discuss the various approaches that can be taken for linear regression
INTRODUCTION
Linear Regression
What is linear regression?
Supervised Learning (recap)
• Labeled data available
• Goal: learn a function that maps inputs to outputs
Given a training set of N example input-output pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, where each $y_i$ was generated by an unknown function $y = f(x)$, discover a function $h$ that approximates the true function $f$.
Supervised Learning – Classification and Regression (recap)
Classification: when output y is one of a finite set of values [discrete categorical] (e.g. sunny, cloudy or rainy)
Regression: when output y is a real-valued number (e.g. average temperature, height)
Problem Setup
Can we predict the height of a child from the father's height?
Attribution: Galton data
We want to find the best line (a linear function $y = f(x)$) to explain our data.
We can assume there is a straight line that approximately fits these points, given by:
$$\hat{y} = wx + b$$
where $w$ is commonly known as the slope, and $b$ is commonly known as the intercept.
Here $x$ (the feature) is the father's height and $y$ (the target) is the child's height.
Model slope ($w$): 0.3993812658985648
Model intercept ($b$): 39.11038683707543
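A minimal sketch of how such a fit can be produced in Python with scikit-learn. The father/child height arrays below are made-up stand-ins for the Galton data, not the real values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the Galton father/child height data (inches)
father = np.array([65.0, 66.5, 68.0, 70.0, 71.5, 73.0])
child = np.array([66.0, 66.5, 67.5, 68.5, 68.0, 70.0])

# scikit-learn expects a 2-D feature matrix: one row per sample, one column per feature
X = father.reshape(-1, 1)

model = LinearRegression().fit(X, child)
print("Model slope (w):", model.coef_[0])
print("Model intercept (b):", model.intercept_)
```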
• This is an example of using linear regression for prediction
• It has been around for a long time (in statistics)
• It is a supervised learning based regression
• There are different approaches to training the model
Moving to higher dimensions
We can generalize the above to a general form for when we have more than one feature variable in the regression:
$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots$$
Geometrically, this is equivalent to fitting a plane to points in three dimensions, or fitting a hyper-plane to points in higher dimensions.
Matrix Form
Now, instead of writing this equation in vector form, we can write the input samples in a single matrix $X$:
$$X = \begin{pmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1d} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2d} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nd} \end{pmatrix}$$
Each row is a distinct observation and the columns of $X$ are the input features. The equation is then given by $\hat{y} = Xw$.
Fitting Parameters
How do we find a “good” fit to the data (estimate the coefficients)?
Many different approaches, such as:
• Simple linear regression
• Ordinary Least Squares
• Gradient Descent
• Regularization
Ordinary Least Squares
The most natural way is to minimize the distance between the fitted value $\hat{y}$ and the real value $y$, e.g. the Residual Sum of Squares (squared loss):
$$\mathrm{RSS} = \sum_{i} (y_i - \hat{y}_i)^2$$
Question: what is the term $(y_i - \hat{y}_i)$?
It is called the 'residual' and is given by the difference between the observed ($y_i$) and predicted ($\hat{y}_i$) values of the response variable.
Since $\hat{y} = Xw$ in matrix form, the squared loss can be written as:
$$L(w) = (y - Xw)^\top (y - Xw)$$
We can write the closed-form solution for $\hat{w}$ as:
$$\hat{w} = (X^\top X)^{-1} X^\top y$$
where $\hat{w}$ is the coefficient vector for which the squared loss is minimized.
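As a sketch, the closed form can be computed directly in NumPy on synthetic data; solving the normal equations with np.linalg.solve is preferred to forming the inverse $(X^\top X)^{-1}$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=n)

# Closed-form OLS: solve (X^T X) w = X^T y instead of inverting X^T X
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to [2.0, -1.0, 0.5]
```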
In case you are interested …
Setting the fitted values equal to the observations:
$$X\hat{w} = y \;\Longrightarrow\; X^\top X\,\hat{w} = X^\top y \;\Longrightarrow\; \hat{w} = (X^\top X)^{-1} X^\top y$$
The columns of $X$ must be linearly independent (so that $X^\top X$ is invertible).
Gradient Descent
In practice, $(X^\top X)^{-1}$ may be expensive to compute, so we often resort to iterative methods (e.g. gradient descent).
Repeat
$$w \leftarrow w - \alpha\,\nabla L(w), \qquad \nabla L(w) = -2\,X^\top (y - Xw)$$
until convergence (i.e. until the loss function is a very small value).
Here $L(w)$ is the loss function (an estimate of the error in the predicted values) and $\alpha$ is the learning rate.
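A minimal gradient-descent sketch on synthetic data; the learning rate alpha and the stopping tolerance are assumed values that would need tuning in general:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -0.5]) + 0.05 * rng.normal(size=200)

w = np.zeros(2)  # initial guess
alpha = 0.1      # learning rate (assumed)
for _ in range(10_000):
    grad = -2 * X.T @ (y - X @ w) / len(y)  # mean gradient of the squared loss
    w_new = w - alpha * grad
    if np.linalg.norm(w_new - w) < 1e-10:   # convergence: updates have become tiny
        w = w_new
        break
    w = w_new
print(w)  # close to [1.5, -0.5]
```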
Gradient descent: illustrated
Credit: https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931
Is the fitted model correct?
We can always fit a linear model to any dataset, but how do we know if there is a real linear relationship?
Attribution: Anscombe's Quartet
Approach: Use a hypothesis test. The null hypothesis is that there is no linear relationship (slope $w = 0$).
Statistic: Some value which should be small under the null hypothesis, and large if the alternate hypothesis is true.
So what does it mean if $R^2$ has the following values?
• outside the range [0, 1]
• a high value ⇔ a good model? a low value ⇔ a bad model?
(Recall $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, where $\mathrm{TSS} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares.)
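A sketch of computing $R^2$ both by hand and with scikit-learn's r2_score, on hypothetical observed and predicted values:

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([3.0, 5.0, 7.0, 9.0])       # observed values (hypothetical)
y_hat = np.array([2.8, 5.3, 6.9, 9.1])   # predicted values (hypothetical)

rss = np.sum((y - y_hat) ** 2)           # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
print(1 - rss / tss)                     # R^2 by hand
print(r2_score(y, y_hat))                # same value via scikit-learn
```

Note that $R^2$ computed this way can indeed fall outside $[0, 1]$: it becomes negative when the model predicts worse than simply using the mean of $y$.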
Inference – Hypothesis testing
Testing for association
Correlation and Correlation Coefficient
A correlation reflects the dynamic quality of the relationship between variables. It allows us to understand how variables change with respect to one another. If variables change in the same direction, the correlation is called direct or positive correlation. If variables change in opposite directions, the correlation is called indirect or negative correlation.
A correlation coefficient is a numerical index that reflects the relationship (assumed linear) between 2 variables. It ranges from -1.00 to +1.00.
We will look at the Pearson product-moment correlation coefficient which applies to interval and ratio variables only.
(There exist other types of correlation coefficients for other levels of measurement).
Salkind, Neil, Statistics for People (Who Think) They Hate Statistics, 6th Ed., 2017
Hair, Lukas, Marketing Research, 4th Ed., 2014
Inference – Hypothesis testing
Correlation Coefficient
The correlation coefficient between two variables X and Y is given by:
$$r_{XY} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[\,n\sum X^2 - (\sum X)^2\,\right]\left[\,n\sum Y^2 - (\sum Y)^2\,\right]}}$$
n: size of sample
X: each observed value for the X variable
Y: each observed value for the Y variable
XY: product of each X value times its corresponding Y value
If $r_{XY}$ is positive, it means that X and Y change in the same direction (positive or direct correlation). Likewise, if $r_{XY}$ is negative, it means that X and Y change in opposite directions (negative or indirect correlation).
The absolute value of $r_{XY}$ indicates the strength of the correlation.
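A sketch comparing the textbook formula above with scipy.stats.pearsonr on made-up data; the two computations should agree:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Textbook formula for the Pearson product-moment correlation coefficient
num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2) * (n * np.sum(y**2) - np.sum(y)**2))
print(num / den)

# Same coefficient (plus a p-value) from SciPy
r, p = stats.pearsonr(x, y)
print(r, p)
```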
Inference – Hypothesis testing
Testing for association
Correlation Coefficient – rule of thumb for interpretation
Use these values as guidelines to help get a general idea of the strength of correlation between 2 variables (note: these are guidelines only).

Size of correlation coefficient    General interpretation
0.8 to 1.0                         Very strong relationship
0.6 to 0.8                         Strong relationship
0.4 to 0.6                         Moderate relationship
0.2 to 0.4                         Weak relationship
0.0 to 0.2                         Weak or no relationship
Inference – Hypothesis testing
Testing for association
Correlation and Coefficient of determination
The coefficient of determination is the percentage of variance in one variable accounted for by the variance in the other variable. It is given by the square of the correlation coefficient, $r^2$.
E.g. if the correlation between GPA and study time is 0.7, then $r^2 = 0.7^2 = 0.49$. This means that 49% of the variance in GPA can be explained by the variance in study time.
Inference – Hypothesis testing
Testing for association
Correlation Coefficient – Considerations
The Pearson product moment correlation coefficient is based on several assumptions:
Variables under investigation are measured using interval or ratio scaled measures
Nature of relationship between variables under investigation is linear
The variables under investigation come from a bivariate normally distributed population (i.e. observations with a given value of one variable have values of the second variable that are normally distributed).
Statistical Significance
A correlation coefficient has to be statistically significant to have meaning.
Response vs predictor variable
The correlation coefficient does not distinguish between response or predictor variable. It is up to the analyst/researcher to use logic to specify which is which.
Inference – Hypothesis testing
Testing for association
Correlation Coefficient – Statistical Significance
We are normally interested in drawing inferences about the population from the sample. Hence we want to draw conclusions about the population correlation coefficient $\rho_{XY}$ from our sample data. A statistical significance test is therefore needed to do this.
The null hypothesis is that there is no correlation between variables X and Y in the population ($H_0$: $\rho_{XY} = 0$).
The corresponding research hypothesis is that there is a correlation between X and Y (two-tailed test). If we have relevant information to do so, we could also hypothesize that there is a positive correlation or a negative correlation between X and Y (one-tailed tests).
A table of critical values for r can be used to determine significance.
you can find one at: https://www.mne.psu.edu/cimbala/me345/Exams/Critical_values_linear_correclation.pdf
The degrees of freedom (df) are given by $n - 2$, where n is the sample size.
A t-test or ANOVA (F-test) can also be used to determine statistical significance (the values for critical r above are derived from those). We won’t go into details here but you can read more at these two references:
https://onlinecourses.science.psu.edu/stat501/node/259/
https://web.ma.utexas.edu/users/mks/M358KInstr/Ch27SuppFandCorr.pdf
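As a sketch, the t-test mentioned above can be carried out directly: under $H_0$ the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ follows a t distribution with $n - 2$ degrees of freedom. The values r = 0.73 and n = 20 anticipate the height/self-esteem example later in this deck:

```python
import numpy as np
from scipy import stats

r, n = 0.73, 20                        # sample correlation and sample size
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t), df=n - 2)   # two-tailed p-value
print(t, p)                            # here p < 0.05, so we reject H0 (no correlation)
```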
Inference – Hypothesis testing
Testing for association
Correlation – Scatter Diagram
A scatter diagram (or scatter plot) can be used to visually represent the relationship between two variables and get a sense of their correlation (amount of covariation). The response variable should be plotted on the Y-axis and the predictor variable on the X-axis.
Source: https://www.latestquality.com/interpreting-a-scatter-plot/
Excel is pretty intuitive to use for doing scatter diagrams.
You can also check this resource:
https://www.westmont.edu/~phunter/ma5/excel/scatterplot.html
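Although the slide points at Excel, the same scatter diagram takes only a few lines of matplotlib; the data below are randomly generated placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
height = rng.uniform(55, 75, size=20)                  # predictor (x-axis)
esteem = 0.07 * height - 0.9 + rng.normal(0, 0.2, 20)  # response (y-axis)

plt.scatter(height, esteem)
plt.xlabel("Height (inches)")   # predictor on the X-axis
plt.ylabel("Self-esteem")       # response on the Y-axis
plt.title("Scatter diagram")
plt.show()
```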
Inference – Hypothesis testing
Testing for association and making predictions
Regression Analysis
A regression analysis involves a regression equation used to describe a relationship between variables and can be used to make predictions about future outcomes. Normally, we consider making predictions after we have established that the correlation between the variables is statistically significant.
The most common type of regression is linear regression (which assumes a linear relationship between variables).
Source: Hair, Lukas, Marketing Research, 4th Ed., 2014
Inference – Hypothesis testing
Testing for association and making predictions
Regression Analysis
A bivariate linear regression equation is given by:
$$Y' = bX + a \pm e$$
$Y'$: predicted value of Y based on a known value of X
$b$: regression coefficient (given by the slope of the line)
$a$: point at which the line crosses the y-axis (gives the value of $Y'$ at X = 0)
$e$: average error in prediction
The regression coefficient (b) is an indicator of the importance of a predictor variable in predicting the response variable. Larger coefficients are better predictors (likewise smaller coefficients are weaker predictors).
[nb: the regression coefficient is not the same as the correlation coefficient]
Salkind, Neil, Statistics for People (Who Think) They Hate Statistics, 6th Ed., 2017
Hair, Lukas, Marketing Research, 4th Ed., 2014
Inference – Hypothesis testing
Testing for association and making predictions
Correlation and Regression – Example
[Data table: Height (x, inches) and Self-Esteem (y) scores for the 20 individuals]
We are tasked to investigate the relationship between 2 variables: height (in inches) and self-esteem (given by a score on some test). Let’s say that we are hypothesizing that a person’s height could affect their self-esteem. We have data on 20 male individuals. How could we determine if there is a significant relationship between height and self-esteem? Are we able to predict self-esteem from a male person’s height?
Source: http://www.socialresearchmethods.net/kb/statcorr.php
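In Python (as this course expects), scipy.stats.linregress answers both questions in a single call, returning the slope, intercept, correlation coefficient, p-value and standard error. The arrays below are hypothetical stand-ins, since the slide's data table did not survive extraction:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the 20 (height, self-esteem) observations
rng = np.random.default_rng(4)
height = np.array([58, 60, 62, 63, 65, 66, 67, 68, 68, 69,
                   70, 70, 71, 72, 72, 73, 74, 75, 76, 78], dtype=float)
esteem = 0.07 * height - 0.9 + rng.normal(0, 0.15, 20)

res = stats.linregress(height, esteem)
print("slope b:      ", res.slope)
print("intercept a:  ", res.intercept)
print("correlation r:", res.rvalue)
print("p-value:      ", res.pvalue)   # significance of the correlation/slope
print("std error:    ", res.stderr)
```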
Inference – Hypothesis testing
Testing for association and making predictions
Correlation and Regression – Example (solution using Excel)
Inference – Hypothesis testing
Testing for association and making predictions
Correlation and Regression – Example (solution using Excel) [cont]
In Excel (if you are interested), check this reference for how to:
https://www.westmont.edu/~phunter/ma5/excel/regression.html
Using Python is expected in this course.
[Figure: "Self-esteem vs Height" scatter plot, Height (inches) on the x-axis and Self-esteem on the y-axis, with a regression line drawn in (by adding a trend line).]
Inference – Hypothesis testing
Testing for association and making predictions
Correlation and Regression – Example (solution using Excel)
• The Data Analysis ToolPak has a "correlation" tool which will calculate the correlation coefficient (see bottom left on the previous slide).
• The ToolPak also has a "regression" tool which will give the summary output on the right of the previous slide. The significance level was set at 95%.
→ Results for an ANOVA test are given. We can see that the result is significant given the F-statistic and associated p-value. Hence we can reject the null hypothesis (that there is no correlation between height and self-esteem). We can say, with a 95% level of confidence, that height and self-esteem are correlated and that the strength of association is strong (r = 0.73).
Inference – Hypothesis testing
Testing for association and making predictions
Correlation and Regression – Example (solution using Excel) [cont]
Prediction Model
→ The values of the regression coefficient and the y-intercept are also given. Thus we have a prediction model of:
$$Y' = 0.071X - 0.87 \pm 0.016$$
The average error in prediction is given by 0.016 (the calculated standard error).
We know from the coefficient of determination that 53% of the variance in self-esteem can be explained by the variance in height. We also need to look at the value of the regression coefficient (0.071). While it is statistically significant (as we found on the previous slide), it is relatively small, meaning that the response variable will not change very much for a given unit change in the predictor variable. Thus, we can say that a statistically significant relationship (with a strong association) exists in the studied population, but that the relationship is weak. [It is important to explain this when presenting your results.]
Are we done?
• If we are just trying to study and analyze the data, then probably yes
• But if we are trying to predict, then we need to (a minimal workflow is sketched below):
– check that the data sample is i.i.d.
– create training/test sets
– train the model on the training set and test on the test set
– use validation where appropriate
– avoid under/over-fitting
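A minimal sketch of that train/test workflow with scikit-learn, on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=100)

# Hold out a test set so the model is evaluated on data it never saw during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("train R^2:", model.score(X_train, y_train))
print("test R^2: ", model.score(X_test, y_test))  # a large gap suggests overfitting
```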
One common way to avoid overfitting is to regularize the fit, i.e. to minimize the squared loss plus a penalty (regularizer):
$$\sum_{i=1}^{n} (y_i - x_i^\top w)^2 + \underbrace{\lambda \sum_{j=1}^{d} w_j^2}_{\text{regularizer}}$$
$\lambda$ is called the ridge parameter; the $w_j$ are the ridge coefficients.
Sometimes called Ridge regression (or Tikhonov regularization). This particular regularization has a closed-form solution too:
$$\hat{w} = (X^\top X + \lambda I)^{-1} X^\top y$$
Attribution: Tikhonov
Note: there are other types of regularization, e.g. the L1 (lasso) penalty.
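A sketch of ridge regression on synthetic data, computing the closed form by hand and checking it against scikit-learn's Ridge; its alpha parameter plays the role of $\lambda$, and fit_intercept=False keeps the two computations comparable:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + 0.2 * rng.normal(size=50)

lam = 1.0

# Closed form by hand: w = (X^T X + lambda * I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w)

# scikit-learn's Ridge gives the same coefficients
model = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print(model.coef_)
```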
Polynomial fitting
The idea is to take our multidimensional linear model:
$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots$$
and build $x_1, x_2, x_3$, and so on, from our single-dimensional input $x$. That is, we let $x_j = f_j(x)$, where $f_j$ is some function that transforms our data.
For example, if $f_j(x) = x^j$, our model becomes a polynomial regression:
$$\hat{y} = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + \cdots$$
Which is the best one to use? Why?
• Still a linear model
• Linearity w.r.t. coefficients
• Feature engineering: take one-dimensional $x$ values and project them onto a higher dimension
• Allows a linear model to fit more complicated relationships between $x$ and $y$
Further reading on Feature Engineering ( -Lee)
Live coding (next)
Mindika will go through some live coding next to illustrate concepts relevant to implementing linear regression
Notebook available.