COMP9417 Machine Learning and Data Mining Term 2, 2022

Regression (1)
COMP9417 ML & DM Regression (1) Term 2, 2022 1 / 50
Acknowledgements

Material derived from slides for the book “Elements of Statistical Learning (2nd Ed.)” by T. Hastie, R. Tibshirani & J. Friedman, Springer (2009) http://statweb.stanford.edu/~tibs/ElemStatLearn/
Material derived from slides for the book “Machine Learning: A Probabilistic Perspective” by K. P. Murphy, MIT Press (2012) http://www.cs.ubc.ca/~murphyk/MLbook
Material derived from slides for the book “Machine Learning” by P. Flach, Cambridge University Press (2012) http://cs.bris.ac.uk/~flach/mlbook
Material derived from slides for the book “Bayesian Reasoning and Machine Learning” by D. Barber, Cambridge University Press (2012) http://www.cs.ucl.ac.uk/staff/d.barber/brml
Material derived from slides for the book “Machine Learning” by T. Mitchell, McGraw-Hill (1997) http://www-2.cs.cmu.edu/~tom/mlbook.html
Material derived from slides for the course “Machine Learning” by A. Srinivasan, BITS Pilani, Goa, India (2016)
This lecture will introduce you to machine learning approaches to the problem of numerical prediction. Following it you should be able to reproduce theoretical results, outline algorithmic techniques and describe practical applications for the topics:
• the supervised learning task of numeric prediction
• how linear regression solves the problem of numeric prediction
• fitting linear regression by least squares error criterion
• non-linear regression via linear-in-the-parameters models
• gradient descent to estimate parameters for regression
Introduction to Regression
Task: learn a model to predict CPU performance from a dataset of examples of 209 different computer configurations (available from https://archive.ics.uci.edu/ml/datasets/Computer+Hardware).
One possible model is a linear model, e.g.,
+ 0.049 MYCT
+ 0.015 MMIN
+ 0.006 MMAX
+ 0.630 CACH
– 0.270 CHMIN
+ 1.46 CHMAX
Regression
We will look at the simplest model for numerical prediction: a regression equation.
The outcome will be a linear sum of feature values with appropriate weights.
Note: the term regression is overloaded; it can refer to:
• the process of determining the weights for the regression equation, or
• the regression equation itself.
Linear Regression
Assumes: the expected value of the output given an input, $E[y \mid x]$, is linear.
Simplest case: $\mathrm{Out}(x) = bx$ for some unknown $b$.
Learning problem: given the data, estimate $b$ (i.e., find $\hat{b}$).
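As a minimal sketch of this learning problem, the following assumes the least squares criterion (introduced on the following slides) is used to choose $\hat{b}$. For the no-intercept model this gives $\hat{b} = \sum_i x_i y_i / \sum_i x_i^2$. The function name `estimate_slope` and the data are hypothetical.

```python
import numpy as np

def estimate_slope(x, y):
    """Least-squares estimate of b in the model Out(x) = b * x (no intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Minimising sum_i (y_i - b * x_i)^2 over b gives b = sum(x * y) / sum(x * x).
    return float(np.dot(x, y) / np.dot(x, x))

# Hypothetical data lying close to the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
b_hat = estimate_slope(x, y)
print(b_hat)
```

The estimate lands near 2, the slope used to make up the data.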
Linear Models
• Data has p numeric features, and we need numeric prediction ⇒ regression
• Linear models, i.e., the outcome is a linear combination of attributes: $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$
• Predicted value for the first training instance $x^{(1)}$ is: $b_0 x_0^{(1)} + b_1 x_1^{(1)} + b_2 x_2^{(1)} + \dots + b_p x_p^{(1)} = \sum_{i=0}^{p} b_i x_i^{(1)}$, where $x_0^{(1)} = 1$
• $p + 1$ weights or coefficients must be learned on training data
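The role of the constant feature $x_0 = 1$ can be sketched in a few lines: prepending a column of ones lets the intercept $b_0$ be treated like any other weight, so a prediction is a single dot product. The helper `predict` and the numbers below are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def predict(X, b):
    """Return sum_{i=0..p} b_i * x_i for each row of X, with x_0 fixed to 1."""
    X = np.asarray(X, dtype=float)
    # Prepend the constant feature x_0 = 1 so that b[0] acts as the intercept.
    X_aug = np.column_stack([np.ones(len(X)), X])
    return X_aug @ np.asarray(b, dtype=float)

# Two hypothetical instances with p = 2 features, and p + 1 = 3 weights.
X = [[2.0, 3.0], [0.0, 1.0]]
b = [1.0, 0.5, -2.0]
y_hat = predict(X, b)
print(y_hat)
```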
Minimizing Mean Squared Error
The difference between predicted and actual values is the error.
The $p + 1$ coefficients are chosen so that the mean of the squared errors over all instances in the training data is minimized. Mean Squared Error (MSE):
$\mathrm{MSE} = \frac{1}{n} \sum_{j=1}^{n} \left( y^{(j)} - \sum_{i=0}^{p} b_i x_i^{(j)} \right)^2$
Coefficients $b_i$ can be derived using calculus.
Can be done if there are more instances than attributes (roughly speaking).
Known as “Ordinary Least Squares” (OLS) regression: minimizing the mean of the squared vertical distances of data points to the estimated regression line.
Not the only approach (could use absolute error, etc.), but this is the most widely used . . .
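The calculus step can be sketched as follows. The matrix notation is an assumption introduced here: $X$ is the $n \times (p+1)$ matrix whose $j$-th row is $(x_0^{(j)}, \dots, x_p^{(j)})$, $y$ is the vector of targets, and $b$ the vector of coefficients.

```latex
\begin{align*}
\mathrm{MSE}(b) &= \frac{1}{n} \sum_{j=1}^{n} \Big( y^{(j)} - \sum_{i=0}^{p} b_i x_i^{(j)} \Big)^2 \\
\frac{\partial\, \mathrm{MSE}}{\partial b_k}
  &= -\frac{2}{n} \sum_{j=1}^{n} x_k^{(j)} \Big( y^{(j)} - \sum_{i=0}^{p} b_i x_i^{(j)} \Big) = 0,
  \qquad k = 0, \dots, p \\
\Longrightarrow \quad X^\top X \, b &= X^\top y \\
\hat{b} &= (X^\top X)^{-1} X^\top y
  \qquad \text{(when } X^\top X \text{ is invertible)}
\end{align*}
```

Invertibility of $X^\top X$ needs at least $p + 1$ instances with linearly independent feature columns, which is the precise form of "more instances than attributes".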
Multivariate (Multiple) Regression
Given 2 real-valued variables $X_1$, $X_2$, labelled with a real-valued variable $Y$, find the “plane of best fit” that captures the dependency of $Y$ on $X_1$, $X_2$.
Learning here is by minimizing MSE, i.e., the average of squared vertical distances of actual values of $Y$ from the learned function $\hat{Y} = \hat{f}(X)$.
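A plane of best fit can be sketched with NumPy's least-squares solver. The data here are synthetic and the generating coefficients (1, 2, -3) are assumptions chosen for illustration; `fit_plane` is a hypothetical helper name.

```python
import numpy as np

def fit_plane(X1, X2, Y):
    """Fit Y ~ b0 + b1*X1 + b2*X2 by least squares; return the coefficients and the MSE."""
    A = np.column_stack([np.ones(len(Y)), X1, X2])  # column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    mse = float(np.mean((Y - A @ coef) ** 2))       # average squared vertical distance
    return coef, mse

# Synthetic data: a known plane plus a little Gaussian noise.
rng = np.random.default_rng(0)
X1 = rng.uniform(0, 10, size=50)
X2 = rng.uniform(0, 10, size=50)
Y = 1.0 + 2.0 * X1 - 3.0 * X2 + rng.normal(0, 0.1, size=50)

coef, mse = fit_plane(X1, X2, Y)
print(coef, mse)
```

With low noise the recovered coefficients should sit close to the generating ones, and the MSE close to the noise variance.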
Step back: Statistical Techniques for Data Analysis
Probability vs Statistics: The Difference
• Probability: reasons from populations to samples
  • This is deductive reasoning, and is usually sound (in the logical sense of the word)
• Statistics: reasons from samples to populations
  • This is inductive reasoning, and is usually unsound (in the logical sense of the word)
Statistical Analyses
• Statistical analyses usually involve one of 3 things:
1 The study of populations;
2 The study of variation; and
3 Techniques for data abstraction and data reduction
• Statistical analysis is more than statistical computation:
1 What is the question to be answered?
2 Can it be quantitative (i.e., can we make measurements about it)?
3 How do we collect data?
4 What can the data tell us?
Where do the Data come from? (Sampling)
• For groups (populations) that are fairly homogeneous, we do not need to collect a lot of data. (We do not need to sip a cup of tea several times to decide that it is too hot.)
• For populations which have irregularities, we will need to either take measurements of the entire group, or find some way to get a good idea of the population without having to do so
• Sampling is a way to draw conclusions about the population without having to measure all of the population. The conclusions need not be completely accurate
• All this is possible if the sample closely resembles the population about which we are trying to draw some conclusions
What We Want From a Sampling Method
• No systematic bias, or at least no bias that we cannot account for in our calculations
• The chance of obtaining an unrepresentative sample can be calculated. (So, if this chance is high, we can choose not to draw any conclusions.)
• The chance of obtaining an unrepresentative sample decreases with the size of the sample
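The last point can be illustrated with a small simulation. As an assumption made only for this sketch, a sample counts as "unrepresentative" when its mean misses the population mean by more than a tolerance `tol`; the population itself is synthetic.

```python
import numpy as np

def unrepresentative_rate(population, n, tol, trials, rng):
    """Fraction of random samples of size n whose mean misses the population mean by more than tol."""
    mu = population.mean()
    misses = 0
    for _ in range(trials):
        sample = rng.choice(population, size=n, replace=False)
        if abs(sample.mean() - mu) > tol:
            misses += 1
    return misses / trials

rng = np.random.default_rng(1)
population = rng.normal(loc=50.0, scale=10.0, size=10_000)  # hypothetical population

# Estimated chance of an unrepresentative sample for increasing sample sizes.
rates = {n: unrepresentative_rate(population, n, tol=2.0, trials=1000, rng=rng)
         for n in (10, 100, 1000)}
print(rates)
```

The estimated rate should fall as the sample size grows, which is the behaviour a good sampling method is expected to have.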
For the class of numeric representations, machine learning is viewed as “searching” a space of functions represented as mathematical models (linear equations, neural nets, ...).
Which model is best for some sample(s) of data?
Which model is best for generalising to the population?
Methods to predict a numeric output from statistics and machine learning:
• linear regression (statistics): determining the “line of best fit” using the least squares criterion
• linear models (machine learning): learning a predictive model from data under the assumption of a linear relationship between predictor and target variables
Very widely-used, many applications
Ideas that are generalised in Artificial Neural Networks (Deep Learning) and other types of learning . . .
Regression as a term occurs in many areas of machine learning:
• linear regression: the classic
• non-linear regression: by adding non-linear basis functions
• multi-layer neural networks (machine learning): learning non-linear predictors via hidden nodes between input and output
• regression trees (statistics / machine learning): a tree where each leaf predicts a numeric quantity
• local (nearest-neighbour) regression
The inductive learning hypothesis
Any estimate found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples (after Mitchell, 1997).
Here an estimate forms part of a model, such as regression coefficients or other model parameters.
Estimation
Estimation from a Sample
• Estimating some aspect of the population using a sample is a common task. Along with the estimate, we also want to have some idea of the accuracy of the estimate (usually expressed in terms of confidence limits)
• Some measures calculated from the sample are very good estimates of corresponding population values. For example, the sample mean m is a very good estimate of the population mean μ. But this is not always the case. For example, the range of a sample usually under-estimates the range of the population
• We will have to clarify what is meant by a “good estimate”. One meaning is that an estimator is correct on average. For example, on average, the mean of a sample is a good estimator of the mean of the population
• For example, when a number of samples are drawn and the mean of each is found, the average of these means converges to the population mean as more samples are drawn; in expectation, the sample mean equals the population mean
• Such an estimator is said to be statistically unbiased
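The contrast between the sample mean (unbiased) and the sample range (which tends to under-estimate) can be checked by simulation. This is a sketch on a synthetic population; the population, the sample size of 20 and the number of repetitions are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
population = rng.uniform(0.0, 100.0, size=10_000)  # hypothetical population
pop_mean = population.mean()
pop_range = population.max() - population.min()

# Draw 5000 samples of size 20 and record each sample's mean and range.
samples = rng.choice(population, size=(5000, 20), replace=True)
sample_means = samples.mean(axis=1)
sample_ranges = samples.max(axis=1) - samples.min(axis=1)

avg_of_means = sample_means.mean()    # expected to be close to pop_mean
avg_of_ranges = sample_ranges.mean()  # expected to fall short of pop_range
print(pop_mean, avg_of_means)
print(pop_range, avg_of_ranges)
```

A sample can never have a wider range than the population it is drawn from, so averaging sample ranges cannot correct the shortfall, whereas errors in sample means cancel out on average.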
Sample Estimates of the Mean and the Spread I
Mean. This is calculated as follows.
• Find the total T of n observations. Estimate the (arithmetic) mean by m = T/n.
• This works very well when the data follow a symmetric bell-shaped frequency distribution (of the kind modelled by “normal” distribution)
• A simple mathematical expression of this is $m = \frac{1}{n} \sum_i x_i$, where the observations are $x_1, x_2, \dots, x_n$
• If we can group the data so that the observation $x_1$ occurs $f_1$ times, $x_2$ occurs $f_2$ times and so on, then the mean is calculated as $m = \frac{1}{\sum_i f_i} \sum_i x_i f_i$
Sample Estimates of the Mean and the Spread II
• If, instead of frequencies, you had relative frequencies (i.e. instead of $f_i$ you had $p_i = f_i / n$), then the mean is simply the observations weighted by relative frequency. That is, $m = \sum_i x_i p_i$
• We want to connect this up to computing the mean value of observations modelled by some theoretical probability distribution function. That is, we want a similar counting method for calculating the mean of random variables modelled using some known distribution
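The three ways of writing the sample mean (direct, grouped by frequency, and weighted by relative frequency) give the same number. A short check on made-up observations:

```python
from collections import Counter

observations = [2, 3, 3, 5, 5, 5, 7]  # hypothetical data
n = len(observations)

# Direct mean: m = T / n, where T is the total of the observations.
mean_direct = sum(observations) / n

# Grouped mean: m = sum(x_i * f_i) / sum(f_i), with f_i the count of value x_i.
freq = Counter(observations)
mean_grouped = sum(x * f for x, f in freq.items()) / sum(freq.values())

# Relative-frequency mean: m = sum(x_i * p_i), with p_i = f_i / n.
mean_relative = sum(x * (f / n) for x, f in freq.items())

print(mean_direct, mean_grouped, mean_relative)
```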
Sample Estimates of the Mean and the Spread III
• Strictly speaking, this is the mean value of the values of the random variable function. But this is a bit cumbersome, so we will just say the “mean value of the r.v.” For discrete r.v.’s this is: $E(X) = \sum_i x_i \, p(X = x_i)$
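For a concrete discrete r.v., take a fair six-sided die (an example chosen here, not from the slides). The sketch below evaluates $E(X) = \sum_i x_i \, p(X = x_i)$ with exact fractions; `expected_value` is a hypothetical helper.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum_i x_i * p(X = x_i) for a discrete r.v. given as {value: probability}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
e_die = expected_value(die)
print(e_die)
```

This is the relative-frequency mean with the observed proportions $p_i$ replaced by the probabilities the distribution assigns.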
Variance. This is calculated as follows:
• Calculate the total T and the sum of squares of n observations. The estimate of the standard deviation is $s = \sqrt{\frac{1}{n-1} \sum_i (x_i - m)^2}$
• Again, this is a very good estimate when the data are modelled by a normal distribution
Sample Estimates of the Mean and the Spread IV
• For grouped data, this is modified to $s = \sqrt{\frac{1}{n-1} \sum_i (x_i - m)^2 f_i}$
• Again, we have a similar formula in terms of expected values, for the scatter (spread) of values of a r.v. X around a mean value E(X): $\mathrm{Var}(X) = E((X - E(X))^2) = E(X^2) - [E(X)]^2$
• You can remember this as “the mean of the squares minus the square of the mean”
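Both spread formulas can be checked numerically. The sketch below computes the sample standard deviation with the $n - 1$ divisor on made-up observations, then evaluates “the mean of the squares minus the square of the mean” for a fair die (an illustrative choice of distribution).

```python
import math
import statistics
from fractions import Fraction

def sample_std(xs):
    """s = sqrt( sum_i (x_i - m)^2 / (n - 1) )"""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
s = sample_std(data)
print(s, statistics.stdev(data))  # the standard library uses the same n - 1 divisor

# Var(X) = E(X^2) - [E(X)]^2 for a fair six-sided die, using exact fractions.
p = Fraction(1, 6)
e_x = sum(x * p for x in range(1, 7))
e_x2 = sum(x * x * p for x in range(1, 7))
var_die = e_x2 - e_x ** 2
print(var_die)
```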
Covariance and Correlation
Correlation
• The correlation coefficient is a number between -1 and +1 that indicates whether a pair of variables X and Y are associated or not, and whether the scatter in the association is high or low
• A value close to +1 indicates that high values of X are associated with high values of Y and low values of X are associated with low values of Y, and that scatter is low
• A value near 0 indicates that there is no particular association and that there is a large scatter associated with the values
• A value close to -1 suggests an inverse association between X and Y
• Only appropriate when X and Y are roughly linearly associated (doesn’t work well when the association is curved)
• Correlation between X and Y as a population quantity is: $r = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)}\,\sqrt{\mathrm{var}(Y)}}$
• Sample variance for X: $S_{XX} = \sum_i (x_i - \bar{x})^2$
• Sample variance for Y: $S_{YY} = \sum_i (y_i - \bar{y})^2$
• Sample covariance for X, Y: $S_{XY} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$
• The formula for computing correlation from a sample of data on X and Y is: $\hat{r} = \frac{S_{XY}}{\sqrt{S_{XX}}\,\sqrt{S_{YY}}}$
• This is sometimes also called Pearson’s correlation coefficient
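The sample formula translates directly into code. This sketch computes $\hat{r}$ from $S_{XX}$, $S_{YY}$ and $S_{XY}$ on made-up data and compares it with NumPy's `np.corrcoef`; the function name `pearson_r` is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """r_hat = S_XY / (sqrt(S_XX) * sqrt(S_YY)) computed from deviations about the means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    s_xx = np.sum(dx ** 2)
    s_yy = np.sum(dy ** 2)
    s_xy = np.sum(dx * dy)
    return float(s_xy / np.sqrt(s_xx * s_yy))

# Hypothetical, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r_hat = pearson_r(x, y)
print(r_hat, np.corrcoef(x, y)[0, 1])
```

Any common normalising factor applied to all three sums (for example $1/(n-1)$) cancels in the ratio, so it does not affect $\hat{r}$.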
• What does “covariance” intuitively mean? Consider
1 Case 1: $x_i > \bar{x}$, $y_i > \bar{y}$
2 Case 2: $x_i < \bar{x}$, $y_i < \bar{y}$
3 Case 3: $x_i < \bar{x}$, $y_i > \bar{y}$
4 Case 4: $x_i > \bar{x}$, $y_i < \bar{y}$
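In cases 1 and 2 the product $(x_i - \bar{x})(y_i - \bar{y})$ is positive, and in cases 3 and 4 it is negative, so the covariance sum is large and positive when most points fall in the first two cases. The sketch below splits $S_{XY}$ into the contribution of each case on made-up data; `covariance_by_case` is a hypothetical helper.

```python
import numpy as np

def covariance_by_case(x, y):
    """Split S_XY = sum_i (x_i - x_bar)(y_i - y_bar) into contributions from the four cases."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    prod = dx * dy
    cases = {
        "case1": float(prod[(dx > 0) & (dy > 0)].sum()),  # both above their means
        "case2": float(prod[(dx < 0) & (dy < 0)].sum()),  # both below their means
        "case3": float(prod[(dx < 0) & (dy > 0)].sum()),  # x below, y above
        "case4": float(prod[(dx > 0) & (dy < 0)].sum()),  # x above, y below
    }
    return cases, float(prod.sum())

# Hypothetical data with a mostly increasing trend.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 5, 3, 6, 4]
cases, s_xy = covariance_by_case(x, y)
print(cases, s_xy)
```

Points lying exactly on a mean contribute zero, so the four case totals always add up to $S_{XY}$.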
