Useful Formulas
Linear Regression
• Pseudoinverse of matrix Φ: $\Phi^p = (\Phi^T \Phi)^{-1} \Phi^T$

Questions
1. What is the role of basis functions in linear regression?
Basis functions allow us to represent a non-linear function of the input variables with a function which is linear in the weights.
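As a minimal illustration (not from the original notes), here is a NumPy sketch using a quadratic basis as an arbitrary example choice: y is a non-linear function of x, but computing it is a plain matrix-vector product, i.e. linear in the weights.

```python
import numpy as np

# Sketch: quadratic basis functions phi(x) = [1, x, x^2].
# y(x, w) = w0 + w1*x + w2*x^2 is non-linear in x but linear in w.
def design_matrix(x):
    """Evaluate the basis functions [1, x, x^2] on every input point."""
    return np.column_stack([np.ones_like(x), x, x**2])

x = np.array([-1.0, 0.0, 1.0, 2.0])
Phi = design_matrix(x)          # N x 3 matrix of basis evaluations
w = np.array([0.5, 1.0, 2.0])   # example weights (arbitrary values)
y = Phi @ w                     # linear in the weights w
```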
2. Can an algorithm doing linear regression learn only linear functions of the inputs?
No, the learned function is linear in the weights but does not need to be linear in the input variables.
3. When can we solve the linear regression problem exactly (with 0 error)? Why is it not a good idea to do so?
When the number of parameters is the same as the number of points in the data set. It is not a good idea because the exact fit also reproduces the noise in the data, so the model overfits and generalises poorly; normally we want far fewer parameters than data points.
4. What is the error we want to minimize when doing linear regression?

The sum-of-squares error:

$$E(X) = \frac{1}{2} \sum_{i=1}^{N} \left(\Phi_i^T w - t_i\right)^2$$

where X is the dataset, $\Phi_i$ is the vector of the basis functions evaluated on point i, $t_i$ is the desired value for point i (the target), and w is the vector of weights to optimise.
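A direct translation of this formula into NumPy (a sketch; the function name is mine, and Phi, w, t are assumed to be the design matrix, weight vector and targets defined above):

```python
import numpy as np

def sum_of_squares_error(Phi, w, t):
    """E(X) = 1/2 * sum_i (Phi_i . w - t_i)^2."""
    residuals = Phi @ w - t
    return 0.5 * np.sum(residuals**2)
```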
5. What is the least-squares solution? How is it affected by outliers?
The least-squares solution uses the pseudo-inverse of the matrix Φ of the basis functions evaluated on the data set, and is defined as $w = (\Phi^T \Phi)^{-1} \Phi^T t$, where $\Phi^p = (\Phi^T \Phi)^{-1} \Phi^T$ is the pseudo-inverse of Φ (the full derivation is in the slides). Since the least-squares solution minimises the average error over all the points, outliers have a strong effect: they pull the average, and hence the fitted function, towards their values.
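A NumPy sketch of the same computation (the explicit inverse mirrors the formula above; in practice np.linalg.pinv or np.linalg.lstsq is numerically safer):

```python
import numpy as np

def least_squares(Phi, t):
    """w = (Phi^T Phi)^-1 Phi^T t, the least-squares solution."""
    Phi_p = np.linalg.inv(Phi.T @ Phi) @ Phi.T  # pseudo-inverse of Phi
    return Phi_p @ t

# Numerically safer equivalents:
#   np.linalg.pinv(Phi) @ t
#   np.linalg.lstsq(Phi, t, rcond=None)[0]
```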
6. How can we find the least-squares solution when there are too many points to compute the pseudoinverse efficiently?
We can perform stochastic gradient descent on the error point by point, updating the current weight vector according to:

$$w_{k+1} = w_k - \eta \nabla E_i = w_k - \eta \left(\Phi_i^T w_k - t_i\right) \Phi_i$$

(full derivation in the slides).
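A minimal sketch of this update rule in NumPy (the learning rate eta, the number of epochs and the random visiting order are illustrative choices, not values from the notes):

```python
import numpy as np

def sgd(Phi, t, eta=0.01, epochs=100, seed=0):
    """Least squares by stochastic gradient descent, one point at a time:
    w_{k+1} = w_k - eta * (Phi_i . w_k - t_i) * Phi_i."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(t)):   # visit points in random order
            w = w - eta * (Phi[i] @ w - t[i]) * Phi[i]
    return w
```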
7. What are the bias and the variance for a supervised learning problem?
The bias is the component of the error due to the average model (over all possible training sets) converging towards something away from the desired value. The variance captures the dependency of the model on the particular data set: with different training sets we get different regressed functions (or classifiers), and the spread among these functions is the variance.
8. What is the link between the error on the validation set increasing with training, and the bias/variance decomposition?
The bias/variance decomposition shows us that the expected error has three components: the bias, the variance, and the noise in the data. The noise is an intrinsic property of the data set and training does nothing about it. On the other hand, training decreases the bias, making the average estimate increasingly correct. The total expected error, however, does not change, therefore the reduction of the bias has to happen at the expense of something else: the variance. Therefore, the model becomes increasingly dependent on the particular data points used for training (which increases the variance) and loses generalization.
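For reference, the decomposition in its standard textbook form, with $h(x)$ the true regression function, $\mathcal{D}$ the training set, and $\sigma^2$ the noise variance:

$$\mathbb{E}\left[(y(x; \mathcal{D}) - t)^2\right] = \underbrace{\left(\mathbb{E}_{\mathcal{D}}[y(x; \mathcal{D})] - h(x)\right)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}_{\mathcal{D}}\left[\left(y(x; \mathcal{D}) - \mathbb{E}_{\mathcal{D}}[y(x; \mathcal{D})]\right)^2\right]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{noise}}$$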
9. Given the dataset: <-1, -0.5>, <0, 1.1>, <1, 3.8>, <2, 8.8>, find the least-squares solution for the function: $y(x, w) = w_0 + w_1 x$
First, we need to create the matrix of the coefficients for the linear system. The first column of the matrix is the value of the first basis function on the points. The first basis function, that is, what is multiplied by $w_0$, is the constant 1. The second basis function, multiplied by $w_1$, is the function x:

$$\Phi = \begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$$

Then, we need to compute the pseudo-inverse of Φ:

$$\Phi^T \Phi = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ 2 & 6 \end{bmatrix}$$

$$(\Phi^T \Phi)^{-1} = \begin{bmatrix} 0.3 & -0.1 \\ -0.1 & 0.2 \end{bmatrix}$$

and lastly:

$$\Phi^p = (\Phi^T \Phi)^{-1} \Phi^T = \begin{bmatrix} 0.4 & 0.3 & 0.2 & 0.1 \\ -0.3 & -0.1 & 0.1 & 0.3 \end{bmatrix}$$

We can now use the pseudo-inverse to compute the optimal vector of weights:

$$w = \Phi^p t = \begin{bmatrix} 0.4 & 0.3 & 0.2 & 0.1 \\ -0.3 & -0.1 & 0.1 & 0.3 \end{bmatrix} \begin{bmatrix} -0.5 \\ 1.1 \\ 3.8 \\ 8.8 \end{bmatrix} = \begin{bmatrix} 1.77 \\ 3.06 \end{bmatrix}$$
where the vector t is the vector of the values of the function over the points in the dataset (the last element of each vector in the dataset).
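A quick NumPy check of this worked example (a sketch; it reproduces the numbers above up to rounding):

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0])
t = np.array([-0.5, 1.1, 3.8, 8.8])
Phi = np.column_stack([np.ones_like(x), x])   # basis functions: 1 and x
Phi_p = np.linalg.inv(Phi.T @ Phi) @ Phi.T    # pseudo-inverse of Phi
w = Phi_p @ t
print(w)  # [1.77 3.06]
```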
10. Given the dataset: <-1, 0.78>, <0, 1>, <1, 1.22>, <2, 1.52>, find the least-squares solution for the function: $y(x, w) = w_0 + w_1 e^{(x+1)^2/20}$
All the steps are illustrated before; here I will just compute the final vectors for your reference:

$$w = \Phi^p t = \begin{bmatrix} 1.52 & 1.22 & 0.19 & -1.93 \\ -1.05 & -0.80 & 0.05 & 1.81 \end{bmatrix} \begin{bmatrix} 0.78 \\ 1 \\ 1.22 \\ 1.52 \end{bmatrix} = \begin{bmatrix} -0.30 \\ 1.18 \end{bmatrix}$$
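The same NumPy check for this exercise (a sketch; the result matches up to rounding):

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0])
t = np.array([0.78, 1.0, 1.22, 1.52])
Phi = np.column_stack([np.ones_like(x), np.exp((x + 1)**2 / 20)])
w = np.linalg.pinv(Phi) @ t   # pinv is the numerically safe pseudo-inverse
print(w)  # approximately [-0.30 1.18]
```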
11. Given the dataset: <-1, 1.6>, <0, 0.95>, <1, 1.2>, <2, 1.9>, find the least-squares solution for the function: $y(x, w) = w_0 + \dfrac{w_1}{1 + e^{-(x+1)}}$
$$w = \Phi^p t = \begin{bmatrix} 1.96 & 0.48 & -0.49 & -0.94 \\ -2.23 & -0.29 & 0.97 & 1.56 \end{bmatrix} \begin{bmatrix} 1.6 \\ 0.95 \\ 1.2 \\ 1.9 \end{bmatrix} = \begin{bmatrix} 1.22 \\ 0.28 \end{bmatrix}$$
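And the check for the sigmoid basis (a sketch; the result matches up to rounding):

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0])
t = np.array([1.6, 0.95, 1.2, 1.9])
Phi = np.column_stack([np.ones_like(x), 1.0 / (1.0 + np.exp(-(x + 1)))])
w = np.linalg.pinv(Phi) @ t
print(w)  # approximately [1.22 0.28]
```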