Unit 1.5 Linear Least Squares
One of the main uses of linear least squares is statistical modeling. The basic idea is to fit a model to observed data, under the assumption that the quantity of interest depends linearly on a set of contributing factors.
For example, we may wish to predict the maple syrup yield of a tree based on certain factors (https://peerj.com/articles/428/).
For simplicity, we will take the mean summer temperature and mean winter temperature from the year before (see the reference for a more detailed model).
1. $b$ is the yield for a given tree
2. $a_1$ is the mean temperature from the preceding summer for that tree
3. $a_2$ is the mean temperature from the immediately preceding winter
Then a linear model would have the form

$$b = x_1 a_1 + x_2 a_2,$$

where the $x_i$ are the unknown weights for each purported contributing factor.
Once we use more than 2 records from other trees in previous years to solve for the weights $x_1, x_2$, we have an overdetermined linear system. If we set $a_{ij}$ to be the $j$th element from the record of tree $i$ out of $m$ trees, then we get the overdetermined linear system

$$Ax = b,$$

where, in our case, $A$ has the form

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \vdots & \vdots \\ a_{m1} & a_{m2} \end{pmatrix}.$$
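To make this concrete, here is a small Julia sketch that assembles $A$ and $b$ from hypothetical records (the temperatures and yields below are invented for illustration, not taken from the referenced study):

In [ ]: # hypothetical records for m = 5 trees
        summer = [18.2, 19.1, 17.6, 18.9, 18.4]   # mean summer temperatures
        winter = [-6.3, -5.8, -7.1, -6.0, -6.5]   # mean winter temperatures
        b = [41.0, 44.5, 38.2, 43.1, 40.7]        # yields, the right-hand side
        A = [summer winter]                       # 5×2 matrix, one row per tree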
This system generally has no exact solution. Instead we want to find $x_1$ and $x_2$ that have the smallest error with the data set in the sense of the 2-norm, hence the name "least squares". For a general problem with $n$ parameters to fit, instead of just 2, we get the linear system

$$Ax = b, \qquad A \in \mathbb{R}^{m \times n}, \quad m > n,$$

and we seek the $x$ that minimizes $\|Ax - b\|_2$.
To solve this system we will need to factor $A$.
We have seen two choices, both popular in practice: the QR decomposition and the SVD. The QR decomposition is the standard choice when the problem is not ill-conditioned. For ill-conditioned problems we can use the SVD.
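A rough way to gauge which regime you are in is the condition number of $A$; here is a minimal sketch (the random matrix is just a placeholder):

In [ ]: using LinearAlgebra
        A = rand(7,4)
        cond(A)   # large values indicate an ill-conditioned least squares problem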
QR Decomposition for Least Squares
For the least squares problem, we want $\|Ax - b\|_2$ to be as small as possible. We write $A$ as $A = QR$, with $Q \in \mathbb{R}^{m \times n}$ having orthonormal columns and $R \in \mathbb{R}^{n \times n}$ upper triangular.
Then, following Demmel 1997,

$$\|Ax - b\|_2 = \|QRx - b\|_2.$$

To work with this, we use the fact, not proved here, that $Q^T Q = I$ (but $QQ^T$ is not necessarily the identity).
We then consider

$$Ax - b = Q(Rx - Q^T b) + (QQ^T - I)b,$$

with $u = Q(Rx - Q^T b)$ and $v = (QQ^T - I)b$. Since $Q^T(QQ^T - I) = Q^T - Q^T = 0$, we have $u^T v = 0$.
Thus $u$ and $v$ are orthogonal.
Recall the Pythagorean identity $\|u + v\|_2^2 = \|u\|_2^2 + \|v\|_2^2$, which holds when $u$ and $v$ are orthogonal. Using another property of matrices with orthonormal columns, that $\|Qw\|_2 = \|w\|_2$, we get

$$\|Ax - b\|_2^2 = \|Rx - Q^T b\|_2^2 + \|(QQ^T - I)b\|_2^2.$$

Since the last term doesn't depend on $x$, it doesn't affect the minimization. Thus $\|Ax - b\|_2$ is minimized when $Rx - Q^T b$ is zero. This means $x = R^{-1} Q^T b$.
The least squares solution to $Ax = b$ is given by $x = R^{-1} Q^T b$.
In [ ]: using LinearAlgebra

In [ ]: A = rand(7,4)

In [ ]: b = rand(7,1)

In [ ]: F = qr(A);

In [ ]: Q = Matrix(F.Q)        # extract the thin 7×4 Q (F.Q'*b alone would have length 7)
        x = inv(F.R) * Q' * b  # x = R^{-1} Q^T b
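As a quick numerical sanity check of the derivation above, we can verify that the two terms of the orthogonal decomposition add up to the full squared residual (this check is only an illustration, using the variables from the cells above):

In [ ]: # ‖Ax - b‖² == ‖Rx - Q'b‖² + ‖(QQ' - I)b‖², up to roundoff
        lhs = norm(A*x - b)^2
        rhs = norm(F.R*x - Q'*b)^2 + norm((Q*Q' - I)*b)^2
        lhs ≈ rhs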
In Julia, the backslash operator when applied to a non-square matrix returns the least squares solution:
In [ ]: x = A\b
This is a nice feature, but you need to be mindful that it is not an "exact" answer in the sense of what we would get from using a nonsingular square matrix: it is the minimizer of the residual $\|Ax - b\|_2$.
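To see the difference, check the residual; for random overdetermined data it is nonzero, because $x$ only minimizes $\|Ax - b\|_2$:

In [ ]: norm(A*x - b)   # nonzero in general: no exact solution exists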
SVD for Least Squares
A more complicated treatment tells us that we can also use the SVD to find the least squares solution to $Ax = b$.
If $A \in \mathbb{R}^{m \times n}$, $m > n$, has full rank (i.e., has linearly independent column vectors) and (thin) SVD given by $A = U \Sigma V^T$, then:
The least squares solution to $Ax = b$ is given by $x = V \Sigma^{-1} U^T b$.
In [ ]: U, S, V = svd(A);

In [ ]: U

In [ ]: diagm(S)   # S is returned as a vector of singular values; diagm(S) builds Σ

In [ ]: V

In [ ]: x = V * inv(diagm(S)) * U' * b
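As a sanity check, the SVD-based solution should agree with Julia's backslash (and with the QR solution) up to roundoff:

In [ ]: norm(x - (A \ b))   # ≈ 0: both give the same least squares minimizer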