STAT 513/413: Lecture 23
Iteratively Reweighted Least Squares
(A dubious relative)
Recall: Cauchy regression
y_i = x_i^T β + σ ε_i with ε_i Cauchy errors (and σ known)

The EM-algorithm: select β_1
Calculate weights
    z_i = 1 / (1 + (y_i − x_i^T β_1)^2 / σ^2)
Calculate β_2 as a weighted least squares estimate, solving
    Σ_{i=1}^n z_i (y_i − x_i^T β)^2 → min_β!
and repeat...

So it works via EM... but after all, it is nothing but...
IRLS: Iteratively Reweighted Least Squares
(Some call it IWLS: Iteratively Weighted Least Squares)
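To make the recipe concrete, here is a minimal R sketch of one such reweighting step on simulated data (the variable names and the simulated design are mine, not from the lecture):

set.seed(1)
n <- 100
x <- cbind(1, runif(n, 0, 10))                      # design matrix with an intercept column
beta <- c(2, 0.5); sigma <- 1
y <- drop(x %*% beta) + sigma * rcauchy(n)          # responses with Cauchy errors

beta_1 <- coef(lm(y ~ x - 1))                       # starting value: ordinary least squares
z <- 1 / (1 + (y - drop(x %*% beta_1))^2 / sigma^2) # the EM weights z_i
beta_2 <- coef(lm(y ~ x - 1, weights = z))          # weighted least-squares update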
Weighted least squares

How to solve this?
    Σ_{i=1}^n w_i (y_i − x_i^T β)^2 → min_β!

As in the standard case, just take derivatives:
    Σ_{i=1}^n 2 w_i (−x_i)(y_i − x_i^T β) = 0,  that is  Σ_{i=1}^n w_i x_i x_i^T β = Σ_{i=1}^n w_i x_i y_i

And when you think about it, you will see the matrix form
    X^T W X β = X^T W y
where, as usual, X is an n × p matrix and W is an n × n matrix with the w_i's on the diagonal

Also, the R function lm() has a parameter weights
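As a sanity check, a short R sketch (with made-up data and weights) that solves the normal equations X^T W X β = X^T W y directly and compares the result with lm(..., weights = ):

set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(1, 2)) + rnorm(n)
w <- runif(n)                                         # arbitrary positive weights

beta_ne <- solve(t(X) %*% (w * X), t(X) %*% (w * y))  # X'WX beta = X'Wy, with W = diag(w)
beta_lm <- coef(lm(y ~ X - 1, weights = w))
all.equal(drop(beta_ne), unname(beta_lm))             # TRUE (up to numerical error)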
Now something else

No weights now – but no squares either: we may want to solve
    Σ_{i=1}^n ρ(y_i − x_i^T β) → min_β!

To obtain the minimum, set the derivative to zero:
    Σ_{i=1}^n −x_i ψ(y_i − x_i^T β) = 0   where ψ(u) = ρ′(u); let us say that ψ(u) = u κ(u)

Then it is
    Σ_{i=1}^n −x_i (y_i − x_i^T β) κ(y_i − x_i^T β) = 0
which is
    Σ_{i=1}^n κ(y_i − x_i^T β) x_i x_i^T β = Σ_{i=1}^n κ(y_i − x_i^T β) x_i y_i

Recognizing that? It is like w_i = κ(y_i − x_i^T β) – the only problem is
that w_i depends on exactly the same β we would like to find...
... but maybe we can trick it iteratively
IRLS

Select β_1 (typically: the result of the least-squares fit)
Calculate weights: w_i = κ(y_i − x_i^T β_1)
    (note: the arguments of κ are residuals of the β_1 fit)
Obtain β_2 as the least squares fit with weights w_i
And repeat

For the Cauchy: ρ(u) = log(1 + u^2). Thus ψ(u) = 2u/(1 + u^2) = u · 2/(1 + u^2)
Well, for the EM-algorithm it was 1/(1 + u^2)...
Does the factor of 2 really matter? Isn't the least-squares fit the same
if weighted by w_i or by 2w_i?
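Putting the pieces together, here is a hedged sketch of the full IRLS loop for Cauchy regression in R (the function name irls_cauchy, the tolerance, and the iteration cap are arbitrary choices; x is assumed to be the design matrix, intercept column included):

irls_cauchy <- function(x, y, sigma = 1, tol = 1e-8, maxit = 100) {
  beta <- coef(lm(y ~ x - 1))                  # start from the ordinary least-squares fit
  for (it in seq_len(maxit)) {
    r <- (y - drop(x %*% beta)) / sigma        # scaled residuals of the current fit
    w <- 1 / (1 + r^2)                         # weights; the factor 2 in psi cancels out
    beta_new <- coef(lm(y ~ x - 1, weights = w))
    if (max(abs(beta_new - beta)) < tol) break # stop once the iterates settle down
    beta <- beta_new
  }
  beta_new
}

On data with Cauchy errors, such as the simulated x and y above, irls_cauchy(x, y) is typically far less affected by outliers than the plain least-squares fit, though how much less is of course data-dependent.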
Another use: quantile regression
(An insightful alternative to least squares)
Ruđer Josip Bošković (1711–1787)
A highly confidential dataset
The idea

Recall one characterization of the mean: it is the number μ for which the expression
    Σ_i (y_i − μ)^2
achieves the minimum

Thus, least-squares regression fits the conditional mean

Does there exist some ρ_p which would yield us p-quantiles in a similar way?

Yes it does!

And it allows us, in the same manner as least squares, to do
Quantile regression: fitting conditional quantiles
    q_p as a solution of Σ_i ρ_p(y_i − q) → min_q!
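A one-derivative check of the mean characterization (not spelled out on the slide): d/dμ Σ_i (y_i − μ)^2 = −2 Σ_i (y_i − μ), which vanishes exactly at μ = (1/n) Σ_i y_i, the sample mean; the quantile version below plays the same game with ρ_p in place of the square.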
Check function(s)

    ρ_p(u) = p u         if u ≥ 0
             (p − 1) u   if u < 0

[Figure: the check functions ρ_p, i.e. (p − (u < 0)) * u plotted against u, for p = 0.25, 0.50, 0.75]

Using ρ_p as ρ in the minimization outlined above produces p-quantile(s)

In particular, ρ_0.5(u) = |u|/2 – which in the minimization is equivalent to ρ_0.5(u) = |u|
(making an expression minimal is equivalent to making its half minimal)
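Why ρ_p picks out the p-quantile, in one line: its derivative is ψ_p(u) = p − (u < 0), so setting Σ_i ψ_p(y_i − q) to zero forces roughly a fraction p of the y_i to lie below q. A quick numerical check in R (a sketch; the data and the optimize() interval are arbitrary):

rho <- function(u, p) (p - (u < 0)) * u            # the check function rho_p(u)
set.seed(2)
y <- rnorm(200)
p <- 0.25
qhat <- optimize(function(q) sum(rho(y - q, p)), interval = range(y))$minimum
c(qhat, quantile(y, p))                            # the two values should roughly agree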
Summary

For fitting the regression p-quantile, take ρ_p and solve
    Σ_{i=1}^n ρ_p(y_i − x_i^T β) → min_β!
How? IRLS?

OK, but there may be a problem with 0: ρ_p has no derivative there, and the IRLS weight
κ(u) = ψ_p(u)/u divides by the residual, so a zero residual means dividing by zero.
Be engineers: don't worry; everywhere else the derivative exists.

But what if it returns an error that I am dividing by zero?
Right, don't. Add 0.0000001 to every denominator.

Recall: the weights
    w_i = 1 / (1 + (y_i − x_i^T β_1)^2 / σ^2)
are safe because of the 1 in the denominator; if they were
    w_i = 1 / ((y_i − x_i^T β_1)^2 / σ^2)
then there could be a problem, but then we would rather make it...
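A hedged R sketch of this engineer's fix for the regression p-quantile: here ψ_p(u) = p − (u < 0) and κ(u) = ψ_p(u)/u, which can be written as |p − (u < 0)| / |u|, so adding a small ε to |u| keeps the denominator safely positive (the function name irls_quantile and the default ε are my choices, not from the lecture):

irls_quantile <- function(x, y, p = 0.5, eps = 1e-7, maxit = 200, tol = 1e-8) {
  beta <- coef(lm(y ~ x - 1))                       # least-squares start
  for (it in seq_len(maxit)) {
    r <- y - drop(x %*% beta)                       # residuals of the current fit
    w <- abs(p - (r < 0)) / (abs(r) + eps)          # kappa(r), denominator kept away from zero
    beta_new <- coef(lm(y ~ x - 1, weights = w))
    if (max(abs(beta_new - beta)) < tol) break
    beta <- beta_new
  }
  beta_new
}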
Mitigating possible division by zero in IRLS

    w_i = 1 / (0.000001 + (y_i − x_i^T β_1)^2 / σ^2)

If we have to divide by zero, and we don't want to, we may
- replace zero by some small ε
- make all denominators (if nonnegative) larger by some small ε
- but NEVER EVER just omit the index i from the summation when this happens
  (thinking that you can put it back in later iterations): this is known to spoil the method!!

Or don't use IRLS for computing regression quantiles at all and instead compute them
via linear programming, as implemented in the R package quantreg
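For comparison, a minimal use of that package (assuming it is installed; rq() is its main fitting function and tau is the quantile level):

library(quantreg)                                   # install.packages("quantreg") if needed
set.seed(3)
x <- runif(100, 0, 10)
y <- 1 + 0.5 * x + rcauchy(100)
fit <- rq(y ~ x, tau = 0.25)                        # regression 0.25-quantile via linear programming
coef(fit)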