
STAT 513/413: Lecture 23 Iteratively Reweighted Least Squares
(A dubious relative)

Recall: Cauchy regression
y_i = x_i^T β + σ ε_i with ε_i Cauchy errors (and σ known)
The EM algorithm: select β_1
Calculate weights
  z_i = 1 / (1 + (y_i − x_i^T β_1)^2 / σ^2)
Calculate β_2 as a weighted least squares estimate, solving
  Σ_{i=1}^n z_i (y_i − x_i^T β)^2 → min! over β
and repeat…
So it works via EM… but after all, it is nothing but… IRLS: Iteratively Reweighted Least Squares
(Some call it IWLS: Iteratively Weighted Least Squares)
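A minimal R sketch of this iteration (illustrative, not the lecture's own code: it assumes a design matrix X with an intercept column, a response y, and known σ; the name cauchy_irls is made up):

```r
# Minimal sketch of the EM/IRLS iteration above; X is the n x p design matrix
# (including an intercept column), y the response, sigma the known scale.
cauchy_irls <- function(X, y, sigma, n_iter = 50) {
  beta <- qr.solve(X, y)                        # beta_1: ordinary least squares
  for (k in seq_len(n_iter)) {
    r    <- as.vector(y - X %*% beta)           # residuals of the current fit
    z    <- 1 / (1 + (r / sigma)^2)             # the weights z_i
    beta <- lm.wfit(X, y, w = z)$coefficients   # weighted least-squares refit
  }
  beta
}
```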

Weighted least squares
How to solve this?
  Σ_{i=1}^n w_i (y_i − x_i^T β)^2 → min! over β
As in the standard case, just take derivatives:
  Σ_{i=1}^n 2 w_i (−x_i)(y_i − x_i^T β) = 0, that is, Σ_{i=1}^n w_i x_i x_i^T β = Σ_{i=1}^n w_i x_i y_i
And when you think about it, you will see the matrix form
  X^T W X β = X^T W y
where, as usual, X is an n × p matrix and W is an n × n matrix with the w_i's on the diagonal
Also, the R function lm() has a parameter weights
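A small R check of this equivalence (made-up data; variable names are illustrative): solving the weighted normal equations directly gives the same coefficients as lm() with its weights argument.

```r
# Weighted normal equations X'WX beta = X'Wy versus lm(..., weights = w).
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
w <- runif(n)                                            # arbitrary positive weights

X <- cbind(1, x)                                         # n x p design matrix
beta_ne <- solve(t(X) %*% (w * X), t(X) %*% (w * y))     # normal equations
beta_lm <- coef(lm(y ~ x, weights = w))                  # lm() with weights

round(cbind(normal_eq = as.vector(beta_ne), lm_weights = beta_lm), 6)
```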

Now something else
No weights now – but no squares either: we may want to solve
  Σ_{i=1}^n ρ(y_i − x_i^T β) → min! over β
To obtain it, we need
  Σ_{i=1}^n −x_i ψ(y_i − x_i^T β) = 0   where ψ(u) = ρ′(u); let us say that ψ(u) = u κ(u)
Then it is
  Σ_{i=1}^n −x_i (y_i − x_i^T β) κ(y_i − x_i^T β) = 0
which is
  Σ_{i=1}^n κ(y_i − x_i^T β) x_i x_i^T β = Σ_{i=1}^n κ(y_i − x_i^T β) x_i y_i
Recognizing that? It is like w_i = κ(y_i − x_i^T β) – the only problem is that w_i depends on exactly the same β we would like to find…
… but maybe we can trick it iteratively

IRLS
Select β_1 (typically: the result of the least-squares fit)
Calculate weights: w_i = κ(y_i − x_i^T β_1)
(note: the arguments of κ are residuals of the β_1 fit)
Obtain β_2 as the least squares fit with weights w_i
And repeat
For the Cauchy: ρ(u) = log(1 + u^2)
Thus ψ(u) = 2u / (1 + u^2) = u · 2/(1 + u^2), that is, κ(u) = 2/(1 + u^2)
Well, for the EM algorithm it was 1/(1 + u^2) …
Does the factor of 2 really matter? Isn't the least-squares fit the same
if weighted by w_i or 2w_i?
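A quick numerical answer (illustrative data): rescaling all the weights by a constant, here the factor 2, does not change the weighted least-squares fit.

```r
# Weighted least squares with weights w and with 2*w gives the same coefficients.
set.seed(5)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
w <- 1 / (1 + (y - mean(y))^2)         # any positive weights will do

b1 <- coef(lm(y ~ x, weights = w))
b2 <- coef(lm(y ~ x, weights = 2 * w))
all.equal(b1, b2)                      # TRUE: the factor of 2 does not matter
```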

Another use: quantile regression
(An insightful alternative to least squares)
Ruđer Josip Bošković (1711-1787)

A highly confidential dataset

The idea
Recall one characterization of the mean: it is the number μ for which the expression
  Σ_i (y_i − μ)^2
achieves the minimum
Thus, least-squares regression fits the conditional mean
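A quick numerical check of this characterization of the mean (made-up data):

```r
# The value minimizing sum((y - mu)^2) over mu is the sample mean.
set.seed(2)
y <- rexp(200)
mu_hat <- optimize(function(mu) sum((y - mu)^2), range(y))$minimum
c(minimizer = mu_hat, mean = mean(y))   # the two agree up to numerical error
```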
Does there exist some ρ_p which would yield us p-quantiles in a similar way?
Yes it does!
And it allows us, in the same manner as least squares, to do quantile regression: fitting conditional quantiles
The p-quantile q_p is obtained as a solution of
  Σ_i ρ_p(y_i − q) → min! over q

Check function(s)
  ρ_p(u) = (p − 1)u  if u ≤ 0
  ρ_p(u) = p u       if u ≥ 0
[Figure: the check functions, plotted as (p − (u < 0)) * u against u for u in (−20, 20), in three panels: p = 0.25, p = 0.50, p = 0.75]
Using ρ_p as ρ in the minimization outlined above produces the p-quantile(s)
In particular, ρ_0.5(u) = |u|/2, which in the minimization is equivalent to ρ_0.5(u) = |u| (making an expression minimal is equivalent to making its half minimal)
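A small R sketch (illustrative) defining the check function exactly as in the panel labels of the figure, and verifying numerically that minimizing it reproduces a sample quantile (up to the usual ambiguity in how sample quantiles are defined):

```r
# The check function, written as in the panel labels: (p - (u < 0)) * u
rho_p <- function(u, p) (p - (u < 0)) * u

set.seed(3)
y <- rnorm(200)
p <- 0.75
q_hat <- optimize(function(q) sum(rho_p(y - q, p)), range(y))$minimum
c(minimizer = q_hat, sample_quantile = quantile(y, p, names = FALSE))
```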
Summary
For fitting the regression p-quantile, take ρ_p and solve
  Σ_{i=1}^n ρ_p(y_i − x_i^T β) → min! over β
How? IRLS?
OK, but there may be a problem with 0; there is no derivative there...
Be engineers: don't worry. Everywhere else there is
But what if it returns an error that I am dividing by zero?
Right, don't. Add 0.0000001 to every denominator
Recall: the weights
  w_i = 1 / (1 + (y_i − x_i^T β_1)^2 / σ^2)
are safe because of the 1 in the denominator; if they were
  w_i = 1 / ((y_i − x_i^T β_1)^2 / σ^2)
then there could be a problem, but then we would rather make it...

Mitigating possible division by zero in IRLS
  w_i = 1 / (0.000001 + (y_i − x_i^T β_1)^2 / σ^2)
If we have to divide by zero, and we don't want to, we may
- replace zero by some small ε
- have all denominators (if nonnegative) larger by some small ε
- but NEVER EVER just omit the index i in the summation when this happens (thinking that you can return it back in later iterations): this is known to spoil the method!!
Or don't use IRLS for computing regression quantiles at all and instead compute them via linear programming, as implemented in the R package quantreg
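Finally, a sketch (illustrative, not the lecture's own code) of IRLS for the regression quantile using the ε-regularized weights discussed above, with a commented-out comparison against the linear-programming fit from quantreg; the function name quantile_irls and the example data are made up.

```r
# IRLS for the regression p-quantile with epsilon in the denominator of the
# weights; X is the n x p design matrix (with intercept column), y the response.
quantile_irls <- function(X, y, p = 0.5, n_iter = 100, eps = 1e-6) {
  beta <- qr.solve(X, y)                        # start from least squares
  for (k in seq_len(n_iter)) {
    r <- as.vector(y - X %*% beta)
    w <- abs(p - (r < 0)) / (eps + abs(r))      # kappa(r), regularized by eps
    beta <- lm.wfit(X, y, w = w)$coefficients
  }
  beta
}

# Comparison with the linear-programming solution (requires the quantreg package):
# library(quantreg)
# set.seed(4)
# x <- rnorm(100); y <- 1 + 2 * x + rt(100, df = 1)
# X <- cbind(1, x)
# rbind(IRLS = quantile_irls(X, y, p = 0.5),
#       rq   = coef(rq(y ~ x, tau = 0.5)))
```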
(0.25 – (u < 0)) * u 0 5 10 15 (0.5 - (u < 0)) * u 0 5 10 15 (0.75 - (u < 0)) * u 0 5 10 15 For fitting the regression p-quantile, take ρp and solve 􏰇n ρp(yi − xTi β) 􏰊 min ! i=1 How? IRLS? β Summary OK, but there may be a problem with 0; there is no derivative... Be engineers: don’t worry. Everywhere else there is But what if it returns an error that I am dividing by zero? Right, don’t. Add 0.0000001 to every denominator Recall: weights wi = 1 are safe 1 + ( y i − x Ti β 1 ) 2 σ2 because of 1 in the denominator; if they were wi = 1 then could be a problem, but then we would rather make it... ( y i − x Ti β 1 ) 2 σ2 9 Mitigating possible division by zero in IRLS wi = 1 0.000001 + (yi − xTi β1)2 σ2 If we have to divide by zero, and we don’t want to, we may - replace zero by some small ε - have all denominators (if nonnegative) larger by some small ε - but NEVER EVER just omit the index i in the summation when this happens (thinking that you can return it back in later iterations): this is known to spoil the method!! Or don’t use IRLS for computing regression quantiles at all and instead compute them via linear programming, as implemented in the R package quantreg 10