Solving Rank-Deficient Linear Least-Squares Problems*

Thomas F. Coleman1 and Chunguang Sun2

Abstract

Numerical solution of linear least-squares problems is a key computational task in science and
engineering. Effective algorithms have been developed for the linear least-squares problems in
which the underlying matrices have full rank and are well-conditioned. However, there are few
efficient and robust approaches to solving the linear least-squares problems in which the
underlying matrices are rank-deficient and sparse. In this paper, we propose a new method for
solving rank-deficient linear least-squares problems. Our proposed method is mathematically
equivalent to an existing method but offers several practical advantages over it. Furthermore,
our proposed method is applicable to both dense and sparse rank-deficient linear least-squares
problems. Our experimental results demonstrate the practical potential of our proposed method.

Section 1: Introduction

Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. The numerical solution of the linear least-squares problem

$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2 \qquad (1.1)$

lies at the heart of many computational problems frequently arising in scientific, engineering,
and economic disciplines. Efficient algorithms are available when the matrix $A$ has full rank and
is well-conditioned. However, when the matrix is ill-conditioned or rank-deficient, numerical
solution of (1.1) often requires some variant of rank-revealing QR factorization (RRQR) or
singular value decomposition (SVD). The resulting solution process is relatively expensive.
Furthermore, the solution to (1.1) is not unique. Usually, the minimum-norm solution is sought.

In practical applications, the matrix $A$ is often large and sparse. If $A$ is also rank-deficient, there
are few effective algorithms available for solving (1.1). Because of the paramount need for
sparsity preservation, the algorithms involving RRQR or SVD developed for dense linear least-squares
problems are not suitable for sparse linear least-squares problems.

________

*This work was supported by the Natural Sciences and Engineering Research Council of Canada. The views expressed herein
are solely those of the authors.
1. Department of Combinatorics and Optimization, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L
3G1, Canada (tfcoleman@uwaterloo.ca)
2. Faculty of Mathematics, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada
(c23sun@uwaterloo.ca)


Effective approaches (George et al., 1980; Heath, 1982; Heath, 1984; Liu, 1986; Sun, 1996;
Sun, 1997) have been developed to solve sparse linear least-squares problems when the
underlying matrix has full rank and is well-conditioned. Issues in handling rank deficiency in
solving sparse linear least-squares problems are considered in (Ng, 1991) and (Avron et al.,
2009).

In this paper, we propose a new method for solving rank-deficient linear least-squares
problems. We show that our proposed method is mathematically equivalent to an existing
method. If we view both our method and the existing method as generating a sequence of
points (i.e. approximate solutions) approaching the true solution, the two methods are
mathematically equivalent in the sense that they generate the same sequence of points in exact
arithmetic. However, there are two major differences between the two methods. First, the
theoretical underpinnings of the two methods are very different. In particular, the
mathematical derivations of the two methods are drastically different. Second, the
computations at each iterative step are organized differently. Because of these differences, our
proposed method offers practical advantages over the existing method in terms of algorithmic
efficiency and applicability to both dense and sparse rank-deficient linear least-squares
problems.

In this paper, we focus on the design and analysis of our new method for solving dense
rank-deficient linear least-squares problems. We also outline our approach to the solution of
sparse rank-deficient linear least-squares problems; detailed results on the sparse problems
will be presented in (Coleman et al., 2010).

In Section 2 we propose a new algorithm for solving rank-deficient linear least-squares
problems. We prove that our proposed algorithm is mathematically equivalent to an existing
algorithm in Section 3. In Section 4, we discuss the selection of a crucial parameter used in our
algorithm. In Section 5, we compare the practical performance of our algorithm with that of the
existing algorithm. Finally, we discuss future work and summarize our results in Section 6.

Notations. Assume $k = \mathrm{rank}(A)$. Let $A = U \Sigma V^T$ be the SVD of $A$. Assuming $U_1$ is the leading
submatrix of $U$ corresponding to the $k$ positive singular values, the compact SVD of $A$ can be
written as $A = U_1 \Sigma_1 V_1^T$, where

$U_1 = [u_1, u_2, \ldots, u_k], \quad V_1 = [v_1, v_2, \ldots, v_k],$

and

$\Sigma_1 = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k),$

where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$.


The minimum-norm solution to the above linear least-squares problem (1.1) is given by

$\hat{x} = A^+ b = V_1 \Sigma_1^{-1} U_1^T b,$

where $A^+$ is the pseudo-inverse of $A$. Let $\hat{r} = b - A\hat{x}$.
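As a concrete illustration of this formula, the following minimal sketch (a hypothetical NumPy example, not part of the original paper) computes the minimum-norm solution through the compact SVD; the tolerance used to decide which singular values are treated as nonzero is an assumption.

```python
import numpy as np

def min_norm_solution(A, b, tol=1e-10):
    # Compact SVD: keep only the singular values regarded as nonzero (assumed tolerance).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))          # estimated rank(A)
    U1, s1, V1 = U[:, :k], s[:k], Vt[:k, :].T
    return V1 @ ((U1.T @ b) / s1)            # x_hat = V1 * Sigma1^{-1} * U1^T * b

# Small rank-deficient example: the last column duplicates the first.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
A = np.hstack([A, A[:, :1]])
b = rng.standard_normal(6)
print(np.allclose(min_norm_solution(A, b), np.linalg.pinv(A) @ b))  # agrees with the pseudo-inverse solution
```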

Section 2: A New Method for Handling Rank Deficiency

For $\lambda > 0$ define

$x(\lambda) = (A^T A + \lambda I)^{-1} A^T b. \qquad (2.1)$

Using the compact SVD, (2.1) can be written as

$x(\lambda) = V_1 (\Sigma_1^2 + \lambda I)^{-1} \Sigma_1 U_1^T b. \qquad (2.2)$

From (2.2) it is clear that $x(\lambda) \to \hat{x} = V_1 \Sigma_1^{-1} U_1^T b$ as $\lambda \to 0$.

An approximate minimum-norm least-squares solution to (1.1) is therefore given by (2.2) or,
equivalently, the least-squares solution to the following full-rank problem:

$\min_{x \in \mathbb{R}^n} \left\| \begin{bmatrix} A \\ \sqrt{\lambda}\, I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2 \qquad (2.3)$

While there are several numerical approaches to (2.3), the stable QR factorization method can
be used, perhaps with a permutation matrix $\Pi$ to limit fill in the upper triangular
matrix $R_\Pi$ (in the case where the matrix $A$ is sparse):

$Q^T \begin{bmatrix} A \\ \sqrt{\lambda}\, I \end{bmatrix} \Pi = \begin{bmatrix} R_\Pi \\ 0 \end{bmatrix}. \qquad (2.4)$

One general approach to solving the rank-deficient linear least-squares problem is to choose a
positive value for $\lambda$, solve (2.3) by using the factorization in (2.4) to obtain $x(\lambda)$, and then refine
$x(\lambda)$ to get (or approximate) $x(0) = \hat{x}$. The question we address next is how to do this
refinement effectively. We discuss our approach to choosing the parameter $\lambda$ in Section 4.
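To make this concrete, here is a small NumPy sketch (an illustration under assumed data, not the authors' implementation) that solves the extended full-rank problem (2.3) for a rank-deficient matrix and checks that $x(\lambda)$ approaches the minimum-norm solution as $\lambda$ decreases:

```python
import numpy as np

def x_lambda(A, b, lam):
    """Least-squares solution of the extended problem (2.3): min ||[A; sqrt(lam) I] x - [b; 0]||_2."""
    m, n = A.shape
    C = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    f = np.concatenate([b, np.zeros(n)])
    return np.linalg.lstsq(C, f, rcond=None)[0]

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))
A[:, 3] = A[:, 0] + A[:, 1]              # force rank deficiency
b = rng.standard_normal(8)
x_hat = np.linalg.pinv(A) @ b            # minimum-norm reference solution
for lam in (1e-2, 1e-4, 1e-6):
    print(lam, np.linalg.norm(x_lambda(A, b, lam) - x_hat))  # error shrinks with lambda
```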


Let $M = A^T A$ and $d = A^T b$. The solution to the full-rank problem (2.3) is the same as the solution
to the following system of normal equations:

$(A^T A + \lambda I)\, x = A^T b,$

or

$(M + \lambda I)\, x = d, \qquad (2.5)$

which obviously yields the solution

$x = (M + \lambda I)^{-1} d. \qquad (2.6)$

Differentiating (2.5) with respect to $\lambda$ yields

$(M + \lambda I)\, x' + x = 0, \qquad (2.7)$

or $x' = -(M + \lambda I)^{-1} x = -(M + \lambda I)^{-2} d$.

Differentiating (2.7) with respect to $\lambda$ yields

$(M + \lambda I)\, x'' + 2x' = 0, \qquad (2.8)$

or $x'' = -2(M + \lambda I)^{-1} x' = 2(M + \lambda I)^{-3} d$.

Generally,

$x^{(k)}(\lambda) = -k\,(M + \lambda I)^{-1} x^{(k-1)}(\lambda). \qquad (2.9)$

By Taylor's theorem,

$\hat{x} = x(0) = x_0 + \sum_{k=1}^{\infty} (-1)^k s_k \lambda^k, \qquad (2.10)$

where $x_0 = x(\lambda)$ and $s_k = \frac{1}{k!}\, x^{(k)}(\lambda)$, $(k = 1, 2, \ldots)$.

Note that by (2.9),

$s_k = -\frac{1}{(k-1)!}\,(M + \lambda I)^{-1} x^{(k-1)} = -(M + \lambda I)^{-1} s_{k-1}, \quad (k = 2, 3, \ldots), \qquad (2.11)$

where $s_1 = x'(\lambda) = -(M + \lambda I)^{-1} x$.


An important computational observation is that given $s_{k-1}$ and $\Pi^T (M + \lambda I)\,\Pi = R_\Pi^T R_\Pi$, the vector
$s_k$ can be computed with two triangular solves, and thus in time $O(\mathrm{nnz}(R_\Pi))$, where $\mathrm{nnz}(R_\Pi)$
is the number of nonzeros in the matrix $R_\Pi$.

Computationally, our algorithm can be described as follows:

1) $x_0 = x(\lambda)$
2) $s_0 = x_0$
3) for $i = 1, 2, 3, \ldots$
       $s_i = -(M + \lambda I)^{-1} s_{i-1}$
       $x_i = x_{i-1} + (-1)^i s_i \lambda^i$

At the $i$-th refinement step, $s_i = -(M + \lambda I)^{-1} s_{i-1}$ is accomplished by solving $R^T R\, s_i = -s_{i-1}$,
where $M + \lambda I = R^T R$. In practice, this version of our algorithm is numerically unstable since
the product $s_i \lambda^i$ is formed at each refinement step. Typically, $\|s_i\|_2$ is very large and $\lambda^i$ very
small.

To overcome the numerical instability, we revise our algorithm as follows:

1) $x_0 = x(\lambda)$
2) $t_0 = x_0$
3) for $i = 1, 2, 3, \ldots$
       $t = \lambda t_{i-1}$
       $t_i = (M + \lambda I)^{-1} t$
       $x_i = x_{i-1} + t_i$

It is straightforward to show by mathematical induction that

$t_i = (-1)^i s_i \lambda^i, \quad i = 1, 2, 3, \ldots$

Therefore, our revised algorithm is equivalent to the original algorithm. However, the revised
algorithm avoids forming the product $s_i \lambda^i$ at each refinement step, so the numerical
instability of the original algorithm is resolved.
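The following NumPy/SciPy sketch illustrates the revised refinement loop (an illustration only; it factors $M + \lambda I$ once with a Cholesky factorization rather than the QR-based triangular factor described below, and the test matrix, right-hand side, and step count are assumptions made for the example):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def refine(A, b, lam, x0, num_steps=10):
    """Revised refinement: t_i = (M + lam*I)^{-1} (lam * t_{i-1}), x_i = x_{i-1} + t_i."""
    n = A.shape[1]
    factor = cho_factor(A.T @ A + lam * np.eye(n))   # factor M + lam*I once, reuse each step
    x, t = x0.copy(), x0.copy()
    for _ in range(num_steps):
        t = cho_solve(factor, lam * t)
        x = x + t
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 5))
A[:, 4] = A[:, 0]                                            # rank-deficient test matrix
b = rng.standard_normal(10)
lam = 1e-4
x0 = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)     # x(lam), cf. (2.6)
print(np.linalg.norm(refine(A, b, lam, x0) - np.linalg.pinv(A) @ b))  # small after refinement
```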

Now we discuss the implementation issues of our algorithm. For a given $\lambda > 0$, step 1) of our
algorithm is accomplished by solving the full-rank linear least-squares problem (2.3). Let

$C = \begin{bmatrix} A \\ \sqrt{\lambda}\, I \end{bmatrix} \qquad (2.12)$

and

$f = \begin{bmatrix} b \\ 0 \end{bmatrix}. \qquad (2.13)$


Then problem (2.3) can be written as

$\min_{x \in \mathbb{R}^n} \|Cx - f\|_2, \qquad (2.14)$

where $C$ is an $(m+n)$-by-$n$ matrix and $f$ is an $(m+n)$-vector. First, we compute the
"economy size" QR factorization

$C = QR, \qquad (2.15)$

where $Q$ is an $(m+n)$-by-$n$ matrix with orthonormal columns and $R$ is an $n$-by-$n$ upper
triangular matrix. Let

$Q = \begin{bmatrix} Q_I \\ Q_{II} \end{bmatrix}, \qquad (2.16)$

where $Q_I$ is an $m$-by-$n$ matrix and $Q_{II}$ is an $n$-by-$n$ matrix. Second, the solution to (2.3) (i.e.,
(2.14)), denoted by $x_0$ or $x(\lambda)$, is obtained by performing a matrix-vector multiply followed by a
triangular solve:

$x_0 = x(\lambda) = R^{-1}(Q_I^T b). \qquad (2.17)$

Mathematically, we have

$M + \lambda I = C^T C = R^T R. \qquad (2.18)$

Hence, at the $i$-th refinement step, $t_i = (M + \lambda I)^{-1} t$ is accomplished by two triangular solves:

$R^T R\, t_i = t. \qquad (2.19)$

Clearly, each iterative refinement step of our algorithm requires precisely two triangular solves,
one scalar-vector multiply, and the addition of two vectors.

It is important to note that the QR factorization (2.15) is computed only once. The overall
complexity of our algorithm consists of the arithmetic work for solving (2.3) (which consists of
the arithmetic work for the QR factorization (2.15) and the arithmetic work for (2.17)) and the
arithmetic work for N iterative refinement steps, where N is the total number of iterative
refinement steps performed. As shown in Section 5, N is typically less than or around 10.
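Putting (2.12) through (2.19) together, the following dense NumPy/SciPy sketch (an illustration of the scheme, not the authors' code) computes $x(\lambda)$ from the economy QR factorization of $C$ and then performs the iterative refinement with two triangular solves per step:

```python
import numpy as np
from scipy.linalg import solve_triangular

def solve_rank_deficient(A, b, lam, num_steps=10):
    m, n = A.shape
    # Extended full-rank problem: C = [A; sqrt(lam) I], cf. (2.12).
    C = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    Q, R = np.linalg.qr(C, mode='reduced')      # economy QR (2.15), computed only once
    QI = Q[:m, :]                               # top block of Q, cf. (2.16)
    x = solve_triangular(R, QI.T @ b)           # x_0 = x(lam) = R^{-1} (Q_I^T b), cf. (2.17)
    t = x.copy()
    for _ in range(num_steps):
        # One refinement step: solve R^T R t_i = lam * t_{i-1}, cf. (2.19).
        y = solve_triangular(R, lam * t, trans='T')
        t = solve_triangular(R, y)
        x = x + t
    return x
```

In practice the loop would be terminated using the $\|t_q\|_2 / \|x_q\|_2$ criterion discussed in Section 5 rather than a fixed step count.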


Section 3: Mathematical Equivalence of Our Algorithm with an Existing
Algorithm

In this section we show that our algorithm is mathematically equivalent to an existing algorithm
in the sense that both algorithms generate the same sequence of points (i.e. approximate
solutions) approaching the true solution.

Riley (Riley, 1956) proposed the following iterative scheme (referred to as the Iterated
Regularization scheme, or IR scheme for short) for solving linear least-squares problems with
full column rank. Let $x^{(0)}$ be an arbitrary vector and solve

$(A^T A + \lambda I)\, x^{(q+1)} = A^T b + \lambda x^{(q)}. \qquad (3.1)$

The sequence $x^{(q)}$ converges to $\hat{x}$ (the true solution) if $\lambda > 0$, since the spectral radius of
$\lambda (A^T A + \lambda I)^{-1}$ is less than 1.

Golub (Golub, 1965) observed that Riley's iterative scheme is equivalent to the following:

$r^{(q)} = b - A x^{(q)},$
$(A^T A + \lambda I)\, e^{(q)} = A^T r^{(q)}, \qquad (3.2)$
$x^{(q+1)} = x^{(q)} + e^{(q)}.$

The vector $e^{(q)}$ is the solution of the following linear least-squares problem:

$\min_{e^{(q)} \in \mathbb{R}^n} \|C e^{(q)} - f^{(q)}\|_2, \qquad (3.3)$

where

$C = \begin{bmatrix} A \\ \sqrt{\lambda}\, I \end{bmatrix} \qquad (3.4)$

and

$f^{(q)} = \begin{bmatrix} r^{(q)} \\ 0 \end{bmatrix}. \qquad (3.5)$

Therefore, the IR scheme can be described as follows:

$x^{(0)} = 0$
for $q = 0, 1, 2, 3, \ldots$
    a) $r^{(q)} = b - A x^{(q)}$
    b) Solve the linear least-squares problem (3.3) for $e^{(q)}$.
    c) $x^{(q+1)} = x^{(q)} + e^{(q)}$
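For comparison with the refinement loop of Section 2, here is a minimal NumPy sketch of the IR scheme (an illustration that solves (3.2) with a dense solver; the saved-Q and semi-normal-equations variants are discussed in Section 5):

```python
import numpy as np

def ir_scheme(A, b, lam, num_steps=10):
    """Iterated Regularization: solve (A^T A + lam*I) e = A^T r and update x at each step."""
    n = A.shape[1]
    M = A.T @ A + lam * np.eye(n)
    x = np.zeros(n)
    for _ in range(num_steps):
        r = b - A @ x                      # step a)
        e = np.linalg.solve(M, A.T @ r)    # step b): solution of (3.2), here via a dense solve
        x = x + e                          # step c)
    return x
```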


Golub (Golub, 1965) has shown that

$x^{(q)} = \mu_1^{(q)} v_1 + \mu_2^{(q)} v_2 + \cdots + \mu_k^{(q)} v_k,$

where

$\mu_j^{(q)} = \left[ 1 - \left( \frac{\lambda}{\lambda + \sigma_j^2} \right)^{q} \right] \frac{u_j^T b}{\sigma_j}$

for $j = 1, 2, \ldots, k$.

Therefore, as $q \to \infty$,

$x^{(q)} \to \frac{u_1^T b}{\sigma_1} v_1 + \frac{u_2^T b}{\sigma_2} v_2 + \cdots + \frac{u_k^T b}{\sigma_k} v_k = \hat{x}.$

As indicated in (Golub, 1965), it is easy to show that

$e^{(q+1)} = \lambda (A^T A + \lambda I)^{-1} e^{(q)}$
and

$\|e^{(q+1)}\|_2 \le \frac{\lambda}{\lambda + \sigma_k^2}\, \|e^{(q)}\|_2 < \|e^{(q)}\|_2.$

Therefore, a good termination procedure is to stop the iterative process as soon as $\|e^{(q)}\|_2$ increases or does not change.

Now we prove that the IR scheme as formulated in (Golub, 1965) is equivalent to our algorithm. The IR scheme shows that

$x^{(q)} = e^{(0)} + e^{(1)} + e^{(2)} + \cdots + e^{(q)},$

where $x^{(q)}$ is the solution vector computed by the IR scheme after $q$ iterative steps. Clearly, $e^{(0)} = x_0 = x(\lambda)$. Therefore

$x^{(q)} = x_0 + e^{(1)} + e^{(2)} + \cdots + e^{(q)}.$

Our algorithm shows that

$x_q = x_0 + t_1 + t_2 + \cdots + t_q,$

where $x_q$ is the solution vector computed by our algorithm after $q$ iterative steps. We prove

$x_q = x^{(q)}, \quad q = 1, 2, 3, \ldots \qquad (3.6)$

by showing

$t_q = e^{(q)}, \quad q = 1, 2, 3, \ldots \qquad (3.7)$

We can easily derive the following from our algorithm:

$t_q = \lambda^q (M + \lambda I)^{-(q+1)} d. \qquad (3.8)$

Let $G = \lambda (A^T A + \lambda I)^{-1}$ and $h = (A^T A + \lambda I)^{-1} A^T b$. It has been shown in (Golub, 1965) that

$x^{(q+1)} = (G^q + G^{q-1} + \cdots + G + I)\, h. \qquad (3.9)$

Therefore,

$e^{(q)} = x^{(q+1)} - x^{(q)} = (G^q + G^{q-1} + \cdots + G + I)h - (G^{q-1} + G^{q-2} + \cdots + G + I)h = G^q h.$

Thus,

$e^{(q)} = G^q h = [\lambda (A^T A + \lambda I)^{-1}]^q (A^T A + \lambda I)^{-1} A^T b = \lambda^q (M + \lambda I)^{-(q+1)} d. \qquad (3.10)$

Combining (3.8) and (3.10), we obtain (3.7). Therefore, we have proved (3.6). In other words, the IR scheme is equivalent to our algorithm in the sense that both algorithms generate the same sequence of approximate solutions approaching the true solution in exact arithmetic.

Clearly, the mathematical derivations of the two algorithms are very different. Our algorithm relies on the application of the Taylor series expansion, while the existing algorithm is an iterated regularization method. Furthermore, the computations at each iterative step are different. In Section 5, we will demonstrate that our algorithm offers several practical advantages over the IR scheme.

Section 4: Selection of Lambda

Given that the singular values of $A$ are $\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0$, it is easy to show that the singular values of the extended matrix $C$ as defined in (2.12) are

$\sqrt{\sigma_1^2 + \lambda},\ \sqrt{\sigma_2^2 + \lambda},\ \ldots,\ \sqrt{\sigma_k^2 + \lambda},\ \sqrt{\lambda},\ \ldots,\ \sqrt{\lambda}.$

Therefore, the condition number of the extended matrix $C$ is

$\mathrm{cond}(C) = \mathrm{cond}(R) = \sqrt{\frac{\sigma_1^2 + \lambda}{\lambda}},$

where $C = QR$ is the QR factorization of $C$ as defined in (2.15). Clearly, the condition number of $R$ will be large if $\lambda$ is small. This will have a negative impact on the numerical accuracy of the triangular solves involving $R$ required by both our algorithm and the existing algorithm. Hence, $\lambda$ cannot be too small. On the other hand, the following discussion shows that $\lambda$ cannot be too large either.

Let $\delta$ be a lower bound on the smallest non-zero singular value $\sigma_k$. Golub (Golub, 1965) suggested that $\lambda$ be chosen so that

$\frac{\lambda}{\lambda + \delta^2} < 0.1. \qquad (4.1)$

Clearly, a wide range of choices for $\lambda$ and $\delta$ would satisfy (4.1). Our approach is to choose the greatest lower bound for $\sigma_k$, which is $\sigma_k$ itself. Hence, we have

$\frac{\lambda}{\lambda + \sigma_k^2} < 0.1. \qquad (4.2)$

Obviously, an infinite number of choices for $\lambda$ would satisfy (4.2). In particular, we choose $\lambda = \beta \sigma_k^2$. Then

$\frac{\beta \sigma_k^2}{\beta \sigma_k^2 + \sigma_k^2} = \frac{\beta}{1 + \beta} < 0.1. \qquad (4.3)$

Thus, $\beta < \frac{1}{9}$. Our experimental results have shown that $\beta = 0.01$ produces satisfactory results, so our choice for $\lambda$ is $0.01\, \sigma_k^2$. To distinguish different choices for $\lambda$ in the following discussion, let $\lambda_{svd} = 0.01\, \sigma_k^2$.

It is well known that computing the SVD of a matrix is an expensive process; at the very least, it is considerably more expensive than computing the QR factorization of the same matrix.
So our approach avoids computing the SVD of the given matrix. Hence, we have no direct knowledge of $\sigma_k$; in other words, we cannot compute $\lambda_{svd}$ exactly. Our approach to choosing $\lambda$ is to obtain an approximate value for $\lambda_{svd}$, as follows:

• Compute the QR factorization of $A$: $A = Q_1 R_1$, where $Q_1$ is $m$-by-$m$ and $R_1$ is $m$-by-$n$.
• Let $W$ denote the set of absolute values of the nonzero diagonal elements of $R_1$, and let $\omega_{min}$ and $\omega_{max}$ denote the smallest and largest elements of $W$, respectively.

Both $\lambda_1 = \hat{\beta}\, \omega_{min}^2$ and $\lambda_2 = \hat{\beta}\, \omega_{min}^2 / \omega_{max}^2$, where $\hat{\beta} = 0.00025$, produce satisfactory results. Furthermore, we observe that

$\lambda_{QR} = \frac{1}{2}(\lambda_1 + \lambda_2) = \hat{\beta}\, \frac{\omega_{min}^2}{\omega_{max}^2} \cdot \frac{\omega_{max}^2 + 1}{2}$

produces improved results over $\lambda_1$ or $\lambda_2$.

The following table shows $\lambda_{svd}$ and $\lambda_{QR}$ for six dense problems. Except for the problem 600x300, $\lambda_{QR}$ is a good approximation to $\lambda_{svd}$.

Dense Problems   rank    lambda_svd    lambda_QR
600x300           238    2.2441e-11    6.0530e-14
1500x1000         812    8.2283e-16    5.1143e-16
2000x1500        1212    3.7025e-17    2.4798e-17
3000x2000        1632    1.6702e-21    4.0261e-20
3500x500          383    1.8902e-19    5.2419e-19
3500x1000         812    9.4053e-18    8.5434e-18

Section 5: Performance Comparison

First, we consider the implementation details of the IR scheme. Each iterative step of the IR scheme solves a linear least-squares problem with the same $(m+n)$-by-$n$ matrix $C$ as defined in (2.12). As in our algorithm, the (economy) QR factorization $C = QR$ defined in (2.15) is computed only once. Assume that the matrix $Q$ is partitioned as in (2.16). There are two possible implementations of the IR scheme:

• Option 1: The matrix $Q$ is saved (actually, it is only necessary to save the submatrix $Q_I$). Then at the $q$-th iterative step, we solve the following triangular system:

$R\, e^{(q)} = Q^T f^{(q)} = Q_I^T r^{(q)}.$

Compared with our algorithm, this implementation of the IR scheme performs two extra matrix-vector multiplications, $A x^{(q)}$ and $Q_I^T r^{(q)}$, per iterative step, but it solves only one $n$-by-$n$ triangular system instead of two as in our algorithm. If $A$ is sparse, $Q$ (or $Q_I$) is often dense. Therefore, saving $Q$ (or $Q_I$) would not be practical for sparse linear least-squares problems, and this implementation of the IR scheme is not suitable for solving them.

• Option 2: The matrix $Q$ (or $Q_I$) is not saved. Then at the $q$-th iterative step, we solve the following system of semi-normal equations (SNE):

$R^T R\, e^{(q)} = A^T r^{(q)}.$

Compared with our algorithm, this option performs two extra matrix-vector multiplications, $A x^{(q)}$ and $A^T r^{(q)}$, per iterative step. In other words, our algorithm is more efficient than this implementation of the IR scheme.

In the following discussion, Option 1 and Option 2 will be referred to as Algorithm IR-Q and Algorithm IR-SNE, respectively.

All three algorithms solve exactly the same extended linear least-squares problem (2.3). The running time for solving (2.3) includes the times for forming the extended matrix $C$, computing the (economy) QR factorization of $C$, and solving the resulting triangular system $R x_0 = Q_I^T b$. The following table shows the running times for solving (2.3) on six dense problems.

Dense Problems   Running Time for Solving the Extended LS Problem (Seconds)
600x300          0.0550
1500x1000        0.7882
2000x1500        2.1226
3000x2000        4.9765
3500x500         0.5450
3500x1000        1.5034

Now we compare the running times of the three algorithms for the iterative refinement process. We measure the running time for ten iterative steps for each of the three algorithms. The running times are summarized in the following table.
Dense Problems   Running Times for 10 Iterative Steps (Seconds)
                 IR-Q      IR-SNE    Our Algorithm
600x300          0.0031    0.0103    0.0084
1500x1000        0.0593    0.1685    0.1210
2000x1500        0.1137    0.3640    0.2813
3000x2000        0.2111    0.6392    0.4836
3500x500         0.0506    0.0799    0.0229
3500x1000        0.1077    0.2149    0.1166

Overall, IR-Q is the most efficient approach since it performs only one triangular solve per iterative step. However, when the matrix $A$ is very skinny (that is, $m \gg n$, e.g., 3500x500), it is not as efficient as our algorithm. The reason is that forming the matrix-vector products $A x^{(q)}$ and $Q_I^T r^{(q)}$ is more expensive than a single triangular solve. The most severe drawback of Algorithm IR-Q lies in the fact that it is not suitable for solving sparse linear least-squares problems.

Clearly, our algorithm is more efficient than IR-SNE, since IR-SNE performs two extra matrix-vector multiplications, $A x^{(q)}$ and $A^T r^{(q)}$, per iterative step. Furthermore, our algorithm becomes increasingly more efficient as the underlying matrix gets skinnier.

We used $\lambda = \lambda_{QR}$, discussed in the previous section, in our experiments. All three algorithms converge in about the same number of iterative steps. They converge in under ten iterations for five of the six test problems, and in 12 iterations for the remaining test problem, 3000-by-2000.

As suggested in (Golub, 1965), a good termination criterion is to measure the size of the update vector $t_q$ at each step. The iterative process should stop as soon as $\|t_q\|_2$ does not change or increases. For both Algorithm IR-Q and our algorithm, once they converge, the relative change to the computed $x_q$,

$\frac{\|t_q\|_2}{\|x_q\|_2}, \qquad (5.1)$

remains unchanged (at least the first five significant digits of the ratio (5.1) remain unchanged) for a considerable number of subsequent iterations. This is a clear signal for terminating the iterative process. In practice, the iterative process terminates as soon as the first five significant digits of the ratio (5.1) agree for two consecutive iterations. This stopping criterion is easy to implement. It is worth noting that Algorithm IR-SNE does not have this convergence property.

IR-Q and our algorithm are virtually identical in terms of numerical properties. Using the stopping criterion outlined above, the number of steps required for convergence for both IR-Q and our algorithm is given in the following table.

Dense Problems   Number of Steps for Convergence (IR-Q and Our Algorithm)
600x300          3
1500x1000        6
2000x1500        5
3000x2000        12
3500x500         6
3500x1000        5

Algorithm IR-SNE also converges in around the same number of steps as shown in the above table. However, its termination criterion is not as easy to implement as that of the other two algorithms because it lacks the convergence property discussed above.

Let $r^{(q)} = r_q = b - A x^{(q)} = b - A x_q$ $(q = 1, 2, 3, \ldots)$. Define

$RE(x_q) = \frac{\|\hat{x} - x_q\|_2}{\|\hat{x}\|_2}$ and $RE(r_q) = \frac{\|\hat{r} - r_q\|_2}{\|\hat{r}\|_2}$

for $q \ge 0$; RE stands for Relative Error. The following three tables show experimental results on the test problem 2000-by-1500. In the tables, column 1 represents the iterative steps (from 0 to 10), column 2 shows $RE(x_q)$, and column 3 shows $RE(r_q)$. The last column shows $\|t_q\|_2 / \|x_q\|_2$.
Table 1: Dense Problem 2000-by-1500 (Our Algorithm)

IT   RE(x_q)      RE(r_q)      ||t_q||_2/||x_q||_2
0    6.3065e-003  2.4782e-004  0
1    4.2058e-005  1.6469e-006  6.3023e-003
2    4.3457e-006  1.1063e-008  4.1703e-005
3    5.7820e-006  2.8667e-010  1.4719e-006
4    7.2276e-006  2.4866e-010  1.4455e-006
5    8.6731e-006  2.5100e-010  1.4455e-006
6    1.0119e-005  2.5207e-010  1.4455e-006
7    1.1564e-005  2.5067e-010  1.4455e-006
8    1.3010e-005  2.5192e-010  1.4455e-006
9    1.4455e-005  2.5199e-010  1.4455e-006
10   1.5901e-005  2.5323e-010  1.4455e-006

Table 2: Dense Problem 2000-by-1500 (Algorithm IR-Q)

IT   RE(x_q)      RE(r_q)      ||t_q||_2/||x_q||_2
0    6.3065e-003  2.4782e-004  0
1    4.2058e-005  1.6469e-006  6.2648e-003
2    4.3457e-006  1.1058e-008  4.1702e-005
3    5.7820e-006  2.9249e-010  1.4719e-006
4    7.2276e-006  2.4562e-010  1.4455e-006
5    8.6731e-006  2.4256e-010  1.4455e-006
6    1.0119e-005  2.4222e-010  1.4455e-006
7    1.1564e-005  2.4858e-010  1.4455e-006
8    1.3010e-005  2.4483e-010  1.4455e-006
9    1.4455e-005  2.4470e-010  1.4455e-006
10   1.5901e-005  2.4480e-010  1.4455e-006

Table 3: Dense Problem 2000-by-1500 (Algorithm IR-SNE)

IT   RE(x_q)      RE(r_q)      ||t_q||_2/||x_q||_2
0    6.3065e-003  2.4782e-004  0
1    4.2098e-005  1.6469e-006  6.2648e-003
2    4.9380e-006  1.1022e-008  4.1746e-005
3    6.2523e-006  3.0371e-010  2.3706e-006
4    7.3532e-006  2.4815e-010  2.1699e-006
5    8.4908e-006  2.5428e-010  2.4117e-006
6    9.7705e-006  2.9390e-010  2.3501e-006
7    1.0947e-005  2.6434e-010  2.3575e-006
8    1.2334e-005  2.2960e-010  2.4873e-006
9    1.3579e-005  2.6337e-010  2.3336e-006
10   1.4774e-005  2.4588e-010  2.3539e-006

Section 6: Future Work and Concluding Remarks

Efficiency Enhancement

In our implementations, we have computed two separate QR decompositions: the QR decomposition of the original matrix $A$ to obtain $\lambda_{QR}$, and the QR decomposition of the extended matrix $C$ to obtain $x(\lambda)$. We plan to explore an approach in which the QR decomposition of $C$ is not computed from scratch but rather is built upon the QR decomposition of $A$. The problem then becomes how to compute the following QR decomposition efficiently:

$\begin{bmatrix} R_1 \\ \sqrt{\lambda}\, I \end{bmatrix} = \hat{Q} R, \qquad (6.1)$

where $A = Q_1 R_1$ is the QR decomposition of the matrix $A$. Mathematically,

$C = \begin{bmatrix} A \\ \sqrt{\lambda}\, I \end{bmatrix} = \begin{bmatrix} Q_1 R_1 \\ \sqrt{\lambda}\, I \end{bmatrix} = \begin{bmatrix} Q_1 & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} R_1 \\ \sqrt{\lambda}\, I \end{bmatrix} = \begin{bmatrix} Q_1 & 0 \\ 0 & I \end{bmatrix} \hat{Q} R = QR. \qquad (6.2)$

In this context, the matrix $\sqrt{\lambda}\, I$ is considered an $n$-by-$n$ upper triangular matrix. Depending on whether $R_1$ is stored row-by-row or column-by-column, the QR decomposition defined in (6.1) can be computed by a sequence of Givens rotations or a sequence of Householder reflections. If $R_1$ is dense, we need $n(n+1)/2$ Givens rotations or $n$ Householder reflections to perform the QR decomposition (6.1). If $R_1$ is sparse, choosing an appropriate storage scheme for $R_1$ is critical. Assuming that $R_1$ is stored row-by-row, we need $\mathrm{nnz}(R_1) = \mathrm{nnz}(R)$ Givens rotations to compute the QR factorization; in this situation, Householder reflections would be very inefficient. Note that the sparsity structure of $R_1$ is the same as that of $R$. We can determine the sparsity structure of $R$ in a preliminary step, so we do not need to compute it again in carrying out (6.2). Furthermore, the sparsity structure of $R$ can fully accommodate the process of merging $\sqrt{\lambda}\, I$ into $R_1$ without the need to dynamically allocate additional storage for $R$.
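As a minimal dense illustration of the update (6.1), the following sketch (assumed example code, not the authors' implementation) re-triangularizes $[R_1;\ \sqrt{\lambda}\, I]$ with the $n(n+1)/2$ Givens rotations mentioned above and checks the result against a direct QR factorization of the stacked matrix:

```python
import numpy as np

def qr_append_sqrt_lambda(R1, lam):
    """Compute the R factor of [R1; sqrt(lam)*I] by Givens rotations (dense sketch of (6.1))."""
    n = R1.shape[0]
    R = R1.astype(float).copy()
    for j in range(n):
        w = np.zeros(n)
        w[j] = np.sqrt(lam)                 # appended row sqrt(lam) * e_j^T
        for p in range(j, n):               # eliminate w[p] against the diagonal entry R[p, p]
            if w[p] == 0.0:
                continue
            r = np.hypot(R[p, p], w[p])
            c, s = R[p, p] / r, w[p] / r
            R[p, p:], w[p:] = c * R[p, p:] + s * w[p:], -s * R[p, p:] + c * w[p:]
    return R

# Check against a direct QR of the stacked matrix (rows of R may differ only in sign).
R1 = np.triu(np.random.default_rng(3).standard_normal((4, 4)))
lam = 1e-3
R_direct = np.linalg.qr(np.vstack([R1, np.sqrt(lam) * np.eye(4)]), mode='r')
print(np.allclose(np.abs(qr_append_sqrt_lambda(R1, lam)), np.abs(R_direct)))  # True
```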
We plan to investigate whether the proposed method of combining the two QR decompositions is more efficient than computing the two QR decompositions separately, for both dense and sparse problems.

The following describes our overall approach to solving rank-deficient linear least-squares problems:

1. For a given $m$-by-$n$ rank-deficient matrix $A$, compute the QR decomposition $A = Q_1 R_1$. The matrix $Q_1$ is not saved if $A$ is sparse.
2. Calculate $\lambda_{QR}$ as described in Section 4.
3. Compute the QR decomposition of the extended matrix $C$ as defined in (2.12), either from scratch or using the method of combining the two QR decompositions described above.
4. Compute $x_0 = x(\lambda)$ using the results of the QR decomposition of $C$.
5. Perform the iterative refinement process.

The application of our algorithm to the solution of sparse rank-deficient linear least-squares problems is discussed in (Coleman et al., 2010).

Sparse Linear Least-Squares Problems with Dense Rows

In practice, a sparse linear least-squares problem frequently contains a number of dense or nearly dense rows. The corresponding upper triangular factor $R$ then becomes nearly or completely full. Therefore, the straightforward application of the QR decomposition method is not a viable approach to the solution of sparse least-squares problems with dense rows. Heath (Heath, 1982) proposed a method for handling dense rows. Through row permutation, a sparse linear least-squares problem with dense rows can easily be transformed into the following equivalent problem:

$\min_{x \in \mathbb{R}^n} \left\| \begin{bmatrix} A \\ E \end{bmatrix} x - \begin{bmatrix} b \\ f \end{bmatrix} \right\|_2, \qquad (6.3)$

where $A$ consists of the sparse rows and $E$ of the dense rows of the original matrix. The solution to (6.3) is obtained by first computing the solution to the sparse linear least-squares problem $\min_x \|Ax - b\|_2$ as usual and then updating the solution by the rows in $E$. We refer the reader to (Heath, 1982) for details. The crucial assumption made in the method described in (Heath, 1982) is that $A$ has full column rank. However, practical problems might not satisfy this assumption. If the matrix $A$ is rank-deficient, our proposed method described in this paper can be applied to obtain a solution to the problem $\min_x \|Ax - b\|_2$. Handling dense rows in sparse linear least-squares problems is also considered in (Sun, 1995).

Concluding Remarks

In this paper, we have proposed a new method for handling rank deficiency in solving sparse linear least-squares problems. Our proposed method is mathematically equivalent to an existing method, and we have shown that it has several practical advantages over the existing method in terms of efficiency and applicability to rank-deficient sparse problems. Our experimental results show the practical promise of our approach. We have also outlined several future directions of research to further enhance the efficiency of our algorithm and to apply it to the solution of sparse linear least-squares problems with dense rows.

Works Cited

Avron, H., Ng, E., and Toledo, S. Using Perturbed QR Factorizations to Solve Linear Least-Squares Problems. SIAM J. Matrix Anal. Appl., Vol. 31, No. 2, 2009, pp. 674-693.

Coleman, T. F. and Sun, C. Solving Rank-Deficient Sparse Linear Least Squares Problems. Technical report (work in progress), University of Waterloo, Waterloo, Ontario, Canada, 2010.

George, J. A. and Heath, M. T. Solution of Sparse Linear Least Squares Problems using Givens Rotations. Linear Algebra Appl., Vol. 34, 1980, pp. 69-83.
Golub, G. Numerical Methods for Solving Linear Least Squares Problems. Numerische Mathematik, Vol. 7, 1965, pp. 206-216.

Heath, M. T. Numerical Methods for Large Sparse Linear Least Squares Problems. SIAM J. Sci. Stat. Comput., Vol. 26, 1984, pp. 497-513.

Heath, M. T. Some Extensions of an Algorithm for Sparse Linear Least Squares Problems. SIAM J. Sci. Stat. Comput., Vol. 3, 1982, pp. 223-237.

Liu, J. W. H. On General Row Merging Schemes for Sparse Givens Transformations. SIAM J. Sci. Statist. Comput., Vol. 7, 1986, pp. 127-148.

Ng, E. A Scheme for Handling Rank Deficiency in the Solution of Sparse Linear Least Squares Problems. SIAM J. Sci. Statist. Comput., Vol. 12, 1991, pp. 1173-1183.

Riley, J. D. Solving Systems of Linear Equations with a Positive Definite, Symmetric, but Possibly Ill-Conditioned Matrix. Math. Tables Aids Comput., Vol. 9, 1956, pp. 96-101.

Sun, C. Dealing with Dense Rows in the Solution of Sparse Linear Least Squares Problems. Technical Report CTC95TR227, Advanced Computing Research Institute, Cornell Theory Center, Cornell University, Ithaca, NY, 1995.

Sun, C. Parallel Sparse Orthogonal Factorization on Distributed-Memory Multiprocessors. SIAM J. Sci. Comput., Vol. 17, 1996, pp. 666-685.

Sun, C. Parallel Solution of Sparse Linear Least Squares Problems on Distributed-Memory Multiprocessors. Parallel Computing, 1997, pp. 2075-2093.