
Numerical Computing, Spring 2022
The Cholesky Factorization
An n × n matrix A is symmetric positive definite if v^T A v > 0 for all nonzero vectors v ∈ R^n. We will show below that such a matrix can always be written in the form

    A = G G^T,

where G is a lower triangular matrix. I think the discussion of the Cholesky factorization given here¹ is easier to understand than the derivation given in AG Section 4.4, although theirs is certainly correct.


Suppose that A is an n × n symmetric, positive definite matrix, and that a11 = 1. Then we can write

    A = \begin{pmatrix} 1 & w^T \\ w & B \end{pmatrix}

where w is a column vector of length n − 1 and B is an (n − 1) × (n − 1) symmetric, positive definite matrix. Let us carry out the first step of Gauss elimination (LU factorization), subtracting w_j times the first row from the jth row, for j = 2, . . . , n, to eliminate the first column below the diagonal. We can write this as

    \begin{pmatrix} 1 & 0 \\ -w & I \end{pmatrix}
    \begin{pmatrix} 1 & w^T \\ w & B \end{pmatrix}
    = \begin{pmatrix} 1 & w^T \\ 0 & B - ww^T \end{pmatrix}.

Gauss elimination would then go on to use the second row to eliminate the second column below the diagonal, but to exploit symmetry, Cholesky goes on to apply the same transformation from the right, subtracting w_j times the first column from the jth column, for j = 2, . . . , n. This gives:

    \begin{pmatrix} 1 & 0 \\ -w & I \end{pmatrix}
    \begin{pmatrix} 1 & w^T \\ w & B \end{pmatrix}
    \begin{pmatrix} 1 & -w^T \\ 0 & I \end{pmatrix}
    = \begin{pmatrix} 1 & w^T \\ 0 & B - ww^T \end{pmatrix}
    \begin{pmatrix} 1 & -w^T \\ 0 & I \end{pmatrix}
    = \begin{pmatrix} 1 & 0 \\ 0 & B - ww^T \end{pmatrix}.    (1)
Since the inverse of

    \begin{pmatrix} 1 & 0 \\ -w & I \end{pmatrix}  is  \begin{pmatrix} 1 & 0 \\ w & I \end{pmatrix},

we can write (1) as

    A = \begin{pmatrix} 1 & w^T \\ w & B \end{pmatrix}
      = \begin{pmatrix} 1 & 0 \\ w & I \end{pmatrix}
        \begin{pmatrix} 1 & 0 \\ 0 & B - ww^T \end{pmatrix}
        \begin{pmatrix} 1 & w^T \\ 0 & I \end{pmatrix}.

¹ The first part follows Numerical Linear Algebra by Trefethen and Bau.
This is the end of the first step of the Cholesky factorization. But now we can continue inductively, provided we know that B − ww^T is positive definite. We can see this is true because of equation (1): just multiply the left- and right-hand sides by v^T from the left and v from the right, giving, for any nonzero v,

    v^T \begin{pmatrix} 1 & 0 \\ -w & I \end{pmatrix}
        \begin{pmatrix} 1 & w^T \\ w & B \end{pmatrix}
        \begin{pmatrix} 1 & -w^T \\ 0 & I \end{pmatrix} v
    = v^T \begin{pmatrix} 1 & 0 \\ 0 & B - ww^T \end{pmatrix} v > 0,

since the original matrix A is positive definite and the outer triangular factors are nonsingular, so they map nonzero vectors to nonzero vectors. In particular, taking v = (0, z^T)^T,

    z^T (B − ww^T) z > 0 for all nonzero z ∈ R^{n−1}.
But what if a11 ≠ 1? In that case, write

    A = \begin{pmatrix} α & w^T \\ w & B \end{pmatrix}

where we know α > 0 since A is positive definite. Then we can write

    A = \begin{pmatrix} α & w^T \\ w & B \end{pmatrix}
      = \begin{pmatrix} \sqrt{α} & 0 \\ \frac{1}{\sqrt{α}} w & I \end{pmatrix}
        \begin{pmatrix} 1 & 0 \\ 0 & B - \frac{1}{α} ww^T \end{pmatrix}
        \begin{pmatrix} \sqrt{α} & \frac{1}{\sqrt{α}} w^T \\ 0 & I \end{pmatrix}.

Let's write this as

    A = G_1 A_1 G_1^T.
Now we can apply the same idea recursively to the (n − 1) × (n − 1) bottom right submatrix of A_1, giving A_1 = G_2 A_2 G_2^T, then to the (n − 2) × (n − 2) bottom right submatrix of A_2, giving A_2 = G_3 A_3 G_3^T, and so on (here each G_k is understood to be embedded in the n × n identity, so that it acts only on the trailing submatrix). In the last step we just have to take the square root of the final bottom right entry. This finally gives

    A = G_1 G_2 · · · G_n G_n^T · · · G_2^T G_1^T

or, multiplying out these triangular matrices,

    A = G G^T,

where G is lower triangular: the Cholesky factorization.
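The recursive derivation above translates almost directly into code. Here is a sketch in Python/NumPy (the course materials use MATLAB; the function name and example matrix below are my own, not from the notes):

```python
import numpy as np

def cholesky_recursive(A):
    """Cholesky factor G (lower triangular, A = G @ G.T), computed via
    the recursion A = G1 A1 G1^T from the derivation above."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    alpha = A[0, 0]
    if alpha <= 0:
        raise ValueError("matrix is not positive definite")
    if n == 1:
        return np.array([[np.sqrt(alpha)]])
    w = A[1:, 0]
    B = A[1:, 1:]
    G = np.zeros((n, n))
    G[0, 0] = np.sqrt(alpha)          # first column of G1
    G[1:, 0] = w / np.sqrt(alpha)
    # Recurse on the trailing (n-1) x (n-1) block B - (1/alpha) w w^T.
    G[1:, 1:] = cholesky_recursive(B - np.outer(w, w) / alpha)
    return G

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
G = cholesky_recursive(A)
assert np.allclose(G @ G.T, A)       # A = G G^T
assert np.allclose(G, np.tril(G))    # G is lower triangular
```

This is a direct transcription of the existence proof, not an efficient implementation; it also recomputes the trailing block at every level.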
The derivation given above proves that the factorization exists for all
positive definite A, because at every stage the next submatrix is known to be positive definite. However, this derivation does not make it clear how to compute G efficiently. There is a more straightforward way to describe how to compute the factorization efficiently, now that we know it exists. We will illustrate it in the case n = 3. Let us write down the equation GGT = A as
    \begin{pmatrix} g_{11} & 0 & 0 \\ g_{21} & g_{22} & 0 \\ g_{31} & g_{32} & g_{33} \end{pmatrix}
    \begin{pmatrix} g_{11} & g_{21} & g_{31} \\ 0 & g_{22} & g_{32} \\ 0 & 0 & g_{33} \end{pmatrix}
    = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{21} & a_{22} & a_{32} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}

(recalling that A is symmetric). Now let's determine the g_{ij} one at a time, by looking at the (i, j) entry of this equation, for each i ≥ j, as follows:
• (i = 1, j = 1): g_{11}^2 = a_{11}, so g_{11} = \sqrt{a_{11}} if a_{11} > 0; otherwise A is not positive definite, so stop.
• (i = 2, j = 1): g_{21} g_{11} = a_{21}, so g_{21} = a_{21}/g_{11}.
• (i = 3, j = 1): g_{31} g_{11} = a_{31}, so g_{31} = a_{31}/g_{11}.
• (i = 2, j = 2): g_{21}^2 + g_{22}^2 = a_{22}, so g_{22} = \sqrt{a_{22} − g_{21}^2} if a_{22} − g_{21}^2 > 0; otherwise A is not positive definite, so stop.
• (i = 3, j = 2): g_{31} g_{21} + g_{32} g_{22} = a_{32}, so g_{32} = (a_{32} − g_{31} g_{21})/g_{22}.
• (i = 3, j = 3): g_{31}^2 + g_{32}^2 + g_{33}^2 = a_{33}, so g_{33} = \sqrt{a_{33} − g_{31}^2 − g_{32}^2} if a_{33} − g_{31}^2 − g_{32}^2 > 0; otherwise A is not positive definite, so stop.
In the case where we cannot take the square root because its argument is not positive, we say that the algorithm breaks down. We know this cannot happen if A is positive definite, and it follows that the diagonal of G has to be positive if A is positive definite. Note that we only divide by the gkk which we know are positive. See my code chol3.m for the implementation. A good exercise would be to write down the algorithm for the n × n case with the appropriate nested loops, using the general property that
    g_{i1}^2 + g_{i2}^2 + · · · + g_{ii}^2 = a_{ii}.    (2)
Another good exercise is to derive the algorithm for the case that A is tridiagonal (as well as symmetric positive definite).
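The n × n nested-loop algorithm suggested above might be sketched as follows in Python/NumPy (this is my own sketch, not the chol3.m code from the notes, which is MATLAB):

```python
import numpy as np

def cholesky_loops(A):
    """Column-by-column Cholesky: compute G[i, j] from the (i, j)
    entry of G G^T = A, exactly as in the 3 x 3 example above."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    G = np.zeros((n, n))
    for j in range(n):
        # Diagonal entry: g_jj^2 = a_jj - sum_{k<j} g_jk^2, from equation (2).
        s = A[j, j] - np.dot(G[j, :j], G[j, :j])
        if s <= 0:
            # The square root argument is not positive: breakdown.
            raise ValueError(f"not positive definite: breakdown at step {j + 1}")
        G[j, j] = np.sqrt(s)
        # Entries below the diagonal in column j; divides only by g_jj > 0.
        for i in range(j + 1, n):
            G[i, j] = (A[i, j] - np.dot(G[i, :j], G[j, :j])) / G[j, j]
    return G

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
G = cholesky_loops(A)
assert np.allclose(G @ G.T, A)
```

Note that, as in the 3 × 3 case, the only divisions are by the diagonal entries g_{jj}, which are known to be positive when no breakdown occurs.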

Note that this algorithm is consistent with the bottom of AG p. 127 for the case n = 2. However, their algorithm on p. 130 overwrites the lower triangle of A with the Cholesky factor G, which makes it harder to see what is going on.
As noted on AG p. 129, a very nice property of the Cholesky factor follows directly from equation (2):

    |g_{ij}| ≤ \sqrt{a_{ii}}

for all i, j. So, the entries in G cannot be large compared to the original matrix entries, even though there is no pivoting. This is quite different from GE (LU) without pivoting on a general square matrix, which we know can generate arbitrarily large numbers in L or U, or even GEPP on the very rare examples where some entries in U could be as big as 2^{n−1} times the largest entry of A (although this virtually never happens in practice).
What if A is positive semidefinite, which means that vT Av ≥ 0 for all v ∈ Rn, but singular, so not positive definite, e.g. A = 0? Then the algorithm will still break down, because even though taking the square root of zero would be OK, dividing by this subsequently would not. It is possible to use a Cholesky algorithm with pivoting to handle the positive semidefinite case, but this is not commonly used.
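The semidefinite breakdown is easy to see in a small example (a NumPy sketch of my own, following the step-by-step recipe above):

```python
import numpy as np

# A rank-one positive semidefinite (singular) matrix: v v^T with v = (1, 1, 1).
A = np.ones((3, 3))

# Follow the 3 x 3 recipe:
g11 = np.sqrt(A[0, 0])        # = 1, fine
g21 = A[1, 0] / g11           # = 1
g31 = A[2, 0] / g11           # = 1
pivot = A[1, 1] - g21**2      # = 0: taking sqrt(0) would be OK, but...
assert pivot == 0.0
# ...the next step g32 = (a32 - g31*g21) / g22 would divide by g22 = sqrt(0) = 0,
# so the algorithm breaks down even though no square root of a negative is needed.
```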
We can easily relate the Cholesky factorization to the LU factorization. Let D = diag(G), the diagonal matrix whose diagonal entries are those of G, and write

    L = G D^{−1},

which scales the columns of the lower triangular matrix G to give a unit lower triangular matrix L, with ones on the diagonal, and let's also write

    U = D G^T,

which scales the rows of the upper triangular matrix G^T to give another upper triangular matrix (but not with ones on the diagonal). Then we have

    L U = G D^{−1} D G^T = G G^T = A,
so, since L and U are respectively unit lower triangular and upper triangular, this must be the LU factorization of A, which we see must exist even though pivoting is not used. Furthermore, Gauss elimination (LU factorization) without pivoting is stable in this case (since A is positive definite), but computing the Cholesky factorization directly is more efficient.
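This relationship is easy to check numerically; here is a quick NumPy sketch (the example matrix is mine):

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
G = np.linalg.cholesky(A)    # lower triangular, A = G @ G.T
D = np.diag(np.diag(G))      # diagonal matrix with the diagonal of G
L = G @ np.linalg.inv(D)     # scale columns: unit lower triangular
U = D @ G.T                  # scale rows: upper triangular

assert np.allclose(np.diag(L), 1.0)   # ones on the diagonal of L
assert np.allclose(L @ U, A)          # L U is the LU factorization of A
```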
In MATLAB, the Cholesky factorization is obtained by chol, but R = chol(A) returns an upper triangular matrix R with A = R^T R; in other words, it returns G^T, not G. You can get the lower triangular factor by transposing R, or by writing G = chol(A,'lower'). The best way to find out if A is positive definite is to call chol(A), which will give an error if the algorithm breaks down due to A not being positive definite; more precisely, not numerically positive definite, in the sense that the breakdown could occur due to rounding error if A is positive definite but nearly singular. Another useful feature of chol is that asking for a second output argument, as in [R,k] = chol(A), will never generate an error, but will return k = 0 if A is numerically positive definite and k > 0 if the factorization breaks down at step k. In this case the leading (k − 1) × (k − 1) block of the computed R is the Cholesky factor of the leading (k − 1) × (k − 1) block of A, which is numerically positive definite.
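For comparison, NumPy's numpy.linalg.cholesky returns the lower triangular factor G directly, and signals breakdown by raising LinAlgError rather than returning a breakdown index (it has no analogue of the two-output form [R,k] = chol(A)). A small sketch with example matrices of my own:

```python
import numpy as np

A_spd = np.array([[2.0, 1.0],
                  [1.0, 2.0]])    # positive definite
A_indef = np.array([[1.0, 2.0],
                    [2.0, 1.0]])  # indefinite (eigenvalues 3 and -1)

G = np.linalg.cholesky(A_spd)     # succeeds; G is lower triangular
assert np.allclose(G @ G.T, A_spd)

try:
    np.linalg.cholesky(A_indef)
    ok = False
except np.linalg.LinAlgError:
    ok = True                     # breakdown: a22 - g21^2 = 1 - 4 < 0
assert ok
```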
Like the LU factorization, the Cholesky factorization of a sparse matrix can be implemented very efficiently using sparse matrix techniques, such as those used by MATLAB. Furthermore, pivoting can be useful in this case to reduce fill-in, but we will not discuss this.
By the way, André-Louis Cholesky (1875-1918) was a French military officer and mathematician who devised this method to deal with geodetic measurements. It is now one of the most important algorithms used in scientific computing, optimization and machine learning.
