Lecture 9: Conjugate Gradients and Polynomial Analysis of Krylov Methods
Kirk M. Soodhalter
Trinity College Dublin, The University of Dublin, Ireland
http://math.soodhalter.com
Class Business
• Any questions from last week?
Previously in Parallel Numerics…
• Steepest descent and minimum residual methods
• Build-up to Krylov subspace methods (i.e., Cayley-Hamilton)
• Krylov subspaces and the Arnoldi iteration
• GMRES
• Implementation Details
Overview for today
• Understanding a deficiency of steepest descent
• Deriving the method of conjugate gradients (CG)
• A-conjugacy (i.e., orthogonality) of the search directions
• Polynomial-based convergence analysis
Iterative Projection Methods
Definition (Ascending bases of sequentially nested subspaces)
Let
K1 ⊂ K2 ⊂ ··· ⊂ Kj ⊂ ··· ⊂ Rn and L1 ⊂ L2 ⊂ ··· ⊂ Lj ⊂ ··· ⊂ Rn
be two sequentially nested sequences of subspaces with dim Ki = dim Li = i for all i.

Proposition
An iterative method which at step i determines ti according to
    select ti ∈ Ki such that b − A(x0 + ti) ⊥ Li
converges in at most n steps.
• Thus, these are direct methods, though we use them as iterative methods.
Recall – Steepest Descent
We define a one-dimensional, A-norm error-optimal method:
• Let A be symmetric positive-definite
• Current approximation xk and rk = b − Axk
• Set K = L = span{rk}

k = 0
while not converged do
    rk ← b − Axk
    αk+1 ← ⟨rk, rk⟩ / ⟨Ark, rk⟩
    xk+1 ← xk + αk+1 rk
    k ← k + 1
end

• At step k, we minimize the error along the gradient direction (the (k − 1)st residual).
• At step k + 1 we minimize along the kth residual, disrupting the improvement from previous gradient steps.
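A minimal NumPy sketch of this iteration (an illustration, not from the slides: it assumes a dense SPD matrix A and uses a simple residual-norm stopping test; the function name and tolerance are choices made here):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, maxit=10000):
    """One-dimensional A-norm-optimal correction along the current residual."""
    x = x0.astype(float).copy()
    for _ in range(maxit):
        r = b - A @ x                    # current residual r_k
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)       # alpha_{k+1} = <r_k, r_k> / <A r_k, r_k>
        x = x + alpha * r                # x_{k+1} = x_k + alpha_{k+1} r_k
    return x
```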
The method of Conjugate Gradients (CG) – the basic idea
• Just as in steepest descent, we want to minimize in the direction of the previous residual, i.e.,
    αk+1 = argminα∈R ∥xtrue − (xk + αrk)∥A
• But we do not want to ruin the minimization steps already taken in previous directions
• CG therefore minimizes in the direction
    pk = rk − Σ_{i=1}^{k−1} ( ⟨rk, pi⟩A / ⟨pi, pi⟩A ) pi,
  i.e.,
    αk+1 = argminα∈R ∥xtrue − (xk + αpk)∥A
• CG costs only slightly more than steepest descent because rk is naturally A-orthogonal to {p1, p2, …, pk−2}, so only the pk−1 term of the sum survives.
• Rather than derive this directly, we use our Krylov subspace technology, the Lanczos iteration, and FOM for SPD matrices
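As a sketch of the idea only (this explicit A-orthogonalization against all previous directions is exactly the cost CG avoids), the new direction could be formed as below; the helper name and inputs are illustrative, not part of the CG algorithm:

```python
import numpy as np

def a_orthogonalize(A, r, P):
    """Form p = r - sum_i (<r, p_i>_A / <p_i, p_i>_A) p_i over the columns p_i of P.
    Illustrative only: in CG itself, all but the last term vanish."""
    p = r.astype(float).copy()
    for i in range(P.shape[1]):
        pi = P[:, i]
        Api = A @ pi
        p -= ((r @ Api) / (pi @ Api)) * pi
    return p
```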
Convergence comparison: Steepest Descent vs. Conjugate Gradients
Here we demonstrate how much of a difference A-orthogonalizing the gradient directions makes in convergence speed; a small script reproducing this comparison follows below.
• Steepest Descent convergence (plot)
• Conjugate Gradients convergence (plot)
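An illustrative experiment in the spirit of these plots (not from the slides): a random SPD test matrix, both iterations coded inline, iteration counts will vary with the chosen matrix and tolerance. The CG loop uses the standard recurrence stated later in the lecture.

```python
import numpy as np

# Count how many steps steepest descent vs. conjugate gradients need to
# reduce the residual norm by a factor of 1e-8 on a random SPD system.
rng = np.random.default_rng(0)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T   # SPD, condition number 100
b = rng.standard_normal(n)
tol = 1e-8 * np.linalg.norm(b)

# steepest descent
x, r, sd_iters = np.zeros(n), b.copy(), 0
while np.linalg.norm(r) > tol:
    Ar = A @ r
    x += ((r @ r) / (r @ Ar)) * r
    r = b - A @ x
    sd_iters += 1

# conjugate gradients
x, r = np.zeros(n), b.copy()
p, rr, cg_iters = r.copy(), r @ r, 0
while np.sqrt(rr) > tol:
    Ap = A @ p
    alpha = rr / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    rr_new = r @ r
    p = r + (rr_new / rr) * p
    rr = rr_new
    cg_iters += 1

print(f"steepest descent: {sd_iters} iterations, CG: {cg_iters} iterations")
```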
Recall: Arnoldi’s method and the Arnoldi relation
• The Arnoldi process iteratively builds an orthonormal basis1
• The columns of Vj = [v1, …, vj] ∈ Rn×j form a basis of Kj(A, r0)

Arnoldi relation
AVj = Vj+1 H̄j = Vj Hj + hj+1,j vj+1 eTj,a
where H̄j ∈ R(j+1)×j and Hj ∈ Rj×j are upper Hessenberg:

      ⎡ h11  h12  ···   h1j ⎤
      ⎢ h21  h22  ···   h2j ⎥
 Hj = ⎢      h32   ⋱    ⋮  ⎥
      ⎢            ⋱    ⋮  ⎥
      ⎣          hj,j−1 hjj ⎦

a ej is the jth canonical basis vector
1 A Gram-Schmidt-type process in which one orthogonalizes after each application of the operator A
Lanczos method for symmetric systems
• From the Arnoldi relation, we have Hj = VjT A Vj ∈ Rj×j
• Note: A symmetric ⟹ Hj is symmetric
• Hj symmetric and upper Hessenberg ⟹ Hj is tridiagonal (we denote Tj := Hj in this case), e.g., for j = 6:

      ⎡ α1  β2                  ⎤
      ⎢ β2  α2  β3              ⎥
 Tj = ⎢     β3  α3  β4          ⎥
      ⎢         β4  α4  β5      ⎥
      ⎢             β5  α5  β6  ⎥
      ⎣                 β6  α6  ⎦

• Example: column j of the Arnoldi relation:
    Avj = βj vj−1 + αj vj + βj+1 vj+1 ⟹ Avj ⊥ vi for all 1 ≤ i ≤ j − 2

Definition (The Lanczos relation and the three-term recurrence)
If A is symmetric, then AVj = Vj+1 T̄j = Vj Tj + βj+1 vj+1 eTj is called the Lanczos relation, and we call
    Avj = βj vj−1 + αj vj + βj+1 vj+1
the three-term recurrence.
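A compact sketch of the Lanczos iteration built from this three-term recurrence (assuming a symmetric A; no reorthogonalization, which a robust implementation would need in floating point):

```python
import numpy as np

def lanczos(A, r0, j):
    """Build V_j with orthonormal columns and the tridiagonal entries of T_j:
    alphas (diagonal) and betas (off-diagonal beta_2, ..., beta_j)."""
    n = len(r0)
    V = np.zeros((n, j))
    alphas, betas = np.zeros(j), np.zeros(j)
    V[:, 0] = r0 / np.linalg.norm(r0)
    for i in range(j):
        w = A @ V[:, i]
        if i > 0:
            w -= betas[i - 1] * V[:, i - 1]     # subtract the coupling to the previous vector
        alphas[i] = w @ V[:, i]                 # diagonal entry of T_j
        w -= alphas[i] * V[:, i]
        if i + 1 < j:
            beta = np.linalg.norm(w)
            if beta < 1e-14:                    # lucky breakdown: invariant Krylov subspace
                return V[:, :i + 1], alphas[:i + 1], betas[:i]
            betas[i] = beta                     # off-diagonal entry of T_j
            V[:, i + 1] = w / beta
    return V, alphas, betas[:j - 1]
```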
Stable LU-factorization of Tj
Proposition
Let A be symmetric and Tj = VjT AVj be the symmetric tridiagonal matrix from the Lanczos relation. If A is symmetric positive-definite then so is Tj.
Corollary
If A is symmetric positive-definite, then Tj is always nonsingular and admits a stable LU factorization without the need for pivoting, for all j prior to convergence.
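For concreteness, a pivot-free LU factorization of such a symmetric tridiagonal Tj could be computed as below (a sketch; the inputs are the alphas and betas produced, for example, by the Lanczos sketch above, and the recurrences match the bidiagonal factors used on the next slides):

```python
import numpy as np

def tridiag_lu(alphas, betas):
    """Pivot-free LU of the symmetric tridiagonal T_j with diagonal alphas and
    off-diagonal betas: returns lambdas (subdiagonal of the unit lower bidiagonal L_j)
    and etas (diagonal of the upper bidiagonal U_j, whose superdiagonal is betas)."""
    j = len(alphas)
    etas = np.zeros(j)
    lambdas = np.zeros(j)                               # lambdas[0] is unused
    etas[0] = alphas[0]                                 # eta_1 = alpha_1
    for i in range(1, j):
        lambdas[i] = betas[i - 1] / etas[i - 1]         # lambda_i = beta_i / eta_{i-1}
        etas[i] = alphas[i] - lambdas[i] * betas[i - 1] # eta_i = alpha_i - lambda_i beta_i
    return lambdas, etas
```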
Recall: The Full Orthogonalization Method
An alternative to GMRES enforces the condition Lj = Kj(A, r0).

Full Orthogonalization Method (FOM)
Enforcing the FOM residual constraint at iteration j is equivalent to solving
    yj = Hj−1 (∥r0∥ e1)
and setting xj = x0 + Vj yj.

• The FOM approximation may not exist for every j
  – Hj can be singular ⟺ GMRES makes no improvement at iteration j
• Convergence can be erratic
• There is a progressive formulation similar to that of GMRES
→ If A is SPD, this is how we derive the conjugate gradient method
Unpivoted LU leads to a progressive update for SPD FOM
• A is SPD ⟹ Tj = VjT A Vj is SPD
  → Tj = Lj Uj is a stable, pivot-free LU factorization
• FOM at step j: xj = x0 + Vj Tj−1 (∥r0∥ e1)
• Tj is tridiagonal ⟹ Lj is unit lower bidiagonal and Uj is upper bidiagonal:

      ⎡ 1                ⎤         ⎡ η1  β2              ⎤
      ⎢ λ2  1            ⎥         ⎢     η2  β3          ⎥
 Lj = ⎢     λ3  1        ⎥ ,  Uj = ⎢         η3   ⋱      ⎥
      ⎢         ⋱   ⋱   ⎥         ⎢              ⋱  βj  ⎥
      ⎣             λj  1⎦         ⎣                  ηj ⎦

• xj = x0 + Vj Uj−1 Lj−1 (∥r0∥ e1) = x0 + Pj zj,  where Pj := Vj Uj−1 and zj := Lj−1 (∥r0∥ e1)
Deriving the CG progressive update
• Pj = Vj Uj−1 ⟺ Pj Uj = Vj (ascending basis!)
  ⟹ η1 p1 = v1, and ηj pj = vj − βj pj−1 for j > 1
• zj = (ζi)i = Lj−1 (∥r0∥ e1) ⟺ Lj zj = ∥r0∥ e1
  ⟹ ζ1 = ∥r0∥ and ζj = −λj ζj−1 for j > 1
  ⟹ zj = [ zj−1 ; ζj ]
• Thus xj = x0 + Pj zj = x0 + Pj−1 zj−1 + ζj pj ⟹ xj = xj−1 + ζj pj

Comment
Certainly, this procedure minimizes ∥ej∥A for A SPD, but are these search directions A-orthogonal, as discussed previously?
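Putting the pieces of this derivation together gives the progressive SPD-FOM iteration (sometimes called D-Lanczos). Below is a minimal sketch assuming A is SPD; the function name, stopping test, and tolerances are choices made here, the variable names mirror the slides, and a robust code would guard against loss of orthogonality:

```python
import numpy as np

def spd_fom_progressive(A, b, x0=None, tol=1e-10, maxit=None):
    """Lanczos three-term recurrence + running LU factors of T_j,
    so that x_j = x_{j-1} + zeta_j * p_j."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    maxit = n if maxit is None else maxit

    r0 = b - A @ x
    zeta = np.linalg.norm(r0)            # zeta_1 = ||r0||
    if zeta < tol:
        return x
    v = r0 / zeta                        # v_1
    v_old, p_old = np.zeros(n), np.zeros(n)
    beta, eta_old, lam = 0.0, 1.0, 0.0   # beta_1 := 0

    for j in range(1, maxit + 1):
        w = A @ v - beta * v_old         # Lanczos step: A v_j - beta_j v_{j-1}
        alpha = w @ v                    # alpha_j = <A v_j, v_j>
        if j > 1:
            lam = beta / eta_old         # lambda_j = beta_j / eta_{j-1}  (subdiagonal of L_j)
            zeta = -lam * zeta           # zeta_j = -lambda_j zeta_{j-1}
        eta = alpha - lam * beta         # eta_j = alpha_j - lambda_j beta_j (diagonal of U_j)
        p = (v - beta * p_old) / eta     # eta_j p_j = v_j - beta_j p_{j-1}
        x = x + zeta * p                 # progressive update: x_j = x_{j-1} + zeta_j p_j

        if np.linalg.norm(b - A @ x) < tol:
            break
        w = w - alpha * v                # finish the Lanczos step
        beta_next = np.linalg.norm(w)
        if beta_next < 1e-14:            # lucky breakdown
            break
        v_old, v = v, w / beta_next
        p_old, eta_old, beta = p, eta, beta_next
    return x
```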
The Conjugate Gradient Method
Using some additional relations, we obtain an efficient implementation of CG.

Algorithm: The method of conjugate gradients
Compute r0 = b − Ax0, set p0 := r0
for j = 0, 1, 2, … until convergence do
    αj := ⟨rj, rj⟩ / ⟨Apj, pj⟩
    xj+1 := xj + αj pj
    rj+1 := rj − αj Apj
    βj := ⟨rj+1, rj+1⟩ / ⟨rj, rj⟩
    pj+1 := rj+1 + βj pj
end
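A direct transcription of this algorithm into NumPy (a minimal sketch: dense matrix-vector products, a residual-norm stopping test, no preconditioning; the function name is a choice made here):

```python
import numpy as np

def conjugate_gradients(A, b, x0, tol=1e-10, maxit=None):
    """The CG iteration above: one multiplication with A per step, short recurrences only."""
    x = x0.astype(float).copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    maxit = len(b) if maxit is None else maxit
    for _ in range(maxit):
        if np.sqrt(rr) < tol:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)            # alpha_j = <r_j, r_j> / <A p_j, p_j>
        x += alpha * p                   # x_{j+1} = x_j + alpha_j p_j
        r -= alpha * Ap                  # r_{j+1} = r_j - alpha_j A p_j
        rr_new = r @ r
        p = r + (rr_new / rr) * p        # p_{j+1} = r_{j+1} + beta_j p_j
        rr = rr_new
    return x
```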
Orthogonality/conjugacy of residuals and search directions

Theorem
Let rj be the residual produced at iteration j by CG (or by FOM if A is non-symmetric), and let pj be the search direction generated at iteration j according to the just-discussed relations.
• Each residual rj satisfies rj = σj vj+1 for some scalar σj, meaning that the residuals form a set of orthogonal vectors.
• The search directions pi form an A-conjugate set; i.e., ⟨pi, pk⟩A = 0 for i ≠ k.

Proof.
• rj ∈ Kj+1(A, r0) and rj ⊥ Kj(A, r0) ⟹ rj ∈ span{vj+1}
• PjT A Pj = Uj−T VjT A Vj Uj−1 = Uj−T Tj Uj−1 = Uj−T Lj Uj Uj−1 = Uj−T Lj
• Thus PjT A Pj is lower-triangular and symmetric
• Thus it is diagonal.
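A quick numerical check of both claims (illustrative only: the test matrix, the number of iterations, and the names R and P are arbitrary choices; the loop is the CG recurrence sketched above):

```python
import numpy as np

# Collect CG residuals and search directions on a small SPD system and check
# that <r_i, r_k> and <p_i, p_k>_A are (numerically) zero for i != k.
rng = np.random.default_rng(1)
n = 100
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)              # SPD test matrix
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
p = r.copy()
R, P = [r.copy()], [p.copy()]
for _ in range(20):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x += alpha * p
    r_new = r - alpha * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    R.append(r.copy())
    P.append(p.copy())

R, P = np.array(R).T, np.array(P).T
off = lambda G: np.max(np.abs(G - np.diag(np.diag(G))))
print("max |<r_i, r_k>|,   i != k:", off(R.T @ R))
print("max |<p_i, p_k>_A|, i != k:", off(P.T @ A @ P))
```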
Optimal short-term recurrence methods for all matrices?

Definition
A matrix A ∈ CG(s) if there is an error-optimal method based on an s-term recurrence for every r0 (optimal in which inner product??).

Theorem (Faber-Manteuffel, who won ≈ $600 for proving it)
A ∈ CG(s) if and only if the minimal polynomial of A has degree ≤ s, or A is normal with AT = ps(A) and deg ps ≤ s − 1.

Comments
• Only the case deg ps ≤ 1 is non-trivial.
  ⟹ A is either symmetric or A = ±(ρI + B) with B = −BT
• The F-M theorem is equivalent to (implies that) A admits a tridiagonalization T = VT A V in which the first column of V can be chosen to be any unit-norm vector v1.
Convergence Analysis

Recall from Cayley-Hamilton
A−1 = q(A) with deg q ≤ n − 1. This means that Ax = b ⟹ x = q(A)b, and
    (I − Aq(A)) b = 0,  where we write r(A) := I − Aq(A).

Definition
For any Krylov subspace method, if xj = x0 + pj(A)r0, then we call rj(x) the residual polynomial, with
    rj = r0 − Apj(A)r0 = (I − Apj(A)) r0 =: rj(A)r0

• Thus we want rj(A)r0 as close to 0 as possible! ⟹ ideally rj(x) has roots at (or is at least small at) the eigenvalues of A
General Polynomial-based Convergence Results

Residual norm in terms of the polynomial
Observe that for A diagonalizable, A = XΛX−1 and rj(A) = X rj(Λ) X−1. Thus we have
    ∥rj∥ = ∥rj(A)r0∥ = ∥X rj(Λ) X−1 r0∥ ≤ ∥X∥ ∥X−1∥ ∥rj(Λ)∥ ∥r0∥ = κ(X) ∥rj(Λ)∥ ∥r0∥
         ≤ ∥r0∥ κ(X) max{|rj(λi)| : λi ∈ σ(A)}
General Polynomial-based Convergence Results II

Comment about GMRES
For GMRES, one can use this polynomial formulation to obtain residual bounds, but they can be pessimistic; see Saad's book for further information. Note that for nonsymmetric problems, the conditioning of the eigenvectors (i.e., how close they are to orthonormal, equivalently how close the matrix is to normal) plays a large role in the effectiveness of these bounds.

Proposition (Convergence for symmetric systems)
The convergence of Krylov subspace methods for symmetric matrices is completely determined by the distribution of the eigenvalues, and we have the bound
    ∥rj∥ ≤ ∥r0∥ max{|rj(λi)| : λi ∈ σ(A)}
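To illustrate the role of the eigenvalue distribution, here is a small experiment (not from the slides: two SPD spectra with the same condition number, one evenly spread and one clustered with a single outlier; the helper name, sizes, and tolerances are arbitrary choices):

```python
import numpy as np

def cg_iters(eigs, tol=1e-10, maxit=5000):
    """CG iterations needed on a diagonal SPD matrix with the given spectrum."""
    n = len(eigs)
    A = np.diag(eigs)
    b = np.ones(n)
    x, r = np.zeros(n), b.copy()
    p, rr, k = r.copy(), b @ b, 0
    while np.sqrt(rr) > tol * np.linalg.norm(b) and k < maxit:
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p, rr, k = r + (rr_new / rr) * p, rr_new, k + 1
    return k

n = 500
spread = np.linspace(1.0, 400.0, n)                                  # evenly spread, kappa = 400
clustered = np.concatenate([np.linspace(1.0, 2.0, n - 1), [400.0]])  # clustered + one outlier, kappa = 400
print("spread spectrum:   ", cg_iters(spread), "iterations")
print("clustered spectrum:", cg_iters(clustered), "iterations")
```

Both spectra have the same condition number, yet the clustered one is handled in far fewer iterations: a low-degree residual polynomial can be small on a cluster plus one outlier, but not on a whole interval.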
General GMRES convergence

Theorem (Greenbaum and Strakoš)
For any right-hand side b, any x0, any set of n monotonically decreasing residual norms, and any set of n eigenvalues, there exists a matrix A with the prescribed eigenvalues such that GMRES applied to Ax = b with starting vector x0 produces the prescribed sequence of residual norms.

• Thus the condition number of the eigenvectors (i.e., the normality of the matrix) can play a huge role in how predictive the eigenvalues are of the convergence behavior of GMRES.
What did we talk about today?
• The method of conjugate gradients
• The problem with steepest descent
• The Lanczos iteration
• Convergence analysis
Further reading
• Chapter 6 in Y Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003.
• Available legally online at: http://www-users.cs.umn.edu/~saad/books.html
Next time: an actual return to parallelism!!!!
• Preconditioning to improve eigenvalue distribution
• Sparse matrix-vector products
• Parallelized iterative solvers