Computational Methods
for Physicists and Materials Engineers
4
Systems of Linear Equations II: Iterative Methods
System of linear equations

N linear equations:
A11 x1 + A12 x2 + ∙∙∙ + A1N xN = b1
A21 x1 + A22 x2 + ∙∙∙ + A2N xN = b2
⁞
AN1 x1 + AN2 x2 + ∙∙∙ + ANN xN = bN

xi are unknowns to be determined; Aij and bi are coefficients (knowns)

Matrix form:
⎡A11 A12 ∙∙∙ A1N⎤ ⎡x1⎤   ⎡b1⎤
⎢A21 A22 ∙∙∙ A2N⎥ ⎢x2⎥ = ⎢b2⎥
⎢ ⁞    ⁞       ⁞⎥ ⎢ ⁞⎥   ⎢ ⁞⎥
⎣AN1 AN2 ∙∙∙ ANN⎦ ⎣xN⎦   ⎣bN⎦
i.e.  Ax = b
Iterative method    Ax = b

With a direct method, we obtain the exact answer (if we don't consider rounding error) in the last step. The computational complexity of LU and QR is O(N³) if the coefficient matrix A is N × N.

For most problems of practical interest, A is a large-N, sparse matrix.
E.g. FEM: Ku = f for an aircraft flap actuator (gearbox)
Size of K: 153,746 × 153,746 = 23,637,832,516 elements, of which only 9,080,404 are nonzero
https://www.cise.ufl.edu/research/sparse/matrices/Rothberg/gearbox.html

Iterative method: guess the solution and iteratively modify the guess.
Idea of iterative method    Ax = b

Guess a solution x0 (probably it's bad)
Residual: b − Ax0
Correct the guess by the residual:
x1 := x0 + C(b − Ax0) ≡ f(x0)    Convergent? check the error ‖x1 − x0‖
x2 := x1 + C(b − Ax1) ≡ f(x1)
⁞
How to correct the guess s.t. the error converges to 0?
Does x* exist s.t. x* = f(x*)? (fixed point theorem)  How to choose f?
How to estimate the error? What is the "magnitude" of a vector? → Vector norm
Vector norm: "size" of a vector

A norm ‖x‖ has the properties:
• Positivity: ‖x‖ ≥ 0
• Definiteness: ‖x‖ = 0 if and only if x = 0
• Homogeneity: ‖αx‖ = |α| ‖x‖
• Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖
for all x, y ∈ X and all scalars α
[Figure: vectors x, y, and x + y forming a triangle]

When x ∈ ℝ, the norm is just the absolute value |x|:
|x + y|² = x² + y² + 2xy ≤ x² + y² + 2|x||y| = (|x| + |y|)²  ⇒  |x + y| ≤ |x| + |y|
For example, the lp norm:
‖x‖p = (Σ_{i=1}^{N} |xi|^p)^{1/p}

‖x‖1 = Σ_{i=1}^{N} |xi|
‖x‖2 = (Σ_{i=1}^{N} |xi|²)^{1/2} = (xᵀx)^{1/2}
‖x‖∞ = lim_{p→∞} (Σ_{i=1}^{N} |xi|^p)^{1/p} = max_{i=1,∙∙∙,N} |xi|

The l2 norm is called the Euclidean norm because it is just the length of a vector.
The l∞ norm is called the maximum norm.
Vector norm: "size" of a vector

Equivalence of the norms
Theorem: on a finite-dimensional vector space, all norms are equivalent. When one norm goes to 0, all the other norms go to 0.

E.g. in 3D:
‖x‖1 = Σ_{i=1}^{3} |xi| = |x1| + |x2| + |x3|
‖x‖2 = (Σ_{i=1}^{3} |xi|²)^{1/2} = (x1² + x2² + x3²)^{1/2}   (length of x)
‖x‖∞ = max_{i=1,2,3} |xi| = max{|x1|, |x2|, |x3|}
Only when x1, x2, x3 → 0 at the same time does ‖x‖1 → 0; and at the same time ‖x‖2 → 0 and ‖x‖∞ → 0.
Matrix norm: "size" of a matrix

The definition of a matrix norm is based on the vector norm:
‖A‖ = sup_{‖x‖=1} ‖Ax‖
"sup": supremum, the least upper bound; not necessarily a maximum. E.g. {x | x < 2} only has supremum 2.
‖Ax‖ = ‖A(x/‖x‖)‖ ‖x‖ ≤ (sup_{‖x′‖=1} ‖Ax′‖) ‖x‖ = ‖A‖ ‖x‖

We can prove that
‖A‖1 = max_{k=1,∙∙∙,N} Σ_{i=1}^{N} |Aik|
‖A‖∞ = max_{i=1,∙∙∙,N} Σ_{k=1}^{N} |Aik|
‖A‖2 ≤ ‖A‖F,  where the Frobenius norm of an M × N matrix is  ‖A‖F = (Σ_{i=1}^{M} Σ_{k=1}^{N} Aik²)^{1/2}

E.g. in 3D,
A = ⎡A11 A12 A13⎤
    ⎢A21 A22 A23⎥
    ⎣A31 A32 A33⎦
Recall ‖x‖1 = Σ_{i=1}^{N} |xi|. Then
‖A‖1 = max_{k=1,2,3} Σ_{i=1}^{3} |Aik| = max{|A11|+|A21|+|A31|, |A12|+|A22|+|A32|, |A13|+|A23|+|A33|} = max{‖c1‖1, ‖c2‖1, ‖c3‖1}   (c1, c2, c3: columns of A)
‖A‖∞ = max_{i=1,2,3} Σ_{k=1}^{3} |Aik| = max{‖r1‖1, ‖r2‖1, ‖r3‖1}   (r1, r2, r3: rows of A)
‖A‖2 ≤ ‖A‖F = (Σ_{i,k=1}^{3} Aik²)^{1/2} = (A11² + A12² + A13² + A21² + A22² + A23² + A31² + A32² + A33²)^{1/2}
NumPy: norms of vectors/matrices
numpy.linalg.norm()

import numpy as np

# norms of vectors
a = np.array([-2, 1, 0])
v_L1_norm = np.linalg.norm(a, 1)
v_L2_norm = np.linalg.norm(a, 2)         # default for vectors
v_Linf_norm = np.linalg.norm(a, np.inf)
print(v_L1_norm); print(v_L2_norm); print(v_Linf_norm)

# norms of matrices
A = np.array([[-1, 1, 1],
              [ 2, -2, 0]])
M_Lfro_norm = np.linalg.norm(A, 'fro')   # default for matrices
M_L1_norm = np.linalg.norm(A, 1)
M_L2_norm = np.linalg.norm(A, 2)
M_Linf_norm = np.linalg.norm(A, np.inf)
print(M_Lfro_norm); print(M_L1_norm); print(M_L2_norm); print(M_Linf_norm)
General idea of iterative method

Solve the equation f(x) = x:
Initial guess: x0
Update: x1 := f(x0)      ‖x1 − x0‖ is small enough? NO
Update: x2 := f(x1)      ‖x2 − x1‖ is small enough? NO
∙∙∙
Update: xn+1 := f(xn)    ‖xn+1 − xn‖ is small enough? YES
Solution is x* ≈ xn+1

Ax = b can be rearranged into the form x = f(x), e.g.
Ax = b  ⇒  (A + I)x − b = x,  i.e.  f(x) = (A + I)x − b
or  (A + I)⁻¹x + (A + I)⁻¹b = x,  i.e.  f(x) = (A + I)⁻¹x + (A + I)⁻¹b

If the iteration converges to x*, then f(x*) = x* and x* is the solution (a fixed point of f).
However, does {x0, x1, x2, ∙∙∙} converge? Or at least, does lim_{n→∞} ‖xn+1 − xn‖ = 0?
Example
f(x) = 0.1x + 10

Exact solution:
0.1x* + 10 = x*  ⇒  x* = 10/0.9 = 11.111∙∙∙

Iterative solution:
Initial guess: x0 := 0             = 0
Update: x1 := f(0)       = 10         |x1 − x0| = 10
Update: x2 := f(10)      = 11         |x2 − x1| = 1
Update: x3 := f(11)      = 11.1       |x3 − x2| = 0.1
Update: x4 := f(11.1)    = 11.11      |x4 − x3| = 0.01
Update: x5 := f(11.11)   = 11.111     |x5 − x4| = 0.001
If we think it is good enough, x* ≈ 11.111
Example
f(x) = 10x + 0.1

Exact solution:
10x* + 0.1 = x*  ⇒  x* = −0.1/9 = −0.0111∙∙∙

Iterative solution:
Initial guess: x0 := 0             = 0
Update: x1 := f(0)       = 0.1        |x1 − x0| = 0.1
Update: x2 := f(0.1)     = 1.1        |x2 − x1| = 1
Update: x3 := f(1.1)     = 11.1       |x3 − x2| = 10
Update: x4 := f(11.1)    = 111.1      |x4 − x3| = 100
Update: x5 := f(111.1)   = 1111.1     |x5 − x4| = 1000
The iteration diverges

General idea of iterative method
Whether {x0, x1, x2, ∙∙∙} converges or not depends on the form of the function f(x). Which kind of f(x) guarantees convergence?
From 1 equation / 1 variable to N equations / N variables:  f(x) = x  →  f(x) = x, i.e.
f1(x1, x2, ∙∙∙, xN) = x1
f2(x1, x2, ∙∙∙, xN) = x2
⁞
fN(x1, x2, ∙∙∙, xN) = xN

f(x) may be linear (this lecture), where each fi is a linear combination of x1, ∙∙∙, xN plus a constant, i.e., f(x) = Bx + z,
or nonlinear (next lecture), e.g.
0.5 cos x1 + 0.5 sin x2 = x1
0.5 sin x1 + 0.5 cos x2 = x2

Whether {x0, x1, x2, ∙∙∙} converges or not depends on the form of the function f(x). Which kind of f(x) guarantees convergence?
Convergent sequence

A sequence {xn} = {x0, x1, x2, ∙∙∙} is called convergent if there exists an element x* s.t.
lim_{n→∞} ‖xn − x*‖ = 0,  or  lim_{n→∞} xn = x*

E.g.
lim_{n→∞} 1/2ⁿ = 0
xn = 2^{1/n},  lim_{n→∞} xn = lim_{n→∞} 2^{1/n} = 1
lim_{n→∞} n sin(1/n) = 1

from sympy import *
n = symbols('n', integer=True, nonnegative=True)
pprint(limit(1/2**n, n, oo))
pprint(limit(2**(1/n), n, oo))
pprint(limit(n*sin(1/n), n, oo))
Contraction

An operator f is called a contraction if there exists a constant q ∈ [0, 1) s.t.
‖f(x) − f(y)‖ ≤ q ‖x − y‖    for all x, y ∈ D
We may think of f as a function.

1-variable example:
f(x) = 0.1x + 10:  |f(x) − f(y)| = 0.1 |x − y|,  so this f(x) is a contraction
f(x) = 10x + 10:   |f(x) − f(y)| = 10 |x − y|,   so this f(x) is NOT a contraction

When f(x) is a contraction, the iteration for f(x) = x will converge.
Banach fixed point theorem

Existence of a unique fixed point
Let f be a contraction. There exists a unique element x* s.t.
f(x*) = x*
This x* is called the fixed point of f, because f maps x* to itself.

1-variable example:
f(x) = 0.1x + 10:  the fixed point is x* = 100/9,  since  f(100/9) = 0.1 × 100/9 + 10 = 100/9
How to find the fixed point

Start from an arbitrary element x0. Define a sequence {xn} by the recursion
xn+1 := f(xn)
We can find the fixed point by
lim_{n→∞} xn = x*

1-variable example:
f(x) = 0.1x + 3   (a contraction)
Exact solution for the fixed point:  0.1x* + 3 = x*  ⇒  x* = 3/0.9

Iterative solution:  xn+1 := f(xn) = 0.1 xn + 3
xn = 0.1 xn−1 + 3
   = 0.1(0.1 xn−2 + 3) + 3 = 0.1² xn−2 + (0.1⁰ + 0.1¹)·3
   = 0.1²(0.1 xn−3 + 3) + (0.1⁰ + 0.1¹)·3 = 0.1³ xn−3 + (0.1⁰ + 0.1¹ + 0.1²)·3
   = ∙∙∙
   = 0.1ⁿ x0 + (0.1⁰ + 0.1¹ + ∙∙∙ + 0.1ⁿ⁻¹)·3 = 0.1ⁿ x0 + [(1 − 0.1ⁿ)/(1 − 0.1)]·3
Sum of a geometric series: a r⁰ + a r¹ + a r² + ∙∙∙ + a rⁿ⁻¹ = a (1 − rⁿ)/(1 − r)
lim_{n→∞} xn = 3/0.9,  which is exactly the fixed point x*
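As a quick numerical check of this recursion, here is a minimal Python sketch (not from the lecture; it assumes the same example f(x) = 0.1x + 3 and stops on the posteriori-style test |xn+1 − xn| ≤ eps):

# Minimal sketch: fixed-point iteration for the contraction f(x) = 0.1*x + 3 (q = 0.1)
# The exact fixed point is x* = 3/0.9 = 3.333...
def fixed_point_iteration(f, x0, eps=1e-10, max_n=1000):
    x = x0
    for n in range(1, max_n + 1):
        x_new = f(x)
        if abs(x_new - x) <= eps:   # posteriori-style stopping test
            return x_new, n
        x = x_new
    return x, max_n

x_star, n_iter = fixed_point_iteration(lambda x: 0.1*x + 3, x0=0.0)
print(x_star, n_iter)   # ~3.3333..., reached in about a dozen iterations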
Exact solution for fixed point:
10 x * 3 x* x* 3 9
xn 10xn1 3
xn1 : f(xn)10xn 3 3)3102 x (100 101)3
10(10x 102(10x
n2
3)(100 101)3103 x
n2 n3
n3
(100 101 102)3 110n
Banach fixed point theorem
1‐variable example:f (x) 10x 3 (not a contraction)
10n x0 (100 101 10n1)310n x0 110 3
limxn n
Banach fixed point theorem

1-variable example: f(x) = 10x + 3   (not a contraction)
We can rearrange the equation to construct a convergent recursion:
10x + 3 = x  ⇔  0.1x − 0.3 = x,  so take  f(x) = 0.1x − 0.3   (this is a contraction)

xn+1 := f(xn) = 0.1 xn − 0.3
xn = 0.1 xn−1 − 0.3 = 0.1(0.1 xn−2 − 0.3) − 0.3 = 0.1² xn−2 − (0.1⁰ + 0.1¹)·0.3
   = 0.1²(0.1 xn−3 − 0.3) − (0.1⁰ + 0.1¹)·0.3 = 0.1³ xn−3 − (0.1⁰ + 0.1¹ + 0.1²)·0.3
   = ∙∙∙
   = 0.1ⁿ x0 − (0.1⁰ + 0.1¹ + ∙∙∙ + 0.1ⁿ⁻¹)·0.3 = 0.1ⁿ x0 − [(1 − 0.1ⁿ)/(1 − 0.1)]·0.3
lim_{n→∞} xn = −0.3/0.9 = −3/9,  the fixed point of the original equation
Banach fixed point theorem: proof

Starting from an arbitrary x0 ∈ D, construct {xn} by the recursion
xn+1 := f(xn)   (n = 0, 1, ∙∙∙),   f is a contraction
‖xn+1 − xn‖ = ‖f(xn) − f(xn−1)‖ ≤ q ‖xn − xn−1‖ ≤ q² ‖xn−1 − xn−2‖ ≤ ∙∙∙ ≤ qⁿ ‖x1 − x0‖ → 0
⇒ lim_{n→∞} ‖xn+1 − xn‖ = 0

Is this sufficient to prove convergence? NO
E.g. xn = √n:
lim_{n→∞} |xn+1 − xn| = lim_{n→∞} (√(n+1) − √n) = 0
But, obviously, {xn} does not converge: xn → ∞ with n

Stronger convergence condition:
lim_{n,m→∞} ‖xm − xn‖ = 0
If {xn} satisfies this condition, it is called a Cauchy sequence.
For xn = √n, take m = α²n (α > 1):
lim_{n,m→∞} |xm − xn| = lim_{n→∞} (√(α²n) − √n) = lim_{n→∞} (α − 1)√n = ∞
So {xn} = {√n} is NOT a Cauchy sequence; it does not converge.

Now prove that {xn} generated by xn+1 := f(xn), with f a contraction, is a Cauchy sequence:
‖xn+1 − xn‖ ≤ qⁿ ‖x1 − x0‖
For m > n,
‖xm − xn‖ = ‖(xm − xm−1) + (xm−1 − xm−2) + ∙∙∙ + (xn+1 − xn)‖
          ≤ ‖xm − xm−1‖ + ‖xm−1 − xm−2‖ + ∙∙∙ + ‖xn+1 − xn‖      (triangle inequality)
          ≤ (q^{m−1} + q^{m−2} + ∙∙∙ + q^{n+1} + qⁿ) ‖x1 − x0‖ → 0   as n, m → ∞
So {xn} is a Cauchy sequence.

Is this sufficient to prove convergence? NO
E.g. xn = (1 + 1/n)ⁿ, with the xn restricted to rational numbers:
lim_{n→∞} xn = lim_{n→∞} (1 + 1/n)ⁿ = e   (irrational number)
But xn must be a rational number for any n. So, whichever n, xn can never reach the limit e within the rational numbers.

One additional condition:
xn ∈ U and lim_{n→∞} xn ∈ U
If in the domain U all Cauchy sequences satisfy this condition, the domain U is called complete; below, we only consider elements in a complete domain.

If f(x) is a contraction, we can generate the sequence by the recursion xn+1 := f(xn). For this sequence, there exists x* s.t. lim_{n→∞} xn = x*. Then
‖f(xn) − f(x*)‖ ≤ q lim_{n→∞} ‖xn − x*‖ = 0,   or   lim_{n→∞} f(xn) = f(x*)
x* = lim_{n→∞} xn+1 = lim_{n→∞} f(xn) = f(x*)
So x* is the fixed point.
Priori (earlier) and posteriori (later) errors

Let f be a contraction with constant q < 1. The sequence constructed by
xn+1 := f(xn)
converges to the unique fixed point x* for an arbitrary initial guess x0.

Priori error: ‖x1 − x0‖ bounds the exact error at step (n) via
‖xn − x*‖ ≤ [qⁿ/(1 − q)] ‖x1 − x0‖

Posteriori error: ‖xn − xn−1‖ bounds the exact error at step (n) via
‖xn − x*‖ ≤ [q/(1 − q)] ‖xn − xn−1‖

Proof is simple:
‖xn − x*‖ = lim_{m→∞} ‖xn − xm‖ ≤ lim_{m→∞} (qⁿ + qⁿ⁺¹ + ∙∙∙ + q^{m−2} + q^{m−1}) ‖x1 − x0‖ = [qⁿ/(1 − q)] ‖x1 − x0‖
For the posteriori estimate, pick xn−1 as the "x0":  ‖xn − x*‖ ≤ [q/(1 − q)] ‖xn − xn−1‖

Minimum n for convergence
If the desired accuracy is ε s.t. ‖xn − x*‖ ≤ ε, how many iterative steps are needed (i.e., n = ?)
[qⁿ/(1 − q)] ‖x1 − x0‖ ≤ ε   ⇒   n ≥ [ln ε + ln(1 − q) − ln ‖x1 − x0‖] / ln q
[Figure: minimum n vs. q; n grows rapidly as q → 1]
So small q leads to fast convergence.

The posteriori error is used to check the accuracy in the process of computation and to terminate the iterations when the desired accuracy is reached.
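For a concrete feel of how the required number of steps depends on q, here is a small sketch (it assumes the 1D example f(x) = 0.1x + 3 started from x0 = 0, so q = 0.1 and |x1 − x0| = 3):

import math

# Minimal sketch of the priori bound: smallest n with q**n / (1 - q) * |x1 - x0| <= eps
q, dx10, eps = 0.1, 3.0, 1e-10
n_min = math.ceil(math.log(eps * (1 - q) / dx10) / math.log(q))
print(n_min)   # about 11 steps here; the required n grows rapidly as q -> 1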
Construct an operator:
f(x) = Bx + z
‖f(x) − f(y)‖ = ‖B(x − y)‖ ≤ ‖B‖ ‖x − y‖
‖B‖ plays the role of the constant q: f is a contraction when ‖B‖ < 1. Hence we can construct a sequence by the recursion
xn+1 := f(xn) = Bxn + z
With an arbitrary initial guess x0, it converges to the unique fixed point x* s.t.
x* = f(x*) = Bx* + z,   or   (I − B)x* = z

Priori error estimate:      ‖xn − x*‖ ≤ [‖B‖ⁿ/(1 − ‖B‖)] ‖x1 − x0‖
Posteriori error estimate:  ‖xn − x*‖ ≤ [‖B‖/(1 − ‖B‖)] ‖xn − xn−1‖
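Before specializing B and z to the Jacobi and Gauss-Seidel splittings below, the recursion can be sketched generically. The helper below (`linear_fixed_point` is an illustrative name, not a library routine) warns if ‖B‖∞ ≥ 1 and then iterates xn+1 := Bxn + z with a posteriori stopping test:

import numpy as np

def linear_fixed_point(B, z, x0=None, eps=1e-10, max_n=100000):
    # Sketch of x_{n+1} := B x_n + z; guaranteed to converge when some norm of B is < 1
    if np.linalg.norm(B, np.inf) >= 1:
        print("warning: ||B||_inf >= 1; convergence is not guaranteed by this norm")
    x = np.zeros(B.shape[0]) if x0 is None else x0.copy()
    for n in range(1, max_n + 1):
        x_new = B @ x + z
        if np.linalg.norm(x_new - x) <= eps:   # posteriori error check
            return x_new, n
        x = x_new
    return x, max_n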
Jacobi iteration    Ax = b,   A = D + AL + AU

Split A into its diagonal, strictly lower, and strictly upper triangular parts:
D  = diag(A11, A22, ∙∙∙, AN−1,N−1, ANN)
AL = strictly lower triangular part of A (elements Aik with i > k)
AU = strictly upper triangular part of A (elements Aik with i < k)

Assume that all diagonal elements are nonzero. The inverse of a diagonal matrix is straightforward:
D⁻¹ = diag(1/A11, 1/A22, ∙∙∙, 1/AN−1,N−1, 1/ANN)

(D + AL + AU)x = b   ⇒   Dx = −(AL + AU)x + b   ⇒   x = −D⁻¹(AL + AU)x + D⁻¹b ≡ Bx + z

Recall the Banach fixed point theorem: when ‖B‖ < 1, we can construct the sequence by the recursion xn+1 := Bxn + z; with an arbitrary initial guess x0, it converges to the unique fixed point x* = Bx* + z.

Jacobi iteration:
xn+1 := −D⁻¹(AL + AU)xn + D⁻¹b
or equivalently, in terms of the residual,
xn+1 := xn + D⁻¹(b − Axn)
Component form:
x_{n+1,i} = −Σ_{k≠i} (Aik/Aii) x_{n,k} + bi/Aii     (i = 1, ∙∙∙, N)
Jacobi iteration algorithm   (method of simultaneous displacements)

x_{n+1,i} = −Σ_{k≠i} (Aik/Aii) x_{n,k} + bi/Aii     (i = 1, ∙∙∙, N)

eps = 1e-10   # convergence criterion for the posteriori error
x = zeros(N)  # initial guess
for n = 1, ∙∙∙, max_n
    xprev = x                       # store x obtained from the previous iterative step
    for i = 1, ∙∙∙, N
        sum = 0
        for k = 1, ∙∙∙, N
            if k == i
                continue
            sum = sum - A[i][k] * xprev[k]
        x[i] = (sum + b[i]) / A[i][i]
    if norm(x - xprev) <= eps
        break

Jacobi iteration code

import numpy as np

def jacobi(A, b, x=None, eps=1e-10, max_n=1000000):
    N = A.shape[0]
    if x is None: x = np.zeros(N)
    for count in range(1, max_n+1):
        xprev = x.copy()
        for i in range(N):
            sum = 0
            for k in range(N):
                if k == i: continue
                sum -= A[i,k] * xprev[k]
            x[i] = (sum + b[i]) / A[i,i]
        if np.linalg.norm(x - xprev) < eps: break
    print(f"Number of iterations is {count}")
    return x

This can be simplified by element-wise operations:

def jacobi(A, b, x=None, eps=1e-10, max_n=1000000):
    N = A.shape[0]
    if x is None: x = np.zeros(N)
    D = np.diag(A)       # np.diag(x): if x is 2D, returns a 1D array of the diagonal elements;
    R = A - np.diag(D)   #             if x is 1D, returns a 2D array with x as the diagonal elements
    for count in range(1, max_n+1):
        xprev = x.copy()
        x = (-np.matmul(R, xprev) + b) / D
        if np.linalg.norm(x - xprev) < eps: break
    print(f"Number of iterations is {count}")
    return x
Jacobi iteration: convergence condition

x = −D⁻¹(AL + AU)x + D⁻¹b ≡ Bx + z,    B = −D⁻¹(AL + AU)

Recap:
‖A‖1 = max_{k=1,∙∙∙,N} Σ_{i=1}^{N} |Aik|,   ‖A‖∞ = max_{i=1,∙∙∙,N} Σ_{k=1}^{N} |Aik|,   ‖A‖2 ≤ ‖A‖F = (Σ_{i,k=1}^{N} Aik²)^{1/2}

Convergence is guaranteed when ‖B‖ = ‖D⁻¹(AL + AU)‖ < 1 in some norm, e.g.
‖D⁻¹(AL + AU)‖∞ = max_{i=1,∙∙∙,N} Σ_{k≠i} |Aik/Aii| < 1
or
‖D⁻¹(AL + AU)‖1 = max_{k=1,∙∙∙,N} Σ_{i≠k} |Aik/Aii| < 1
or
‖D⁻¹(AL + AU)‖2 ≤ (Σ_{i≠k} |Aik/Aii|²)^{1/2} < 1

General trend: if the diagonal elements {|Aii|} are large, the convergence condition is satisfied.
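As a quick numerical test of these conditions (a sketch, not part of the lecture's code): form B = −D⁻¹(AL + AU) and evaluate a few matrix norms; if any of them is below 1, Jacobi is guaranteed to converge.

import numpy as np

def jacobi_iteration_matrix_norms(A):
    # Sketch: norms of B = -D^{-1}(A_L + A_U); Jacobi surely converges if any norm is < 1
    D = np.diag(A)                       # diagonal elements (assumed nonzero)
    B = -(A - np.diag(D)) / D[:, None]   # divide each row i by A_ii
    return {p: np.linalg.norm(B, p) for p in (1, 2, np.inf)}

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [0.0, 1.0, 3.0]])
print(jacobi_iteration_matrix_norms(A))   # all < 1 here: A has large diagonal elements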
Gauss-Seidel iteration    Ax = b,   A = D + AL + AU

(D + AL + AU)x = b   ⇒   (D + AL)x = −AUx + b
x = −(D + AL)⁻¹AUx + (D + AL)⁻¹b ≡ Bx + z

Recall the Banach fixed point theorem: when ‖B‖ < 1, we can construct the sequence by the recursion xn+1 := Bxn + z; with an arbitrary initial guess x0, it converges to the unique fixed point x* = Bx* + z.

Gauss-Seidel iteration:
xn+1 := −(D + AL)⁻¹AUxn + (D + AL)⁻¹b

In practice, at step (n), we solve the system of linear equations
(D + AL)xn+1 = −AUxn + b
Since D + AL is a lower triangular matrix, it can be solved simply by forward substitution (recall LU decomposition). With y = −AUxn + b, i.e., yi = −Σ_{k=i+1}^{N} Aik x_{n,k} + bi:
A11 x_{n+1,1} = y1                                    ⇒  x_{n+1,1} = y1/A11
A21 x_{n+1,1} + A22 x_{n+1,2} = y2                    ⇒  x_{n+1,2} = (y2 − A21 x_{n+1,1})/A22
⁞
AN1 x_{n+1,1} + AN2 x_{n+1,2} + ∙∙∙ + ANN x_{n+1,N} = yN  ⇒  x_{n+1,N} = (yN − Σ_{k=1}^{N−1} ANk x_{n+1,k})/ANN
In general,
x_{n+1,i} = (yi − Σ_{k=1}^{i−1} Aik x_{n+1,k})/Aii
          = −Σ_{k=1}^{i−1} (Aik/Aii) x_{n+1,k} − Σ_{k=i+1}^{N} (Aik/Aii) x_{n,k} + bi/Aii     (i = 1, ∙∙∙, N)
The first sum uses components already updated at step (n+1); the second sum uses components not updated yet (still at step (n)). Step (n) and step (n+1) data can be stored in the same array: method of successive displacements.
Gauss-Seidel iteration algorithm   (method of successive displacements)

x_{n+1,i} = −Σ_{k=1}^{i−1} (Aik/Aii) x_{n+1,k} − Σ_{k=i+1}^{N} (Aik/Aii) x_{n,k} + bi/Aii     (i = 1, ∙∙∙, N)

eps = 1e-10   # convergence criterion for the posteriori error
x = zeros(N)  # initial guess
for n = 1, ∙∙∙, max_n
    xprev = x
    for i = 1, ∙∙∙, N
        sum = 0
        for k = 1, ∙∙∙, N
            if k == i
                continue
            sum = sum - A[i][k] * x[k]    # x[k] is already updated for k < i, not yet for k > i
        x[i] = (sum + b[i]) / A[i][i]
    if norm(x - xprev) <= eps
        break

Gauss-Seidel iteration code

def gauss_seidel(A, b, x=None, eps=1e-10, max_n=1000000):
    N = A.shape[0]
    if x is None: x = np.zeros(N)
    for count in range(1, max_n+1):
        xprev = x.copy()
        for i in range(N):
            sum = 0
            for k in range(N):
                if k == i: continue
                sum -= A[i,k] * x[k]      # uses the already-updated x[k] for k < i
            x[i] = (sum + b[i]) / A[i,i]
        if np.linalg.norm(x - xprev) < eps: break
    print(f"Number of iterations is {count}")
    return x

Unlike Jacobi, the update over i cannot be written as element-wise operations (each x[i] depends on the freshly updated x[k] with k < i), but the inner sum over k can be:

def gauss_seidel(A, b, x=None, eps=1e-10, max_n=1000000):
    N = A.shape[0]
    if x is None: x = np.zeros(N)
    D = np.diag(A)
    R = A - np.diag(D)
    for count in range(1, max_n+1):
        xprev = x.copy()
        for i in range(N):
            x[i] = (-np.inner(R[i,:], x) + b[i]) / D[i]
        if np.linalg.norm(x - xprev) < eps: break
    print(f"Number of iterations is {count}")
    return x
Relaxation method

Small q or ‖B‖ → fast convergence.

Start from the Jacobi method:
xn+1 = xn + D⁻¹(b − Axn)      (correction = D⁻¹ × residual)

Jacobi method with relaxation:
xn+1 = xn + ωD⁻¹(b − Axn)
xn+1 = (I − ωD⁻¹A)xn + ωD⁻¹b ≡ Bxn + z
The weight ω > 0 is called the relaxation parameter.

Problem: which value of ω achieves the largest convergence rate?
‖B‖ reaches its minimum, i.e., the largest convergence rate, when
ω = 2/(2 − λmax − λmin)
where λmax and λmin are the maximum and minimum eigenvalues of the matrix −D⁻¹(AL + AU), i.e., the B of the original Jacobi method with ω = 1.
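A small sketch of this formula (it assumes a test matrix for which the eigenvalues of the Jacobi B are real, which holds here because D is a multiple of the identity and the off-diagonal part is symmetric; `omega_opt` is just an illustrative name):

import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
D = np.diag(A)
B = -(A - np.diag(D)) / D[:, None]     # Jacobi iteration matrix -D^{-1}(A_L + A_U)
lam = np.linalg.eigvals(B).real
omega_opt = 2.0 / (2.0 - lam.max() - lam.min())
print(omega_opt)   # 1.0 for this example: its spectrum is symmetric about 0, so plain Jacobi is already optimal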
Relaxation method (SOR method)

Start from the Gauss-Seidel method:
xn+1 = −(D + AL)⁻¹AUxn + (D + AL)⁻¹b
which can be rewritten as
xn+1 = xn + D⁻¹[b − ALxn+1 − (D + AU)xn]

Gauss-Seidel method with relaxation, or successive over-relaxation (SOR) method:
xn+1 = xn + ωD⁻¹[b − ALxn+1 − (D + AU)xn]
(D + ωAL)xn+1 = [(1 − ω)D − ωAU]xn + ωb

Problem: which value of ω guarantees convergence of SOR?
Kahan theorem: a necessary condition for convergence of SOR is 0 < ω < 2.
Ostrowski theorem: if A is Hermitian (A = A†) and positive definite (v·Av > 0), SOR converges for all 0 < ω < 2.
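Since NumPy/SciPy do not provide SOR, here is a minimal component-wise sketch of (D + ωAL)xn+1 = [(1 − ω)D − ωAU]xn + ωb (an illustrative `sor` helper; with ω = 1 it reduces to Gauss-Seidel):

import numpy as np

def sor(A, b, omega=1.0, x=None, eps=1e-10, max_n=100000):
    # Sketch of successive over-relaxation; omega = 1 reduces to Gauss-Seidel
    N = A.shape[0]
    x = np.zeros(N) if x is None else x.copy()
    for count in range(1, max_n + 1):
        xprev = x.copy()
        for i in range(N):
            # Gauss-Seidel value for component i (uses the already-updated x[k] for k < i)
            gs = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
            x[i] = (1 - omega) * xprev[i] + omega * gs   # relaxed update
        if np.linalg.norm(x - xprev) < eps:
            break
    return x, count

# usage sketch on a symmetric positive-definite matrix (Ostrowski: converges for 0 < omega < 2)
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x, n = sor(A, b, omega=1.1)
print(x, n, np.allclose(A @ x, b))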
Stationary iterative methods vs. Krylov subspace methods

Stationary iterative methods have the form  M xn+1 = N xn + b:
Jacobi iteration:        M = D,           N = −(AL + AU)
Gauss-Seidel iteration:  M = D + AL,      N = −AU
SOR method:              M = ω⁻¹D + AL,   N = (ω⁻¹ − 1)D − AU

Jacobi iteration, Gauss-Seidel iteration, and the SOR method have largely lost their interest with the advent of Krylov subspace methods. So they are not implemented in NumPy or SciPy.
Krylov subspace methods
The Krylov subspace method is the modern iterative method, one of the top 10 algorithms of the 20th century:
Generalized minimal residual method (GMRES)
Conjugate gradient method (CG)
Krylov subspace method

A 3D space is spanned by 3 orthonormal basis vectors e1, e2, e3:
e1 = (e1,1, e1,2, e1,3)ᵀ,  e2 = (e2,1, e2,2, e2,3)ᵀ,  e3 = (e3,1, e3,2, e3,3)ᵀ,    eiᵀek = δik = 1 if i = k, 0 if i ≠ k
Any vector in the 3D space is a linear combination of the basis vectors:
x = x1e1 + x2e2 + x3e3
3D space = span{e1, e2, e3}

2D space = span{e1, e2}: the 2D space is a subspace of the 3D space (2D space ⊂ 3D space)
x = x1e1 + x2e2 is a vector in the subspace

span{e1, e2, e3, e4} is impossible when each ei contains 3 components, because e4 can be represented by e4 = ae1 + be2 + ce3.
The number of basis vectors ≤ the dimension of the basis vectors.
Krylov subspace method

The m-th Krylov subspace:
Km(A, v) = span{v, Av, A²v, ∙∙∙, A^{m−1}v}
Basis vectors are v, Av, A²v, ∙∙∙, A^{m−1}v.
If v is N×1 and A is N×N, then m ≤ N (number of basis vectors ≤ dimension).

K1(A, v) = span{v}
K2(A, v) = span{v, Av}
K3(A, v) = span{v, Av, A²v}
∙∙∙
K1 ⊆ K2 ⊆ ∙∙∙ ⊆ KM ⊆ KM+1 ⊆ ∙∙∙ ⊆ KN
It is possible that all Km with m ≥ M are the same:
KM(A, v) = KM+1(A, v)
Then A KM(A, v) = span{Av, A²v, ∙∙∙, A^{M}v} ⊆ KM+1(A, v) = KM(A, v), so KM is invariant w.r.t. A.
Krylov subspace method    Ax = b

KM(A, x) = span{x, Ax, A²x, ∙∙∙, A^{M−1}x}
The solution x is a vector in KM(A, x), i.e., x is some linear combination of its basis vectors.
KM is invariant: A KM(A, x) = KM(A, x), so
KM(A, x) = A KM(A, x) = span{Ax, A²x, A³x, ∙∙∙, A^{M}x} = span{b, Ab, A²b, ∙∙∙, A^{M−1}b} = KM(A, b)
So x ∈ KM(A, b), i.e., x is some linear combination of its basis vectors:
x = c1b + c2Ab + c3A²b + ∙∙∙ + cM A^{M−1}b

We can construct the Krylov subspace
KM(A, b) = span{b, Ab, A²b, ∙∙∙, A^{M−1}b}
and then construct the solution
x = c1b + c2Ab + c3A²b + ∙∙∙ + cM A^{M−1}b

Two problems:
(i) How to construct the Krylov subspace?
(ii) How to construct the solution?
Krylov subspace method: iterative improvement    Ax = b

Step (1): construct K1(A, b) = span{b}; construct x = c1b. Good enough? NO
Step (2): construct K2(A, b) = span{b, Ab}; construct x = c1b + c2Ab. Good enough? NO
Step (3): construct K3(A, b) = span{b, Ab, A²b}; construct x = c1b + c2Ab + c3A²b. Good enough? NO
∙∙∙
Krylov subspace method: Generalized minimal residual method (GMRES)    Ax = b

(i) Construction of the Krylov subspace
The vectors {b, Ab, A²b, ∙∙∙, A^{m−1}b} are in general NOT orthogonal.
Gram-Schmidt process: given a set of linearly independent basis vectors {b, Ab, A²b, ∙∙∙} (m ≤ N), generate an orthonormal set of basis vectors {q1, q2, q3, ∙∙∙}.

(ii) Construction of the solution
The solution is constructed from the basis vectors of the m-th Krylov subspace:
xm = c1q1 + c2q2 + ∙∙∙ + cmqm
Find {c1, ∙∙∙, cm} s.t. ‖b − Axm‖2 is minimized.
Gram-Schmidt process

Given {v1, v2, v3}: they are linearly independent, so any vector in the 3D space is x = av1 + bv2 + cv3.
Find an orthonormal basis set {q1, q2, q3}:
u1 = v1,                                                        q1 = u1/‖u1‖2
u2 = v2 − (v2ᵀq1)q1 = v2 − q1(q1ᵀv2),                            q2 = u2/‖u2‖2     (subtract the projection of v2 onto q1)
u3 = v3 − (v3ᵀq1)q1 − (v3ᵀq2)q2 = v3 − q1(q1ᵀv3) − q2(q2ᵀv3),     q3 = u3/‖u3‖2     (subtract the projections of v3 onto q1 and q2)
Any vector in the 3D space: x = av1 + bv2 + cv3, or equivalently, x = a′q1 + b′q2 + c′q3.

In general, given {v1, v2, v3, ∙∙∙}, find an orthonormal basis set {q1, q2, q3, ∙∙∙}:
u1 = v1,                                           q1 = u1/‖u1‖2
u2 = v2 − q1(q1ᵀv2),                               q2 = u2/‖u2‖2
u3 = v3 − q1(q1ᵀv3) − q2(q2ᵀv3),                   q3 = u3/‖u3‖2
∙∙∙
u_{k+1} = v_{k+1} − Σ_{i=1}^{k} qi(qiᵀv_{k+1}),    q_{k+1} = u_{k+1}/‖u_{k+1}‖2
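A direct NumPy transcription of this recursion, as a sketch (it assumes the columns of V are linearly independent; `gram_schmidt` is an illustrative name, not a library routine):

import numpy as np

def gram_schmidt(V):
    # Sketch: orthonormalize the columns of V by the recursion above
    N, m = V.shape
    Q = np.zeros((N, m))
    for k in range(m):
        u = V[:, k].copy()
        for i in range(k):
            u -= Q[:, i] * (Q[:, i] @ V[:, k])   # subtract the projection of v_{k+1} onto q_i
        Q[:, k] = u / np.linalg.norm(u)
    return Q

V = np.random.rand(5, 3)
Q = gram_schmidt(V)
print(np.round(Q.T @ Q, 10))   # close to the 3x3 identity matrix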
Modified Gram-Schmidt process (Arnoldi)

Given {b, Ab, A²b, ∙∙∙}, find an orthonormal basis set {q1, q2, q3, ∙∙∙}. Instead of orthogonalizing A^k b directly, at each step orthogonalize A times the latest basis vector:
v1 = b,      u1 = v1,                                   q1 = u1/‖u1‖2
v2 = Aq1,    u2 = v2 − q1(q1ᵀv2),                       q2 = u2/‖u2‖2
v3 = Aq2,    u3 = v3 − q1(q1ᵀv3) − q2(q2ᵀv3),           q3 = u3/‖u3‖2
∙∙∙
v_{k+1} = Aqk,   u_{k+1} = v_{k+1} − Σ_{i=1}^{k} qi(qiᵀv_{k+1}),   q_{k+1} = u_{k+1}/‖u_{k+1}‖2

Written out:
q1 = b/‖b‖2
q2 = [Aq1 − q1(q1ᵀAq1)] / ‖Aq1 − q1(q1ᵀAq1)‖2
q3 = [Aq2 − q1(q1ᵀAq2) − q2(q2ᵀAq2)] / ‖Aq2 − q1(q1ᵀAq2) − q2(q2ᵀAq2)‖2
∙∙∙
q_{k+1} = [Aqk − Σ_{i=1}^{k} qi(qiᵀAqk)] / ‖Aqk − Σ_{i=1}^{k} qi(qiᵀAqk)‖2

Define
Hik = qiᵀAqk   (i ≤ k),       H_{k+1,k} = ‖Aqk − Σ_{i=1}^{k} qi(qiᵀAqk)‖2
Then the recursion can be rewritten as
Aq1 = q1H11 + q2H21
Aq2 = q1H12 + q2H22 + q3H32
∙∙∙
Aqk = Σ_{i=1}^{k+1} qiHik,    in particular    Aqm = Σ_{i=1}^{m+1} qiHim

In matrix form, with Qm = [q1 | q2 | ∙∙∙ | qm] (N × m) and Qm+1 = [q1 | ∙∙∙ | qm+1] (N × (m+1)):
A Qm = Qm+1 Hm
where Hm is the (m+1) × m Hessenberg matrix (upper triangular plus one subdiagonal):
Hm = ⎡H11 H12 ∙∙∙ H1,m−1   H1m    ⎤
     ⎢H21 H22 ∙∙∙ H2,m−1   H2m    ⎥
     ⎢ 0  H32 ∙∙∙ H3,m−1   H3m    ⎥
     ⎢ ⁞           ⁞        ⁞     ⎥
     ⎢ 0   0  ∙∙∙ Hm,m−1   Hmm    ⎥
     ⎣ 0   0  ∙∙∙   0      Hm+1,m ⎦

How to obtain the Hessenberg matrix Hm? It is built up together with the qk:
q1 = b/‖b‖2;   H11 = q1ᵀAq1
q2 := [Aq1 − q1(q1ᵀAq1)]/H21,   where H21 = ‖Aq1 − q1(q1ᵀAq1)‖2
H12 = q1ᵀAq2,   H22 = q2ᵀAq2
q3 := [Aq2 − q1(q1ᵀAq2) − q2(q2ᵀAq2)]/H32,   where H32 = ‖Aq2 − q1(q1ᵀAq2) − q2(q2ᵀAq2)‖2
H13 = q1ᵀAq3,   H23 = q2ᵀAq3,   H33 = q3ᵀAq3
q4 := [Aq3 − q1(q1ᵀAq3) − q2(q2ᵀAq3) − q3(q3ᵀAq3)]/H43
∙∙∙
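A minimal sketch of this construction (an illustrative `arnoldi` helper, not a library call): it builds Qm+1 and the (m+1) × m Hessenberg matrix Hm, and the relation A Qm = Qm+1 Hm can then be checked numerically.

import numpy as np

def arnoldi(A, b, m):
    # Sketch of the Arnoldi process: returns Q (N x (m+1)) and H ((m+1) x m)
    N = len(b)
    Q = np.zeros((N, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(m):
        v = A @ Q[:, k]
        for i in range(k + 1):
            H[i, k] = Q[:, i] @ v          # H_ik = q_i^T A q_k
            v -= H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(v)
        Q[:, k + 1] = v / H[k + 1, k]      # assumes no breakdown (H_{k+1,k} != 0)
    return Q, H

N, m = 6, 4
A = np.random.rand(N, N)
b = np.random.rand(N)
Q, H = arnoldi(A, b, m)
print(np.allclose(A @ Q[:, :m], Q @ H))    # A Q_m = Q_{m+1} H_m up to round-off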
GMRES: generalized minimal residual method

Construct the solution from the orthonormal Krylov basis:
xm = c1q1 + c2q2 + ∙∙∙ + cmqm = Qm cm ∈ Km,    Qm = [q1 | ∙∙∙ | qm],    cm = (c1, ∙∙∙, cm)ᵀ
Minimize the residual:
min_{cm} ‖b − Axm‖2 = min_{cm} ‖r‖2
Since b = ‖b‖2 q1 = ‖b‖2 Qm+1 e1  and  Axm = A Qm cm = Qm+1 Hm cm,
‖b − Axm‖2 = ‖ ‖b‖2 Qm+1 e1 − Qm+1 Hm cm ‖2
           = [(‖b‖2 e1 − Hm cm)ᵀ Qm+1ᵀ Qm+1 (‖b‖2 e1 − Hm cm)]^{1/2}
           = [(‖b‖2 e1 − Hm cm)ᵀ (‖b‖2 e1 − Hm cm)]^{1/2}        (Qm+1ᵀQm+1 = I)
           = ‖ ‖b‖2 e1 − Hm cm ‖2

Find the optimized solution:
min_{cm} ‖ ‖b‖2 e1 − Hm cm ‖2
Recall Page 84 of Lecture 3: this is a linear regression (least-squares) problem (with A = QR, the solution of Rx = Qᵀb minimizes ‖b − Ax‖2).
The shape of Hm is (m + 1) × m. It can be solved by QR decomposition:
Hm = Ωm Rm,    Ωm: orthogonal matrix,    Rm: upper (right) triangular matrix
Rm cm = ‖b‖2 Ωmᵀ e1
Since Rm is an upper (right) triangular matrix, cm can be solved by backward substitution.

Krylov subspace method: Generalized minimal residual method (GMRES)    Ax = b
q1 = b/‖b‖2                        →  H1  →  H1 = Ω1R1  →  c1  →  x1 = Q1c1.   Is ‖b − Ax1‖2 < ε? No →
q2 = [Aq1 − q1H11]/H21             →  H2  →  H2 = Ω2R2  →  c2  →  x2 = Q2c2.   Is ‖b − Ax2‖2 < ε? No →
q3 = [Aq2 − q1H12 − q2H22]/H32     →  H3  →  H3 = Ω3R3  →  c3  →  x3 = Q3c3.   Is ‖b − Ax3‖2 < ε? Yes → x* = x3
q4 = ∙∙∙
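Putting the pieces together, here is a compact, dense-matrix sketch of GMRES (for illustration only; it grows the Arnoldi basis one vector at a time and solves the small least-squares problem with np.linalg.lstsq instead of the incremental QR update described above):

import numpy as np

def gmres_sketch(A, b, eps=1e-10, max_m=None):
    # Sketch of GMRES: enlarge the Krylov subspace until ||b - A x_m||_2 < eps
    N = len(b)
    max_m = N if max_m is None else max_m
    beta = np.linalg.norm(b)
    Q = np.zeros((N, max_m + 1)); H = np.zeros((max_m + 1, max_m))
    Q[:, 0] = b / beta
    for m in range(1, max_m + 1):
        v = A @ Q[:, m - 1]                      # Arnoldi step: extend basis and Hessenberg matrix
        for i in range(m):
            H[i, m - 1] = Q[:, i] @ v
            v -= H[i, m - 1] * Q[:, i]
        H[m, m - 1] = np.linalg.norm(v)
        if H[m, m - 1] > 1e-14:                  # guard against (happy) breakdown
            Q[:, m] = v / H[m, m - 1]
        e1 = np.zeros(m + 1); e1[0] = beta
        c, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)   # min || beta*e1 - H_m c ||_2
        x = Q[:, :m] @ c
        if np.linalg.norm(b - A @ x) < eps:
            return x, m
    return x, max_m

A = np.random.rand(8, 8) + 8*np.eye(8)
b = np.random.rand(8)
x, m = gmres_sketch(A, b)
print(np.allclose(A @ x, b), m)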
SciPy: Generate a random sparse matrix
import numpy as np
import matplotlib.pyplot as plt
import scipy.sparse
M = 5; N = 10 # dimensions of matrix
A = scipy.sparse.random(M, N, 0.5)
# M*N sparse matrix with density 0.5: randomly pick 50% of the elements to be 0;
# the other 50% of the elements are random numbers in [0, 1)
print(A)            # note how the sparse matrix is stored
print(A.toarray())  # convert A to the format of a NumPy array
fig, axs = plt.subplots()
axs.imshow(A.toarray(), cmap='rainbow')
axs.set_xlabel('col index')
axs.set_ylabel('row index')
plt.tight_layout()
plt.show()
SciPy: Solve system of linear equations by GMRES
import numpy as np
import matplotlib.pyplot as plt
import scipy.sparse
import scipy.sparse.linalg
N = 100
A = scipy.sparse.random(N, N, 0.5) + 4*scipy.sparse.eye(N)
b = np.random.rand(N, 1)
fig, axs = plt.subplots()
axs.imshow(A.toarray(), cmap='rainbow')
plt.show()
guess = np.zeros(shape=b.shape)
x, exitCode = scipy.sparse.linalg.gmres(A, b, x0=guess, tol=1e-10, maxiter=10000)
# x0 : the initial guess of the solution
# tol : tolerance |b - Ax|/|b|
# maxiter : maximum steps allowed
print(x)
print(exitCode)
# =0: successful exit; >0: number of iterations but tolerance has not been achieved; <0: illegal input or breakdown
Krylov subspace method: Conjugate gradient method (CG)    Ax = b

Only consider the case where A is symmetric and positive definite.

If the initial guess is x0, then the residual is r0 = b − Ax0, and
A(x − x0) = r0        (an equation of the form A′x′ = b′)
Recall that the solution of A′x′ = b′ lies in KM(A′, b′) = span{b′, A′b′, A′²b′, ∙∙∙, A′^{M−1}b′}. Applied to A(x − x0) = r0:
x − x0 ∈ KM(A, r0) = span{r0, Ar0, A²r0, ∙∙∙, A^{M−1}r0}
So the exact solution x − x0 is a linear combination of r0, Ar0, A²r0, ∙∙∙, A^{M−1}r0.
Conjugate gradient method (CG): iterative improvement

The exact solution satisfies
x − x0 ∈ span{r0, Ar0, A²r0, ∙∙∙, A^{M−1}r0}
We want to approach this exact solution by iteration:
xn+1 = xn + αn rn
How to determine rn? How to determine αn?
Conjugate gradient method (CG): conditions for the choice of residuals {ri}

xn − x0 ∈ span{r0, Ar0, A²r0, ∙∙∙, A^{n−1}r0} = Kn(A, r0)
rn = b − Axn = r0 − A(xn − x0) ∈ span{r0, Ar0, A²r0, ∙∙∙, A^{n}r0} = Kn+1(A, r0)

The 1st condition:
Since xi − x0 ∈ Ki and xi+1 − x0 ∈ Ki+1,
αi ri = xi+1 − xi = (xi+1 − x0) − (xi − x0) ∈ Ki+1

Idea of CG: the most efficient search has rk ⊥ all basis vectors of Kk:
rk ⊥ Kk,  i.e.,  rk ∈ (Kk)⊥
rk is outside the Kk space. E.g. if Kk is a plane, rk must be a vector pointing out of the plane (the additional basis vector A^k r0 leads to the out-of-plane component).
[Figure: r0 and Ar0 span the plane K2; r1 points out of K2; r2 points out of K3]

The 2nd condition:
Let k ≥ i + 1. Then Ki+1 ⊆ Kk, so (Kk)⊥ ⊆ (Ki+1)⊥, and
A(αk rk) = Axk+1 − Axk = (b − Axk) − (b − Axk+1) = rk − rk+1 ∈ (Ki+1)⊥
since rk ∈ (Kk)⊥ ⊆ (Ki+1)⊥ and rk+1 ∈ (Kk+1)⊥ ⊆ (Ki+1)⊥.

Both conditions must be satisfied: for k ≥ i + 1, αi ri ∈ Ki+1 is orthogonal to A(αk rk) ∈ (Ki+1)⊥, i.e.,
A-conjugate:  (αi ri)ᵀ A (αk rk) = 0  ⇒  riᵀ A rk = 0   (k ≠ i)
Conjugate gradient method (CG)
Understanding the CG assumption ({ri} are A-conjugate) from geometry

Define the A-norm:
‖x‖A = (xᵀAx)^{1/2}       (recall that A is symmetric and positive definite)

Seek x s.t. the A-norm of the error x − A⁻¹b is minimized:
x = argmin_x (1/2)‖x − A⁻¹b‖A²
(1/2)‖x − A⁻¹b‖A² = (1/2)(x − A⁻¹b)ᵀA(x − A⁻¹b) = (1/2)xᵀAx − xᵀb + (1/2)bᵀA⁻¹b     (using Aᵀ = A)
                  = (1/2)xᵀAx − xᵀb + constant
So solving the system of linear equations is equivalent to finding the minimum of the function
f(x) = (1/2)xᵀAx − xᵀb

How does the function f(x) look? Consider the 2D case (N = 2):
f(x1, x2) = (1/2)(x1, x2) [[A11, A12], [A21, A22]] (x1, x2)ᵀ − (x1, x2)(b1, b2)ᵀ
          = (1/2)A11x1² + (1/2)(A12 + A21)x1x2 + (1/2)A22x2² − b1x1 − b2x2

Example: A = [[1, 0], [0, 1]], b = (0, 0)ᵀ:   f(x1, x2) = (1/2)x1² + (1/2)x2²   (circular contours)
Example: A = [[2.5, 0], [0, 0.4]], b = (0, 0)ᵀ:   elliptical contours with axes along the coordinate axes
Example: A = [[1, 0.8], [0.8, 1]], b = (0, 0)ᵀ:   tilted elliptical contours
[Figure: contour plots of f(x1, x2) for the three examples]
In 2D (N = 2) with A = I: convergence in at most 2 steps. Each new search direction is along the gradient (rapid decrease) and orthogonal to the old one:
r1ᵀr0 = 0,    r1 ∝ −∇f(x)|_{x = x1}
r0 = b − Ax0, so the direction of r0 is arbitrary; but we always need only 2 steps.
[Figure: x0 → x1 = x0 + α0r0 → x = x1 + α1r1]

If A ≠ I and we still use orthogonal search vectors,
rkᵀri = 0 (k ≠ i),    rk ∝ −∇f(x)|_{x = xk}
then we need many more steps for convergence. The orthogonal vectors are not good basis vectors for searching for the minimum.
[Figure: zig-zag path x0 → x1 → x2 → x3 → x4 → ∙∙∙ toward x, with steps α0r0, α1r1, α2r2, α3r3, α4r4]

If A ≠ I and we use A-orthogonal vectors,
rkᵀAri = 0 (k ≠ i)
then again we only need 2 steps for convergence in 2D. Why?

f ∝ xᵀAx; under a transformation of coordinates x′ = Fx,  f ∝ x′ᵀ(F⁻ᵀAF⁻¹)x′
If we pick F s.t. F⁻ᵀAF⁻¹ = I, i.e., A = FᵀF, then after the transformation by F and under the new (primed) coordinate system, the search directions should be orthogonal:
rk′ᵀri′ = 0
What does this require of {ri}?
rk′ᵀri′ = rkᵀFᵀFri = rkᵀAri = 0:   A-conjugate
In the coordinate system with A = I (primed), the search directions should be orthogonal, while in the coordinate system with A ≠ I (unprimed), the search directions should be A-orthogonal.
Conjugate gradient method (CG)

How to construct A-orthogonal basis vectors? Gram-Schmidt process.
Given {r0, r1, r2, ∙∙∙}, find an A-orthogonal basis set {q0, q1, q2, ∙∙∙}:
u0 = r0,                                        q0 = u0/‖u0‖A
u1 = r1 − (q0ᵀAr1)q0,                           q1 = u1/‖u1‖A
u2 = r2 − (q0ᵀAr2)q0 − (q1ᵀAr2)q1,              q2 = u2/‖u2‖A
∙∙∙
uk = rk − Σ_{i=0}^{k−1} (qiᵀArk)qi,             qk = uk/‖uk‖A

Check that {q0, q1, q2, ∙∙∙} constructed by this Gram-Schmidt process are A-orthogonal:
qkᵀAqk = ukᵀAuk/‖uk‖A² = 1
q0ᵀAq1 = q0ᵀAu1/‖u1‖A = [q0ᵀAr1 − (q0ᵀAr1)(q0ᵀAq0)]/‖u1‖A = 0
and in general, for i ≤ k (with the previous vectors already A-orthonormal),
qiᵀAq_{k+1} = qiᵀAu_{k+1}/‖u_{k+1}‖A = [qiᵀAr_{k+1} − (qiᵀAr_{k+1})(qiᵀAqi)]/‖u_{k+1}‖A = 0
Conjugate gradient method (CG)

r0 = b − Ax0
u0 = r0,  q0 = u0/‖u0‖A,                                 x1 = x0 + α0q0
r1 = b − Ax1 = b − Ax0 − α0Aq0 = r0 − α0Aq0
u1 = r1 − (q0ᵀAr1)q0,  q1 = u1/‖u1‖A,                    x2 = x1 + α1q1
r2 = r1 − α1Aq1
u2 = r2 − (q0ᵀAr2)q0 − (q1ᵀAr2)q1,  q2 = u2/‖u2‖A,       x3 = x2 + α2q2
∙∙∙
uk = rk − Σ_{i=0}^{k−1} (qiᵀArk)qi,  qk = uk/‖uk‖A,      xk+1 = xk + αkqk

How to determine {αi}? Search αk s.t. f(xk+1) is minimized:
f(xk+1) = f(xk + αkqk) = (1/2)(xk + αkqk)ᵀA(xk + αkqk) − (xk + αkqk)ᵀb
        = (1/2)(qkᵀAqk)αk² + [(1/2)xkᵀAqk + (1/2)qkᵀAxk − qkᵀb]αk + (1/2)xkᵀAxk − xkᵀb
∂f(xk + αkqk)/∂αk = (qkᵀAqk)αk + (1/2)xkᵀAqk + (1/2)qkᵀAxk − qkᵀb = 0
αk(qkᵀAqk) = qkᵀb − (1/2)xkᵀAqk − (1/2)qkᵀAxk = qkᵀ(b − Axk) = qkᵀrk      (using Aᵀ = A)
αk = qkᵀrk / (qkᵀAqk)

Putting it together, the CG iteration is
r0 = b − Ax0
u0 = r0,  q0 = u0/‖u0‖A,   α0 = q0ᵀr0/(q0ᵀAq0),   x1 = x0 + α0q0,   r1 = r0 − α0Aq0
u1 = r1 − (q0ᵀAr1)q0,  q1 = u1/‖u1‖A,   α1 = q1ᵀr1/(q1ᵀAq1),   x2 = x1 + α1q1,   r2 = r1 − α1Aq1
∙∙∙
uk = rk − Σ_{i=0}^{k−1} (qiᵀArk)qi,  qk = uk/‖uk‖A,   αk = qkᵀrk/(qkᵀAqk),   xk+1 = xk + αkqk,   rk+1 = rk − αkAqk
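A minimal sketch of these recursions (it assumes A is symmetric positive definite; for clarity it re-orthogonalizes against all previous search directions exactly as written above, whereas the standard CG recurrence only needs the most recent one):

import numpy as np

def cg_sketch(A, b, x0=None, eps=1e-10):
    # Sketch of CG with explicitly A-orthogonalized search directions q_0, q_1, ...
    N = len(b)
    x = np.zeros(N) if x0 is None else x0.copy()
    r = b - A @ x
    dirs = []                                   # previous A-orthogonal directions q_i
    for k in range(N):
        u = r.copy()
        for q in dirs:
            u -= q * (q @ (A @ r))              # u_k = r_k - sum_i (q_i^T A r_k) q_i
        qk = u / np.sqrt(u @ (A @ u))           # normalize in the A-norm
        alpha = (qk @ r) / (qk @ (A @ qk))      # alpha_k = q_k^T r_k / (q_k^T A q_k)
        x = x + alpha * qk
        r = r - alpha * (A @ qk)                # r_{k+1} = r_k - alpha_k A q_k
        dirs.append(qk)
        if np.linalg.norm(r) < eps:
            break
    return x, k + 1

# usage sketch on a small symmetric positive-definite system
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x, steps = cg_sketch(A, b)
print(x, steps, np.allclose(A @ x, b))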
SciPy: Solve system of linear equations by CG

import numpy as np
import matplotlib.pyplot as plt
import scipy.sparse
import scipy.sparse.linalg

N = 100
# note that A must be a symmetric, positive-definite matrix
A = scipy.sparse.random(N, N, 0.5)
A = A.toarray()
A = A + np.transpose(A)          # symmetrize
A = scipy.sparse.csr_matrix(A)   # convert the NumPy array to a "compressed sparse row" (csr) matrix
print(A)
A += 4*scipy.sparse.eye(N)       # strengthen the diagonal
b = np.random.rand(N, 1)

guess = np.zeros(shape=b.shape)
x, exitCode = scipy.sparse.linalg.cg(A, b, x0=guess, tol=1e-10, maxiter=10000)
print(x)
print(exitCode)
System of linear equations
Ax = b
How to solve a system of linear equations?

Direct method
(i) Gaussian elimination
(ii) LU decomposition
(iii) QR decomposition

Iterative method
Basics 1: Normed space
Basics 2: Banach fixed point theorem
Stationary methods (old)
(i) Jacobi method
(ii) Gauss-Seidel method
(iii) Relaxation method
Krylov subspace methods (modern)
(i) Generalized minimal residual (GMRES)
(ii) Conjugate gradient (CG)