CS计算机代考程序代写 algorithm Computational Methods

Computational Methods
for Physicists and Materials Engineers
4
Systems of Linear Equations II Iterative Method

xi are unknowns to be determined Aij and bi are coefficients (knowns)
Matrix form:
System of linear equations
N linear equations: AxAxAxb
11 1 12 2 1N N 1 AxAxAxb
21 1 22 2 2N N 2 
AxAxAxb N1 1 N2 2 NN N N
AAAxb  11 12 1N   1   1 
21 22 2N 2 2 AAAxb
 A A Ax b
 N1 N2 NN   N   N  A xb

https://www.cise.ufl.edu/research/sparse/matrices/
Rothberg/gearbox.html
Size of K is
153,746  153,746 = 23,637,832,516 9,080,404 nonzero elements
Iterative method: guess the solution and iteratively modify the guess
Iterative method Ax  b
By direct method, we obtain the exact answer (if we don’t consider rounding error) in the last step. The computational complexity for LU and QR is N3 if the coefficient matrix A is N  N
For most of problems of practical interest, A is a large‐N, sparse matrix E.g. FEM Ku = f
For aircraft flap actuator

Guess a solution x0 (probably it’s bad)
Residual
Correct the guess by residual
bAx0
x1 x0 C(bAx0)f(x0) Convergent? check error
How to correct guess s.t. error converges to 0?
If x* exists s.t. x* = f(x*)? (Fixed point theorem) How to choose f?
Correct the guess by residual
x2 x1 C(bAx1)f(x1)
Idea of iterative method Ax  b
x1  x0
Definition of magnitude? Vector norm
How to estimate error?

Vector norm: “size” of a vector Norm ‖x‖ has the properties:
• Positivity:
• Definiteness:
• Homogeneity:
• Triangle inequality:
‖x‖ ≥ 0
‖x‖ = 0 if and only if x = 0 ‖αx‖ = |α| ‖x‖
‖x + y‖ ≤ ‖x‖ + ‖y‖
y x+y
forallx,yXandallα 
When x  , norm is just absolute value |x|
x
2
 x  y   x2  y2  2 xy  x2  y2  2xy  x  y 2  x  y  x  y
For example, lp norm:
N
 p1/p
xx
p
i
 i1 

p i1  l norm is called maximum norm
i1,,N
Vector norm: “size” of a vector
2
i
 i1 
 N p 1/p x x 
p
 i1 N
i

x1 x i1
i
 21/2
x  x   xTx
N
l2 norm is called Euclidean norm because it is just the length of a vector
N
 p1/p
xlim x  maxx ii

2 x 3
i1
 3 2 1/2
Vector norm: “size” of a vector
Equivalence of the norms
Theorem: on a finite‐dimensional vector space, all norms are equivalent. When one norm goes to 0, all the other norms go to 0
 x1  xx
3
x1xi x1 x2 x3
x x   x2 x2 x2 (lengthofx)
2
i 123  i1 
x maxxmaxx,x,x  i1,2,3 i 1 2 3
Onlywhenx1,x2,x3 0atthesametime,‖x‖1 0.Andatthesame time ‖x‖2 and ‖x‖  0

We can prove that
N
A 1  max  A
Frobenius norm
AF A
2
Matrix norm: “size” of a matrix
Definition of matrix norm is based on vector norm
A sup Ax x 1
“sup”: supremum, the least upper bound; not necessarily maximum. E.g. {x | x < 2} only has supremum 2 k1,,N i1 ik N i1,,N k1 A   max N 1/2 A2A2 ik  i,k1  MN ik i1 k1 AxAx xsupAx xA x x x1 x A ik E.g. in 3D AAA  11 12 13  N Recall x 1  x AAAA  21 22 23  Matrix norm: “size” of a matrix  AAA i1 i  31 32 33  A max A max A A A maxc ,c ,c  3   ik 1k 2k 3k 1 2 3 1 k1,2,3 i1 k1,2,3  1 1 1 3   ik i1 i2 i3 1 2 3 A max A max A A A maxr ,r ,r   i1,2,3 k1 i1,2,3  1 1 1  3 1/2 A A 2  A2 A2 A2 A2 A2 A2 A2 A2 A2 2 ik 11 12 13 21 22 23 31 32 33 i,k1  import numpy as np # norms of vectors # norms of matrices A = np.array([[-1, 1, 1], [ 2, -2, 0]]) NumPy: norms of vectors/matrices numpy.linalg.norm() a = np.array([-2, 1, 0]) v_L1_norm = np.linalg.norm(a, 1) v_L2_norm = np.linalg.norm(a, 2) v_Linf_norm = np.linalg.norm(a, np.inf) print(v_L1_norm); print(v_L2_norm); print(v_Linf_norm) M_Lfro_norm = np.linalg.norm(A, 'fro') M_L1_norm = np.linalg.norm(A, 1) M_L2_norm = np.linalg.norm(A, 2) M_Linf_norm = np.linalg.norm(A, np.inf) print(M_Lfro_norm); print(M_L1_norm); print(M_L2_norm); print(M_Linf_norm) # default # default Solve the equations: E.g. f(x)x f(x) xx Initial guess: Update: Update: ∙∙∙ f(x) x0 f(x) ‖x1 – x0‖ is small enough? NO Update: Solutionis ∙∙∙ xn+1:= f(xn) x* ≈xn+1 x1 x2 := f(x0) := f(x1) ‖x2 – x1‖ is small enough? NO ∙∙∙ ‖xn+1 – xn‖ is small enough? YES General idea of iterative method Axb(AI)xbx or (AI)1x(AI)1bx or  If it converges to x*, f(x*) = x* and x* is the solution (fixed point of f) However, does {x0, x1, x2, ∙∙∙} converge? or lim xn1  xn  0 n Exact solution: f (x)  0.1x  10  x 0.1x*10x*x* 10 11.111 Iterative solution: General idea of iterative method Initial guess: x0 Update: x1 Update: x2 Update: x3 Update: x4 Update: x5 If we think it is good enough, x* ≈ 11.111 |x1 –x0|=10 |x2 –x1|=1 |x3 – x2| = 0.1 |x4 – x3| = 0.01 |x5 – x4| = 0.001 :=0 := f(0) := f(10) := f(11) := f(11.1) := f(11.11) =0 = 10 = 11 = 11.1 = 11.11 = 11.111 Example 0.9 Exact solution: f (x)  10x  0.1  x 10 x * 0.1  x*  x*   0.1  0.0111 Iterative solution: Initial guess: x0 Update: x1 Update: x2 Update: x3 Update: x4 Update: x5 Iteration diverges :=0 := f(0) := f(0.1) := f(1.1) := f(11.1) := f(111.1) =0 = 0.1 = 1.1 = 11.1 = 111.1 = 1111.1 |x1 – x0| = 0.1 |x2 – x1| =1 |x3 – x2| = 10 |x4 – x3| = 100 |x5 – x4| = 1000 General idea of iterative method Whether {x0, x1, x2, ∙∙∙} converges or not depends on the form of function f(x). Which kind of f(x) guarantees convergence? Example 9 From 1‐equation/1‐variable to N‐equation/N‐variable f (x)  x  f(x)  x f(x,x,,x)  x 1 1 2 n 1 f(x) is linear xx1  x f(x,x,,x)  x 212n2  fN(x1,x2,,xN)  xN 121 x1x22  x2 f(x,x,,x)x 112 N1 f(x) is nonlinear (next lecture) f(x,x,,x)x 212N2 0.5cosx1 0.5sinx2 0.5sinx1 0.5cosx2  x1  x2  f(x,x,,x) x N 1 2 NN Whether {x0, x1, x2, ∙∙∙} converges or not depends on the form of function f(x). Which kind of f(x) guarantees convergence? Convergent sequence A sequence {xn} = {x0, x1, x2, ∙∙∙} is called convergent if there exists an element x* s.t. xn lim xn x* 0 or limxn x* n n n 0 1 2 3 4 5 6 7 8 9 10 11 12 ∙∙∙ 10000 ∙∙∙ E.g. 11    lim   2n   n2n  0 x  21/n , limx  lim21/n 1 n nnn      1 nsin1 1     limnsin   n n n from sympy import * n = symbols('n', integer=True, nonnegative=True) pprint(limit(1/2**n, n, oo)) pprint(limit(2**(1/n), n, oo)) pprint(limit(n*sin(1/n), n, oo)) Contraction An operator f is called contraction if there exists a constant q  [0, 1) s.t. f(x)f(y) q xy We may think of f as a function for all x, y  D 1‐variable example: f(x)0.1x10 f (x)  f (y)  0.1 x  y So, this f(x) is contraction f(x)10x10 f (x)  f (y)  10 x  y So, this f(x) is NOT contraction When f(x) is contraction, iteration for f(x) = x will converge Banach fixed point theorem Existence of a unique fixed point Let f be a contraction. There exists a unique element x* s.t. f(x*)x* This x* is called fixed point of f, because f maps x* to itself 1‐variable example: f  100   0.1 100  10  Fixed point is x*  100 f(x)0.1x10 999  How to find the fixed point Start from an arbitrary element x0. Define a sequence {xn} by recursion: xn1 :f(xn) We can find the fixed point by limxn x* n 1‐variable example: f (x)  0.1x  3 (contraction) Exact solution for fixed point: 0.1x*3x*x* 3 0.9 xn 0.1xn1 3 xn1 : f(xn)0.1xn 3 3)30.12 x (0.10 0.11)3 0.1(0.1x 0.12(0.1x n2 3)(0.10 0.11)30.13 x n2 n3 n3 (0.10 0.11 0.12)3 10.1n Banach fixed point theorem  0.1 x0 (0.1 0.1 0.1 )30.1 x0  10.1 3 n 0 1 n1 n 1rn  Sum of a geometric series: ar0 ar1 ar2 arn1 a 1r  limxn  3   n 0.9 Exact solution for fixed point: 10 x * 3  x*  x*   3 9 xn 10xn1 3 xn1 : f(xn)10xn 3 3)3102 x (100 101)3 10(10x 102(10x n2 3)(100 101)3103 x n2 n3 n3 (100 101 102)3 110n Banach fixed point theorem 1‐variable example:f (x)  10x  3 (not a contraction)  10n x0 (100 101 10n1)310n x0  110 3 limxn  n xn 0.1xn1 0.3 0.1(0.1x 0.12(0.1x n2 0.3)(0.10 0.11)0.30.13 x n2 n3 n3 (0.10 0.11 0.12)0.3 10.1n Banach fixed point theorem 1‐variable example:f (x)  10x  3 (not a contraction) We can rearrange the equation to construct a convergent recursion 10x  3  x  0.1x  0.3  x f (x)  0.1x  0.3 (this is a contraction) xn1 : f(xn)0.1xn 0.3 0.3)0.30.12 x (0.10 0.11)0.3  0.1 x0 (0.1 0.1 0.1 )0.30.1 x0  10.1 0.3 n 0 1 n1 n limxn 3 n 9 Banach fixed point theorem: proof Starting from an arbitrary x0  D, construct {xn} by recursion: Is this sufficient to prove convergence? NO E.g. xn1 :f(xn) (n0,1,) f is contraction x  x  f ( x )  f ( x )       q x  x n1 n n n1 n n1 2 n n q f(xn1)f(xn2) q xn1 xn2 q x1 x0 0 limxn1xn 0 n xn  n limxn1xn lim n1 n0 n n But, obviously, {xn} does not converge with n Stronger convergence condition: Banach fixed point theorem: proof lim xmxn0 n ,m If {xn} satisfies this condition, it is called Cauchy sequence xn  n mα2n lim x x lim α2n n limα1 n n,m m n n n So, {xn} is NOT Cauchy sequence. It does not converge xn1 : f(xn ) f is contraction Prove that {xn} is Cauchy sequence, i.e., lim xmxn 0 n ,m Banach fixed point theorem: proof xn1 : f(xn ) f is contraction Prove that {xn} is Cauchy sequence  ( q  q    q  q ) x 1  x 0     0 Is this sufficient to prove convergence? NO E.g. x 11n n n x x qn x x n1n 10 xm xn  xm xm1 xm1 xm2 xn2 xn1 xn1 xn xx x x x x x x triangle inequality n1 m m1 m1 m2 n2 n1 m m1 n2 n1 n ,m n  1n limxn lim1  e (irrationalnumber) n n n But xn must be a rational number for any n. So, whichever n, xn can never reach the limit e One additional condition: Banach fixed point theorem: proof xn1 :f(xn) For this sequence, there exists x* s.t. lim f(xn)f(x*) qlim xn x* 0 n n or limf(xn)f(x*) n So, x* is the fixed point xn U and limxn U n If in the domain U all Cauchy sequences satisfy this condition, this domain U is called complete; as below, we only consider all elements in the complete domain If f(x) is contraction, we can generate sequence by recursion: limxn x* n x*limxn1 limf(xn)f(x*) n n Priori (earlier) and posteriori (later) errors Let f be a contraction with constant q < 1. The sequence constructed by xn1 :f(xn) converges to the unique fixed point x* with arbitrary initial guess x0 Priori error ‖x1 ‒ x0‖ bounds the exact error at step (n) via qn xn x* 1q x1 x0 Posteriori error ‖xn ‒ xn‒1‖ bounds the exact error at step (n) via Proof is simple Pickx asx n‒1 0 xx*q xx n 1q 1 0 xn xm lim xn xm (qn qn1 qm2 qm1) x1 x0  lim(qn qn1 qm2 qm1) x1 x0 m m n xnx* q xnxn1 1q min n for convergence If the desired accuracy is ε s.t. ‖xn ‒ x*‖ ≤ ε, how many iterative steps are needed? (i.e., n = ?) qn ln(1q)1 ε xnx*1qx1x0 ε  n lnq lnqlnxx 0.0 0.2 0.4 So, small q leads to fast convergence q 100 80 60 40 20 0 0.6 0.8 1.0 Posteriori error is used to check accuracy in the process of computation and terminate iterations when the desired accuracy is reached 10 Construct an operator: f(x)Bxz f(x)f(y)  B(xy)  B xy ‖B‖ plays the role of constant q. f is contraction when ‖B‖ < 1 Hence, we can construct sequence by recursion xn1 :f(xn)Bxn z With arbitrary initial guess x0, it converges to a unique fixed point x* s.t. Priori error estimate Bn xnx*1B x1x0 x*f(x*)Bx*z or (IB)x*z Posteriori error estimate B xnx*1B xnxn1 A  11  A 22 D A  N1,N1  A  NN  0AAA 0  121,N11N 0AA A0 21 Jacobi iteration Ax  b ADAL AU 2,N1 2N AL AU AA00A  N1,1 N1,2   N1,N  A AA 0  0 N1N2 N,N1  A  11   A1 A1  A 22 22 D1   D   A   A1     N1,N1   A  N1,N1  1 NN  0  121,N11N 0AA A0 21 Jacobi iteration Ax  b ADAL AU Inverse of a straightforward  11  NN  0AAA 2,N1 2N AL AU AA00A  N1,1 N1,2   N1,N  A AA 0  0 N1N2 N,N1   A diagonal matrix is Assume that all diagonal elements are nonzero xD1(AL AU)xD1b (DAL AU)xb Dx(AL AU)xb Recall Banach fixed point theorem: when ‖B‖ < 1, we can construct Bz sequence by recursion With arbitrary initial guess x0, it converges to a unique fixed point x* s.t. Jacobi iteration: xn1 :D1(AL AU)xn D1b xn1 :Bxn z x*Bx*z xn1 :xn D1(bAxn) residual xn1 :D1(AL AU)xn D1b ik i x   N A x  b ( i  1 ,  , N ) n1,i ki ii ii n,k k1A A Jacobi iteration algorithm eps=1e-10 #convergencecriterionforposteriorierror x = zeros(N) # initial guess for n = 1, ···, max_n xprev = x # store x obtained from previous iterative step for i = 1, ···, N sum = 0 Method of simultaneous displacements for k = 1, ···, N if k == i continue sum = sum - A[i][k] * xprev[k] x[i] = (sum + b[i]) / A[i][i] if norm(x – xprev) <= eps break if x is None: x = np.zeros(N) for count in range(1, max_n+1): xprev = x.copy() for i in range(N): Can be simplified by element‐wise operations xn1 :D1(AL AU)xn D1b ik i x   N A x  b ( i  1 ,  , N ) n1,i ki ii ii sum = 0 for k in range(N): n,k k1A A Jacobi iteration code def jacobi(A, b, x=None, eps=1e-10, max_n=1000000): N = A.shape[0] if k == i: continue sum -= A[i,k] * xprev[k] x[i] = (sum + b[i]) / A[i,i] if np.linalg.norm(x - xprev) < eps: break print(f"Number of iteractions is {count}") return x xn1 :D1(AL AU)xn D1b ik i x   N A x  b ( i  1 ,  , N ) n1,i ki ii ii n,k k1A A Jacobi iteration code def jacobi(A, b, x=None, eps=1e-10, max_n=1000000): N = A.shape[0] if x is None: x = np.zeros(N) D = np.diag(A) R = A - np.diag(D) # .diag(x): if x is 2D, returns 1D list of diagonal elements # if x is 1D, returns 2D list with x as diagonal elements for count in range(1, max_n+1): xprev = x.copy() x = (-np.matmul(R, xprev) + b) / D if np.linalg.norm(x - xprev) < eps: break print(f"Number of iteractions is {count}") return x Convergence condition xD1(AL AU)xD1b Recap: A   max  A Bz N i1,,N k1 B  D1(AL AU) 1 ik D1(AA) max A A1 or k  1 , , N i k i1 L U i k i i N i1,,N k1 ki N A1max A Jacobi iteration  1 N A2A2 D (ALAU) max A A 1 or i,k1  ik ii ik 1 k1,,N i1  1/2 N D1(AA) AA2 1 ik ii LU 2 i,k1  ik  General trend: if {|Aii|} are large, convergence condition is satisfied N 1/2 ik  Gauss‐Seidel iteration Axb, ADAL AU (DAL AU)xb (D  AL )x  AUx  b x  (D  AL )1 AUx  (D  AL )1 b Bz Recall Banach fixed point theorem: when ‖B‖ < 1, we can construct sequence by recursion With arbitrary initial guess x0, it converges to a unique fixed point x* s.t. Gauss‐Seidel iteration: xn1 :Bxn z x*Bx*z xn1 :(DAL)1AUxn (DAL)1b xn1 :(DAL)1AUxn (DAL)1b In practice, at Step (n), we solve the system of linear equations: (DAL)xn1 AUxn b Since D + AL is lower triangular matrix, it can be solved simply by forward substitution (recall LU decomposition) A  11 xy xn1,1y1A11  n1,1   1    A A 21 22 x y x yAx A n1,2  2 n1,2 2 21 n1,1 22     N1  AAAxyxyAxA  N1 N2  k1  NN  n1,N   N  n1,N  N i1 N N  xn1,iyi Ax A, y A x b Ax b ikn1,k ii i U,ik n,k i  k1  k1 ki1 ikn,k i ik ik i x   i  1 A x  N A x  b ( i  1 ,  , N ) n1,i n1,k n,k k1 A ki1 A A ii ii ii Nk n1,k NN  x ikx ikx i n,k (i1,,N) n1,i n1,k ii ii ii i1A NA b k1 A ki1 A A Step (n) & (n+1) data can be stored in same array ‐‐ Method of successive displacements already updated not updated yet Gauss‐Seidel iteration algorithm eps=1e-10 #convergencecriterionforposteriorierror x = zeros(N) # initial guess for n = 1, ···, max_n xprev = x for i = 1, ···, N sum = 0 for k = 1, ···, N if k == i continue sum = sum – A[i][k] * x[k] x[i] = (sum + b[i]) / A[i][i] if norm(x – xprev) <= eps break x ikx ikx i n,k (i1,,N) n1,i n1,k ii ii ii def gauss_seidel(A, b, x=None, eps=1e-10, max_n=1000000): N = A.shape[0] if x is None: x = np.zeros(N) for count in range(1, max_n+1): i1A NA b k1 A ki1 A A Step (n) & (n+1) data can be stored in same array ‐‐ Method of successive displacements already updated not updated yet xprev = x.copy() for i in range(N): Cannot be written as element‐wise operations Gauss‐Seidel iteration code sum = 0 for k in range(N): if k == i: continue This part can be sum -= A[i,k] * x[k] x[i] = (sum + b[i]) / A[i,i] element‐wise if np.linalg.norm(x - xprev) < eps: break print(f"Number of iteractions is {count}") return x x ikx ikx i n,k (i1,,N) n1,i n1,k ii ii ii def gauss_seidel(A, b, x=None, eps=1e-10, max_n=1000000): N = A.shape[0] if x is None: x = np.zeros(N) D = np.diag(A) i1A NA b k1 A ki1 A A Step (n) & (n+1) data can be stored in same array ‐‐ Method of successive displacements already updated not updated yet R = A - np.diag(D) for count in range(1, max_n+1): xprev = x.copy() for i in range(N): Gauss‐Seidel iteration code x[i] = (-np.inner(R[i,:], x) + b[i]) / D[i] if np.linalg.norm(x - xprev) < eps: break print(f"Number of iteractions is {count}") return x Start from Jacobi method: Jacobi method with relaxation: xn1 xn ωD1(bAxn) Small q or ‖B‖  fast convergence xn1 (IωD1A)xn ωD1b Relaxation method xn1 xn D1(bAxn) Weight ω > 0 is called relaxation parameter
Problem: which value of ω achieves the largest convergence rate?
‖B‖ reaches minimum, i.e., the largest convergence rate, when ω2
residual
Bz
2λ λ max min
where λmax and λmin are the maximum and minimum eigenvalues of the matrix ‒D‒1(AL + AU), i.e., the B of the original Jacobi with ω = 1

Relaxation method (SOR method)
Start from Gauss‐Seidel method:
xn1 (DAL)1AUxn (DAL)1b
xn1 xn D bALxn1 (DAU)xn 1 
Gauss‐Seidel method with relaxation or Successive overrelaxation (SOR) method:
xn1 xn ωD bALxn1 (DAU)xn 1 
(DωAL)xn1 [(1ω)DωAU]xn ωb Problem: which value of ω guarantees convergence of SOR?
Kahan theorem: A necessary condition for convergence of SOR is that 0 <ω<2 Ostrowski theorem: If A is Hermitian (A = A†) and positive definite (v∙Av > 0), SOR converges for all 0 < ω < 2 Stationary iterative methods vs. Krylov subspace methods Jacobi iteration Gauss‐Seidel iteration SORmethod Mxn1 Nxn b M = D N = ‒(AL + AU) N = ‒AU N=(ω‒1 ‒1)D‒AU Stationary iterative methods M = D + AL M=ω‒1D+AL Jacobi iteration, Gauss‐Seidel iteration, and SOR method have largely lost their interest with the advent of Krylov subspace method. So, they are not implemented in NumPy or SciPy Krylov subspace methods Krylov subspace method is the modern iterative method ‐‐ one the top 10 algorithms in 20th century Generalized minimal residual method (GMRES) Conjugate gradient method (CG) A 3D space is spanned by 3 orthonormal basis vectors e1, e2, e3 e3 e1,1  e2,1  e3,1  ee , ee , ee  1  1,2  e  2  2,2  e  3  3,2  e  x  1,3 eTe δ 1,  2,3 ik  3,3 i k ik 0, ik  e2 Any vector in 3D space is a linear combination of basis vectors xx1e1 x2e2 x3e3 Krylov subspace method 3D space = span{e1, e2, e3} e1 2D space = span{e1, e2} 2D space is a subspace of 3D space e3 2D space  3D space xx1e1 x2e2 x e2 is a vector in the subspace e1 Krylov subspace method e1,1  e2,1  e3,1  ee , ee , ee  e3 1  1,2  e  2  2,2  e  3  3,2  e   3,3 span{e1, e2, e3, e4} is impossible,  1,3 ei contains 3 components. So, e4 e2  2,3 Krylov subspace method because e4 can be represented by e4 ae1 be2 ce3 e1 The number of basis vectors ≤ dimension of basis vector K (A,v)  span M v,Av,,A M1  v Krylov subspace method The m‐th Krylov subspace: K (A,v)span 2 m1  m v,Av,A v,,A v Basis vectors are v, Av, A2v, ∙∙∙, Am‒1v If v is N1 and A is NN, m ≤ N (number of basis vectors ≤ dimension) K (A,v)span  v K2(A,v)spanv,Av 1 K (A,v)span 2  3 v,Av,A v  K1 K2 KM KM1 KN It’s possible that all Km with m ≥ M are same  K (A,v)  K (A,v) M M1M AK (A,v)  span Av,A2v,,AMv So, KM is invariant w.r.t. A Krylov subspace method Ax  b K (A,x)span 2 M1  M x,Ax,A x,,A x Solution x is a vector in KM(A, x), i.e., x is some linear combination of the basis vectors KM is invariant: AKM(A, x) = KM(A, x) K (A,x)  AK (A,x)  span  Ax,A2x,A3x,,AMx span 2 M1 K (A,b) MM b,Ab,A b,,A b M  KM (A,x)  KM (A,b) So, x KM(A, b), i.e., x is some linear combination of its basis vectors: x c1bc2Abc3A2bcMAM1b We can construct the Krylov subspace: K (A,b)span 2 M1  b Krylov subspace method Ax  b M b,Ab,A b,,A Then, construct the solution: x c1bc2Abc3A2bcMAM1b Two problems: (i) How to construct the Krylov subspace? (ii) How to construct the solution? Step (1): Construct K1(A, b) = span{b}; Construct x = c1b Good enough? NO Step (2): Krylov subspace method Ax  b Iterative improvement Construct K2(A, b) = span{b, Ab}; Construct x = c1b + c2Ab Good enough? NO Step (3): Construct K3(A, b) = span{b, Ab, A2b}; Construct x = c1b + c2Ab + c3A2b Good enough? NO ∙∙∙ Krylov subspace method Generalized minimal residual method (GMRES) (i) Construction of Krylov subspace Ax  b Vectors {b, Ab, A2b, ∙∙∙, Am–1b} are in general NOT orthogonal Gram‐Schmidt process: Given a set of linearly independent basis vectors {b, Ab, A2b, ∙∙∙} (m ≤ N), generate an orthonormal set of basis vectors {q1, q2, q3, ∙∙∙} (ii) Construction of solution The solution is constructed by the basis vectors of m‐th Krylov subspace xm =c1q1 +c2q2 +∙∙∙+cmqm Find {c1, ∙∙∙, cm} s.t. ‖b – Axm‖2 is minimized Gram‐Schmidt process Given {v1, v2, v3} They are linearly independent Anyvectorin3Dspacex=av1 +bv2 +cv3 Find orthonormal basis set {q1, q2, q3} v2 v3 v1 Gram‐Schmidt process Given {v1, v2, v3} uv,q 111 u1 u1 2 q1 v1 = u1 Gram‐Schmidt process Given {v1, v2, v3} uv(vTq)qvq(qTv), q u2 u2 2 2221121122 q1 Proj v2 to q1 v2 q2 u2 Proj v3 to q2 q2 Gram‐Schmidt process Given {v1, v2, v3} u v (vTq )q (vTq )q v q (qTv )q (qTv ), q  u3 u3 2 3 3311 3223113 2233 q1 Proj v3 to q1 q3 u3 v3 q2 Gram‐Schmidt process Given {v1, v2, v3} Find orthonormal basis set {q1, q2, q3} Anyvectorin3Dspacex=av1 +bv2 +cv3 Equivalently, x = a’q1 + b’q2 + c’q3 q3 q1 v1 v2 v3 Gram‐Schmidt process Given {v1, v2, v3, ∙∙∙} Find orthonormal basis set {q1, q2, q3, ∙∙∙} uv, q 111 u1 u1 2 u v q (qTv ), q  22112 2 u2 u2 2 uvq(qTv)q(qTv), q 331132233 u3 u3 2  k u v  q(qTv ), q  k1 uk1 uk1 2 k1 k1 i1 i i k1 Modified Gram‐Schmidt process (Arnoldi) Given {b, Ab, A2b, ∙∙∙} Find orthonormal basis set {q1, q2, q3, ∙∙∙} v b, u v , q  1111 u1 u1 2 v Aq , u v q (qTv ), q  2122112 2 u2 u2 2 v Aq , u v q (qTv )q (qTv ), q  32331132233 u3 u3 2  k v Aq, u v  q(qTv ), q  k1 uk1 uk1 2 k1 k k1 k1 i1 i i k1 Modified Gram‐Schmidt process (Arnoldi) Given {b, Ab, A2b, ∙∙∙} Find orthonormal basis set {q1, q2, q3, ∙∙∙} q1  q b b2 2 Aq  q (qT Aq ) 1111 Aq  q (qT Aq ) 1111 2 Aq q(qTAq)q(qTAq) 2112222 Aq q(qTAq)q(qTAq) q3  21122222  Aq  k k q (qT Aq ) iik i1 q k1 k k iik Aq  q (qT Aq ) i1 2 A q 1  q 1 H 1 1  q 2 H 2 1 Aq qH qH qH  q Ti A q k ,  k  jjk Modified Gram‐Schmidt process (Arnoldi) Given {b, Ab, A2b, ∙∙∙} Find orthonormal basis set {q1, q2, q3, ∙∙∙} k1 k k C 2 112 222 332 H   j1 2 m1 Aqm qH i im i1 k k1  i1 i1 i ik  i1 kk  i1  i1 Aq  qk  q (qT Aq ) Aq  iikk q (qT Aq ) iik Aq  Aqq C q(qTAq) qH q (qT Aq ) iik k k1  2 iik i  k ik Aqkq(qTAq),ik1 A A| |  11 1N    q1 qm A A| | Modified Gram‐Schmidt process (Arnoldi)  N1 NN    H11 H12 H1m  HHHH  21 22 2,m1 2m  | ||0 H32 H3,m1 H3m qqq    1 m m1  | ||00HH   m1,m1 m1,m   H1,m1  0 0  Hm,m1 Hmm  000H  m1,m  AQm = Qm+1Hm Hessenberg matrix q1  b b2 H qTAq 11 1 1 Modified Gram‐Schmidt process (Arnoldi) How to obtain Hessenberg matrix Hm?  q Ti A q k , k i  k Aqk q(qTAq) , ik1 H ik jjk j1 2 q:Aqq(qTAq)H q 1 1 1 1 21 2 q1  q2  b b2 H qTAq 11 1 1 Modified Gram‐Schmidt process (Arnoldi) How to obtain Hessenberg matrix Hm?  q Ti A q k , k i  k Aqk q(qTAq) , ik1 H ik jjk j1 2 q:Aqq(qTAq)H q 1 1 1 1 21 2 H qTAq, H qTAq 12 1 2 22 2 2 q qTT 2 q:Aqq(qAq)q(qAq)H q 2 1 1 2 2 2 2 32 2 q1  q2  b b2 H qTAq 11 1 1 q3  q qTTT 2 q:Aqq(qAq)q(qAq)H q 2 1 1 2 2 2 2 32 2 Modified Gram‐Schmidt process (Arnoldi) How to obtain Hessenberg matrix Hm?  q Ti A q k , k i  k Aqk q(qTAq) , ik1 H ik jjk j1 2 q:Aqq(qTAq)H q 1 1 1 1 21 2 H qTAq, H qTAq 12 1 2 22 2 2 q qTT H qTAq, H qTAq, H qTAq 13 1 3 23 2 3 33 3 3 2 q:Aqq(qAq)q(qAq)q(qAq)H q 3 1 1 3 2 2 3 3 3 3 43 2 q1  q2  b b2 H qTAq 11 1 1 q3  q4  q qTTT q q2 ∙∙∙ 2 q:Aqq(qAq)q(qAq)q(qAq)H q 3 1 1 3 2 2 3 3 3 3 43 2 Modified Gram‐Schmidt process (Arnoldi) How to obtain Hessenberg matrix Hm?  q Ti A q k , k i  k Aqk q(qTAq) , ik1 H ik jjk j1 2 q:Aqq(qTAq)H q 1 1 1 1 21 2 H qTAq, H qTAq 12 1 2 22 2 2 q qTT 2 q:Aqq(qAq)q(qAq)H q 2 1 1 2 2 2 2 32 2 H qTAq, H qTAq, H qTAq 13 1 3 23 2 3 33 3 3 Find the optimized solution GMRES: generalized minimal residual method Construct the solution:  | |  c1  xKcqcqcqq qQc m m 11 22 mm1 mmm | |c  Minimize the residual: bb2q1b2Qm1e1, AxmAQmcmQm1Hmcm min bAx min r cm2cm2 mm bAxm 2 b2Qm1e1Qm1Hmcm 2  T 1/2   m   b Q e Q H c b Q e Q H c   2 m1 1 m1 m m 2 m1 1 m1 m m    beTcTHTQTQ beHc  21 mm m1m1 21 mm  T 1/2  b e H c b e H c   b e H c 21mm21mm21mm 2 1/2 Find the optimized solution min b e H c AQR cm 21 mm2 RxQTb bAx Recall Page 84 of Lecture 3. This is a linear regression problem. The shape of Hm is (m + 1)m. It can be solved by QR decomposition: Hm ΩmRm Ωm : orthogonal matrix Rm : upper (right) triangular matrix R c  b ΩT e mm2m1 Since Rm is upper (right) triangular matrix. cm can be solved by backward substitution q H21,H22 H ΩR c x 2qH22222 q H31, H32, H33 H Ω R c x 3qH33333 Krylov subspace method Generalized minimal residual method (GMRES) H Ax  b q 11 H ΩR c x 1qH11111 21 No If‖b‒Ax1‖2 <ε? No 32 If‖b‒Ax2‖2 <ε? x*=x Yes If‖b‒Ax‖ <ε? 332 43 SciPy : Generate a random sparse matrix import numpy as np import matplotlib.pyplot as plt import scipy.sparse M = 5; N = 10 # dimensions of matrix A = scipy.sparse.random(M, N, 0.5) # M*N matrix. Randomly pick 50% elements to be 0. The other 50% elements are random numbers in [0, 1) print(A) # note how the sparse matrix is stored print(A.toarray()) # convert A to format of NumPy array fig, axs = plt.subplots() axs.imshow(A.toarray(), cmap='rainbow') axs.set_xlabel('col index') axs.set_ylabel('row index') plt.tight_layout() plt.show() SciPy : Solve system of linear equations by GMRES import numpy as np import matplotlib.pyplot as plt import scipy.sparse import scipy.sparse.linalg N = 100 A = scipy.sparse.random(N, N, 0.5) + 4*scipy.sparse.eye(N) b = np.random.rand(N, 1) fig, axs = plt.subplots() axs.imshow(A.toarray(), cmap='rainbow') plt.show() guess = np.zeros(shape=b.shape) x, exitCode = scipy.sparse.linalg.gmres(A, b, x0=guess, tol=1e-10, maxiter=10000) # x0 : the initial guess of the solution # tol : tolerance |b - Ax|/|b| # maxiter : maximum steps allowed print(x) print(exitCode) # =0: successful exit; >0: number of iterations but tolerance has not been achieved; <0: illegal input or breakdown Krylov subspace method Conjugate gradient method (CG) Ax  b Only consider the case where A is symmetric and positive‐definite If the initial guess is x0, then the residual is r bAx 00 A(xx )r 00 A’ x’ b’ xK (A,b)span 2 M1  b Recall xK (A,b)xx K (A,r )spanr ,Ar ,A2r ,,AM1r  M b,Ab,A b,,A M 0M0 0000 xx span 0 2 M1 r ,Ar ,A r ,,A r  0000 So, exact solution x ‒ x0 is linear combination of r0, Ar0, A2r0, ∙∙∙, AM‒1r0 Conjugate gradient method (CG) Iterative improvement x  x We want to approach this exact solution by iteration The exact solution is 0  span 0000 2 M1 r ,Ar ,A r ,,A r  xxαr n1 n nn How to determine rn? How to determine αn? Conditions for the choice of residuals {ri} x x span 2 n1 K (A,r ) r,Ar,Ar,,A r n0 000 0n0 r bAx r A(x x )span 2 n K (A,r ) r ,Ar ,A r ,,A r  n n 0 n 0 0 0 0 0 n1 0 The 1st condition: α r K ii i1 Let k ≥ i + 1 rK span A2r 0 Conjugate gradient method (CG) Conditions for the choice of residuals {ri} xix0Ki andxi1x0Ki1 αr x x (x x )(x x )K ii i1 i i1 0 i 0 i1 k1 k r,Ar,,A r,Ar k k1 0 0 0 0 r2 K3 rk is outside Kk space. E.g. if Kk is a plane, rk must be a vector pointing out of the plane (the additional basis Akr0 leads to the out‐ of‐plane component) Ar0 r1 K2 Idea of CG Most efficient search is rk  all basis vectors of Kk: r0 r(K).Sor (K) k k k1 k Conjugate gradient method (CG) Conditions for the choice of residuals {ri} Ki+1 (Kk) (Ki+1) Kk Let k ≥ i + 1 A(αr)Ax Ax (bAx )bAx r r(K)(K ) kkk1k k1 kk1kki1 The 2nd condition: Both conditions must be satisfied when A(α r )(K ) kk i1 A‐conjugate: (αr)T A(α r )0rTAr 0 (ki) iikkik 1 2 x2A 1T1 xA1b 2 A Conjugate gradient method (CG) Understanding CG assumption ({ri} are A‐conjugate) from geometry Define A‐norm: Recall that A is symmetric and positive‐definite Seek x s.t. the A‐norm of the residual is minimum: xargmin1 xA1b2 x  xTAx A 2xA1b AxA1b2xTAxxTbbTATAxbTATb ATA1T TTT11T T1T1    2 x A x  x b  b x  b A b  2 x A x  x b  2 b A b constant So, solving system of linear equations is equivalent to finding minimum of the function f (x)  1 xT Ax  xT b 2 How does the function f(x) look like? Consider 2D case (N = 2): Conjugate gradient method (CG) Understanding CG assumption ({ri} are A‐conjugate) from geometry Seek x to minimize the function: f (x)  1 xT Ax  xT b 2 f(x ,x )1x x A A x x 1 2 1 2  11 12  1  1 x b  2  1  2AAxb  21 22  2   2  1A x21(A A )xx1A x2bxbx 2 11 1 2 12 21 1 2 2 22 2 1 1 2 2 Example: A A 1 0, b 0  11 12     1    AA01b0  21 22     2    f(x ,x )1x2 1x2 122122 Example: A A 1 0, b 0  11 12     1    AA01b0  21 22     2    Example: A A 2.5 0 , b 0  11 12     1    A A 0 0.4 b 0  21 22     2    Example: A A  1 0.8, b 0  11 12     1    A A 0.8 1 b 0  21 22     2    In 2D, N = 2, and A = I: converge by at most 2 steps. New search direction is along gradient (rapid decrease) and orthogonal to old one x0 α1r1 x x1 α0r0 rTr0, rf(x) 101 xx1 r bAx 00 So, direction of r0 is arbitrary. But, always we only need 2 steps If A ≠ I and still use orthogonal vectors, rTr0, rf(x) xxk (ki) x0 α1r1 α2r2 x3 α4r4 α0r0 x1 kik Then, we need much more steps for convergence The orthogonal vectors are not good basis vectors for searching the minimum x2 α3r3x4 x If A ≠ I and use A‐orthogonal vectors, rTAr0 (ki) ki Then, again we only need 2 steps for convergence in 2D. Why? α0r0 x0 x1 α1r1 x If A ≠ I and use A‐orthogonal vectors, rTAr0 (ki) ki Then, again we only need 2 steps for convergence in 2D. Why? f  xT Ax f  xT (FT AF1)x transformationofcoordinate: xFx  If pick F s.t. F–TAF–1 = I A = FTF, after transformation by F and under the new coordinate system (primed), the search directions should be orthogonal What does this require for {ri}? rTr0 ki rTrrTFTFr rT Ar 0 kikiki A‐conjugate In the coordinate system with A = I (primed), the search directions should be orthogonal, while in the coordinate system with A ≠ I (unprimed), the search direction should be A‐orthogonal Conjugate gradient method (CG) How to construct A‐orthogonal basis vectors? Gram‐Schmidt process Given {r0, r1, r2, ∙∙∙} Find A‐orthogonal basis set {q0, q1, q2, ∙∙∙} ur, q 000 u0 u0 u r (qTAr)q , q  11010 1 A u1 ur(qTAr)q(qTAr)q, q 220201212 u2 A  k  1 ur (qTAr)q, q uk uk kk i0 iki k u1 u2 A A qT k1 Aq  k1 k  qT Ar  (qT Ar )(qT Aq ) Conjugate gradient method (CG) How to construct A‐orthogonal basis vectors? Gram‐Schmidt process Check if {q0, q1, q2, ∙∙∙} constructed by Gram‐Schmidt process are A‐orthogonal T kk qTAq uTAu 1 kk 2 uk qTAqq0Au1 1 qTAr(qTAr)(qTAq)0 01u1 u1010100 AA A k1 qAu 1   T k k1 k i k k1 i i0  uk A  uk A  1 qT Ar(qT Ar)(qT Aq )0 uk k1 k k1 k k1 k1 A kk i0 i k i k Conjugate gradient method (CG) r bAx 00 u r, q  u0 x x αq 000u 1000 0A r bAx bAx α Aq r α Aq 1 1 000000 ur(qTAr)q, q u1 xxαq 110101u 2111 1A r r α Aq 2111 ur(qTAr)q(qTAr)q, q u2 xxαq 220201212u 3222 ∙∙∙ ∙∙∙ k  1 How to determine {αi}? ur (qTAr)q,q uk uk xxαq k1 k k k A 2A Conjugate gradient method (CG) How to determine {αi}? Search αk s.t. f(xk+1) is minimized f(x )f(x αq)1(x αq)TA(x αq)(x αq)Tb k1 k kk 2k kk k kk k kk 1qTAq α2 1xTAq 1qTAx qTbα 1xTAx xTb kkkkkkkkkkkk 2222 11 f(xαq)qTAq α xTAq qTAxqTb0 αkkk kkk2kk2kkk k αqTAq qTb xTAq qTAx 1 1 1  kkkkkkkk 22 T 1TT 1T ATA T T q b  q A x  q A x    q ( b  A x )  q r k2kk2kkkkkk α k qr T kk qk Aqk T Conjugate gradient method (CG) r bAx 00 T u r, q  u0 000u 1000 0A 1000 r r α Aq ur(qTAr)q, q u1 T xxαq 110101u 2111 r r α Aq 2111 1A α T 2 220201212u 3222 ur(qTAr)q(qTAr)q, q u2 2A T ∙∙∙ ∙∙∙ k  1 T ur (qTAr)q,q k xxαq k1 k k k kk i0 i k i k u uk T A α 0 qr α 1 qr α k qr 00 q0Aq0 x x αq T 11 q1Aq1 T qr 22 q2Aq2 kk qk Aqk xxαq N = 100 SciPy : Solve system of linear equations by CG import numpy as np import matplotlib.pyplot as plt import scipy.sparse import scipy.sparse.linalg # note that A must be a symmetric, positive-definite matrix A = scipy.sparse.random(N, N, 0.5) A = A.toarray() A = A + np.transpose(A) A = scipy.sparse.csr_matrix(A) # convert numpy array to a “compressed sparse row matrix” (csr) print(A) A += 4*scipy.sparse.eye(N) b = np.random.rand(N, 1) guess = np.zeros(shape=b.shape) x, exitCode = scipy.sparse.linalg.cg(A, b, x0=guess, tol=1e-10, maxiter=10000) print(x) print(exitCode) System of linear equations Ax  b How to solve a system of linear equations? Direct method (i) Gaussian elimination (ii) LU decomposition (iii) QR decomposition Iterative method Basics 1: Normed space Basics 2: Banach fixed point theorem Stationary methods (old) (i) Jacobi method (ii) Gauss‐Seidel method (iii) Relaxation method Krylov subspace methods (modern) (i) Generalized minimal residual (GMRES) (ii) Conjugate gradient (CG)