
Lecture 10: Parallel Preconditioning
Kirk M. Soodhalter
Trinity College Dublin, The University of Dublin, Ireland
http://math.soodhalter.com

Class Business
• Any questions from yesterday?

Previously in Parallel Numerics…
• Forgotten topic: Incorporating boundary conditions
• Sparse matrix-vector product in parallel
• Preconditioning
• Preconditioned CG
• Parallel CG with preconditioning

Overview for today
• Afterthoughts from last lecture
• Jacobi vs. Gauss-Seidel – parallelizability vs. information communication
• Compromise between parallel computations and communication
• Synchronous versus asynchronous methods
• Graph colorings
• Red-black Gauss-Seidel
• Polynomial Preconditioning

How to deal with boundary conditions – Dirichlet
In a finite difference scheme, how do we incorporate boundary conditions?

[Figure: five-point stencil with points φ(x, y), φ(x ± a, y), φ(x, y ± a)]

• Dirichlet is straightforward. The value at the associated stencil point is known, so we adjust the right-hand side (see the sketch below), i.e.,

$\frac{1}{h^2}\left(4\varphi_{11} - \varphi_{21} - \varphi_{12} - g_{01} - g_{10}\right) = f_{11}$

becomes

$\frac{1}{h^2}\left(4\varphi_{11} - \varphi_{21} - \varphi_{12}\right) = f_{11} + \frac{1}{h^2}\left(g_{01} + g_{10}\right),$

where $g_{ij} = g(x_i, y_j)$.
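To make the bookkeeping concrete, here is a minimal NumPy sketch of this right-hand-side adjustment for the five-point system on an interior grid. This is illustrative code, not from the lecture; the function name `dirichlet_rhs`, its argument layout, and the row/column orientation are assumptions.

```python
import numpy as np

def dirichlet_rhs(f, g_left, g_right, g_bottom, g_top, h):
    """Fold known Dirichlet boundary values into the RHS of the 5-point
    system (1/h^2)(4*phi_ij - neighbors) = f_ij on an n-by-n interior grid.
    f: (n, n) source values at interior nodes; g_*: boundary values, length n."""
    b = f.copy()
    # Each interior node next to the boundary has a known neighbor value,
    # which moves to the right-hand side with weight 1/h^2.
    b[0, :] += g_bottom / h**2   # neighbors below the first interior row
    b[-1, :] += g_top / h**2     # neighbors above the last interior row
    b[:, 0] += g_left / h**2
    b[:, -1] += g_right / h**2
    return b
```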

How to deal with boundary conditions – Neumann
In a finite difference scheme, how do we incorporate boundary conditions?

[Figure: five-point stencil with points φ(x, y), φ(x ± a, y), φ(x, y ± a)]

• Focus on the equation for the unknown $\varphi_{21} = \varphi(x_2, y_1)$
• Neumann conditions require ghost points; the normal derivative value is known: $\frac{\partial \varphi}{\partial n} = g$, with $g_{ij} = g(x_i, y_j)$
• Approximate the normal derivative with a 2nd-order central difference: $\frac{\varphi_{22} - \varphi_{20}}{2h}$
• We have two equations:

$4\varphi_{21} - \varphi_{20} - \varphi_{22} - \varphi_{31} - \varphi_{11} = h^2 f_{21}$
$\varphi_{22} - \varphi_{20} = 2h\, g_{21}$
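The ghost value can then be eliminated, a step the slide leaves implicit: the Neumann condition gives $\varphi_{20} = \varphi_{22} - 2h g_{21}$, and substituting into the stencil equation yields

$4\varphi_{21} - (\varphi_{22} - 2h g_{21}) - \varphi_{22} - \varphi_{31} - \varphi_{11} = h^2 f_{21}
\;\Longrightarrow\;
4\varphi_{21} - 2\varphi_{22} - \varphi_{31} - \varphi_{11} = h^2 f_{21} - 2h g_{21},$

a single boundary equation involving interior unknowns only.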

How to deal with boundary conditions – Neumann at a corner point
Observe that at corners of the boundary, a unique normal vector (and therefore normal derivative) does not exist.
• We take the average of the two possible normal vectors
• Focus on the equation for the unknown $\varphi_{11} = \varphi(x_1, y_1)$
• Horizontal normal derivative via a 2nd-order difference: $\frac{\varphi_{01} - \varphi_{21}}{2h}$
• Vertical normal derivative via a 2nd-order difference: $\frac{\varphi_{10} - \varphi_{12}}{2h}$
• Averaged approximate normal derivative: $\frac{1}{2}\left(\frac{\varphi_{01} - \varphi_{21}}{2h} + \frac{\varphi_{10} - \varphi_{12}}{2h}\right) = \frac{1}{2}\left(g_{10} + g_{01}\right)$
• We have two equations:

$4\varphi_{11} - \varphi_{10} - \varphi_{21} - \varphi_{12} - \varphi_{01} = h^2 f_{11}$
$\varphi_{01} - \varphi_{21} + \varphi_{10} - \varphi_{12} = 2h\left(g_{10} + g_{01}\right)$

Recall Gauss-Seidel iteration
Recall
The (forward) Gauss-Seidel (GS) iteration determines the ith component of $x_{k+1}$ by annihilating the ith component of $r_k = b - Ax_k$, like Jacobi, BUT each component of $x_k$ is overwritten immediately.
• $a_{ii}\xi_i^{(k+1)} = -\sum_{j=1}^{i-1} a_{ij}\xi_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}\xi_j^{(k)} + b_i$
• Each iteration: sweep through all coordinates once; overwrite $x^{(k)}$ component-wise immediately
[Animation: the components of $r = b - Ax$ are annihilated one at a time, $r_1, r_2, \dots, r_5$; each new component of x takes effect before the next is computed]

Compare with Jacobi iteration
Recall
The Jacobi iteration determines the ith component of $x_{k+1}$ by annihilating the ith component of $r_k = b - Ax_k$, but does not overwrite $x_k$ until after a full sweep.
• $a_{ii}\xi_i^{(k+1)} = -\sum_{j=1,\, j\neq i}^{n} a_{ij}\xi_j^{(k)} + b_i$
• Each iteration: sweep through all coordinates once; collect the new components in $x_{\text{temp}}$ and overwrite $x^{(k)}$ only after the full sweep
[Animation: the components $r_1, \dots, r_5$ are annihilated one at a time, but the new values accumulate in $x_{\text{temp}}$; x itself is untouched until the sweep completes]
A minimal sketch contrasting the two sweeps follows below.
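Here is an illustrative NumPy sketch (not from the slides) of the contrast: Jacobi writes into a separate buffer, the $x_{\text{temp}}$ above, while Gauss-Seidel overwrites x in place so that later components see earlier updates within the same sweep.

```python
import numpy as np

def jacobi_sweep(A, b, x):
    """One Jacobi sweep: every update reads the old iterate x."""
    n = len(b)
    x_new = np.empty_like(x)              # the separate buffer x_temp
    for i in range(n):                    # independent across i: parallelizable
        s = A[i, :] @ x - A[i, i] * x[i]  # sum_{j != i} a_ij x_j
        x_new[i] = (b[i] - s) / A[i, i]
    return x_new                          # x is only replaced after the sweep

def gauss_seidel_sweep(A, b, x):
    """One forward Gauss-Seidel sweep: updates take effect immediately."""
    n = len(b)
    for i in range(n):                    # a sequential dependence chain
        s = A[i, :] @ x - A[i, i] * x[i]  # x[:i] already holds new values
        x[i] = (b[i] - s) / A[i, i]
    return x
```

Note that the Jacobi loop body is independent across i and could be distributed, while the Gauss-Seidel loop must run in order.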

About Jacobi and Gauss-Seidel (block or scalar)
• Jacobi: duplicate storage is needed for $x_{\text{temp}}$
• Jacobi: components are independent of one another and can be computed independently
• Gauss-Seidel: forward and reverse sweeps mean each component of x depends on all previously updated components
• Gauss-Seidel: instantaneous updating leads to faster convergence
• Gauss-Seidel: satisfies the Gauss principle

Definition (Gauss Principle)
An iterative method in which new partial results are immediately used for subsequent calculations is said to satisfy the Gauss principle.

How to effectively parallelize
• Jacobi and block Jacobi are inherently fully parallel (see the block-Jacobi sketch below)
  → Each processor solves its own Jacobi (block) subproblem
  → The update of the approximation happens at the end
  → Disadvantage: all processors must finish before updating and moving to the next sweep
  → Disadvantage: large communication choke points
  → Disadvantage: slower convergence, because the Gauss principle is not satisfied
• Parallelizing Gauss-Seidel (G-S)
  → Each processor solves its own G-S (block) subproblem
  → The update of the unknowns happens instantaneously
  → Disadvantage: forward and backward G-S are strongly sequential
  → Disadvantage: the new unknowns must be communicated at each iteration
  → Advantage: satisfies the Gauss principle; faster convergence; a more effective preconditioner
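As referenced above, a block-Jacobi sweep might look like the following sketch (illustrative; `blocks`, an assumed partition of the index set, would come from the processor decomposition):

```python
import numpy as np

def block_jacobi_sweep(A, b, x, blocks):
    """One block-Jacobi sweep. Each diagonal block is solved against the
    old iterate, so every block solve could run on its own processor."""
    x_new = x.copy()
    for idx in blocks:                          # idx: index array of one block
        r = b[idx] - A[idx, :] @ x              # block residual w.r.t. old x
        x_new[idx] = x[idx] + np.linalg.solve(A[np.ix_(idx, idx)], r)
    return x_new

# Example partition into p contiguous blocks:
# blocks = np.array_split(np.arange(len(b)), p)
```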

Graph Colorings
The best of both worlds
We would like the parallelizable characteristics of Jacobi but with an iteration that satisfies the Gauss principle, to reduce communication bottlenecks and waiting times. For this we turn to the idea of colorings.
Definition (Graph coloring)
Let G be a graph. A coloring of G is an assignment of labels (often called "colors") to elements of the graph according to some constraints. A vertex coloring assigns colors to the vertices of a graph such that no two vertices sharing an edge have the same color.
• Vertices of the same color correspond to unknowns which do not depend on one another for an update (a greedy-coloring sketch follows below)
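A minimal greedy vertex coloring, for illustration (the adjacency-list input format is an assumption); the resulting color classes are exactly the groups of unknowns that can be updated simultaneously:

```python
def greedy_coloring(adj):
    """Greedy vertex coloring of a graph given as {node: set(neighbors)}.
    Returns {node: color}; nodes of equal color share no edge."""
    color = {}
    for v in adj:                        # any fixed vertex order
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:                 # smallest color unused by neighbors
            c += 1
        color[v] = c
    return color

# Example: the 1-D chain 0-1-2-3 is 2-colorable (red-black).
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(greedy_coloring(chain))            # {0: 0, 1: 1, 2: 0, 3: 1}
```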

Red-black Gauss-Seidel
• We break the sequential ordering of Gauss-Seidel using a coloring of the adjacency graph induced by the 2-D Laplacian
• For the five-point stencil, the adjacency graph is simply a 2-dimensional grid of nodes (each connected to its 4 nearest neighbors)
• A minimal vertex coloring of this graph simply alternates red and black nodes
• With this coloring, a step of Gauss-Seidel can be done in parallel for all black nodes, as they only depend on the red nodes
• A Gauss-Seidel step can then be done in parallel for all red nodes, using only information from the black nodes
[Figure: red-black checkerboard grid; red nodes depend only on black, and vice-versa]
• Generalization: multi-color orderings

Multicolor Orderings
More generally:
• More general, complex matrices/graphs require more colors
  ⇒ multiple phases of parallel operations
• Nodes must also be partitioned among tasks, and the load should be balanced for each color
• Reordering nodes may affect the convergence rate, however, so the gain in parallel performance may be offset somewhat by slower convergence
  → Tradeoff: just as we sometimes exchange increased floating-point operations for reduced communication or better cache use (see BLAS-3), here we also may do something mathematically more costly to gain an advantage in parallel computing.

Sparse triangular systems
Multicolorings to identify unknown groupings
More generally, a multicolor ordering of the graph of a matrix enhances the parallel performance of sparse triangular solves by identifying sets of solution components that can be computed simultaneously, rather than in the usual sequential order for a triangular solve (a level-scheduling sketch follows below).
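One common way to expose this parallelism is level scheduling: components whose dependencies are already resolved form a "level" and can be computed at the same time. An illustrative sketch (not from the slides) for a sparse lower-triangular matrix in CSR format:

```python
import numpy as np
from scipy.sparse import csr_matrix

def levels(L):
    """Level scheduling for a lower-triangular CSR matrix L.
    lev[i] = 1 + max(lev[j]) over the off-diagonal dependencies j < i of
    row i; unknowns within one level can be solved simultaneously."""
    n = L.shape[0]
    lev = np.zeros(n, dtype=int)
    for i in range(n):
        row = L.indices[L.indptr[i]:L.indptr[i + 1]]   # columns in row i
        deps = row[row < i]
        lev[i] = 1 + lev[deps].max() if deps.size else 0
    return lev

# A bidiagonal L couples i to i-1, so every level holds one unknown:
L = csr_matrix(np.eye(4) + np.diag(np.ones(3), -1))
print(levels(L))   # [0 1 2 3]
```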

Red-black Gauss-Seidel

Algorithm: Red-black (even-odd) Gauss-Seidel
for i = 2, 3, …, n − 1 do
    for j = 2, 3, …, n − 1 do
        if i + j ≡ 0 (mod 2) then
            Update Φ(i, j) according to the G-S scheme
        end
    end
end
for i = 2, 3, …, n − 1 do
    for j = 2, 3, …, n − 1 do
        if i + j ≡ 1 (mod 2) then
            Update Φ(i, j) according to the G-S scheme
        end
    end
end
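In NumPy, the two half-sweeps become strided-slice updates over each row, which is exactly the independence the coloring exposes. A sketch for the model problem, under stated assumptions: boundary values are already stored in the edge entries of `phi`, and the convention is (1/h²)(4φ − neighbors) = f as in the earlier slides.

```python
import numpy as np

def red_black_gs(phi, f, h, sweeps=1):
    """Red-black Gauss-Seidel for the 5-point Laplacian:
    phi[i, j] <- (sum of 4 neighbors + h^2 f[i, j]) / 4, one color at a time."""
    for _ in range(sweeps):
        for parity in (0, 1):                  # 0 = "red", 1 = "black"
            for i in range(1, phi.shape[0] - 1):
                j0 = 1 + (i + 1 + parity) % 2  # first interior j of this color
                phi[i, j0:-1:2] = 0.25 * (phi[i - 1, j0:-1:2]
                                          + phi[i + 1, j0:-1:2]
                                          + phi[i, j0 - 1:-2:2]
                                          + phi[i, j0 + 1::2]
                                          + h * h * f[i, j0:-1:2])
    return phi
```

Within one color, every referenced neighbor has the other color, so each vectorized row update is independent and could be distributed.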

If we actually reorder unknowns
Induced matrix structure
This reordering induces (implicitly or explicitly, depending on your goal) a linear system with the structure

$\begin{pmatrix} D_1 & F \\ E & D_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$

with $D_1$ and $D_2$ diagonal.
[Figure: red-black ordering of the 2-D grid, the adjacency graph G(A), and the sparsity pattern of the reordered matrix A: two diagonal blocks coupled by the off-diagonal blocks F and E]

Solving the reordered system via block LU
• We can decompose

$\begin{pmatrix} D_1 & F \\ E & D_2 \end{pmatrix} = \begin{pmatrix} I & 0 \\ E D_1^{-1} & I \end{pmatrix} \begin{pmatrix} D_1 & F \\ 0 & D_2 - E D_1^{-1} F \end{pmatrix}$

• We solve

$\begin{pmatrix} D_1 & F \\ 0 & D_2 - E D_1^{-1} F \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 - E D_1^{-1} b_1 \end{pmatrix}$

• $x_2 = \left(D_2 - E D_1^{-1} F\right)^{-1}\left(b_2 - E D_1^{-1} b_1\right)$: a symmetric solve (maybe Conjugate Gradients…)
• $x_1 = D_1^{-1}\left(b_1 - F x_2\right)$: just inverting the diagonal (a sketch follows below)
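A dense NumPy sketch of this block-LU solve, for illustration only; a real implementation would keep E, F sparse, apply $D_1^{-1}$ as a vector of reciprocals, and replace the direct Schur-complement solve with CG:

```python
import numpy as np

def red_black_solve(D1, F, E, D2, b1, b2):
    """Solve [[D1, F], [E, D2]] [x1; x2] = [b1; b2] with D1, D2 diagonal
    (passed as 1-D arrays) via the block LU / Schur complement approach."""
    D1_inv = 1.0 / D1                                # inverting the diagonal
    S = np.diag(D2) - E @ (D1_inv[:, None] * F)      # Schur complement D2 - E D1^{-1} F
    x2 = np.linalg.solve(S, b2 - E @ (D1_inv * b1))  # symmetric solve (could be CG)
    x1 = D1_inv * (b1 - F @ x2)                      # back substitution
    return x1, x2
```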

Asynchronous or chaotic relaxation
• Using updated values of solution components in Gauss-Seidel (and SOR) methods improves the convergence rate, but limits parallelism and requires synchronization
• Alternatively, in computing the next iterate, each processor could use the most recent value it has for each solution component, rather than waiting for the latest value from any processor
• This approach, sometimes called asynchronous or chaotic relaxation, can be effective, but its stochastic behavior complicates the analysis of convergence and of the convergence rate
Note
Some of the convergence deterioration experienced with red-black G-S can be mitigated by switching to a red-black version of SOR with a properly tuned ω parameter.

Polynomial preconditioning
We again use the fact that Cayley-Hamilton tells us $A^{-1} = p(A)$ for some polynomial p to now derive a preconditioner.
Definition
A polynomial preconditioner is defined directly as $M^{-1} = s(A)$, where s(x) is (for now) a fixed, usually low-degree polynomial. Thus the preconditioned system becomes
$s(A)\,A\,x = s(A)\,b$
Examples of polynomial preconditioning strategies
• Neumann polynomials: approximate $A^{-1}$ using a Neumann series expansion built from $\omega A = I - (I - \omega A)$ (see the sketch below)
• Chebyshev polynomials
• Least-squares polynomials: an approximation of $A^{-1}$ determined by LS-approximation
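As a concrete instance of the first strategy, the truncated Neumann series gives $M^{-1} = s(A) = \omega \sum_{j=0}^{m} (I - \omega A)^j \approx A^{-1}$. A matrix-free sketch (illustrative; ω is assumed chosen so that the spectral radius of $I - \omega A$ is below 1):

```python
import numpy as np

def neumann_precond(A, r, omega, degree):
    """Apply z = M^{-1} r with M^{-1} = omega * sum_{j=0}^{degree} (I - omega A)^j,
    the truncated Neumann series for A^{-1}. Requires rho(I - omega*A) < 1."""
    z = r.copy()                        # z accumulates sum_j (I - omega A)^j r
    for _ in range(degree):
        z = r + z - omega * (A @ z)     # Horner step: z <- r + (I - omega A) z
    return omega * z
```

Only matrix-vector products with A are needed, which is what makes polynomial preconditioners attractive in parallel.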

Chebyshev Polynomials I
Basic idea: choose s(x) to be optimal (in some sense). Let σ(A) denote the spectrum of A.
We want to solve
Find $s \in \mathbb{P}_k$ that minimizes $\max_{\lambda\in\sigma(A)} |1 - \lambda s(\lambda)|$, where $\mathbb{P}_k$ is the space of polynomials with degree not exceeding k.
Getting eigenvalues is harder than solving the linear system! So instead…
We will solve
Find $s \in \mathbb{P}_k$ that minimizes $\max_{\lambda\in E} |1 - \lambda s(\lambda)|$, where E is some continuous set containing σ(A).
If A is SPD, this reduces to
Find $s \in \mathbb{P}_k$ that minimizes $\max_{\lambda\in[\alpha,\beta]} |1 - \lambda s(\lambda)|$, where σ(A) ⊂ [α, β].

Chebyshev Polynomials II
Definition (Chebyshev Polynomials of the first kind)
The Chebyshev polynomial of degree k of the first kind is defined by $C_k(t) = \cos(k \arccos t)$ for −1 ≤ t ≤ 1. One can prove by induction that this is a polynomial. These polynomials satisfy the three-term recurrence $C_{k+1}(t) = 2tC_k(t) - C_{k-1}(t)$.
Theorem
Let $\sigma_k = C_k\!\left(\frac{\beta+\alpha}{\beta-\alpha}\right)$, $\theta = \frac{\beta+\alpha}{2}$, and $\delta = \frac{\beta-\alpha}{2}$. Then the polynomial $T_k$ satisfying
$T_k = \underset{T\in\mathbb{P}_k,\,T(0)=1}{\arg\min}\ \max_{t\in[\alpha,\beta]} |T(t)|$
is
$T_k(t) = \frac{1}{\sigma_k} C_k\!\left(\frac{\theta - t}{\delta}\right),$
and we can rewrite $\sigma_k = C_k\!\left(\frac{\theta}{\delta}\right)$.

Efficient computation of these minimizing polynomials
3-term recurrences
We have $\sigma_{k+1} = 2\frac{\theta}{\delta}\sigma_k - \sigma_{k-1}$ with $\sigma_1 = \frac{\theta}{\delta}$ and $\sigma_0 = 1$, and
$T_{k+1}(t) = \frac{1}{\sigma_{k+1}}\left[2\sigma_k\,\frac{\theta - t}{\delta}\,T_k(t) - \sigma_{k-1}T_{k-1}(t)\right]$
with $T_1(t) = 1 - \frac{t}{\theta}$ and $T_0(t) = 1$.
A more concise recurrence
Set $\rho_k = \frac{1}{2\sigma_1 - \rho_{k-1}}$ with $\rho_0 = \frac{1}{\sigma_1}$. Then we have
$T_{k+1}(t) = \rho_k\left[2\left(\sigma_1 - \frac{t}{\delta}\right)T_k(t) - \rho_{k-1}T_{k-1}(t)\right].$

Implicit multiplication by polynomial

Algorithm: Chebyshev Acceleration
r_0 = b − A x_0
σ_1 = θ/δ
ρ_0 = 1/σ_1
d_0 = (1/θ) r_0
for i = 0, 1, … until convergence or preconditioner degree k do
    x_{i+1} = x_i + d_i
    r_{i+1} = r_i − A d_i
    ρ_{i+1} = (2σ_1 − ρ_i)^{−1}
    d_{i+1} = ρ_{i+1} ρ_i d_i + (2ρ_{i+1}/δ) r_{i+1}
end
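A direct NumPy transcription of the algorithm above (illustrative; α and β are assumed bounds on the spectrum of the SPD matrix A):

```python
import numpy as np

def chebyshev(A, b, x0, alpha, beta, degree):
    """Chebyshev acceleration for SPD A with spectrum in [alpha, beta]."""
    theta, delta = (beta + alpha) / 2, (beta - alpha) / 2
    sigma1 = theta / delta
    rho = 1.0 / sigma1
    x = x0.copy()
    r = b - A @ x
    d = r / theta
    for _ in range(degree):
        x = x + d
        r = r - A @ d
        rho_next = 1.0 / (2 * sigma1 - rho)
        d = rho_next * rho * d + (2 * rho_next / delta) * r
        rho = rho_next
    return x
```

Used as a preconditioner, one fixes the degree k and applies this loop to a residual in place of b, starting from x0 = 0.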

Filtering some eigenvalues
Definition (Filtering)
One can also use Chebyshev polynomial preconditioners to reduce only part of the spectrum, allowing an algorithm like CG to focus on another part. This process is called filtering.
Example (ChebFilterCG)
If we know the majority of the eigenvalues lie in an interval [α, β] bounded away from zero, then we can construct a Chebyshev polynomial preconditioner that reduces the residual norm in the associated eigendirections (essentially clustering the preconditioned eigenvalues near 1), which then allows the CG method to concentrate on constructing a residual polynomial for the remaining eigenvalues near zero.

What did we talk about today?
• Graph colorings to determine unknown dependencies
• Trade-offs between total parallelism and some information exchange
• Red-black Gauss-Seidel
• Asynchronous methods
• Polynomial preconditioning (Chebyshev)
Further reading
• Chapter 12 in Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003.
• Available legally online at: http://www-users.cs.umn.edu/~saad/books.html

Next time: an actual return to parallelism!!!!
• Multigrid
• Parallel multigrid
• Domain decomposition