Microsoft PowerPoint – assignment2-background [Compatibility Mode]
High Performance Computing
Course Notes
Coursework 2
Dr Ligang He
2Computer Science, University of Warwick
Problem Domain
Coursework taken from the field of Computational Fluid
Dynamics (CFD)
Fluid dynamics based on three fundamental principles: (i) mass
is conserved; (ii) Newton’s second law (acceleration of object
dependent on net force acting on object and mass of object); (iii)
energy is conserved
These are usually expressed as partial differential equations,
showing how velocity and pressure are related, etc (called
governing equations).
In the governing equations the coordinates and time are
independent variables while velocity and pressure are dependent
variables
Computational Fluid Dynamics is the science of finding a
numerical solution to the governing equations of fluid flow,
over the discretized space or time
3Computer Science, University of Warwick
Governing Equations
4Computer Science, University of Warwick
CFD code
The code in the coursework calculates the velocity
and pressure of a 2D flow
The code writes a binary file that contains solution
values
Currently the code is sequential; the purpose of the
coursework is to parallelize the code
5Computer Science, University of Warwick
Data and stencil
The area represented as a 2D Grid (discretize)
Calculate one point in each cell
6Computer Science, University of Warwick
CFD code
jmax
There are two types of cell: fluid (C_F) and obstacle (C_B)
7Computer Science, University of Warwick
Data and stencil
• The properties of a cell in the grid
• Communication pattern: using a five-point stencil to
calculate a point
8Computer Science, University of Warwick
Numerical method for solving the
governing equations
Initialize each cell value
Check if the solution satisfies the
governing equations
If not, generate the new solution
based on a stencil of current
solutions from neighbouring cells
Iterations are advanced until the
termination condition is met
9Computer Science, University of Warwick
General Steps for using the numerical
method to solve the linear equations
Aim: solve AΦ=B
Step 1: Guess a initial solution Φ0
Step 2: Check if convergence is reached by checking the
residual B-AΦi
e.g., f(Φ(i))=Φ(i)+(B-AΦ(i))
Key: repeating iterative steps and each step generates a
better approximation of the solution
10Computer Science, University of Warwick
Successive Over-Relaxation(SOR)
The SOR method can speed up convergence
For a set of linear equations: AΦ=B
let A=D+U+L, where D, L and U denote the diagonal, strictly
lower triangular, and strictly upper triangular parts of A,
respectively
The successive over-relaxation (SOR) iteration is defined by the
following
Where values of ω > 1 are used to speedup convergence of a
slow-converging process, while values of ω < 1 are often help
establish convergence of diverging iterative process
11Computer Science, University of Warwick
SOR
Successive Over-Relaxation
(SOR) method is used to speed up
convergence
12Computer Science, University of Warwick
SOR
• SOR is originally written so that
calculating p_new(i, j) depends
on
1) a stencil of current solutions
from neighbouring cells,
2) p_new(i-1, j),
3) p_new(i, j-1).
• This introduces a chain of
dependencies for each cell
• Fine on sequential machines:
just calculate values from left to
right, and top to bottom
for all i imax, j jmax, do
pij
(k+1)=f(P(k), pi-1,j
(k+1), pi,j-1
(k+1))
P(k) is the solution at the iteration k (P is a matrix [pij])
13Computer Science, University of Warwick
SOR
• SOR is originally written so that
calculating p_new(i, j) depends
on
1) a stencil of current solutions
from neighbouring cells,
2) p_new(i-1, j),
3) p_new(i, j-1).
• This introduces a chain of
dependencies for each cell
• Fine on sequential machines:
just calculate values from left to
right, and top to bottom
P(k) is the solution at the iteration k (P is a matrix [pij])
14Computer Science, University of Warwick
SOR
• SOR is originally written so that
calculating p_new(i, j) depends
on
1) a stencil of current solutions
from neighbouring cells,
2) p_new(i-1, j),
3) p_new(i, j-1).
• This introduces a chain of
dependencies for each cell
• Fine on sequential machines:
just calculate values from left to
right, and top to bottom
P(k) is the solution at the iteration k (P is a matrix [pij])
15Computer Science, University of Warwick
SOR
• SOR is originally written so that
calculating p_new(i, j) depends
on
1) a stencil of current solutions
from neighbouring cells,
2) p_new(i-1, j),
3) p_new(i, j-1).
• This introduces a chain of
dependencies for each cell
• Fine on sequential machines:
just calculate values from left to
right, and top to bottom
P(k) is the solution at the iteration k (P is a matrix [pij])
16Computer Science, University of Warwick
SOR
• This approach forms a “wave
front”
• It does not make for a good
parallel solution
• If we implement a parallel wave-
front then the work is highly
unbalanced
• The fundamental reason is that
the number of cells to be
computed varies in each step
17Computer Science, University of Warwick
Red/Black SOR
• Another solution that achieves the same
result is red/black ordering
• Rewrite the SOR equations so the cells
are divided into two populations.
• Red is calculated first (using current
black values)
• Then black is calculated using the just
updated red values.
Red cells:
pij
(k+1)=f(pi-1,j
(k), pi,j-1
(k), pi+1,j
(k) , pi,j+1
(k))
Black cells:
pij
(k+1)=f(pi-1,j
(k+1), pi,j-1
(k+1), pi+1,j
(k+1), pi,j+1
(k+1))
Note: pi-1,j
(k+1), pi,j-1
(k+1), … are red cells
• Since the stencil for any cell is the cell itself
and cells of opposite color, they can all be
updated in parallel
18Computer Science, University of Warwick
Domain decomposition
• Make each processor responsible for
updating a block of cells
• At the end of each iteration it must send the
cells at the edge of its domain to its
neighbours, and receive a copy of the edge
cells from its neighbours
Why?
• You need to think whether you implement 1D
or 2D decomposition
19Computer Science, University of Warwick
The main loop
for (t = 0.0; t< t_end; t = t + delt)
{
1.Calculate an approximate time-step size by seeing how much
movement occurred in the last time-step. The discrete approximation
is only stable when the maximum motion < 1 cell per time-step.
2.For each cell, compute a tentative new velocity (f,g) based on the
previous (u,v) values. It takes as input the u, v and flag matrices, and
updates the f and g matrices.
3.For each cell calculate the RHS of the pressure equation. This uses
two f and g values. It takes as input the f, g, and flag matrices and
updates the RHS matrix.
4.For the entire pressure matrix, use Red/Black SOR to solve the
Poisson equation. This takes a large number iterations of the
Red/Black process as shown in the slides:. It takes as input the
current pressure matrix, flag matrix, and the RHS matrix and outputs
a new pressure matrix
20Computer Science, University of Warwick
The main loop
5. For each cell, update the real (u,v) values based on the pressure
matrix and the tentative velocity values (f,g). It takes as input the
pressure, f, g and flag matrices and updates the u and v matrices.
6. For each cell that is adjacent to an edge cell, the (u,v) values of
the boundary cells are updated (by taking values from their
neighbours). It takes as input the u, v and flag matrices and
updates the u and v matrices
}
Note that it is not going to be worth parallelizing the whole
program, just parts of it.
21Computer Science, University of Warwick
Assignment 2 – What to Do in General
- You will be given a serial program, called Karman
- You need to parallelize the Karman code and write a
report
- 80% of the full mark comes from the parallelization
using MPI
- 20% of the mark comes from the parallelization using
both MPI and OpenMP (a hybrid approach), i.e.,
parallelizing the computations in one machine using
OpenMP, while parallelizing the computations across
machines using MPI
22Computer Science, University of Warwick
Assignment 2 – Detailed Requirements for Programming
• Parallelize the Karman code, using a pure MPI
approach (80% of marks) and a hybrid MPI-OpenMP
approach (20% marks)
• Profiling which Karman functions are more time
consuming
• Design your decomposition strategy, date exchange
strategies between neighboring partitions, the MPI
functions (e.g., collective operations)
• For a hybrid approach, design the parallelization
strategy for OpenMP
23Computer Science, University of Warwick
• Benchmarking the execution time of your parallel
program as you increase the number of processes
• If you develop a hybrid MPI-OpenMP implementation,
• benchmark the runtime of OpenMP code
• Compare the performance between OpenMP and MPI
• Ensure the simulation results are the same after the
parallelization
• Your parallel program should contain the adequate
comments as the evidence of good programming
practice.
Assignment 2 – Detailed Requirements for Programming
24Computer Science, University of Warwick
Assignment 2 – Requirements for Report
• Profiling the functions to determine which functions
are more time consuming; discuss the profiling
results
• Discuss your MPI implementation, for example,
• your decomposition strategy and load balancing strategy;
• your strategy of exchanging the boundary data between
neighboring partitions;
• collective operations you used;
• Benchmark the change in execution times of your
parallel program as you increase the number of
processes; present the results in graph or table
25Computer Science, University of Warwick
Assignment 2 – Requirements for Report
• Analyze and discuss the performance of your parallel
program
• If you develop the hybrid MPI-OpenMP
implementation,
• Discuss OpenMP implementation
• Benchmark the runtime of the OpenMP codes and present the
results
• Discuss the performance comparison between the hybrid
approach and the pure MPI approach
• Writing skills: pay attention to spelling, punctuation
and grammar
26Computer Science, University of Warwick
Assignment 2 – Submission
- Submit a compressed folder (.zip) to Tabula; the
folder should contain
- your parallel code (the directory structure used in the project
should not change)
- your report (pdf format and place your report in the top level
directory in the folder)
- Deadline: 12pm, March 14th, 2018