程序代写代做代考 chain Microsoft PowerPoint – assignment2-background [Compatibility Mode]

Microsoft PowerPoint – assignment2-background [Compatibility Mode]

High Performance Computing
Course Notes

Coursework 2

Dr Ligang He

2Computer Science, University of Warwick

Problem Domain

Coursework taken from the field of Computational Fluid
Dynamics (CFD)

 Fluid dynamics based on three fundamental principles: (i) mass
is conserved; (ii) Newton’s second law (acceleration of object
dependent on net force acting on object and mass of object); (iii)
energy is conserved

 These are usually expressed as partial differential equations,
showing how velocity and pressure are related, etc (called
governing equations).

 In the governing equations the coordinates and time are
independent variables while velocity and pressure are dependent
variables

Computational Fluid Dynamics is the science of finding a
numerical solution to the governing equations of fluid flow,
over the discretized space or time

3Computer Science, University of Warwick

Governing Equations

4Computer Science, University of Warwick

CFD code

The code in the coursework calculates the velocity
and pressure of a 2D flow

The code writes a binary file that contains solution
values

Currently the code is sequential; the purpose of the
coursework is to parallelize the code

5Computer Science, University of Warwick

Data and stencil

The area represented as a 2D Grid (discretize)

Calculate one point in each cell

6Computer Science, University of Warwick

CFD code

jmax

There are two types of cell: fluid (C_F) and obstacle (C_B)

7Computer Science, University of Warwick

Data and stencil

• The properties of a cell in the grid

• Communication pattern: using a five-point stencil to
calculate a point

8Computer Science, University of Warwick

Numerical method for solving the
governing equations

Initialize each cell value

Check if the solution satisfies the
governing equations

If not, generate the new solution
based on a stencil of current
solutions from neighbouring cells

Iterations are advanced until the
termination condition is met

9Computer Science, University of Warwick

General Steps for using the numerical
method to solve the linear equations

 Aim: solve AΦ=B

 Step 1: Guess a initial solution Φ0

 Step 2: Check if convergence is reached by checking the
residual B-AΦi=0), Φ(i+1)= f(Φ(i)), go to Step 2

e.g., f(Φ(i))=Φ(i)+(B-AΦ(i))

 Key: repeating iterative steps and each step generates a
better approximation of the solution

10Computer Science, University of Warwick

Successive Over-Relaxation(SOR)

 The SOR method can speed up convergence

 For a set of linear equations: AΦ=B

 let A=D+U+L, where D, L and U denote the diagonal, strictly
lower triangular, and strictly upper triangular parts of A,
respectively

 The successive over-relaxation (SOR) iteration is defined by the
following

 Where values of ω > 1 are used to speedup convergence of a
slow-converging process, while values of ω < 1 are often help establish convergence of diverging iterative process 11Computer Science, University of Warwick SOR Successive Over-Relaxation (SOR) method is used to speed up convergence 12Computer Science, University of Warwick SOR • SOR is originally written so that calculating p_new(i, j) depends on 1) a stencil of current solutions from neighbouring cells, 2) p_new(i-1, j), 3) p_new(i, j-1). • This introduces a chain of dependencies for each cell • Fine on sequential machines: just calculate values from left to right, and top to bottom for all i  imax, j  jmax, do pij (k+1)=f(P(k), pi-1,j (k+1), pi,j-1 (k+1)) P(k) is the solution at the iteration k (P is a matrix [pij]) 13Computer Science, University of Warwick SOR • SOR is originally written so that calculating p_new(i, j) depends on 1) a stencil of current solutions from neighbouring cells, 2) p_new(i-1, j), 3) p_new(i, j-1). • This introduces a chain of dependencies for each cell • Fine on sequential machines: just calculate values from left to right, and top to bottom P(k) is the solution at the iteration k (P is a matrix [pij]) 14Computer Science, University of Warwick SOR • SOR is originally written so that calculating p_new(i, j) depends on 1) a stencil of current solutions from neighbouring cells, 2) p_new(i-1, j), 3) p_new(i, j-1). • This introduces a chain of dependencies for each cell • Fine on sequential machines: just calculate values from left to right, and top to bottom P(k) is the solution at the iteration k (P is a matrix [pij]) 15Computer Science, University of Warwick SOR • SOR is originally written so that calculating p_new(i, j) depends on 1) a stencil of current solutions from neighbouring cells, 2) p_new(i-1, j), 3) p_new(i, j-1). • This introduces a chain of dependencies for each cell • Fine on sequential machines: just calculate values from left to right, and top to bottom P(k) is the solution at the iteration k (P is a matrix [pij]) 16Computer Science, University of Warwick SOR • This approach forms a “wave front” • It does not make for a good parallel solution • If we implement a parallel wave- front then the work is highly unbalanced • The fundamental reason is that the number of cells to be computed varies in each step 17Computer Science, University of Warwick Red/Black SOR • Another solution that achieves the same result is red/black ordering • Rewrite the SOR equations so the cells are divided into two populations. • Red is calculated first (using current black values) • Then black is calculated using the just updated red values. Red cells: pij (k+1)=f(pi-1,j (k), pi,j-1 (k), pi+1,j (k) , pi,j+1 (k)) Black cells: pij (k+1)=f(pi-1,j (k+1), pi,j-1 (k+1), pi+1,j (k+1), pi,j+1 (k+1)) Note: pi-1,j (k+1), pi,j-1 (k+1), … are red cells • Since the stencil for any cell is the cell itself and cells of opposite color, they can all be updated in parallel 18Computer Science, University of Warwick Domain decomposition • Make each processor responsible for updating a block of cells • At the end of each iteration it must send the cells at the edge of its domain to its neighbours, and receive a copy of the edge cells from its neighbours Why? • You need to think whether you implement 1D or 2D decomposition 19Computer Science, University of Warwick The main loop for (t = 0.0; t< t_end; t = t + delt) { 1.Calculate an approximate time-step size by seeing how much movement occurred in the last time-step. The discrete approximation is only stable when the maximum motion < 1 cell per time-step. 2.For each cell, compute a tentative new velocity (f,g) based on the previous (u,v) values. It takes as input the u, v and flag matrices, and updates the f and g matrices. 3.For each cell calculate the RHS of the pressure equation. This uses two f and g values. It takes as input the f, g, and flag matrices and updates the RHS matrix. 4.For the entire pressure matrix, use Red/Black SOR to solve the Poisson equation. This takes a large number iterations of the Red/Black process as shown in the slides:. It takes as input the current pressure matrix, flag matrix, and the RHS matrix and outputs a new pressure matrix 20Computer Science, University of Warwick The main loop 5. For each cell, update the real (u,v) values based on the pressure matrix and the tentative velocity values (f,g). It takes as input the pressure, f, g and flag matrices and updates the u and v matrices. 6. For each cell that is adjacent to an edge cell, the (u,v) values of the boundary cells are updated (by taking values from their neighbours). It takes as input the u, v and flag matrices and updates the u and v matrices } Note that it is not going to be worth parallelizing the whole program, just parts of it. 21Computer Science, University of Warwick Assignment 2 – What to Do in General - You will be given a serial program, called Karman - You need to parallelize the Karman code and write a report - 80% of the full mark comes from the parallelization using MPI - 20% of the mark comes from the parallelization using both MPI and OpenMP (a hybrid approach), i.e., parallelizing the computations in one machine using OpenMP, while parallelizing the computations across machines using MPI 22Computer Science, University of Warwick Assignment 2 – Detailed Requirements for Programming • Parallelize the Karman code, using a pure MPI approach (80% of marks) and a hybrid MPI-OpenMP approach (20% marks) • Profiling which Karman functions are more time consuming • Design your decomposition strategy, date exchange strategies between neighboring partitions, the MPI functions (e.g., collective operations) • For a hybrid approach, design the parallelization strategy for OpenMP 23Computer Science, University of Warwick • Benchmarking the execution time of your parallel program as you increase the number of processes • If you develop a hybrid MPI-OpenMP implementation, • benchmark the runtime of OpenMP code • Compare the performance between OpenMP and MPI • Ensure the simulation results are the same after the parallelization • Your parallel program should contain the adequate comments as the evidence of good programming practice. Assignment 2 – Detailed Requirements for Programming 24Computer Science, University of Warwick Assignment 2 – Requirements for Report • Profiling the functions to determine which functions are more time consuming; discuss the profiling results • Discuss your MPI implementation, for example, • your decomposition strategy and load balancing strategy; • your strategy of exchanging the boundary data between neighboring partitions; • collective operations you used; • Benchmark the change in execution times of your parallel program as you increase the number of processes; present the results in graph or table 25Computer Science, University of Warwick Assignment 2 – Requirements for Report • Analyze and discuss the performance of your parallel program • If you develop the hybrid MPI-OpenMP implementation, • Discuss OpenMP implementation • Benchmark the runtime of the OpenMP codes and present the results • Discuss the performance comparison between the hybrid approach and the pure MPI approach • Writing skills: pay attention to spelling, punctuation and grammar 26Computer Science, University of Warwick Assignment 2 – Submission - Submit a compressed folder (.zip) to Tabula; the folder should contain - your parallel code (the directory structure used in the project should not change) - your report (pdf format and place your report in the top level directory in the folder) - Deadline: 12pm, March 14th, 2018