程序代写代做代考 Microsoft PowerPoint – Performance-2 [Compatibility Mode]

Microsoft PowerPoint – Performance-2 [Compatibility Mode]

High Performance Computing
Course Notes

Performance II

Dr Ligang He

2Computer Science, University of Warwick

Four approaches to modelling the
performance of a parallel application

Speedup

Amdahl’s law

Asymptotic analysis

Model execution time (Construct the performance model)

3Computer Science, University of Warwick

Modelling execution time – an example

Atmosphere model
 The atmosphere model is used in weather forecast

 Capture the relation among atmospheric attributes, such as

velocity of air, temperature, pressure, moisture, etc.

 The model is established by physical laws and represented by

a set of partial differential equations

4Computer Science, University of Warwick

Partial differential equations in the
atmosphere model

5Computer Science, University of Warwick

Modelling execution time – an example

Atmosphere model
 too complicated to be solved by mathematical derivation

 We have to resort to the numerical method

 Cannot generate the solution for every physical point

 discretize the space, i.e., partition the space into a set of cells
and use one point in a cell to represent all points in the cell.

 a continuous space is approximated by a finite set of
regularly spaced points in that space

 Calculate the solution at these discrete points

6Computer Science, University of Warwick

Communication pattern

Each point uses the nine-point stencil to calculate its horizontal
motion and uses the three-point stencil to calculate its vertical
motion

Fig.1. The 3-point stencil for calculating the vertical velocity

Fig. 2. The 9-point stencil for calculating the horizontal velocity

7Computer Science, University of Warwick

Question

When processor 1 wants to calculate the data points
allocated to it, how many data items does it have to
obtain from processor 2?

8Computer Science, University of Warwick

Modelling computation time

If we assume a grid of size N*N*Z points, and using 1-
D decomposition (along y-axis) to partition the grid
among P processors, then

 each task is responsible for a subgrid of size N*(N/P)*Z
 then, Tcomp for each subgrid can be calculated as follows,

where tc is the average time of calculating a single grid point

Tcomp=tc*N*(N/P)*z

9Computer Science, University of Warwick

Modelling the time of sending one message

 Tmsg (the time spent in sending one message) can be
calculated as follows,

Tmsg=ts+twL
where ts is the message startup time, tw is the transfer time per byte, L is the

size of the message

Tmsg is a function of L; ts and tw are constants given a computing platform

Question: if I plot the function of Tmsg over L, what does the plot look like? How are ts and
tw represented in the line?

10Computer Science, University of Warwick

Modelling communication
time

 Communication time for calculating a subgrid can be

computed as

Tcomm=2(ts+tw2NZ)

 Hence, the performance model for the execution time

of calculating the velocity of the grid of points is

Tp=Tcomp+Tcomm=tc*N*(N/p)*Z+2(ts+tw2NZ)

 From the performance model of the execution time,

we can know a lot of information.

 What will happen when we increase p?

Execution time decreases with increasing P; the

proportion of communication cost increases

 Execution time increases with increasing N, Z,

tc, ts, tw

11Computer Science, University of Warwick

Speedup and parallel efficiency

 The execution time on one processor is

 Speedup is

 Parallel efficiency can be calculated as

12Computer Science, University of Warwick

Iso-efficiency

Question: What is the iso-efficiency function?

E can be reduced to

When N=P, E remains approximately constant as P
changes(except when P is small)

13Computer Science, University of Warwick

2D decomposition

 If applying 2-D decomposition, then

 Execution time can be modelled as 1D decomposition

14Computer Science, University of Warwick

Iso-efficiency

Parallel efficiency can be modelled as

 When N= , E remains constant as P increases

 Iso-efficiency in 1D is N=P

 Therefore, for this particular communication pattern, applying
2D decomposition will achieve better scalability than 1D
decomposition

15Computer Science, University of Warwick

Decomposition analysis

 Boundary surfaces between sub-grids are shaded.

 Data on boundary surfaces need to be communicated.

 The lower surface-to-volume ratio, the better:

• Surface = communication

• Volume = computation

16Computer Science, University of Warwick

Decomposition analysis

Consider a 3-D grid and assume the grid is a cube

Volume V=c*n, where c = number of cells per PE, n is the
number of processors

The length of the grid in each dimension is V1/3

17Computer Science, University of Warwick

Decomposition analysis

Sub-grid
Volume

I J K Sub-grid
Surfaces

Surface to
Volume

1-D c V1/3 / n V1/3 V1/3 2c2/3 .n2/3 2n2/3 / c1/3

2-D c V1/3 / n1/2 V1/3 / n1/2 V1/3 4c2/3 .n1/6 4n1/6 / c1/3

3-D c V1/3 / n1/3 V1/3 / n1/3 V1/3 / n1/3 6c2/3 6 / c1/3

Sub-grid Length