Microsoft PowerPoint – Performance-2 [Compatibility Mode]
High Performance Computing
Course Notes
Performance II
Dr Ligang He
2Computer Science, University of Warwick
Four approaches to modelling the
performance of a parallel application
Speedup
Amdahl’s law
Asymptotic analysis
Model execution time (Construct the performance model)
3Computer Science, University of Warwick
Modelling execution time – an example
Atmosphere model
The atmosphere model is used in weather forecast
Capture the relation among atmospheric attributes, such as
velocity of air, temperature, pressure, moisture, etc.
The model is established by physical laws and represented by
a set of partial differential equations
4Computer Science, University of Warwick
Partial differential equations in the
atmosphere model
5Computer Science, University of Warwick
Modelling execution time – an example
Atmosphere model
too complicated to be solved by mathematical derivation
We have to resort to the numerical method
Cannot generate the solution for every physical point
discretize the space, i.e., partition the space into a set of cells
and use one point in a cell to represent all points in the cell.
a continuous space is approximated by a finite set of
regularly spaced points in that space
Calculate the solution at these discrete points
6Computer Science, University of Warwick
Communication pattern
Each point uses the nine-point stencil to calculate its horizontal
motion and uses the three-point stencil to calculate its vertical
motion
Fig.1. The 3-point stencil for calculating the vertical velocity
Fig. 2. The 9-point stencil for calculating the horizontal velocity
7Computer Science, University of Warwick
Question
When processor 1 wants to calculate the data points
allocated to it, how many data items does it have to
obtain from processor 2?
8Computer Science, University of Warwick
Modelling computation time
If we assume a grid of size N*N*Z points, and using 1-
D decomposition (along y-axis) to partition the grid
among P processors, then
each task is responsible for a subgrid of size N*(N/P)*Z
then, Tcomp for each subgrid can be calculated as follows,
where tc is the average time of calculating a single grid point
Tcomp=tc*N*(N/P)*z
9Computer Science, University of Warwick
Modelling the time of sending one message
Tmsg (the time spent in sending one message) can be
calculated as follows,
Tmsg=ts+twL
where ts is the message startup time, tw is the transfer time per byte, L is the
size of the message
Tmsg is a function of L; ts and tw are constants given a computing platform
Question: if I plot the function of Tmsg over L, what does the plot look like? How are ts and
tw represented in the line?
10Computer Science, University of Warwick
Modelling communication
time
Communication time for calculating a subgrid can be
computed as
Tcomm=2(ts+tw2NZ)
Hence, the performance model for the execution time
of calculating the velocity of the grid of points is
Tp=Tcomp+Tcomm=tc*N*(N/p)*Z+2(ts+tw2NZ)
From the performance model of the execution time,
we can know a lot of information.
What will happen when we increase p?
Execution time decreases with increasing P; the
proportion of communication cost increases
Execution time increases with increasing N, Z,
tc, ts, tw
11Computer Science, University of Warwick
Speedup and parallel efficiency
The execution time on one processor is
Speedup is
Parallel efficiency can be calculated as
12Computer Science, University of Warwick
Iso-efficiency
Question: What is the iso-efficiency function?
E can be reduced to
When N=P, E remains approximately constant as P
changes(except when P is small)
13Computer Science, University of Warwick
2D decomposition
If applying 2-D decomposition, then
Execution time can be modelled as 1D decomposition
14Computer Science, University of Warwick
Iso-efficiency
Parallel efficiency can be modelled as
When N= , E remains constant as P increases
Iso-efficiency in 1D is N=P
Therefore, for this particular communication pattern, applying
2D decomposition will achieve better scalability than 1D
decomposition
15Computer Science, University of Warwick
Decomposition analysis
Boundary surfaces between sub-grids are shaded.
Data on boundary surfaces need to be communicated.
The lower surface-to-volume ratio, the better:
• Surface = communication
• Volume = computation
16Computer Science, University of Warwick
Decomposition analysis
Consider a 3-D grid and assume the grid is a cube
Volume V=c*n, where c = number of cells per PE, n is the
number of processors
The length of the grid in each dimension is V1/3
17Computer Science, University of Warwick
Decomposition analysis
Sub-grid
Volume
I J K Sub-grid
Surfaces
Surface to
Volume
1-D c V1/3 / n V1/3 V1/3 2c2/3 .n2/3 2n2/3 / c1/3
2-D c V1/3 / n1/2 V1/3 / n1/2 V1/3 4c2/3 .n1/6 4n1/6 / c1/3
3-D c V1/3 / n1/3 V1/3 / n1/3 V1/3 / n1/3 6c2/3 6 / c1/3
Sub-grid Length